Bock · Kostina · Phu · Rannacher (Eds.) Modeling, Simulation and Optimization of Complex Processes
Hans Georg Bock · Ekaterina Kostina Hoang Xuan Phu · Rolf Rannacher Editors
Modeling, Simulation and Optimization of Complex Processes Proceedings of the Third International Conference on High Performance Scientific Computing, March 6–10, 2006, Hanoi, Vietnam
123
Hans Georg Bock Universität Heidelberg Interdisziplinäres Zentrum für Wissenschaftliches Rechnen (IWR) Im Neuenheimer Feld 368 69120 Heidelberg Germany
[email protected]
Ekaterina Kostina Universität Marburg FB Mathematik und Informatik Hans-Meerwein-Str. 35032 Marburg Germany
[email protected]
Hoang Xuan Phu Vietnamese Academy of Science and Technology (VAST) Institute of Mathematics 18 Hoang Quoc Viet Road 10307 Hanoi Vietnam
[email protected]
Rolf Rannacher Universität Heidelberg Interdisziplinäres Zentrum für Wissenschaftliches Rechnen (IWR) Im Neuenheimer Feld 368 69120 Heidelberg Germany
[email protected]
The cover picture shows a computer reconstruction (courtesy of Angkor Project Group, IWR) of the mountain temple of Phnom Bakheng in the Angkor Region, Siem Reap, Cambodia, where the pre-conference workshop “Scientific Computing for the Cultural Heritage” took place on March 2–5, 2006.
ISBN: 978-3-540-79408-0
e-ISBN: 978-3-540-79409-7
Library of Congress Control Number: 2008925522 Mathematics Subject Classification: 49-06, 60-06, 65-06, 68-06, 70-06, 76-06, 85-06, 90-06, 93-06, 94-06
© 2008 Springer-Verlag Berlin Heidelberg This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilm or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable for prosecution under the German Copyright Law. The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. Cover design: WMX Design GmbH, Heidelberg Printed on acid-free paper 9 8 7 6 5 4 3 2 1 springer.com
Preface
High Performance Scientific Computing is an interdisciplinary area that combines many fields such as mathematics, computer science and scientific and engineering applications. It is a key high-technology for competitiveness in industrialized countries as well as for speeding up development in emerging countries. High performance scientific computing develops methods for computer aided simulation and optimization of systems and processes. In practical applications in industry and commerce, science and engineering, it helps to save resources, to avoid pollution, to reduce risks and costs, to improve product quality, to shorten development times or simply to operate systems better. Different aspects of scientific computing have been the topics of the Third International Conference on High Performance Scientific Computing held at the Hanoi Institute of Mathematics, Vietnamese Academy of Science and Technology (VAST), March 6-10, 2006. The conference has been organized by the Hanoi Institute of Mathematics, Ho Chi Minh City University of Technology, Interdisciplinary Center for Scientific Computing (IWR), Heidelberg, and its International PhD Program “Complex Processes: Modeling, Simulation and Optimization”. The conference had about 200 participants from countries all over the world. The scientific program consisted of more than 130 talks, 10 of them were invited plenary talks given by John Ball (Oxford), Vincenzo Capasso (Milan), Paolo Carloni (Trieste), Sebastian Engell (Dortmund), Donald Goldfarb (New York), Wolfgang Hackbusch (Leipzig), Satoru Iwata (Tokyo), Hans Petter Langtangen (Oslo), Tao Tang (Hong Kong) and Philippe Toint (Namur). Topics were mathematical modelling, numerical simulation, methods for optimization and control, parallel computing, software development, applications of scientific computing in physics, chemistry, biology and mechanics, environmental and hydrology problems, transport, logistics and site location, communication networks, production scheduling, industrial and commercial problems.
VI
Preface
This proceeding volume contains 44 carefully selected contributions referring to lectures presented at the conference. We would like to thank all contributors and referees. We would like also to use the opportunity to thank the sponsors whose support significantly contributed to the success of the conference: Interdisciplinary Center for Scientific Computing (IWR) and its International PhD Program “Complex Processes: Modeling, Simulation and Optimization” of the University of Heidelberg, Gottlieb Daimler- and Karl Benz-Foundation, the DFG Research Center Matheon, Berlin/Brandenburg Academy of Sciences und Humanities, the Abdus Salam International Centre for Theoretical Physics, the Vietnamese Academy of Science and Technology (VAST) and its Institute of Mathematics, the Vietnam National Program for Basic Sciences and its Key Project “Selected Problems of Optimization and Scientific Computing” and Ho Chi Minh City University of Technology. Heidelberg January 2008
Hans Georg Bock Ekaterina Kostina Hoang Xuan Phu Rolf Rannacher
Contents
Development of a Fault Detection Model-Based Controller Nitin Afzulpurkar, Vu Trieu Minh . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
1
Sensitivity Generation in an Adaptive BDF-Method Jan Albersmeyer, Hans Georg Bock . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15 The gVERSE RF Pulse: An Optimal Approach to MRI Pulse Design Christopher K. Anand, Stephen J. Stoyan, Tam´ as Terlaky . . . . . . . . . . . . . 25 Modelling the Performance of the Gaussian Chemistry Code on x86 Architectures Joseph Antony, Mike J. Frisch, Alistair P. Rendell . . . . . . . . . . . . . . . . . . 49 Numerical Simulation of the December 26, 2004: Indian Ocean Tsunami J. Asavanant, M. Ioualalen, N. Kaewbanjak, S.T. Grilli, P. Watts, J.T. Kirby, F. Shi . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59 Approximate Dynamic Programming for Generation of Robustly Stable Feedback Controllers Jakob Bj¨ ornberg, Moritz Diehl . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69 Integer Programming Approaches to Access and Backbone IP Network Planning Andreas Bley, Thorsten Koch . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87 An Adaptive Fictitious-Domain Method for Quantitative Studies of Particulate Flows Sebastian B¨ onisch . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111 Adaptive Sparse Grid Techniques for Data Mining H.-J. Bungartz, D. Pfl¨ uger, S. Zimmer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
VIII
Contents
On the Stochastic Geometry of Birth-and-Growth Processes. Application to Material Science, Biology and Medicine Vincenzo Capasso . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131 Inverse Problem of Lindenmayer Systems on Branching Structures Somporn Chuai-Aree, Willi J¨ ager, Hans Georg Bock, Suchada Siripant . . 163 3D Cloud and Storm Reconstruction from Satellite Image Somporn Chuai-Aree, Willi J¨ ager, Hans Georg Bock, Susanne Kr¨ omker, Wattana Kanbua, Suchada Siripant . . . . . . . . . . . . . . . 187 Providing Query Assurance for Outsourced Tree-Indexed Data Tran Khanh Dang, Nguyen Thanh Son . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 207 An Adaptive Space-Sharing Scheduling Algorithm for PC-Based Clusters Viet Hung Doan, Nam Thoai, Nguyen Thanh Son . . . . . . . . . . . . . . . . . . . . 225 Fitting Multidimensional Data Using Gradient Penalties and Combination Techniques Jochen Garcke, Markus Hegland . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 235 Mathematical Modelling of Chemical Diffusion through Skin using Grid-based PSEs Christopher Goodyer, Jason Wood, Martin Berzins . . . . . . . . . . . . . . . . . . . 249 Modelling Gene Regulatory Networks Using Galerkin Techniques Based on State Space Aggregation and Sparse Grids Markus Hegland, Conrad Burden, Lucia Santoso . . . . . . . . . . . . . . . . . . . . . 259 A Numerical Study of Active-Set and Interior-Point Methods for Bound Constrained Optimization Long Hei, Jorge Nocedal, Richard A. Waltz . . . . . . . . . . . . . . . . . . . . . . . . . 273 Word Similarity In WordNet Tran Hong-Minh, Dan Smith . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 293 Progress in Global Optimization and Shape Design D. Isebe, B. Ivorra, P. Azerad, B. Mohammadi, F. Bouchette . . . . . . . . . . 303 EDF Scheduling Algorithm for Periodic Messages On Switched Ethernet Myung-Kyun Kim, Dao Manh Cuong . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 313
Contents
IX
Large-Scale Nonlinear Programming for Multi-scenario Optimization Carl D. Laird, Lorenz T. Biegler . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 323 On the Efficiency of Python for High-Performance Computing: A Case Study Involving Stencil Updates for Partial Differential Equations Hans Petter Langtangen, Xing Cai . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 337 Designing Learning Control that is Close to Instability for Improved Parameter Identification Richard W. Longman, Kevin Xu, Benjamas Panomruttanarug . . . . . . . . . 359 Fast Numerical Methods for Simulation of Chemically Reacting Flows in Catalytic Monoliths Hoang Duc Minh, Hans Georg Bock, Hoang Xuan Phu, Johannes P. Schl¨ oder . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 371 A Deterministic Optimization Approach for Generating Highly Nonlinear Balanced Boolean Functions in Cryptography Le Hoai Minh, Le Thi Hoai An, Pham Dinh Tao, Pascal Bouvry . . . . . . . 381 Project-Oriented Scheduler for Cluster Systems T. N. Minh, N. Thoai, N. T. Son, D. X. Ky . . . . . . . . . . . . . . . . . . . . . . . . . 393 Optimizing Spring-Damper Design in Human Like Walking that is Asymptotically Stable Without Feedback Katja D. Mombaur, Richard W. Longman, Hans Georg Bock, Johannes P. Schl¨ oder . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 403 Stability Optimization of Juggling Katja Mombaur, Peter Giesl, Heiko Wagner . . . . . . . . . . . . . . . . . . . . . . . . 419 Numerical Model of Far Turbulent Wake Behind Towed Body in Linearly Stratified Media N. P. Moshkin, G. G. Chernykh, A. V. Fomina . . . . . . . . . . . . . . . . . . . . . . 433 A New Direction to Parallelize Winograd’s Algorithm on Distributed Memory Computers D. K. Nguyen, I. Lavallee, M. Bui . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 445 Stability Problems in ODE Estimation Michael R. Osborne . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 459
X
Contents
A Fast, Parallel Performance of Fourth Order Iterative Algorithm on Shared Memory Multiprocessors (SMP) Architecture M. Othman, J. Sulaiman . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 471 Design and Implementation of a Web Services-Based Framework Using Remoting Patterns Phung Huu Phu, Dae Seung Yoo, Myeongjae Yi . . . . . . . . . . . . . . . . . . . . . 479 Simulation of Tsunami and Flash Floods S. G. Roberts, O. M. Nielsen, J. Jakeman . . . . . . . . . . . . . . . . . . . . . . . . . . . 489 Differentiating Fixed Point Iterations with ADOL-C: Gradient Calculation for Fluid Dynamics Sebastian Schlenkrich, Andrea Walther, Nicolas R. Gauger, Ralf Heinrich . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 499 Design Patterns for High-Performance Matrix Computations Hoang M. Son . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 509 Smoothing and Filling Holes with Dirichlet Boundary Conditions Linda Stals, Stephen Roberts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 521 Constraint Hierarchy and Stochastic Local Search for Solving Frequency Assignment Problem T.V. Su, D.T. Anh . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 531 Half-Sweep Algebraic Multigrid (HSAMG) Method Applied to Diffusion Equations J. Sulaiman, M. Othman, M. K. Hasan . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 547 Solving City Bus Scheduling Problems in Bangkok by Eligen-Algorithm Chotiros Surapholchai, Gerhard Reinelt, Hans Georg Bock . . . . . . . . . . . . . 557 Partitioning for High Performance of Predicting Dynamical Behavior of Color Diffusion in Water using 2-D tightly Coupled Neural Cellular Network A. Suratanee, K. Na Nakornphanom, K. Plaimas, C. Lursinsap . . . . . . . 565 Automatic Information Extraction from the Web: An HMM-Based Approach M. S. Tran-Le, T. T. Vo-Dang, Quan Ho-Van, T. K. Dang . . . . . . . . . . . 575
Contents
XI
Advanced Wigner Method for Fault Detection and Diagnosis System Do Van Tuan, Sang Jin Cho, Ui Pil Chong . . . . . . . . . . . . . . . . . . . . . . . . . 587 Appendix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 605
Development of a Fault Detection Model-Based Controller Nitin Afzulpurkar1 and Vu Trieu Minh2 1 2
Asian Institute of Technology (AIT)
[email protected] King Mongkut’s Institute of Technology North Bangkok (KMITNB)/Sirindhorn International Thai-German Graduate School (TGGS)
[email protected]
Abstract This paper describes a model-based control system that can online determine the optimal control actions and also detect faults quickly in the controlled process and reconfigure the controller accordingly. Thus, such system can perform its function correctly even in the presence of internal faults. A fault detection modelbased (FDMB) controller consists of two main parts, the first is fault detection and diagnosis (FDD); and the second is controller reconfiguration (CR). Systems subject to such faults are modeled as stochastic hybrid dynamic model. Each fault is deterministically represented by a mode in a discrete set of models. The FDD is used with interacting multiple- model (IMM) estimator and the CR is used with generalized predictive control (GPC) algorithm. Simulations for the proposed controller are illustrated and analyzed.
1 Introduction Various methods for fault detection of dynamic systems have been studied and developed over recent years ([2–8]) but there are relatively few successful development of controller systems that can deal with faults in sensors and actuators. In process control system, for example, failures of its actuator or sensor may cause serious problems and need to be detected and isolated as soon as possible. We therefore examine and propose a fault detection modelbased (FDMB) controller system for fault detection and diagnosis (FDD) and controller reconfiguration (CR). The proposed FDMB controller is just theoretically simulated for simple examples. Results show its strong ability for real applications of a controller to detect sensor/actuator faults in dynamic systems. The outline for this paper is as follows: Section 2 describes the design and verification of fault modeling; Section 3 analyzes the selection of an FDD system; Section 4 develops a CR in an integrated FDMB controller, examples and simulations are given after each section to illustrate the main ideas in the section; Finally conclusions are given in section 5.
2
N. Afzulpurkar and V.T. Minh
2 Fault Modeling Faults are difficult to foresee and prevent. Traditionally faults were handled by describing the resulting behavior of the system and grouped into a hierarchic structure of fault models [1]. This approach is still widely used in practice: When a failure occurs, the system behavior changes and should be described by a different mode from the one that corresponds to the normal mode. For dynamic systems in which state may jump as well vary continuously in a discrete set of modes, an effective way to model the faults is so-called stochastic hybrid system. Apart from the applications to problems involving failures, hybrid systems have found great success in other areas such as target tracking and control that involves possible structural changes [2]. The stochastic hybrid model assumes that the actual system at any time can be modeled sufficiently and accurately by: x(k + 1) = A(k, m(k + 1))x(k) + B(k, m(k + 1))u(k) +T (k, m(k + 1))ξ(k, m(k + 1))
(1)
z(k + 1) = C(k, m(k + 1))x(k) + η(k, m(k + 1)),
(2)
and the system mode sequence assumed to be a first-order Markov chain with transition probabilities: Π{mj (k + 1)|mi (k)} = πi j(k), ∀mi , mj ∈ I,
(3)
where A, B, T, and C are the system matrices, x ∈ n is the state vector; z ∈ p is the measured output; u ∈ m is the control input; ξ ∈ nξ and ¯ and η¯(k), and covariances Q(k) η ∈ p are independent noises with mean ξ(k) and R(k); Π{.} denotes probability; m(k) is the discrete-valued modal state, i.e. the index of the normal or fault mode, at time k; I = {m1 , m2 , ..., mN } is the set of all possible system modes; πi j(k) is the transition probability from mode mi to mode mj , i.e. the probability that the system will jump to mode mj at time instant k. Obviously, the following relation must be held for any mi ∈ M : N j=1
πi j(k) =
N
Π{mj (k + 1)|mi (k)} = 1, i = 1, ..., N.
(4)
j=1
Faults can be modeled by changing the appropriate matrices A, B, and C in equation (1) or (2) representing the effectiveness of failures in the system: x(k + 1) = Ai (k)x(k) + Bi (k)u(k) + Ti (k)ξi (k) mi ∈ M k = (5) z(k) = Ci (k)x(k) + ηi (k) where the subscript i denotes the fault modeling in model set mi ∈ Mk = {m1 , ..., mN }, each mi corresponds to a node (a fault) occuring in the process
Development of a Fault Detection Model-Based Controller
3
at time instant k. The number of failure combinations may be huge if the model set Mk is fixed over time, that is for all k. We can solve this difficulty by designing variable structure model in which the model set Mk varies at any time in the total model set M or Mk ∈ M . Variable structure set overcomes fundamental limitations of fixed structure set because the fixed model set does not always exactly match the true system at any time, or the set of possible modes at any time varies and depends on the previous state of the system. The design of model set should assure the clear difference between models so that they are identifiable by the multiple model estimators. We then now verify and check the distance between models in the model set Mk . At the present time, there is still no algorism to determine offline the difference between models for FDD detection. Therefore we propose to check the distance between models via the differences of the H∞ norm, i.e. the distance d between two models m1 and m2 is defined as: d = |N orm(m1 − m2 )|∞
(6)
Eventhough this distance does not reflect the real difference between models (the magnitude of this value depends also on the system dimension units) but it can help to verify the model set. If the distance between two model is short, it may not be identifiable by the FDD. Example 1: Fault Model Set Design and Verification. Simulations throughout of this paper are used the following process model of a distillation column with four state variables, two inputs (feed flow rate and reflux flow rate) and two outputs (overhead flow rate and overhead composition). For simplicity, we verify only one input (feed flow rate). The space state model of the system is: ⎧ ⎡ ⎤ ⎡ ⎤ −0.05 −6 0 0 −0.2 ⎪ ⎪ ⎪ ⎢ −0.01 −0.15 0 0 ⎥ ⎢ ⎥ ⎪ ⎪ ⎥ x(t) + ⎢ 0.03 ⎥ u(t) + ξ(t) ⎪ ˙ =⎢ ⎨ x(t) ⎣ 1 ⎣ 2 ⎦ 0 0 13 ⎦ M= (7) 1 0 0 0.1 ⎪ ⎪
0 ⎪ ⎪ 1 −0.5 1 1 ⎪ ⎪ x(t) + ξ(t) ⎩ z(t) = −1 0.6 0 1 It is assumed the following five models including in the model set: - Model 1: Nominal model (no fault), then nothing changes
in equation (7) 0 0 00 x(t) + ξ(t) - Model 2: Total sensor 1 failure, zm2 (t) = −1 0.6 0 1
1 −0.5 1 1 - Model 3: Total sensor 2 failure, zm3 (t) = x(t) + ξ(t) 0 0 00
0.5z1 0.5 −0.25 0.5 0.5 = x(t) + - Model 4: -50% sensor 1 failure, zm4 (t) = −1 0.6 0 1 z2 ξ(t)
z1 1 −0.5 1 1 = - Model 5: -50% sensor 2 failure, zm5 (t) = x(t) + ξ(t) 0.5z2 −0.5 0.3 0 0.5 The above system was discretized with a sampling period T = 1s. Now, we check the distance between models (table 1): The distance from m1 to m3 is relatively
4
N. Afzulpurkar and V.T. Minh Table 1. Distances between models M odels m1 m2 m3 m4 m5 m1 m2 m3 m4 m5
0 479 85 239 322
479 0 486 239 578
85 486 0 254 239
239 239 254 0 401
322 578 239 401 0
smaller than other distances. It will cause difficulty for FDD to identify which model is currently running: The issue is addressed in the next section.
3 Fault Detection and Diagnosis (FDD) In this section, we analyze and select a fast and reliable FDD system applied for the above set of models by using algorithms of multiple model (MM) estimators. MM estimation algorithms appeared in early 1980s when Shalom and Tse [5] introduced a suboptimal, computationally-bounded extension of Kalman filter to cases where measurements were not always available. Then, several multiple-model filtering techniques, which could provide accurate state estimation, have been developed. Major existing approaches for MM estimation are discussed and introduced in [4–8] including the Non-Interacting Multiple Model (NIMM), the Gaussian Pseudo Bayesian (GPB1), the Secondorder Gaussian Pseudo Bayesian (GPB2), and the Interacting Multiple Model (IMM). From the design of model set, a bank of Kalman filters runs in parallel at every time, each based on a particular model, to obtain the model-conditional estimates. The overall state estimate is a probabilistically weighted sum of these model-conditional estimates. The jumps in system modes can be modeled as switching between the assumed models in the set. Figure 1 shows the operation of a recursive multiple model estimator, where x ˆi (k|k) is the estimate of the state x(k) obtained from the Kalman filter based on the model mi at time k given the measurement sequence through time z(k|k); x ˆ0i (k − 1|k − 1) is the equivalent reinitialized estimate at time ˆ(k|k) is (k − 1) as the input to the filter based on model mi at time k; x the overall state estimate; Pi (k|k), Pi0 (k − 1)|(k − 1), and P (k|k) are the corresponding covariances. A simple and straightforward way of filter reinitialization is that each single model based recursive filter uses its own previous state estimation and state covariance as the input at the current cycle: 0 ˆi (k − 1)|(k − 1) x ˆi (k − 1|k − 1) = x (8) Pi0 (k − 1|k − 1) = Pi (k − 1|k − 1)
Development of a Fault Detection Model-Based Controller
5
z(k|k)
Filter based on model 1
P1 (k-1|k-1)
Filter Reinitialization
0
xˆ 02(k-1|k-1)
Filter based on model 2
P2 (k-1|k-1) 0
xˆN0 (k -1|k-1)
Filter based on model N
P (k-1|k-1) 0 N
xˆ 1 (k|k) P1 (k| k)
xˆ 2 (k| k) P2 (k|k)
Estimate Combustion
xˆ 01(k -1|k-1)
xˆ (k| k) P(k| k)
xˆ N (k| k) PN (k| k)
Figure 1. Structure of a MM estimator
This leads to the non-interacting multiple model (NIMM) estimator because the filters operate in parallel without interactions with one another, which is reasonable only under the assumption that the system mode does not change (figure 2). xˆi (1|1)
xˆ (1| 1)
xˆi (2|2)
xˆ (2| 2)
xˆi (3| 3)
xˆ (3| 3)
i=1 i=2 i=3
Figure 2. Illustration of the NIMM estimator
Another way of reinitialization is to use the previous overstate estimate and covariance for each filter as the required input: 0 x ˆi (k − 1|k − 1) = x ˆ(k − 1)|(k − 1) (9) Pi0 (k − 1|k − 1) = P (k − 1|k − 1) This leads to the first order Generalized Pseudo Bayesian (GPB1) estimator. It belongs to the class of interacting multiple model estimators since it uses the previous overall state estimate, which carries information from all filters. Clearly, if the transition probability matrix is an identity matrix this method of reinitialization reduces to the first one (figure 3). The GPB1 and GPB2 algorithms were the result of early work by Ackerson and Fu [6] and good overviews are provided in [7], where suboptimal hypothesis pruning techniques are compared. The GPB2 differed from the GPB1 by including knowledge of the previous timestep’s possible mode transitions, as
6
N. Afzulpurkar and V.T. Minh i=1
xˆi (1|1)
xˆ (1|1)
xˆi (2|2)
xˆ (2| 2)
xˆi (3| 3)
xˆ (3| 3)
i=2 i=3
Figure 3. Illustration of the GPB1 estimator
modeled by a Markov chain. Thus, GPB2 produced slightly smaller tracking errors than GPB1 during non-maneuvering motion. However in the size of this paper, we do not include GPB2 into our simulation test and comparison. A significantly better way of reinitialization is to use IMM. The IMM was introduced by Blom in [8] and Zhang and Li in [4]:
ˆi (k|k)P {mi (k)|z k , mj (k + 1)} x ˆ0j (k|k) = E[x(k|z k , mj (k + 1)] = i x 0 0 xj (k|k)] = i P {mi (k)|z k , mj (k + 1){Pi (k|k) + |˜ x0ij (k|k)|2 Pi (k|k) = cov[ˆ
(10)
where cov[.] stands for covariance and x ˜0ij (k|k) = x ˆ0i (k|k) − x ˆ0j (k|k). Figure 4 depicts the reinitialization in the IMM estimator. In this paper we use this approach for setting up a FDD system. The probability of each model matching xˆi (1|1) i=1
xˆi0 (1|1)
xˆi (2 | 2)
xˆi0 (2 |2)
xˆi (3 |3)
xˆi0(3 |3)
i=2 i=3
Figure 4. Illustration of the IMM estimator
to the system mode provides the required information for mode chosen decision. The mode decision can be achieved by comparing with a fixed threshold probability µT . If the mode probabilities max(µi (k)) ≥ µT , mode at µi (k) is occurred and taken place at the next cycle. Otherwise, there is no new mode detection. The system maintains the current mode for the next cycle calculation. Example 2: Analysis and Selection of FDD system. It is assumed that the model ⎡mode in example 1 can jump ⎤ from each other in a mode 0.96 0.01 0.01 0.01 0.01 ⎢ 0.05 0.95 0 0 0 ⎥ ⎢ ⎥ 0.05 0 0.95 0 0 ⎥ probability matrix as: πij = ⎢ ⎢ ⎥. ⎣ 0.05 0 0 0.95 0 ⎦ 0.05 0 0 0 0.95 The threshold value for the mode probabilities is chosen as µT = 0.9. Now we begin to compare the three estimators of NIMM, GPB1, and IMM to test their ability to detect faults. The five models are run with time: m1 for k = 1 − 20, k = 41 − 60, k = 81 − 100, k = 121 − 140, and k = 161 − 180; m2 for k = 21 − 40; m3 for
Development of a Fault Detection Model-Based Controller 1
1
0.9
0.9
0.9
0.8
0.8
0.8
0.7
0.7
0.6
0.6
0.6
(a)0.5
(b)0.5
(c)0.5
0.4
0.4
0.4
0.3
0.3
0.3
0.2
0.2
0.2
0.1
0.1
0
0
20
40
60
80
100
120
140
160
0
180
7
1
0.7
0.1
0
20
40
60
80
100
120
140
160
0
180
0
20
40
60
80
100
120
140
160
180
Figure 5. Probabilites of estimators (a) NIMM, (b) GPB1, and (c) IMM k = 61 − 80; m4 for k = 101 − 120; and m5 for k = 141 − 160. Results of simulation are shown in figure 5. In Figure 5, we can see that the GPB1 estimator performs as good as IMM estimator while NIMM estimator fails to detect sensor failures in the model set. Next we continue to test the ability of the GPB1 and IMM estimators by narrowing the distances between modes as close as possible until one of methods cannot detect the failures. Now we design the following mode set: m1 = M odel1: Nominal mode; 0.95z1 m2 = M odel2: −5% sensor 1 failure or zm2 (t) = ; m3 = M odel3: −5% z2
z1 sensor 2 failure or zm3 (t) = ; m4 = M odel4: −2% sensor 1 failure or 0.95z2
0.98z1 z1 zm4 (t) = ; and m5 = M odel5: −2% sensor 2 failure or zm5 (t) = ; z2 0.98z2 With these parameters, we achieve the distances between models in table 2.
Table 2. Distances between models M odels m1 m1 m2 m3 m4 m5
0 0.048 0.009 0.240 0.004
m2
m3
m4
m5
0.048 0 0.049 0.192 0.048
0.009 0.049 0 0.240 0.004
0.240 0.192 0.240 0 0.240
0.004 0.048 0.004 0.240 0
1
1
0.9
0.9
0.8
0.8
0.7
0.7
0.6
0.6
(a)0.5
(b)0.5
0.4
0.4
0.3
0.3
0.2
0.2
0.1
0 0
0.1
20
40
60
80
100
120
140
160
180
0
0
20
40
60
80
100
120
140
160
180
Figure 6. Probabilites of estimators (a) GPB1, and (b) IMM
8
N. Afzulpurkar and V.T. Minh
Since the distances between models are very close, the GPB1 fails to detect failures while IMM still proves it’s much superior to detect failures (figure 6). As a result, we select the IMM for our FDD system. Now we move to the next step to design a controller reconfiguration for the FDMB system.
4 Controller Reconfiguration (CR) In this section we develop a new CR which can online determine the optimal control actions and reconfigure the controller using Generalized Predictive Control (GPC). We will show how an IMM-based GMC controller can be a good FDMB system. Firstly we review the basic GPC algorithm: Generalized Predictive Control (GPC) is one of MPC techniques developed by Clarke et al. [9, 10]: GPC was intended to offer a new adaptive control alternative. GPC uses the ideas with controlled autoregressive integrated moving average (CARIMA) plant in adaptive context and self-tuning by recursive estimation. Kinnaert [11] developed GPC from CARIMA model into a more general in state-space form for multiple inputs and multiple outputs (MIMO) system as in equation (5). The optimal control problem for the general cost function GPC controller for MIMO is Ny −1
minJ(U, x(k)) = x(k+Ny ) Φx(k+Ny )+ U
{x(k) Ωx(k)+u(k) Θu(k)} (11)
k=0
where the column vector U = [u0 , u1 , ..., uNU −1 ] is the predictive optimization vector, Ω = Ω > 0, Θ = Θ ≥0 are the weighting matrices for predictive state and input, respectively. The Liapunov matrix Φ > 0 is the solution of Riccati equation for the system in equation (5). Ny and NU are the predictive output horizon and the predictive control horizon, respectively. By substituting x(k + N −1 N ) = AN x(k) + j=0 Aj Bu(k + N − 1 − j) + AN −1 T ξ(k), equation (11) can be rewritten as 1 min{ U HU + x(k) F U + ξ(k) Y U } (12) U 2 where H = H > 0, and H, F, Y are obtained from equations (5) and (11). Then the optimization problem (12) is a quadratic program and depends on the current state x(k) and noise ξ(k). In the case of unconstrained GPC, the optimization input vector can be calculated as U = −H −1 {x(k) F + ξ(k) Y }
(13)
then the first input U (1) in U will be implemented into the sytem at each time step. The above GPC algorism is proposed for non-output tracking system. However in reality, the primary control objective is to force the plant outputs to
Development of a Fault Detection Model-Based Controller
9
track their setpoints. In this case, the space state of model in equation (5) can be transformed into a new innovation form as follows: ˜x(k|k − 1) + Bu(k) ˜ ˜ x ˆ(k + 1|k) = Aˆ + Ke(k) ¯ M: (14) ˜ z(k) = C x ˆ(k|k − 1) + e(k) ˜ B, ˜ C, ˜ and K ˜ are fixed matrices from A, B, C, and T in equation where A, (5), η(k) = ξ(k)=e(k), z(k) ∈ ny , u(k) = u(k) − u(k − 1) ∈ nu , and x ˆ(k|k − 1) is an estimate of the state x(k) obtained from a Kalman filter. For a moving horizon control, the prediction of x(k + j|k) given the information {z(k), z(k − 1), ..., u(k − 1), u(k − 2), ...} is: ˆ(k|k − 1) + x ˆ(k + j|k) = Aj x
j−1
Aj−1−i Bu(k + i) + Aj−1 Ke(k)
(15)
i=0
and the prediction of the filtered output will be: zˆ(k + j|k) = CAj x ˆ(k|k − 1) +
j−1
CAj−1−i Bu(k + i) + CAj−1 Ke(k)
(16)
i=0
If we form u ˜(k) = [u(k) , ..., u(k + NU −1 ) ] and z˜(k) = [ˆ z (k + N0 |k) , ..., zˆ(k + Ny−1 |k) ] , we can write the global predictive model for the filtered out for from 0 to Ny output prediction and for from 0 to NU −1 input predition horizon as: ⎡
CB ⎢ CAB ⎢ ⎢ : zˆ(k) = ⎢ ⎢ CANU −1 B ⎢ ⎣ : CANy −1 B
⎡ ⎡ ⎤ ⎤ ⎤ ... 0 CA C 2 ⎢ CA ⎥ ⎢ CA ⎥ ⎥ ... 0 ⎢ ⎢ ⎥ ⎥ ⎥ ⎢ ... ⎥ ⎢ ⎥ ⎥ ... ... ... ⎢ ⎢ ⎥u ⎥x ⎥ ˜ (k) + ˆ (k|k − 1) + ⎢ ⎢ ⎥ ⎥ ⎥ ... ... ... CB ⎢ ⎢ ⎥ ⎥ ⎥ ⎣ ... ⎦ ⎣ ⎦ ⎦ ... ... ... CANy −1 CANy ... CANy −NU B
×Ke(k).
For simplicity, we can rewrite as: z˜(k) = G˜ u(k) + V x ˆ(k|k − 1) + W Ke(k)
(17)
Consider the new general cost function of GPC for tracking setpoints: Ny −1
minJ(˜ u(k), x(k)) = u ˜ (k)
{[z(k) − w(k)] [z(k) − w(k)] + [u(k)] Γ [u(k)]}
(18)
k=0
where w(k) is the output reference setpoints and Γ = Γ ≥0 is the control weighting matrix. Similarly, the optimization problem in equation (18) is a quadratic program and in the case of unconstrained GPC, the control law that minimizes this cost funtion is:
10
N. Afzulpurkar and V.T. Minh
u ˜(k) = −(G G + Γ )−1 (V x ˆ(k|k − 1) + W Ke(k) − w(k))
(19)
then the fist input u ˜(1) in u ˜(k) will be implemented into the system in each time step. Note that the optimization solution in equation (13) and (19) are for unconstrained GPC. In the case of constrained GPC, the quadratic prgram in equation (11) and (18) must be solved subject to constraints on states, inputs and outputs. Now, we can combine GPC with IMM estimator: Since GPC follows a stochastic perspective, we can use GPC controller for the CR using the inputs of the CR as the outputs of the IMM. The overall state estimate x(k) ≈ x ˆ(k) = N xi (k) where N is the number of models in the model set. So we i=1 µi (k)ˆ can assume that the “true” system is the weighted sum with µi (k) of models in a convex combination Mk = {m1 , ..., mN } . A generalized diagram of IMM based GPC controller is shown in figure 7. u(t)
z(t)
Plant x1
IMM m1 - -
--
mN
xN
GPC w(t) Figure 7. Illustration of the NIMM estimator
So, we build a bank of GPC controllers for each model in the model set. Assuming the mode probabilities are constant during the control N horizon, we can easily derive the GPC control law by forming G = ( i=1 µi Gi ), V = N N ( i=1 µi Vi ) and W = ( i=1 µi Wi ) matrices that correspond to the “true” N model m = ( i=1 µi mi ) in equation (17) and find out the optimal control action in equation (19). A notation is taken here for one of disadvantages of the IMM-based GPC controller is that the type and the magnitude of the input excitation play an important role in its performance. When the magnitude of the input signal is very small, the residuals of the Kalman filters will be very small and, therefore, the likelihood functions for the modes will approximately be equal. This will lead to unchanging (or very slowly changing) mode probabilities which in turn makes the IMM estimator incapable to detect failures. Next, we will run some simulations to test the above proposed fault detection and control system. Example 3: Controller Reconfiguration (CR).
Development of a Fault Detection Model-Based Controller
11
Example 3.1: We run a normal GPC controller with Ny = 4, NU = 4, the weighting matrix Γ = 0.1 and with a reference set-point w = 1. It is assumed that 0.5z1 from time k = 0−50, −50% sensor 1 failure or zm2 (t) = ; from k = 51−100, z2 run
the normal mode; and from k = 101 − 150, 50% sensor 1 failure or zm3 (t) = 1.5z1 . The normal GPC controller provides wrong output (figure 8). z2 0.02 1.6
1.4
0.015
1.2
0.01 1
(b) 0.005
(a)0.8 0.6
0 0.4
−0.005 0.2
0
0
50
100
150
−0.01
0
50
100
150
Figure 8. GPC controller with sensor errors (a) Output, and (b) Input Example 3.2: We run the same parameters in example 3.1 ⎡ using our proposed ⎤ 0.90 0.05 0.05 IMM-base GPC controller with the transition matrix πi j = ⎣ 0.05 0.90 0.05 ⎦. Re0.05 0.05 0.90 sults are shown in figure 9: Our new FDMB system still keeps the output at the desired setpoint since the IMM estimator easily finds accurate fault mode and activate the CR system online. 1.01
0.1
1.008
0.08
0.9
1.006
0.06
0.8
1.004
0.7
0.04
1.002
(a)
1
0.6
0.02
(b)
1
(c)0.5
0 0.998
0.4
−0.02 0.996
0.3
−0.04 0.994
0.2
−0.06 0.992
0.1
−0.08 0.99
0
50
100
150
0
50
100
150
0
0
50
100
150
Figure 9. IMM-based GPC controlle (a) Output, (b) Input, and (c) Probabilities
Example 3.3: Low magnitude of input signals can lead to failure of IMM estimator. We run the same parameters in example 3.2 but reduce the reference setpoint to a very low value at w = 0.01. The system becomes uncontrollable (figure 10).
5 Conclusion Systems subject to sensor and actuator failures can be modeled as a stochastic hybrid system with fault modeling nodes in the model set. In stead
12
N. Afzulpurkar and V.T. Minh 9
11
0.5
8
x 10
x 10
1
z z p
0
0.9
6 0.8
−0.5
0.7
4
−1
0.6
(b)2
−1.5
(a)
(c)0.5
−2 0.4
0
−2.5
0.3
−3
0.2
−2 −3.5
−4
0.1
0
50
100
150
−4
0
50
100
150
0
0
50
100
150
Figure 10. IMM-based GPC controller with low magnitude of input signal
of a fixed structure model, the model set can be designed in variable structures. Variable structure model can overcome fundamental limitations of a fixed structure model when the number of failure combinations becomes huge and the fixed model set does not match the true system mode set at any time, or the set of possible modes at any time varies and depends on the previous state of the system. Our proposed IMM based GPC controller can provide real-time control performance, detection and diagnosis of sensor and actuator failures online. Simulations in this study show that the system can maintain the output setpoints amid internal failures. One of the main advantages of the GPC algorithm is that the controller can provide soft switching signals based on the weighted probabilities of the outputs of different models. The main difficulty of this approach is the choice of modes in the model set as well as the transition probability matrix that assigns probabilities to jump from one mode to another since the IMM algorithms are very sensitive to the transition probability matrix. Another limitation related to IMM based GPC controller is the magnitude of the input excitation. When we change the output setpoints to small values, the input signal may become very small and this leads to unchanging mode probabilities or IMM based GPC controller cannot detect failures. Lastly, this approach does not consider issues of uncertainty in the controller system.
References 1. Cristian, F., Understanding Fault Tolerant Distributed Systems. Communications of ACM, Vol. 34, pp. 56-78, 1991 2. Li, R., Hybrid Estimation Techniques, Control and Dynamic Systems, New York, Academic Press, Vol. 76, pp. 213-287, 1996 3. Kanev, S., Verhaegn, M., Controller Reconfiguration for Non-linear Systems, Control Engineering Practice, Vol 8, 11, pp. 1223-1235, 2000 4. Zhang, Y., Li, R., Detection and Diagnosis of Sensor and Actuator Failures Using IMM Estimator, IEEE Trans. On Aerospace and Elect Sys, Vol. 34, 4, 1998 5. Shalom Y., and Tse, E., Tracking in a Cluttered Environment with Probabilistic Data Association, Automatica, Vol. 11, pp. 451-460, 1975
Development of a Fault Detection Model-Based Controller
13
6. Ackerson, G., and Fu, K., On State Estimation in Switching Environments. IEEE Trans. On Automatic Control, Vol. 15, 1, pp. 10-17, 1970 7. Tugnait, J., Detection and Estimation of Abruptly Changing Systems. Automatica, Vol 18, pp. 607-615, 1982 8. Blom, H., and Shalom, Y., Interacting Multiple Model for System with Markovian Switching Coefficients, IEEE Trans on Auto Cont, Vol 33, 8, pp. 780-785, 1983 9. Clarke D. W., Mohtadi, C., and Tuffs, P. S. Generalized Predictive Control – Extensions and Interpretations, Automatica, 23(2), 149-160, 1987 10. Clarke, D.W., Mohtadi, C., and Tuffs, P. S. Generalized Predictive Control: I. The Basic Algorithm, Automatica, 23(2), 137-147, 1987 11. Kinnaert, M. Adaptive Generalized Predictive Control for MIMO Systems, Int. J. Control. 50(1), 161-172, 1987
Sensitivity Generation in an Adaptive BDF-Method Jan Albersmeyer and Hans Georg Bock Interdisciplinary Center for Scientific Computing (IWR), University of Heidelberg, Im Neuenheimer Feld 368, D-69120 Heidelberg, Germany
[email protected],
[email protected] Abstract In this article we describe state-of-the-art approaches to calculate solutions and sensitivities for initial value problems (IVP) of semi-explicit systems of differential-algebraic equations of index one. We start with a description of the techniques we use to solve the systems efficiently with an adaptive BDF-method. Afterwards we focus on the computation of sensitivities using the principle of Internal Numerical Differentiation (IND) invented by Bock [4]. We present a newly implemented reverse mode of IND to generate sensitivity information in an adjoint way. At the end we show a numerical comparison for the old and new approaches for sensitivity generation using the software package DAESOL-II [1], in which both approaches are implemented.
1 Introduction In real-world dynamical optimization problems the underlying process models can often be described by systems of ordinary differential equations (ODEs) or differential-algebraic equations (DAEs). Many state-of-the-art algorithms for optimal control, parameter estimation or experimental design used to solve such problems are based on derivative information, e.g. sequential quadratic programming (SQP) type or Gauss-Newton type methods. This results in a demand for an efficient and reliable way to calculate accurate function values and high-precision derivatives of the objective function and the constraints. Therefore we have to calculate (directional) derivatives of the solutions of the model ODEs/DAEs, so-called (directional) sensitivities. The need of accurate sensitivity information arises for example also from model reduction and network analysis e.g. in system biology [13]. In the following we first will describe how to compute solutions of IVPs for DAEs of index one efficiently and then how to generate accurate sensitivity information along with the computation of the nominal trajectories by using the principle of Internal Numerical Differentiation (IND) [4]. At the end we show some numerical results, obtained with the integrator DAESOL-II, a
16
J. Albersmeyer and H.G. Bock
C++ code based on BDF-methods written by Albersmeyer [1], where the presented ideas have been implemented.
2 Efficient Solution of the Initial Value Problems We will now briefly describe how the solutions of the IVPs themselves can be generated efficiently using an adaptive BDF-method and sketch the ideas and features that are implemented in the code DAESOL-II. 2.1 Problem formulation We consider in this article IVPs for semi-implicit DAEs of the type A(t, x, z, p, q)x˙ − f (t, x, z, p, q) = 0, g(t, x, z, p, q) − θ(t) g(t0 , x0 , z0 , p, q) = 0,
x(t0 ) = x0 , z(t0 ) = z0 .
(1a) (1b)
Here t ∈ [t0 , tf ] ⊂ R denotes the time, x ∈ Rnx represents the differential states, z ∈ Rnz the algebraic states, p ∈ Rnp parameter and q ∈ Rnq control parameter. We use here a relaxed formulation of the algebraic equations. This allows us to start the integration with inconsistent algebraic variables, which is often an advantage when solving optimization problems. The damping function θ(·) is a nonnegative, strictly decreasing real-valued function satisfying θ(t0 ) = 1. Furthermore we assume that A and ∂g ∂z are regular along the solution trajectory (index 1 condition). In our practical problems the ODE/DAEsystem is usually stiff, nonlinear and high-dimensional. 2.2 Strategies used in DAESOL-II The code DAESOL-II is based on variable-order variable-stepsize Backward Differentiation Formulas (BDF). BDF methods were invented by Curtiss and Hirschfelder [6] for the solution of stiff ODEs, and were later also very successfully applied to DAEs. They are known for their excellent stability properties for stiff equations. Beside that they give a natural and efficient way to obtain an error-controlled continuous representation of the solution [5] by interpolation polynomials, which are calculated during the solution process of the IVP anyway. In every BDF-step one has to solve the implicit system of corrector equations A(tn+1 , xn+1 , zn+1 , p, q)
kn
αln xn+1−l − hn f (tn+1 , xn+1 , zn+1 , p, q) =0 (2a)
l=0
g(tn+1 , xn+1 , zn+1 , p, q) − θ(tn+1 ) g(t0 , x0 , z0 , p, q) =0, (2b)
Sensitivity Generation in an Adaptive BDF-Method
17
where n is the number of the actual BDF-step, kn is the BDF-order in step n, and hn the actual stepsize. The coefficients αln of the BDF corrector polynomial are calculated and updated efficiently via modified divided differences. For the solution of this implicit system we apply a Newton-like method, where we follow a monitor-strategy to use existing Jacobian information as long as possible. Based on the contraction rates of the Newton-like method we decide whether to reuse the Jacobian, to decompose it anew with the actual stepsize and coefficients and old derivative information of the model functions, or to build it from scratch. Especially the second step often saves a lot of evaluations of model derivatives compared to ordinary approaches [9]. Note that in any case in our algorithm at most one iteration matrix is used per BDF-step and at most three Newton iterations are made. The stepsize and order selection in DAESOL-II is based on local error estimation on the variable integration grid, and aims for relatively smooth stepsize changes because of stability reasons. Compared to stepsize strategies based on equidistant grids, this approach leads to better estimates and results in fewer step rejections [3, 10]. DAESOL-II allows the use of inconsistent initial values by a relaxed formulation and provides alternatively routines to compute consistent initial values. The generation of derivatives of the model functions is done optionally by user-supplied derivative functions, internally by finite differences or by automatic differentiation via built-in ADOL-C support. In any case directional derivatives are used whenever possible to reduce memory consumption and computational effort. Linear algebra subproblems are currently solved either using routines from the ATLAS [15] library in case of dense matrices, or using UMFPACK [7] in case of unstructured sparse matrices. A complete survey of the strategies and features of DAESOL-II can be found in [1].
3 Sensitivity Generation In the following section we will explain how to generate (directional) sensitivities efficiently using the principle of Internal Numerical Differentiation and reusing information from the computation of the solution of the IVP. 3.1 The principle of Internal Numerical Differentiation The simplest approach to obtain sensitivity information - the so-called External Numerical Differentiation - is treating the integrator as a black box and calculate finite differences after solving the IVP for the original IVs and again for slightly perturbed IVs. This approach - although very easy to implement suffers from the fact that the output of an adaptive integrator usually depends discontinuously on the input: Jumps in the range of the integration tolerance can always occur for different sets of parameter and IVs. Therefore the number of accurate digits of the solution of the IVP has to be approximately twice
18
J. Albersmeyer and H.G. Bock
as high as the needed number of accurate digits of the derivatives. This leads to a very high and often unacceptable numerical effort. The idea of Internal Numerical Differentiation is now to freeze the adaptive components of the integrator and to differentiate not the adaptive integrator itself, but the adaptively generated discretization scheme (consisting of used stepsizes, BDF-orders, iteration matrices, compare Fig. 1). This scheme can be interpreted as a sequence of differentiable mappings, each leading from one integration time to the next, and can therefore be differentiated for example using the ideas of Automatic Differentiation (AD). We assume in the following that the reader is familiar with the basics of AD, especially with the forward and the adjoint mode of AD. For an introduction to AD see e.g. [11].
h0 , M0
t0 , y 0
h1 , M1
h2 , M2
hN −3 , MN −3
p Rp p p p p p
^
^
t1 , y 1
t2 , y 2
hN −2 , MN −2
R tN −2 , yN −2
hN −1 , MN −1
R
^
tN −1 , yN −1
tN , yN
Figure 1. The adaptively generated discretization scheme of the solution of the IVP, hi denotes the used stepsize, Mi the used iteration matrix in step i
Differentiating the scheme using the forward respectively adjoint mode of AD, leads to the two variants of IND: The forward IND and the reverse IND. The first generates sensitivity information of the type ⎛ ⎞ vx ∂(x(tf ), z(tf )) ⎜ vz ⎟ ⎟, ·⎜ ∂(x0 , z0 , p, q) ⎝vp ⎠ vq preferable to calculate directional sensitivities or the full sensitivity matrices when only few parameters and controls are present. The latter generates sensitivity information of the type ∂(x(tf ), z(tf )) λTx λTz · , ∂(x0 , z0 , p, q)
very efficiently and is therefore advantageous if we need gradient-type information, the sensitivities of only a few solution components, or the full sensitivity matrix in case many parameter and control parameter are present. 3.2 Forward IND Differentiation of the integration scheme (2) using the forward mode of AD leads to an integration scheme that is equivalent to solving the corresponding variational DAE using the same discretization scheme as for the computation of the nominal trajectory.
Sensitivity Generation in an Adaptive BDF-Method
19
Depending on whether we prefer to solve the linear systems occurring in this scheme directly or to differentiate also the Newton-like iterations used in the integration procedure for the solution of the IVP, we speak of direct forward IND or iterative forward IND. For more details on how to apply the ideas of forward IND to BDF-methods refer to [1]. 3.3 Adjoint Sensitivities We now combine the ideas of IND and the adjoint mode of AD and present a reverse mode of IND. Analogous to the forward IND, we will obtain two slightly different schemes, the direct and the iterative reverse IND. Direct Reverse IND Here we assume that we have solved the IVP, all trajectory values on the integration grid are available and that we have solved the corrector equations (2) in each step exactly. We interpret now each integration step as one elementary operation with inputs xn+1−i , zn+1−i (i = 1, . . . , kn ), p, q, x0 , z0 and outputs xn+1 , zn+1 , of which we have to calculate the derivatives to apply AD. The fact that xn+1 depends via one elementary operation directly on xn+1−i we denote with the symbol and their indices, i.e. n + 1 n + 1 − i. We use the implicit function theorem on the function
F n (xn+1 , zn+1 ; xn , . . . , xn+1−kn , p, q, x0 , z0 ) := kn n αl xn+1−l − hn f (tn+1 , xn+1 , zn+1 , p, q) A(tn+1 , xn+1 , zn+1 , p, q) l=0 = 0. g(tn+1 , xn+1 , zn+1 , p, q) − θ(tn+1 ) g(t0 , x0 , z0 , p, q) (3)
For the derivative with respect to the new trajectory points one obtains ∂F n = ∂(xn+1 , zn+1 ) Ax,n+1 x˙ C n+1 + α0 An+1 − hn fx,n+1 gx,n+1
Az,n+1 x˙ C n+1 − hn fz,n+1 , (4) gz,n+1
k n n with the abbreviation x˙ C n+1 ≡ l=0 αl xn+1−l for the corrector polynomial and the subscripts of the form Bd,m meaning here and in the following the partial derivative of the function B with respect to variable d, evaluated at the values for tm . For the derivative with respect to the already known trajectory points xn+1−i , i = 1, . . . , nk we obtain n ∂F n αi An+1 0 = . (5) 0 0 ∂(xn+1−i )
20
J. Albersmeyer and H.G. Bock
Furthermore we have the derivatives with respect to parameter and control parameter ∂F n Ap,n+1 x˙ C Aq,n+1 x˙ C n+1 − hn fp,n+1 n+1 − hn fq,n+1 = . (6) gp,n+1 gq,n+1 ∂(p, q) If we use the relaxed formulation from (2b), we have also the derivatives with respect to initial differential and algebraic states ∂F n 0 0 = . (7) −θ(tn+1 )gx,0 −θ(tn+1 )gz,0 ∂(x0 , z0 ) Hence by using the implicit function theorem we finally obtain for the derivatives of the new trajectory values
−1 ∂F n ∂F n ∂(xn+1 , zn+1 ) =− , ∂D ∂(xn+1 , zn+1 ) ∂D
(8)
with D ∈ {xn+1−i , p, q, x0 , z0 }. Note that many of the derivatives of the model functions needed here can be evaluated in practice very efficiently, without forming the entire model Jacobian, as adjoint directional derivatives by using AD. Algorithm 1 Basic form of direct reverse IND 1: Initialize x ¯NintSteps = LT with adjoint sensitivity directions LT . 2: Initialize all intermediate variables x ¯i , z¯0 , p¯, q¯ to zero. 3: for n = N intSteps − 1, . . . , 0 do ∂xk k−1 k−1 ¯k ∂x ≡ − kn x ¯k (I 0)(F(x )−1 F(x . 4: x ¯n + = kn x n) n k ,zk )
n n n n n+1 5: [¯ p, q¯, z¯0 ]+ = x ¯n+1 ∂(p,q,z ≡ −¯ xn+1 (I 0)(F(x )−1 [F(p) , F(q) , F(z ] 0) n+1 ,zn+1 ) 0) 6: end for ∂x
Reverse sweep for direct reverse IND With the derivatives of our elementary functions (8) we are now able to perform an adjoint AD sweep, using the values of the nominal trajectories at the integration points, to obtain adjoint sensitivity information for given adjoint directions (cf. algorithm 1). Let now N be the number of accepted integration steps from the integration of the IVP, kmax the maximal BDForder during integration and ndir the number of directional sensitivities to be calculated. The computational effort for the reverse sweep can then be estimated as follows: In each step we need to build and factorize the Jacobian (4) once, to solve ndir linear equation systems and to compute ndir directional derivatives of the model functions. Overall, for the direct reverse IND we need N matrix factorizations and N · ndir directional derivatives of the model functions and have to solve also N · ndir linear equation systems. The memory
Sensitivity Generation in an Adaptive BDF-Method
21
demand for the (intermediate) adjoint sensitivity quantities can be estimated by ndir ·[(kmax + 1) · (nx + nz ) + np + nq ], because at most kmax earlier trajectory values contribute directly to a trajectory value. Note that for one adjoint direction this is of the order of storage needed for the interpolation during integration of the IVP anyway. Remark: For practical implementation it is more efficient to introduce k−1 ¯ k := x ¯k (I 0)(F(x )−1 and to rewrite the algorithm in these new variables λ k ,zk ) new quantities. Iterative Reverse IND In this approach we follow more closely the predictor-corrector scheme which is applied in practice for the integration of the IVP: There we use as start-value for the Newton-like method in step n the predictor value P yn+1 , obtained by the polynomial through the last kn + 1 trajectory valP = ues yn , . . . , yn−kn , extrapolated at tn+1 . This can be written as yn+1 kn P,n yn−l . Note that for notational convenience we combine here difl=0 αl ferential and algebraic states to y ≡ (x, z)T . (0) P We then perform, starting in yn+1 := yn+1 , 1 ≤ ln ≤ 3 Newton-like iterations with an iteration Matrix Mn to search the root of the function F n (i) defined in (3). We denote the iterates by yn+1 , 1 ≤ i ≤ ln and set the new (ln ) . trajectory value to the last iterate yn+1 := yn+1 If we analyze the dependencies inside this predictor-corrector scheme we find that the predictor value only depends on earlier trajectory values and its derivative is obtained by P ∂yn+1 = αlP,n · I ∂(yn+1−i )
(9)
with i = 1, . . . , kn + 1. By studying the Newton-like iterations we obtain for the derivatives of one iterate with respect to the previous iterate (i+1)
∂yn+1
(i) ∂yn+1
= Id − M Fynn+1 ,
i = 0, . . . , ln − 1
(10)
and for the derivatives with respect to initial states, parameter and controls (i+1)
∂yn+1 = −M (Fpn , Fqn , Fyn0 ), ∂(p, q, y0 )
i = 0, . . . , ln − 1, (i)
where the derivatives of F n are evaluated at the values yn+1 . Reverse sweep for iterative reverse IND We assume again that all trajectory values on the integration grid are known, and additionally all iterates and iteration matrices used during the
22
J. Albersmeyer and H.G. Bock
integration. Note that due to the use of the monitor strategy in the Newtonlike method the number of used iteration matrices is significantly smaller than the number of integration steps. We then make an adjoint AD sweep, interpreting the integration as an alternating sequence of two kinds of elementary operations: Evaluating the predictor polynomial and performing Newton-like iteration steps. If using this approach we have to account for different start-up strategies of the BDF-method by adapting the last operations of the backward sweep to the actually used starter method. The computational effort for the iterative reverse IND can be estimated by N ndir · i=0 li needed solutions of linear systems and the same number of directional derivatives of the model functions. In return no additional matrix decompositions are needed here. The storage needed for the adjoint quantities is the same as for the direct approach.
4 Numerical Examples As a proof of concept we tested the reverse IND approach on two small ODE examples. We confined ourselves to comparing the results of the reverse IND with the well tested forward IND approach, also implemented in DAESOL-II. For a comparison of the ideas implemented in DAESOL-II with other codes such as for example DASSL, DDASAC and LIMEX, refer to [2]. We considered two chemical reaction systems: the pyridine reaction and the peroxidase-oxidase reaction. The model for the pyridine reaction consists of 7 ODEs and 11 parameter, the model for the peroxidase-oxidase reaction of 10 ODEs and 17 parameter. A detailed description of the systems and their properties can be found e.g. in [1]. We chose as task for the comparison the evaluation of the whole sensitivity matrix, i.e. the derivatives of all states at the final time with respect to all initial states and parameter, using the direct and iterative forward respectively reverse IND approach. The necessary (directional) derivatives of the model function were generated inside DAESOL-II with the help of the AD tool ADOL-C [12]. For the solution of the linear systems we used here the dense linear algebra package ATLAS [15]. We tested with error tolerances for the nominal trajectories of T OL = 10−3 and T OL = 10−6 . The calculations were all performed on a standard 2.8GHz Pentium IV Computer with a SuSE 9.2 Linux operating system. It was observed that the difference between the sensitivities calculated using the forward respectively reverse IND were practically of the order of machine precision, i.e. the maximal relative difference was smaller than 10−14 for both the direct and iterative mode. Table 1 gives a comparison of the effort for the direct variants on these two examples. Note that the computational effort per sensitivity direction for the forward IND is theoretically the same as for the reverse IND in terms of matrix decompositions and linear systems to be solved and model derivatives to be calculated. But in the adjoint mode
Table 1. Comparison of numerical and runtime effort for direct forward and reverse IND on two examples from reaction kinetics

                                   Pyridine   Pyridine   Peroxi    Peroxi
TOL                                10^-3      10^-6      10^-3     10^-6
# BDF-steps                        82         162        723       1796
# Matrix factorizations
# Forward sensitivity directions   18         18         27        27
# Linear systems in forward mode   1476       2916       19521     48492
# Dir. der. in forward mode
# Reverse sensitivity directions   7          7          10        10
# Linear systems in reverse mode   574        1134       7230      17960
# Dir. der. in reverse mode
Time savings                       14.1%      18.6%      19.2%     35.1%
fewer sensitivity directions are needed to form the complete sensitivity matrix. In exchange, a forward directional derivative of a model function is slightly cheaper than an adjoint one. Overall, we already see a visible speed-up on these small systems with a moderate number of parameters when using the reverse IND, even though there is still room for algorithmic improvement. The results indicate that the adjoint mode will show the most benefit on larger systems with many parameters, where the overhead from trajectory integration and general implementation is less significant, or, of course, in cases where only sensitivities for a few adjoint directions are needed.
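As a small worked check of the direction counts behind Table 1, consider the pyridine example at TOL = 10^-3 (7 states, 11 parameters, 82 BDF steps). Assuming, as the table suggests, one linear system per sensitivity direction and integration step for the direct variants, the forward and reverse counts are reproduced by the following few lines.

```python
n_states, n_params, n_bdf_steps = 7, 11, 82      # pyridine example, TOL = 1e-3

fwd_directions = n_states + n_params             # w.r.t. all y0 and p: 18
rev_directions = n_states                        # one adjoint per state: 7

print(fwd_directions * n_bdf_steps)              # 1476 linear systems (forward)
print(rev_directions * n_bdf_steps)              # 574 linear systems (reverse)
```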
5 Conclusion and Outlook

In this article we have briefly discussed how to solve IVPs for ODEs and DAEs of index 1 and how to efficiently generate sensitivities for these solutions. We explained the idea of Internal Numerical Differentiation (IND) and derived a reverse mode of IND for semi-implicit DAEs of index 1. We gave a first numerical proof of concept of the reverse IND by applying it to two ODE examples from reaction kinetics. We demonstrated that it works accurately and that already on these small problems the reverse approach is more efficient for the computation of the whole sensitivity matrix than the forward approach. In the future we want to use the reverse IND in connection with special adjoint-based optimization methods for large-scale systems which only need one or two adjoint directional sensitivities per optimization step, cf. [8]. As the memory consumption of the reverse mode could become a problem for very large systems, a closer investigation of checkpointing schemes (cf. [14]) will be interesting. Furthermore we want to extend the sensitivity generation by IND to directional second-order sensitivities, which would allow e.g.
the calculation of a directional derivative of a gradient, as needed in robust optimization and optimum experimental design.
References

1. J. Albersmeyer. Effiziente Ableitungserzeugung in einem adaptiven BDF-Verfahren. Master's thesis, Universität Heidelberg, 2005.
2. I. Bauer. Numerische Verfahren zur Lösung von Anfangswertaufgaben und zur Generierung von ersten und zweiten Ableitungen mit Anwendungen bei Optimierungsaufgaben in Chemie und Verfahrenstechnik. PhD thesis, Universität Heidelberg, 1999.
3. G. Bleser. Eine effiziente Ordnungs- und Schrittweitensteuerung unter Verwendung von Fehlerformeln für variable Gitter und ihre Realisierung in Mehrschrittverfahren vom BDF-Typ. Master's thesis, Universität Bonn, 1986.
4. H. G. Bock. Numerical treatment of inverse problems in chemical reaction kinetics. In K. H. Ebert, P. Deuflhard, and W. Jäger, editors, Modelling of Chemical Reaction Systems, volume 18 of Springer Series in Chemical Physics, pages 102–125. Springer, 1981.
5. H. G. Bock and J. P. Schlöder. Numerical solution of retarded differential equations with state-dependent time lags. Zeitschrift für Angewandte Mathematik und Mechanik, 61:269, 1981.
6. C. F. Curtiss and J. O. Hirschfelder. Integration of stiff equations. Proc. Nat. Acad. Sci., 38:235–243, 1952.
7. T. A. Davis. Algorithm 832: UMFPACK - an unsymmetric-pattern multifrontal method with a column pre-ordering strategy. ACM Trans. Math. Software, 30:196–199, 2004.
8. M. Diehl, A. Walther, H. G. Bock, and E. Kostina. An adjoint-based SQP algorithm with quasi-Newton Jacobian updates for inequality constrained optimization. Technical Report Preprint MATH-WR-02-2005, TU Dresden, 2005.
9. E. Eich. Numerische Behandlung semi-expliziter differentiell-algebraischer Gleichungssysteme vom Index I mit BDF Verfahren. Master's thesis, Universität Bonn, 1987.
10. E. Eich. Projizierende Mehrschrittverfahren zur numerischen Lösung von Bewegungsgleichungen technischer Mehrkörpersysteme mit Zwangsbedingungen und Unstetigkeiten. PhD thesis, Universität Augsburg, 1991.
11. A. Griewank. Evaluating Derivatives, Principles and Techniques of Algorithmic Differentiation. Number 19 in Frontiers in Appl. Math. SIAM, 2000.
12. A. Griewank, D. Juedes, and J. Utke. Algorithm 755: ADOL-C: A package for the automatic differentiation of algorithms written in C/C++. ACM Trans. Math. Softw., 22(2):131–167, 1996.
13. D. Lebiedz, J. Kammerer, and U. Brandt-Pollmann. Automatic network coupling analysis for dynamical systems based on detailed kinetic models. Physical Review E, 72:041911, 2005.
14. A. Walther. Program reversal schedules for single- and multi-processor machines. PhD thesis, TU Dresden, 2000.
15. R. C. Whaley, A. Petitet, and J. J. Dongarra. Automated empirical optimization of software and the ATLAS project. Parallel Computing, 27(1–2):3–35, 2001.
The gVERSE RF Pulse: An Optimal Approach to MRI Pulse Design

Christopher K. Anand¹, Stephen J. Stoyan², and Tamás Terlaky³

¹ McMaster University, Hamilton, Ontario, Canada. [email protected]
² University of Toronto, Toronto, Ontario, Canada. [email protected]
³ McMaster University, Hamilton, Ontario, Canada. [email protected]
Abstract A Variable Rate Selective Excitation (VERSE) pulse is a type of Radio Frequency (RF) pulse that reduces the Specific Absorption Rate (SAR) of molecules in a specimen. As high levels of SAR lead to increased patient temperatures during Magnetic Resonance Imaging (MRI) procedures, we develop a selective VERSE pulse, called the generalized VERSE (gVERSE) pulse, that is designed to minimize SAR while preserving its duration and slice profile. After the formulation of a rigorous mathematical model, the nonlinear gVERSE optimization problem is solved via an optimal control approach. Using the state-of-the-art Sparse Optimal Control Software (SOCS), two separate variations of SAR-reducing gVERSE pulses were generated. The Magnetic Resonance (MR) signals produced by numerical simulations were then tested and analyzed by an MRI simulator. Computational experiments with the gVERSE model produced constant RF pulse levels and encouraging results with respect to MR signals. The testing results produced by the gVERSE pulse illustrate the potential of advanced optimization techniques in designing RF sequences.
1 Introduction to the Problem

Magnetic Resonance Imaging (MRI) produces high resolution cross-sectional images by utilizing selective Radio Frequency (RF) pulses and field gradients. Selective excitations are obtained by applying simultaneous gradient waveforms and an RF pulse with the appropriate bandwidth [LL01]. Many conventional RF pulse sequences are geared towards generating high definition images, but fail to consider the SAR (Specific Absorption Rate) of the excitation. High levels of SAR during MRI procedures can cause undesired side effects such as skin burns. Thus, one needs to reduce SAR levels, and this is the focus of our paper. Instead of using common approaches to approximating the Bloch equation, we integrate the Bloch equation in a nonlinear optimization model that is designed to minimize RF SAR levels. Several researchers have studied the selective RF excitation problem and employed different optimization methods in their designs. Simulated annealing
[She01], evolutionary algorithms [WXF91], quadratic optimization [CGN88] and optimal control techniques [CNM86, UGI04] are the most common. Although they produce solutions that relate to their desired profiles, they are computationally intensive and in many cases their designs for the pulse envelope consist of relaxed conditions. In [UGI04], the excitation design under the optimal control approach leads to an ill-conditioned algebraic problem, which stems from the model's attempt to include the Bloch equation in the Chebyshev domain. In [CGN88], Conolly et al. design the Variable Rate Selective Excitation (VERSE) pulse, which is aimed at reducing MRI SAR levels; however, their model does not incorporate penalties to trade off energy and adhesion to the desired slice profile, nor does it incorporate relaxation. The problem, as pointed out in [CNM86, CGN88], still remains: there is no sufficient mathematical formulation for pulse-envelope design. In this paper we address this problem, and use our model to construct a dynamical nonlinear optimization algorithm that is aimed at reducing RF SAR levels.

Following the idea of Conolly et al., we design the generalized VERSE (gVERSE) pulse. The gVERSE pulse is a highly selective pulse that differs from its originator with respect to how SAR is minimized. The prefix "g" was added to VERSE because our objective function directly encompasses the high demands of RF pulse levels by allowing the gradient waveform to vary freely. In addition, we have significantly increased the dynamics of the VERSE problem, added additional constraints, and enhanced the degrees of freedom. Using our RF pulse formulation, we develop two separate pulse sequences that include variable slice gradients (listed as future work in [UGI04]).

In this paper, we begin with a review of general RF pulse sequences that leads to the development of our new SAR-reducing gVERSE pulse model. In Section 3, the gVERSE model is fully detailed and the accompanying Nonlinear Optimization (NLO) problem is formulated. The implementation issues involved in computing the gVERSE pulse and the computational results for two different test cases are presented in Section 4. The results are graphically illustrated and then tested by an MRI simulation in Section 5, where they are analyzed and examined with respect to the MR signals they generate. Finally, in Section 6 we conclude on how our results and MRI simulations show that mathematical optimization can have a strong effect on improving RF pulse sequences.
2 MRI Background

To understand the implications and effects of the gVERSE pulse we will begin with a short outline of our notation and a review of two different types of RF pulse sequences. For more information with regard to the MR formulations and/or general RF pulses, one can refer to [Bus96, HBTV99, LL01]. To begin, we define the Bloch equation, which provides the rate of magnetization $d\vec{M}(t)/dt$,
$$\frac{d\vec{M}(t)}{dt} = \gamma\, \vec{M}(t) \times \vec{B}(t) + \frac{1}{\tau_1}\left(M_0 - M_z(t)\right)\vec{z} - \frac{1}{\tau_2}\vec{M}_{\perp}(t),$$

where $t$ is time, $\vec{B}(t)$ is the external magnetic field in the z-axis direction, $\gamma$ is the gyromagnetic constant, and

$$\vec{M}(t) = \begin{bmatrix} M_x(t) \\ M_y(t) \\ M_z(t) \end{bmatrix}, \qquad \vec{M}_{\perp}(t) = \begin{bmatrix} M_x(t) \\ M_y(t) \\ 0 \end{bmatrix}$$

are respectively the net and transverse magnetization vectors. Furthermore, $\vec{z}$ is the z-axis unit vector, $M_0$ is the initial magnetization in the z direction, $\tau_1$ is the spin-lattice interaction parameter and $\tau_2$ is the spin-spin interaction parameter. In addition, we let

$$\vec{B}(t) = \begin{bmatrix} b_x(t) \\ b_y(t) \\ b_z(t) \end{bmatrix},$$

where $b_x(t)$, $b_y(t)$ and $b_z(t)$ are the external magnetization vector coordinates, which will be used in the formulation of the gVERSE pulse.

2.1 Generic RF Pulse

When processing an image, a number of precise RF pulses are applied in combination with synchronized gradients in different dimensional directions. For a detailed analysis of this process one can look at [CDM90, HBTV99, LL01]. We would like to highlight that RF pulses are only aimed at a specific portion of the object or specimen that the user intends to image. In addition, the RF pulse is accompanied by a gradient waveform that is used to spatially modulate the signal's orientation [Bus96]. There are many different techniques in which RF and gradient waveforms can generate usable signals. Gaussian and sinc pulses are two of the many RF pulse sequences used today. Figure 1 is an illustration of a slice select sinc pulse [HBTV99]. Sinc pulses are successful at exciting particular magnetization vectors into the transverse plane that generate signal readings; however, they fail to account for side effects such as SAR levels. The heating effect experienced by patients during MRI procedures is measured by the level of SAR, which is a direct result of the RF pulse used. The level of SAR becomes particularly important with pediatric patients, and as a result the FDA has strict limitations on SAR, which subsequently restrict RF pulse potential and other elements involved in MRI procedures. In addition, while MRI researchers are constantly developing faster scanners, higher tesla magnets, enhanced software components and improved RF coils, these are all still limited by SAR levels. Hence, RF pulses that consider such a factor are in high demand.
Figure 1. A generic NMR slice select SINC pulse imaging sequence (RF pulse, gradient G(t), and MR signal)
2.2 The VERSE Pulse

Originally proposed by Conolly et al. [CGN88], VERSE pulses were designed to generate MR signals similar to generic RF pulses; however, low pulse SAR levels were incorporated into the model. As mentioned, the SAR of a selective RF pulse is a critical parameter in clinical settings and may limit the use of a particular pulse sequence if the SAR exceeds given FDA requirements [LL01]. Due to the high SAR levels of various RF pulses, the scan times for given pulse sequences are restricted [CGN88]. The key innovation with VERSE pulses is to allow a "trade off" between time and amplitude: by lowering the RF pulse amplitude, the duration of the pulse may be extended [CGN88]. As illustrated in Figure 2, VERSE pulses are similar to generic pulses,
Figure 2. The VERSE pulse imaging sequence (RF pulse, gradient G(t), and MR signal)
however, they contain a flattened center peak and their gradient waveform possesses two additional steps. It is this uniform redistribution of the pulse area that allows the decrease in SAR. Conolly et al. designed three different types of SAR-reducing pulses, each of which had constraints on the strength of the RF pulse, but they differed with respect to how they minimized SAR levels. The first model consisted of a minimum-SAR facsimile pulse for a specified duration, whereby the gradient waveform and RF pulse were integrated in the objective and subject to maximum gradient and constant duration constraints.
The second model used a minimum time formulation approach, whereby it searched for the briefest pulse that did not exceed a specified peak RF level. The pulse was optimized for time and constrained by maximum gradient and RF levels. The final model, called the parametric gradient, constrained both the maximum gradient and slew rate, and involved the parametric gradient and the RF pulse in the objective [CGN88]. The first two models consisted of a maximum of 3κ + 1 variables, where κ was the total number of samples or RF pulses. The final model involved κ(p + 1) + 1 variables, where p represented the dimension of a parameter vector. Experimentation showed that only 256 sample values were necessary, which kept the variable count relatively low [CGN88]. Of the three algorithms, the parametric formulation offered the most robust SAR minimization; however, the design still had areas for improvement, as the results contained gradient and RF timing mismatches. Subsequently, further experimentation was necessary with VERSE pulses, as Conolly et al. were the first to introduce this innovative concept.
3 The gVERSE Model

Conolly et al. [CGN88] showed that SAR can be reduced by combined RF/gradient reductions and time dilations, starting with an initial pulse design. For our research we would like to search a larger parameter space by allowing arbitrary gradient waveforms (subject to machine constraints), including sign changes. The gVERSE pulse is illustrated in Figure 3; our aim is to lower RF pulse energy and more evenly distribute the RF pulse signal. This flattened redistribution of the pulse will allow for a longer signal reading
Figure 3. The gVERSE pulse imaging sequence (RF pulse, gradient G(t), and MR signal)
and potentially cause an even greater decrease in the level of SAR than the original VERSE pulse. Mathematically, this is the same as minimizing the external magnetic field generated by the RF pulse, $\vec{B}_{rf}(t)$, and therefore our objective is
$$\min \; \mathrm{SAR} = \int_0^T \left| \vec{B}_{rf}(t) \right|^2 dt = \int_0^T b_x^2(t) + b_y^2(t)\, dt,$$
where T is the time at the end of the RF pulse and

$$\vec{B}_{rf}(t) = \begin{bmatrix} b_x(t) \\ b_y(t) \\ 0 \end{bmatrix}.$$

As MRI is based on the interaction of nuclear spin with an external magnetic field, $\vec{B}_{rf}(t)$ is simply the vertical and horizontal components of $\vec{B}(t)$. Also, if low pulse amplitudes are produced by the gVERSE pulse, the duration T of the pulse can be increased.

Another part of MRI comes from the fact that since all magnetization vectors are spinning, there exists a rotational frame of reference. If we set up our equations in the rotating frame of reference then we exclude the uniform magnetic field generated by the main super-conducting magnet, B0. Instead, we are left with the magnetic field of our RF pulse, $\vec{B}_{rf}(t)$, and our gradient

$$\vec{G}(t, s) = \begin{bmatrix} 0 \\ 0 \\ sG(t) \end{bmatrix},$$

where sG(t) is the gradient value at coordinate position s. The primary function of the gradient is to produce time-altering magnetic fields such that the MR signal can be spatially allocated [HBTV99]. Hence, different parts of a specimen experience different gradient field strengths. Thus, by multiplying a constant gradient value by different coordinate positions s, we have potentially produced an equivalent linear relationship to what is used in practice. Fundamentally, coordinate positions s split a specimen or object into "planes" or "slices" along the s direction, which for the purposes of this paper will be parallel to z, as depicted in Figure 4. Here, s corresponds to a specific
Figure 4. Specimen or object separated into planes or slices about the z-axis
coordinate value depending on its respective position, and further it has a precise and
representative gradient strength. An RF pulse excites particular magnetization vectors into the transverse (x, y) plane, where a signal is generated that is eventually processed into an image. In MRI a voxel corresponds to the unit volume of protons necessary to produce graphic information [HBTV99], and as this is directly related to a group or unit volume of magnetization vectors we will use the words voxel and magnetization vector interchangeably. Thus, s allows us to distinguish between voxels that are excited into the transverse plane by an RF pulse and those that are not. Coordinate positions s of voxels that are excited into the transverse plane will be recorded and referred to as being "in the slice." Magnetization vectors that are not tipped into the transverse plane will be referred to as being "outside the slice." Since any specimen or object we intend to image will have a fixed length, given s ∈ S, we will restrict S by choosing a finite set S ⊂ R. S can then be further partitioned into the disjoint union of sets $S_{in} \,\dot\cup\, S_{out}$, where Sin represents the coordinate positions in the slice and Sout represents the positions together with the magnetization vectors that we do not wish to tip into the transverse plane, i.e. those which are outside the slice. For each coordinate position s ∈ S we add constraints corresponding to the Bloch equation; however, boundary constraints correspond to different conditions depending on the position of the slice, as we will discuss later. Fundamentally, voxels in Sin ensure uniform magnetic tipping into the transverse plane, whereas s ∈ Sout certify that external magnetization is preserved.

Thus, we now have the magnetic field $\vec{B}(t, s)$ with respect to coordinate positions s, whereby bx(t) and by(t) are independent of s, hence

$$\vec{B}(t, s) = \vec{B}_{rf}(t) + \vec{G}(t, s).$$

Also, since $\vec{B}(t, s)$ has divided the z component of our external magnetization into coordinate components, the same notation must be introduced into our net magnetization. By adding coordinate positions s to the magnetization vector we have

$$\vec{M}(t, s) = \begin{bmatrix} M_x(t, s) \\ M_y(t, s) \\ M_z(t, s) \end{bmatrix}.$$

In addition, since VERSE pulses typically have short sampling times we will assume the same for the gVERSE pulse and thus omit proton interactions and relaxation. Therefore, including positions s in the Bloch equation, we are left with

$$\frac{d\vec{M}(t, s)}{dt} = \gamma\, \vec{M}(t, s) \times \vec{B}(t, s).$$

Hence, we have

$$\vec{M}(t, s) \times \vec{B}(t, s) = \begin{bmatrix} 0 & -sG(t) & b_y(t) \\ sG(t) & 0 & -b_x(t) \\ -b_y(t) & b_x(t) & 0 \end{bmatrix} \begin{bmatrix} M_x(t, s) \\ M_y(t, s) \\ M_z(t, s) \end{bmatrix},$$
and finally

$$\frac{d\vec{M}(t, s)}{dt} = \gamma \begin{bmatrix} 0 & -sG(t) & b_y(t) \\ sG(t) & 0 & -b_x(t) \\ -b_y(t) & b_x(t) & 0 \end{bmatrix} \vec{M}(t, s). \qquad (1)$$
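For illustration, the system in Eq. (1) can be stepped numerically once the controls are fixed; the sketch below (a crude explicit Euler step with illustrative parameter values, not the optimal control discretization used later with SOCS) advances the magnetization of a single slice position s.

```python
import numpy as np

def bloch_matrix(b_x, b_y, G, s, gamma):
    """System matrix of Eq. (1) for gradient value G and slice position s."""
    return gamma * np.array([[0.0,    -s * G,  b_y],
                             [s * G,   0.0,   -b_x],
                             [-b_y,    b_x,    0.0]])

def euler_step(M, b_x, b_y, G, s, gamma, dt):
    """One explicit Euler step of dM/dt = A(t) M for one slice position."""
    return M + dt * bloch_matrix(b_x, b_y, G, s, gamma) @ M

# magnetization initially along z; parameter values are only illustrative
M = np.array([0.0, 0.0, 1.0])
M = euler_step(M, b_x=0.0, b_y=0.019, G=0.01, s=0.2, gamma=42.58, dt=1e-3)
```

In practice a higher-order integrator (or an analytic rotation, since the matrix is skew-symmetric) would be preferred; the point here is only the structure of the right-hand side.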
When stimulating a specific segment of a specimen by an RF pulse, some of the magnetization vectors are fully tipped into the transverse plane, some are partially tipped, and those lying outside the slice profile are minimally affected. The magnetization vectors that are only partially tipped into the transverse plane are described as having off-resonance and tend to disrupt pulse sequences and distort the final MRI image [HBTV99]. In anticipation of removing such inhomogeneities we introduce the angle α at which net magnetization moves from the z direction to the transverse plane. By convention, α will be greatest at the end of our RF pulse, at time T, and since we are in the rotating frame we can remove the y-axis from our equations. Thus, we can eliminate off-resonance s coordinates by bounding voxels affected by the pulse,

$$\left\| \begin{bmatrix} M_0 \sin(\alpha) \\ 0 \\ M_0 \cos(\alpha) \end{bmatrix} - \begin{bmatrix} M_x(T, s) \\ M_y(T, s) \\ M_z(T, s) \end{bmatrix} \right\| \le \varepsilon_1,$$

and those in Sout, with α = 0, hence

$$\left\| \begin{bmatrix} 0 \\ 0 \\ M_0 \end{bmatrix} - \begin{bmatrix} M_x(T, s) \\ M_y(T, s) \\ M_z(T, s) \end{bmatrix} \right\| \le \varepsilon_2,$$

where ε1, ε2 ≥ 0. By comparing these two bounds we can determine the s coordinates from which we would like the signal to be generated and exclude off-resonance.

Another factor we must integrate into our pulse is the slew rate W(t), also called the gradient rise time. This identifies how fast a magnetic gradient field can be ramped to different gradient field strengths [CGN88]. As a result, higher slew rates enable shorter measurement times, since the signal generated by the RF pulse to be imaged is dependent on it. Thus, the slew rate and gradient field strength together determine an upper bound on the speed and ultimately the minimum time needed to perform the pulse. Thus, there must be a bound on these two entities in our constraints,

$$|G(t)| \le G_{max}, \qquad W(t) = \frac{dG(t)}{dt} \le W_{max}.$$

Finally, we have the semi-infinite nonlinear optimization problem

$$\min \; \mathrm{SAR} = \int_0^T b_x^2(t) + b_y^2(t)\, dt, \qquad (2)$$
subject to

$$\frac{d\vec{M}(t, s)}{dt} = \gamma \begin{bmatrix} 0 & -sG(t) & b_y(t) \\ sG(t) & 0 & -b_x(t) \\ -b_y(t) & b_x(t) & 0 \end{bmatrix} \vec{M}(t, s), \qquad (3)$$

$$\left\| \begin{bmatrix} M_0 \sin(\alpha) \\ 0 \\ M_0 \cos(\alpha) \end{bmatrix} - \begin{bmatrix} M_x(T, s) \\ M_y(T, s) \\ M_z(T, s) \end{bmatrix} \right\| \le \varepsilon_1, \qquad (4S_{in})$$

$$\left\| \begin{bmatrix} 0 \\ 0 \\ M_0 \end{bmatrix} - \begin{bmatrix} M_x(T, s) \\ M_y(T, s) \\ M_z(T, s) \end{bmatrix} \right\| \le \varepsilon_2, \qquad (4S_{out})$$

$$|G(t)| \le G_{max}, \qquad (5)$$

$$\frac{dG(t)}{dt} \le W_{max}, \qquad (6)$$

$$M_x(0, s) = 0, \quad M_y(0, s) = 0, \quad M_z(0, s) = M_0, \qquad (7)$$
where equations (2)–(7) hold for all s ∈ S, t ∈ [0, T]. Thus, depending on our bound for the pulse, we will construct two sets of constraints, one for the voxels Sin ⊂ R that will be excited by the RF pulse and one for those that will not, Sout ⊂ R. Which indices are affected will be determined by the constraints (4Sin) and (4Sout).

3.1 Discretization

By separating our specimen into coordinate positions we have ultimately created two-dimensional segments that are similar to records in a record box, whereby s ∈ S represents the transverse plane at a particular position. Now we will discretize S into coordinate positions s1, s2, ..., sn, where n is the total number of slices. As we have discussed earlier, Sin contains the coordinate positions whose magnetization vectors have been tipped into the transverse plane by an RF pulse. Next we define the finite band of particular coordinate positions in Sin to consist of positions sk, ..., sk+δ, where 1 < k ≤ k + δ < n, δ ≥ 0 and k, δ ∈ Z. Subsequently Sout, which was defined as the positions that are not excited into the transverse plane, will consist of all coordinate positions not in Sin, hence Sout = {s1, ..., sk−1, s(k+δ)+1, ..., sn}. Figure 5 represents how si ∈ S for i = 1, ..., n would separate magnetization vectors into coordinate positions that have been tipped into the transverse plane, and those that have not. One should also note that we have only discretized with respect to coordinate positions si ∈ S, not time t. Furthermore, we will define the first coordinate position in Sin where RF pulse stimulation begins as $\underline{s}$, and similarly, the last position in Sin where stimulation ends as $\overline{s}$. Thus, we have $\underline{s} = s_k$ and $\overline{s} = s_{k+\delta}$, and we can now state the coordinate positions in the slice
Figure 5. Separating magnetization vectors into coordinate positions which are in the slice, Sin, and out, Sout
as $S_{in} = [\underline{s}, \overline{s}]$. The first position where RF stimulation is a minimum, closest to $\underline{s}$ but in Sout and towards the direction of s1, will be defined as sl. As well, the same will be done for the position closest to $\overline{s}$, which is in Sout and towards the direction of sn, defined as su. Consequently, sl = sk−1 and su = s(k+δ)+1, and therefore the coordinate positions outside the slice can be represented as $S_{out} = [s_1, s_l] \,\dot\cup\, [s_u, s_n]$. As depicted in Figure 5, Sin is located between the two subintervals of Sout, where si ∈ Sin is centered around 0, leaving the Sout subintervals [s1, sl] < 0 and [su, sn] > 0. As well, [s1, sl] and [su, sn] are symmetric with respect to each other, hence the lengths of these subintervals are equivalent, sk−1 − s1 = sn − s(k+δ)+1. Furthermore, the differences between respective coordinate positions within each interval are equal to one another such that

$$\begin{aligned} s_2 - s_1 &= s_n - s_{n-1} \\ s_3 - s_2 &= s_{n-1} - s_{n-2} \\ &\;\;\vdots \\ s_{k-1} - s_{k-2} &= s_{(k+\delta)+2} - s_{(k+\delta)+1}. \end{aligned} \qquad (8)$$

Also note that the discretization points si within any interval [s1, sl], $[\underline{s}, \overline{s}]$ and [su, sn] do not necessarily have to be uniformly distributed, and thus more or fewer coordinate positions can be placed closer to the boundaries of Sin and Sout. The distance between the coordinate positions $(s_l, \underline{s})$ and $(\overline{s}, s_u)$ will be much larger in comparison to the other increments of si. This is typically the area where voxels that have off-resonance characteristics are located. As mentioned earlier, magnetization vectors having off-resonance tend to disrupt pulse sequences and distort the MRI image. For this reason we will define tolerance gaps S0 of finite length between $(s_l, \underline{s})$ and $(\overline{s}, s_u)$, where off-resonance prominently resides. Hence, S can now be partitioned into $S_{in} \,\dot\cup\, S_{out} \,\dot\cup\, S_0$, where a general sequence of the intervals would be Sout, S0, Sin, S0, Sout.

3.2 gVERSE Penalty

An important component of the model now becomes evident: the nonlinear optimization problem defined in (2)–(7) may be infeasible or difficult to solve
as the number n of si ∈ S becomes large and the slices are close together. In particular, constraints (4Sin) and (4Sout) pose a threat to the feasibility of the problem as the number of discretization points increases. A penalty for the violation of these constraints can be imposed such that an optimal solution is located for problems with large numbers of variables and small distances between si coordinate positions. The basic idea in penalty methods is to relax particular constraints and add a penalty term to the objective function that prescribes a high cost to infeasible points [Ber95]. The penalty parameter determines the severity of violation and, as a consequence, the extent to which the resulting unconstrained problem approximates the original constrained one. Thus, returning to the semi-infinite nonlinear optimization problem formulated at the start of Section 3, we introduce penalty variables ξ1 and ξ2 into constraints (4Sin)–(4Sout), and the optimization problem objective becomes

$$\min \; \mathrm{SAR} = \int_0^T b_x^2(t) + b_y^2(t)\, dt + \xi_1 \zeta_1 + \xi_2 \zeta_2, \qquad (9)$$
subject to constraints (3), (5)–(7), and

$$\left\| \begin{bmatrix} M_0 \sin(\alpha) \\ 0 \\ M_0 \cos(\alpha) \end{bmatrix} - \begin{bmatrix} M_x(T, s_i) \\ M_y(T, s_i) \\ M_z(T, s_i) \end{bmatrix} \right\| \le \varepsilon_1 + \xi_1, \qquad (10S_{in})$$

$$\left\| \begin{bmatrix} 0 \\ 0 \\ M_0 \end{bmatrix} - \begin{bmatrix} M_x(T, s_i) \\ M_y(T, s_i) \\ M_z(T, s_i) \end{bmatrix} \right\| \le \varepsilon_2 + \xi_2, \qquad (10S_{out})$$
where ζ1, ζ2 ∈ R are scalar penalty parameters and, as in the earlier equations of Section 3, (9)–(10Sout) apply for all s ∈ S, t ∈ [0, T]. One should note that the larger the values of ζ1 and ζ2, the less violated constraints (10Sin) and (10Sout) become. In addition, as written, the penalty variables are applied to each si ∈ S in constraints (10Sin) and (10Sout). However, depending on computational results, it may be appropriate to penalize only coordinate positions in the neighbourhood of the bounds $[s_l, \underline{s}]$ and $[\overline{s}, s_u]$. This would enhance the constraints on the optimization problem and only allow violations to occur at the most vulnerable points of the problem. Adding penalty variables and parameters to our optimization problem is an option that may not be necessary. It depends on the number n of coordinate positions applied to the model as well as on how close we would like the si ∈ S to be to one another. Hence, for the remainder of this paper we will omit writing out the penalty variables and parameters; the reader should note, however, that they can easily be incorporated into the formulation.
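A minimal sketch of how the penalized objective (9) and the relaxed end-point conditions (10Sin)–(10Sout) could be evaluated for a candidate solution; the trapezoidal quadrature and the helper names are illustrative choices and not part of the SOCS formulation used in the next section.

```python
import numpy as np

def penalized_sar(t, b_x, b_y, xi1, xi2, zeta1=100.0, zeta2=100.0):
    """Objective (9): integrated RF power plus the two penalty terms."""
    power = b_x**2 + b_y**2
    sar = np.sum(0.5 * (power[1:] + power[:-1]) * np.diff(t))   # trapezoid rule
    return sar + xi1 * zeta1 + xi2 * zeta2

def endpoint_violation(M_T, target, eps):
    """How far ||target - M(T, s_i)|| exceeds eps (0 if the bound holds)."""
    return max(0.0, np.linalg.norm(target - M_T) - eps)

# end-point targets: full tip (alpha = pi/2) in the slice, untouched outside
M0, alpha = 1.0, np.pi / 2
target_in = np.array([M0 * np.sin(alpha), 0.0, M0 * np.cos(alpha)])
target_out = np.array([0.0, 0.0, M0])
```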
4 Results

The gVERSE pulse was designed to improve RF pulse sequences by minimizing SAR levels while upholding MRI resolution; however, the complex
mathematical requirements of the model may be difficult to satisfy. Even simple NLO problems with large numbers of variables can be challenging to solve and threaten many software packages. Thus, when attempting to minimize the objective function in (2) under the constraints (3)–(7), the number of variables implemented was especially important. Preliminary results were found by implementing the gVERSE model using five coordinate positions, and the SQP-based optimal control software package SOCS solved the demanding time-dependent NLO problems. This kept the variable count to a minimum of 19 (3n + 4), excluding the independent time variable t. The number of slices was then systematically increased until software limitations on memory became a factor. Nonetheless, this resulted in a remarkably larger number of variables than anticipated, as it accounted for 15 slices with a total of 38 857 variables.

By experimenting and consulting the literature, realistic MRI values for the constants were used during each computation, namely γ = 42.58 Hz/mT, Gmax = 0.02 mT/mm and Wmax = 0.2 mT/mm/ms, where Hz is Hertz, mm is millimeters, ms is milliseconds, and mT is millitesla. The magnetization vectors in Sin were fully tipped into the transverse plane, hence α = π/2. The initial magnetization vector for each coordinate position had a magnitude of M0 = 1.0 spin density units. Initially we chose ε1, ε2 ≤ 0.1; however, the larger the number of variables in the problem became, the larger the values of ε1 and ε2 had to be in order to find a feasible solution, hence ε1, ε2 = 0.1 for the 15 slice results.
Figure 6. The separation of coordinate positions si into Sin and Sout for 15 magnetization vectors
4.1 Fifteen Slice Results

The results for the 15 slice problem accounted for the largest number of variables that SOCS could solve. The problem became even more challenging as the distances from $\underline{s}$ to $s_l$ and from $\overline{s}$ to $s_u$ decreased. For smaller distances between the magnetization vectors in Sin and Sout, penalty variables and parameters had to be incorporated into the formulation of the problem. We will begin with the 15 slice results without penalty, where greater distances from $\underline{s}$ to $s_l$ and from $\overline{s}$ to $s_u$ were used.
Since there were 15 slices, the three middle magnetization vectors were tipped into the transverse plane to ensure that the symmetric structure of the problem was maintained. Hence, coordinate positions s7, s8 and s9 were in Sin, while s1, s2, ..., s6 and s10, s11, ..., s15 remained in Sout. The arrangement of the coordinate positions is shown in Figure 6, and the exact values for the coordinate positions (in mm) are as follows:

 s1    s2    s3    s4    s5    s6    s7    s8   s9    s10   s11   s12   s13   s14   s15
-30   -28   -26   -24   -22   -20   -0.2  0    0.2   20    22    24    26    28    30

The results for the 15 slice coordinate simulation are illustrated in Figures 7, 8 and 11. Information on the magnetic vector projection is shown in the graphs found in Figures 7–8. Due to the symmetric structure of the problem, voxels s1, ..., s6 and s10, ..., s15 were identical, as were s7 and s9. Hence, only the first eight coordinate positions are shown. Thus, Figures 7–8 correspond to magnetization vectors in Sout and Sin. The resulting RF
Figure 7. From top to bottom, magnetization vectors corresponding to coordinate positions s1, s2, s3 and s4
pulse procedure, represented by the external magnetization components and
Figure 8. From top to bottom, magnetization vectors corresponding to coordinate positions s5, s6, s7 and s8
the gradient waveform is shown in Figure 11; bx(t) is not shown as it was constant and equal to zero. One can observe that the precession of the magnetization vectors in Sout is evident; this is shown in the graphs of Figures 7–8. The initial point is close to the voxels' precession range and at most it takes one full rotation for them to orbit uniformly. The magnetization vectors in Figure 8, those si that belong to Sin, smoothly tip into the transverse plane without any cusps or peaks. There are small differences between s7 and s8 as they begin to tip into the transverse plane; however, they act very similarly after their height decreases below 0.8 spin density units. In Figure 11, the gradient waveform starts off negative and then ends up positive. It is not a smooth curve, since it is composed of many local hills and valleys. Also, the gradient seems to be the opposite of what is used in practical MRI sequences; however, this proves to be a proficient sequence, as we will investigate in the next section. Finally, the external magnetization components, bx(t) and by(t), are constant and linear, precisely what we optimized for in the objective function. The value of bx(t) is zero mT/mm, while by(t) in Figure 11 has a constant value of 0.01925 mT/mm.
Figure 9. From top to bottom, magnetization vectors corresponding to coordinate positions s1, s2, s3 and s4
4.2 Fifteen Slice Penalty Results

To increase the distance between the coordinate positions that were tipped into the transverse plane and allow a smooth transition between magnetization vectors in Sin and Sout, penalty variables and parameters were introduced. As described in Section 3.2, penalty variables were added to each si vector in constraints (10Sin) and (10Sout) in order to decrease the gaps $[s_l, \underline{s}]$ and $[\overline{s}, s_u]$. The remaining variables, constants, and constraints were consistent with what was used in the 15 slice results. The exact values for the coordinate positions (in mm), with the penalty variable applied at each position listed below it, were as follows:

 s1    s2    s3    s4    s5    s6    s7    s8   s9    s10   s11   s12   s13   s14   s15
-30   -28   -26   -24   -22   -20   -2    0    2     20    22    24    26    28    30
 ξ2    ξ2    ξ2    ξ2    ξ2    ξ2    ξ1    ξ1   ξ1    ξ2    ξ2    ξ2    ξ2    ξ2    ξ2

Notice that with the addition of penalty variables and parameters the distance from s7 to s9 increased to 4 mm, compared to the 0.4 mm difference in the 15 slice results of Section 4.1. This allowed the difference
between the vectors in Sin and Sout to be reduced. The results for the penalized 15 coordinate simulation are illustrated in Figures 9, 10 and 12, where the values of the penalty parameters were ζ1 = 100 and ζ2 = 100. The profiles of the magnetic moments are shown in Figures 9–10. Again, due to the problem's symmetry we have omitted the graphs of the magnetization vectors corresponding to coordinate positions s9, ..., s15. Hence, Figure 9 and the top two graphs in Figure 10 correspond to magnetization vectors in Sout, whereas the bottom two graphs in Figure 10 refer to the coordinate positions in Sin. The resulting RF pulse procedure, represented by the external magnetization components and gradient sequence, is shown in Figure 12; again bx(t) is not shown as it was constant and equal to zero.
Figure 10. From top to bottom, magnetization vectors corresponding to coordinate positions s5, s6, s7 and s8
As illustrated in Figures 9–10, the precession of the magnetization vectors in Sout has a much larger radius than in the 15 slice problem. In fact, these magnetization vectors have at most three successive orbits in the entire time duration. The magnetization vectors in Figure 10, those si that belong to Sin, smoothly tip into the transverse plane, and there is a greater similarity between s7 and s8 than in the preceding results. However, due to the penalty
variables, these vectors only tip down to a spin density value of 0.2. Also, the y-axis range is larger than it was in the 15 slice problem; this is because the My(t, ·) vectors are increasing as they descend into the transverse plane. In Figure 12, the gradient waveform contains two large peaks. The first is negative and it starts about one quarter into the time period. The second peak is positive and it starts approximately three quarters into the time period. Also, the gradient sequence has three linear segments: one that is zero at the start of the sequence, and the other two occur within the peaks, each having a value of exactly ±Gmax. For the external magnetization components, bx(t) is again constant and has a value of zero mT/mm. Although the axis of by(t) in Figure 12 has been magnified, it is not as linear as in the previous results and has increased to a value of approximately 0.10116 mT/mm. Nevertheless, this is still less than the amplitude of a conventional pulse, such as the one illustrated in Figure 1, which has a typical by(t) value of approximately 0.7500 mT/mm. In fact, if we look at the value of the objective function in (2), the 15 slice penalty results have an objective value of 0.1874 SAR units, whereas the generic RF pulse produced a value of 0.5923 SAR units. The 15 slice results generated the lowest objective value of 0.0385 SAR units.
Figure 11. External magnetization component by(t) and gradient sequence G(t) for the 15 slice results; bx(t) is zero
Using the simulated gVERSE magnetization results, we produce two different graphs showing the transverse and longitudinal magnetization profiles. The desired magnetization distributions for a 90° RF pulse with 15 coordinate positions are shown in Figure 13; for further information about the desired profiles the reader may consult [CNM86], [HBTV99]. The transverse magnetization profile illustrated in Figure 14 is very similar to the desired one given in Figure 13. The Mx magnetization component is free of ripples and contains the requested step function. The My magnetization profile is included to illustrate its minimal presence. One should note that the lower axis in Figures
Figure 12. External magnetization component by(t) and gradient sequence G(t) for the 15 slice penalty results; bx(t) is zero
13–14 represents the magnetization vectors' coordinate positions, which from the results corresponds to a distance of 60 mm. The longitudinal magnetization profile in Figure 14 is also similar to the desired one; however, the Mz dip is slightly higher than desired. In Figure 14 it is important to note that our resultant profiles have no ripples extending past the slice of interest, which is not the case for the results of [CNM86], [CGN88] and [UGI04]. By virtually omitting ripples in our magnetization profiles we potentially reduce aliasing and other such factors that disrupt MR image resolution.
5 Image Reconstruction

To obtain an idea of how the gVERSE pulse performs with respect to MR imaging, we provide a simple illustration of its behaviour. First, one should be familiar with how the signal produced by the RF pulse is mathematically amplified, digitized, transformed, and then combined together with other signals to form a final image [CDM90, HBTV99, LL01, Nis96]. There are several techniques that can be used to produce a final image; however, the core of the systematic procedure is the same for all methods. For the purpose of our analysis we use 1D imaging coverage.

5.1 gVERSE Simulation

An MRI simulation was implemented in Matlab to test the performance of the gVERSE pulse, in which, using the Bloch equation, we created an environment similar to that occurring in practical MRI. Thus, by feeding the optimized RF and gradient gVERSE values to a program that simulates the behaviour of a portion of a human spinal cord, we can show how the gVERSE
Figure 13. Desired Mx and Mz distribution profiles for a 90° pulse

Figure 14. Transverse magnetization components highlighting Mx and My magnitudes (left), longitudinal Mz magnetization component magnitude (right)
MR signal performs. Specifically, the gVERSE values of G(tj), bx(tj) and by(tj) for j = 1, ..., N were read into the Bloch equation (1) for magnetization vectors at different s1, ..., sn positions. Although we used a total of n coordinate positions in the optimization of our model, the RF pulse and gradient sequence can be applied to $\bar{n} > n$ positions for imaging purposes. Thus, given $\bar{n} > n$ coordinate positions, N time discretizations, and the initial magnetization vector $\vec{M}_0$ in the z direction, the Bloch equation was numerically integrated for each si value and j = 1, ..., N. The gVERSE pulse sequence, G(tj), bx(tj) and by(tj), was then inserted into the integral of

$$\vec{M}(t, s_i) = \int_{t_1}^{t_N} \frac{d\vec{M}(t, s_i)}{dt}\, dt, \qquad (11)$$

for $i = 1, \ldots, \bar{n}$, where t = [t1, t2, ..., tN]^T. The values of the magnetization vectors were then converted into a signal by simulating the amplification and digitization used in MRI. For a complete description of how (11) was
integrated and amplified, one can refer to [Sto04]. At this step we are able to investigate the signal produced by our simulation and examine its properties.

Using the gVERSE gradient and RF pulse sequence, many MRI simulations were conducted over various tissues. We show one of the results here; for more simulation examples the reader can see [Sto04]. As there was essentially no difference between the simulation results of the two gVERSE cases, the 15 slice penalty results are shown, as they were the better of the two. Using cerebrospinal fluid, the most graphically significant results were obtained by placing the tissue at an angle, as shown in Figure 15. As the signal generated by the pulse has a direct relationship with the tissue's spin density, each tissue's spin density value was substituted into M0 at its respective position. Thus, a spin density value of 1.0 for cerebrospinal fluid was used when performing the MR imaging simulation. Also note that the gVERSE pulse was designed to tip only the magnetization vectors in Sin into the transverse plane. Thus, the coordinate positions si ∈ Sin would produce a peak in the signal when the gVERSE pulse reaches the cerebrospinal fluid for these si ∈ Sin voxels. As detailed in the preceding sections, voxels si ∈ Sin are located at the center coordinate positions. Figure 16(A) represents the signal
Figure 15. The angular position of cerebrospinal fluid to be imaged by our MRI simulation
generated after the gVERSE pulse and gradient waveform were used to excite particular voxels within the cerebrospinal fluid into the transverse plane. As shown in Figure 16(A), the large central peak in the signal marks where the gVERSE pulse reaches the voxels in Sin of the fluid. The peak in the center of the figure is very distinctive, and although noise was not integrated into the simulation, the signal produced a strong step function. Figure 16(B) represents the signal produced when a generic sinc pulse and gradient waveform are used. In comparing Figure 16(A) to Figure 16(B), one can see that the signal produced by the gVERSE pulse has a highly distinctive central
Figure 16. The signal produced by the gVERSE pulse MRI simulation over the diagonal cerebrospinal fluid (A), and when a generic sinc RF pulse and gradient sequence is applied (B)
peak and a much clearer division with regards to what is tissue and what is not. The base of the signal in Figure 16.(A) is also more representative of when the voxels in Sin reach the fluid, which is not the case for the sinc pulse. In addition, the objective value, which defines the strength of the RF pulse necessary to produce such a signal, was 0.1874 SAR units for the gVERSE pulse, substantially lower than that of the conventional pulse, which had an objective value of 0.5923 SAR units.
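The forward simulation described above can be summarized in a short sketch: for each slice position the Bloch equation (1) is stepped through the sampled gVERSE controls G(t_j), b_x(t_j) and b_y(t_j), and the transverse magnetization at the final time is collected into a signal. This is only a simplified stand-in for the Matlab simulator of [Sto04] (explicit Euler stepping, no amplification or digitization).

```python
import numpy as np

def simulate_signal(positions, t, G, b_x, b_y, gamma=42.58, M0=1.0):
    """Integrate Eq. (1) for every slice position and return the magnitude of
    the transverse magnetization at the final time for each position."""
    signal = np.zeros(len(positions))
    for i, s in enumerate(positions):
        M = np.array([0.0, 0.0, M0])              # initial conditions (7)
        for j in range(len(t) - 1):
            A = gamma * np.array([[0.0,      -s * G[j],  b_y[j]],
                                  [s * G[j],  0.0,      -b_x[j]],
                                  [-b_y[j],   b_x[j],    0.0]])
            M = M + (t[j + 1] - t[j]) * A @ M     # explicit Euler step
        signal[i] = np.hypot(M[0], M[1])          # |M_xy(t_N, s_i)|
    return signal
```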
6 Conclusions and Future Work

We designed the gVERSE model to reduce the SAR of RF pulses by maintaining a constant RF pulse strength ($\vec{B}_{rf}$ value) and generating high quality MR signals. It was shown that the gVERSE results produced strong MR signals with clear divisions of the location of the tissue being imaged. For this reason various MRI studies utilizing gVERSE pulses could be developed in the near future.

The observations noted in Section 4 deserve some additional reasoning and explanation. To begin, the reader should understand that the symmetry displayed between coordinate position vectors in each of the result cases was precisely designed in (8) of the gVERSE model. However, the precession illustrated by the magnetization vectors was not directly part of the gVERSE design; it was a consequence of the Bloch constraint (3). Nonetheless, the precession shown in our results validated our design, since it occurs within the nucleus of atoms in vivo. Furthermore, investigating the precession of the magnetization vectors in the 15 slice results, it was shown that they had a much tighter radial orbit than in the 15 slice penalty results. This was due to the fact that the penalty parameters allowed the feasible range of the constraints
on these variables to be larger. With respect to precession, the 15 slice results were the most realistic. However, penalty variables in the 15 slice penalty results allowed the span of the magnetization vectors in Sin to be fairly large, which is probably what occurs in practice. In addition, investigating only the coordinate positions in Sin, one should note that the penalty variables relaxed the constraints of the 15 slice penalty results, which did not induce the wave-like motion found in the vectors of the 15 slice results. One could conclude that in order to have improved transverse tipping and increase the length of magnetization vectors in Sin, larger ε2 values are necessary; however, whether or not such a large precessional value is a realistic approximation would then become a factor.

The aim of the gVERSE pulse was to minimize SAR by maintaining a constant RF pulse (bx(t) and by(t) values), which was established in both of the results. Although the values of bx(t) were identical for both cases, the by(t) values increased as the distance between the slices in Sin became larger. This was expected, since an increase in the distance between $\underline{s}$ and $\overline{s}$ would require additional energy to tip the voxels into the transverse plane, yielding an increase in the strength of the RF pulse, or a larger by(t) value. The by(t) values for the penalty results were the greatest and were not as constant as in the other case. This was again due to the penalty variables and parameters; however, the nonlinear portions of the by(t) graph had only small differences with respect to the other values, and they were lower. Also, when comparing the gVERSE pulse to conventional pulses, the gVERSE objective value was lower in all cases, and hence did not require as much energy to tip the magnetization vectors into the transverse plane.

Finally, the most surprising part of the gVERSE pulse results is the gradient waveform. Since we optimized for the RF pulse in our model, this process returned the gradient waveform that would allow such a pulse to occur. In other words, in order to use the bx(t) and by(t) pulse design, the accompanying gradient waveform, mainly derived from the Bloch constraint, would have to be imposed to acquire a usable signal. With regard to practical MR gradient waveforms, the 15 slice penalty results produced the simplest and most reasonable gradient values to implement, particularly due to the large linear portions. However, if necessary, regardless of the difficulty, either gradient could be implemented. Both results had similar features in the sense that they each started off fairly negative and then ended up quite positive. This is a very interesting consequence of the gVERSE pulse; as shown in Sections 2 and 4, conventional gradient sequences usually have the opposite characteristics. In terms of our MRI simulation, good signal results were produced for such unique gradient waveforms, which would justify further research with gVERSE pulses. In fact, Sections 4 and 5 demonstrated that the gVERSE RF pulse and gradient sequence are viable and could be applied to practical MRI.
Future Work

The gVERSE pulse proved to have encouraging MRI results and performed better than anticipated with respect to usable MR imaging signals. However, there are still areas left for investigation, and various elements of the gVERSE model can be improved. A few of the issues that should be taken into account for future developments are:

• Specializing the model to the rotating structure of the equations;
• Applying the gVERSE model to more than 50 slices;
• Adding spin-lattice and spin-spin proton interactions to the gVERSE formulation;
• Applying alternative optimization software to the problem;
• Including gradient distortions in the gVERSE model.
The issues are listed in sequential order, starting with what we believe is the most important item to be addressed. As most are self-explanatory, we only remark that adding rotation into the equations was one of the factors deemed to be important after the results were examined. By integrating the rotating frame of reference into our equations we eliminated the y-axis. It is possible that this was a source of singularities when optimizing, and therefore caused SOCS to increase the size of its working array, occasionally creating memory problems.
References

[Ber95] Bertsekas, D. P.: Nonlinear Programming. Athena Scientific, Belmont, Massachusetts (1995)
[Ber01] Betts, J. T.: Practical Methods for Optimal Control Using Nonlinear Programming. Society for Industrial and Applied Mathematics, Philadelphia (2001)
[BH01] Betts, J. T., and Huffman, W. P.: Manual: Release 6.2 M and CTTECH-01-014. The Boeing Company, PO Box 3707, Seattle, WA 98124-2207 (2001)
[Bus96] Bushong, S. C.: Magnetic Resonance Imaging: Physical and Biological Principles. Mosby, Toronto, 2nd edition (1996)
[CNM86] Conolly, S. M., Nishimura, D. G., and Macovski, A.: Optimal Control Solutions to the Magnetic Resonance Selective Excitation Problem. IEEE Transactions on Medical Imaging, MI-5, 106-115 (1986)
[CGN88] Conolly, S. M., Glover, G., Nishimura, D. G., and Macovski, A.: Variable-Rate Selective Excitation. Journal of Magnetic Resonance, 78, 440-458 (1988)
[CDM90] Curry, T. S., Dowdey, J. E., and Murry, R. C.: Christensen's Physics of Diagnostic Radiology. Lippincott Williams and Wilkins, New York, 4th edition (1990)
[HBTV99] Haacke, E. M., Brown, R. W., Thompson, M. R., and Venkatesan, R.: Magnetic Resonance Imaging: Physical Principles and Sequence Design. John Wiley and Sons, Toronto (1999)
[LL01] Liang, Z. P., and Lauterbur, P. C.: Principles of Magnetic Resonance Imaging: A Signal Processing Perspective. IEEE Press, New York, New York (2001)
[Nis96] Nishimura, D. G.: Principles of Magnetic Resonance Imaging. Department of Electrical Engineering, Stanford University, San Francisco (1996)
[She01] Shen, J.: Delayed-Focus Pulses Optimized Using Simulated Annealing. Journal of Magnetic Resonance, 149, 234-238 (2001)
[Sto04] Stoyan, S. J.: Variable Rate Selective Excitation RF Pulse in MRI. M.Sc. Thesis, McMaster University, Hamilton (2004)
[UGI04] Ulloa, J. L., Guarini, M., and Irarrazaval, P.: Chebyshev Series for Designing RF Pulses Employing an Optimal Control Approach. IEEE Transactions on Medical Imaging, 23, 1445-1452 (2004)
[WXF91] Wu, X. L., Xu, P., and Freeman, R.: Delayed-Focus Pulses for Magnetic Resonance Imaging: An Evolutionary Approach. Magnetic Resonance in Medicine, 20, 165-170 (1991)
Modelling the Performance of the Gaussian Chemistry Code on x86 Architectures Joseph Antony1 , Mike J. Frisch2 , and Alistair P. Rendell1 1 2
Department of Computer Science, The Australian National University, ACT 0200, Australia. {joseph.antony, alistair.rendell}@anu.edu.au Gaussian Inc. 340 Quinnipiac St., Bldg. 40, Wallingford CT 06492, USA.
Abstract Gaussian is a widely used scientific code with application areas in chemistry, biochemistry and material sciences. To operate efficiently on modern architectures Gaussian employs cache blocking in the generation and processing of the two-electron integrals that are used by many of its electronic structure methods. This study uses hardware performance counters to characterise the cache and memory behavior of the integral generation code used by Gaussian for Hartree-Fock calculations. A simple performance model is proposed that aims to predict overall performance as a function of total instruction and cache miss counts. The model is parameterised for three different x86 processors – the Intel Pentium M, the P4 and the AMD Opteron. Results suggest that the model is capable of predicting execution times to an accuracy of between 5 and 15%.
1 Introduction It is well known that technological advances have driven processor speeds faster than main memory speeds, and that to address this issue complex cache based memory hierarchies have been developed. Obtaining good performance on cache based systems requires that the vast majority of the load/store instructions issued by the processor are serviced using data that resides in cache. In other words, to achieve good performance it is necessary to minimize the number of cache misses [4]. One approach to achieving this goal is to implement some form of cache blocking [10]. The objective here is to structure the computational algorithm in such a way that it spends most of its time working with blocks of data that are sufficiently small to reside in cache, and only periodically moves data between main memory and cache. Gaussian [5] is a widely used computational chemistry code that employs cache blocking to perform more efficient integral computations [8,14]. The integrals in question lie at the heart of many of the electronic structure methods implemented within Gaussian, and are associated with the various interactions between and among the electrons and nuclei in the system under study.
50
J. Antony et al.
Since many electronic structure methods are iterative, and the number of integrals involved too numerous for them to be stored in memory, the integrals are usually re-computed several times during the course of a typical calculation. For this reason algorithms that compute electronic structure integrals fast and on-demand are extremely important to the computational chemistry community. To minimize the operation count, integrals are usually computed in batches, where all integrals in a given batch share a number of common intermediates [7]. In the PRISM algorithm used by Gaussian, large batch sizes give rise to large inner loop lengths. This is good for pipelining, but poor if it causes cache overflows and the need to fetch data from main memory. To address this problem Gaussian imposes cache blocking by limiting the maximum size of an integral batch. This in effect says that the time required to recompute the common shared intermediates is less than the time penalty associated with having inner loops fetch data quantities from main memory. In the current version of Gaussian there is a “one size fits all” approach to cache blocking, in that the same block size is used regardless of the exact characteristics of the integrals being computed. A long term motivation for our work is to move away from this model towards a dynamic model where cache blocking is tailored to each and every integral batch. As a first step towards this goal, this paper explores the ability of a simple Linear Performance Model (LPM) to predict the performance of Gaussian’s integral evaluation code purely as a function of instruction count and cache misses. It is important to note that the LPM is very different to that used in typical analytic or simulation based performance studies. Analytic models attempt to weight various system parameters and present an empirical equation for performance, whereas simulation studies are either trace3 or execution4 driven with each instruction considered in order to derive performance metrics. Analytic models fail, however, to capture dynamic aspects of code execution that are only evident at runtime, while execution or trace driven simulations are extremely slow, often being 100-1000 times slower than execution of the actual code. The LPM on the other hand, effectively ignores all the intricate details of program execution and assumes that, over time these details can be averaged out and incorporated into penalty factors associated with the average cost of issuing an instruction and the average cost of a cache miss. In this study, three x86 platforms – the Intel Pentium M, Pentium 4 (P4) and AMD Opteron – are considered. On-chip hardware performance counters are used to gather instruction and cache miss data from which the LPM is derived. The paper is broken into the following sections: section 2 discusses the background to this study, the tools and methodology used, and introduces 3
Trace driven simulation uses a pre-recorded list of instructions in a tracefile for later interpretation by a simulator. 4 An execution driven simulator interprets instructions from a binary source to perform its simulation.
Gaussian Performance Modelling
51
the LPM; section 3 uses the LPM for a series of experiments on the three different platforms and discusses the results. Previous work, conclusions and future work are covered in sections 4 and 5.
2 Background 2.1 The Hartree-Fock Method Electronic structure methods aim to solve Schr¨ odinger’s wave equation for atomic and molecular systems. For all but the most trivial systems it is necessary to make approximations. Computational chemists have developed a hierarchy of methods each with varying computational cost and accuracy. Within this hierarchy the Hartree-Fock (HF) method is relatively inaccurate, but it is also the bedrock on which more advanced and accurate methods are built. For these reasons the HF method was chosen as the focus of this work. At the core of HF theory is the concept of a molecular orbital (MO), where one MO is used to describe the motion of each electron in the system. The MOs (φ) are expanded in terms of a number (N ) of basis functions (χ) such that for MO φi : N cαi χα (1) φi = α
where cαi are the expansion or molecular orbital coefficients. In HF methods the form of these coefficients is optimized so that the total energy of the system is minimized. The basis functions used are normally located at the various atomic nuclei, and are a product of a radial function that depends on the distance from that nuclei, and an angular function such as a spherical harmonic Ylm 5 [8]. Usually the radial function is a Gaussian Gnl (r) 6 , and it is for this reason that the Gaussian code is so named. The matrix form of HF equations is given by: F C = SC
(2)
where C is the matrix of molecular orbital coefficients, S a matrix of (overlap) integrals between pairs of basis functions, a vector with elements corresponding to the energy of each MO, and F is the so called Fock matrix defined by: core + Fµν = Hµν
Ne N
∗ Cλi Cσi [(µν | λσ) − (µλ | νσ)]
(3)
i
λσ
where Ne is the number of electrons in the system. In equation 3 each element of the Fock matrix is expressed in terms of another two-index quantity core ) that involves other integrals between pairs of basis functions, and (Hµν the molecular orbital coefficients (C) contracted with a four-index quantity (µν | λσ). Since F depends on C, which is the same quantity that we seek 5 6
Ylm (θ, ϕ) = Gnl (r) =
2l+1 (l−m)! m P (cos θ)(eimϕ ) 4π (l+m)! l
3/4
2(2α) π 1/4
√
22n−l−2 ( (4n−2l−3)!!
2αr)2n−l−2 exp(−αr2 )
52
J. Antony et al.
to determine, equation 2 is solved iteratively by guessing C, building a Fock matrix, solving equation 2 and then repeating this process until convergence is reached. The four-index quantities, (µν | λσ), are the electron repulsion integrals (ERIs) that are of interest to this work, and arise due to repulsive interactions between pairs of electrons. They are given by: (µν | λσ) =
χµ (r1 ) χν (r1 )
1 χλ (r2 ) χσ (r2 ) dr1 dr2 |r1 − r2 |
(4)
where r1 and r2 are the coordinates of two electrons. For a given basis the number of two-electron integrals grows as O(N 4 ), so evaluation and processing of these quantities quickly becomes a bottleneck. (We note that for large systems it is possible to reduce this asymptotic scaling through the use of pre-screening and other techniques [13], but these alternative approaches still require a substantial number of ERIs to be evaluated and processed.) In the outline given above it has been assumed that each basis function is a single Gaussian function multiplied by a spherical harmonic (or similar). In fact it is common to combine several Gaussian functions with different exponents together in a fixed linear combination, treating the result as one contracted Gaussian basis function. We also note that when a basis function involves a spherical harmonic (or similar) of rank one or higher (i.e. l ≥ 1), it is normal to include all orders of spherical harmonic functions within that rank (i.e. ∀m : −l ≤ m ≤ l). Thus if a basis function involves a spherical harmonic of rank 2, all 5 components are included as basis functions. Thus there are three parameters that characterise a basis function; i) its location, ii) its degree of contraction and the exponents of the constituent Gaussians, and iii) the rank of its angular component. In the PRISM [12] algorithm functions with equivalent ii) and equivalent iii) the same are treated together, with a batch of ERI integrals defined by doing this for all of the four functions involved. The size of these batches can quickly become very large since the same basis set is generally applied to all atoms of the same type within the system under study, e.g. all oxygen or hydrogen atoms in the system. It is for this reason that Gaussian imposes cache blocking to limit maximum batch sizes. 2.2 Linear Performance Model The Linear Performance Model (LPM) gives the total number of cycles required to execute a given code segment as: Cycles = α ∗ (ICount ) + β ∗ (L1M isses ) + γ ∗ (L2M isses )
(5)
where ICount is the instruction count, L1M isses the total number of Level 1 cache misses, L2M isses the total number of Level 2 cache misses, and α, β, and γ are fitting parameters. Intuitively the value of α reflects the ability of the code to exploit the underlying superscalar architecture, β is the average
Gaussian Performance Modelling
53
cost of an L1 cache miss, and γ is the average cost of an L2 cache miss. We will collectively refer to α, β and γ as the Processor and Platform specific coefficients (PPCoeffs). They will be derived by performing a least squares fit of the Cycles, ICount , L1M isses and L2M isses counts obtained from hardware performance counters for a variety of cache blocking sizes. 2.3 PAPI PAPI [3], a cross platform performance counter library, is used to obtain hardware counter data. It uses on-chip performance counters to measure application events of interest like instruction and cycle counts as well as other cache events. Of the three x86 machines used, the Intel Pentium M, P4 and the AMD Opteron, have different numbers of on-chip performance counters. Each on-chip performance counter can count one particular hardware event. The following hardware events are used in this study; PAPI L1 TCM (Total level one (L1) misses (data and instruction)), PAPI L2 TCM (Total level two (L2) misses), PAPI TOT INS (Total instructions) and PAPI TOT CYC (Total cycles). PAPI also supports hardware performance counter event multiplexing. This uses an event sampling approach to enable more events to be counted than there are available hardware registers. Events counted using multiplexing will therefore have some statistical uncertainty associated with them. It is noted that on the P4 processor PAPI does not have a PAPI L1 TCM preset event, as it is not exposed by the underlying hardware counters. Instead PAPI L1 DCM and PAPI L1 ICM are used to count the total number of data and instruction cache misses respectively. Table 1 lists processor characteristics and the cache and memory latencies measured using lmbench [11]. The P4’s L1 instruction cache is a trace cache [6], unlike the Pentium M and Opteron. Hyperthreading on the P4 was turned off for this study. PAPI’s event multiplexing was used on the Pentium M, as this processor has only two hardware counters, but four hardware events are required by the LPM. 2.4 Methodology Gaussian computations are performed on a small system consisting of a solvated potassium ion surrounded by 11 water molecules with the geometry obtained from a snapshot of a molecular dynamics simulation. This work uses two basis sets denoted as 6-31G* and 6-31G++(3df,3pd) [8]. The former is a relatively modest basis set, while the latter would be considered large. Cache blocking in Gaussian is controlled by an input parameter cachesize, this was set to values of 2, 8, 32, 128, 256 and 512 kilowords (where a word is 8 bytes). The default value of this parameter equates to the approximate size of the highest level of cache on the machine being used, and from this value the sizes of various intermediate buffers are derived. For each blocking size, performance counter results were recorded for one complete iteration of the HF procedure and averaged over five runs.
54
J. Antony et al.
Clock Rate Ops. per Cycle Memory Subsystem Perf. Counters L1 DCache
L2 Unified
lmbench Latencies for L1 DCache L2 Unified Main Memory
(Ghz) (Cy) (No.) Size (KB) Associativity (Ways) Line size (Bytes) Cache Policies Size (MB) Associativity (Ways) Line size (Bytes) Relation to L1 Cache Policies Latency (Cy) Latency (Cy) Latency (Cy, ≈ )
Pentium M P4 Opteron 1.4 3.0 2.2 3 3 3 NtBr NtBr HT 2 18 4 32 16 64 8 8 2 64 64 64 LRU, WB P-LRU LRU, WB, WA 1 1 1 8 8 16 64 64 64 Inclusive Inclusive Exclusive LRU P-LRU P-LRU 3 9 201
4 28 285
3 20 405
NtBr = Northbridge, HT = HyperTransport, LRU = Least Recently Used, P-LRU = Pseudo-LRU, WB = Write Back, WA = Allocate on Write Table 1. Processor characteristics of clock rate, cache sizes and measured latencies for L1 DCache, L2 cache and main memory latencies for the three x86 processors used in the study. Block 6-31G* 6-31G++(3df,3pd) Size Pentium M P4 Opteron Pentium M P4 Opteron 2 42.0 28.2 20.8 4440 3169 2298 8 36.7 24.2 17.0 3849 2821 1970 32 30.0 19.8 13.6 2914 2210 1484 128 31.8 20.2 17.0 2869 2121 1701 256 37.0 22.0 20.2 3349 2259 1856 512 42.0 24.8 22.0 3900 2516 2214 x 36.6 23.2 18.4 3554 2516 1921 σ 5.0 3.2 3.1 618 409 308 Table 2. Timings (seconds) for HF benchmark using the 6-31G* and 631G++(3df,3pd) basis sets as a function of the cache blocking parameter. Also shown are the average (x) times and their standard deviations (σ).
3 Observed Timings and Hardware Counter Data Observed timings Table 2 shows the execution times obtained on the three different hardware platforms as a function of the different cache block sizes and when using the 631G* and 6-31G++(3df,3pd) basis sets. These results clearly show that cache blocking for integral evaluation has a major effect on the overall performance of the HF code in Gaussian. As the block size is increased from 2 to 512 kilowords the total execution time initially decreases, reaches a minimum,
Gaussian Performance Modelling
55
and then increases again. Exactly where the minimum is located is seen to vary slightly across the different platforms, and between the two different basis sets. Also shown in Table 2 are the execution times averaged over all the different cache block sizes on a given platform, together with the corresponding standard deviation. Although, the absolute value of the standard deviations are significantly smaller for the 6-31G* basis, as a percentage of average total execution times they are roughly equal for both basis sets at around 15%.
Figure 1. Hardware counter data as a function of the cache blocking parameter for the HF method, using the 6-31G++(3df,3pd) basis set on the three different hardware platforms
Hardware counter data Hardware counter data for the 6-31G++(3df,3pd) basis set is given in Figure 1. The left hand scale of the graph quantifies Total Level 1 misses (L1misses ) and Total Level 2 misses (L2misses ), while the right hand scale quantifies Total Cycles (Cycles) and Instruction Count (ICount ). The x axis is plotted using a log2 scale. The cycle counts shown in Figure 1 are directly related to the times given in Table 2 by the clock speeds (see Table 1). Hence they show a similar behavior, decreasing initially as the block size increases, reaching a minimum and then increasing. In contrast the instruction counts show a steep initial decrease, but then appear to level off for large block sizes. This behavior reflects the fact that similar integrals, previously split into multiple batches, will be computed in fewer batches as the block size increases. Mirroring this behavior the L1 and L2 cache misses are initially low, increase when the blocking size is expanded, and ultimately will plateau when there are no more split batches to be combined (although this is not evident for the block sizes given in the figure). Obtained PPCoeffs Using the LPM (equation 5) and the hardware performance counter data for the HF/6-31G* calculations, a least squares fit was performed in order to obtain the PPCoeffs values given in Table 3. For the Pentium M and Opteron the values of α, β and γ appear reasonable. Specifically a value of 0.67 for
56
J. Antony et al. Processor α β γ Pentium M 0.67 13.39 63.60 P4 2.87 -59.35 588.46 Opteron 0.64 7.13 388.23 P4a 0.86
–
323.18
Table 3. PPCoeff (α, β, γ) values for the Pentium M, P4 and Opteron obtained from HF/6-31G* results. See text for further details. a Results obtained when ignoring counts for L1 cache misses.
α on the Pentium M and 0.64 on the Opteron, implies that the processors are issuing 1.5 and 1.6 instructions per cycle respectively. Given that both processors (and also the P4) can issue upto three instructions per cycle these values are in the typical range of what might be expected. The values for β and γ are average L1 and L2 cache miss penalties respectively, or alternatively β is the average cost of referencing data in L2 cache, while γ is the average cost of referencing data in main memory. The actual costs for referencing the L2 cache and main memory as measured using lmbench are given in Table 1. Thus for the Pentium M a value for β of 13.39 can be compared with the L2 latency of 9 cycles (Table 1), and a value for γ of 63.60 can be compared with 201 cycles. On the Opteron the equivalent comparisons are 7.13 to 20, and 388.23 to 405. These results for β, and particularly those for γ are roughly in line with what we might expect if we note that they are averages, while those measured by lmbench are worst case scenarios; hardware features such as prefetching and out-of-order execution are likely to mask some of the latencies associated with a cache miss in Gaussian, but not for lmbench (by design). In contrast to the Pentium M and Opteron systems the results for the P4 are clearly unphysical with a negative value for β. The reason for this will be outlined in a future publication [2], but in essence it is due to the nature of the P4 micro-architecture which makes it very hard to count accurately the L1 cache misses. If, however, we ignore L1 misses and restrict the LPM to just the instruction count and L2 cache misses we obtain the second set of P4 data given in Table 3. This is far more reasonable,with a value for α that now equates to 1.2 instructions per cycle compared to an unlikely previous value of 0.37. Similarly the latency for a main memory reference is now less than that recorded by lmbench. The PPCoeffs in Table 3 were derived using performance counter data obtained from running with the 6-31G* basis set. It is of interest to combine these values for α, β and γ with the instruction and cache miss counts recorded with the larger 6-31G++(3df, 3pd) basis set, and thereby obtain predicted cycle counts for this larger calculation. The difference between these predicted cycle counts and the actual cycle counts gives a measure of the ability of the LPM to make predictions outside of the domain in which it was originally parameterised. Doing this we find RMS differences between the predicted and measured execution times of 456, 268 and 95 seconds for the Pentium M, P4
Gaussian Performance Modelling
57
(for the 2 parameter LPM) and Opteron processors respectively. Compared to the average execution times given in Table 2, this represents an error of ∼13% on the Pentium M, ∼10% on the P4, and ∼5% on the Opteron. Since the total execution time varies by over 50% as the block size is changed, these results suggest that the LPM is accurate enough to make useful predictions concerning the performance of Gaussian as a function of total instruction and cache misses.
4 Previous Work Using a sparse set of trace based cache simulations, Gluhovsky and O’Krafka [9] build a multivariate model of multiple cache miss rate components. This can then be used to extrapolate for other hypothetical system configurations. Vera et al. use cache miss equations [15] to obtain an analytical description of cache memory behavior of loop based codes. These are used at compile time to determine near optimal cache layouts for data and code. Snavely et. al use profile convolving [1] a trace based method which involves the creation of a machine profile and an application profile. Machine profiles describe the behavior of loads and stores for the given processor, while the application profile is a runtime utility which captures and statistically records all memory references. Convolving involves creating a mapping of the machine signature and application profile; this is then fed to an interconnect simulator to create traces that aids in predicting performance. In comparison to these methods, the LPM is lightweight in obtaining application specific performance characteristics. PPCoeffs are obtained using hardware counter data which can then be used by either trace based or execution based simulators.
5 Conclusions and Future Work A linear performance model was proposed to model the cache performance of Gaussian. PPCoeffs (α, β, γ) obtained intuitively correspond to how well the code uses the superscalar resources of the processor, the average cost in cycles of an L1 cache miss and the average cost in cycles of an L2 miss. Experiments show optimal batch sizes are both platform and computation specific, hinting that a dynamic means of varying batch sizes at runtime might be useful. In which case the LPM could be used to determine cache blocking sizes prior to computing a batch of integrals. On completing each batch cache metrics gathered could then be used to guide a runtime search toward the most optimal blocking size. The predictive ability of the LPM can be used to aid experiments which use cache simulation tools. These tools are capable of simulating caches of current and possible future processors and yield instruction counts, number of L1 and L2 misses. In tandem with the LPM, cycle counts can be computed thus allowing determination of which microarchitectural features have the greatest impact on code performance.
58
J. Antony et al.
For future work we propose to test the usefulness of the LPM at runtime to aid in searching for optimal blocking factors and use it to study the effect of microarchitectural changes on code performance.
Acknowledgments This work was possible due to funding from the Australian Research Council, Gaussian Inc. and Sun Microsystems Inc. under ARC Linkage Grant LP0347178. JA and APR wish to thank Alexander Technology and DCS TSG for access to various hardware platforms.
References 1. A. Snavely, N. Wolter, and L. Carrington. Modelling Application Performance by Convolving Machine Signatures with Application Profiles. IEEE Workshop on Workload Characterization, December 2001. 2. J. Antony, M. J. Frisch, and A. P. Rendell. Future Publication. 3. S. Browne, J. Dongarra, N. Garner, G. Ho, and P. Mucci. PAPI. Intl. Journal of HPC Applications, 14(3):189–204, 2000. 4. D. E. Culler, A. Gupta, and J. P. Singh. Parallel Computer Architecture: A Hardware/Software Approach. Morgan Kaufmann Publishers, Inc., San Francisco, California, USA, 1999. 5. M. J. Frisch, G. W. Trucks, H. B. Schlegel, G. E. Scuseria, M. A. Robb, and J. R. Cheeseman et al. Gaussian 03, Revision C.01. Gaussian Inc., Gaussian, Inc., Wallingford CT, USA, 2004. 6. G. Hinton, D. Sager, M. Upton, D. Boggs, D. Carmean, A. Kyker, and P. Roussel. The microarchitecture of the Pentium 4 processor. Intel Technical Journal, 2001. 7. M. Head-Gordon and J. A. Pople. A method for two-electron gaussian integral and integral derivative evaluation using recurrence relations. J. Chem. Phys., 89(9):5777–5786, 1988. 8. T. Helgaker, P. Jorgensen, and J. Olsen. Molecular Electronic-Structure Theory. John Wiley & Sons, 2001. 9. I. Gluhovsky and B. O’Krafka. Comprehensive Multiprocessor Cache Miss Rate Generation Using Multivariate Models. ACM Transactions on Computer Systems, May 2005. 10. M. D. Lam, E. E. Rothberg, and M. E. Wolf. The cache performance and optimizations of blocked algorithms. SIGOPS Oper. Syst. Rev., 25:63–74, 1991. 11. L. W. McVoy and C. Staelin. lmbench: Portable tools for performance analysis. In USENIX Annual Technical Conference, pages 279–294, 1996. 12. P. M. W. Gill. Molecular Integrals over Gaussian Basis Functions. Advances in Quantum Chemistry, 25:141–205, 1994. 13. P. M. W. Gill, B. G. Johnson, and J. A. Pople. A simple yet powerful upper bound for Coulomb integrals. Chemical Physics Letters, 217:65–68, 1994. 14. R. Lindh. Integrals of Electron Repulsion. In P. v. R. Schleyer et al., eds, Encyclopaedia of Computational Chemistry, volume 2, page 1337. Wiley, 1998. 15. X. Vera, N. Bermudo, and A. G. J. Llosa. A Fast and Accurate Framework to Analyze and Optimize Cache Memory Behavior. ACM Transactions on Programming Languages and Systems, March 2004.
Numerical Simulation of the December 26, 2004: Indian Ocean Tsunami J. Asavanant,1 M. Ioualalen,2 N. Kaewbanjak,1 S.T. Grilli,3 P. Watts,4 J.T. Kirby,5 and F. Shi5 1
2 3 4 5
Advanced Virtual and Intelligent Computing (AVIC) Center, Department of Mathematics, Faculty of Science, Chulalongkorn University, Bangkok, 10330, Thailand
[email protected],
[email protected] Geosciences Azur (IRD, CNRS, UPMC, UNSA), Villefranche-sur-Mer, France
[email protected] Department of Ocean Engineering, University of Rhode Island, Narragansett, RI 02882
[email protected] Applied Fluids Engineering, Inc., 5710 E. 7th Street, Long Beach, CA 90803, USA
[email protected] Center for Applied Coastal Research, University of Delaware, Newark, DE 19761, USA
[email protected]
Abstract The December 26, 2004 tsunami is one of the most devastating tsunami in recorded history. It was generated in the Indian Ocean off the western coast of northern Sumatra, Indonesia at 0:58:53 (GMT) by one of the largest earthquake of the century with a moment magnitude of Mw = 9.3. In the study, we focus on best fitted tsunami source for tsunami modeling based on geophysical and seismological data, and the use of accurate bathymetry and topography data. Then, we simulate the large scale features of the tsunami propagation, runup and inundation. The numerical simulation is performed using the GEOWAVE model. GEOWAVE consists of two components: the modeling of the tusnami source (Okada, 1985) and the initial tsunami surface elevation, and the computation of the wave propagation and inundation based on fully nonlinear Boussinesq scheme. The tsunami source is used as initial condition in the tsunami propagation and inundation model. The tsunami source model is calibrated by using available tide gage data and anomalous water elevations in the Indian Ocean during the tsunami event, recorded by JASON’s altimeter (pass 129, cycle 109). The simulated maximum wave heights for the Indian Ocean are displayed and compared with observations with a special focus on the Thailand coastline.
1 Introduction On December 26, 2004 at 0:58:53 GMT a 9.3 Magnitude earthquake occurred along 1300 km of the Sundra and Andaman trenches in the eastern Indian Ocean, approximately 100 km off the west coast of northern Sumatra.
60
J. Asavanant et al.
The main shock epicenter was located at 3.32◦ N and 95.85◦ E, 25-30 km deep. Over 200,000 people across the entire Indian Ocean basin were killed with tens of thousands reported missing as a result of this disastrous event. In accordance with modern practice, several international scientific team were organized to conduct quantitative survey of the tsunami characteristics and hazard analysis in the impacted coastal regions. Numerous detailed eyewitness observations were also reported in the form of video digital recordings. Information concerning the survey and some ship-based expeditions on tsunami source characteristics can be found in Grilli et al (2007), Moran et al (2005), and McNeill (2005), Kawata et al (2005), Satake et al (2005), Fritz and Synolakis (2005). In this paper a resonable tsunami source based on available geological, seismological and tsunami elevation and timing data is constructed using the standard half-plane solution for an elastic dislocation formula (Okada, 1985). Inputs to these formula are fault plane location, depth, strike, dip, slip, length, and width as well as seismic moment and rigidity. Okada’s solution is implemented in TOPICS (Tsunami Open and Progressive Initial Conditions System) which is a software tool that provides the vertical coseismic displacements as output. Tsunami propagation and inundation are simulated with FUNWAVE (Fully nonlinear Wave Model) based on the dispersive Boussinesq system. Comparisons of simulated surface elevations with tide gage data, satellite transect and runup observations show good agreement both in amplitudes and wave periods. This validates our tsunami source and propagation model of the December 26, 2004 event. Dispersive effects in the simulations are briefly discussed.
2 Source and Propagation Models The generation mechanism for the Indian Ocean tsunami is mainly due to the static sea floor uplift caused by abrupt slip at the India/Burma plate interface. Seismic inversion models (Ammon, 2005) indicate that the main shock propagated northward from the epicenter parallel to the Sumatra trenches for approximately 1,200 km of the fault length. In this study, the ruptured subduction zone is identified by five segments of tsunami source based on different morphologies (Figure 1). 2.1 Source Model The main generating force of a tsunami triggered by an earthquake is the uplift or subsidence of the sea-floor. Determining the actual extent of sea-floor change in a sub-sea earthquake is very difficult. In general, the displacement can be computed from the formulae which output surface deformation as a function of fault strike, dip, slip, length, width, depth, moment magnitude, and Lame’s constants for the surrounding rock (Okada, 1985). The underlying
Numerical Simulation of the December 26, 2004: Indian Ocean Tsunami
61
Figure 1. Earthquake tsunami source
assumptions are based on the isotropic property and half-plane homogeneity of a simple source configuration. Okada’s formulae are used in this study to compute ground displacement from fault parameters of each segment shown in Table 1. The total seismic moment release is Mo = 1.3 × 1023 J, equivalent to Mw = 9.3. Okada’s solution is implemented in TOPICS (Tsunami Open and Progressive Initial Conditions System) which are then tranferred and linearly superimposed into the wave propagation model (FUNWAVE) as an initial free surface condition. The five segments are distinguished by their unique shape and orientation. Segment 1 covers the Southern arc of the ruptured subduction zone with length L = 220 km. Segments 2 and 3 are relatively straight sections of the subduction zone in a NNW direction along the trench with the lengths of 150 and 390 kms respectively. The last two segments (4 and 5) have a marked change in orientation and shape. Segment 4 (L = 150 km) is facing southern Thailand whereas a significant number of larger islands are located on the overriding plate of segment 5 (L = 350 km).
62
J. Asavanant et al.
2.2 Wave Propagation Model Usually tsunamis are long waves (as compared with the ocean depth). Therefore, it is natural first to consider the long-wave (or shallow-water) approximation for the tsunami generation model. However, the shallow water equations ignore the frequency dispersion which can be important for the case of higher frequency wave propagation in relatively deep water. In this paper, a fully nonlinear and dispersive Boussinesq model (FUNWAVE) is used to simulate the tsunami propagation from deep water to the coast. FUNWAVE also includes physical parameterization of dissipation processes as well as an accurate moving inundation boundary algorithm. A review of the theory behind FUNWAVE are given by Kirby (2003). Table 1. Tsunami source parameters Parameters
Segment 1 Segment 2 Segment 3 Segment 4 Segment 5
xo (longitude) yo (latitude) d (km) ϕ (degrees) λ (degrees) δ (degrees) ∆ (m) L (km) W (km) to (s) µ (Pa) Mo (J) λo (km) To (min) ηo (m)
94.57 3.83 25 323 90 12 18 220 130 60 4.0 × 1010 1.85 × 1022 130 24.77 -3.27;+7.02
93.90 5.22 25 348 90 12 23 150 130 272 4.0 × 1010 1.58 × 1022 130 17.46 -3.84;+8.59
93.21 7.41 25 338 90 12 12 390 120 588 4.0 × 1010 2.05 × 1022 120 23.30 -2.33;+4.72
92.60 9.70 25 356 90 12 12 150 95 913 4.0 × 1010 0.61 × 1022 95 18.72 -2.08;+4.49
92.87 11.70 25 10 90 12 12 350 95 1273 4.0 × 1010 1.46 × 1022 95 18.72 -2.31;+4.6
3 Tsunami Simulations Simulations of the December 26, 2004 tsunami propagation in the Bay of Bengal (72◦ to 102◦ E in longitude and 13◦ S to 23.5◦ N in latitude) are performed by using GEOWAVE, which is a single integrated model combining TOPICS and FUNWAVE. The application of GEOWAVE on landslide tsunami is discussed in Watts et al (2003). We construct the numerical simulation grid by using ETOPO2 bathymetry and topography data together with denser and more accurate digitized bathymetry and topography data provided by Chulalongkorn University Tsunami Rehabilitation Research Center. These
Numerical Simulation of the December 26, 2004: Indian Ocean Tsunami
63
data were derived by a composite approach using 30 m NASA’s Space Shuttle Radar Topography Mission (SRTM) data for the land area with digitized navigational chart (Hydrographic Department of the Royal Thai Navy) and overlaid onto the 1:20,000 scale administrative boundary GIS (ESRI Thailand, Co Ltd). The projection’s rectification was verified and adjusted, whenever needed, using up to two ground control points per square kilometer. We regridded the data using linear interpolation to produce the uniform grid with 1.85 × 1.85 km, which approximately corresponds to a 1 minute grid spacing, yielding 1,793 by 2,191 points. The time step for each simulation is set at 1.2 sec. Two kinds of boundary conditions are used in FUNWAVE, i.e. total reflected wall and sponge layer on all ocean boundaries. In the simulations, the five segments of tsunami sources are triggered at appropriate times to according to the reduced speed of propagation of the rupture. Based on the shear wave speed prediced by seismic inversion models, the delay between each segments can be estimated and the values to are provided in Table 1.
4 Discussion of Results The maximum elevations above the sea level are plotted in Fig. 2 showing the tsunami’s radiation patterns. Details of regional areas of Banda Aceh and Thailand’s westcoast are shown in Figs. 3a, 3b. The estimate of sea surface elevation about two hours after the start of tsunami event is obtained along the satellite track No. 129 (Jason 1). The comparison between the model results with the satellite altimetry illustrated in Fig. 4 shows satisfactory agreement, except for a small spatial shift at some locations. This may be due to the noise in satellite data. During the event, sea surface elevations were measured at several tide gage stations and also recorded with a depth echo-sounder by the Belgian yacht “Mercator”. Table 2 lists the the tide gage and the Belgian yacht with their locations. Fig. 5 shows both measured and simulated time series in the Maldives (Hannimaadhoo, Male), Sri Lanka (Columbo), Taphao Noi (east coast of Thailand) and the yacht. Simulated elevation and arrival times at these locations agree well as compared to those of observations. As expected from seismological aspects, all of the tide gage data (both observed and modeled) show leading elevation wave on the western side (uplift) of the sources and depression waves on the eastern side (subsidence) of the source area. As shown in Fig. 3b, the largest runups are predicted near Banda Aceh (northern Sumatra) and in western coast of Thailand (Khao Lak area). The largest runup measured on the west coast of Banda Aceh are underpredicted by 50% likely due to the lack of detailed coastal bathymetry and topography. However, better agreement on the extreme runup values can be found in the Khao Lak area, Thailand where more accurate coastal topography was specified in the model grid.
64
J. Asavanant et al.
Figure 2. Maximum elevations in Bay of Bengal
(a)
(b)
Figure 3. (a) Maximum elevations along Banda Aceh and (b) Maximum elevations along the westcoast of Thailand
Numerical Simulation of the December 26, 2004: Indian Ocean Tsunami
65
Figure 4. Comparison of tsunami measured with satellite altimetry by Jason1 and results of tsunami simulation
(a)
(b) Figure 5. Comparison of numerical tide gage data for (a) Hanimaadhoo, (b) Male, (c) Colombo, (d) Taphao Noi, and (e) mercator yatch
66
J. Asavanant et al.
(c)
(d)
(e) Figure 5. (continued)
Numerical Simulation of the December 26, 2004: Indian Ocean Tsunami
67
Table 2. Tide gage locations Locations
Coordinates
Hannimaadhoo, Maldives Male, Maldives Columbo, Sri Lanka Taphao Noi, Thailand Mercator (Phuket), Thailand
(6.767, (4.233, (7.000, (7.833, (7.733,
73.167) 73.540) 79.835) 98.417) 98.283)
5 Final Remarks For several decades, models based on either linear or nonlinear versions of the shallow water theory are most generally used within the tsunami problem. This shallow water model basically neglects the effects of dispersion. Here we propose the use of dispersive (yet fully nonlinear) Boussinesq equations which is also another long wave propagation model. The earthquake tsunami sources consisting of five segments for vertical coseismic displacement are simulated in TOPICS based on parameters provided in Table 1. We simulate tsunami propagation and inundation with FUNWAVE which is a public domain higher order Boussinesq model developed over the last ten years at the University of Delaware (Wei and Kirby, 1995). To estimate dispersive effects, FUNWAVE can also be set to perform the simulations under the Nonlinear Shallow Water Equations (NSWE) for the same set of parameters. Grilli, et al (2007) reported, in the regions of deeper water in WSW direction of main tsunami propagation west of the source, that dispersion can reduce the wave amplitude by up to 25% compared to the nondispersive shallow water equation model. These differences occur very locally which may be associated with local topographic features and decrease significantly after the tsunami has reached the shallower continental shelf. Furthermore the eastward propagation towards Thailand exhibits a very weak dependence on frequency dispersion. A more detailed discussion on these dispersive effects can be found in Ioualalen, et al (2007).
6 Acknowledgments The authors would like to gratefully acknowledge the National Electronics and Computer Technology Center (NECTEC) under the Ministry of Science and Technology, Thailand for the use of their ITANIUM computer cluster and Dr. A. Snidvongs, Head of Chulalongkorn University Tsunami Rehabilitation Research Center, for providing the digitized inland topography and sea bottom bathymetry along the westcoast of Thailand. M. Merrifield from UHSLC for kindly providing us with the tide gage records. S. Grilli, J. Kirby, and F. Shi acknowledge continuing support from the Office of Naval Research, Coastal Geosciences Program.
68
J. Asavanant et al.
References 1. Ammon, C. J. et al (2005). Rupture process of the 2004 Sumatra-Andaman earthquake. Science, 308, 1133-1139. 2. Fritz, H. M. and C. E. Synolakis (2005). Field survey of the Indian Ocean tsunami in the Maldives. Proc 5th Intl on Ocean Wave Meas and Analysis (WAVES 2005, Madrid, Spain, July 2005), ASCE. 3. Grilli, S. T., Ioualalen, M., Asavanant, J., Shi, F., Kirby, J., and Watts, P. (2007). Source constraints and model simulation of the December 26, 2004 Indian Ocean Tsunami. J Waterway Port Coast and Ocean Engng, 133(6), 414–428. 4. Ioualalen, M., Asavanant, J., Kaewbanjak, N., Grilli, S. T., Kirby, J. T., and Watts, P. (2007) Modeling the 26th December 2004 Indian Ocean tsunami: Case study of impact in Thailand. J Geophys Res, 112, C07025, doi:10.1029/2006JC003850. 5. Kawata, T. et al (2005). Comprehensive analysis of the damage and its impact on coastal zones by the 2004 Indian Ocean tsunami disaster. Disaster Prevention Research Institute http://www.tsunami.civil.tohoku.ac.jp/sumatra2004/ report.html 6. Kirby, J. T. (2003). Boussinesq models and applications to nearshore wave propagation, surf zone processes and wave-induced currents. Advances in Coastal Modeling, V. C. Lakhan (ed), Elsevier Oceanography Series 67, 1-41. 7. McNeill, L., Henstock, T. and Tappin, D. (2005). Evidence for seafloor deformation during great subduction zone earthquakes of the Sumatran subduction zone: Results from the first seafloor survey onboard the HMS Scott, 2005, EOS Trans. AGU, 86(52), Fall Meet. Suppl., Abstract U14A-02. 8. Moran, K., Grilli, S. T. and Tappin, D. (2005). An overview of SEATOS: Sumatra earthquake and tsunami offshore survey. EOS Trans. AGU, 86(52), Fall Meet. Suppl., Abstract U14A-05. 9. Okada, Y. (1985). Surface deformation due to shear and tensile faults in a halfspace. Bull. Seis. Soc. Am., 75(4), 1135-1154. 10. Satake, K. et al (2005). Report on post tsunami survey along the Myanmar coast for the December 2004 Sumatra-Andaman earthquake http: //unit.aist.go.jp/actfault/english/topics/Myanmar/index.html 11. Watts, P., Grilli, S. T., Kirby, J. T., Fryer, G. J., and Tappin, D. (2003). Landslide tsunami case studies using a Boussinesq model and a fully nonlinear tsunami generation model. Nat. Hazards and Earth Sci. Systems, 3(5), 391-402. 12. Wei, G. and Kirby, J. T. (1995). Time-dependent numerical code for extended Boussinesq equations. J. Waterway Port Coast and Ocean Engng, 121(5), 251-261.
Approximate Dynamic Programming for Generation of Robustly Stable Feedback Controllers Jakob Bj¨ ornberg1 and Moritz Diehl2 1 2
Center of Mathematical Sciences, University of Cambridge, Wilberforce Road, CB3 OWB Cambridge, United Kingdom Optimization in Engineering Center (OPTEC)/Electrical Engineering Department (ESAT), K.U.Leuven, Kasteelpark Arenberg 10, 3001 Leuven-Heverlee, Belgium
[email protected]
Abstract In this paper, we present a technique for approximate robust dynamic programming that allows to generate feedback controllers with guaranteed stability, even for worst case disturbances. Our approach is closely related to robust variants of the Model Predictive Control (MPC), and is suitable for linearly constrained polytopic systems with piecewise affine cost functions. The approximation method uses polyhedral representations of the cost-to-go function and feasible set, and can considerably reduce the computational burden compared to recently proposed methods for exact dynamic programming for robust MPC [1, 8]. In this paper, we derive novel conditions for guaranteeing closed loop stability that are based on the concept of a “uroborus”. We finish by applying the method to a state constrained tutorial example, a parking car with uncertain mass.
1 Introduction The optimization based feedback control technique of Model Predictive Control (MPC) has attracted much attention in the last two decades and is nowadays widespread in industry, with many thousand large scale applications reported, in particular in the process industries [19]. Its idea is, simply speaking, to use a model of a real plant to predict and optimize its future behaviour on a so called prediction horizon, in order to obtain an optimal plan of future control actions. Of this plan, only the first step is realized at the real plant for one sampling time, and afterwards the real system state – which might be different than predicted – is observed again, and a new prediction and optimization is performed to generate the next sampling time’s feedback control. So far, in nearly all MPC applications, a deterministic – or
70
J. Bj¨ ornberg and M. Diehl
nominal – model is used for prediction and optimization. Though the reason for repeated online optimization and feedback is exactly the non-deterministic nature of the process, this “nominal MPC” approach is nevertheless useful in practice due to its inherent robustness properties [9, 10]. In contrast to this, robust MPC, originally proposed by Witsenhausen [21], is directly based on a worst-case optimization of future system behaviour. While a key assumption in nominal MPC is that the system is deterministic and known, in robust MPC the system is not assumed to be known exactly, and the optimization is performed against the worst-case predicted system behaviour. Robust MPC thus typically leads to min-max optimization problems, which either arise from an open loop, or from a closed loop formulation of the optimal control problem [14]. In this paper, we are concerned with the less conservative, but computationally more demanding closed loop formulation. We regard discrete-time polytopic systems with piecewise affine cost and linear constraints only. For this problem class, the closed loop formulation of robust MPC leads to multi-stage min-max optimization problems that can be attacked by a scenario tree formulation [12, 20] or by robust dynamic programming (DP) approaches [1, 8, 17] (an interesting other approach to robust MPC is based on “tubes”, see e.g. [13]). Note that the scenario tree formulation treats a single optimization problem for one initial state only, whereas DP produces the feedback solution for all possible initial states. Unfortunately, the computational burden of both approaches quickly becomes prohibitive even for small scale systems as the size of the prediction horizon increases. A recently developed approximation technique for robust DP [4] considerably reduces the computational burden compared to the exact method. This approximation is at the expense of optimality, but still allows to generate robustly stable feedback laws that respect control and state constraints under all circumstances. The first aim of the article is to review this technique in a detailed, self contained fashion. In addition, in this paper, we considerably weaken the previous requirements to obtain robust stability, by a novel approach based on the concept of a “uroborus”. We first show, in Section 2, how the robust DP recursion can be compactly formulated entirely in terms of operations on sets. For the problem class we consider, these sets are polyhedra and can be explicitly computed [1, 8], as reviewed in Section 3. In Section 4 we generalize an approximation technique originally proposed in [15] (for deterministic DP). This allows us to approximate the result of the robust dynamic programming recursion with considerably fewer facets than in the exact approach. In Section 5 we give conditions that guarantee robust closed loop stability of the generated feedback law, entirely in terms of the polyhedral set representation, and introduce the idea of a “uroborus”. Finally, in Section 6 we illustrate with an example how the approximation approach can be used to synthesize a robustly stable feedback, and we conclude the paper in Section 7. Implementations of all algorithms presented in this paper are publicly available [6].
Approximate Dynamic Programming
71
2 Robust Dynamic Programming with Constraints We consider discrete-time dynamic systems xk+1 = fk (xk , uk ),
fk ∈ F,
k∈N
(1)
with states xk ∈ Rnx and controls uk ∈ Rnu . The transition functions fk in each step are uncertain, but we assume they are known to be in a certain set F. Given an N -stage policy π = (u0 (·), u1 (·), . . . , uN −1 (·)), and given an uncertainty realization φ = (f0 , f1 , . . . , fN −1 ) ∈ FN , as well as an initial 0 value x0 , we define the corresponding state sequence xπ,φ,x , k = 0, . . . , N , by k π,φ,x0 π,φ,x0 π,φ,x0 π,φ,x0 x0 = x0 and xk+1 = fk (xk , uk (xk )). To each admissible event (xk , uk ), i.e. state xk and control uk at time k, a cost Lk (xk , uk ) is associated, which is additive over time. The objective in robust dynamic programming is to find a feedback policy π ∗ that minimizes the worst case total cost, i.e. solves the min-max optimization problem min max π
φ∈ FN
N −1
0 0 0 Lk (xπ,φ,x , uk (xπ,φ,x )) + VN (xπ,φ,x ) N k k
(2)
k=0 0 0 , uk (xπ,φ,x )) ∈ Lk , (xπ,φ,x k k
subject to:
0 xπ,φ,x ∈ XN , N
∀φ ∈ FN , k ∈ {0, . . . , N −1}. The sets Lk and XN specify constraints on states and controls, and VN (·) is a final cost. Starting with VN and XN , we compute the optimal cost-to-go functions Vk and feasible sets Xk recursively, for k = N − 1, . . . , 1, by the robust Bellman equation with constraints (cf. [17]): V˜k (x, u) Vk (x) := min n u∈R
u
s.t.
˜ k, (x, u) ∈ X
and
˜ k }, Xk := {x | ∃u : (x, u) ∈ X
(3) (4)
˜ k := {(x, u) ∈ Lk | where V˜k (x, u) := Lk (x, u)+ maxf ∈F Vk+1 (f (x, u)) and X f (x, u) ∈ Xk+1 ∀f ∈ F }. The optimal feedback control uk (x) from the optimal policy π ∗ can be determined as the minimizer of (3). 2.1 Formulation in Terms of Epigraphs Given a set W and a function g : W → R, define the epigraph of g by epi(g) := {(w, s) ∈ W × R | w ∈ W, s ≥ g(w)}.
(5)
Given two subsets A and B of W × R, define their cut-sum A B by A B := {(x, s + t) | (x, s) ∈ A, (x, t) ∈ B}.
(6)
72
J. Bj¨ ornberg and M. Diehl
If W ⊆ X × U and f is any function W → X, define an “epigraph function” fE : W × R → X × R by fE (x, u, s) = (f (x, u), s). We think of X as a state space and U as a space of controls. If we have a dynamic programming recursion with stage constraints defined by the sets Lk and stage costs Lk : Lk → R, let ek = epi(Lk ). Suppose the final cost and constraint set are VN and XN , and let EN = epi(VN ). Similarly define Ek = epi(Vk ) for any 0 ≤ k ≤ N − 1. Here we regard Vk as a function defined on Xk only. We now define the operation Tk on the set Ek+1 as follows: ⎛ ⎞ Tk (Ek+1 ) := p ⎝ek fE−1 (Ek+1 )⎠ , (7) fE ∈ FE
˜ := {(x, s) | ∃u : where p : X × U × R → X × R denotes projection, p(E) ˜ (x, u, s) ∈ E}, and FE := {fE | f ∈ F}. Proposition 1. Ek = Tk (Ek+1 ) for k = 0, . . . , N − 1. Proof. By definition of Ek = epi(Vk ) and by (3) ˜ k , s ≥ V˜k (x, u)}. Ek = {(x, s) | ∃u : (x, u) ∈ X
(8)
−1 We have that (x, u, s) ∈ fE ∈FE fE (Ek+1 ) iff f (x, u) ∈ Xk+1 for all ˜ k := ek f , and s ≥ maxf ∈F Vk+1 (f (x, u)). Furthermore, (x, u, s) ∈ E −1 f (E ) iff, in addition, (x, u) ∈ L and s ≥ Lk (x, u) + k+1 k E fE ∈FE maxf ∈F Vk+1 (f (x, u)), in other words iff (x, u, s) is in the epigraph epi(V˜k ) ˜ k . Therefore E ˜ k = epi(V˜k ). Hence Tk (Ek+1 ) = p(E ˜k) = of V˜k restricted to X p(epi(V˜k )) which is the same as the expression (8) for Ek .
In view of Proposition 1, we call Tk the robust dynamic programming operator. We will also denote this by T , suppressing the subscript, whenever the time-dependency is unimportant. Using Proposition 1 we easily deduce the following monotonicity property, motivated by a similar property in dynamic programming [3]. Proposition 2. If E ⊆ E then T (E ) ⊆ T (E). Proof. Referring to (7), we see that for any fE ∈ FE we have fE−1 (E ) ⊆ fE−1 (E). Thus, letting P = fE−1 (E ) and P = fE−1 (E), we have P ⊆ P . Now suppose (x, s) ∈ p(e P ), i.e. there exists a u such that (x, u, s) ∈ e P . Thus there exist s1 , s2 with s = s1 + s2 and (x, u, s1 ) ∈ e and (x, u, s2 ) ∈ P . But then (x, u, s2 ) ∈ P , so (x, u, s) = (x, u, s1 + s2 ) ∈ e P . Thus (x, s) ∈ p(e P ). Hence, T (E ) = p(e P ) ⊆ p(e P ) = T (E).
(9)
Approximate Dynamic Programming
73
3 Polyhedral Dynamic Programming From now on we consider only affine systems with polytopic uncertainty, of the form f (x, u) = Ax + Bu + c. (10) Here the matrices A and B and the vector c are contained in a polytope F = conv{(A1 |B1 |c1 ), . . . , (Anf |Bnf |cnf )},
(11)
and we identify each matrix (A|B|c) ∈ F with the corresponding function f . Polytopic uncertainty may arise naturally in application problems (see the example in this paper) or may be used to approximate nonlinear systems. We consider convex piecewise affine (CPWA) cost functions " ! ˘ , VN (x) = max PN [ x1 ], (12) L(x, u) = max P˘ [ x1 ] + Qu ˘ and where the maximum is taken over the components of a vector, and P˘ , Q, PN are matrices of appropriate dimensions. For simplicity of notation, we assume here and in the following that the stage costs Lk and feasible sets Lk in (2) do not depend on the stage index k. We treat linear constraints that result in polyhedral feasible sets $ # $ # ˇ ≤ 0 , XN = x PˆN [ x1 ] ≤ 0 . L = (x, u) Pˇ [ x1 ] + Qu Definition 1. By polyhedral dynamic programming we denote robust dynamic programming (3)–(4) for affine systems f with polytopic uncertainty (11), CPWA cost functions L and VN , and polyhedral constraint sets L and XN . The point is that for polytopic F, CPWA L, VN , and polyhedral sets L, XN , all cost-to-go functions Vk are also CPWA, and the feasible sets Xk are polyhedral. This is proved in [1] and [8], the latter of which uses epigraphs. We give an alternative formulation here, using the ideas and notation from Section 2.1. Theorem 1. For T a polyhedral dynamic programming operator and E a polyhedron, also T (E) as defined in (7) is a polyhedron. Proof. We begin by proving that fE ∈ FE fE−1 (E) is a polyhedron. FE is given as the convex hull of the matrices D1 , . . . , Dnf , with Di = A0i B0i 01 c0i . Assume that E consists of all points y satisfying Qy ≤ q. Thus (x, u, s) ∈ fE−1 (E) iff QD · (x, u, s, 1)T ≤ q (where fE is represented by D ∈ FE ). It follows that −1 fE ∈ FE fE (E)
=
%
ti ≥0, nf i=1 ti =1
nf ( nf & '
x
x ) &x' x u u u QDi u ti QDi s ≤ q = ≤ q s s s 1 1 i=1
i=1
which as the finite intersection of polyhedra is again a polyhedron. In addition, the following trivial lemma holds [5].
74
J. Bj¨ ornberg and M. Diehl
Lemma 1. The cut-sum of two polyhedra is a polyhedron. ˜=e Therefore, also the cut-sum E
fE ∈ F E
fE−1 (E) is a polyhedron. Finally,
˜ is a polyhedron, also p(E) ˜ is a polyhedron. Corollary 1. If this E This corollary, which is a direct consequence of the constructive Theorem 2 below, completes the proof of Theorem 1. ˜ is a nonempty polyhedron represented as Theorem 2 ([8]). Assume E ⎧⎡ ⎤ * + ⎡x⎤ , - ⎫ ⎨ x P˜ Q ˜ 1 ⎬ ˜ = ⎣u⎦ ⎣1⎦ ≤ E s , (13) ¯ 0 ⎭ ⎩ P¯ Q s u which is bounded below in the s-dimension. Then ) x 1 ˜ = {(x, s) | ∃u : (x, u, s) ∈ E} ˜ = x P ≤ s p(E) 0 s Pˆ 1 with
˜ D ¯ P = D
P˜ P¯
and
¯ P¯ . Pˆ = R
˜ D) ¯ are the vertices of the polyhedron The row vectors of (D ) ˜ λ ˜ ≥ 0, λ ¯ ≥ 0, Q ˜ Q ¯ = 0, 1T λ ˜ =1 ˜ T λ+ ¯T λ λ Λ= ¯ λ
(14)
(15)
(16)
¯ span the extreme rays of Λ. and the row vectors of (0 R) ˜ iff the linear program Proof. We see that (x, s) ∈ p(E) ˜ min s s.t. (x, u, s) ∈ E
(u,s )
(17)
has a finite solution y and s ≥ y. The dual of (17) is max λT px , λ∈Λ
where px =
P˜k P¯k
x . 1
(18)
(19)
By assumption (17) is solvable, so the dual (18) is feasible and the set Λ therefore nonempty. Denoting the vertices and extreme ray vectors of Λ by di and rj respectively, we may write Λ = conv{d1 , . . . , dnd } + nonneg{r1 , . . . , rnr }. There are two cases, depending on x:
(20)
Approximate Dynamic Programming
75
1. If the primal (17) is feasible, then by strong duality the primal (17) and the dual (18) have equal and finite solutions. In particular maxλ∈Λ λT px < ∞. This implies that there is no ray vector r of Λ with rT px > 0. Therefore, the maximum of the dual (18) is attained in a vertex di so that its optimal value is maxi=1,...,nd dTi px . 2. If the primal (17) is infeasible then the dual must be unbounded above, since it is feasible. So there is a ray r ∈ nonneg{r1 , . . . , rnr } with rT px > 0, and hence there must also be an extreme ray rj with rjT px > 0. ˜ iff s ≥ dT px for all vertices di and rT px ≤ 0 This shows that (x, s) ∈ p(E) i j for all extreme rays rj . The proof is completed by collecting all vertices dTi = ¯ k ), the extreme rays rT = (˜ ˜ kD (d˜Ti , d¯Ti ) as rows in the matrix (D rjT , r¯jT ) as rows j ¯ k ), and showing that r˜j = 0 for all j. For any λ ∈ Λ we ˜k R in the matrix (R ˜ r˜j ) = 1. Thus 1T r˜j = 1−1T λ ˜ = 1−1 = 0. have that λ+rj ∈ Λ, so that 1T (λ+ As r˜j ≥ 0 we see that r˜j = 0. The possibility to represent the result of the robust DP recursion algebraically can be used to obtain an algorithm for polyhedral dynamic programming. Such an algorithm was first presented by Bemporad et al. [1], where the explicit solution of multi-parametric programming problems is used. The representation via convex epigraphs and the duality based construction at the base of Theorem 2 above was first presented in [8]. Polyhedral dynamic programming is exact, and does not require any tabulation of the state space. Thus it avoids Bellman’s “curse of dimensionality”. However, the numbers of facets required to represent the epigraphs Ek will in general grow exponentially. Even for simple examples, the computational burden quickly becomes prohibitive as the horizon N grows.
4 Approximate Robust Dynamic Programming

We now review an approximation technique for polyhedral dynamic programming that was first presented in [4], and which allows us to considerably reduce the computational burden compared to the exact method. When used properly, the method is able to preserve robust stability properties in MPC applications. This approximation technique also is the motivation for the new stability proofs centered around the "uroborus" presented in Section 5. We will first show how to generate a polyhedron that is "in between" two polyhedra V ⊆ W, where V and W can be thought of as two alternative epigraphs generated during one dynamic programming step. For this aim we suppose
\[
V = \Big\{ x \;\Big|\; v_i^T \tbinom{x}{1} \le 0,\ i \in I_V \Big\}, \qquad
W = \Big\{ x \;\Big|\; w_j^T \tbinom{x}{1} \le 0,\ j \in I_W \Big\}.
\]
We describe how to generate another polyhedron
\[
A = \Big\{ x \in \mathbb R^n \;\Big|\; v_i^T \tbinom{x}{1} \le 0,\ \forall i \in I_A \subseteq I_V \Big\},
\]
which satisfies V ⊆ A ⊆ W, and which uses some of the inequalities from V. In general A will be represented by fewer inequalities than V, and shall later be used for the next dynamic programming recursion step. It is exactly this data reduction that makes the approximation method so much more powerful than the exact method. We show how to generate the index set I_A iteratively, using intermediate sets I_A^{(k)}. The method is a generalization of an idea from [15] to polyhedral sets. At each step, let A^{(k)} be given by
\[
A^{(k)} = \Big\{ x \in \mathbb R^n \;\Big|\; v_i^T \tbinom{x}{1} \le 0\ \ \forall i \in I_A^{(k)} \Big\},
\]
where the index set I_A^{(k)} is a subset of I_V, the index set of the inner polyhedron V.

Procedure – Polyhedron Pruning
1. Let k = 0 and I_A^{(0)} := ∅ (i.e. A^{(0)} = R^n).
2. Pick some j ∈ I_W.
   a) If there exists an x* ∈ A^{(k)} such that w_j^T [x*; 1] > 0, then let i* = arg max_{i∈I_V} v_i^T [x*; 1]. Set I_A^{(k+1)} := I_A^{(k)} ∪ {i*}, and remove i* from I_V.
   b) If no such x* exists, set I_A^{(k+1)} := I_A^{(k)} and remove j from I_W.
3. If I_W = ∅ then let I_A := I_A^{(k)} and end. Otherwise set k := k + 1 and go to 2.

The idea of polyhedron pruning is illustrated in Figure 1. In step (2) of the algorithm, for given j ∈ I_W we have to find an x* ∈ A^{(k)} such that w_j^T [x*; 1] > 0, or make sure that no such x* exists. We address this task using the following linear program:
\[
\max_x\ w_j^T \tbinom{x}{1} \quad \text{s.t.}\quad w_j^T \tbinom{x}{1} \le \eta,\quad v_i^T \tbinom{x}{1} \le 0\ \ \forall i \in I_A^{(k)}, \tag{21}
\]
for some η > 0 (in our implementation we used η = 0.05). If the optimal value of problem (21) is negative or zero, we know that no such x* exists. Otherwise we take its optimizer to be x*. In our algorithm, we also perform a special scaling of the inequalities, which leads to the following assumption:

Assumption 1. Each vector v_i = (v̄_i, ξ_i) ∈ R^n × R defining V is normed so that ‖v̄_i‖_2 = 1.

We require Assumption 1 for the following reason. The quantity
\[
v_i^T \tbinom{x^*}{1} \tag{22}
\]
in (2) of the pruning procedure measures the vertical "height" of the hyperplane described by v_i. If Assumption 1 holds, then by maximizing (22) we find the piece of V that is furthest away from x*, because then the height
Figure 1. Polyhedron pruning. Here x* is in A^{(k)} but violates a constraint defining W, so we form A^{(k+1)} by including another constraint from V. The inequality represented by v_{i*} is chosen rather than the one represented by v_i, as the distance from v_{i*} to x* is greater (note that inequalities are hatched on the infeasible side)
above x* and the distance from x* coincide. Note that we employ a different selection rule from [15]: instead of adding the constraint v_{i*} that is furthest away from x* in a predetermined direction, we choose it so that it maximizes the perpendicular distance to x*. Summarizing, we use the described pruning method to perform approximate polyhedral dynamic programming, by executing the steps of the following algorithm.

Procedure – Approximate Polyhedral DP
1. Given the epigraph E_k, the stage cost L and feasible set L, let E_out = T(E_k) as in Section 2.1.
2. Choose some polyhedron E_in that satisfies E_in ⊆ E_out.
3. Let E_{k−1} be the result (A) of applying the polyhedral pruning procedure with V = E_in and W = E_out.

4.1 How to choose E_in?
There are many ways of choosing the set E_in, and the choice is largely heuristic. Recall that E_out = T(E_k) is the epigraph of the cost-to-go V_{k−1} restricted to the feasible set X_{k−1}. To get a polyhedron contained in E_out you could raise or steepen the cost-to-go V_{k−1}, or you could diminish the size of the feasible set X_{k−1}, or both. We are mainly interested in the case where x = 0
is always feasible and the minimum value of the cost-to-go V_k is attained at zero for all k. Then you can let
\[
E_{\mathrm{in}} := \{ (x,s) \in \mathbb R^n \times \mathbb R \mid (\alpha x, s) \in E_{\mathrm{out}} \} \tag{23}
\]
with some α > 1. This is illustrated in Figure 2. It is difficult to assess the loss of optimality due to this approximation, as not only the cost function is made steeper, but also the feasible set is reduced. However, with the above definition, E_in is a factor 1/α smaller than E_out. For example, if we choose α = 1.1 in the tutorial example below, we lose at most roughly 10% of the volume in each iteration.
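The two building blocks just described are simple to prototype. The sketch below, which assumes that polyhedra are stored as arrays of inequality rows and uses scipy's LP solver, shows one test of step 2 of the pruning procedure via LP (21) and the construction (23) of E_in by scaling the x-coefficients; it is an illustrative reimplementation, not the code used for the experiments in Section 6.

```python
import numpy as np
from scipy.optimize import linprog

def pruning_test(w_j, V_rows, eta=0.05):
    """One test in step 2 of the pruning procedure: search for x* in A^(k),
    given by rows v_i = (v_bar_i, xi_i) of V_rows (meaning v_bar_i.x + xi_i <= 0),
    with w_j^T [x*; 1] > 0, by solving LP (21). Returns x* or None."""
    w_bar, omega = w_j[:-1], w_j[-1]
    A_ub = np.vstack([w_bar] + [v[:-1] for v in V_rows])          # eta cap and A^(k)
    b_ub = np.concatenate([[eta - omega], [-v[-1] for v in V_rows]])
    res = linprog(-w_bar, A_ub=A_ub, b_ub=b_ub,
                  bounds=(None, None), method="highs")
    if res.status != 0:
        return None                                               # treated as "no x*"
    return res.x if w_bar @ res.x + omega > 0 else None

def scale_inner(E_out_rows, alpha=1.1):
    """E_in from (23): rows (a_i, b_i, c_i) encode a_i.x + b_i*s + c_i <= 0;
    (x, s) is in E_in iff (alpha*x, s) is in E_out, so only the x-part is scaled."""
    E_in = np.array(E_out_rows, dtype=float)
    E_in[:, :-2] *= alpha
    return E_in
```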
Figure 2. Pruning of epigraphs (sketch)
4.2 The Approximated Robust MPC Feedback Law
For a given polyhedral epigraph E we can define an approximate robustly optimal feedback law u(·) by letting u(x) be the solution of a linear programming problem (LP),
\[
u(x) := \arg\min_{u,s}\ s \quad \text{s.t.}\quad (x,u,s) \in \tilde E, \tag{24}
\]
with Ẽ formed with ⋂_{f_E∈F_E} f_E^{-1}(E) as in Theorem 1. The computational burden associated with this LP in only n_u + 1 variables is negligible and can easily be performed online. Alternatively, a precomputation via multiparametric linear programming is possible [1].
5 Stable Epigraphs and the Uroborus The question of stability is a major concern in MPC applications and has been addressed e.g. in [2, 12, 18]. In the notation of this paper, the question is if the uncertain closed loop system
\[
x_{k+1} = f_k(x_k, u(x_k)), \qquad f_k \in F, \tag{25}
\]
where u(x_k) is determined as solution of (24), is stable under all possible realizations of f_k ∈ F. In the case of uncertain systems, it might not be possible to be attracted by a fixed point, but instead by an attractor, a set that we denote T. In this case, the cost function L would penalize all states that are outside T in order to drive the system towards T. The following theorem formulates easily verifiable conditions that guarantee that a given set T is robustly asymptotically attractive for the closed-loop system, in the sense defined e.g. in [12].

Theorem 3 (Attractive Set for Robust MPC [4]). Consider the closed loop system (25), and assume that
1. there is a non-empty set T ⊂ R^n and an ε > 0 such that L(x, u) ≥ ε · d(x, T) for all (x, u) ∈ L, where d(x, T) := inf_{y∈T} ‖x − y‖ is the distance of x from T,
2. E ⊆ T(E),
3. there exists an s_0 ∈ R with (x_0, s_0) ∈ T(E), such that A := {(x, s) ∈ T(E) | s ≤ s_0} is compact.
Then the closed loop is robustly asymptotically attracted by the set T, i.e.,
\[
\lim_{k\to\infty} d(x_k, T) = 0, \quad \text{for all system realizations } (f_k)_{k\in\mathbb N} \in F^{\mathbb N}.
\]
The simple proof uses V (x) := minx,s ss.t. [ xs ] ∈ T (E) as a Lyapunov function and is omitted for the sake of brevity. While conditions 1 and 3 are technical conditions that can easily be met by suitably setting up the MPC cost function and constraints, the crucial assumption on the epigraph E, Assumption 2 – which guarantees that V (x) is a robust Lyapunov function – is much more difficult to satisfy. Definition 2. We call a set E a stable epigraph iff E ⊆ T (E). Unfortunately, it is not straightforward to generate a stable epigraph in practice. One might think to perform approximate dynamic programming for a while and to check at each iteration if the most currently generated epigraph is stable. Unfortunately, there is no monotonicity in approximate dynamic programming, and this procedure need not yield a result at any iteration. Note that if we would start the exact dynamic programming procedure with a stable epigraph EN , each iterate T k (EN ) would also yield a stable epigraph. Theorem 4. If EN is stable, then E := T k (EN ) is stable for all k ≥ 0. Proof. Using the monotonicity of T , we obtain EN ⊆ T (EN ) ⇒ T k (EN ) ⊆ T k (T (EN )) ⇔ E ⊆ T (E).
The assumption that EN is stable (or an equivalent assumption) is nearly always made in existing stability proofs for MPC [7, 12, 16], but we point out that (i) it is very difficult to find such a positively invariant terminal epigraph EN in practice and (ii) the exact robust dynamic programming procedure is prohibitive for nontrivial horizon lengths N . These are the reasons why we avoid any assumption on EN at all and directly address stability of the epigraph E that is finally used to generate the feedback law (24). A similar approach is taken in [11]. But the crucial question of how to generate a stable epigraph remains open. 5.1 The Novel Concept of a Uroborus Fortunately, we can obtain a stable epigraph with considerably weaker assumptions than usually made. Definition 3. Let the collection E1 , E2 , . . . , EN satisfy Ek ⊆ T (Ek+1 ),
k = N − 1, . . . , 1
(26)
and EN ⊆ T (E1 )
(27)
Then we call E1 , E2 , . . . , EN a uroborus. The name “uroborus” is motivated from mythology, where it denotes a snake that eats its own tail, as visualized in Figure 3. A uroborus could be generated by approximate dynamic programming, as follows. Starting with some epigraph EN , we can perform the conservative version of approximate dynamic programming to generate sets EN −1 , . . . , E1 , E0 , which ensures Ek ⊆ T (Ek+1 ) by construction, i.e. the first N − 1 inclusions (26) are already satisfied. Now, if also the inclusion EN ⊆ E0 holds, i.e., if the head E0 “eats” the tail EN , then also (27) holds because of E0 ⊆ T (E1 ), and the generated sequence of sets EN −1 , . . . , E1 is a uroborus. But why might EN ⊆ E0 ever happen? It is reasonable to expect the sets Ek to grow on average with diminshing index k if the initial set EN was chosen sufficiently small (a detailed analysis of this hope, however, is involved). So, if N is large enough, we would therefore expect the condition EN ⊆ E0 to hold. i.e., the final set E0 to be large enough to include EN . It is important to note that our numbering of the epigraphs is contrary to their order of appearance during the dynamic programming backwards recursion, but that we stick to the numbering here, in order to avoid confusion. It should be kept in mind, however, that an uroborus need not necessarily be found in the whole collection EN , . . . , E0 generated during the conservative dynamic programming recursion by checking EN ⊆ T (E1 ), but that it suffices that we find Ek ⊆ T (Ek−N +1 ) for any two integers k and N . Note also that a uroborus with N = 1 is simply a stable epigraph.
Figure 3. The uroborus in mythology is a snake that eats its own tail (picture taken from http://www.uboeschenstein.ch/texte/marks-tarlow3.html)
Theorem 5. Let E_1, …, E_N be a uroborus for T, and let K := ⋃_{k=1}^N E_k be their union. Then K is a stable epigraph.

Proof. Using the monotonicity of T, we have from (26) the following inclusions: E_{N−1} ⊆ T(E_N) ⊆ T(K) down to E_1 ⊆ T(E_2) ⊆ T(K) and finally, from (27), E_N ⊆ T(E_1) ⊆ T(K). Hence K = ⋃_{k=1}^N E_k ⊆ T(K).

Definition 4. We call a robust dynamic programming operator T convex iff T(E) is convex whenever E is convex.

Corollary 2. If E_1, …, E_N is a uroborus and T is convex, then the convex hull E := conv{E_1, …, E_N} is also stable.

Proof. Again K ⊆ T(E), so E = conv(K) ⊆ T(E) by the convexity of T(E).
6 Stability of a Tutorial Example In order to illustrate the introduced stability certificate, we consider a tutorial example first presented in [8], with results that have partly been
presented in [4]. The task is to park a car with uncertain mass in front of a wall as fast as possible, without colliding with the wall. The state x = (p, v)^T consists of position p and velocity v, the control u is the acceleration force, constant on intervals of length t = 1. We define the following discrete time dynamics:
\[
x_{k+1} = \begin{pmatrix} 1 & t \\ 0 & 1 \end{pmatrix} x_k + \frac{1}{2m} \begin{pmatrix} t^2 \\ 2t \end{pmatrix} u_k. \tag{28}
\]
The mass m of the car is only known to satisfy 1 ≤ m ≤ 1.5, i.e. we have a polytopic system (10) with uncertain matrix B. We impose the constraints p ≤ 0 and |u| ≤ 1, and choose
\[
L(x,u) = \max(-p + v,\ -p - v), \qquad X_N = \Big\{ (p,v)^T \;\Big|\; p \le 0,\ v \le 0,\ -p - v \le 0.3 \Big\}, \qquad V_N(x) = 100 \cdot (-p - v).
\]
First we computed the cost-to-go functions and feasible sets using the exact polyhedral dynamic programming method described in [8]. Computations took almost 4 hours for N = 7. The number of facets defining E_0 was 467. Then we repeated the computation using the approximate method of Section 4, where we chose E_in according to (23) with α = 1.1. Computations took about 8 minutes for N = 49; the robustly feasible sets X_N, X_{N−1}, …, X_0 are plotted in Figure 4. The number of facets defining E_0 was 105.
Figure 4. Feasible sets in the (p, v) plane (position p, velocity v). Large: feasible sets for N = 49 using the approximate procedure with α = 1.1. The largest set is the feasible set corresponding to the result T(E_0) of exact dynamic programming applied to the final epigraph E_0 (which corresponds to the second largest set). Small: feasible sets for N = 7 using the exact method. The figures have the same scale
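The uncertain dynamics (28) are easy to reproduce numerically. The following sketch builds the two vertex systems of the polytopic uncertainty (m = 1 and m = 1.5) and rolls out the closed loop with a mass redrawn in every step, as in Figure 5; the feedback used here is only a placeholder for the min-max MPC law (24), which would require the epigraph E_0 computed above.

```python
import numpy as np

dt = 1.0
A = np.array([[1.0, dt], [0.0, 1.0]])

def B(m):
    """Input matrix of (28) for mass m."""
    return np.array([dt**2 / (2.0 * m), dt / m])

B_vertices = [B(1.0), B(1.5)]          # vertices of the polytopic uncertainty

def rollout(x0, feedback, steps=40, seed=0):
    """Simulate x_{k+1} = A x_k + B(m_k) u_k with m_k drawn uniformly
    from [1, 1.5] in every step. 'feedback' stands in for u(x) of (24)."""
    rng = np.random.default_rng(seed)
    x, traj = np.array(x0, dtype=float), []
    for _ in range(steps):
        traj.append(x.copy())
        u = float(np.clip(feedback(x), -1.0, 1.0))     # enforce |u| <= 1
        x = A @ x + B(rng.uniform(1.0, 1.5)) * u
    return np.array(traj)

# purely illustrative placeholder feedback, NOT the MPC law of the paper
traj = rollout(x0=[-1.8, 1.0], feedback=lambda x: -0.3 * x[0] - 0.8 * x[1])
```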
6.1 Stability by a Trivial Uroborus As can already be seen in Figure 4, the epigraph E0 is contained in its image under the exact dynamic programming operator, E0 ⊆ T (E0 ), i.e., E0 is stable, or, equivalently, E0 forms a uroborus with only one element. We checked this inclusion by finding the vertices of E0 and checking that they indeed satisfy the inequalities that define T (E0 ) (within a tolerance of 10−6 ). Therefore, we can simply set E := E0 in the definition (24) of the robust MPC feedback law u(·) to control the parking car. To satisfy all conditions of the stability guarantee, Theorem 3, we first set T = {0}. By construction, conditions 1 and 3 are met whenever x0 ∈ X0 . Together with stability of E, the approximate robust MPC leads the closed loop robustly towards the origin for all initial states x0 in the set X0 that is much larger than what could be obtained by the exact procedure in a reasonable computing time.
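The inclusion test mentioned above is a plain vertex check. A minimal sketch, assuming the vertices of E_0 and an inequality description A_T y ≤ b_T of T(E_0) are available as numpy arrays:

```python
import numpy as np

def epigraph_is_stable(vertices_E0, A_T, b_T, tol=1e-6):
    """Check E0 ⊆ T(E0) as described in the text: every vertex of E0 must
    satisfy the inequalities A_T y <= b_T defining T(E0), within tol."""
    V = np.asarray(vertices_E0, dtype=float)
    return bool(np.all(V @ np.asarray(A_T).T <= np.asarray(b_T) + tol))
```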
Figure 5. Asymptotic stability of the closed loop (shown in the (p, v) plane) resulting from the min-max MPC feedback u(·), for three trajectories with randomly generated masses in each time step
7 Conclusions We have reviewed a method for approximate robust dynamic programming that can be applied to polytopic systems with piecewise affine cost and linear constraints, and have given novel conditions to guarantee robust stability of
approximate MPC. The underlying dynamic programming technique uses a dual approach and represents the cost-to-go functions and feasible sets at each stage in one single polyhedron Ek . A generalization of the approximation technique proposed in [15] to polyhedral sets allows us to represent these polyhedra approximately. Based on the novel concept of a uroborus, we presented a way to generate a robust MPC controller with the approximate dynamic programming recursion that is guaranteed to be stable. The ideas are demonstrated in a tutorial example, a parking car with uncertain mass. It is shown that our heuristic approximation approach is indeed able to generate the positively invariant set required for this stability certificate. Comparing the results with the exact robust dynamic programming method used in [8], we were able to demonstrate a significant ease of the computational burden. The approximation algorithm, which is publicly available [6], is able to yield much larger positively invariant sets than the exact approach and thus considerably widens the range of applicability of robust MPC schemes with guaranteed stability. The novel concept of a uroborus makes generation of robustly stable MPC controllers much easier than before and promises to fertilize both, robust MPC theory and practice. Acknowledgments Financial support by the DFG under grant BO864/10-1, by the Research Council KUL: CoE EF/05/006 Optimization in Engineering Center (OPTEC), and by the Belgian Federal Science Policy Office: IUAP P6/04 (Dynamical systems, control and optimization, 2007-2011) is gratefully acknowledged. The authors also thank Sasa Rakovic for very fruitful discussions, in particular for pointing out the possibility to convexify the union of a uroborus, during an inspiring walk to Heidelberg castle.
References 1. A. Bemporad, F. Borrelli, and M. Morari. Min-max control of constrained uncertain discrete-time linear systems. IEEE Transactions on Automatic Control, vol. 48, no. 9, 1600-1606, 2003. 2. A. Bemporad and M. Morari. Robust model predictive control: A survey. In A. Garulli, A. Tesi, and A. Vicino, editors, Robustness in Identification and Control, number 245 in Lecture Notes in Control and Information Sciences, 207–226, Springer-Verlag, 1999. 3. D. P. Bertsekas. Dynamic Programming and Optimal Control, volume 1 and 2. Athena Scientific, Belmont, MA, 1995. 4. J. Bj¨ ornberg and M. Diehl. Approximate robust dynamic programming and robustly stable MPC. Automatica, 42(5):777–782, May 2006. 5. J. Bj¨ ornberg and M. Diehl. Approximate robust dynamic programming and robustly stable MPC. Technical Report 2004-41, SFB 359, University of Heidelberg, 2004.
6. J. Bj¨ ornberg and M. Diehl. The software package RDP for robust dynamic programming. http://www.iwr.uni-heidelberg.de/~Moritz.Diehl/RDP/, January 2005. 7. H. Chen and F. Allg¨ ower. A quasi-infinite horizon nonlinear model predictive control scheme with guaranteed stability. Automatica, 34(10):1205–1218, 1998. 8. M. Diehl and J. Bj¨ ornberg. Robust dynamic programming for min-max model predictive control of constrained uncertain systems. IEEE Transactions on Automatic Control, 49(12):2253–2257, December 2004. 9. R. Findeisen and F. Allg¨ ower. Robustness properties and output feedback of optimization based sampled-data open-loop feedback. In Proc. of the joint 45th IEEE Conf. Decision Contr., CDC’05/9th European Control Conference, ECC’05, 7756–7761, 2005. 10. R. Findeisen, L. Imsland, F. Allg¨ ower, and B.A. Foss. Towards a sampleddata theory for nonlinear model predictive control. In W. Kang, C. Borges, and M. Xiao, editors, New Trends in Nonlinear Dynamics and Control, volume 295 of Lecture Notes in Control and Information Sciences, 295–313, New York, 2003. Springer-Verlag. 11. L. Gr¨ une and A. Rantzer. On the infinite horizon performance of receding horizon controllers. Technical report, University of Bayreuth, February 2006. 12. E. C. Kerrigan and J. M. Maciejowski. Feedback min-max model predictive control using a single linear program: Robust stability and the explicit solution. International Journal on Robust and Nonlinear Control, 14:395–413, 2004. 13. W. Langson, S.V. Rakovic I. Chryssochoos, and D.Q. Mayne. Robust model predictive control using tubes. Automatica, 40(1):125–133, 2004. 14. J. H. Lee and Z. Yu. Worst-case formulations of model predictive control for systems with bounded parameters. Automatica, 33(5):763–781, 1997. 15. B. Lincoln and A. Rantzer. Suboptimal dynamic programming with error bounds. In Proceedings of the 41st Conference on Decision and Control, 2002. 16. D. Q. Mayne. Nonlinear model predictive control: Challenges and opportunities. In F. Allg¨ ower and A. Zheng, editors, Nonlinear Predictive Control, volume 26 of Progress in Systems Theory, 23–44, Basel Boston Berlin, 2000. Birkh¨ auser. 17. D. Q. Mayne. Control of constrained dynamic systems. European Journal of Control, 7:87–99, 2001. 18. D. Q. Mayne, J. B. Rawlings, C. V. Rao, and P. O. M. Scokaert. Constrained model predictive control: stability and optimality. Automatica, 26(6):789–814, 2000. 19. S. J. Qin and T. A. Badgwell. A survey of industrial model predictive control technology. Control Engineering Practice, 11:733–764, 2003. 20. P. O. M. Scokaert and D. Q. Mayne. Min-max feedback model predictive control for constrained linear systems. IEEE Transactions on Automatic Control, 43:1136–1142, 1998. 21. H. S. Witsenhausen. A minimax control problem for sampled linear systems. IEEE Transactions on Automatic Control, 13(1):5–21, 1968.
Integer Programming Approaches to Access and Backbone IP Network Planning∗
Andreas Bley and Thorsten Koch
Konrad-Zuse-Zentrum für Informationstechnik Berlin, Takustr. 7, 14195 Berlin, Germany
{bley,koch}@zib.de
Abstract In this article we study the problem of designing a nation-wide communication network. Such networks usually consist of an access layer, a backbone layer, and maybe several intermediate layers. The nodes of each layer must be connected to those of the next layer in a tree-like fashion. The backbone layer must satisfy survivability and IP-routing constraints. Given the node locations, the demands between them, the possible connections and hardware configurations, and various other technical and administrational constraints, the goal is to decide, which node is assigned to which network level, how the nodes are connected, what hardware must be installed, and how traffic is routed in the backbone. Mixed integer linear programming models and solution methods are presented for both the access and the backbone network design problem. The focus is on the design of IP-over-SDH networks, but the access network design model and large parts of the backbone network design models are general and also applicable for other types of communication networks. Results obtained with these methods in the planning of the German research network are presented.
1 Introduction The German gigabit research network G-WiN, operated by the DFN-Verein e.V., is the largest IP network in Germany. In this article we describe the mathematical models and tools used to plan the layout and dimensioning of the access and backbone network [BK00, BKW04]. Since these models are general in nature, they were used also in two other projects. One was the placement of switching centers in circuit switched networks, a cooperation
∗ This work was partially funded by the Bundesministerium für Bildung, Wissenschaft, Forschung und Technologie (BMBF).
with Telekom Austria. In the other project, studies about MSC planning in mobile phone networks were conducted together with E-Plus. Unfortunately, the data from the later projects cannot be published. For this reason, we focus throughout this article on the G-WiN IP network as an example, but show the generalized models that were developed. The problem we were faced with, roughly can be stated as follows: Given the node locations, the demands between them, the possible node layers, connections, and hardware configurations, and various other technical and administrational constraints, the goal is to decide, which node is assigned to which network layer, how the nodes are connected, what hardware must be installed, and how traffic is routed in the backbone network. In this article we present a two-phase approach that splits between access and backbone network planning. In a first step, we only consider the access network. Then, in the second phase, the backbone dimensioning and routing problem is addressed. Both problems are solved by integer linear programming techniques. The problems encountered in the access network planning can be viewed as capacitated two-level facility location problems. There is a huge amount of literature on various aspects of this kind of problems. See for example [ALLQ96, Hal96, MD01, MF90, PLPL00]. The backbone network planning problem is a capacitated survivable network design problem with some additional constraints to make sure that the routing can be realized in practice with the OSPF routing protocol, which is the dominating routing protocol in the Internet. The survivable network design problem in general is well studied in the literature, see for example [AGW97, BCGT98, GMS95, KM05]. Lagrangian approaches for minimizing the maximum link utilization and for finding minimum cost network designs with respect to OSPF routing have been proposed in [LW93] and [Ble03]. Various heuristic algorithms for OSPF network design and routing optimization problem can be found in [BRT04, BRRT05, BGW98, ERP01, FGL+ 00, FT00, FT04]. Polyhedral approaches to the network design problem with shortest path routing for unicast and multicast traffic were presented in [BGLM03, HY04] and [Pry02]. In [BAG03] the combinatorial properties of shortest path routings were studied and a linear programming approach to compute routing weights for a given shortest path configuration was proposed. The computational complexity of several problems related to (unsplittable) OSPF routing is discussed in [Ble07a] and [Ble05]. This article is organized as follows: In Section 2 we describe the problem setting in more detail. In Section 3 we present the mathematical model for the access network planning problem. The same is done for the backbone network in Section 4. In Section 5 we describe the algorithms used and in the last section we report on computational results.
Figure 1. Network layers: demand nodes U, intermediate nodes W, backbone nodes V
2 Problem Description The network design and routing problem studied in this article can be described as follows. Given are three node sets, a set U of demand nodes (locations), a set W of possible intermediate nodes, and a set V of possible backbone nodes (see Figure 1). The same location must belong to each set U , W , and V , if it may be assigned to each of the corresponding layers. Additional to the node sets U , W , and V , we have sets AU W and AW V with all potential connections between nodes from U and W , and W and V , respectively. For each pair of demand nodes u, v ∈ U the directed traffic demand is du,v ∈ IR+ . Various hardware or technology can be installed or rented at the nodes or potential edges of the network. This hardware has a modular structure and combined hardware components must have matching interfaces. For example, a router provides several slots, these can be equipped with cards providing link-interfaces, and, in order to set up a certain link-type on an edge, corresponding link-interfaces are needed at the two terminal nodes. The set of all available component types is denoted by C. We distinguish between node components CV , edge components CE , or global components CG , depending on whether they can be installed at nodes, on edges, or globally for the entire network. Components may be real, like router cards or leased lines, or artificial, like bundles of components that can be purchased only together. A component, if installed, provides or consumes several resources. The set of all considered resources is denoted by R. For each node component c ∈ CV , resource r ∈ R and node v ∈ V the number kvc,r ∈ IR denotes how much of resource r is provided (if > 0) or consumed (if < 0) by component c if installed at c,r specify the provision or consumption of for node v. Analogously, kec,r and kG edge and global components if installed. We distinguish between three classes of resources, node resources RV , edge resources RE , and global resources RG . For each edge resource r and each edge e, the total consumption must not exceed the total provision of resource r by the components installed on edge e and its incident vertices. Analogously, the consumption must not exceed the provision by the components installed at a node and its incident edges for node resources or by the components installed on all edges, nodes, and
globally for global resources. The component cost and the routing capacity are special resources. Due to the forest structure of the solutions, for the access network planning problem it is often possible to simplify the possible hardware combinations to a set of relatively few configurations and reassign node costs to edges in the preprocessing. Also the directed traffic demands can be aggregated. Hence, we can assume that for the access network planning we are given a set of assembly stages Sn , for each node n from W or V , and that κrs n describes the capacity of resource r that will be provided at node n if it is at stage s. Regarding the traffic demands as a special resource, each location u ∈ U has an undirected demand of δur for each resource r. For each v node from W and V the assembly stage has to be chosen in such a way, that there is enough capacity to route the demands of all resources for all demand nodes that are connected to this node v. The backbone nodes and edges have to be dimensioned in such a way, that the accumulated demands can be routed according to the OSPF specification. Since in our approach the access network planning problem is solved first, the set of backbone nodes is given and fixed for the backbone network planning problem. For notational convenience, it will again be denoted by V . The set of potential links between these nodes is denoted by E, the graph G = (V, E) is called supply graph. The traffic demands between the demand nodes U are aggregated to a set of demands between the backbone nodes V . We assume that between each pair of nodes v1 , v2 ∈ V there is a demand (maybe equal to zero) denoted by dv1 ,v2 . We wish to design a backbone network that is still operational if a single edge or node fails. Therefore, we introduce the set of operating states O ⊆ V ∪ E ∪ {∅}. We distinguish between the normal operating state o = ∅, which is the state with all nodes and edges operational, and failure states, which are the states with a single edge (o = e ∈ E) or a single node (o = v ∈ V ) non-operational. Note that in a node failure state o = v ∈ V all edges incident to v are non-operational, too, and the demands with origin or destination v cannot be satisfied and are therefore not considered in this operating state. Unfortunately, in the backbone network planning we cannot simplify the hardware combinations like in the access network planning. Since we always had significant costs associated with or several restrictions on the use of certain hardware components, it was necessary to explicitly consider all single hardware components. The capacities provided by the components installed on the edges must be large enough to allow a feasible routing with respect to the OSPF routing protocol in all operating states. Assuming non-negative routing weights for all arcs, the OSPF protocol implies that each demand is sent from its origin to its destination along a shortest path with respect to these weights. In each operating state only operational nodes and arcs are considered in the shortest path computation. In this article we address only static OSPF routing
where the weights do not depend on nor change with the traffic. Dynamic shortest path routing algorithms, which try to adapt to traffic changes, often cause oscillations that lead to significant performance degradation, especially if the network is heavily loaded (see [CW92]). Also, though most modern IP routers support OSPF extensions that allow to split traffic onto more than one forwarding arcs, in this article we consider only the standard nonbifurcated OSPF routing. This implies, that the routing weights must be chosen in such a way that, for all operating states and all demands, the shortest path from the demand’s origin to its destination is unique with respect to these weights. Otherwise, it is not determined which one of the shortest paths will be selected by the implementation of the routing protocol in the real network and, therefore, it would be impossible to guarantee that the chosen capacities permit a feasible routing of the demands. The variation of data package transmission times increases significantly with the number of nodes in the routing path, especially if the network is heavily loaded. In order to guarantee good quality of service, especially for modern real-time multi-media services, the maximum routing path length is bounded by a small number (at least for the normal operating state). Also, even though OSPF routing in principle may choose different paths for the routing from s to t and t to s, a symmetric routing was preferred by the network administration. For different planning horizons, usually also different optimization goals are used. In the long-term strategic planning the goal is to design a network with minimal cost. In the short-term operational planning the goal often is to improve the network’s quality with no or only very few changes in the hardware configuration. Since Quality-of-Service in data networks is strongly related to the utilization (load or congestion) of the network, typical objectives chosen for quality optimization thus are the minimization of the total, average, or maximum utilization of the network’s components. The utilization of a network’s component is ratio between the traffic flow through this component and its capacity. In order to provide the best possible quality uniformly for all connections and to hedge against burstiness in traffic, a variant of minimizing the maximum utilization of the edge components was chosen as objective function for operational backbone network planning in our application. Usually, simply minimizing the maximum utilization over all network components in all operating states makes no sense in practice. Often there are some components or links in the network that always attain the maximum utilization, like, for example, gateways to other networks with fixed small capacities. Also, it is not reasonable to pay the same attention to minimizing the utilization of some link in a network failure state as in the normal operating state. To overcome these difficulties, we introduce a set of disjoint load groups j ∈ J, j ⊂ C × E × O. Each load group defines a set of edge components on some edges in some operating states,e.g., all edge components on all interior edges in the normal operating state, and each triple (edge component, edge, operating state) belongs to at most one load group. For each load group the
maximum utilization is taken only over the components, edges, and operating states in that load group. The objective is to minimize a linear combination of the maximum utilization values for all load groups. Thus, different classes of network components can be treated independently and differently in the utilization minimization. The concept of load groups can be generalized straightforward to include all node, edge, and global components. In our application, we only consider edge components in load groups. In the next two sections, we will develop mathematical models for the problems described above.
3 Access Network Planning
For the access network we assume that all traffic is routed through the backbone. This leads to a forest structure with the demand nodes as leafs and the backbone nodes as roots. One advantage of this approach is that we can map all node attributes like cost and capacity on the edges, since each node has at most one outgoing edge. In the uncapacitated case this results in a Steiner tree problem in a layered graph, which could be solved at least heuristically quite easily, see for example [KM98]. For each possible connection in A_UW we introduce a binary variable x_uw, which is 1 iff u and w are connected. Analogously, variables x_wv are introduced for A_WV. For each possible connection in A_WV and each resource r ∈ R we use a continuous variable f_wv^r that is equal to the flow of resource r between w and v. For each node n from W and V and each assembly stage s ∈ S_n we use a binary variable z_n^s that is 1 iff stage s is selected for node n. There are no flow variables and costs between the demand and intermediate nodes, because these costs can be computed in advance and added to the installation cost for the link.

3.1 Constraints
Here we present the constraints needed to construct a feasible solution. Each demand node has to be connected to exactly one intermediate node and each intermediate node can be connected to at most one backbone node:
\[
\sum_{(u,w) \in A_{UW}} x_{uw} = 1 \quad \forall u \in U, \qquad
\sum_{(w,v) \in A_{WV}} x_{wv} \le 1 \quad \forall w \in W. \tag{1}
\]
Each intermediate and each backbone node has exactly one configuration:
\[
\sum_{s \in S_w} z_w^s = 1 \quad \forall w \in W, \qquad
\sum_{s \in S_v} z_v^s = 1 \quad \forall v \in V. \tag{2}
\]
If it is possible, that node n ∈ W ∪ V is not chosen at all, there has to be a configuration with κrs n = 0 for all r ∈ IR. In the case of a planning with no
current installations the cost of the corresponding variable z_n^s would be zero. Otherwise if there is already an installation, we can give this configuration zero cost and associate with all other configurations either negative costs, which means we earn something by switching to another configuration, or positive changing costs. This way, we could also add a fixed charge just for changing what is already there. The configuration of the intermediate nodes must provide enough resources to meet the demands:
\[
\sum_{(u,w) \in A_{UW}} \delta_u^r x_{uw} - \sum_{s \in S_w} \kappa_w^{rs} z_w^s \le 0 \quad \forall w \in W,\ r \in R. \tag{3}
\]
Demands can only be routed on chosen connections:
\[
\lambda_w^r x_{wv} - f_{wv}^r \ge 0 \quad \forall (w,v) \in A_{WV},\ r \in R, \quad \text{with } \lambda_w^r = \max_{s \in S_w} \kappa_w^{rs}. \tag{4}
\]
The flow balance at each intermediate node has to be ensured:
\[
\sum_{(u,w) \in A_{UW}} \delta_u^r x_{uw} - \sum_{(w,v) \in A_{WV}} f_{wv}^r = 0 \quad \forall w \in W,\ r \in R. \tag{5}
\]
Equation (5) can easily be extended with a constant additive factor in case the intermediate node is serving a fixed amount of the demand. Also, a scaling factor can be incorporated if, for example, data compression is employed for the traffic from the intermediate to the backbone nodes. The configuration of a backbone node has to provide enough resources to meet the demands of the assigned intermediate nodes:
\[
\sum_{(w,v) \in A_{WV}} f_{wv}^r - \sum_{s \in S_v} \kappa_v^{rs} z_v^s \le 0 \quad \forall v \in V,\ r \in R. \tag{6}
\]
3.2 Objective function
While minimizing cost was always the objective, getting sensible cost coefficients for the objective function proved to be a major problem in all projects. The objective can be a combination of the following terms: installation cost for connections
\[
\sum_{(u,w) \in A_{UW}} \tau_{uw} x_{uw} + \sum_{(w,v) \in A_{WV}} \tau_{wv} x_{wv}, \tag{7}
\]
installation cost for configurations
\[
\sum_{w \in W} \sum_{s \in S_w} \tau_w^s z_w^s + \sum_{v \in V} \sum_{s \in S_v} \tau_v^s z_v^s, \tag{8}
\]
and flow unit cost between intermediate and backbone nodes
\[
\sum_{r \in R} \sum_{(w,v) \in A_{WV}} \tau_{wv}^r f_{wv}^r. \tag{9}
\]
The τ_uw, τ_wv, τ_w^s, τ_v^s, and τ_wv^r are real cost coefficients used to weight the desired terms.
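To illustrate how the pieces (1)–(9) fit together, the following sketch sets up the access model with PuLP as a generic modeling layer (the projects themselves used CPLEX and SIP, see Section 5.1). All data containers — S[n] for the stages, delta[u, r] for demands, kappa[n, s, r] for capacities, tau_conn and tau_stage for costs — are hypothetical names, and the flow-cost term (9) is omitted for brevity.

```python
import pulp

def access_model(U, W, V, A_UW, A_WV, R, S, delta, kappa, tau_conn, tau_stage):
    """Sketch of the access-network MIP (1)-(8)."""
    m = pulp.LpProblem("access_network", pulp.LpMinimize)
    x = pulp.LpVariable.dicts("x", A_UW + A_WV, cat="Binary")
    z = pulp.LpVariable.dicts("z", [(n, s) for n in W + V for s in S[n]], cat="Binary")
    f = pulp.LpVariable.dicts("f", [(w, v, r) for (w, v) in A_WV for r in R], lowBound=0)

    # objective: connection costs (7) plus configuration costs (8)
    m += (pulp.lpSum(tau_conn[a] * x[a] for a in A_UW + A_WV)
          + pulp.lpSum(tau_stage[n, s] * z[n, s] for n in W + V for s in S[n]))

    for u in U:                                   # (1), demand side
        m += pulp.lpSum(x[uu, w] for (uu, w) in A_UW if uu == u) == 1
    for w in W:                                   # (1), intermediate side
        m += pulp.lpSum(x[ww, v] for (ww, v) in A_WV if ww == w) <= 1
    for n in W + V:                               # (2)
        m += pulp.lpSum(z[n, s] for s in S[n]) == 1
    for w in W:
        for r in R:                               # (3) and (5)
            inflow = pulp.lpSum(delta[u, r] * x[u, ww] for (u, ww) in A_UW if ww == w)
            m += inflow - pulp.lpSum(kappa[w, s, r] * z[w, s] for s in S[w]) <= 0
            m += inflow - pulp.lpSum(f[ww, v, r] for (ww, v) in A_WV if ww == w) == 0
    for (w, v) in A_WV:
        for r in R:                               # (4)
            lam = max(kappa[w, s, r] for s in S[w])
            m += lam * x[w, v] - f[w, v, r] >= 0
    for v in V:
        for r in R:                               # (6)
            m += (pulp.lpSum(f[w, vv, r] for (w, vv) in A_WV if vv == v)
                  - pulp.lpSum(kappa[v, s, r] * z[v, s] for s in S[v]) <= 0)
    return m
```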
3.3 Notes If a demand node can be connected directly to a backbone node, this can be modeled by introducing an additional artificial intermediate node with only one link to the backbone node and adding a constraint that forbids a “zero” configuration of the backbone if this link is active. The model can be easily extended to an arbitrary number of intermediate levels, but, of course, solving it will become more and more difficult. A major drawback of this model is the inability to cope with locally routed demands. If two demand nodes are connected to the same intermediate node, often the demand need not to be routed to the backbone. Finding optimal partitions that minimize the traffic in the backbone is NP-hard itself and difficult to solve [FMdS+ 96]. On the other hand, demands are usually quite unstable. While a “big” node will always have a high demand, the destination of its emanating traffic might change frequently. In the case of the G-WiN, a large fraction of the traffic is leaving it or entering from outside the network. We did limited experiments which indicated that, at least in this case, the possible gain from incorporating local traffic is much less than the uncertainty of the input data.
4 Backbone Network Planning
In this section we present a mixed-integer linear programming model for the backbone network dimensioning and OSPF routing problem.

4.1 Network hardware
In contrast to the access network, the possible configurations of the modular hardware for the backbone network are described by explicit single component variables and resource constraints. The number of installations of components is modeled by integer variables: z_v^c ∈ ℕ for each node component c ∈ C_V and node v ∈ V, z_e^c ∈ ℕ for each edge component c ∈ C_E and edge e ∈ E, and z_G^c ∈ ℕ for each global component c ∈ C_G, each bounded from below and above. The lower and upper bounds for each component and node, edge, or the global network restrict how often this component can be installed there. Depending on the resource type, with each resource there are one or more inequalities associated. For node resource constraints we have
\[
\sum_{c \in C_E} \sum_{e \in \delta(v)} k_e^{c,r} z_e^c + \sum_{c \in C_V} k_v^{c,r} z_v^c \ge 0 \quad \forall r \in R_V,\ v \in V. \tag{10}
\]
Analogously, the inequalities for edge and global resource constraints are
\[
\sum_{c \in C_E} k_e^{c,r} z_e^c + \sum_{c \in C_V} \big( k_v^{c,r} z_v^c + k_w^{c,r} z_w^c \big) \ge 0 \quad \forall r \in R_E,\ vw = e \in E, \quad \text{and} \tag{11}
\]
\[
\sum_{c \in C_E} \sum_{e \in E} k_e^{c,r} z_e^c + \sum_{c \in C_V} \sum_{v \in V} k_v^{c,r} z_v^c + \sum_{c \in C_G} k_G^{c,r} z_G^c \ge 0 \quad \forall r \in R_G. \tag{12}
\]
c∈CV v∈V
c∈CG
where cost ∈ RG is the special cost resource. Of course, any other objective that is linear in the components can be formulated as well via an appropriate resource. 4.2 Routing There are several possible ways to model the OSPF routing of the demands. We used arc–flow, path–flow, and tree–based formulations in our practical experiments. In this article, we present a formulation that is based on binary arc–flow variables. To model the directed OSPF traffic appropriately, we associate the two directed arcs (u, v) and (v, u) with each edge e = uv ∈ E and let A = {(u, v), (v, u) | uv ∈ E}. To simplify the notation, we use V o , E o , and Ao to denote the sets of operational nodes, edges, and arcs in operating state o, respectively. For each operating state o ∈ O and each traffic demand we use a standard ∈ formulation with binary arc–flow variables, i.e., there is a variable ps,t,o a = 1 iff arc a is in {0, 1} for all o ∈ O, s, t ∈ V o , a ∈ Ao , s = t, with ps,t,o a the routing path from s to t is operating state o. The flow balance and edge capacity constraints then are ⎧ ⎪ ⎨1 u=s s,t,o ps,t,o − p = ∀ o ∈ O, s, t, u ∈ V o , and (14) −1 u = t a a ⎪ ⎩ + − a∈δA a∈δA 0 otherwise o (u) o (u)
ds,t · ps,t,o (u,v) ≤
s,t∈V o
c,cap c kuv zuv
∀ o ∈ O, (u, v) ∈ Ao .
(15)
c∈CE
In our application, there were no node capacity constraints necessary to model, but they can be included into the model in a straightforward way. The allow only symmetric routing, the inequalities
96
A. Bley and T. Koch t,s,o ps,t,o (u,v) = p(v,u)
∀ o ∈ O, s, t ∈ V o , (u, v) ∈ Ao
(16)
can be added to the formulation. A maximum admissible routing path length p¯s,t,o between s and t in operating state o can be enforced by the inequalities ps,t,o ≤ p¯s,t, o ∈ O, s, t ∈ V o . (17) a a∈Ao
4.3 OSPF Routing So far, we presented a mixed-integer linear programming model for a general survivable network design problem with single-path routing. In this section, we will show how to incorporate the special features of the OSPF routing protocol into this model. It is easy to see, that not all possible configurations of routing paths are realizable with a shortest path routing protocol. For some path configurations it is impossible to find weights, such that all paths are shortest paths simultaneously. Many, rather complicated constraints have to be satisfied by a path configuration to be realizable by some routing weights. Examples of such constraints are presented, for example, in [BAGL00], [BAG03], [Gou01], and [SKK00]. We will call a path configuration admissible if it can be realized by a set of routing weights. In the following, we will present some simple necessary constraints for admissible path configurations and what inequalities these constraints impose on the path variables p. Unfortunately, a complete description of all admissible path configurations by inequalities (in a template-scheme) on the path variables is not known. But it is possible to decide in polynomial time by solving a linear program whether a given path configuration is admissible and, if not, to generate an inequality that is valid for admissible path configurations but violated by the given one. We will present this linear program and show how it can be used to separate inequalities cutting off non-admissible path configurations. Subpath constraints The simplest constraints that must be satisfied by the routing paths are the subpath (or path monotony) constraints. Suppose we have a set of routing weights w and P s,t is a unique shortest path between s, t ∈ V , s = t, of length at least 2. Let v ∈ V be an inner node of P s,t . Then, if P s,v or P v,t denote the shortest paths between s, v and v, t, respectively, both paths must be subpaths of P s,t . Otherwise, P s,t o cannot be the unique shortest path between s in t, see Figure 2. Hence, the routing variables of all admissible path configurations must satisfy the subpath inequalities:
IP Approaches to Access and Backbone IP Network Planning
ps,v,o ≤ ps,t,o + 1− a a
ps,t,o a
97
∀ o ∈ O, s, t, v ∈ V o , a ∈ Ao .
(18)
∀ o ∈ O, s, t, v ∈ V o , a ∈ Ao .
(19)
− a ∈δA o (v)
s,t,o pv,t,o ≤ ps,t,o + 1− pa a a − a ∈δA o (v)
Note that, in general, the subpath inequalities are not facet defining in this basic form but can be turned into facets by appropriate lifting. Nevertheless, in our branch-and-cut implementation, we only use the unlifted form of the subpath inequalities (18) and (19) together with the (still not facet-defining) simple lifted version s,t,o ∀ o ∈ O, s, t, v ∈ V o , a ∈ Ao . (20) + pv,t,o ≤ ps,t,o +2 1− pa ps,t,o a a a − a ∈δA o (v)
These inequalities proved already sufficiently tight to obtain good practical results, even without the complicated lifting procedures. Operating state coupling constraints The operating states coupling constraints are the simplest constraints between routing paths of different operating states. Let o1 , o2 ∈ O, o1 = o2 , be two different operating states. Suppose for some given routing weights the paths P s,t,o1 and P s,t,o2 are the unique shortest paths between s, t ∈ V o1 ,o2 , s = t, in operating states o1 and o2 , respectively. Furthermore, suppose that in operating state o1 no edge or node on P s,t,o2 fails and that in operating state o2 no edge or node on P s,t,o1 fails. Then both paths would remain feasible (s, t)-paths in both operating states. Since only one of them can be the shorter path, both must be identical (see Figure 3). For notational convenience, we define the following “trigger term”: ⎧ if oi = ∅, ⎪ ⎨0 s,t,oj s,t,oj s,t,oj if oi = uv ∈ E, ] := p(u,v) + p(v,u) [oi ∈ P ⎪ s,t,oj ⎩ p if oi = v ∈ V. a∈δ − (v)−oj a All admissible path configurations must satisfy the following operating state coupling inequalities: 2 1 ≤ ps,t,o + [o1 ∈ P s,t,o2 ] + [o2 ∈ P s,t,o1 ] ps,t,o a a ∀ o1 , o2 ∈ O, s, t ∈ V o1 ,o2 , a ∈ Ao1 ,o2 .
(21)
Also the operating state coupling inequalities are not facet defining in this basic form in general, but can be turned into facets by a lifting procedure.
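Separating the subpath inequalities for a fractional LP point is straightforward to sketch. The function below enumerates violated inequalities of type (18) for one operating state, assuming the fractional values are stored in a dictionary p keyed by (s, t, arc) and that arcs_in[v] lists the arcs entering v; it is an illustration of the check, not the separator used in our implementation.

```python
def violated_subpath_cuts(p, arcs_in, eps=1e-6):
    """Enumerate violated subpath inequalities (18) for one operating state.
    p[(s, t, a)] is the (possibly fractional) value of an arc-flow variable."""
    cuts = []
    pairs = {(s, t) for (s, t, _) in p}
    for (s, t) in pairs:
        for (s2, v, a) in list(p):
            if s2 != s or (s, t, a) not in p:
                continue
            slack = 1.0 - sum(p.get((s, t, ain), 0.0) for ain in arcs_in.get(v, []))
            if p[s, v, a] > p[s, t, a] + slack + eps:
                cuts.append((s, t, v, a))          # inequality (18) is violated
    return cuts
```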
Figure 2. Example of a violated subpath constraint
Figure 3. Example of a violated operating state coupling constraint
Computing routing weights
Let w be routing weights that induce unique shortest paths between all node pairs in all operating states. For each o ∈ O and s, t ∈ V^o denote by π_{s,t}^o ∈ ℝ_+ the distance between s and t in operating state o with respect to w. It is well known from shortest path theory that the metric inequalities
\[
\pi_{s,u}^o + w_{(u,v)} - \pi_{s,v}^o \ge 0 \quad \forall o \in O,\ s \in V^o,\ (u,v) \in A^o
\]
are satisfied and hold with equality if and only if (u, v) is on a shortest (s, v)-path in operating state o. Since scaling the weights does not change the shortest paths, we can find a scaling factor α > 0 such that for weights αw the left-hand side of all strict inequalities is at least 1. Now suppose we are given integer variables p that define a path configuration that is not necessarily admissible but satisfies all subpath constraints. Consider the following linear program, where w and π are variables (while p is fixed):
\[
\min \ \sum_{(u,v) \in A} w_{(u,v)} \tag{22}
\]
\[
\pi_{s,u}^o + w_{(u,v)} - \pi_{s,v}^o = 0 \quad \forall o \in O,\ s \in V^o,\ (u,v) \in A^o \text{ with } p_{(u,v)}^{s,v,o} = 1, \tag{23}
\]
\[
\pi_{s,u}^o + w_{(u,v)} - \pi_{s,v}^o \ge 1 \quad \forall o \in O,\ s \in V^o,\ (u,v) \in A^o \text{ with } p_{(u,v)}^{s,v,o} = 0, \tag{24}
\]
\[
w_a \ge 1 \quad \forall a \in A, \tag{25}
\]
\[
\pi_{s,v}^o \ge 0 \quad \forall o \in O,\ s, v \in V^o,\ s \ne v. \tag{26}
\]
IP Approaches to Access and Backbone IP Network Planning
99
of rows I = I1 ∪I0 consisting of some equalities I1 ⊆ {(s, (u, v), o) | ps,v,o (u,v) = 1} and inequalities I0 ⊆ {(s, (u, v), o) | ps,v,o = 0}. Since I is an infeasible system, (u,v) for no admissible path configuration all ps,v,o with (s, (u, v), o) ∈ I can have 1 (u,v) s,v,o the value 1 and all p(u,v) with (s, (u, v), o) ∈ I0 in I can have the value 0 simultaneously. Hence, the inequality s,v,o s,v,o p(u,v) − p(u,v) ≤ |I1 | − 1, (27) (s,(u,v),o)∈I1
(s,(u,v),o)∈I0
which is violated by the given non-admissible path configuration p, is valid for all admissible path configurations and can be added to the problem formulation. Note that, in general, these infeasibility inequalities (27) are not facet defining. They cut off only one specific non-admissible sub-configuration of paths. To use them efficiently in a branch-and-cut framework, they have to be strengthened. Otherwise they will become active only too deep in the branching tree to be useful. In our implementation, we try to reduced their support as far as possible. First, we use only irreducible infeasibility systems of rows (set-inclusion minimal infeasibility systems), second, we apply some easy reductions that exploit the problem structure. For example, such an infeasibility systems usually contains both an entry (s, (u, v), o) ∈ I1 and one s,v,o or more entries (s, (u , v), o) ∈ I0 . Since, if ps,v,o (u,v) = 1, the variables p(u ,v) are already forced to 0 by the flow balance constraints (14), they need not be included in the infeasibility system’s inequality (27). If the OSPF routing must be symmetric, like in the G-WiN, one usually wants to provide equal routing weights for both directions of an edge. This is achieved by adding the equalities w(u,v) = w(v,u)
uv ∈ E
(28)
to the linear system (23)–(26). 4.4 Utilization minimization The variables and constraints presented so far are sufficient to model the cost minimization variants of the backbone network design problem as a mixed-integer linear program. For the utilization minimization variant, we need some further variables and constraints. For each load group j ∈ J, we introduce a continuous variable λj ∈ IR, 0 ≤ λj ≤ λj , for the maximum utilization attained by a component c on some edge e in operating state o with (c, e, o) ∈ j. The upper bound λj for that variable is used to specify initially a maximum feasible utilization. The objective is min α j λj , (29) j∈J
100
A. Bley and T. Koch
where αj is the non-negative objective coefficient associated with load group j. For each j ∈ J, e ∈ E, and c ∈ C with (c, e, o) ∈ j for some o ∈ O, we introduce variables for the maximum usable routing capacity yec,j ≥ 0. The value of these variables is the amount of flow that can be routed through the component in this operating state without increasing the current maximum utilization for the load group. The maximum usable capacities cannot be larger than the original components capacity, if installed, times the maximum attained utilization of the load group. This can be expressed by the following two variable-upper-bound inequalities: yec,j ≤ λj kec,cap yec,j
≤
c zuv (λj kec,cap )
∀ j ∈ J, c ∈ C, e ∈ E with (c, e, o) ∈ j for o ∈ O (30) ∀ j ∈ J, c ∈ C, e ∈ E with (c, e, o) ∈ j for o ∈ O (31)
In the capacity constraints (15) the original capacities provided by the installed edge components are replaced by the maximum usable capacities. This yields for utilization minimization the new capacity constraints s,t∈V
ds,t · ps,t,o (u,v) ≤ o
c,j(c,uv,o) yuv
∀ o ∈ O, (u, v) ∈ Ao ,
(32)
c∈C:j(c,uv,o)=∅
where j(c, uv, o) denotes the load group that (c, uv, o) belongs to or ∅, if it belongs to no load group.
5 Computation In this section we report on the methods used to solve the models shown in the previous sections. Also, we explain why these algorithms where chosen. 5.1 Access network In all cases we studied, the access network planing problem was highly restricted. This is not surprising, because usually the locations that can be chosen as backbone or intermediate nodes are limited due to technical, administrative and political reasons. The biggest obstacle always was the unavailability of sensible cost data for the potential connections. Also, in all projects we encountered some peculiar side constraints that were specific to the project. Therefore, we needed a very flexible approach and used a general IP-Solver. And, at least for the data used in the projects, it was possible with some preprocessing to solve the access network planing problems with CPLEX [CPL01], SIP [Mar98], or a comparable MIP-Solver. Unfortunately, the solution times heavily depend on the actual data and objective function, especially on the relation between connection costs and
IP Approaches to Access and Backbone IP Network Planning
101
assembly stage costs and on how tight the resource capacities are. In the G-WiN all locations were already built and the number of backbone and intermediate nodes to choose was fixed as a design criteria. So only the connection costs had to be regarded in the objective function. In one of the other projects this situation reversed: the transportation network was already installed and only the cost for the equipment at the nodes had to be considered. In Section 6 we show some examples on how difficult the problems can get, if we allow all connections between the nodes and have several tight resources. 5.2 Backbone network In order to solve the backbone network planning problem a special branchand-cut algorithm based on the MIP formulation presented in Section 4 was developed and implemented in C++. CPLEX [CPL01] or SoPlex [Wun96] can be used to solve the linear programming relaxations, the branch-and-cut tree is managed by our software. The initial LP relaxation contains all component variables and all resource constraints (10), (11), (12). If the objective is utilization minimization, all load group variables and all usable capacity variables are in the initial LP relaxation as well as the associated variable-upper-bound constraints (30) and (31). If the binary arc–flow variables of all demands and all operating states are considered, the LP relaxation becomes too large to be used. We only considerer the arc–flow variables for all demands in the normal operating state and for the biggest demands in the node failure states, i.e., only for demands with a value of least 50% of the biggest demand value. Arc–flow variables for edge failure states are not in the formulation. Most edge failure states are dominated by some node failure states, already. Numerous experiments with real world data revealed that using only this restricted variable set is a good tradeoff between the quality of the relaxation and the time necessary to obtain good solutions. The flow balance constraints (14) and, the edge capacity constraints (15) or (32) are in the initial formulation, if at least one arc–flow variable in their support is. The routing symmetry constraints (16), of course, are not explicitly generated in practice. Instead, arc–flow variables are generated only for one of the directions s to t or t to s and then used to model the paths for both directions. The path length inequalities (17), if necessary, are also in the initial LP. Clearly, for all arcs that do not belong to any short (s, t)-path or those starting in t or ending in s, the corresponding arc–flow variables must be 0 and are not included the model. Although there is only a polynomial number of subpath constraints (18), (19), and (20) and operating state coupling constraints (21), we generate these inequalities only if violated to keep the size of the relaxation as small as possible. Also, we only use coupling inequalities (21) that link the normal operating state to some failure state. Separating coupling inequalities between two failure states is too time consuming and many of these inequalities are
102
A. Bley and T. Koch
already implied by the coupling constraints between the normal operating state and the two failure states. At each node of the branch-and-cut tree we iteratively separate violated inequalities and resolve the LP relaxation for a limited number of times (≤ 5), as long as there is substantial increase (≥ 1%) in the optimum LP-value. In each such iteration we separate the following inequalities in the order given below, until at most 50 inequalities are separated: 1. subpath inequalities (18), (19), and (20) for the normal operating state, 2. induced cover inequalities [Boy93] for the knapsacks given by the edge capacity constraints (15) or (32) in the normal operating state1 , 3. cover inequalities [BZ78, NV94] for the knapsacks given by the resource constraints (10), (11), and (12), 4. subpath inequalities (18), (19), and (20) for failure operating states, 5. induced cover inequalities for the knapsacks given by the edge capacity constraints (15) or (32) for failure operating states, 6. operating state coupling inequalities (21) between the normal and some failure operating state, and 7. IIS inequalities (27). This strategy separates first those inequalities that are computationally easier to separate and give the best improvement in the LP bound. It proved to be well suited in our experiments. If, at some branch-and-cut node, the final LP solution is fractional, our basic strategy is to branch on a “good” arc–flow variable of a big demand. From those demands, whose value is no less than a certain percentage (0.9) of the biggest demand with fractional arc–flow variables, we choose the arc– flow variable whose fractional value is closest to a target value (0.8). Flow variables for the normal operating state are preferred: When choosing the next branching variable we divide all demands by a factor of 5 for failure state flow variables. At every branch-and-cut node with depth 3k, k ∈ IN, we branch on a fractional component variable, if there is one. In such a case, global components are always preferred to edge and these to node components, and then the fractional variable whose component provides the biggest routing capacity is chosen. The next node to explore in the branch-and-cut tree is selected by a mix of best-dual-bound and dive strategy. For a given number of iterations (32) we choose the node with the best dual bound. Thereafter, we dive for a good feasible solution in the tree, starting at a best-dual-bound node and then always choosing the child node whose arc–flow variable was set to 1 or edge component variable was rounded up in the last branching. We do not backtrack on the dive path. If a feasible solution was found or the last child node is infeasible, we switch back to the best-dual-bound strategy and continue. 1
The precedence constraints in these knapsacks are the subpath constraints among the paths using the edge.
IP Approaches to Access and Backbone IP Network Planning
103
Two primal heuristics are used at the branch-and-cut nodes. They are applied only at nodes with depth 2k , k ∈ IN, reflecting the fact that the “important“ branches, that fix big demands’ flow variables or large edge components’ variables, are performed first, while deeper in the branch-and-cut tree only small demands’ flow variables are fixed. Both heuristic first generate initial routing weights and compute a shortest path routing with respect to these weights. Then, a (mixed-) integer programming solver is used to compute a hardware configuration minimizing the original cost or utilization objective function plus the total violation of edge capacity constraints by this routing. These integer programs are fairly easy to solve in practice, they only contain the component variables and resource constraints. In a last step, the initially routing weights are adopted to the topology computed by the integer program and perturbed to make all shortest paths unique. Our first heuristic takes a linear combination of the dual variables of the edge capacity constraints (15) or (32) over the different operating states as initial routing weights. The second heuristic utilizes the linear program (22)–(26) to compute initial routing weights. In contrast to the IIS inequality separator, the heuristic initializes the metric (in-)equalities (23) and (24) only for those arc–flow variables ps,v,o (u,v) that are integer or near-integer (≤ 0.1 or ≥ 0.9) in the current fractional solution. All other metric inequalities (23) or (24) are “deactivated” by setting their sense and right hand side to “≤ 0”. If this LP has a feasible solution, the computed routing weights induce at least all the near-integer routing paths at the current branch-and-cut node. If it has no solution, we generate an IIS inequality (27) cutting off a non-admissible subconfiguration of the near-integer paths. The heuristic works as a separator, too. If the objective is to minimization the average utilization, we apply additional bound strengthening techniques to speed up the computation. For ∗ each load group j, we store an upper bound λj on the corresponding maximum utilization of the optimal solution and update these bounds during ∗ the optimization process. Initially, λj = λj . Whenever a new feasible solution with average utilization λ∗ is found, we tighten these bounds by setting ∗ ∗ λj := min{λj , λ∗ /αj } and we update the variable upper bound constraints ∗ (31) for these new bounds λj . Note that this operation may exclude feasible but non-optimal solutions from the remaining solution space; but we are interested only in the optimal solution and not in the entire feasible solution space. Indirectly, this bound strengthening also tightens the capacity constraints (32). Thus, in the following separation attempts we may find much stronger cover inequalities for the associated knapsacks than we would find without the bound strengthening. But, again, these inequalities are valid only for the optimal solution, not for all feasible solutions of the original problem.
104
A. Bley and T. Koch
6 Results The application of our main interest was the planning of the German research network G-WiN. The G-WiN started in 2000 with about 759 demand locations, see Figure 4, 261 of them could be used as intermediate or backbone nodes, see Figure 5. It was decided from the administration, that there should be about 10 backbone nodes and two intermediate node per backbone node. Several sets of demand data were generated. For the initial access and backbone network planning accounting data from the predecessor network B-WiN was used, later, for the operational replanning of the G-WiN backbone network, accounting data in the latest G-WiN network were available. These traffic measurements were scaled to anticipate future demands. In the access network planning step, several scenarios were optimized and assessed according to the stability of the solutions for increasing demands. The backbone nodes were then successively selected. For this reason we can not provide running times for the original problems. We generated some data sets with the original data and varying costs and resources. As can be seen in Table 1, instances of identical size can vary strongly in running time (CPLEX 8.0 MIP solver on an 2.5 GHz Pentium4, Linux). The harder instances were always those with scarcer resources, a higher number of assembly stages, or a more difficult cost structure. The solution for the G-WiN access network planning problem is shown in Figure 6. It should be noted that access link cost functions that do not depend on the distance can lead to very strange-looking optimal solutions. We encountered this, when we had costs that were constant for a certain range. These solutions may be difficult to verify and crossing access network link may be hard to explain to a practitioner. For the G-WiN backbone planning, the underlying network topology was a complete graph on 10 nodes, corresponding to the virtual private STM backbone network inside Germany, plus one additional node linked to two of these ten nodes, corresponding to the ‘uplink’ to other networks via two gateways. The capacities on the two gateway links and the ‘configuration’ of the uplink-node were fixed. On each of the other links, one capacity configuration from a given set of configurations could be installed. Depending on the specific problem instance, these sets represented various subsets of the STM-hierarchy. Several types of IP router cards and SDH cards had to be considered as components at the nodes. In order to install a certain capacity configuration on an edge, appropriate IP router and SDH interfaces must be provided by the router and interface cards at the two terminal nodes. Additional IP-interfaces had to be provided at the nodes in order to connect the backbone level to the access level of the network. Of course, every node could hold only a limited number of router and interfaces cards. Besides these local restrictions, for both, edge capacities components as well as node interface and router card components, there were also several global restrictions. For
IP Approaches to Access and Backbone IP Network Planning
100 km
100 km
Figure 4. G-WiN-1 sites with aggregated traffic demands
100 km
Figure 5. G-WiN-1 sites, potential level-1/2 sites as triangles
100 km Kiel
Kiel Rostock
Rostock
Hamburg
Hamburg
Oldenburg
Oldenburg
Hannover
Berlin
Hannover
Braunschweig Magdeburg
Bielefeld
Berlin Braunschweig Magdeburg
Bielefeld
Goöttingen
Essen
Goöttingen
Essen Leipzig
Leipzig Dresden
Aachen
105
Marburg
Koöln
Dresden Aachen
Ilmenau
Marburg
Koöln
Ilmenau
Frankfurt Darmstadt
Frankfurt Darmstadt Wuürzburg
Wuürzburg Erlangen
Erlangen
KaiserslauternHeidelberg
KaiserslauternHeidelberg
Karlsruhe
Regensburg
Karlsruhe
Stuttgart
Regensburg Stuttgart
Augsburg
Augsburg
Muünchen
Freiburg
Figure 6. G-WiN-1 access network solution
Muünchen
Freiburg
Figure 7. G-WiN-1 core backbone and level-1–level-2 links
106
A. Bley and T. Koch Name DFN1 DFN2 DFN3 TV1 TV2 SL1 SL2 KG1 KG2 BWN1 BWN2
Vars Cons Nonz 6062 5892 6062 1818 1818 7188 7188 8230 8230 13063 13063
771 747 771 735 735 3260 3260 5496 5496 9920 9920
B&B
12100 235 11781 13670 12100 >40000 5254 0 5254 24 27280 267 27280 7941 42724 511 42724 3359 57688 372 57688 18600
Time (h:mm:ss) 6 27:05 >3:00:00 <1 <1 13 1:26 2:56 3:25 1:48 5:54
Table 1. Results for the access network planning problem
example, the totally installed edge capacity and the number of installations of each edge capacity type and each card type were bounded. Besides these rather natural constraints, there were also more complicated ones limiting the number of certain reconfigurations for the operational network replanning. These constraints were modeled via artificial global and local resources that account for changes in the number of installed hardware components. Finally, the routing path length for each traffic demand was bounded by 2 or 3 hops in the normal operating state. For the design of the initial G-WiN, the objective was to minimize the total cost of the node and edge hardware components. It should be noted that the costs associated with the edge capacity types and IP router or SDH interface cards did not depend on the specific edge or node where these components were installed. For the operational replanning of the G-WiN backbone, the objective was to minimize an average of the maximum utilization values for the following four load groups: all edge components on national edges in the normal operating state, all edge components on the two gateway edges in the normal operating state, all edge components on national edges in the failure operating states, and all edge components on the two gateway edges in the failure operating states. The cost optimal G-WiN-1 backbone, together with the backbone-intermediate links, is shown in Figure 7. Typical computational results for utilization minimization and cost minimization problems are presented in Tables 2 and 3, respectively. The suffix fail or nos in the names indicate whether or not failure operating states are considered. The computations were performed on an 1.7 GHz Pentium4, running the Linux operating system and CPLEX 7.5 as LP solver, with a total time limit of two hours. The column LP displays the values of the initial linear programming relaxations, LB and UB show the best lower bounds and the best solutions after the branch-and-cut algorithm. Time and B&B report the time and the number of evaluated branch-and-cut nodes until the optimal solution was found or the time limit was reached.
IP Approaches to Access and Backbone IP Network Planning Problem DFN2 nos DFN3 nos DFN4 nos Atlanta nos EP98 nos NSF nos DFN2 fail DFN3 fail DFN4 fail Atlanta fail EP98 fail NSF fail
LP
LB
UB
0.48 0.05 0.03 0.63 0.18 0.68 2.61 0.10 0.09 1.26 0.78 0.73
0.57 0.10 0.05 0.83 0.23 0.68 3.06 0.27 0.13 1.67 0.96 0.74
0.57 0.10 0.05 0.83 0.23 0.68 3.06 0.46 0.14 1.67 1.27 0.74
Time (h:mm:ss) 38 8 18 7 1:29 23:48 14:17 2:00:00 2:00:00 14 2:00:00 2:00:00
107
B&B 775 48 50 36 204 3453 1350 140 1284 70 1406 2078
Table 2. Results for backbone utilization minimization problems
Problem DFN2 nos DFN3 nos DFN4 nos Atlanta nos EP98 nos NSF nos DFN2 fail DFN3 fail DFN4 fail Atlanta fail EP98 fail NSF fail
LP
LB
UB
72.4 240.0 272.0 56.4 165.0 229.5 72.4 320.0 272.0 65.4 179.5 318.6
76.0 240.0 272.0 104.4 176.4 324.4 76.0 320.0 272.0 126.5 202.3 389.3
76.0 240.0 272.0 107.0 184.2 475.0 76.0 320.0 272.0 175.0 309.2 735.0
Time (h:mm:ss) 8 6 3 2:00:00 2:00:00 2:00:00 3:07 24:12 56 2:00:00 2:00:00 2:00:00
B&B 21 23 3 3073 2114 198 19 278 5 960 623 115
Table 3. Results for backbone cost minimization problems
The group of DFN instances are real world planning problems from the German research network G-WiN, as described above. The other group of problem instances, i.e., Atlanta, EP98, and NSF, are based on real network topologies with 15 nodes and 22 edges, 13 nodes and 24 edges, and 14 nodes and 21 edges, respectively, and real world traffic data. To these data sets we added artificial capacity structures similar to the STM-hierarchy, but scaled according to the demand values in the data sets. These instances have neither node and global components nor restrictions. Looking at the results for the utilization minimization in Table 2, we see that the smaller instances, especially those where no failure operating states were considered, could be solved to optimality within reasonable times. For bigger instances, which also consider failure operating states, the algorithm
108
A. Bley and T. Koch
in general finds very good solutions (which could not be improved by other heuristic methods) within the given time bound, although it sometimes fails to prove a reasonable lower bound. We want to point out that the best solutions shown in the tables were identified very quickly, which was one of our main goals in the design of the primal heuristics and the branching strategy. Typically, the algorithm improves the dual bound pretty fast during the first iterations, when branches are performed on the biggest demands and edge components. Later, after a few hundred iterations, there is almost no further improvement in the dual bound. The same general behavior was observed for cost minimization problems. Comparing the results for utilization minimization and cost minimization, the DFN instances for cost minimization are much easier to solve than their utilization counterparts. The reason is that in these instances the traffic demands are relatively small compared to the potential edge capacities and that the global reconfiguration restrictions limit the number of feasible topologies. Thus, finding a good routing is not so important in the cost minimization variants of these instances. For the other problem instances, it seems that the utilization minimization variants are easier to solve than their respective cost minimization counterparts. This is a result of the bound strengthening described in the previous section. This technique, which can be applied only for utilization minimization, leads to a substantial reduction of the remaining search space whenever a new solution is found, and thus speeds up both the finding of the optimal solution and the verification of its optimality.
Acknowledgments We would like to thank DFN and BMBF for funding this project and our colleagues at DFN, in particular Dr. M. Pattloch, for fruitful discussions and excellent cooperation.
References [AGW97]
[ALLQ96]
[BAG03] [BAGL00] [BCGT98]
D. Alevras, M. Gr¨ otschel, and R. Wess¨ aly, Capacity and survivability models for telecommunications networks, Tech. Report SC 97-24, Konrad-Zuse-Zentrum f¨ ur Informationstechnik, Berlin, 1997. K. Aardal, M. Labb´e, J. Leung, and M. Queyranne, On the two-level uncapacitated facility location problem, INFORMS Journal on Computing 8 (1996), 289–301. W. Ben-Ameur and E. Gourdin, Internet routing and related topology issues, SIAM Journal on Discrete Mathematics 17 (2003), 18–49. W. Ben-Ameur, E. Gourdin, and B. Liau, Internet routing and topology problems, Proceedings of DRCN2000 (Munich), 2000. D. Bienstock, S. Chopra, O. G¨ unl¨ uk, and C-Y. Tsai, Minimum cost capacity installation for multicommodity network flows, Mathematical Programming 81 (1998), 177–199.
IP Approaches to Access and Backbone IP Network Planning [BGLM03]
109
S. Borne, E. Gourdin, B. Liau, and A. Mahjoub, Design of survivable IP-over-optical networks, Proceedings of the First International Network Optimization Conference (INOC 2003), Paris, 2003, pp. 114–118. [BGW98] A. Bley, M. Gr¨ otschel, and R. Wess¨ aly, Design of broadband virtual private networks: Model and heuristics for the B-WiN, Robust Communication Networks: Interconnection and Survivability (N. Dean, D. F. Hsu, and R. Ravi, eds.), DIMACS Series in Discrete Mathematics and Theoretical Computer Science, vol. 53, AMS, 1998, pp. 1–16. [BK00] A. Bley and T. Koch, Optimierung des G-WiN, DFN-Mitteilungen 54 (2000), 13–15. [BKW04] A. Bley, T. Koch, and R. Wess¨ aly, Large-scale hierarchical networks: How to compute an optimal architecture?, Proceedings of Networks 2004 (Vienna), VDE Verlag, 2004, pp. 429–434. [Ble03] A. Bley, A Lagrangian approach for integrated network design and routing in IP networks, Proceedings of the First International Network Optimization Conference (INOC 2003), Paris, 2003, pp. 107–113. , On the approximability of the minimum congestion unsplit[Ble05] table shortest path routing problem, Proceedings of 11th Conference on Integer Programming and Combinatorial Optimization (IPCO 2005), Berlin, 2005, pp. 97–110. , Inapproximability results for the inverse shortest paths problem [Ble07a] with integer lengths and unique shortest paths, Networks 50 (2007), 29–36. , Routing and capacity optimization for IP networks, Ph.D. the[Ble07b] sis, Technische Universit¨ at Berlin, 2007. [Boy93] E. A. Boyd, Polyhedral results for the precedence-constrained knapsack problem, Discrete Applied Mathematics 41 (1993), 185–201. [BRRT05] L. Buriol, M. Resende, C. Ribeiro, and M. Thorup, A hybrid genetic algorithm for the weight setting problem in OSPF/IS-IS routing, Networks 46 (2005), 36–56. [BRT04] L. Buriol, M. Resende, and M. Thorup, Survivable IP network design with OSPF routing, Optimization Online (2004). [BZ78] E. Balas and E. Zemel, Facets of the knapsack polytope from minimal covers, SIAM Journal on Applied Mathematics 34 (1978), 119–148. [CPL01] ILOG CPLEX Division, 889 Alder Avenue, Suite 200, Incline Village, NV 89451, USA, ILOG CPLEX 7.5 reference manual, 2001, Information available at http://www.cplex.com [CW92] J. Crowcroft and Z. Wang, Analysis of shortest-path routing algorithms in a dynamic network environment, ACM SIGCOM Computer Communication Review 22 (1992), 63–71. [ERP01] M. Ericsson, M. G. C. Resende, and P. M. Pardalos, A genetic algorithm for the weight setting problem in OSPF routing, Tech. report, AT&T Labs Research, 2001. [FGL+ 00] A. Feldmann, A. Greenberg, C. Lund, N. Reingold, and J. Rexford, NetScope: Traffic engineering for IP networks, IEEE Network 14 (2000), 11–19. [FMdS+ 96] C. E. Ferreira, A. Martin, C. C. de Souza, R. Weismantel, and L. A. Wolsey, Formulations and valid inequalities of the node capacitated graph partitioning problem, Mathematical Programming 74 (1996), 247–266.
110
A. Bley and T. Koch
[FT00] [FT04] [GMS95]
[Gou01] [Hal96]
[HY04] [KM98] [KM05] [LW93]
[Mar98] [MD01]
[MF90] [NV94]
[PLPL00]
[Pry02]
[SKK00]
[Wun96]
B. Fortz and M. Thorup, Internet traffic engineering by optimizing OSPF weights, Proceedings of IEEE INFOCOM 2000, 2000. , Increasing internet capacity using local search, Computational Optimization and Applications 29 (2004), 13–48. M. Gr¨ otschel, C. L. Monma, and M. Stoer, Design of survivable networks, Handbooks in Operations Research and Management Science, vol. Network Models, ch. 10, pp. 617–672, North-Holland, 1995. E. Gourdin, Optimizing internet networks, OR/MS Today (2001), 46–49. L. Hall, Experience with a cutting plane algorithm for the capacitated spanning tree problem, INFORMS Journal on Computing 8 (1996), 219–234. K. Holmberg and D. Yuan, Optimization of Internet protocol network design and routing, Networks 43 (2004), 39–53. T. Koch and A. Martin, Solving Steiner tree problems in graphs to optimality, Networks 32 (1998), 207–232. H. Kerivin and A. Mahjoub, Design of survivable networks: A survey, Networks 46 (2005), 1–21. F. Y. S. Lin and J. L. Wang, Minimax open shortest path first routing algorithms in networks suporting the SMDS service, Tech. report, Bell Communications Research, 1993. A. Martin, Integer programs with block structure, Habilitations-Schrift, Technische Universit¨ at Berlin, 1998. S. Melkote and M. S. Daskin, Capacitated facility location/network design problems, European Journal of Operations Research 129 (2001), 481–495. P. Mirchandani and R. Francis (eds.), Discrete location theory, Wiley, New York, 1990. G. L. Nemhauser and P. H. Vance, Lifted cover facets of the 0-1 knapsack polytope with GUB constraints, Operations Research Letters 16 (1994), 255–263. K. Park, K. Lee, S. Park, and H. Lee, Telecommunication node clustering with node compatibility and network survivability requirements, Management Science 46 (2000), 263–374. M. Prytz, On optimization in design of telecommunications networks with multicast and unicast traffic, Ph.D. thesis, Royal Institute of Technology, Stockholm, Sweden, 2002. D. Staehle, S. K¨ ohler, and U. Kohlhaas, Towards an optimization of the routing parameters for IP networks, Tech. report, Department of Computer Science, University of W¨ urzburg, 2000. R. Wunderling, Paralleler und objektorientierter simplex, Tech. Report TR 96-09, Konrad-Zuse-Zentrum f¨ ur Informationstechnik, Berlin, 1996, Information available at http://www.zib.de/ Optimization/Software/Soplex
An Adaptive Fictitious-Domain Method for Quantitative Studies of Particulate Flows Sebastian B¨onisch Numerical Analysis Group, IWR, University of Heidelberg Im Neuenheimer Feld 294, D-69120 Heidelberg, Germany
[email protected] Abstract We present an adaptive fictitious-domain method to simulate the motion of rigid particles in viscous fluids. Our algorithm is based on the stress-DLM approach proposed by Patankar et al. ([PSJ00, Pat01, SP05]). The consequent use of adaptivity (e.g. locally adapted meshes, adaptive quadrature) makes our method very accurate and efficient, especially in the case of moderate particle volume fractions. Quantitative studies of particulate flow problems become therefore feasible. We validate our method by solving a well-known benchmark problem. The savings achieved by adaptivity are huge. A benchmark problem for multiple particles is proposed. The problem of accurate resolution of non-smooth particle geometries is addressed.
1 Introduction The development of efficient computational techniques for solid-liquid flows has been the subject of intensive research efforts during the past decade (cf. e.g. [GPH01, SP05, WT06] and references therein). This marked interest stems from the fact that the understanding of particulate flows is a main issue in many applied problems ([WT06] and references therein). For a survey of the different numerical approaches proposed for this problem, we refer the reader, e.g., to [SP05]. In this contribution we propose a numerical scheme for rigid particulate flows which can be seen, to some extent, as a combination of Eulerian and Langrangian methods: On one hand, we borrow from Eulerian methods the idea of solving the fluid equations on a fixed, simple-shaped “fictitious” domain. On the other hand, we let the mesh refinement follow the particle boundaries. The resulting method is highly accurate and, at the same time, very economic. This will be demonstrated by solving selected test problems.
112
S. B¨ onisch
2 Adaptive Fictitious-Domain Method In this section we describe our numerical method. It is based on the socalled “stress-DLM formulation” introduced in [PSJ00], which we are going to describe first. 2.1 The stress-DLM formulation Let Ω be the computational domain which includes both the fluid and the particle domain. Let P (t) be the particle domain. In order to facilitate notations, we assume that there is only one moving particle in a Newtonian fluid. The formulation can be easily generalized beyond these assumptions. The governing equations for fluid motion are given by: ρf (∂t u + (u · ∇)u) + ∇p − µ∆u = ρf g
in Ω\P (t),
∇ · u = 0 in Ω\P (t), u = u∂Ω (t) on ∂Ω(t), u = ui
on
(1)
∂P (t),
u|t=0 = u0 (x) in Ω\P (0), where ρf is the fluid density, u is the fluid velocity, g is gravity, n is the unit outward normal on the particle surface, ui is the velocity at the fluid-particle interface ∂P (t) and u0 is the initial velocity. Patankar et al. ([PSJ00]) treated the particle as a fluid subject to an additional constraint to impose the rigidity. The governing equations for particle motion are: ρs (∂t u + (u · ∇)u) + ∇p − µ∆u = ρs g in P (t), ∇ · u = 0 in P (t), ∇ · (D[u]) = 0 in P (t),
(D[u]) · n = 0 on ∂P (t), u = ui on ∂P (t),
u|t=0 = u0 (x)
(2)
in P (0),
where ρs is the particle density. Equation (2)3 represents the rigidity constraint which sets the deformation tensor, D[u] := (∇u + ∇uT )/2, in the particle domain equal to zero. The momentum and continuity equations applicable in the entire domain can be written as: ρ (∂t u + (u · ∇)u) + ∇p − µ∆u = ρg + f ∇ · u = 0 in
where ρ = ρ(x) =
ρf ρs
in Ω\P , in P,
in Ω, Ω,
(3)
(4)
An Adaptive Fictitious-Domain Method for Particulate Flows
113
and f is the additional term due to the rigidity constraint (2)3 in the particle domain. More specifically, f = ∇ · D[λ], where λ is the Lagrange multiplier needed to enforce the rigidity constraint, cf. [PSJ00]. 2.2 Numerical scheme We now solve equation (3) by means of the fast splitting scheme proposed by Patankar et al. ( [Pat01], [SP05]). For the reader’s convenience we shall briefly recall this scheme. Further details concerning the solution of the individual substeps and the incorporation of adaptivity are presented in the next section. Patankar’s fast projection scheme consists of two substeps: (I) Determine intermediate velocity and pressure fields u ˆ and pˆ by solving the following equations in the entire domain Ω: n ρ uˆ−u u · ∇)ˆ u + ∇ˆ p − µ∆ˆ u = ρg in Ω, ∆t + (ˆ (5) ∇·u ˆ = 0 in Ω. Set un+1 = u ˆ and pn+1 = pˆ in the fluid domain. Clearly, in general, u ˆ is not a rigid body motion inside P . ˆ onto a rigid body (II) Determine un+1 in the particle domain by projecting u motion, i.e. solve n+1 u −u ˆ ρs = f in P. (6) ∆t To solve equation (6), one needs knowledge of f . Following [SP05], f can be obtained by imposing as an additional condition that the total linear and angular momentum in the particle domain should be conserved. To this end, we first split u ˆ as u ˆ = ur + u , where MU =
ur = U + ω × r,
r × ρs u ˆ dx.
ρs u ˆ dx and Ip ω = P
(7)
P
Since the total linear and angular momentum should be conserved in the projection step, set un+1 = ur in the particle domain. This corresponds to f = −(ρs u )/∆t. Remark 1. The algorithmic realization of the projection step (6) is very cheap. However, the accuracy of the integrations can be argued since the particle boundaries and the underlying grid do not match in general. We address this point in Sect. 2.3.
114
S. B¨ onisch
2.3 Algorithmic details In this subsection we wish to give more detailed information about the algorithmic realization of the splitting scheme described in Sect. 2.2. In particular, we are going to comment on how we incorporate the concept of adaptivity into the scheme. Locally refined meshes When solving PDEs numerically, the use of locally adapted meshes often leads to dramatic savings in CPU time. Basically, two approaches of local mesh adaptation can be distinguished: the heuristic approach and the systematic approach via a-posteriori error estimation and control. In the case of particulate flows at moderate Reynolds numbers, we expect the error to be much larger near the particle boundaries than in the far-field region. Based on this heuristic criterion, we refine the grid around the particle boundaries in several stages, see Fig. 1. It is clear that huge savings in terms of number of grid cells are possible in comparison to uniform grid refinement. Whether further savings are possible by employing a systematic error estimation still has to be examined (see also [BHR03]).
Figure 1. Left: A typical computational grid for the simulation of a single moving particle of circular shape (see also Sect. 3.1). Several refinement stages lead to a discretization which is both, economic and accurate. Right: A “selective” quadrature rule is used to integrate functions over a particle domain. A summed Newton-Cotes rule is employed for boundary cells, while for all other cells a standard Gauss formula of high order is used
An Adaptive Fictitious-Domain Method for Particulate Flows
115
Boundary approximation As mentioned in Sect. 2.2 (Remark 1), the projection step (6) is very cheap. On the other hand, the accuracy of the integration (7) cannot be expected to be high since the particle boundaries and the underlying grid do not match. Of course, our local refinement strategy increases the accuracy. Another important improvement can be achieved by using a “selective quadrature rule”: By a “boundary cell” we wish to understand a cell which has non-empty intersection with the of a particle. We want to integrate numeri4 4 boundary cally P f (x) dx = i Ki χP (x)f (x) dx, where {Ki } denotes the mesh cells of a triangulation. Since χP (x)f (x) is, in general, a non-smooth function on boundary cells, it makes no sense to use a quadrature rule of high order for these cells. So while standard Gauss formulae of high order are well-suited for non-boundary cells, it is better to employ a summed Newton-Cotes formula for boundary cells. This idea is also depicted in Fig. 1 and validated quantitatively in Table 1. Table 1. This table shows the discrete volume of a disk P with diameter d = 0.25. The exact value is (d/2)2 π ≈ 0.0490874. The result of the numerical integration of 4 1 dx with two different quadrature rules is shown: volhG (P ) refers to the use of a P standard Gauss rule of order 4 for all cells. volhΣ (P ) refers to the use of a summed tensor midpoint formula for boundary cells, cf. Fig. 1 h
volhG (P )
rel. error
volhΣ (P )
rel. error
2−4 0.0498047 1.4613e-02 0.0491562 1.4016e-03 2−5 0.0493164 4.6651e-03 0.0490866 1.6297e-05
Space discretization and solver details Until now we have not yet specified how we discretize and solve the PDE (5). The key ingredients of our solver technology shall be listed now: •
• • •
We use the Finite Element Method (FEM) on locally refined meshes consisting of quadrilateral cells. As elements for velocity and pressure we take equal-order elements, Q1 /Q1 or Q2 /Q2 , together with LPS pressure stabilization, cf. [BB01]. The nonlinear convective term is linearized by Newton’s method. The system is solved in a fully-coupled manner; the resulting saddle-point problems are solved by the General Minimal Residual Method (GMRES), preconditioned by geometric multigrid iteration. All simulations were done using the simulation package Gascoigne3D (http://www.gascoigne.de). The visualization of the results was done with HiVision ([BH06]).
116
S. B¨ onisch
3 Numerical Results In this section we present computational results using our adaptive fictitious-domain algorithm. We start with solving a well-known benchmark problem for a single moving particle (Sect. 3.1). Then we proceed to multiple particles (Sect. 3.2). The issue of non-smooth particle geometries is addressed in Sect. 3.3. 3.1 A benchmark problem To validate our adaptive scheme, we first consider a benchmark problem which has been used by several authors to test their computational techniques (see e.g. [PSJ00], [WT06]): We want to simulate the fall of a rigid circular disk in a bounded, rectangular cavity Ω filled with an incompressible Newtonian fluid. The setup of this problem is as follows: • • • • • • •
Ω = (0, 2) × (0, 6) The diameter of the disk is d = 0.25. At time t = 0 the disk is located at (1, 4). The fluid as well as the disk are at rest initially. The fluid is at rest at the boundary of the cavity. The fluid density is ρf = 1, the disk density is ρs = 1.25. The fluid viscosity is 0.1.
Figure 2 shows the temporal evolution of the y-component of the mass center of the disk and of the vertical component of the particle velocity. A quantitative comparison of our findings to the results described in [GPH01] is given in Table 2. We find a maximum Reynolds number of the particle which is very close to the reference value provided in [GPH01]. However, due to the use of adaptive mesh refinement, we achieve huge savings in terms of the number of needed grid cells.
5
2
4.5
1
4
0
3.5
–1 –2
v
y
3 2.5
–3
2
–4
1.5
–5
1
–6
0.5 0 0
0.1
0.2
0.3
0.4
t (sec)
0.5
0.6
0.7
0.8
–7
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
t (sec)
Figure 2. Left: y-component of the mass center of the disk vs. time; Right: vertical velocity component v vs. time
An Adaptive Fictitious-Domain Method for Particulate Flows
117
Table 2. Left: Maximum Reynolds number obtained by Glowinski et al. [GPH01]. (1) (2) Remax and Remax refer to two different numerical schemes, both employing globally refined meshes. Right: Maximum Reynolds number obtained with our adaptive method. The agreement with the reference value provided by Glowinski et al. [GPH01] is very good; only a fraction of mesh cells is needed # cells
(1)
(2)
Remax Remax
≈ 440,000 17.27 17.44 ≈ 780,000 17.31 17.51
# cells
Remax
≈ 12,000 17.62 ≈ 18,000 17.71
3.2 Multiple particles Two falling disks - “drafting,kissing,tumbling” We now consider the simplest situation involving more than one particle. We simulate the fall of two rigid circular disks in a bounded, rectangular cavity Ω. The setup of this problem is as follows (cf. [WT06]): • • • • •
Ω = (0, 2) × (0, 8) The diameter of the disks is d = 0.2. At time t = 0 the disks are located at (1, 7.2) and (1, 6.8), resp. The fluid density is ρf = 1, the disk density is ρs = 1.01. The fluid viscosity is 0.01.
Figure 3 shows the result of the simulation. The well-known phenomenon of “drafting,kissing and tumbling” is clearly reproduced (cf. e.g. [GPH01]). A quantitative comparison with [WT06] (not shown here) is also very promising. A proposal for a benchmark problem for more than two particles Unfortunately, to date there is no commonly accepted benchmark problem for more than two moving particles. In the following we wish to make a proposal for such a benchmark problem: We consider the free fall of 41 circular particles starting from rest. The geometry of the domain Ω and the initial positions of the particles are also shown in Fig. 4: 5 6 • Ω = [(0, 2) × (0, 6)]\ [(0, 23 ) × (4, 4 23 )] ∪ [( 43 , 2) × (4, 4 23 )] • The diameter of the disks is d = 0.1. • The fluid density is ρf = 1, the disk density is ρs = 1.25. • The fluid viscosity is 0.1. An interesting quantity to look at in order to compare different numerical approaches could be the following: Which of the particles is the first to hit the bottom of the cavity and when? In Fig. 4 the evolution of the y-component of the mass centers is depicted. In Fig. 5 the motion of the particles and of the fluid is shown.
118
S. B¨ onisch
Figure 3. The typical “drafting, kissing, tumbling” phenomenon can be observed (cf. [GPH01]): The lagging particle falls faster than the leading one, because it experiences less drag. The two particles touch (“kiss”). But since this is an unstable configuration in a Newtonian fluid, they separate and “tumble” apart
5 4.5 4 3.5
yc
3 2.5 2 1.5 1 0.5 0 0
1
2
3
4
5
6
7
8
t
Figure 4. Left: Sketch of the geometry and the initial particle positions for the proposed benchmark problem; Right: Temporal evolution of the y-component of the particle centers
3.3 Non-smooth particle geometries We wish to conclude this section by pointing out that the use of adaptivity is even more crucial when simulating the motion of non-smooth particles. Sharp corners are smeared out on the lengthscale of the local meshsize. Therefore it is advisable to refine very strongly around such corners. In Fig. 6 the result of a simulation of a moving triangle is shown. It would have been practically impossible to get such a good corner resolution with uniform grid refinement.
An Adaptive Fictitious-Domain Method for Particulate Flows
119
Figure 5. Top: Temporal evolution of the flow field and the particle positions. From left to right: t = 0.01: The particles begin to move. t = 0.25: The asymmetric initial setup clearly leads to an asymmetric solution. t = 1.0: The first particles pass the constriction. t = 2.55: The first particle (which one?) hits the ground. t = 4.0 and t = 8.5: The particles arrange at the bottom of the cavity. Bottom: Corresponding locally refined Finite Element grids
Figure 6. Top: The flow around a moving triangular particle is shown (left). A zoom on one of the sharp corners shows that the resolution of the highly non-smooth particle geometry is very satisfying (right). Bottom: Corresponding Finite Element grids. Local mesh refinement is crucial for the accurate resolution of the corners
120
S. B¨ onisch
4 Conclusion We have implemented an adaptive fictitious-domain method for accurate and economic direct numerical simulations of rigid particulate flows. Our scheme is based on the fast algorithmic realization of the stress-DLM method as introduced by Patankar, [Pat01]. The consequent use of adaptivity leads to huge savings in terms of CPU time, especially in the case of moderate particle volume fractions. The extension of the method to three-dimensional particulate flows is straightforward and currently in progress.
References [BB01]
Becker, R., Braack, M.: A Finite Element Pressure Gradient Stabilization for the Stokes Equations Based on Local Projections. Calcolo, 38(4), 173–199 (2001) [BHR03] B¨ onisch, S., Heuveline, V., Rannacher, R.: Numerical simulation of the free fall problem. In: Bock, H.G. et al. (eds.) Modeling, Simulation and Optimization of Complex Processes. Springer, Berlin Heidelberg New York (2005) [BH06] B¨ onisch, S., Heuveline, V.: Advanced Flow Visualization with HiVision. In: Rannacher, R. et. al. (eds.) Reactive Flows, Diffusion and Transport. Springer, Berlin, 2007 [DMN01] Diaz-Goano, C., Minev, P., Nandakumar, K.: A Lagrange multiplier/ fictitious-domain approach to particulate flows. In: Margenov, W. et al. (eds.) Lecture Notes in Computer Science. Springer, Berlin (2001) [GPH01] Glowinski, R., Pan, T., Hesla, T., Joseph, D., P´eriaux, J.: A fictitiousdomain approach to the direct numerical simulation of incompressible viscous flow past moving rigid bodies: application to particulate flow. J. Comp. Phys., 169, 363–426 (2001) [Pat01] Patankar, N.: A formulation for fast computations of rigid particulate flows. Center Turbul. Res., Ann. Res. Briefs, 185–196 (2001) [PSJ00] Patankar, N., Singh, P., Joseph D., Glowinski R., Pan, T.: A new formulation of the distributed Lagrange multiplier/fictitious-domain method for particulate flows. Int. J. Multiphase Flow, 26, 1509–1524 (2000) [SP05] Sharma, N., Patankar, N.: A fast computation technique for the direct numerical simulation of rigid particulate flows. J. Comp. Phys., 205, 439–457 (2005) [SHJ03] Singh, P., Hesla, T., Joseph, D.: Distributed Lagrange multiplier method for particulate flows with collision. Int. J. Multiphase Flow, 27, 1829–1858 (2001) [WT06] Wan, D., Turek, S.: Direct numerical simulation of particulate flow via multigrid FEM techniques and the fictitious boundary method. Int. J. Numer. Meth. Fluids, 51(5), 531–566 (2006)
Adaptive Sparse Grid Techniques for Data Mining H.-J. Bungartz, D. Pfl¨ uger, and S. Zimmer Department of Informatics, Technische Universit¨ at M¨ unchen, Boltzmannstraße 3, 85748 Garching, Germany {bungartz, pflueged, zimmer}@in.tum.de Abstract It was shown in [GaGT01] that the task of classification in data mining can be tackled by employing ansatz functions associated to grid points in the (often high dimensional) feature-space rather than using data-centered ansatz functions. To cope with the curse of dimensionality, sparse grids have been used. Based on this approach we propose an efficient finite-element-like discretization technique for classification instead of the combination technique used in [GaGT01]. The main goal of our method is to make use of adaptivity to further reduce the number of grid points needed. Employing adaptivity in classification is reasonable as the target function contains smooth regions as well as rough ones. Regarding implementational issues we present an algorithm for the fast multiplication of the vector of unknowns with the coefficient matrix. We give an example for the adaptive selection of grid points and show that special care has to be taken regarding the boundary values, as adaptive techniques commonly used for solving PDEs are not optimal here. Results for some typical classification tasks, including a problem from the UCI repository, are presented.
1 Introduction Due to technical and scientific progress an ever increasing amount of data can be created, collected and stored. Typical examples are medical datasets or datasets collected in e-commerce or via geological observations as in tsunami warning systems. The availability of vast datasets increases the need for efficient algorithms in the field of data mining that can handle large datasets and extract hidden patterns and retrieve unknown information. 1.1 Data Mining Our focus is on classification, the machine learning of a two-class problem, which plays an important role within the field of predictive modelling in data mining. Classification algorithms try to find a function that correctly assigns
122
H.-J. Bungartz et al.
data to one of several given classes. Given is a set of preclassified data, the training dataset S = {(xi , yi ) ∈ [0, 1]d × {−1, 1}}M i=1 . For a finite set of data points the feature space can always be normalized to fit in [0, 1]d . Each of the M training data points is assigned to one of the two class labels, +1 or −1. The aim is to compute a classificator or machine learner (ml) f : [0, 1]d → {−1, 1} based on the set of preclassified training data, which can be used to obtain class predictions applied on previously unseen data points. In generally, f can obtain any arbitrary real number, so a class prediction of greater or equal than zero is usually mapped to the positive class and a prediction of lower than zero to the negative one. The output of f for any given data point can then be considered as a measure for the confidence the ml has in its prediction: The higher the absolute value is, the higher is the confidence that the data point belongs to the corresponding class. To compute the ml we follow [GaGT01] and minimize the functional H[f ] =
M 1 (yi − f (xi ))2 + λ∇f 2L2 , M i=1
where (yi − f (xi ))2 is a cost or error function ensuring that the target function f is somehow close to the training dataset, ∇f 2L2 is the regularization operator or stabilizer incorporating the smoothness assumptions, and λ is the regularization parameter that controls the trade-off between approximation error and smoothness of f . 1.2 Discretization Common classification algorithms use mostly global ansatz functions associated to data points to reconstruct f , with the aim to reduce the number of ansatz functions needed. The draw-back is that they typically scale quadratic or even worse in the number of training data points. This does not impose a problem for applications where preclassifying data points is very expensive or hardly possible as this results in very small datasets for training anyway. However, there are many applications with plenty of training data – e.g. in e-commerce or engineering – where the correct classification of data points can be obtained automatically, say via observing the completion of user transactions. The idea therefore is to find a classification algorithm which is independent of the training dataset and which scales only linearly in the number of training data points. Additionally, many classification algorithms are restricted to learning certain kinds of problems. It is known for instance that
Adaptive Sparse Grid Techniques for Data Mining
123
some Neural Network topologies are incapable of separating complex structures as two intertwined spirals [Sing98], whereas it was shown in [GaGT01] and [Pfl¨ u05] that sparse grids can cope with completely different kinds of classification tasks. A promising approach, followed in [GaGT01], is to discretize the feature space and to use ansatz functions associated to grid points. A suitable basis {φi }N i=1 has to be introduced, and the problem is restricted to a finite dimensional space VN spanned by the basis functions, fN (x) =
N
αj φj (x).
j=1
In the following we consider the space of piecewise d-linear functions. Minimisation of H[f ] leads to a linear system with N unknowns λM C + B · B T α = By,
(1)
with Cij = (∇φi (x), ∇φj (x))L2 and Bij = φi (xj ), which can be solved iteratively, e.g. with the CG method. But one encounters the curse of dimensionality: Using a regular grid with n grid points in one dimension results in nd grid points for d dimensions. A straightforward discretization of space is therefore infeasible even when dealing with only low dimensional feature spaces and small values of n. 1.3 Sparse Grids Sparse grids cope with the curse of dimensionality and have been successfully applied in various fields of application [BuGr04]. A sparse grid needs far less grid points – only O(N log(N )d−1 ) rather than O(N d ) – with only slightly deteriorated accuracy. The underlying principle is a hierarchical formulation of the basis functions. We use the standard hierarchical basis # $ Φl := φl ,i : l ≤ l, i ≤ 2l − 1 ∧ i odd . with piecewise linear ansatz functions φl,i (x) := φ x · 2l − i and φ(x) := max(1−|x|, 0). Note that all basis functions on one level have pairwise disjoint supports and cover the whole domain. The hierarchical basis functions can be extended to d dimensions via a tensor product approach, and are defined as φl,i (x) :=
d 7
φlj ,ij (xj ).
j=1
l and i are multi-indices, indicating level and index of the underlying onedimensional hat functions for each dimension.
124
H.-J. Bungartz et al.
The basis
5 6 ΦWl := φl,i (x) : ij = 1, . . . , 2lj − 1, ij odd, j = 1, . . . , d
span subspaces Wl . Again, all basis functions have pairwise disjoint, equally sized supports and cover the whole domain. The space of d-linear functions Vn := Vn∞ with mesh width 2−n+1 can be written as a sum of subspaces Wl : Vn(∞) :=
n l1 =1
···
n
W(l1 ,...,ld ) =
ld =1
8
Wl .
|l|∞ ≤n
The hierarchical basis allows to choose subspaces according to their contribution to the approximation. This can be done by an a priori selection (1) [BuGr04], resulting for example in the space Vn : 8 Vn(1) := Wl . |l|1 ≤n+d−1
Figure 1 shows the tableau of subspaces in two dimensions for the sparse grid (1) V3 . As an alternative, adaptive methods for selecting grid points have also turned out to be comparatively easily implemented in various applications. In the next two sections we will focus on an efficient way of solving the system of linear equations and on the use of an adaptive creation of the underlying sparse grid.
(1)
V3
Figure 1. Scheme of subspaces and sparse grid for level n = 3, 2D
2 Solving the System of Linear Equations The solution of (1) can be approximated using the combination technique as shown in [GaGT01]. The classification problem is solved for multiple, but
Adaptive Sparse Grid Techniques for Data Mining
125
smaller regular grids. The solution is obtained by a linear combination of the partial solutions for those grids. The main advantage is that one can apply the whole machinery available for regular grids. Alternatively, the system can be solved for the sparse grid with a direct finite-element-like technique, but the overall system (λM C + BB T ) is everything but sparse. Iterative solvers have to be used, for instance a preconditioned Conjugated Gradient method. For efficiency reasons, the matrices should not be assembled directly. Only the application of the matrices to a vector should be implemented. The crucial part hereby is the application of the stiffness matrix C. This is known to be algorithmically complicated. With an efficient realization, one iteration scales only linearly in the number of training data points and in the number of grid points, respectively. Our focus is on the direct finite element technique, as this allows for adaptive grid generation. In the remaining part of this section we will present the main ideas of the application of the matrices and sketch the algorithmic difficulties and how to tackle them. 2.1 Application of B and B T The application of the matrices B and B T can be done in a straightforward way: B T is a (M × N )-matrix, with M being the number of training points and N the number of grid points (unknowns). It is (B T α)i =
N
φj (xi )αj ,
1 ≤ i ≤ M,
j=1
so for all training data points xi one has to descend in the tree of subspaces once. For each subspace Wl there can be only one nonzero basis function for the current data point xi . It can be identified and evaluated by a constant number of operations. Fixing the level for all but one dimension and varying the remaining one, the basis functions for d − 1 dimensions stay the same and do not have to be identified and evaluated multiple times. Additional operations can be saved if this is taken into consideration. The application of B T to a vector α can be done in O(M log N ). The multiplication of the matrix B with a vector α can be done analogously. 2.2 Application of C For the application of the stiffness matrix C to a vector α we will consider the one-dimensional case first. Let the coefficients αi be arranged in a binary tree, according to the hierarchy. Then the application of C can be split in a down-part (from the root towards the leaves) and in an up-part (towards
126
H.-J. Bungartz et al.
the root), each with one traversal of the binary tree, and therefore in O(N ) operations. Extending this to two dimensions we have products of up- and downprocesses for each dimension due to the tensor product construction of the basis functions. Let upi be the information propagation upwards in dimension i, downi downwards. Then we have to apply (up1 +down1 )(up2 +down2 ). This is where the main difficulties arise when we try to handle one dimension after the other. Figure 2 shows the information propagation between the basis functions associated to the grid points labeled with 1 and 2. Whereas the information of 1 can be propagated upwards to the common root node in x1 -direction and downwards in x2 -direction in the second step to node 2, this does not work for down1 up2 from node 2 to 1. Here three grid points are missing to store the propagated information intermediately.
Figure 2. Flow of information: (left) up1 down2 , 1 → 2, (right) down1 up2 , 2 → 1
Unfortunately, the missing grid points cannot be created on the fly as this would result in a full regular grid again. Instead, the up- and downdirections have to be reordered so that all propagations upwards are done before the first propagation downwards, regardless of the dimension. For ddimensional problems, we get a similar structure with up- and down-processes in each direction. This leads to an application of the matrix C to a vector which is, again, linear in the number of grid points. For further details of our implementation, see [Pfl¨ u05].
3 Adaptivity Common classification algorithms use mostly global basis functions centered according to the training dataset and hence depend algorithmically on it. Contrary, discretizing feature space provides the possibility to classify somehow independent of the training data. The advantage of scaling only linearly in the number of training points comes with a higher number of basis functions. The idea of adaptive sparse grid classification is to make use of both
Adaptive Sparse Grid Techniques for Data Mining
127
worlds. It seems to be promising to aim for an algorithm that scales only linearly regarding the cardinality of the training dataset, but still spends grid points especially in regions where it is most necessary. It is reasonable to apply adaptivity in classification in any case, as the target function contains smooth regions as well as rough ones. Applying adaptivity, we start with the regular sparse grid on level two, (1) V2 . We use a Conjugated Gradient method for a few iterations for training. For the following examples, we use a very simple refinement strategy which we adopted from solving differential equations: Grid points are refined according to the surplus – the contribution or computed coefficient – of the basis functions. But rather than refining all grid points with a surplus exceeding some fixed threshold, it is more useful to refine a certain percentage with the highest surplus out of those grid points that can be refined. A basic requirement of sparse grids is that for each single basis function in the structure of binary trees all of its possible ancestors exist. This has to be taken into account when creating new grid points. All missing ancestors have to be created recursively up to the root basis function. In the following, we show some observations for a two-dimensional classification problem first, as it can be visualized in contrast to high-dimensional ones. The benefit of employing adaptivity increases with growing dimensionality. For this we present some results for a real-world medical dataset. The first example, the so-called Ripley dataset, has been taken from [RiHj95]. The training dataset consists of 250 points. Additionally, a dataset with 1000 points for testing is provided. The Ripley dataset serves as a benchmark for classification algorithms as it is known to contain 8% of noise. 1 0.8 0.6 0.4 0.2 0
0.2
0.4
0.6
0.8
1
Figure 3. Ripley dataset: (left) training data, (right) adaptive grid and classification
Figure 3 shows the training dataset and the adaptive sparse grid together with the corresponding classification boundary. We used λ = 0.01 and refined the top 15% each. It can be seen that most grid points have been spent in the
128
H.-J. Bungartz et al.
critical region, the region with the most noise. The accuracy on the test set is 90.9% which is a very good result for a dataset containing 8% of noise. As a higher dimensional example we chose the Bupa liver dataset, obtained from the UCI Repository [NHBM98]. It contains real vital data of 345 patients, mainly blood test values, and describes symptoms for liver disorders. We used the same value for the regularisation parameter again, λ = 0.01. Table 1. Comparison of non-adaptive and adaptive sparse grids, Bupa liver dataset regular sparse grid # grid points max l # iterations 1 1 1 13 2 3 97 3 7 545 4 18 2561
5
10625
6
adaptive grid refinement acc. # grid points max l # iterations 58.0 59.7 13 2 3 61.4 77 3 8 66.1 243 4 17 655 4 28 35 72.8 1543 5 44 3981 5 93 65 81.7 8783 6 158
acc. 59.7 62.0 65.5 71.6 81.2 86.7 91.9
Table 1 clearly demonstrates the gain of using adaptivity. The left hand side of the table shows the classification for the regular sparse grids for level one to eight, the right hand side for the adaptive sparse grid. There we start (1) with V2 and continue with six times of refinement. The table shows the number of grid points involved, the maximum level of a basis function, the number of iterations of the diagonally preconditioned CG, and the accuracy obtained on the training data. The first time an accuracy of more than 81% is reached is for level six in the case of the regular sparse grids. This involves 10625 grid points and 65 iterations. Using the adaptive sparse grid is by far better: After only four times of refinement with merely 1543 basis functions and 44 iterations we already get an accuracy better than 81%.
4 Boundary Considerations

For the previous examples we did not use any grid points on the boundary – the functions we learned were all zero on the boundary of the feature space. Nevertheless, we got excellent results. In the remainder of this section we will point out why it is advantageous not to spend extra computational effort on the boundary in classification. To allow for boundary values other than zero, additional grid points can be placed on the boundary. But this results in a much larger number of unknowns, especially in higher dimensions. For example, the simple ten-dimensional sparse grid $V_2^{(1)}$ consists of 21 points. Employing grid points on the boundary
results in 452,709 points instead. To save precious computation time, one is better off without this large number of additional basis functions. As an alternative, the basis functions can be modified, especially those adjacent to the boundary. On the first level, we use the constant function 1. On all other levels we fold up the hat functions next to the boundary. This leads to the following basis functions in 1D; higher-dimensional functions are again constructed via the tensor product approach:
$$\varphi_{l,i}(x) := \begin{cases}
1 & \text{if } l = 1 \wedge i = 1,\\[2pt]
\begin{cases} 2 - 2^l \cdot x & \text{if } x \in \left[0, \tfrac{1}{2^{l-1}}\right]\\ 0 & \text{otherwise}\end{cases} & \text{if } l > 1 \wedge i = 1,\\[2pt]
\begin{cases} 2^l \cdot x + 1 - i & \text{if } x \in \left[1 - \tfrac{1}{2^{l-1}}, 1\right]\\ 0 & \text{otherwise}\end{cases} & \text{if } l > 1 \wedge i = 2^l - 1,\\[2pt]
\varphi\!\left(x \cdot 2^l - i\right) & \text{otherwise.}
\end{cases}$$
Figure 4 (left) shows the sparse grid for level two with additional grid points on the boundary. The common hat basis functions span a space of piecewise linear functions with a mesh width of $2^{-2}$. If the modified boundary functions are used, the same function space can be obtained by additionally using the basis functions adjacent to the boundary of the next higher level. This results in the same number of unknowns and in the grid on the right.
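A minimal Python sketch of the modified 1D basis function defined above (the function and argument names are chosen here for illustration only):

```python
def phi(x):
    """Standard 1D hat function."""
    return max(1.0 - abs(x), 0.0)

def phi_mod(l, i, x):
    """Modified 1D basis function of level l and odd index i on [0, 1].
    Level 1 carries the constant 1; on higher levels the functions adjacent
    to the boundary are 'folded up' towards the boundary."""
    if l == 1 and i == 1:
        return 1.0
    if l > 1 and i == 1:                       # left boundary function
        return 2.0 - 2**l * x if 0.0 <= x <= 1.0 / 2**(l - 1) else 0.0
    if l > 1 and i == 2**l - 1:                # right boundary function
        return 2**l * x + 1 - i if 1.0 - 1.0 / 2**(l - 1) <= x <= 1.0 else 0.0
    return phi(x * 2**l - i)                   # ordinary interior hat function
```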
Figure 4. $V_2^{(1)}$ with (left) common hat basis functions, (right) modified basis functions
Considering the modified basis, adaptivity can take care of the boundary functions: they are automatically created wherever needed. Classification runs showed that it is hardly necessary to modify the basis functions if there are no data points located exactly on the boundary, as in the case of the Ripley dataset, for example. Table 2 shows that there is almost no difference in the number of grid points created. But as the condition number of the matrix deteriorates when using the modified boundary functions, it takes more than twice as many iterations to reduce the norm of the residual below $10^{-8}$. If the datasets are normalized not to the unit hypercube $[0, 1]^d$ but to a slightly smaller region, then it suffices to take the normal hat function basis when classifying adaptively. This allows one to start with the 2d + 1 grid points of the sparse grid on level two, creating "boundary" grid points only when necessary.
Table 2. Conventional hat basis functions vs. modified boundary functions
hat functions:
# grid points | max l | # iterations | acc.
  5 | 2 |  5 | 89.9
 17 | 3 | 18 | 91.1
 49 | 4 | 35 | 91.0
123 | 5 | 50 | 91.0
263 | 6 | 64 | 90.7
506 | 7 | 85 | 90.8
866 | 7 | 96 | 90.7

modified boundary functions:
# grid points | max l | # iterations | acc.
  5 | 2 |   5 | 89.9
 17 | 3 |  20 | 91.0
 49 | 4 |  57 | 91.0
117 | 5 | 115 | 91.0
232 | 6 | 169 | 90.6
470 | 6 | 235 | 90.7
812 | 7 | 252 | 90.8
5 Summary

We presented an adaptive classification algorithm using sparse grids to discretize the feature space. The algorithmically hard part, the multiplication with the stiffness matrix, was sketched. The algorithm allows for the classification of large datasets as it scales only linearly in the number of training data points. Using adaptivity in sparse grid classification allows the number of grid points to be reduced significantly. An adaptive selection of grid points is especially useful for higher-dimensional feature spaces. Special care has to be taken regarding the boundary values. Sparse grids usually employ grid points located on the boundary. A vast number of grid points can be saved if those are omitted and if the datasets are instead normalized so that no data points are located on the boundary. This allows one to start with only 2d + 1 grid points; adaptivity takes care of the creation of grid points next to the boundary. Ongoing research includes investigations of other refinement criteria which are better suited for classification and of other basis functions to further improve adaptive sparse grid classification.
References

[BuGr04] H.-J. Bungartz and M. Griebel. Sparse grids. Acta Numerica, Volume 13, 2004, p. 147–269.
[GaGT01] J. Garcke, M. Griebel and M. Thess. Data Mining with Sparse Grids. Computing 67(3), 2001, p. 225–253.
[NHBM98] D. Newman, S. Hettich, C. Blake and C. Merz. UCI Repository of machine learning databases, 1998.
[Pflü05] D. Pflüger. Data Mining mit Dünnen Gittern. Diplomarbeit, IPVS, Universität Stuttgart, March 2005.
[RiHj95] B. D. Ripley and N. L. Hjort. Pattern Recognition and Neural Networks. Cambridge University Press, New York, NY, USA, 1995.
[Sing98] S. Singh. 2D spiral pattern recognition with possibilistic measures. Pattern Recogn. Lett. 19(2), 1998, p. 141–147.
On the Stochastic Geometry of Birth-and-Growth Processes. Application to Material Science, Biology and Medicine

Vincenzo Capasso

ADAMSS (Centre for Advanced Applied Mathematical and Statistical Sciences) and Department of Mathematics, Università degli Studi di Milano, via Saldini 50, 20133 Milano, Italy
[email protected]
Dedicated to Willi Jaeger on his 65th birthday

Abstract

Nucleation and growth processes arise in a variety of natural and technological applications, such as solidification of materials, semiconductor crystal growth, biomineralization (shell growth), tumor growth, vasculogenesis, and DNA replication. All these processes may be modelled as birth-and-growth processes (germ-grain models), which are composed of two processes, birth (nucleation, branching, etc.) and subsequent growth of spatial structures (crystals, vessel networks, etc.), which, in general, are both stochastic in time and space. These structures usually induce a random division of the relevant spatial region, known as a random tessellation. A quantitative description of the spatial structure of a random tessellation can be given in terms of random distributions à la Schwartz, and of their mean values, known as mean densities of interfaces (n-facets) of the random tessellation at different Hausdorff dimensions (cells, faces, edges, vertices), with respect to the usual d-dimensional Lebesgue measure. In all fields of application, predictive mathematical models which are capable of producing quantitative morphological features can contribute to the solution of optimization or optimal control problems. A nontrivial difficulty arises from the strong coupling of the kinetic parameters of the relevant birth-and-growth (or branching-and-growth) process with the underlying field, such as temperature, and with the geometric spatial densities of the evolving spatial random tessellation itself. Methods for reducing complexity include homogenization at mesoscales, thus leading to hybrid models (deterministic at the larger scale, and stochastic at lower scales); we bridge the two scales by introducing a mesoscale at which we may locally average the microscopic birth-and-growth model in the presence of a large number of grains.
The proposed approach also suggests methods of statistical analysis for the estimation of the mean geometric densities that characterize the morphology of a real system.
1 Introduction

Many processes of biomedical interest may be modelled as birth-and-growth processes (germ-grain models), which are composed of two processes, birth (nucleation, branching, etc.) and subsequent growth of spatial structures (cells, vessel networks, etc.), which, in general, are both stochastic in time and space. These structures induce a random division of the relevant spatial region, known as a random tessellation. A quantitative description of the spatial structure of a tessellation can be given in terms of the mean densities of interfaces (n-facets). In applications to material science, a main industrial interest is controlling the quality of the relevant final product in terms of its mechanical properties; as shown e.g. in [30], these are strictly related to the final morphology of the solidified material, so that quality control in this case means optimal control of the final morphology. In medicine, very important examples of birth-and-growth processes are mathematical models of tumor growth and of tumor-induced angiogenesis. In this context, the understanding of the principles and the dominant mechanisms underlying tumor growth is an essential prerequisite for identifying optimal control strategies, in terms of prevention and treatment. Predictive mathematical models which are capable of producing quantitative morphological features of developing tumors and blood vessels can contribute to this. A major difficulty derives from the strong coupling of the kinetic parameters of the relevant birth-and-growth (or branching-and-growth) process with various underlying fields, and with the geometric spatial densities of the existing tumor or capillary network itself. All these aspects induce stochastic time and space heterogeneities, thus motivating a more general analysis of the stochastic geometry of the process. The formulation of an exhaustive evolution model which relates all the relevant features of a real phenomenon, dealing with different scales and a stochastic domain decomposition at different Hausdorff dimensions, is a problem of high complexity, both analytical and computational. Methods for reducing complexity include homogenization at larger scales, thus leading to hybrid models (deterministic at the larger scale, and stochastic at smaller scales). The aim of this paper is to present an overview of a large set of papers produced by the international group coordinated by the author on this subject. By way of example we present a couple of simplified stochastic geometric models, for which we discuss how to relate the evolution of the mean geometric densities describing the morphology of the systems to the kinetic parameters of birth and growth.
In Section 2 the general structure of stochastic birth-and-growth processes is presented, introducing a basic birth process as a marked point process. In Section 3 a volume growth model is presented, which is of great interest in many problems of material science and has attracted a lot of attention because of its analytical complexity with respect to the geometry of the growth front (see e.g. [31, 47] and references therein). In many of the quoted applications it is of great importance to handle evolution equations of random closed sets of different (even though integer) Hausdorff dimensions. Following a standard approach in geometric measure theory, such sets may be described in terms of suitable measures. For a random closed set of lower dimension with respect to the environment space, the relevant measures induced by its realizations are singular with respect to the Lebesgue measure, and so their usual Radon-Nikodym derivatives are zero almost everywhere. In Sections 4 and 5 an original approach is presented, recently proposed by the author and his group, who have suggested coping with these difficulties by introducing random generalized densities (distributions) à la Dirac-Schwartz, for both the deterministic and the stochastic case. In the latter case we analyze mean generalized densities, and relate them to the densities of the expected values of the relevant measures. For the applications of our interest, the Delta formalism provides a natural framework for deriving evolution equations for mean densities at all (integer) Hausdorff dimensions, in terms of the local relevant kinetic parameters of birth and growth. In Section 6 a connection with the concept of the hazard function is offered, with respect to the survival of a point against its capture by the relevant growing phase. Section 7 shows how evolution equations for the mean densities of interfaces at all Hausdorff dimensions are obtained in terms of the kinetic parameters of the process, via the hazard function. In Sections 8 and 9 it is shown how to reduce the complexity of the problem, from both the analytical and the computational point of view, by taking into account the multiple scale structure of the system and deriving a hybrid model via a heuristic homogenization of the underlying field at the larger scale. Some numerical simulations are reported for the case of crystallization of polymers, together with a simplified problem of optimal control of the final morphology of the crystallized material.
2 Birth-and-Growth Processes

The set of Figures 1–11 shows a family of real processes from biology, medicine and material science. In a detailed description, all these processes can be modelled as birth-and-growth processes. In forest growth, births start from seeds randomly dispersed in a region of interest, and growth is due to nutrients
Figure 1. Candies or phtalate crystals?
Figure 2. Forest growth or crystallization process?
in the soil that may be randomly distributed themselves or driven by a fertilization procedure; in tumor growth abnormal cells are randomly activated and develop thanks to a nutritional underlying field driven by blood circulation (angiogenesis); in crystallization processes such as sea shells, polymer solidification, nucleation and growth may be due to a biochemical underlying field, to temperature cooling, etc.
Figure 3. Sea shell crystallization (from [50])
Figure 4. A simulation of the growth of a tumor mass coupled with a random underlying field (from [3])
Figure 5. Vascularization of an allantoid (from [27])
Figure 6. Angiogenesis on a rat cornea (from [26]) (left). A simulation of an angiogenesis due to a localized tumor mass (black region on the right) (from [25]) (right)
Figure 7. Response of a vascular network to an antiangiogenic treatment (from [33])
All these kinds of phenomena are subject to random fluctuations, together with the underlying field, either for intrinsic reasons or because of the coupling with the growth process.
Figure 8. A dragonfly wing
Figure 9. A real picture showing a spatial tessellation due to vascularization of a biological tissue : endothelial cells form a vessel network (from [44])
Figure 10. A schematic representation of a spherulite and an impingement phenomenon (labels: entangled interlamellar links, branch points)
2.1 The Birth Process – Nucleation or Branching

The birth process is modelled as a stochastic marked point process (MPP) N, defined as a random measure on $\mathcal{B}_{\mathbb{R}_+} \times E$ by
$$N = \sum_{j=1}^{\infty} \varepsilon_{T_j, X_j},$$
Figure 11. Real Experiment
Figure 12. Simulated Experiment; a Johnson-Mehl tessellation
where
• $\mathcal{E}$ denotes the σ-algebra of the Borel subsets of E, a bounded subset of $\mathbb{R}^d$, the physical space;
• $T_j$ is an $\mathbb{R}_+$-valued random variable representing the time of birth of the j-th nucleus;
• $X_j$ is an E-valued random variable representing the spatial location of the nucleus born at time $T_j$;
• $\varepsilon_{t,x}$ is the Dirac measure on $\mathcal{B}_{\mathbb{R}_+} \times E$ such that, for any $t_1 < t_2$ and $B \in \mathcal{E}$,
$$\varepsilon_{t,x}([t_1, t_2] \times B) = \begin{cases} 1 & \text{if } t \in [t_1, t_2],\ x \in B,\\ 0 & \text{otherwise.}\end{cases}$$
The (random) number of nuclei born during a time interval A, in the region B, is given by
$$N(A \times B) = \#\{j : T_j \in A,\ X_j \in B\}, \qquad A \in \mathcal{B}_{\mathbb{R}_+},\ B \in \mathcal{E}.$$
2.2 Stochastic Intensity

The stochastic intensity of the nucleation process provides the probability that a new nucleation event occurs in the infinitesimal region [x, x + dx] during the infinitesimal time interval [t, t + dt], given its past history $\mathcal{F}_{t^-}$ up to time t,
$$\nu(dx \times dt) := P[N(dx \times dt) = 1 \mid \mathcal{F}_{t^-}] = E[N(dx \times dt) \mid \mathcal{F}_{t^-}].$$
In many cases, such as in a crystallization process, we have volume growth models. If the nucleation events occur at $\{(T_j, X_j) \mid 0 \le T_1 \le T_2 \le \ldots\}$,
the crystalline phase at time t > 0 is described by a random set
$$\Theta^t = \bigcup_{T_j \le t} \Theta^t_j, \tag{1}$$
given by the union of all crystals born at times $T_j$ and locations $X_j$ and freely grown up to time t. In this case $\Theta^t$ has the same dimension d as the physical space. If we wish to impose that no further nucleation event may occur in the crystalline phase, we have to assume that the stochastic intensity is of the form
$$\nu(dx \times dt) = \alpha(x, t)\,(1 - \delta_{\Theta^{t^-}}(x))\,dt\,dx,$$
where $\delta_{\Theta^t}$ denotes the indicator function of the set $\Theta^t$, according to a formalism that will be discussed later; the term $(1 - \delta_{\Theta^{t^-}}(x))$ accounts for the fact that no new nuclei can be born at time t in a zone already occupied by the crystalline phase. The parameter α, also known as the free space nucleation rate, is a suitable real-valued measurable function on $\mathbb{R}_+ \times E$, such that $\alpha(\cdot, t) \in L^1(E)$ for all $t > 0$, and such that
$$0 < \int_0^T dt \int_E \alpha(x, t)\,dx < \infty \quad \text{for any } 0 < T < \infty,$$
and
$$\int_0^\infty dt \int_E \alpha(x, t)\,dx = \infty.$$
We will denote by $\nu_0(dx \times dt) = \alpha(x, t)\,dt\,dx$ the free space intensity. If α(x, t) is deterministic, this corresponds to a space-time (inhomogeneous) Poisson process in the free space.
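A minimal Python sketch of this free-space nucleation process, simulated by thinning a homogeneous space-time Poisson process (the rate function `alpha`, its bound `alpha_max` and the 2D window are illustrative assumptions; the additional thinning by the crystalline phase via $(1-\delta_{\Theta^{t^-}})$ is not included):

```python
import numpy as np

def simulate_nucleation(alpha, alpha_max, T, domain, rng=None):
    """Sample the free-space nucleation events on [0, T] x domain by thinning
    a homogeneous Poisson process of intensity alpha_max >= alpha(x, t)."""
    rng = np.random.default_rng() if rng is None else rng
    (x0, x1), (y0, y1) = domain
    volume = (x1 - x0) * (y1 - y0) * T
    n_candidates = rng.poisson(alpha_max * volume)
    births = []
    for _ in range(n_candidates):
        t = rng.uniform(0.0, T)
        x = np.array([rng.uniform(x0, x1), rng.uniform(y0, y1)])
        if rng.uniform(0.0, alpha_max) < alpha(x, t):   # thinning step
            births.append((t, x))
    births.sort(key=lambda b: b[0])                     # order by birth time
    return births

if __name__ == "__main__":
    # Example: a rate that decays in time and is higher near the origin.
    alpha = lambda x, t: 5.0 * np.exp(-t) * np.exp(-np.sum(x**2))
    germs = simulate_nucleation(alpha, alpha_max=5.0, T=2.0,
                                domain=((-1.0, 1.0), (-1.0, 1.0)))
    print(f"{len(germs)} nuclei generated")
```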
2.3 Angiogenic Processes

An angiogenic process can be modelled as a birth-and-growth process at dimension 1. Such processes are usually called fibre systems [7]. Now the marked counting process modelling the birth process, call it M, refers to the offspring of a new capillary from an already existing vessel, i.e. from a point of the stochastic fibre system $\Theta^t$, so that the branching rate is given by
$$\mu(dl \times dt) = P(M(dl \times dt) = 1) = \beta(x, t)\,\delta_{\Theta^{t^-}}(x)\,dl\,dt.$$
This shows the dependence of the branching rate upon the existing stochastic fibre system $\Theta^{t^-}$, and the fact that the point of birth will belong to one of its infinitesimal elements dl. This case, which may provide a model for the stochastic branching of vessels illustrated in Fig. 6, requires a much more complicated growth model, as do many other so-called fibre processes, or fibre systems; for the modelling aspects we refer to [25, 48], and to [7] for the (also nontrivial) statistical aspects.
3 A Volume Growth Model

Volume growth models have been studied extensively since the pioneering work in [35]. The classical Kolmogorov-Avrami-Evans theory for isothermal crystallization relied heavily on the growth rate being constant, so that (in the absence of impingement) individual crystals are of spherical shape; the same is true whenever the growth rate depends upon time only. In the case of non-homogeneous growth, i.e., if the growth rate G depends on both space and time, the shape of a crystal is no longer a ball centered at the origin of growth. For arbitrary growth rates, we assume the following:

Assumption 1 (Minimal-time Principle) [14]
The curve along which a crystal grows, from its origin to any other point, is such that the time needed is minimal. A crystal at time t is given as the union of all growth lines. Figure 13 shows a simulation of non-homogeneous crystal growth, for which we used a typical parabolic temperature profile (i.e., the solution of the heat equation without latent heat) and data for the growth rate obtained from measurements of isotactic polypropylene (i-PP).

The growth model

We assume that the growth of a nucleus occurs with a nonnegative normal velocity G(x, t), i.e., the velocity of boundary points is determined by
$$V = G\,\mathbf{n} \quad \text{on } \partial\Big(\bigcup_j \Theta^j_t\Big), \tag{2}$$
where $\mathbf{n}$ is the unit outer normal. We shall consider growth from a spherical nucleus of infinitesimal radius $R \to 0$. Unless stated otherwise, we shall assume that G is bounded and continuous on $E \times [0, T]$ with
$$g_0 := \inf_{x \in E,\, t \in [0,T]} G(x, t) > 0, \qquad G_0 := \sup_{x \in E,\, t \in [0,T]} G(x, t). \tag{3}$$

Figure 13. Crystal shapes in a typical temperature field

Figure 14. A simulated crystallization process (panels at TIME = 2 sec and TIME = 9 sec)
Moreover, we assume that G is Lipschitz-continuous with respect to the space variable x. As a consequence, given a nucleation event at time $t_0$ and location $x_0$, the corresponding grain $\Theta(t; x_0, t_0)$, freely grown up to time $t > t_0$, is given by [13]
$$\Theta(t; x_0, t_0) = \{x \in E \mid \exists\, \xi \in W^{1,\infty}([t_0, t]) : \xi(t_0) = x_0,\ \xi(t) = x,\ |\dot{\xi}(s)| \le G(\xi(s), s),\ s \in (t_0, t)\} \tag{4}$$
for $t \ge t_0$, and $\Theta(t; x_0, t_0) = \emptyset$ for $t < t_0$. In this case we may indeed claim that the whole crystalline phase is given by (1). The following definition will be useful later; we denote by $\dim_H$ and $H^n$ the Hausdorff dimension and the n-dimensional Hausdorff measure, respectively.

Definition 1. [24] Given an integer n, such that 0 ≤ n ≤ d, we say that a subset A of $\mathbb{R}^d$ is n-regular if it satisfies the following conditions:
(i) A is $H^n$-measurable;
(ii) $H^n(A) > 0$ and $H^n(A \cap B_r(0)) < \infty$, for any r > 0;
(iii) $\displaystyle \lim_{r \to 0} \frac{H^n(A \cap B_r(x))}{b_n r^n} = \begin{cases} 1 & H^n\text{-a.e. } x \in A,\\ 0 & \forall\, x \notin A,\end{cases}$
where $b_n$ is the volume of the unit ball in $\mathbb{R}^n$.

Remark 1. Note that condition (iii) is related to a characterization of the $H^n$-rectifiability of the set A [29].

Theorem 1. [8] Subject to the initial condition that each initial germ is a spherical ball of infinitesimal radius, and under suitable regularity of the growth field G(t, x), each grain $\Theta^t_{t_0}(x_0)$ is such that the following inclusion holds:
$$\Theta^s_{t_0}(x_0) \subset \Theta^t_{t_0}(x_0), \ \text{for } s < t, \quad \text{with} \quad \partial\Theta^s_{t_0}(x_0) \cap \partial\Theta^t_{t_0}(x_0) = \emptyset, \ \text{for } s < t.$$
Moreover, for almost every $t \in \mathbb{R}_+$, $\Theta^t_{t_0}(x_0)$ is a d-regular closed set, and $\partial\Theta^t_{t_0}(x_0)$ is a (d − 1)-regular closed set. As a consequence, $\Theta^t_{t_0}(x_0)$ and $\partial\Theta^t_{t_0}(x_0)$ satisfy
$$\lim_{r \to 0} \frac{H^d(\Theta^t_{t_0}(x_0) \cap B_r(x))}{b_d r^d} = 1 \quad \text{for } H^d\text{-a.e. } x \in \Theta^t_{t_0}(x_0),$$
$$\lim_{r \to 0} \frac{H^{d-1}(\partial\Theta^t_{t_0}(x_0) \cap B_r(x))}{b_{d-1} r^{d-1}} = 1 \quad \text{for } H^{d-1}\text{-a.e. } x \in \partial\Theta^t_{t_0}(x_0),$$
where $B_r(x)$ is the d-dimensional open ball centered at x with radius r, and $b_n$ denotes the volume of the unit ball in $\mathbb{R}^n$. Further, we assume that G(t, x) is sufficiently regular so that, at almost any time t > 0, the following holds:
$$\lim_{r \to 0} \frac{H^d\big((\Theta^t_{t_0}(x_0)_{\oplus r} \setminus \Theta^t_{t_0}(x_0)) \cap A\big)}{r} = H^{d-1}(\partial\Theta^t_{t_0}(x_0) \cap A),$$
for any A ∈ BRd such that Hd−1 (∂Θtt0 (x0 ) ∩ ∂A) = 0, where we have denoted by F⊕r the parallel set of F at distance r ≥ 0 (i.e. the set of all points x ∈ Rd with distance from F at most r) (see e.g. [37, 45]).
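As an illustration of the minimal-time principle and of the grain definition (4), the following Python sketch computes discrete arrival times on a pixel grid by a Dijkstra-type scheme. It is only a rough discretization under assumed grid spacing and, for simplicity, a time-independent speed field; it is not the solver used by the authors:

```python
import heapq
import numpy as np

def arrival_times(G, h, sources):
    """Dijkstra-type approximation of the minimal arrival time tau(x): the
    shortest travel time from any source to x with local speed G.
    G: 2D array of speeds on a grid with spacing h,
    sources: list of (i, j, t_birth) nucleation sites."""
    tau = np.full(G.shape, np.inf)
    heap = []
    for i, j, t0 in sources:
        tau[i, j] = t0
        heapq.heappush(heap, (t0, i, j))
    while heap:
        t, i, j = heapq.heappop(heap)
        if t > tau[i, j]:
            continue
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ni, nj = i + di, j + dj
            if 0 <= ni < G.shape[0] and 0 <= nj < G.shape[1]:
                # travel time across one cell, using the harmonic-mean speed
                dt = h * 2.0 / (G[i, j] + G[ni, nj])
                if t + dt < tau[ni, nj]:
                    tau[ni, nj] = t + dt
                    heapq.heappush(heap, (t + dt, ni, nj))
    return tau

# The grain grown up to time t from a germ born at (i0, j0) at time t0 is then
# approximated by {(i, j) : tau[i, j] <= t}.
```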
4 Closed Sets as Distributions – The Deterministic Case

In order to pursue our analysis of the germ-grain model associated with a birth-and-growth process, we find it convenient to represent n-regular closed sets in $\mathbb{R}^d$ as distributions à la Dirac-Schwartz, in terms of their "geometric densities". Consider an n-regular closed set $\Theta_n$ in $\mathbb{R}^d$. Then we have
$$\lim_{r \to 0} \frac{H^n(\Theta_n \cap B_r(x))}{b_n r^n} = \begin{cases} 1 & H^n\text{-a.e. } x \in \Theta_n,\\ 0 & \forall\, x \notin \Theta_n,\end{cases}$$
but
$$\lim_{r \to 0} \frac{H^n(\Theta_n \cap B_r(x))}{b_d r^d} = \lim_{r \to 0} \frac{H^n(\Theta_n \cap B_r(x))}{b_n r^n} \cdot \frac{b_n r^n}{b_d r^d} = \begin{cases} \infty & H^n\text{-a.e. } x \in \Theta_n,\\ 0 & \forall\, x \notin \Theta_n.\end{cases}$$
By analogy with the delta function $\delta_{x_0}(x)$ associated with a point $x_0$, for $x \in \mathbb{R}^d$ we define the generalized density of $\Theta_n$ as
$$\delta_{\Theta_n}(x) := \lim_{r \to 0} \frac{H^n(\Theta_n \cap B_r(x))}{b_d r^d}.$$
The density $\delta_{\Theta_n}(x)$ (the delta function of the set $\Theta_n$) can be seen as a linear functional defined by a measure, in a similar way as the classical delta function $\delta_{x_0}(x)$ of a point $x_0$. Define
$$\mu_{\Theta_n}(A) := (\delta_{\Theta_n}, 1_A) := H^n(\Theta_n \cap A), \qquad A \text{ bounded in } \mathcal{B}_{\mathbb{R}^d}.$$
In accordance with the usual representation of distributions in the theory of generalized functions, for any function $f \in C_c(\mathbb{R}^d, \mathbb{R})$, the space of continuous functions with compact support, we may formally write
$$(\delta_{\Theta_n}, f) := \int_{\mathbb{R}^d} f(x)\,\mu_{\Theta_n}(dx) = \int_{\mathbb{R}^d} f(x)\,\delta_{\Theta_n}(x)\,dx.$$
5 Stochastic Geometry

A relevant aspect of stochastic geometry is the analysis of the spatial structure of objects which are random in location and shape. Given a random object Σ in $\mathbb{R}^d$, a first quantity of interest is, for example, the probability that a point x belongs to Σ or, more generally, the probability that a compact set K intersects Σ. The theory of Choquet-Matheron [39, 46] shows that it is possible to assign a unique probability law $P_\Sigma$ associated with a RACS (random closed set) Σ in $\mathbb{R}^d$ on the measurable space $(\mathbb{F}, \sigma_{\mathbb{F}})$ of the family of closed sets in $\mathbb{R}^d$,
endowed with the σ-algebra generated by the hit-or-miss topology, by assigning its hitting functional $T_\Sigma$. Given a probability space (Ω, A, P), a RACS Σ is a measurable function Σ : (Ω, A) → $(\mathbb{F}, \sigma_{\mathbb{F}})$. The hitting functional of Σ is defined as
$$T_\Sigma : K \in \mathcal{K} \longmapsto P(\Sigma \cap K \neq \emptyset),$$
where $\mathcal{K}$ denotes the family of compact sets in $\mathbb{R}^d$. Actually we may consider the restriction of $T_\Sigma$ to the family of closed balls $\{B_\varepsilon(x);\ x \in \mathbb{R}^d,\ \varepsilon \in \mathbb{R}_+ \setminus \{0\}\}$. We shall denote by $E_\Sigma$, or simply by E, the expected value with respect to the probability law $P_\Sigma$.

5.1 Closed sets as distributions – The stochastic case

Suppose now that $\Theta_n$ is an n-regular random closed set in $\mathbb{R}^d$ on a suitable probability space (Ω, F, P), with $E[H^n(\Theta_n \cap B_r(0))] < \infty$ for all r > 0. As a consequence, $\mu_{\Theta_n}$, defined as above, is a random measure, and correspondingly $\delta_{\Theta_n}$ is a random linear functional. Consider the linear functional $E[\delta_{\Theta_n}]$ defined on $C_c(\mathbb{R}^d, \mathbb{R})$ by the measure
$$E[\mu_{\Theta_n}](A) := E[H^n(\Theta_n \cap A)],$$
i.e. by
$$(E[\delta_{\Theta_n}], f) = \int_{\mathbb{R}^d} f(x)\,E[\delta_{\Theta_n}](x)\,dx := \int_{\mathbb{R}^d} f(x)\,E[\mu_{\Theta_n}](dx),$$
for any $f \in C_c(\mathbb{R}^d, \mathbb{R})$. It can be shown that the expected linear functional $E[\delta_{\Theta_n}]$ so defined is such that, for any $f \in C_c(\mathbb{R}^d, \mathbb{R})$, $(E[\delta_{\Theta_n}], f) = E[(\delta_{\Theta_n}, f)]$, which corresponds to the expected linear functional à la Gelfand-Pettis. For a discussion about the measurability of $(\delta_{\Theta_n}, f)$ we refer to [6, 38, 51]. Note that, even though for any realization $\Theta_n(\omega)$ the measure $\mu_{\Theta_n}(\omega)$ may be singular, the expected measure $E[\mu_{\Theta_n}]$ may be absolutely continuous with respect to $\nu^d$, having classical Radon-Nikodym density $E[\delta_{\Theta_n}]$. It is then of interest to say whether or not a classical mean density can be introduced for sets of lower Hausdorff dimension, with respect to the usual Lebesgue measure on $\mathbb{R}^d$. In order to respond to this further requirement, in [23] we have introduced the following definition.
Definition 2. Let Θ be a random closed set in $\mathbb{R}^d$ with $E[H^{\dim_H(\partial\Theta)}(\partial\Theta)] > 0$. We say that Θ is absolutely continuous if and only if
$$E[H^{\dim_H(\partial\Theta)}(\partial\Theta \cap \cdot\,)] \ll \nu^d(\cdot) \tag{5}$$
on $\mathcal{B}_{\mathbb{R}^d}$, where $\dim_H$ denotes the Hausdorff dimension.

Remark 2. We are assuming that the random set Θ is sufficiently regular so that, if $\dim_H(\Theta) = d$, then $\dim_H(\partial\Theta) = d - 1$, while if $\dim_H(\Theta) = s < d$, then $\partial\Theta = \Theta$ and $E[H^{\dim_H(\partial\Theta)}(\partial\Theta)] < \infty$; thus (5) becomes:
$$E[H^{d-1}(\partial\Theta \cap \cdot\,)] \ll \nu^d(\cdot) \ \text{ if } \dim_H(\Theta) = d, \qquad E[H^{s}(\Theta \cap \cdot\,)] \ll \nu^d(\cdot) \ \text{ if } \dim_H(\Theta) = s < d.$$
It is easy to check that the definition above is consistent with the case in which Θ is a random variable or a random point in $\mathbb{R}^d$. For n = d, it is easily seen that $\delta_{\Theta_d}(x) = 1_{\Theta_d}(x)$, $\nu^d$-a.s., which directly implies $E[\delta_{\Theta_d}](x) = P(x \in \Theta_d)$, $\nu^d$-a.s. The density $V_V(x) := E[\delta_{\Theta_d}](x) = P(x \in \Theta_d)$ is known in material science as the (degree of) crystallinity. The complement to 1 of the crystallinity is known as the porosity, $p_x = 1 - V_V(x) = P(x \notin \Theta_d)$. When the RACS $\Theta_d$ is absolutely continuous according to the definition above, then the mean surface density $S_V(x) := E[\delta_{\partial\Theta_d}](x)$ is well defined, too, as a classical function (see Fig. 15).
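As a simple illustration of how $V_V$ can be estimated in practice (cf. the estimates in Fig. 15), the following Python sketch performs a Monte Carlo estimate of $V_V(x) = P(x \in \Theta_d)$ on a grid, for a toy stationary Boolean model of discs. The model and all parameters are illustrative assumptions, not the estimator used for the figure:

```python
import numpy as np

def estimate_vv(n_realizations, intensity, radius, grid_n=50, rng=None):
    """Monte Carlo estimate of the crystallinity VV(x) on a grid in the unit
    square, for a Boolean model of discs with Poisson germs of the given
    intensity and deterministic radius. (Here VV = 1 - exp(-pi r^2 intensity).)"""
    rng = np.random.default_rng() if rng is None else rng
    xs = (np.arange(grid_n) + 0.5) / grid_n
    X, Y = np.meshgrid(xs, xs)
    hits = np.zeros((grid_n, grid_n))
    for _ in range(n_realizations):
        lo, hi = -radius, 1.0 + radius          # enlarged window against edge effects
        n_germs = rng.poisson(intensity * (hi - lo) ** 2)
        gx = rng.uniform(lo, hi, n_germs)
        gy = rng.uniform(lo, hi, n_germs)
        covered = np.zeros((grid_n, grid_n), dtype=bool)
        for cx, cy in zip(gx, gy):
            covered |= (X - cx) ** 2 + (Y - cy) ** 2 <= radius ** 2
        hits += covered
    return hits / n_realizations
```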
6 The Hazard Function

In the dynamical case, such as a birth-and-growth process, the RACS $\Theta^t$ may depend upon time, so that a second question arises: when is a point $x \in E$ reached (captured) by the growing stochastic region $\Theta^t$; or, vice versa, for how long does a point $x \in E$ survive capture? In this respect the degree of crystallinity (now also depending on time) $V_V(x, t) = P(x \in \Theta^t)$ may be seen as the probability of capture of the point $x \in E$ by time t > 0. In this sense the complement to 1 of the crystallinity, also known as the porosity,
$$p_x(t) = 1 - V_V(x, t) = P(x \notin \Theta^t), \tag{6}$$
represents the survival function of the point x at time t, i.e. the probability that the point x is not yet covered by the random set Θt .
Figure 15. An estimate of SV and VV on a planar (simulated) sample of copper-tungsten alloy (from [32])
Figure 16. Capture of a point x during time ∆t
With reference to the growing RACS Θt we may introduce the (random) time τ (x) of survival of a point x ∈ E with respect to its capture by Θt , such that px (t) = P (τ (x) > t). In order to relate these quantities to the kinetic parameters of the process, we follow Kolmogorov [35] by introducing the concept of causal cone. Definition 3. The causal cone C(x, t) of a point x at time t is the set of points (y, s) in the time space R+ × E such that a crystal born in y at time s covers the point x by time t
Figure 17. The causal cone of point x at time t
$$C(x, t) := \{(y, s) \in E \times [0, t] \mid x \in \Theta(t; y, s)\},$$
where we have denoted by $\Theta^t_s(y)$ the crystal born at $y \in E$ at time $s \in \mathbb{R}_+$ and observed at time $t \ge s$. Some information on the properties of the boundaries, in the sense of geometric measure theory, has been obtained in [8] for a freely grown crystal Θ(t; y, s).

Proposition 1. For almost every t > s, the set Θ(t; y, s) has finite nontrivial Hausdorff measure $H^d$; its boundary ∂Θ(t; y, s) has finite nontrivial Hausdorff measure $H^{d-1}$.

From the theory of Poisson processes, it is easily seen that
$$P(N(C(x, t)) = 0) = e^{-\nu_0(C(x, t))},$$
where $\nu_0(C(x, t))$ is the volume of the causal cone with respect to the intensity measure of the Poisson process,
$$\nu_0(C(x, t)) = \int_{C(x, t)} \alpha(y, s)\,d(y, s).$$
The following result holds [13] for the time derivative of the measure $\nu_0(C(x, t))$.

Proposition 2. Let the standard assumptions on the nucleation and growth rates be satisfied. Then $\nu_0(C(x, t))$ is continuously differentiable with respect to t and
$$\frac{\partial}{\partial t} \nu_0(C(x, t)) = G(x, t) \int_0^t dt_0 \int_{\mathbb{R}^d} dx_0\, K(x_0, t_0; x, t)\,\alpha(x_0, t_0) \tag{7}$$
with
$$K(x_0, t_0; x, t) := \int_{\{z \in \mathbb{R}^d \mid \tau(x_0, t_0; z) = t\}} da(z)\,\delta(z - x).$$
Here δ is the Dirac function, da(z) is a (d − 1)-dimensional surface element, and $\tau(x_0, t_0; z)$ is the solution of the eikonal problem
$$\left|\frac{\partial \tau}{\partial x_0}(x_0, t_0, x)\right| = \frac{1}{G(x_0, t_0)}\,\frac{\partial \tau}{\partial t_0}(x_0, t_0, x), \qquad \left|\frac{\partial \tau}{\partial x}(x_0, t_0, x)\right| = \frac{1}{G(x, \tau(x_0, t_0, x))},$$
subject to suitable initial and boundary conditions. Let us suppose that the growth rate of a crystal depends only upon the point (x, t) under consideration, and not, for example, upon the age of the crystal. In this case, under our modelling assumptions,
$$p_x(t) = P(x \notin \Theta^t) = P(N(C(x, t)) = 0) = e^{-\nu_0(C(x, t))}.$$
Thanks to Proposition 2, $\nu_0(C(x, t))$ is continuously differentiable with respect to t, so that the hazard function, defined as the rate of capture by the process $\Theta^t$, i.e.
$$h(x, t) = \lim_{\Delta t \to 0} \frac{P(x \in \Theta^{t+\Delta t} \mid x \notin \Theta^t)}{\Delta t},$$
is well defined and given by
$$h(x, t) = -\frac{\partial}{\partial t} \ln p_x(t) = \frac{\partial}{\partial t} \nu_0(C(x, t));$$
hence the time of capture τ(x) is an absolutely continuous random variable, having probability density function
$$f_x(t) = p_x(t)\,h(x, t).$$
Since
$$f_x(t) = \frac{d}{dt}(1 - p_x(t)) = \frac{\partial V_V(x, t)}{\partial t},$$
we immediately obtain
$$\frac{\partial V_V(x, t)}{\partial t} = (1 - V_V(x, t))\,h(x, t).$$
This is an extension of the well-known Avrami-Kolmogorov formula [4, 35], proven for a very specific space- and time-homogeneous birth-and-growth process; our expression instead holds whenever a mean volume density and a hazard function are well defined.
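For instance, in the classical space- and time-homogeneous case with constant nucleation rate α and constant growth rate G (free spherical growth in $\mathbb{R}^d$), the causal cone and its measure can be written down explicitly; the following short computation, included here only as an illustrative special case, recovers the classical formula:
$$C(x, t) = \{(y, s) \in \mathbb{R}^d \times [0, t] : |y - x| \le G\,(t - s)\}, \qquad \nu_0(C(x, t)) = \alpha \int_0^t b_d\,[G\,(t - s)]^d\,ds = \frac{\alpha\, b_d\, G^d\, t^{d+1}}{d + 1},$$
so that
$$V_V(t) = 1 - p_x(t) = 1 - \exp\!\left(-\frac{\alpha\, b_d\, G^d\, t^{d+1}}{d + 1}\right).$$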
Consider now the extended birth-and-growth process which evolves in such a way that germs are born with birth rate α(x, t) and grains grow with growth rate G(x, t), independently of each other, i.e. ignoring overlapping of germs and grains; under the above-mentioned regularity assumptions on these parameters, which make the free crystals absolutely continuous, the following quantities are well defined a.e.

Definition 4. We call mean extended volume density at point x and time t the quantity $V_{ex}(x, t)$ such that, for any $B \in \mathcal{B}_{\mathbb{R}^d}$,
$$E\Big[\sum_{T_j \le t} \nu^d\big(\Theta(t; X_j, T_j) \cap B\big)\Big] = \int_B V_{ex}(x, t)\,\nu^d(dx).$$
It represents the mean of the sum of the volume densities, at time t, of the grains freely born and grown [18]. Correspondingly, we can define an extended surface density:

Definition 5. We call mean local extended surface density at point x and time t the quantity $S_{ex}(x, t)$ such that, for any $B \in \mathcal{B}_{\mathbb{R}^d}$,
$$E\Big[\sum_{T_j \le t} \nu^{d-1}\big(\partial\Theta(t; X_j, T_j) \cap B\big)\Big] = \int_B S_{ex}(x, t)\,\nu^d(dx).$$
It represents the mean of the sum of the surface densities, at time t, of the grains supposed freely born and grown.

Theorem 2. Under the previous modelling assumptions on birth and growth, if the growth field makes (4) well defined, the following equality holds:
$$\nu_0(C(x, t)) = V_{ex}(x, t).$$
Thanks to the above theorem,
$$h(x, t) = -\frac{\partial}{\partial t} \ln p_x(t) = \frac{\partial}{\partial t} \nu_0(C(x, t)) = \frac{\partial}{\partial t} V_{ex}(x, t),$$
so that we also have
$$\frac{\partial}{\partial t} V_V(x, t) = (1 - V_V(x, t))\,\frac{\partial}{\partial t} V_{ex}(x, t). \tag{8}$$
By Proposition 2,
$$\frac{\partial}{\partial t} \nu_0(C(x, t)) = G(x, t) \int_0^t dt_0 \int_{\mathbb{R}^d} dx_0\, K(x_0, t_0; x, t)\,\alpha(x_0, t_0).$$
Consequently,
$$\frac{\partial}{\partial t} V_{ex}(x, t) = G(x, t) \int_0^t dt_0 \int_{\mathbb{R}^d} dx_0\, K(x_0, t_0; x, t)\,\alpha(x_0, t_0), \tag{9}$$
and
$$h(x, t) = G(x, t) \int_0^t dt_0 \int_{\mathbb{R}^d} dx_0\, K(x_0, t_0; x, t)\,\alpha(x_0, t_0).$$
On the other hand, from the results in [24] we may claim that, for any individual crystal $\Theta^t_j := \Theta(t; X_j, T_j)$, the following evolution equation holds, relating its mean volume and surface densities:
$$\frac{\partial}{\partial t} E[\delta_{\Theta^t_j}](x) = G(t, x)\,E[\delta_{\partial\Theta^t_j}](x). \tag{10}$$
By linearity arguments, considering all crystals individually born and grown independently of each other, we get
$$\frac{\partial}{\partial t} V_{ex}(x, t) = G(x, t)\,S_{ex}(x, t). \tag{11}$$
A comparison of equations (9) and (11) yields the interesting expression
$$S_{ex}(x, t) = \int_0^t dt_0 \int_{\mathbb{R}^d} dx_0\, K(x_0, t_0; x, t)\,\alpha(x_0, t_0) = \frac{h(x, t)}{G(x, t)}, \tag{12}$$
i.e. the available free surface can be described directly in terms of the hazard function and the growth rate (and vice versa). Using (8) and (11) we finally have
$$\frac{\partial}{\partial t} V_V(x, t) = (1 - V_V(x, t))\,\frac{\partial}{\partial t} V_{ex}(x, t) = (1 - V_V(x, t))\,G(x, t)\,S_{ex}(x, t). \tag{13}$$
When referring to the actual birth-and-growth process, i.e. to the actual volume density $V_V(t, x)$ and surface density $S_V(t, x)$, we have the following [24].

Theorem 3. Under the assumptions above, the mean densities satisfy the following evolution equation (in weak form):
$$\frac{\partial}{\partial t} E[\delta_{\Theta^t}(x)] = G(t, x)\,E[\delta_{\partial\Theta^t}(x)].$$
As a consequence of the previous theorem we can state that (in weak form)
$$\frac{\partial}{\partial t} V_V(t, x) = G(t, x)\,S_V(t, x).$$
More generally, we may study the case when a compact set K ⊂ E is reached by the invading stochastic region $\Theta^t$.
In this respect we may introduce the (random) hitting time τ(K) of a compact set K ⊂ E by $\Theta^t$. It is such that the corresponding survival function is given by
$$S_K(t) := P(\tau(K) > t) = P(\Theta^t \cap K = \emptyset) = 1 - T_{\Theta^t}(K),$$
where $T_{\Theta^t}(K) = P(\Theta^t \cap K \neq \emptyset)$ is the so-called hitting functional of the RACS $\Theta^t$. Correspondingly, a hazard function h(K, t) can be defined as the hitting rate of the process $\Theta^t$, i.e.
$$h(K, t) = \lim_{\Delta t \to 0} \frac{P(\Theta^{t+\Delta t} \cap K \neq \emptyset \mid \Theta^t \cap K = \emptyset)}{\Delta t}.$$
Under sufficient regularity,
$$h(K, t) = \frac{1}{1 - T_{\Theta^t}(K)} \lim_{\Delta t \to 0} \frac{T_{\Theta^{t+\Delta t}}(K) - T_{\Theta^t}(K)}{\Delta t} = -\frac{d}{dt} \ln\big(1 - T_{\Theta^t}(K)\big).$$
Since $T_{\Theta^0}(K) = 0$, it follows that
$$T_{\Theta^t}(K) = 1 - \exp\left(-\int_0^t h(K, s)\,ds\right).$$
This expression allows an estimation of the hitting functional by means of an estimation of the hazard function. Finally, we may observe that, whenever we can estimate the hitting functional $T_{\Theta^t}(K)$ directly for a sufficiently large family of compact sets K, we may claim to know the stochastic structure of the RACS $\Theta^t$ (see e.g. [19]).
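The hazard-based relations of this section lend themselves to a simple numerical check. The following Python sketch, given a hazard rate sampled on a time grid (here a hypothetical Avrami-type rate), compares the closed-form survival expression $V_V = 1 - \exp(-\int_0^t h\,ds)$ with a forward-Euler integration of $\partial V_V/\partial t = (1 - V_V)\,h$; it is only an illustration of the consistency of the two formulas, not a solver for the full model:

```python
import numpy as np

def volume_density_from_hazard(h, t_grid):
    """Compare closed-form and ODE-based computation of VV from a hazard h(t)."""
    dt = np.diff(t_grid)
    # cumulative integral of h by the trapezoidal rule
    H = np.concatenate(([0.0], np.cumsum(0.5 * (h[1:] + h[:-1]) * dt)))
    vv_closed = 1.0 - np.exp(-H)

    vv_euler = np.zeros_like(t_grid)
    for k in range(len(t_grid) - 1):
        vv_euler[k + 1] = vv_euler[k] + dt[k] * (1.0 - vv_euler[k]) * h[k]
    return vv_closed, vv_euler

if __name__ == "__main__":
    t = np.linspace(0.0, 2.0, 2001)
    h = 3.0 * t**2            # hypothetical Avrami-type hazard (constant alpha, G, d = 2)
    vv_c, vv_e = volume_density_from_hazard(h, t)
    print("max difference:", np.abs(vv_c - vv_e).max())
```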
7 Mean Densities of Stochastic Tessellations

A random subdivision of space needs further information to be characterized. For example, in the birth-and-growth process described above we may include an additional feature known as impingement, by assuming that grains stop growing at the points of contact of their growth fronts. In this case the spatial region in $\mathbb{R}^d$ in which the process occurs is divided into cells (a random Johnson-Mehl tessellation [34, 42]; see also [43]), and interfaces (n-facets, n = 0, 1, 2, . . . , d) at different Hausdorff dimensions (cells, faces, edges, vertices) appear (for a planar process, see Figure 18). As above, we may describe the tessellation quantitatively by means of the mean densities of the n-facets with respect to the d-dimensional Lebesgue measure [42]. We may call a cell of a random tessellation any element of a family of RACS's partitioning the region E in such a way that any two distinct elements
of the family have empty intersection of their interiors. It is clear that this last definition may also be used in the static (time-independent) case. Let us now introduce a rigorous concept of "interface" at different Hausdorff dimensions.

Definition 6. An n-facet at time t (0 ≤ n ≤ d) is the non-empty intersection of m + 1 cells, with m = d − n.

Note that in the previous definition
• d = dimension of the space in which the tessellation takes place;
• n = Hausdorff dimension of the interface under consideration;
• m + 1 = number of cells that form such an interface;
• for n = d, a d-facet is simply a cell
(see also Figure 18).

Consider now the union of all n-facets at time t, $\Xi_n(t)$. For any Borel set B in $\mathbb{R}^d$ one can define the mean n-facet content of B at time t as the measure
$$M_{d,n}(t, B) = E_\Xi[\lambda_n(B \cap \Xi_n(t))], \tag{14}$$
where $\lambda_n$ is the n-dimensional Hausdorff measure (coinciding with the n-dimensional Lebesgue measure $\nu^n$ for n = d, d − 1). Note that, with the previous definitions, $\Xi_d(t) \equiv \Xi(t)$, so that $M_{d,d}(t, B)$ is the d-dimensional volume of the portion of the set B occupied by cells at time t. Suppose that the kinetic parameters of the birth-and-growth process are such that $M_{d,n}$ admits a density $\mu_{d,n}(t, x)$ with respect to $\nu^d$, the standard d-dimensional Lebesgue measure on $\mathbb{R}^d$, i.e. for any Borel set B
$$M_{d,n}(t, B) = \int_B \mu_{d,n}(t, x)\,dx, \tag{15}$$
then the following definition is meaningful.
Figure 18. n−facets for a tessellation in IR2
Definition 7. The function $\mu_{d,n}(t, x)$ defined by (15) is called the local mean n-facet density of the (incomplete) tessellation at time t.

In particular, $\mu_{d,d}(t, x)$ is the mean local volume density of the occupied region at time t, and $\mu_{d,d-1}(t, x)$ is the surface density of the cells. It is still an open problem, in general, to obtain evolution equations for these densities. Under sufficient regularity conditions for the birth-and-growth model analyzed above, the following evolution equations can be obtained [40, 42]:
$$\frac{\partial}{\partial t} \mu_{d,n}(x, t) = c_{d,n}\,(1 - V_V(t, x))\,(G(x, t))^{-m}\,\frac{h_{m+1}(x, t)}{(m + 1)!}, \tag{16}$$
where $c_{d,n}$ is a constant depending only on the space dimension, and
$$h_k(x, t) = (h(x, t))^k, \quad \text{for } k = 2, 3, \ldots;$$
$h_1(x, t)$ has a different expression.
8 Interaction with an Underlying Field

In most real cases spatial heterogeneities are induced by the dependence of the kinetic parameters of the birth-and-growth process upon an underlying field φ(x, t) (chemicals, nutrients, etc.):
$$G(x, t) = G(\varphi(x, t)), \qquad \alpha(x, t) = \tilde{\alpha}(\varphi(x, t)).$$
Vice versa, the birth-and-growth process may induce a source term in the evolution equation of the underlying field:
$$\frac{\partial}{\partial t} \varphi = \operatorname{div}(\kappa \nabla \varphi) + g[\rho, (\delta_{\Theta^t})_t], \quad \text{in } \mathbb{R}_+ \times E,$$
subject to suitable boundary and initial conditions. Here $(\delta_{\Theta^t})_t$ denotes the (distributional) time derivative of the indicator function $\delta_{\Theta^t}$ of the growing region $\Theta^t$ at time t, and ρ denotes some relevant parameter or family of parameters (note that the above equation has to be understood in a weak sense). The parameters in the evolution equation of the underlying field may also depend upon the evolving "phase", i.e., if $\rho_1$ and $\kappa_1$ denote the parameters in the growing mass and $\rho_2$ and $\kappa_2$ those in the "empty" space, one should write
$$\rho = \delta_{\Theta^t}\,\rho_1 + (1 - \delta_{\Theta^t})\,\rho_2, \qquad \kappa = \delta_{\Theta^t}\,\kappa_1 + (1 - \delta_{\Theta^t})\,\kappa_2.$$
This equation is now a random differential equation, since all parameters and the source term depend upon the stochastic geometric process $\Theta^t$. A direct
consequence is the stochasticity of the underlying field and, vice versa, the stochasticity of the kinetic parameters. This strong coupling between the underlying field and the birth-and-growth process makes the previous theory for the hazard function, and consequently for the evolution of mean geometric densities, not directly applicable, since the kinetic parameters of the process are now themselves stochastic.

Multiple scales and hybrid models

For many practical tasks, the stochastic models presented above, which are able to describe the full process, are too sophisticated. On the other hand, in many practical situations multiple scales can be identified. As a consequence, it suffices to use averaged quantities at the larger scale, while still using stochastic quantities at the lower scales. Using averaged quantities at the larger scale is convenient both from a theoretical and from a computational point of view. Under typical conditions, we may assume that the typical scale for diffusion of the underlying field (macroscale) is much larger than the typical grain size (microscale). This allows one to approximate the full stochastic model by a hybrid system. Under these conditions a mesoscale may be introduced, which is sufficiently small with respect to the macroscale of the underlying field and sufficiently large with respect to the typical grain size. In particular, this means that the substrate may be considered approximately homogeneous at this mesoscale. A typical size $x_{meso}$ on this mesoscale satisfies $x_{micro} \ll x_{meso} \ll x_{macro}$, where $x_{micro}$ and $x_{macro}$ are typical sizes for single grains and for the field diffusion. This typical feature of the process is illustrated in the figure.
Figure 19. Typical scales in the birth-and-growth process
It makes sense to consider a (numerical) discretization of the whole space into subregions $B_i$, $i = 1, \ldots, L$, at the level of the mesoscale, i.e. small enough that the spatial variation of the underlying field φ inside $B_i$ may be neglected; essentially this corresponds to approximating the contribution due to the growth process by its local mean value, i.e. by the mean rate of phase change in the equation for φ,
$$\delta_{\Theta^t}(x) \approx E[\delta_{\Theta^t}(x)] = V_V(x, t)$$
(for a more rigorous discussion on this item we refer to [20]). For the parameters, we take the corresponding averaged quantities
$$\tilde{\rho}(x, t) = V_V(x, t)\,\rho_1 + (1 - V_V(x, t))\,\rho_2, \qquad \tilde{\kappa}(x, t) = V_V(x, t)\,\kappa_1 + (1 - V_V(x, t))\,\kappa_2.$$
If we now substitute all stochastic quantities in the equation for the underlying field by their corresponding mean values, we obtain an initial-boundary value problem for a parabolic partial differential equation:
$$\frac{\partial}{\partial t} \tilde{\varphi} = \operatorname{div}(\tilde{\kappa} \nabla \tilde{\varphi}) + g\Big[\tilde{\rho}, \frac{\partial}{\partial t} V_V\Big], \quad \text{in } E \times \mathbb{R}_+,\ d = 1, 2, 3,$$
supplemented by suitable boundary and initial conditions. We have expressed the essential difference between the (generalized) functions $\delta_{\Theta^t}$ and φ and their deterministic counterparts in the averaged equations by using $V_V$ and $\tilde{\varphi}$. In order to solve the above equation we need to provide an evolution equation for the mean volume density $V_V(x, t)$. We may notice that the above system now provides a deterministic field $\tilde{\varphi}(x, t)$ in $E \times \mathbb{R}_+$. Once we approximate φ with its deterministic counterpart $\tilde{\varphi}$, we are given deterministic fields for the kinetic parameters,
$$\alpha(x, t) = \tilde{\alpha}(\tilde{\varphi}(x, t)), \qquad G(x, t) = G(\tilde{\varphi}(x, t)).$$
With these parameters the birth-and-growth process is now again stochastically simple. We may then refer to the previous theory to obtain the hazard function h(x, t) and, consequently, the evolution equations for the mean geometric densities, in terms of these fields α(x, t) and G(x, t). In particular we re-obtain the required evolution equation (13) for the volume density,
$$\frac{\partial V_V(x, t)}{\partial t} = (1 - V_V(x, t))\,G(x, t)\,S_{ex}(x, t),$$
subject to trivial initial conditions.
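A very rough Python caricature of the resulting hybrid time-stepping is sketched below: the mean volume density is advanced through the hazard function (assumed to be supplied as a closure, e.g. precomputed from α and G via (12)), while the averaged field obeys a 1D diffusion equation with a source proportional to the local rate of phase change. All names, the explicit Euler scheme and the coupling constant are illustrative assumptions, not the authors' numerical method:

```python
import numpy as np

def hybrid_simulation(phi0, hazard_of, kappa, latent, dx, dt, n_steps):
    """1D caricature of the hybrid (mesoscale-averaged) model.
    phi0      : initial averaged underlying field on a 1D grid,
    hazard_of : callable (phi, t) -> hazard h(x, t)  [assumed given],
    kappa     : diffusivity of the averaged field,
    latent    : coupling constant in front of dVV/dt in the field equation."""
    phi = phi0.copy()
    vv = np.zeros_like(phi0)
    for k in range(n_steps):
        t = k * dt
        h = hazard_of(phi, t)
        dvv_dt = (1.0 - vv) * h                 # evolution equation for VV
        lap = np.zeros_like(phi)                # Laplacian, homogeneous Neumann b.c.
        lap[1:-1] = (phi[2:] - 2 * phi[1:-1] + phi[:-2]) / dx**2
        phi = phi + dt * (kappa * lap + latent * dvv_dt)
        vv = np.clip(vv + dt * dvv_dt, 0.0, 1.0)
    return phi, vv
```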
For $S_{ex}(x, t)$ we will use the expression obtained above for the evolution with deterministic fields G and α. This approach is called "hybrid", since we have substituted the stochastic underlying field φ(x, t) given by the full system by its "averaged" counterpart $\tilde{\varphi}(x, t)$. One should check that the hybrid system is fully compatible with the rigorous derivation of the evolution equation for $V_V$. In fact, once we substitute in the equation for φ the deterministic volume density $V_V$ and the deterministic averaged parameters, we obtain a linear equation for φ; we may apply the expectation operator and easily obtain the given equation for $\tilde{\varphi}$. We refer to [13] for further details on a mathematical theory of the averaged model.
9 Numerical Simulations

As a concrete example, the outcomes of numerical simulations are shown below for the case of the crystallization of a polymer melt coupled with a temperature field, obtained by applying a cooling temperature at the boundary of a bounded domain:
$$\frac{\partial}{\partial t}\big(\tilde{c}\,\tilde{\rho}\,\tilde{T}\big) = \operatorname{div}\big(\tilde{\kappa}\,\nabla \tilde{T}\big) + \frac{\partial}{\partial t}\big(h\,\tilde{\rho}\,V_V\big), \quad \text{in } E \times \mathbb{R}_+,\ d = 1, 2, 3,$$
supplemented by the boundary condition
$$\frac{\partial \tilde{T}}{\partial n} = \beta\big(\tilde{T} - T_{out}\big) \tag{17}$$
on $\partial E \times \mathbb{R}_+$, and the initial conditions
$$V_V = 0, \qquad \tilde{T} = T^0 \quad \text{in } E \times \{0\},$$
usually with $T^0(x) \ge T_m$ for all $x \in E$. In the numerical simulations crystallization is considered in a rectangular domain whose length is twice its width ($\Omega = (0, L) \times (0, 2L)$). We performed simulations using a uniform temperature $T_{out}$. For all material parameters we used measurements for isotactic polypropylene [8, 9].

9.1 Inverse problems

Inverse problems regarding the functional dependence of the kinetic parameters of the crystallization process upon an underlying temperature field have been analyzed in [10].
Figure 20. Setup of the numerical example
Figure 21. Temperature after 10 minutes. Because of the symmetry, only the upper part of the rectangle ((0, L) × (L, 2L)) is plotted
Figure 22. Degree of crystallinity after 5 and 15 minutes (panels labelled t = 2 sec, t = 4 sec, t = 8 sec, t = 16 sec)
9.2 Optimal control of the final morphology

Again as an example, we may refer to [11], where we have considered the evolution equation of the (d − 1)-facet density (for the cases d = 2, 3 only; consider the hybrid model, for which the kinetic parameters depend upon a
Figure 23. Plot of log(γ), where γ is the mean density of crystals interfaces in R2 , after 5 minutes
deterministic temperature field), so that we may use the following evolution equation for the (d − 1)-interface density:
$$\frac{\partial}{\partial t} \mu_{d,d-1}(x, t) = \mathrm{const}\,[h(x, t)]^2\,(1 - V_V(x, t))\,[G(x, t)]^{-1} = \mathrm{const}\,G(x, t)\,(1 - V_V(x, t))\,S_{ex}^2(x, t).$$
If we denote $\gamma(x, t) = \mu_{d,d-1}(x, t)$, we may introduce the following cost functional:
$$C(\gamma, q) := -\int_E \gamma(x, \tau_f)\,dx \quad \text{(many and small crystals)}$$
$$\qquad\qquad + \frac{k}{2} \int_E |\nabla \gamma(x, \tau_f)|^2\,dx \quad \text{(spatial uniformity)}$$
$$\qquad\qquad + \alpha\,D(q - q^*) \quad \text{(cooling strategy)},$$
where $\tau_f$ denotes the final time, and D is a norm or seminorm, a measure of the "distance" between the temperature cooling profile q and the chosen cooling strategy $q^*$. The above system is subject to the physical limitations
$$-a \le \frac{\partial q}{\partial t} \le 0, \quad \text{on } \partial E \times (0, \tau_f),$$
and to the required final solidification
$$V_V(x, \tau_f) \ge 1 - b, \quad \text{in } E.$$
Acknowledgments It is a pleasure to acknowledge useful discussions with many Colleagues, at different Universities and Research Centres. Particular thanks are due to Professor Martin Burger (Linz), Professor Alessandra Micheletti (Milan), Dr. Elena Villa (Milan), and Dr. Giacomo Aletti (Milan). The warm hospitality of the Austrian Academy of Sciences at RICAM (Radon Institute for Computational and Applied Mathematics) in Linz, chaired by Professor Heinz Engl, is acknowledged with pleasure.
References 1. Ambrosio, L., Capasso, V., Villa, E.: On the approximation of geometric densities of random closed sets. RICAM Report N. 2006-14, Linz, Austria (2006). 2. Ambrosio, L., Fusco, N., Pallara, D.: Functions of Bounded Variation and Free Discontinuity Problems. Clarendon Press, Oxford (2000). 3. Anderson, A. R. A.: Effects of cell adhesion on solid tumour geometry, In: Sekimura,T. et al(eds) Morphogenesis and Pattern Formation in Biological Systems. Springer-Verlag, Tokyo (2003). 4. Avrami, M.: Kinetics of phase change. Part I, J. Chem. Phys. 7, 1103–1112 (1939). 5. Baddeley, A. J.: A crash course in stochastic geometry. In: BarndorffNielsen, O. E. et al. (eds) Stochastic Geometry. Likelihood and Computation. Chapman & Hall/CRC, Boca Raton (1999). 6. Baddeley, A. J., Molchanov, I. S.: On the expected measure of a random set. In: Proceedings of the International Symposium on Advances in Theory and Applications of Random Sets (Fontainebleau, 1996). World Sci. Publishing, River Edge, NJ, (1997), 3–20. 7. Beneˇs, V., Rataj, J.: Stochastic Geometry. Kluwer, Dordrecht (2004). 8. Burger, M.: Growth fronts of first-order Hamilton-Jacobi equations. SFB Report 02-8, J. Kepler University, Linz, Austria (2002). 9. Burger, M., Capasso, V.: Mathematical modelling and simulation of nonisothermal crystallization of polymers. Mathematical Models and Methods in Applied Sciences, 6, 1029-1053 (2001). 10. Burger, M., Capasso, V., Engl, H.: Inverse problems related to crystallization of polymers. Inverse Problems, 15, 155-173 (1999). 11. Burger, M., Capasso, V., Micheletti, A.: Optimal Control of Polymer Morphologies. Journal of Engineering Mathematics, 49, 339-358 (2004). 12. Burger, M., Capasso, V., Micheletti, A.: An extension of the KolmogorovAvrami formula to inhomogeneous birth-and-growth processes. Math Everywhere, Part I, Springer, 63-76 (2007). 13. Burger, M., Capasso, V., Pizzocchero, L.: Mesoscale averaging of nucleation and growth models. SIAM J. on Multiscale Modeling and Simulation, 5, 564–592 (2006). 14. Burger, M., Capasso, V., Salani, C.: Modelling multi-dimensional crystallization of polymers in interaction with heat transfer, Nonlinear Analysis: Real World Application, 3, 139–160 (2002).
15. Capasso, V. (ed): Mathematical Modelling for Polymer Processing. Polymerization, Crystallization, Manufacturing. Mathematics in Industry, Vol. 2, SpringerVerlag, Heidelberg (2003). 16. Capasso, V., Micheletti, A.: Local spherical contact distribution function and local mean densities for inhomogeneous random sets. Stochastics and Stoch. Rep., 71, 51–67(2000). 17. Capasso, V., Micheletti, A., Eder, G.: Polymer crystallization processes and incomplete Johnson-Mehl tessellations. In: Arkeryd,L., Bergh,J., Brenner, P., R. Petterson (eds) Proceedings of ECMI98. B.G. Teubner Stuttgart, Leipzig (1999), 130–137. 18. Capasso, V., Micheletti, A.: Stochastic Geometry of Spatially Structured Birthand-Growth Processes. Application to Crystallization Processes. In: Merzbach, E. (ed) Topics in Spatial Processes. Lecture Notes in Mathematics, Vol. 1802, Springer-Verlag, Heidelberg (2003). 19. Capasso, V., Micheletti, A.: Stochastic geometry and related statistical problems in Biomedicine. In: A. Quarteroni et al (eds) Complex Systems in Biomedicine. Springer, Milano (2006). 20. Capasso, V., Morale, D., Salani, C.: Polymer crystallization processes via many particle systems. In: Capasso,V. (ed) Mathematical Modelling for Polymer Processing. Polymerization, Crystallization, Manufacturing. Springer-Verlag, Heidelberg (2003). 21. Capasso, V., Salani, C.: Stochastic-birth-and-growth processes modelling crystallization of polymers with spatially heterogeneous parameters. Nonlinear Analysis: Real World Application, 1, 485–498 (2000). 22. Capasso, V., Villa, E.: Survival functions and contact distribution functions for inhomogeneous stochastic geometric marked point processes, Stoch. Anal. Appl., 23, 79–96 (2005). 23. Capasso, V., Villa, E.: Continuous and absolutely continuous random sets. Stoch. Anal. Appl., 24, 381–397 (2006). 24. Capasso, V., Villa, E.: On the geometric densities of random closed sets, 2005. RICAM Report 13/2006, Linz, Austria. To appear on Stoch. Anal. Appl. (2008). 25. Chaplain, M. A. J., Anderson, A. R. A.: Modelling the growth and form of capillary networks. In: Chaplain, M.A.J. et al (eds) On Growth and Form. Spatiotemporal Pattern Formation in Biology. John Wiley & Sons, Chichester (1999). 26. Corada, M., Zanetta, L., Orsenigo, F., Breviario, F., Lampugnani, M. G., Bernasconi, S., Liao, F., Hicklin, D. J., Bohlen, P., and Dejana, E.: A monoclonal antibody to vascular endothelial-cadherin inhibits tumor angiogenesis without side effects on endothelial permeability. Blood. 100, 905-911 (2002). 27. Crosby, C. V., Fleming, P., Zanetta, L., Corada, M., Giles, B., Dejana, E., Drake, C.: VE-cadherin is essential in the de novo genesis of blood vessels (vasculogenesis) in the allantoids. Blood 105, 2771-2776 (2005). 28. Eder, G.: Mathematical modelling of crystallization processes as occurring in polymer processing. Nonlinear Analysis 30 3807-3815 (1997). 29. Falconer, K. J.: The Geometry of Fractal Sets. Cambridge University press, Cambridge (1985). 30. Friedman, L. H., Chrzan, D. G.: Scaling theory of the Hall-Petch relation for multilayers. Phys. Rev. Letters, 81, 2715–2718 (1998). 31. Friedman, A., Velasquez, J. L.: A free boundary problem associated to crystallization of polymers. Indiana Univ. Math. Journal 50, 1609–1650 (2001).
32. Hahn, U., Micheletti, A., Pohlink, R., Stoyan, D., Wendrock, H.: Stereological Analysis and Modeling of Gradient Structures. J. of Microscopy, 195, 113-124 (1999). 33. Jain, R. K., Carmeliet, P. F.: Vessels of Death or Life. Scientific American 285, 38-45 (2001). 34. Johnson, W. A., Mehl, R. F.: Reaction Kinetics in processes of nucleation and growth. Trans. A.I.M.M.E., 135, 416–458 (1939). 35. Kolmogorov, A. N.: On the statistical theory of the crystallization of metals. Bull. Acad. Sci. USSR, Math. Ser. 1, 355–359 (1937). 36. Kolmogorov, A.N.: Foundations of the Theory of Probability. Second English edition, Chelsea Publishing Company, New York (1956). 37. Lorenz, T.: Set valued maps for image segmentation. Comput. Visual. Sci., 4, 41–57 (2001). 38. Matheron, G.: Les Variables Regionalis´ees et leur Estimation. Masson et Cie, Paris (1965). 39. Matheron, G.: Random Sets and Integral Geometry. John Wiley & Sons, New York, 1975. 40. Meijering, J. L.: Interface area, edge length, and number of vertices in crystal aggregates with random nucleation. Philips Res. Rep. , 8, 270–290 (1953). 41. Micheletti, A.: Mathematical modelling and simulation of polymer crystallization processes. Nonlinear Analysis, 47, 1761–1772 (2001). 42. Møller, J.: Random Johnson-Mehl tessellations. Adv. Appl. Prob., 24, 814–844 (1992). 43. Møller, J.: Lectures on Random Voronoi Tessellations. Lecture Notes in Statistics, Springer-Verlag, New York (1994). 44. Serini, G. et al: Modeling the early stages of vascular network assembly. EMBO J., 22, 1771-1779 (2003). 45. Sokolowski, J., Zolesio, J.-P.: Introduction to Shape Optimization. Shape Sensitivity Analysis. Springer, Berlin (1992). 46. Stoyan, D., Kendall, W.S., Mecke, J.: Stochastic Geometry and its Application. John Wiley & Sons, New York (1995). 47. Su, B.: Weak solutions os a polymer crystal growth model. Preprint, 2006. 48. Sun, S. et al: Nonlinear behaviors of capillary formation in a deterministic angiogenesis model. Nonlinear Analysis, 63, e2237-e2246 (2005). 49. Thompson, D.W.: On Growth and Form. Cambridge University Press, Cambridge, (1917). 50. Ubukata, T.: Computer modelling of microscopic features of molluscan shells. In: Sekimura, T. et al (eds) Morphogenesis and Pattern Formation in Biological Systems. Springer-Verlag, Tokyo (2003), 355-368. 51. Z¨ ahle, M.: Random processes of Hausdorff rectifiable closed sets. Math. Nachr., 108, 49–72 (1982).
Inverse Problem of Lindenmayer Systems on Branching Structures

Somporn Chuai-Aree¹,³, Willi Jäger¹, Hans Georg Bock¹, and Suchada Siripant²

¹ Interdisciplinary Center for Scientific Computing (IWR), University of Heidelberg, Im Neuenheimer Feld 368, 69120 Heidelberg, Germany
[email protected], [email protected], [email protected], [email protected]
² Advanced Virtual and Intelligent Computing (AVIC), Chulalongkorn University, Phayathai, Bangkok 10330, Thailand
[email protected]
³ Faculty of Science and Technology, Prince of Songkla University, Muang, Pattani 94000, Thailand
[email protected]
Abstract

Lindenmayer systems (L-systems) have been used to generate and describe geometrical structures, for example branching structures and graph structures, in both biology and medicine. An L-system consists of a number of iterations n, an initial string ω and a set of production rules P. The production rules consist of a predecessor a and a successor χ and are written in the form a ←− χ. Traditionally, the production rules have been defined and analyzed from the real structure by manual structure decomposition. The rules are compiled and transformed to represent the 2D or 3D structure. However, complicated structures are not easy to decompose, and obtaining such production rules is time-consuming. In this paper, we propose an algorithm to solve this problem automatically from 2D input images, given initial pixels or voxels. The data can be acquired from a 2D image scanner, a camera, a CT scanner or MRI. Region and volume growing methods are applied to bound the target object. The skeletonization process is an important part of our reconstruction. The L-systems are reconstructed to represent the structure from a 2D input image or from sliced images of volume data.
1 Introduction In recent years there has been significant interest in using graph-based abstractions of skeletons for qualitative shape recognition. Such methods have found wide application in biology, medicine, image recognition and related fields. There are many different methods to reconstruct branching structures from input data. Some methods are very sensitive to noise. Methods based on Voronoi techniques in Russ [Rus95] preserve topology, but heuristic
pruning measures are introduced to remove unwanted edges. Methods based on Euclidean distance functions in [Rus95] can localize skeletal points accurately, but often at the cost of altering the object's topology. In this paper we introduce a region growing method formulated as a finite difference scheme. It is easy to implement and extracts the target object reliably. Lindenmayer systems (L-systems) have been widely used for describing geometrical structures, e.g. tree-like structures and network structures. Production rules are defined by decomposing a complex structure into many simple components. Figure 1 shows the diagram of L-systems for describing the plant structure in Chuai-Aree et al. [CJB05a]. The inverse problem of L-systems is to reconstruct the branching structure from a 2D input image or from the sliced images of volume data and to express it as a set of production rules.
Figure 1. The transformation of L-systems, structure and its inverse problem
This paper is organized into nine parts as follows: Lindenmayer systems, the flow diagram of reconstruction of branching structure, anisotropic diffusion filtering, region and volume growing method, skeletonization process, construction of branching structure, resolution reduction, experiment and results, and conclusion and further works, respectively.
2 Lindenmayer Systems (L-systems) Lindenmayer systems (L-systems) were first introduced by Aristid Lindenmayer in 1968 as a mathematical theory of plant development. They have attracted the attention of computer scientists, who investigated them through formal language theory. Specialists in computer graphics, particularly Prusinkiewicz, have used L-systems to produce realistic images of trees, bushes and flowers; some of these images are well illustrated in Mech et al. [Mech96]. For a three dimensional movement, a component is free to move in any X, Y,
or Z direction. Hence, there are three directional angles in this case. The initial directional angles ax, ay, az of the first unit movement are set with respect to the X, Y and Z axes, respectively. The directional angles of the other unit movements are computed in a similar fashion to the two dimensional case. Three constants, dx, dy, dz, are used to adjust the direction of the unit movements. In addition, the physical location of the unit movement must be represented by XYZ coordinates. Therefore, a unit movement described in the Cartesian coordinate system is denoted by a hexaplet (X, Y, Z, ax, ay, az). After adding/subtracting dx, dy, dz, the new XYZ coordinates of the movement are computed by multiplying the coordinates of the current movement with the rotation matrices Rx, Ry, Rz shown in equation (1). The rotation of a unit movement and its direction are captured in a symbolic form similar to the two dimensional case [Prus94] by using the symbols /, \, &, ˆ, +, −, |. The meaning of each symbol is explained in table 1.

Rx(θ) = [ 1 0 0 ; 0 cos(θ) −sin(θ) ; 0 sin(θ) cos(θ) ],
Ry(θ) = [ cos(θ) 0 −sin(θ) ; 0 1 0 ; sin(θ) 0 cos(θ) ],          (1)
Rz(θ) = [ cos(θ) sin(θ) 0 ; −sin(θ) cos(θ) 0 ; 0 0 1 ]
Table 1. Symbols and their meanings used in the L-systems definition

Symbol : Meaning
I(a, b, l) : To generate the cylinder with begin radius a, end radius b, and length l
I(l) : To generate the cylinder with constant radius and length l
+(δz) : Roll counterclockwise by angle δz, using rotation matrix Rz(δz)
−(δz) : Roll clockwise by angle δz, using rotation matrix Rz(−δz)
&(δy) : Roll counterclockwise by angle δy, using rotation matrix Ry(δy)
ˆ(δy) : Roll clockwise by angle δy, using rotation matrix Ry(−δy)
\(δx) : Roll counterclockwise by angle δx, using rotation matrix Rx(δx)
/(δx) : Roll clockwise by angle δx, using rotation matrix Rx(−δx)
| : Roll back, using rotation matrix Ry(180)
[ : Push the current state of the turtle onto a pushdown stack to create a new branch
] : Pop a state from the stack and make it the current state of the turtle to close the branch
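To make equation (1) and the turtle symbols of table 1 concrete, here is a minimal Python/NumPy sketch (our own illustration, not code from the original system; the function rotate and the use of the ASCII character ^ for the symbol ˆ are assumptions) that builds the matrices exactly as printed above and applies them to a unit movement direction.

import numpy as np

def Rx(t):
    c, s = np.cos(t), np.sin(t)
    return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])

def Ry(t):
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, 0, -s], [0, 1, 0], [s, 0, c]])

def Rz(t):
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, s, 0], [-s, c, 0], [0, 0, 1]])

def rotate(direction, symbol, delta_deg):
    # '\\' and '/' use Rx, '&' and '^' use Ry, '+' and '-' use Rz, following Table 1
    t = np.radians(delta_deg)
    matrix = {'\\': Rx(t), '/': Rx(-t), '&': Ry(t), '^': Ry(-t),
              '+': Rz(t), '-': Rz(-t)}[symbol]
    return matrix @ direction

d = np.array([0.0, 1.0, 0.0])      # initial unit movement along the Y axis
d = rotate(d, '+', 45.5)           # roll counterclockwise by 45.5 degrees
print(d)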
3 The Flow Diagram of the Reconstruction Process of a Branching Structure This paper proposes a method for reconstructing the branching structure from an input image or from 3D volume data. Each procedure consists of seven steps, shown in figure 2. The flow diagram starts by reading the input image (2D) or the sliced images (3D). A preprocessing step is applied using anisotropic diffusion filtering. The region or volume growing method is then started from a set of given initial starting points. After the growing process has stopped, thinning and skeletonization are applied. The network is reconstructed and the network resolution is reduced. Finally, the L-system production rules are generated and are ready for further use.
Figure 2. Flow diagram of reconstruction process of branching structure
The example input images and volume sliced images are shown in figure 3.

3.1 Procedure of Reconstruction from 2D Image or 3D Volume Data

The reconstruction supports both 2D images and 3D volume data. There are seven consecutive steps:
1. Read all sliced input images with width W, height H, and length L (L = 1 for 2D, L > 1 for 3D),
2. Preprocess the input image or volume (smoothing, denoising, edge/surface preservation) by anisotropic diffusion filtering,
3. Proceed with the region (2D) or volume (3D) growing method from given pixels pi(x, y) or voxels pi(x, y, z),
4. Apply the thinning and skeletonization processes,
Figure 3. Example of input images and volume sliced images of volume data, neuron and rice root images from [Str04], leaf network from [Ash99], actual soil volume data from P. Kolesik
5. Generate the network of the branching structure and reduce its resolution,
6. Generate the L-system production rules or L-string,
7. Print the L-system production rules or L-string to the output file.
4 Anisotropic Diffusion Filtering

Anisotropic diffusion filtering is a method for smoothing, edge preservation and denoising [PM90], [Wei98]. In this paper we use the Perona-Malik (PM) model of [PM90] for smoothing, edge preservation and denoising before the region or volume growing method is applied, since the growing proceeds much more easily if the input image has been processed by anisotropic diffusion. The nonlinear diffusion filtering of the PM model is based on equation (2),

∂t u = div( g(|∇u|²) ∇u ),          (2)

and it uses diffusivities such as

g(s²) = 1 / (1 + s²/λ²).          (3)
Here λ is the conduction coefficient, where 20 ≤ λ ≤ 100, and s = |∇u| is the absolute gradient of the gray intensity u. One advantage of nonlinear diffusion filtering is that it combines (reconnects) disconnected lines. An example of disconnected lines of a branching structure is shown in figure 4. The result after diffusion filtering is useful for the region growing method. Equation (2) is solved by a finite difference method.
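The following is a minimal sketch of how equation (2) with the diffusivity (3) can be discretized by an explicit finite difference scheme in the spirit of the Perona-Malik model; the function name, the time step tau and the default λ are illustrative assumptions, not the authors' implementation.

import numpy as np

def perona_malik(u, lam=30.0, tau=0.2, steps=50):
    # explicit iterations of du/dt = div(g(|grad u|^2) grad u), tau <= 0.25 for stability
    u = u.astype(float)
    for _ in range(steps):
        p = np.pad(u, 1, mode='edge')       # Neumann boundary conditions
        dn = p[:-2, 1:-1] - u               # difference to the north neighbor
        ds = p[2:, 1:-1] - u                # south
        dw = p[1:-1, :-2] - u               # west
        de = p[1:-1, 2:] - u                # east
        g = lambda d: 1.0 / (1.0 + (d / lam) ** 2)   # diffusivity of eq. (3)
        u = u + tau * (g(dn) * dn + g(ds) * ds + g(dw) * dw + g(de) * de)
    return u

# usage: smoothed = perona_malik(gray_image, lam=30.0, steps=100)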
Figure 4. Example of nonlinear diffusion filtering for the combination of disconnected lines (original image, left; nonlinear diffusion result, Time = 500, Step = 50, right)
5 Region and Volume Growing Method

This method is called "Seeded Region Growing" and was introduced by Rolf Adams and Leanne Bischof [AB94]. They presented a new method for the segmentation of intensity images which is robust, rapid and free of tuning parameters. These characteristics allow the implementation of a very good algorithm that can be applied to a large variety of images. The method, however, requires the selection of seed regions, which has to be done manually and classifies this approach as semi-automated. The algorithm grows these seed regions until all image pixels have been processed. The region growing method is related to level-set methods [Set99]. The input images of this paper are assumed to contain branching structures. The system is started by giving initial starting pixels on the branching structure. The region grows around the given points and diffuses under a condition on the gray intensity of each pixel. If a considered pixel satisfies the condition of the branching set, that pixel is taken into the set of the branching structure. The development of the growing process is computed by a finite difference method. In the 3D case, the pixel is replaced by a voxel. The color input image is converted to a gray-scale image as in figure 5 by using the following equation (4):

Gray = Round(0.299 · Red + 0.587 · Green + 0.114 · Blue)          (4)
Here Gray, Red, Green, and Blue are the intensities of the resulting gray image and of the red, green, and blue channels of the color image, respectively.
Figure 5. Example of input image, the color image (left) converted to the gray-scale image (right)
Gray is an integer value from 0 (black) to 255 (white). The Round function returns the integer value of the gray intensity. Red, Green, and Blue are also integer values from the color image. The gray intensity bar is shown in figure 6.
Figure 6. The gray-scale intensity bar from zero intensity (black) to 255 intensity (white) with the range [a, b] and current pixel p
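A small sketch of the gray-scale conversion of equation (4); the assumed array layout (height x width x 3, channels in RGB order) and the function name are our own choices.

import numpy as np

def to_gray(rgb):
    # rgb: uint8 array of shape (H, W, 3); returns integer gray values in 0..255
    r = rgb[..., 0].astype(float)
    g = rgb[..., 1].astype(float)
    b = rgb[..., 2].astype(float)
    return np.rint(0.299 * r + 0.587 * g + 0.114 * b).astype(np.uint8)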
We define I, B and G as the set of all pixels of the input gray-scale image (2D) or of all voxels of the input gray-scale volume (3D), the set of pixels belonging to the branching structure, and the set of given initial pixels, respectively, so that G ⊆ B and B ⊆ I. The sets are described by equation (5):

I = {pi,j | (0 ≤ pi,j ≤ 255) ∧ (1 ≤ i ≤ W) ∧ (1 ≤ j ≤ H)}   for 2D,
I = {pi,j,k | (0 ≤ pi,j,k ≤ 255) ∧ (1 ≤ i ≤ W) ∧ (1 ≤ j ≤ H) ∧ (1 ≤ k ≤ L)}   for 3D,
B = {p ∈ I | (a ≤ Int(p) ≤ b) ∧ (0 ≤ a ≤ 255) ∧ (0 ≤ b ≤ 255) ∧ (a ≤ b)},          (5)
G = {given initial pixels p | p ∈ B}.
Here W, H and L are the width, height and depth of the image, and a and b are the minimum and maximum intensity values of the branching structure set, respectively. The values a, b and p are illustrated in figure 6. The function Int(p) is the intensity of pixel p.
Figure 7. Setup of the input image and given initial pixel
Figure 7 shows the setup of an input image and the eight possible directions of the region growing process around an initial given point. The step-by-step region growing process is shown in figure 8.

5.1 Algorithm for the Region Growing Process in a 2D Image

The region growing process is given by the following steps.
1. Load all pixels Pi,j from the input image, convert them to gray intensity values and store them in an array M as the original gray-scale array. The size of array M is W×H.
2. Initialize zero arrays M^Old and M^New for calculating the growing process. The size of M^Old and M^New is W×H.
3. Let the user define the set of given initial pixels G and set the value of the initial points in M^Old to a medium intensity, e.g. 128.
4. Let the user define the values a and b of equation (5) for the region growing process. The considered intensity interval of the branching structure is [InitialValue − a, InitialValue + b], where InitialValue is the gray intensity of the last given pixel.
Figure 8. Growing process of each step
5. Start the region growing process.
6. Set Oldpoint = 0, Newpoint = 1, Iteration = 0.
   While (Oldpoint ≠ Newpoint) do {
     Set Oldpoint = Newpoint and Iteration = Iteration + 1
     For all elements in M do {
       If (Mi,j ≥ MGx,Gy − a) and (Mi,j ≤ MGx,Gy + b) Then {
         • Grow the region at each pixel pi,j in G by considering its 8 neighboring pixels, using equation (6) (see the code sketch at the end of this section):

           M^New_i,j = M^Old_i,j + Round(0.99 · (M^Old_i−1,j + M^Old_i+1,j + M^Old_i,j−1 + M^Old_i,j+1 + M^Old_i−1,j−1 + M^Old_i+1,j−1 + M^Old_i−1,j+1 + M^Old_i+1,j+1 − 8 · M^Old_i,j))          (6)

         • Take into G all new neighboring points that satisfy the condition of B. If M^New_i,j ≠ 0, the pixel pi,j satisfies the condition.
         • Count all growing points in G and assign the count to Newpoint.
       } // If
     } // For
   } // While
7. Update B = G; the set B stores the information of the branching structure satisfying the condition of B.
8. The array M^New and the set B will be used for the skeletonization process.

5.2 Algorithm for the Volume Growing Process in 3D Volume Data

Figure 9 shows an example of a soil volume with a canola root scanned by a CT scanner. The size of the input volume is 512×512×1313. The volume growing process is given by the following steps.
Figure 9. Example of CT volume data of plant root
1. Load all voxels Pi,j,k from the sliced input images, convert them to gray intensity values and store them in an array M as the original gray-scale array. The size of array M is W×H×L.
2. Initialize zero arrays M^Old and M^New for calculating the growing process. The size of M^Old and M^New is W×H×L.
3. Let the user define the set of given initial voxels G and set the value of the initial points in M^Old to a medium intensity, e.g. 128.
4. Let the user define the values a and b of equation (5) for the volume growing process. The considered intensity interval of the branching structure is [InitialValue − a, InitialValue + b], where InitialValue is the gray intensity of the last given voxel.
5. Start the volume growing process.
6. Set Oldpoint = 0, Newpoint = 1, Iteration = 0.
   While (Oldpoint ≠ Newpoint) do {
     Set Oldpoint = Newpoint and Iteration = Iteration + 1
     For all elements in M do {
       If (Mi,j,k ≥ MGx,Gy,Gz − a) and (Mi,j,k ≤ MGx,Gy,Gz + b) Then {
         • Grow the region at each voxel pi,j,k in G by considering its 26 neighboring voxels, using equation (7), the 3D analogue of equation (6):

           M^New_i,j,k = M^Old_i,j,k + Round(0.99 · ( Σ_(l,m,n)∈N26(i,j,k) M^Old_l,m,n − 26 · M^Old_i,j,k )),          (7)

           where N26(i, j, k) denotes the 26 neighboring voxels of voxel (i, j, k).
         • Take into G all new neighboring voxels that satisfy the condition of B. If M^New_i,j,k ≠ 0, the voxel pi,j,k satisfies the condition.
         • Count all growing voxels in G and assign the count to Newpoint.
       } // If
     } // For
   } // While
7. Update B = G; the set B stores the information of the branching structure satisfying the condition of B.
8. The array M^New and the set B will be used for the skeletonization process.

Figure 10 shows the sequence of the volume growing process applied to the root data every few steps after applying algorithm 5.2. As a remark, the region and volume growing computation becomes very fast if only M^Old is used, i.e., if M^Old also appears on the left-hand side of equations (6) and (7) so that the update is done in place. In this way the region and volume growing process can be accelerated considerably.
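The updates of equations (6) and (7) effectively propagate membership from the seed pixels to all 8-neighboring pixels (26-neighboring voxels in 3D) whose gray value lies in the admissible interval. The sketch below implements this propagation directly as a queue-based flood fill instead of the finite difference iteration; it is easier to follow and yields the same grown set. The function name, the parameters a and b and the boolean-mask bookkeeping are illustrative assumptions, not the authors' code.

from collections import deque
import numpy as np

def region_grow(gray, seeds, a=20, b=20):
    # gray: 2D uint8 array; seeds: list of (row, col) pixels given by the user
    ref = int(gray[seeds[-1]])                    # gray value of the last given pixel
    g = gray.astype(int)
    allowed = (g >= ref - a) & (g <= ref + b)     # admissible intensity interval
    grown = np.zeros(gray.shape, dtype=bool)
    queue = deque(s for s in seeds if allowed[s])
    for s in queue:
        grown[s] = True
    while queue:
        i, j = queue.popleft()
        for di in (-1, 0, 1):
            for dj in (-1, 0, 1):
                ni, nj = i + di, j + dj
                if (0 <= ni < gray.shape[0] and 0 <= nj < gray.shape[1]
                        and allowed[ni, nj] and not grown[ni, nj]):
                    grown[ni, nj] = True
                    queue.append((ni, nj))
    return grown          # boolean mask of the branching set B

The 3D volume version is identical except that the two offset loops are replaced by three loops over the 26 neighboring offsets.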
6 Skeletonization Process

This section presents the thinning process applied to the set B obtained in the previous section. After the thinning process is done, the skeleton of the branching structure is available. Skeletonization is the process of peeling off as many pixels of a pattern as possible without affecting its original shape. This means that after pixels have been peeled off, the pattern is still connected and has a line width of one pixel. The skeleton obtained in this way must have the following properties: as thin as possible, connected, and centered. In this paper, we have improved the so-called Hilditch algorithm for skeletonization.

6.1 Hilditch's Algorithm and Improvement

There are two versions of Hilditch's algorithm [Tou97]: one using a 4×4 window and the other one using a 3×3 window. Here we are concerned with the 3×3 window version. Hilditch's algorithm consists of performing multiple passes over the pattern; on each pass we consider the 8-neighborhood of a pixel p1 (figure 11) and decide whether to peel off p1 or to keep it as part of the resulting skeleton. For this purpose we arrange the eight neighbors of p1 in clockwise order and define two functions, Fa(p1) and Fb(p1). The function Fa(p1) counts the number of (0,1) patterns corresponding to pixel p1, i.e., pairs of consecutive neighbors going from a white pixel (zero) to a black pixel (one). The function Fb(p1) counts the number of non-zero neighbors of pixel p1. Each pixel pi is set to one if it is a black pixel; otherwise it will
Figure 10. Sequence of the volume growing process of the root data, shown every few steps
Figure 11. A setup of Hilditch’s algorithm, (a) point labels from p1 to p9, (b) Fa (p1) = 1, Fb (p1) = 2, (c) Fa (p1) = 2, Fb (p1) = 2
be set to zero, since the multiplications (∗) in the third and fourth conditions of the algorithm require numeric values.
1. The function Fa(p1) returns the number of (0,1) patterns (the number of arrows in figure 12) in the sequence (p2,p3), (p3,p4), (p4,p5), (p5,p6), (p6,p7), (p7,p8), (p8,p9), (p9,p2), and
2. the function Fb(p1) gives the number of non-zero neighbors of pixel p1.
Figure 12. Function Fa (p1) and Fb (p1) from figure 11, (a) cycle of pattern from p2 to p9, (b) Fa (p1) = 1 (number of arrows represents (0,1) pattern), Fb (p1) = 2, (c) Fa (p1) = 2 (two arrows for two (0,1) patterns), Fb (p1) = 2
From figures 11(b) and 12(b), we have Fa(p1) = 1, since there is only one (0,1) pattern, from the pixel pair (p9, p2), and Fb(p1) = 2 because of the non-zero neighbors p2 and p3. The same procedure applied to figures 11(c) and 12(c) gives Fa(p1) = 2 and Fb(p1) = 2. Hilditch's algorithm checks all pixels and changes a pixel from black to white (i.e., from value 1 to 0) if it satisfies the following four conditions:
1. 2 ≤ Fb(p1) ≤ 6,
2. Fa(p1) = 1,
3. p2 ∗ p4 ∗ p8 = 0 or Fa(p2) ≠ 1, and
4. p2 ∗ p4 ∗ p6 = 0 or Fa(p4) ≠ 1.
The passes are repeated and stop when nothing changes (no more pixels can be removed).
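A small sketch of the two counting functions of Hilditch's algorithm; the neighborhood is passed as the list [p2, ..., p9] in the clockwise order of figure 11, with 1 for black and 0 for white (this list representation is our assumption).

def Fa(neighbors):
    # number of (0,1) transitions in the cyclic sequence p2, p3, ..., p9, p2
    n = neighbors + neighbors[:1]
    return sum(1 for k in range(8) if n[k] == 0 and n[k + 1] == 1)

def Fb(neighbors):
    # number of non-zero neighbors of p1
    return sum(1 for v in neighbors if v != 0)

# example of figure 11(b): only p2 and p3 are black
print(Fa([1, 1, 0, 0, 0, 0, 0, 0]), Fb([1, 1, 0, 0, 0, 0, 0, 0]))   # -> 1 2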
6.2 Algorithm for the Skeletonization Process

Before calculating the skeleton of a given object, the height field map has to be computed by the following algorithm.

Algorithm for Calculating the Height Field Map for a 2D Image
1. Trace the edges (figure 13(b)) of the array M, put the boundary points into the set E and all other object points into the set O, where O = B − E.
Figure 13. The height field map for the thickness of branching structure, (a) given object, (b) edge detection, (c) height field map (black is short distance, white is long distance from boundary)
2. For all points j in O do {
   • initialize dminj = W + H
   • For all points i in E do {
       - calculate the distance di,j = √((xj − xi)² + (yj − yi)²)
       - if dminj > di,j then dminj = di,j
     }
   }

Algorithm for Calculating the Height Field Volume for 3D Volume Data
1. Trace the edges/surfaces of the array M, put the boundary voxels into the set E and all other object voxels into the set O, where O = B − E.
2. For all voxels j in O do {
   • initialize dminj = W + H + L
   • For all voxels i in E do {
       - calculate the distance di,j = √((xj − xi)² + (yj − yi)² + (zj − zi)²)
       - if dminj > di,j then dminj = di,j
     }
   }

Algorithm for the Skeletonization Process for a 2D Image
1. Take the array M^New from the region growing process.
2. For all elements in M^New do
   - set the value of the corresponding element in array M to 1 if M^New_i,j ≠ 0.
3. Calculate the height field map (see figure 13 for 2D).
4. Start the thinning process.
   For all elements in M^New do {
   - If M^New_i,j ≠ 0 then M^Old_i,j = 1 else M^Old_i,j = 0
   }
   Set Oldpoint = 1, Newpoint = 0, Iteration = 0.
   While (Oldpoint ≠ Newpoint) do {
   • Oldpoint = Newpoint, Iteration = Iteration + 1
   • Set a zero array C with size H×W.
   • For all elements in M^New {
       - compute Hilditch's algorithm
       - count Newpoint
     } // For
   } // While
5. Remove jagged points along the skeleton of the branching structure and update B. The array M^New and the set B store the information of the branch structure.
6. Record the thickness information of the branch structure from the calculated height field map.

Algorithm for the Skeletonization Process for 3D Volume Data
For 3D volume data, we calculate the skeleton by applying the peeling-off process from the outside and by labelling the number of each layer.
1. Take the array M^New from the volume growing process.
2. For all elements in M^New do
   - set the value of the corresponding element in array M to 1 if M^New_i,j,k ≠ 0.
3. Calculate the depth field value (DFV) of each voxel by the peeling-off process.
   For all elements in M^New do {
   - set the depth field value at this position, DFV_i,j,k = 1
   - count Np if the depth field value DFV_i,j,k = 1
   }
   Iteration = 1
   While (Np > 0) do {
   • Iteration = Iteration + 1
   • For all elements in M^New {
       - count the number of its 26 neighbors (Cp), if DFV_i,j,k = 1,
       - if Cp < 20 then set the depth field value of the current voxel DFV_i,j,k = Iteration,
     } // For
   - count Np if the depth field value DFV_i,j,k = 1
   } // While
4. Construct the skeleton from the maximum DFV_i,j,k down to smaller values by placing spheres, removing smaller spheres inside bigger spheres and connecting them to their neighbors,
5. Remove jagged points along the skeleton of the branching structure and update B. The array M^New and the set B store the information of the branch structure,
6. Record the thickness information of the branch structure from the calculated depth field values.
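The brute force height field computation of the 2D algorithm above can be sketched as follows; the 4-neighbor definition of a boundary pixel and the function name are our own assumptions, and in practice a distance transform (e.g., scipy.ndimage.distance_transform_edt) computes the same map far more efficiently.

import numpy as np

def height_field(mask):
    # mask: 2D boolean array of the branching set B (True = object pixel);
    # returns, for every interior object pixel, its distance to the nearest boundary pixel
    p = np.pad(mask, 1)
    # a boundary pixel is an object pixel with at least one background 4-neighbor
    interior = p[:-2, 1:-1] & p[2:, 1:-1] & p[1:-1, :-2] & p[1:-1, 2:]
    boundary = mask & ~interior
    by, bx = np.nonzero(boundary)
    dist = np.zeros(mask.shape)
    if by.size == 0:
        return dist
    oy, ox = np.nonzero(mask & ~boundary)
    for y, x in zip(oy, ox):
        dist[y, x] = np.sqrt((by - y) ** 2 + (bx - x) ** 2).min()
    return dist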
7 Construction of the Branching Structure

After the skeletonization process, the skeleton of the branching structure is stored in the array M^New and the set B. Since the skeleton points are not yet connected to each other by line connections, this section describes the algorithm that generates the network and removes unnecessary points lying on the same straight line; a code sketch follows the algorithm.

7.1 Algorithm for Constructing the Branching Network
1. Initialize a stack S and start with a user-supplied point for generating the network,
2. Calculate the nearest given starting point R in the array M^New and push R onto the stack S,
3. Set point R as the root node of the branching structure T,
4. Set M^Old = M^New for marking the paths that have been discovered,
5. While (stack S is not empty) do {
   • pop a point Pi,j from the top of stack S and set the value of M^Old_i,j = 0,
   • look for all neighboring points of Pi,j with a value equal to 1 and push them onto stack S (if there are no such neighboring points, the point Pi,j is marked as a terminated point),
   • set all these neighboring points of Pi,j as children of Pi,j and mark their values in the array M^Old as 0,
   } // While
6. Finally, the network of the branching structure T is constructed.
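As mentioned above, a minimal sketch of the stack-based network construction; representing the tree T as a dictionary that maps each pixel to the list of its children is our own illustrative choice.

import numpy as np

def build_network(skeleton, root):
    # skeleton: 2D boolean array of skeleton pixels; root: (row, col) skeleton pixel;
    # returns a dict mapping each visited pixel to the list of its children
    h, w = skeleton.shape
    remaining = skeleton.copy()
    children = {root: []}
    stack = [root]
    remaining[root] = False
    while stack:
        i, j = stack.pop()
        for di in (-1, 0, 1):
            for dj in (-1, 0, 1):
                ni, nj = i + di, j + dj
                if 0 <= ni < h and 0 <= nj < w and remaining[ni, nj]:
                    remaining[ni, nj] = False        # mark the path as discovered
                    children[(i, j)].append((ni, nj))
                    children[(ni, nj)] = []
                    stack.append((ni, nj))
    return children      # leaves (terminated points) have an empty child list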
8 Resolution Reduction

Up to now the network of the branching structure T has been reconstructed, but it still has a high resolution. In this section we propose an algorithm to reduce the number of points in the network (algorithm 8.1). The L-string construction is given in algorithm 8.2; a sketch of the underlying angle test follows the two algorithms.

8.1 Algorithm for Resolution Reduction
1. Consider the network of the branching structure T,
2. Start the process at the root node R of the branching structure T,
3. For all nodes in T do {
   • consider every chain of 3 nodes in which each node has only one child, where A is the parent of B and B is the parent of C,
   • calculate the angle between the vectors BA and BC,
   • if this angle lies within 180° ± δ, i.e., A, B and C are nearly on the same line, then remove node B from T, where δ is the resolution angle for removal,
   } // For
4. Finally, the network of the branching structure T is regenerated.

8.2 Algorithm for L-string Construction
1. Read the network of the branching structure T after reducing its resolution,
2. Start the process at the root node of the branching structure T,
3. Let A be the root node and B be the first child of the root node A,
4. Calculate the angle δR between the unit vector j and AB,
5. For all nodes in T which have a parent node and at least one child do {
   • let B be the current node and A be the parent of node B,
   • calculate the vector AB and its length DAB,
   • if node B has more than one child then
     - rearrange all children of node B in the order of left branches "+(δ)I", right branches "-(δ)I", and middle branches "I" (no angle) for the L-string preparation, and exchange all children of node B in the same order: left, right and middle, respectively,
   • for all children of node B do {
     - let C be the current child of node B,
     - calculate the vector BC and its length DBC,
     - calculate the angle δB between AB and BC,
     - if the angle δB > ε and node B has more than one child, then print "[" to open a new branch,
     - calculate the angle δN between BC and the unit vector n perpendicular to AB, where n is AB rotated 90 degrees clockwise,
     - if δN > 90 + ε degrees then print "+(δB)", else if δN < 90 − ε then print "-(δB)", where ε is a small angle,
     - print the segment BC in the form "I(DBC)",
     - if node C has no child and the angle δB > ε then print "]" to close its branch
     } // For
   } // For
6. Finally, the L-string code of the network T is generated.
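A sketch of the angle test that drives the resolution reduction (and, analogously, the branching angle δB of the L-string construction); the function names and the default δ are illustrative assumptions.

import numpy as np

def angle_deg(u, v):
    # angle in degrees between two 2D vectors
    cosv = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.degrees(np.arccos(np.clip(cosv, -1.0, 1.0)))

def removable(A, B, C, delta=10.0):
    # True if the middle node B of the chain A-B-C is nearly collinear with A and C
    # (angle between BA and BC within 180 +/- delta degrees) and may be removed
    ang = angle_deg(np.subtract(A, B), np.subtract(C, B))
    return abs(ang - 180.0) <= delta

print(removable((0, 0), (1, 0.05), (2, 0)))     # almost on one line -> True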
Figure 14. L-string construction: (a) input network from algorithm 8.1, (b), (c), (d) and (e) step by step of L-string construction of input network (a)
Figure 14 shows the L-string reconstruction from input network from algorithm 8.1 by applying the algorithm 8.2. The L-string of the input network after applying the algorithm 8.2 is “I(181.0)[−(45.5)I(185.2)]I(179.0)[+(43.68)I(162.6)]I(188)”. The given L-string starts with an internode with 181.0 pixel unit length, then draws a new branch rotated 45.5 degrees clockwise with 185.2 pixel unit length and closes its branch, then continues a new main stem with 179.0 pixel unit length, then draws a new branch rotated 43.68 degrees counter-clockwise with 162.6 pixel unit length, and finally draws a new main stem with 188 pixel unit length.
9 Experiment and Results

In this section we show the results of some examples. Figure 15 illustrates the development of the region growing from six given initial points. The black boundary regions represent the growing process until all branching structures are discovered. The skeletonization of the clover plant is shown in the third row of figure 15. Structures of the clover plant at different resolutions are shown in figure 16. The user-defined δ value reduces the structure resolution of the network. Figure 17 illustrates the reconstruction process of a leaf network from an input image: de-noising, smoothing, region growing, skeletonization, network construction, and the resulting 3D object.
Figure 15. Region growing process (input image and iterations 10 to 157) and skeletonization (iterations 1 to 15) in the branching structure of a clover plant
Some tree-like structures and their L-string codes, obtained by applying algorithm 8.2, are given in figure 18. Each structure shows the pixel node number labels. Figure 19 shows some further results: canola roots reconstructed by Dr. P. Kolesik (.MV3D file) and a rat neuron structure by P. J. Broser (HOC file); both can be converted to the L-string file format. Figure 20 shows the vascular aneurysm reconstruction under different conditions of the region growing method. Since the input image is not sharp enough, smoothing and anisotropic diffusion filtering can be applied as a preprocessing step before the region or volume growing method.
10 Conclusion and Further Works This paper provides the methods to reconstruct the branching structure from input images. It will be useful for further uses in bio-informatics and
Figure 16. Two resolution structures of clover plant with different δ value, (a) and (c) show the wire frame structure, (b) and (d) show 3D structure
Figure 17. The reconstruction with de-noising process from an input image (leaf venation of Actinidia latifolia, Actinidiaceae, from [Ash99]): de-noising, smoothing, region growing, skeletonization, network construction, and 3D object
Figure 18. Some Tree-like structures constructed in L-string codes: (a) I(132.0) [+(29.61)I(117.8)] [-(29.58)I(119.0)], (b) I(136) [-(43.1)I(111.0)] I(101), (c) I(102.0) [+(45.21)I(104.6)] I(145.0), (d) I(108)[+(46.05)I(115.2)] [-(45.6)I(117.3)] I(137), (e) I(94.00) [+(45.64)I(96.89)] [-(1.29)I(84.00)] [-(45.1)I(100.4)] I(84), the number labels in each network are the pixel node numbers
Figure 19. Some results of (a), (b) reconstructed roots and (c) neuron structure from P. J. Broser in HOC file
medical applications. One can use this method to observe the development of branching structures in biology, botany or medicine. The method allows the user to choose the resolution of the network, since in many cases a high network resolution is not needed. This work also supports sliced images from medical and biological applications, so that the volume of a branching structure can be recovered and represented as a 3D object. The user can adjust the parameter of each angle in the branching structure. The L-string or the production rules of the L-system can also be generated easily after the reconstruction process, using the turtle interpretation based on bracketed L-systems.
Figure 20. The vascular aneurysm and its reconstruction with different conditions and resolutions
Acknowledgments: The authors would like to thank Dr. Susanne Krömker at the Interdisciplinary Center for Scientific Computing (IWR), University of Heidelberg, for her productive suggestions. We also would like to thank Dr. Peter Kolesik at the University of Adelaide for the example of a soil volume with canola roots, and Dr. Philip Julian Broser at the Max Planck Institute for Medical Research (MPI) in Heidelberg for his rat neuron reconstruction.
References

[PM90] Perona, P., Malik, J.: Scale-space and edge detection using anisotropic diffusion. IEEE Transactions on Pattern Analysis and Machine Intelligence, 12(7), 629–639 (1990)
[AB94] Adams, R., Bischof, L.: Seeded region growing. IEEE Trans. on PAMI, 16(7), 641–647 (1994)
[Ash99] Ash, A., Ellis, B., Hickey, L. J., Johnson, K., Wilf, P., and Wing, S.: Manual of Leaf Architecture - morphological description and categorization of dicotyledonous and net-veined monocotyledonous angiosperms by Leaf Architecture Working Group (1999)
[CJB05a] Chuai-Aree, S., Jäger, W., Bock, H. G., and Siripant, S.: Simulation and Visualization of Plant Growth Using Lindenmayer Systems. In: H. G. Bock, E. Kostina, H. X. Phu and R. Rannacher (eds.): Modeling, Simulation and Optimization of Complex Processes, Springer-Verlag, pp. 115–126 (2005)
[CJB05b] Chuai-Aree, S., Jäger, W., Bock, H. G., and Siripant, S.: Reconstruction of Branching Structures Using Region and Volume Growing Method. International Conference in Mathematics and Applications (ICMA-MU 2005), Bangkok, Thailand (2005)
[DPS00] Dimitrov, P., Phillips, C., Siddiqi, K.: Robust and Efficient Skeletal Graphs. Conference on Computer Vision and Pattern Recognition, Hilton Head, South Carolina (2000)
[Mech96] Mech, R., and Prusinkiewicz, P.: Visual models of plants interacting with their environment. Proceedings in Computer Graphics (SIGGRAPH'96), 397–410 (1996)
[Prus94] Prusinkiewicz, P., Remphrey, W., Davidson, C., and Hammel, M.: Modeling the architecture of expanding Fraxinus pennsylvanica shoots using L-systems. Canadian Journal of Botany, 72, 701–714 (1994)
[Rus95] Russ, J. C.: The Image Processing Handbook. CRC Press (1995)
[Set99] Sethian, J. A.: Level Set Methods and Fast Marching Methods. Cambridge University Press (1999)
[Str04] Strzodka, R., and Telea, A.: Generalized Distance Transforms and Skeletons in Graphics Hardware. The Eurographics Association (2004)
[Tou97] Toussaint, G.: Hilditch's Algorithm for Skeletonization (1997) http://jeff.cs.mcgill.ca/~godfried/teaching/projects97/azar/skeleton.html
[Wei98] Weickert, J.: Anisotropic Diffusion in Image Processing. B.G. Teubner, Stuttgart (1998)
3D Cloud and Storm Reconstruction from Satellite Image
Somporn Chuai-Aree1,4, Willi Jäger1, Hans Georg Bock1, Susanne Krömker1, Wattana Kanbua2, and Suchada Siripant3
1 Interdisciplinary Center for Scientific Computing (IWR), University of Heidelberg, Im Neuenheimer Feld 368, 69120 Heidelberg, Germany
[email protected], [email protected], [email protected], [email protected]
2 Thai Meteorological Department, 4353 Sukhumvit Road, Bangna, Bangkok 10260, Thailand
watt [email protected]
3 Advanced Virtual and Intelligent Computing (AVIC), Chulalongkorn University, Phayathai, Bangkok 10330, Thailand
[email protected]
4 Faculty of Science and Technology, Prince of Songkla University, Muang, Pattani 94000, Thailand
[email protected]
Abstract The satellite images for Asia are produced every hour by Kochi University, Japan (URL http://weather.is.kochi-u.ac.jp/SE/00Latest.jpg). They show the development of cloud and storm movement. A sequence of satellite images can easily be combined into an animation, but only from the top view. In this paper, we propose a method to condition the 2D satellite images so that they can be viewed from any perspective angle. The cloud or storm regions are analyzed, segmented and reconstructed as 3D clouds or storms based on the gray intensity of the cloud properties. The result of the reconstruction can be used for a warning system in risky areas. Typhoon Damrey (September 25-27, 2005) and typhoon Kaitak (October 29 - November 1, 2005) are shown as case studies in this paper. Other satellite images can be conditioned by using this approach as well.
1 Introduction In recent years many storms have occurred around the world, especially in South East Asia and the United States. Even though the movement of a storm can be predicted and tracked step by step, catastrophes still happen. Warning systems have to reach the people so that they evacuate from the risky area to a safe region. In this paper we propose a method to motivate people to evacuate from the storm area by means of visualization. The satellite images are captured in time steps of an hour; Fig. 1 shows a typical 2D image from the top view. The reconstruction of those satellite images to represent
a 3D image of cloud and storm is important for viewing from any perspective. Image processing for cloud and storm segmentation can be applied as a filter before combining the filtered storm with earth topography data. In this paper we use the satellite images from Kochi University, Japan, as a case study. For cloud segmentation, detection, tracking, extraction and classification, many methods exist: Tian et al. studied cloud classification with neural networks using spectral and textural features in [TSA99], Visa et al. proposed a neural-network-based cloud classifier in [VIVS95], and Hong et al. used an Artificial Neural Network (ANN) for a cloud classification system in [HHGS05]. Griffin et al. applied Principal Component Analysis (PCA) for characterizing and delineating plumes, clouds and fires in hyperspectral images in [GHBS00]. A fuzzy method has been used by Hetzheim for characterizing clouds and their heights by texture analysis of multi-spectral stereo images in [Het00]. Kubo et al. extracted clouds in the Antarctic using wavelet analysis in [KKM00]. Welch et al. classified cloud fields based upon high spatial resolution textural features in [Wel88]. Yang et al. used wavelets to detect cloud regions in sea surface temperature images by combining data from NOAA polar-orbiting and geostationary satellites in [YWO00]. Mukherjee et al. tracked cloud regions by scale space classification in [MA02]. In this paper, we propose two new techniques for the image segmentation of clouds and storms, using the color difference of the cloud properties and segmentation on a 2D histogram of intensity against gradient length. In Fig. 1 we can see the cloud and storm regions which need to be segmented. The main purpose of this paper is to convert the 2D satellite image of Fig. 2 (left image) to the 3D image of Fig. 2 (right image) of cloud and storm as virtual reality by using a given virtual height. The paper is organized as follows: in section 2 the satellite image and its properties and in section 3 the segmentation of cloud and storm are presented. Section 4 describes the volume rendering by sliced reconstruction. The visualization methods and animations are shown in section 5. Finally, the conclusion and further works are given in section 6.
2 Satellite Image and Its Properties The color values of cloud and storm regions are mostly in gray. They can be seen clearly when their intensities are high. In the color satellite image, some regions of thin layers of cloud are over the earth and islands which changed the cloud color from gray-scale to some color deviations as shown in Fig. 3 in the red circle. In this paper we use the satellite images from MTSAT-IR IR1 JMA, Kochi University, Japan at URL http://weather.is.kochi-u.ac.jp/SE/00Latest.jpg
Figure 1. 2D satellite image on September 9, 2005 at 10:00GMT
Figure 2. The conversion of 2D satellite image to 3D surface
(latest file). The satellite image consists of a combination of the cloud satellite image and a background topography image from NASA. Fig. 3 shows the cloud color which can be varied by gray from black (intensity value = 0) to white (intensity value = 255). The background consists of the land which varies from green to red, and the ocean which is blue. Cloud regions are distributed everywhere on the background.
Figure 3. Satellite image on September 23, 2005 at 21:00GMT
3 Cloud and Storm Segmentation

This section describes two methods for cloud and storm segmentation. In the first method we define two parameters for segmenting the cloud region from the ocean and earth, namely Cdv (Color Difference Value) and Ccv (Cloud Color Value). The second method provides the segmentation by gradient length and pixel intensity.

3.1 Image Segmentation by Color Difference and Color Value

Let I be a set of input images with a width W and a height H, P be the set of pixels in I, B be the set of background pixels, C be the set of cloud or storm pixels, and pi,j be a pixel in row i and column j. The pixel pi,j consists of four elements, namely red (RR), green (GG), blue (BB) for the color image and gray (YY). The description of each set is given in equation (1):

P = {pi,j | (0 ≤ pi,j ≤ 255) ∧ (1 ≤ i ≤ W) ∧ (1 ≤ j ≤ H)}
pi,j = {RRi,j, GGi,j, BBi,j, YYi,j | RRi,j, GGi,j, BBi,j, YYi,j ∈ [0, 255]}
C = {p ∈ P | (|RRi,j − GGi,j| ≤ Cdv) ∧ (|GGi,j − BBi,j| ≤ Cdv) ∧ (|RRi,j − BBi,j| ≤ Cdv) ∧ (YYi,j ≥ Ccv) ∧ (0 ≤ Ccv ≤ 255) ∧ (0 ≤ Cdv ≤ 255)}          (1)
P = B ∪ C
The pixel pi,j of the color image can be transformed to gray-scale (YYi,j) by the following equation (2):

YYi,j = Round(0.299 · RRi,j + 0.587 · GGi,j + 0.114 · BBi,j),   YYi,j ∈ {0, 1, 2, ..., 255}          (2)
The gray-scale value is used to condition all pixels in P. Each pixel has a red, green, blue and gray channel. The gray value and the differences between red and green, green and blue, and red and blue are conditioned by the specified parameters for checking the group of cloud pixels.

Algorithm for checking cloud pixels
For all pixels pi,j in P, the differences between red and green, green and blue, and red and blue must be bounded by the value Cdv, and the gray-scale value must be greater than or equal to the parameter Ccv. If these conditions hold, the current pixel pi,j is accepted as a cloud pixel in C. The algorithm is given below.

For all pixels do {
  Calculate the gray value YYi,j from Pi,j with equation (2)
  Define the two parameters Cdv and Ccv
  The pixel is cloud if all of the following conditions are true:
    (|RRi,j − GGi,j| ≤ Cdv) and (|GGi,j − BBi,j| ≤ Cdv) and
    (|RRi,j − BBi,j| ≤ Cdv) and (YYi,j ≥ Ccv)
}

Fig. 4 shows the comparison between different values of the two parameters Cdv and Ccv. The Cdv and Ccv values are 50 and 140 for the first row and 70 and 100 for the second row, respectively. Fig. 4(a) and 4(c) are the segmented cloud and storm regions, Fig. 4(b) and 4(d) are the backgrounds of 4(a) and 4(c), respectively. A second example, a world satellite image, is shown in Fig. 5. Fig. 5(a) and 5(d) are the input images and they are similar. Fig. 5(b) and 5(c) are segmented with the parameters Cdv = 106, Ccv = 155; Fig. 5(e) and 5(f) are the output for the parameters Cdv = 93, Ccv = 134. Fig. 4 and 5 show that the parameters Cdv and Ccv affect the size and shape of the cloud and storm regions. A larger value of Cdv captures a wider range of cloud regions, and the result also depends on the parameter Ccv.
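A vectorized sketch of the cloud pixel test of equation (1) together with the gray conversion of equation (2); the assumed array layout (height x width x 3, RGB order) and the function name are our own choices.

import numpy as np

def cloud_mask(rgb, Cdv=70, Ccv=100):
    # returns a boolean mask of cloud/storm pixels according to eq. (1)
    r, g, b = (rgb[..., c].astype(int) for c in range(3))
    gray = np.rint(0.299 * r + 0.587 * g + 0.114 * b)          # eq. (2)
    return ((np.abs(r - g) <= Cdv) & (np.abs(g - b) <= Cdv) &
            (np.abs(r - b) <= Cdv) & (gray >= Ccv))

# usage with the parameters of the first row of Fig. 4:
# mask = cloud_mask(image, Cdv=50, Ccv=140); background = image * ~mask[..., None]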
Figure 4. Segmented cloud and storm from Fig. 1, (a) and (b) by Cdv = 50, Ccv = 140, (c) and (d) by Cdv = 70, Ccv = 100
3.2 Image Segmentation by Gradient Length and Its Intensity

Our second method uses the gradient length and the intensity of each pixel for a segmentation based on a 2D histogram. The method transforms the input image into a 2D histogram of gradient length against intensity. Let ∇pi,j be the gradient of a pixel pi,j. The gradient length is given by equation (3):

∇pi,j = √((pi+1,j − pi−1,j)² + (pi,j+1 − pi,j−1)²),          (3)
∇pmax = max{∇pi,j} ∀ i, j,   ∇pmin = min{∇pi,j} ∀ i, j.

The 2D histogram is plotted with the gradient length on the vertical axis and the intensity on the horizontal axis. The size of the histogram is set to 255×255, since the intensity of each pixel is mapped onto the horizontal axis and the gradient length of each pixel onto the vertical axis. Let Ω be the set of histogram points, hm,n the frequency of the intensity and gradient length position at the point (m, n), where 0 ≤ m ≤ 255 and 0 ≤ n ≤ 255, hmax the maximum frequency of all histogram points, α a multiplying factor for mapping all frequencies onto the 2D plane, ρ(hm,n) the intensity of a plotting point (m, n) on the histogram Ω, and pmax and pmin the maximum and minimum intensity values of all pixels in P. The intensity position m and the gradient length position n are computed by equation (4):
Figure 5. Segmented cloud and storm (a) and (b) by Cdv = 106, Ccv = 155, (c) and (d) by Cdv = 93, Ccv = 134
pmax = max{pi,j} ∀ i, j,   pmin = min{pi,j} ∀ i, j,
m = Round(255 · (pi,j − pmin)/(pmax − pmin)),
n = Round(255 · (∇pi,j − ∇pmin)/(∇pmax − ∇pmin)),          (4)
hmax = max{hm,n} ∀ m, n,   α = 255 / Log10(hmax),
ρ(hm,n) = α · Log10(hm,n) = 255 · Log10(hm,n) / Log10(hmax).
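A sketch of the mapping of equations (3) and (4): each pixel is sent to a position (m, n) of the 2D plane (intensity on the horizontal axis, gradient length on the vertical axis) and the log-scaled frequency ρ is recorded; the variable names and the treatment of the image border are illustrative assumptions.

import numpy as np

def intensity_gradient_histogram(gray):
    p = gray.astype(float)
    # central differences as in eq. (3); the one-pixel border is ignored
    gx = p[2:, 1:-1] - p[:-2, 1:-1]
    gy = p[1:-1, 2:] - p[1:-1, :-2]
    glen = np.sqrt(gx ** 2 + gy ** 2)
    inten = p[1:-1, 1:-1]

    def scale(x):
        # map values linearly to integer positions 0..255, as in eq. (4)
        return np.rint(255 * (x - x.min()) / (x.max() - x.min() + 1e-12)).astype(int)

    m, n = scale(inten), scale(glen)
    hist = np.zeros((256, 256))
    np.add.at(hist, (n, m), 1)                       # frequency h(m, n)
    rho = np.zeros_like(hist)
    nz = hist > 0
    denom = np.log10(hist.max()) if hist.max() > 1 else 1.0
    rho[nz] = 255 * np.log10(hist[nz]) / denom       # log-scaled intensity rho
    return rho    # gradient length along the rows, intensity along the columns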
Fig. 6 shows the transformation of a gray-scale image to the 2D histogram. All points in gray-scale image are mapped on 2D histogram which is referred to gradient length and intensity. The segmented region (white region) on the histogram means the selected area for segmenting the gray-scale image. The segmented result is shown by white region. The rectangles on the 2D histogram are related to the segmented regions on the input image. The segmentation process operates on the 2D histogram and sends the results of segmentation to the output image (right images). The comparison of cloud and storm segmentation between gray-scale and color image is shown in Fig. 7 using the same segmented region on its histogram. The segmented results are shown in the right column which are nearly similar to each other. The middle column shows the different intensity
Figure 6. The transformation of gray-scale image to 2D histogram and its segmentation
Figure 7. The comparison of cloud and storm segmentation of the same segmented region: input images (left), 2D histograms (middle) and output images (right), grayscale segmentation (upper row), color segmentation (lower row)
distributions of histogram between gray-scale and color image since the operation of color image has been done on red, green and blue channel.
4 Volume Rendering by Sliced Reconstruction In this section, the volume rendering method is described. The advantages of OpenGL (Open Graphics Library) are applied by using the alpha-cut value. Each satellite image is converted to N slices by using different alpha-cut values from minimum alpha-cut value (ground layer) to maximum alpha cut value (top layer). The alpha-cut value is a real value in [0,1]. Fig. 8 shows the structure of sliced layers from a satellite image.
Figure 8. 2D surfaces for volume rendering of cloud and storm reconstruction
4.1 Volume Rendering Algorithm
1. Define the number of sliced layers (N) and the cloud layer height (CloudLayerH),
2. Define the virtual cloud height (CloudHeight) value and the unit cell size (κ),
3. For all sliced layers do
   a) define the cloud density (CloudDens) of each layer
   b) define the alpha-cut value of the current layer
   c) draw a rectangle with texture mapping of the satellite image and the alpha-cut value of the layer.
The source code for the volume rendering by sliced images is given below.

CloudLayerH := 0.01;
for i := 1 to N do
begin
  // cloud density of the layer, derived from the user-defined CloudDensity
  CloudDens := 0.5*(100-CloudDensity)/100;
  // only texels above this alpha-cut value are drawn for layer i
  glAlphaFunc(GL_GREATER, CloudDens + (1-CloudDens)*i/N);
  glNormal3f(0,1,0);
  glBegin(GL_QUADS);
    glTexCoord2f(0,0); glVertex3f(-50*κ, CloudLayerH*CloudHeight+0.1*i/N, -50*κ);
    glTexCoord2f(w,0); glVertex3f( 50*κ, CloudLayerH*CloudHeight+0.1*i/N, -50*κ);
    glTexCoord2f(w,h); glVertex3f( 50*κ, CloudLayerH*CloudHeight+0.1*i/N,  50*κ);
    glTexCoord2f(0,h); glVertex3f(-50*κ, CloudLayerH*CloudHeight+0.1*i/N,  50*κ);
  glEnd();
end;

In the source code, the value 50 is the specific size of each polygon. The polygon of each layer is drawn in the XZ-plane through the corners (-50κ, -50κ), (50κ, -50κ), (50κ, 50κ) and (-50κ, 50κ), respectively.
5 Visualization and Animation This section explains the visualization technique and results of two case studies of typhoon Damrey and typhoon Kaitak. This paper proposes two methods for visualization. The first method uses the segmented cloud and storm regions from segmentation process with real topography (Etopo2). The full modification of virtual height of cloud and earth is applied in the second method. The end user can select any visualization. 5.1 Visualization Using Etopo Data Real data from satellite topography namely Etopo2(2 minutes grid ≈ 3.7 kilometers near equator.) can be retrieved from NOAA at the highest resolution in Asia. Fig. 9 (left) shows the spherical map of the world using Etopo10 (10 minutes) and the case study of Etopo2 is shown in Fig. 9 (right).
Figure 9. 3D topography of world (Etopo10) and earth (Etopo2)
Visualization Procedure
1. Read the target region of the Etopo2 data for the 3D surface of the earth,
2. Calculate the average normal vector of each grid point,
3. For all time steps do
   a) read the satellite input images of the target period of time (every hour),
   b) apply the segmentation method for cloud and storm filtering,
   c) draw the Etopo2 surface of the earth and all sliced cloud layers with their virtual height,
   d) apply the light source to the average normal vectors of all objects.
4. Show the animation of all time steps.

Fig. 10 shows the result for a fixed satellite image of typhoon Damrey from different perspectives using Etopo2. The virtual height of the filtered cloud and storm is defined by the user.

5.2 Visualization Using Fully Virtual Height

In this method, each satellite image is mapped to the whole volume for all layers with a given maximum virtual height. The alpha-cut value for intermediate layers is interpolated. The topography of earth and ocean is the result of the filtering process of each satellite image.

Visualization Procedure
1. Define the maximum virtual height value.
2. For all time steps do
   a) read the satellite input images of the target period of time (every hour),
   b) apply the segmentation method for cloud and storm filtering,
Figure 10. The typhoon Damrey from different perspectives
   c) draw all sliced cloud layers with their virtual height,
   d) apply texture mapping for all slices.
3. Show the animation of all time steps.

The result of this technique is shown in Fig. 11 from different perspectives. The filtering process gives a smooth result for the ground layers and cloud layers. The alpha-cut value is applied for all slices using the algorithm in 4.1.

5.3 Visualization of Numerical Results

In order to compare the behavior of the storm movement, this paper also presents numerical results for these two typhoons obtained with the MM5 weather simulation model. The meteorological simulation of this study is carried out using the nonhydrostatic version of the MM5 Mesoscale Model from NCAR/PSU (National Center for Atmospheric Research/Pennsylvania State University) [Dud93], [GDS94]. The model has been modified to execute for parallel processing
Figure 11. The typhoon Kaitak from different perspectives
by using MPI version. MM5 Version 3 Release 7 (MM5v3.7) was compiled by using PGI version 6.0 and operated on the Linux tle 7.0 platform. The calculations were performed on the first three days of typhoon Damrey (September 25-28, 2005) and typhoon Kaitak (October 28 - November 1, 2005) period. The central latitude and longitude of the coarse domain was 13.1 degree North and 102.0 degree East respectively, and the Mercator map projection was used. The vertical resolution of 23 pressure levels progressively increased towards the surface. The 3D storm reconstruction and numerical solution of typhoon Damrey movement are shown in Fig. 12 and Fig. 13 for every 6 hours starting from September 25, 2005 at 01:00GMT (left to right, top to bottom) to September 26, 2005 at 07:00GMT. The cloud volume is calculated by marching cube method from a given iso-surface value. The second 3D storm reconstruction and numerical result of typhoon Kaitak for every 6 hours starting from October 29, 2005 at 02:00GMT (left to right, top to bottom) to October 30, 2005 at 08:00GMT are shown in Fig. 14 and Fig. 15, respectively. The numerical
results were visualized by our software, namely VirtualWeather3D, which runs on the Windows operating system.
Figure 12. The 3D storm reconstruction of typhoon Damrey every 6 hours starting from September 25, 2005 at 01:00GMT (left to right, top to bottom)
Figure 13. The numerical result of typhoon Damrey every 6 hours starting from September 25, 2005 at 01:00GMT (left to right, top to bottom)
Figure 14. The 3D storm reconstruction of typhoon Kaitak for every 6 hours starting from October 29, 2005 at 02:00GMT (left to right, top to bottom)
Figure 15. The numerical result of typhoon Kaitak for every 6 hours starting from October 29, 2005 at 02:00GMT (left to right, top to bottom)
6 Conclusion and Further Works
Figure 16. The 3D reconstruction of hurricane Katrina: input image (from NASA) (a) in different perspectives
This paper has proposed a methodology for reconstructing clouds and storms from satellite images by converting them to a 3D volume rendering, which shall be useful for warning systems. Two methods for cloud and storm segmentation are described: using the parameters Cdv and Ccv in the first method, and using the histogram of gradient length and intensity in the second method. For the visualization, two methods are shown, one using the Etopo2 data and one using a fully virtual height given by the end user. The method can be used for any kind of satellite image, both gray-scale and color. Further examples, the hurricane Katrina approaching New Orleans on August 28, 2005 and the hurricane Kyrill in Europe on January 18, 2007, are shown in Fig. 16 and Fig. 17, respectively. The virtual height parameter can be adjusted by the end user as a maximum virtual height. The numerical results from Virtual Weather 3D show
Figure 17. The 3D storm reconstruction of hurricane Kyrill every 6 hours starting from January 18, 2007 at 01:00GMT (left to right, top to bottom)
the movement of cloud and storm volume to the real height of each pressure level. The software supports both visualizing the satellite images and numerical results from MM5 Model. The animation results can be captured for every time step. The combination of predicted wind speed and direction will be applied to the satellite images in our further work.
7 Acknowledgment The authors wish to thank the EEI-Laboratory at Kochi University for all satellite images, NASA for input image in Fig. 16 and two meteorologists namely Mr. Somkuan Tonjan and Mr. Teeratham Tepparat at the Thai Meteorological Department in Bangkok, Thailand for their kindness in executing the MM5 model. Finally, the authors would like to thank the National Geophysical Data Center (NGDC), NOAA Satellite and Information Service for earth topography (ETOPO) data.
References

[Dud93] Dudhia, J.: A nonhydrostatic version of the Penn State-NCAR mesoscale model: Validation test and simulation of an Atlantic cyclone and cold front. Mon. Wea. Rev., 121, 1493–1513 (1993)
[GDS94] Grell, G., Dudhia, J., and Stauffer, D.: A description of the fifth generation Penn State/NCAR Mesoscale Model. NCAR Tech. Note NCAR/TN-398 + STR (1994)
[GHBS00] Griffin, M. K., Hsu, S. M., Burke, H. K., and Snow, J. W.: Characterization and delineation of plumes, clouds and fires in hyperspectral images. In: Stein, T. I. (ed.) Proc. 2000 IEEE International Geoscience and Remote Sensing Symposium, II, Piscataway: IEEE, 809–812 (2000)
[Het00] Hetzheim, H.: Characterisation of clouds and their heights by texture analysis of multi-spectral stereo images. In: Stein, T. I. (ed.) Proc. 2000 IEEE International Geoscience and Remote Sensing Symposium, V, Piscataway: IEEE, 1798–1800 (2000)
[HHGS05] Hong, Y., Hsu, K., Gao, X., Sorooshian, S.: Precipitation Estimation from Remotely Sensed Imagery Using Artificial Neural Network - Cloud Classification System. Journal of Applied Meteorology, 43, No. 12, 1834–1853 (2005)
[KKM00] Kubo, M., Koshinaka, H., Muramoto, K.: Extraction of clouds in the Antarctic using wavelet analysis. In: Stein, T. I. (ed.) Proc. 2000 IEEE International Geoscience and Remote Sensing Symposium, V, Piscataway: IEEE, 2170–2172 (2000)
[MA02] Mukherjee, D. P., Acton, S. T.: Cloud tracking by scale space classification. IEEE Trans. Geosci. Rem. Sens., GE-40, No. 2, 405–415 (2002)
[TSA99] Tian, B., Shaikh, M. A., Azimi-Sadjadi, M. R., Von der Haar, T. H., Reinke, D. L.: A study of cloud classification with neural networks using spectral and textural features. IEEE Trans. Neural Networks, 10, 138–151 (1999)
[VIVS95] Visa, A., Iivarinen, J., Valkealahti, K., Simula, O.: Neural network based cloud classifier. Proc. International Conference on Artificial Neural Networks, ICANN'95 (1995)
[Wel88] Welch, R. M., et al.: Cloud field classification based upon high spatial resolution textural features (I): Gray level cooccurrence matrix approach. J. Geophys. Res., 93, 12663–12681 (1988)
[YWO00] Yang, Z., Wood, G., O'Reilly, J. E.: Cloud detection in sea surface temperature images by combining data from NOAA polar-orbiting and geostationary satellites. In: Stein, T. I. (ed.) Proc. 2000 IEEE International Geoscience and Remote Sensing Symposium, V, Piscataway: IEEE, 1817–1820 (2000)
Providing Query Assurance for Outsourced Tree-Indexed Data
Tran Khanh Dang and Nguyen Thanh Son
Faculty of Computer Science and Engineering, HCMC University of Technology, National University of Ho Chi Minh City, Vietnam
{khanh, sonsys}@cse.hcmut.edu.vn
Abstract Outsourcing database services is emerging as an important new trend thanks to the continued development of the Internet and advances in networking technology. In this outsourced database service model, organizations rely upon the premises of an external service provider for the storage and retrieval management of their data. Since a service provider is typically not fully trusted, this model introduces numerous interesting research challenges. Among them, the most crucial security research questions relate to (1) data confidentiality, (2) user privacy, (3) data privacy, and (4) query assurance. Although a number of research works exist on these topics, to the best of our knowledge, none of them has dealt with ensuring query assurance for outsourced tree-indexed data. To address this issue, the system must prove authenticity and data integrity, completeness and, not less importantly, provide freshness guarantees for the result set. These objectives imply that (1) data in the result set originated from the actual data owner and has not been tampered with; (2) the server did not omit any tuples matching the query conditions; and (3) the result set was generated with respect to the most recent snapshot of the database. This is not a trivial task, especially as tree-based index structures are outsourced to untrusted servers. In this paper, we discuss and propose solutions to these security issues in order to provide query assurance for outsourced databases that come together with tree-based index structures. Our techniques allow users to operate on their outsourced tree-indexed data on untrusted servers with high query assurance and at reasonable costs. Experimental results with real datasets confirm the efficiency of our approach and theoretical analysis.
1 Introduction
Outsourcing database services is emerging as an important new trend thanks to the continued growth of the Internet and advances in networking technology. In the outsourced database service (ODBS) model, organizations rely on the premises of an external service provider, which include hardware, software and manpower, for the storage and retrieval management of their data. Since a service provider is typically not fully trusted, this model raises
numerous interesting research challenges related to security issues. First of all, because the life-blood of every organization is the information stored in its databases, making outsourced data confidential is one of the foremost challenges in this model. In addition, privacy-related concerns must also be taken into account due to their important role in real-world applications. No less importantly, in order to make the outsourced database service viable and really applicable, the query results must also be proven qualified. This means that the system has to provide users with some means to verify the query assurance claims of the service provider. Overall, the most crucial security-related research questions in the ODBS model relate to the following issues:
• Data confidentiality: Outsiders and the server's operators (database administrators - DBAs) cannot see the user's outsourced data contents in any case (even as the user's queries are performed on the server).
• User privacy: Users do not want the server and even the DBA to know about their queries and the results. Ensuring user privacy is one of the keys to the ODBS model's success.
• Data privacy: Users are not allowed to get more information than what they are querying on the server. In many situations, users must pay for what they have got from the server and the data owner does not allow them to get more than what they have paid for; or users may not even want to pay for what they do not need because of low-bandwidth connections, limited memory/storage devices, etc. This security objective is not easy to obtain and a cost-efficient solution to this issue is still an open question [Dan06b].
• Query assurance: Users are able to verify the correctness (authenticity and data integrity), completeness and freshness of the result set. We succinctly explain these concepts as follows (more discussions can be found in [NaT06, MNT04, BGL+03, PJR+05, PaT04, Sio05]):
  – Proof of correctness: As a user queries outsourced data, it expects a set of tuples satisfying all query conditions and also needs assurance that the data returned from the server originated from the data owner and have not been tampered with either by an outside attacker or by the server itself.
  – Proof of completeness: As a user queries outsourced data, completeness implies that the user can verify that the server returned all tuples matching the query conditions, i.e., the server did not omit any tuples satisfying the query conditions. Note that a server, which is either malicious or lazy, might not execute the query over the entire database and return no or only partial results. Ensuring query result completeness aims to detect this unexpected behavior.
  – Proof of freshness: The user must be ensured that the result set was generated with respect to the most recent snapshot of the database.
This issue must be addressed so as to support dynamic outsourced databases, whose data are frequently updated. The above security requirements differ from traditional database security issues [CFM+95, Uma04] and will in general influence the performance, usability and scalability of the ODBS model. Although a number of research works on the above topics exist, such as [DuA00, HIL+02, BoP02, DVJ+03, LiC04, ChM04, Dan06a, Dan06b], to the best of our knowledge, none of them has dealt with the problem of ensuring query assurance for outsourced tree-indexed data. It has been clearly shown in the literature that tree-indexed data play an important role in both traditional and modern database applications [Dan03]. Therefore, security issues in query assurance for outsourced tree-indexed data need to be addressed completely in order to materialize the ODBS model. This is not a trivial task, especially as tree-based index structures are outsourced to untrusted servers [DuA00, Dan05]. In this paper, we discuss and propose solutions to security issues in order to provide query assurance for dynamic outsourced databases that come together with tree-based index structures. Our techniques allow users to operate on their outsourced tree-indexed data on untrusted servers with high query assurance and at reasonable costs, and they address all three desired security properties of query assurance. Moreover, as presented in [DuA00, MNT04, Dan06b], there are several ODBS models depending on the desired security objectives. In this paper, however, we focus on the most basic and typical ODBS model where only data confidentiality, user privacy, and query assurance objectives are taken into account. Our holistic solution allows users to manipulate their outsourced data as if it were stored on in-house database servers. The rest of this paper is organized as follows: Section 2 briefly summarizes the main related work; Section 3 introduces a state-of-the-art approach to managing outsourced tree-indexed data without query assurance; Section 4 presents our contributions to completely solving the problem of query assurance for dynamic outsourced tree-indexed data; Section 5 shows experimental results with real datasets in order to establish the practical value of our proposed solutions; and, finally, Section 6 gives conclusions and future work.
2 Related Work
Although various theoretical problems concerning computation with encrypted data and searching on encrypted data have appeared in the literature [Fon03], the ODBS model, which heavily depends on data encryption methods, has emerged only recently [DuA00, HMI02, Dan06b]. Even so, it has rapidly attracted special attention from the research community due to the conveniences it brings as well as the interesting research challenges it raises [Dan05]. The foremost research challenge relates to the security objectives
for the model as introduced in Section 1. In Figure 1 we diagrammatically summarize security issues in the ODBS model, together with major references to the corresponding state-of-the-art solutions.
Figure 1. Security issues in the ODBS model [DuA00, HMI02, Dan06b]. The model raises four groups of issues: confidentiality [BoP02, DVJ+03, Dan05]; privacy, split into user privacy [HIL+02, LiC04, ChM04, Dan06a] and data privacy [GIK+98, DuA00, Dan06b]; query assurance, split into correctness [BGL+03, MNT04, PaT04, PJR+05, NaT06, Sio05, this paper], completeness [PJR+05, NaT06, Sio05, this paper], and freshness [this paper]; and auditing [BDW+04, Dan06b]
As shown in Figure 1, most security objectives of the ODBS model have been investigated. To deal with the data confidentiality issue, most approaches encrypt the (outsourced) data before it is stored at the external server [BoP02, DVJ+03, Dan05]. Although this solution can protect the data from outsiders as well as from the server, it introduces difficulties in query processing: it is hard to ensure user and data privacy when performing queries over encrypted data. In general, to address the privacy issue (including both user and data privacy), the outsourced data structures (tree- or non-tree-based) that are employed to manage the data storage and retrieval should be considered. Notably, the problem of user privacy has been quite well solved (even without special hardware [SmS01]) if the outsourced database contains only encrypted records and no tree-based indexes are used for storage and retrieval purposes (see [Dan06b] for an overview). However, the results are less encouraging when such trees are employed, although some proposals have been made, such as [LiC04, Dan06b]. In our previous work [Dan06b], we proposed an extreme protocol for the ODBS model based on private information retrieval (PIR)-like protocols [Aso01]. It would, however, become prohibitively expensive if only one server is used to host the outsourced data [CGK+95]. In [DVJ+03], Damiani et al. also gave a solution to query outsourced data indexed by B+-trees, but their approach does not provide an oblivious way to traverse the tree and this may compromise the security objectives [LiC04, Dan06a]. More recently, Lin and Candan [LiC04] introduced a computational complexity approach to solve the problem, with sound experimental results reported. Their solution, however, only supports oblivious search operations on outsourced search trees, but not insert, delete, and modify operations. That means their solution cannot
be applied to dynamic outsourced search trees, where items may be inserted and removed, or existing data may be modified. In our very recent work [Dan06a], we analyzed and introduced techniques to completely solve the problems of data confidentiality and user privacy, but not query assurance, in the ODBS model with support for dynamic tree-indexed data. In Section 3 we will elaborate on these techniques and extend them in order to deal with the three security objectives of query assurance mentioned above. Contrary to user privacy, although there are initial research activities, see [GIK+98, DuA00, Dan06b], the problem of data privacy still needs much more attention. In [GIK+98], Gertner et al. considered the data privacy issue for the first time in the context of PIR-like protocols and proposed the so-called SPIR (Symmetrically PIR) protocol in order to prevent users from knowing more than the answers to their queries. Unfortunately, such PIR-based approaches cannot be applied to the ODBS model because the data owners in PIR-like protocols are themselves the database service providers. In [DuA00], Du and Atallah introduced protocols for secure remote database access with approximate matching with respect to four different ODBS models requiring different security objectives among those presented in the previous section. Even so, their work did not support outsourced tree-indexed data. In our recent work [Dan06b] we presented a solution to ensuring data privacy in the ODBS model, which can be applied to tree-indexed data as well. Nevertheless, our proposed solution must resort to a trusted third party, which is not easy to find in practice. Recently, addressing the three issues of query assurance has also attracted many researchers and, as a result, a number of solutions have been proposed, such as [BGL+03, MNT04, PaT04, PJR+05, NaT06, Sio05]. We must note, however, that none of them has given a solution to the problem of guaranteeing query result freshness (cf. Figure 1). To prove the correctness of a user's query results, the state-of-the-art approaches [BGL+03, MNT04, PaT04, Sio05] employ some aggregated/condensed digital signature scheme to reduce the communication and computation costs. First, Boneh et al. [BGL+03] introduced an interesting aggregated signature scheme that allows aggregation of multiple signers' signatures generated from different messages into one short signature based on elliptic curves and bilinear mappings. This scheme was built on a "Gap Diffie-Hellman" group, where the Decisional Diffie-Hellman problem is easy while the Computational Diffie-Hellman problem is hard [JoN01]. Despite the big advantage that this scheme can be applied to different ODBS models, it suffers from a performance disadvantage: as shown in [MNT04], the computational complexity of Boneh et al.'s scheme is quite high for practical use in many cases. Second, in [MNT04] Mykletun et al. introduced an RSA-based condensed digital signature scheme that can be used for ensuring authenticity and data integrity in the ODBS model. Their scheme is concisely summarized as follows.
Condensed-RSA Digital Signature Scheme: Suppose pk = (n, e) and sk = (n, d) are the public and private keys, respectively, of the RSA signature
scheme, where n is a k-bit modulus formed as the product of two k/2-bit primes p and q. Assume φ(n) = (p − 1)(q − 1); both the public and private exponents e, d ∈ Z_n^* must satisfy ed ≡ 1 mod φ(n). Given t different messages {m_1, ..., m_t} and their corresponding signatures {s_1, ..., s_t} generated by the same signer, a condensed-RSA signature is computed as s_{1,t} = ∏_{i=1}^{t} s_i mod n. This signature is of the same size as a single standard RSA signature. To verify the correctness of the t received messages, the user multiplies the hashes of all t messages and checks that (s_{1,t})^e ≡ ∏_{i=1}^{t} h(m_i) mod n. As we can see, the above scheme is possible due to the fact that RSA is multiplicatively homomorphic. We will apply this scheme to our ODBS model in order to provide correctness guarantees for the tree nodes received from the server (cf. Section 4.1). Note, however, that this scheme is applicable only to a single signer's signatures. Sion [Sio05] also employed this approach to deal with the correctness of query results in his scheme. Besides, in [PaT04], Pang and Tan applied and modified the idea of "Merkle Hash Trees" (MHT) [Mer80] to provide a proof of correctness for edge computing applications, where a trusted central server outsources parts of the database to proxy servers located at the edge of the network. In [NaT06], however, the authors pointed out possible security flaws in this approach. Furthermore, there are also some approaches that deal with the completeness of a user's query results [Sio05, PJR+05, NaT06]. First, in [Sio05], the author proposed a solution to provide such assurances for arbitrary queries in outsourced database frameworks. The solution is built around a mechanism of runtime query "proofs" in a challenge-response protocol. More concretely, before outsourcing the data, the data owner partitions its data into k segments {S_1, ..., S_k}, computes hashes for each segment, H(S_i), i = 1, ..., k, and then stores (outsources) them all together at the service provider's server. In addition, the data owner also calculates some "challenge tokens" with respect to the S_i. The challenge tokens are actually queries whose results the data owner already knows, and they can be used for verification later. Whenever a batch of queries is sent to the server, certain challenge token(s) are sent along with it. The result set is then verified for completeness using the challenge tokens. Although this approach can be applied to different query types, it cannot guarantee 100% of the query assurance (the completeness), because there are chances for a malicious server to "get away" with cheating in the query execution phase (i.e., the server only needs to "guess" and return the correct answer to the challenge token together with fake result sets for the other queries in the batch, but nothing else). Moreover, this approach also introduces cost inefficiency for database updates because the challenge answers must be recalculated. More seriously, although the author did not aim to address the user privacy issue in the paper, we should note that user privacy in this approach may be compromised, because the server knows which data segments are required by the user, so inference and linking attacks can be conducted [Dan06b, DVJ+03]. Second, in [PJR+05], the authors introduced a solution based on aggregated signature schemes and MHT to provide the
completeness of the query result. This approach is an extension of that presented in their previous work [PaT04], which has been proven insecure due to some possible security flaws [NaT06]. Last, in [NaT06], the authors developed an approach, called Digital Signature Aggregation and Chaining (DSAC), which achieves both correctness and completeness of query replies. However, in their approach, tuples must be pre-sorted in ascending order with respect to each searchable dimension for the calculation of the signature chain, and thus it still does not support outsourced tree-indexed data, where the order of the tree nodes' contents cannot be determined. This pre-sorting requirement also has a tremendous negative impact on data updates, hence the total performance of the system degenerates. Apart from the security issues mentioned above and in Section 1, as we can observe in Figure 1, there exists another question: "How can the server conduct auditing activities in systems provided with such security guarantees (without employing special hardware equipment)?" The server may not know who is accessing the system (see, e.g., [Dan06b]), what they are asking for, and what the system returns to the user, so how can it effectively and efficiently provide accountability or develop intrusion detection/prevention systems? The goals of privacy preservation and accountability appear to be in contradiction, and an efficient solution to balance the two is still open. More discussion about this topic can be found in a recent publication [BDW+04]. In Section 3 below we will elaborate on the state-of-the-art approach proposed in [Dan06a] to managing the storage and retrieval of dynamic outsourced tree-indexed data, and in Section 4 we will extend this approach to strengthen it with query assurance support, covering all three of the concerned security objectives.
3 A Pragmatic Approach to Managing Outsourced Tree-Indexed Data
As discussed in the literature, tree-based index structures play an indispensable role in both traditional and modern database applications [Dan03]. However, in spite of their advantages, these index structures introduce a variety of difficulties in the ODBS model [DuA00, Dan06b]. To detail the problem, consider Figure 2a, which illustrates an example B+-tree for an attribute CustomerName with sample values. All tree nodes are encrypted before being stored at the outsourcing server to ensure data confidentiality. Assume a user queries all customers whose name is Ha on this tree. If we do not have a secure mechanism for the query processing, the fact that the query accesses nodes 0, 1, and 5 in sequence is revealed to the server. In addition, the server then realizes that the user was accessing nodes 0, 1, 5, and that node 0 is the root, node 1 is an internal node, and node 5 is a leaf node of the tree, and so the user privacy is compromised. More seriously, using such information collected gradually, together
with statistical methods, data mining techniques, etc., the server can rebuild the whole tree structure and infer sensitive information from the encrypted database, hence data confidentiality can also be compromised. Besides, during the querying, the user also learns that there are at least two other customers, named John and Bob, in the database, so data privacy is not satisfied (note that we will not address the data privacy problem in this paper).
Figure 2. An example of the B+-tree (a) and the corresponding plaintext and encrypted table (b). The plaintext table B+Table maps each node identifier (NID) to its node content: 0 → (1,John,2,-,-1), 1 → (3,Bob,4,Ha,5), 2 → (6,Rose,7,Trang,8), 3 → (Alice,Anne,4), 4 → (Bob,Carol,5), 5 → (Ha,-,6), 6 → (John,Linh,7), 7 → (Rose,Son,8), 8 → (Trang,-,-1); the encrypted table B+EncryptedTable stores, for each NID, only the encrypted node content
Although Damiani et al. proposed an approach [DVJ+03] to outsourced tree-based index structures, it unfortunately has some security flaws that may compromise the desired security objectives [Dan06b, Dan05]. Recently, in [LiC04, Dan06a], the authors developed algorithms based on access redundancy and node swapping techniques to address security issues of outsourced tree-indexed data. We briefly summarize their solutions below. Obviously, as private data is outsourced with search trees, the tree structure and the data should all be confidential. As shown in [DVJ+03], encrypting each tree node as a whole is preferable, because protecting a tree-based index by encrypting each of its fields would disclose to the server the ordering relationship between the index values. Lin and Candan's approach [LiC04] also follows this solution and, like others [Dan05, Dan06a, DVJ+03], the unit of storage and access in their approach is also a tree node. Each node is identified by a unique node identifier (NID). The original tree is then stored at the server as a table with two attributes: NID and an encrypted value representing the node content. Let us see an example: Figure 2a shows a B+-tree built on an attribute CustomerName; Figure 2b shows the corresponding plaintext
and encrypted table used to store the B+-tree at the external server. As we can see, that B+-tree is stored at the external server as a table over the schema B+EncryptedTable = {NID, EncryptedNode}. Based on the above settings, Lin and Candan proposed an approach to oblivious traversal of outsourced search trees using two adjustable techniques: access redundancy and node swapping.
Access Redundancy: Whenever a client accesses a node, called the target node, it asks the server for a set of m−1 randomly selected nodes in addition to the target node. Hence, the probability that the server can guess the target node is 1/m. This technique differs from those presented in [DVJ+03], where only the target node is retrieved (which may reveal the tree structure, as shown in [Dan05, Dan06b]). However, access redundancy alone bears another weakness: it can leak information on the target node position. This is easy to observe: multiple access requests for the root node will reveal its position simply by calculating the intersection of the redundancy sets of the requests. If the root position is disclosed, there is a high risk that its child nodes (and also the whole tree structure) may be exposed [LiC04]. This deficiency is overcome by secretly changing the target node's address each time it is accessed.
Node Swapping: Each time a client requests to access a node from the server, it asks for a redundancy set of m nodes consisting of at least one empty node along with the target one. The client then (1) decrypts the target node; (2) manipulates its data; (3) swaps it with the empty node; and (4) re-encrypts all m nodes and writes them back to the server. Note that this technique must re-encrypt nodes using a different encryption scheme/key (see [LiC04] for details). Thanks to this, the authors proved that the possible position of the target node is randomly distributed over the data storage space at the server, and thus the weakness of the access redundancy technique is overcome. A sketch of one such client access is given below.
Although Lin and Candan's approach only supports oblivious tree search operations, the two techniques above have served as the basis for our further investigation. Based on the access redundancy and node swapping techniques, in [Dan06a] we developed practical algorithms for privacy-preserving search, insert, delete, and modify operations that can be applied to a variety of dynamic outsourced tree-based index structures under the unified-user as well as the multi-querier model (without data privacy considerations) (see [MNT04, DuA00, Dan06b] for more details about ODBS models). Although our work provided the vanguard solutions for this problem, it did not consider the query assurance problem. In Section 4 we will extend our previous work to address this problem.
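The following is a minimal sketch of one access-redundancy/node-swapping step as described above, assuming a hypothetical server interface (get_nodes/put_nodes), a hypothetical cipher wrapper (encrypt/decrypt) and a data storage space much larger than m; it is illustrative only and is not the exact client code of [LiC04] or [Dan06a].

```python
import secrets

def oblivious_access(server, crypto, target_nid, all_nids, empty_nids, m):
    """One access-redundancy / node-swapping step (illustrative sketch).

    server : hypothetical store exposing get_nodes(nids) and put_nodes(dict)
    crypto : hypothetical cipher wrapper exposing encrypt(bytes) / decrypt(bytes);
             re-encryption must use fresh randomness (or a fresh key), cf. [LiC04]
    """
    assert m >= 2, "need room for the target plus at least one empty node"
    rng = secrets.SystemRandom()

    # Build a redundancy set of m NIDs: the target, one empty node and m-2
    # random decoys, so the server can only guess the target with probability 1/m.
    empty_nid = rng.choice(list(empty_nids))
    decoys = set()
    while len(decoys) < m - 2:
        cand = rng.choice(list(all_nids))
        if cand not in (target_nid, empty_nid):
            decoys.add(cand)
    redundancy_set = [target_nid, empty_nid] + list(decoys)
    rng.shuffle(redundancy_set)

    # Fetch and decrypt all m nodes; only the target is actually needed.
    blobs = server.get_nodes(redundancy_set)              # {nid: ciphertext}
    nodes = {nid: crypto.decrypt(blob) for nid, blob in blobs.items()}
    target_node = nodes[target_nid]

    # Swap: the target's content moves to the empty slot, and the old target
    # slot becomes the new empty node, so the target's address changes secretly.
    nodes[empty_nid] = target_node
    nodes[target_nid] = b"EMPTY"                          # placeholder empty marker

    # Re-encrypt every node in the set (fresh ciphertexts) and write all back.
    server.put_nodes({nid: crypto.encrypt(content) for nid, content in nodes.items()})
    return target_node, empty_nid                         # new address of the target
```

In the full algorithms of [LiC04, Dan06a] the client additionally updates the parent's pointer to the swapped node and maintains the set of empty nodes; those details are omitted here.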
4 Query Assurance for Outsourced Tree-Indexed Data
In this section, we present an extension of our previous work in [Dan06a], which introduced solutions to the problems of data confidentiality and user
privacy in the ODBS model, in order to incorporate solutions for ensuring the correctness, completeness, and freshness of the query results. Section 5 will show the experimental results with real datasets.
4.1 Correctness Guarantees
As introduced in Section 1, to guarantee the correctness of the query result set, the system must provide a means for the user to verify that the received data originated from the data owner as is. As analyzed in Section 2, the state-of-the-art approaches employ public-key cryptography to deal with this problem. With respect to our concerned ODBS model, where data privacy considerations are omitted and only a single signer (i.e., only one data owner) participates in the query processing, the RSA-based signature scheme is the most suitable, as discussed in Section 2. In our context, outsourced tree-indexed data is stored at the server side as described in the previous section, i.e., as a table over the schema EncryptedTable = {NID, EncryptedNode}. Before outsourcing the data, the data owner computes the hash h(m) of each encrypted node m. Here, h() denotes a cryptographically strong hash function (e.g., SHA-1). The data owner then "signs" the encrypted node m by encrypting h(m) with its private/secret key sk and stores the signatures together with EncryptedTable at the server. The table schema stored at the server therefore becomes EncryptedTable = {NID, EncryptedNode, Signature} (see Figure 3). With these settings, users can then verify each returned node using the data owner's public key pk, hence ensuring the correctness of the result set. A sketch of this signing step is given below.
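As an illustration of this signing step, the sketch below shows how the {NID, EncryptedNode, Signature} rows of Figure 3 could be produced; the helper names are ours, and the unpadded "textbook" RSA operation on the SHA-1 digest is used only to mirror the formulas above (a real deployment would sign with a padded RSA scheme from an established library).

```python
import hashlib

def rsa_sign(digest: bytes, n: int, d: int) -> int:
    """Textbook RSA on a SHA-1 digest: s = h(m)^d mod n (illustrative, no padding)."""
    return pow(int.from_bytes(digest, "big"), d, n)

def build_encrypted_table(encrypted_nodes: dict, n: int, d: int) -> list:
    """encrypted_nodes: {nid: encrypted_node_bytes}, prepared by the data owner.

    Returns rows matching the outsourced schema {NID, EncryptedNode, Signature}.
    """
    rows = []
    for nid, enc_node in encrypted_nodes.items():
        digest = hashlib.sha1(enc_node).digest()        # h(m), SHA-1 as in the paper
        rows.append({"NID": nid,
                     "EncryptedNode": enc_node,
                     "Signature": rsa_sign(digest, n, d)})
    return rows
```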
Figure 3. EncryptedTable with tree node contents' signatures: each row of B+EncryptedTable holds the NID, the encrypted node content, and the corresponding signature s0, ..., s8
Although the naive approach above ensures the security objective, it is expensive because the number of signatures to verify equals the redundancy set size. To solve this issue, we employ the condensed-RSA digital signature scheme, which is based on the fact that RSA is multiplicatively homomorphic, as presented in Section 2: given t input encrypted nodes {m_1, ..., m_t} (the redundancy set) and their corresponding signatures {s_1, ..., s_t}, the server
computes a condensed-RSA signature s_{1,t} as the product of these individual signatures and sends it together with the redundancy set to the user. The user, in turn, can then verify the condensed signature s_{1,t} by employing the hashes computed from all received nodes (in the corresponding redundancy set), as shown in Section 2. With this method, not only is the query result correctness ensured, but both communication and computation costs are also tremendously reduced. Note that in this case the server has to send only one condensed-RSA signature s_{1,t} to the user for verification instead of t individual ones. Section 5 will show the experimental results.
4.2 Completeness Guarantees
Completeness guarantees mean that the server did not omit any tuples matching the query conditions. In our context, as a user asks the server for a redundancy set A of t nodes, A = {m_1, ..., m_t}, and the server returns a set R of t nodes, R = {n_1, ..., n_t}, the user must be able to verify that A = R. As presented in Section 3, a user asks for encrypted nodes (at the server side) through their NIDs. Therefore, the user should be provided with a means of verifying that the NID of each m_i, i = 1, ..., t, equals the NID of the corresponding n_i. To ensure this, our solution is embarrassingly simple: the NID is encrypted together with the corresponding node contents and this encrypted value is stored at the server side, together with its signature. Users can then check whether the server returned the NIDs (in the redundancy set) that they required (completeness) as well as verify the query result correctness (as shown in Section 4.1). This idea is illustrated in Figure 4; a client-side verification sketch is given below.
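As an illustration, the sketch below shows how a client could combine the two checks on one redundancy set, reusing the textbook-RSA and SHA-1 conventions of the signing sketch in Section 4.1; the row format, the decrypt callback and the assumption that the decrypted node carries its own NID are ours, made for illustration only.

```python
import hashlib

def verify_redundancy_set(requested_nids, rows, condensed_sig, n, e, decrypt):
    """Check correctness and completeness of one returned redundancy set.

    requested_nids : NIDs the client asked for
    rows           : [{"NID": ..., "EncryptedNode": ...}, ...] from the server
    condensed_sig  : s_{1,t}, the product of the individual signatures mod n
    (n, e)         : data owner's public RSA key; decrypt: client-side cipher
    """
    # Correctness: (s_{1,t})^e must equal the product of the node hashes mod n.
    hash_product = 1
    for row in rows:
        digest = hashlib.sha1(row["EncryptedNode"]).digest()
        hash_product = (hash_product * int.from_bytes(digest, "big")) % n
    if pow(condensed_sig, e, n) != hash_product:
        return False                                  # tampered or forged nodes

    # Completeness: the NID embedded inside each decrypted node must match the
    # row's NID, and the set of NIDs must be exactly what was requested.
    embedded = set()
    for row in rows:
        node = decrypt(row["EncryptedNode"])          # plaintext includes its NID
        if node["nid"] != row["NID"]:
            return False
        embedded.add(node["nid"])
    return embedded == set(requested_nids)
```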
Figure 4. Settings for verifying completeness guarantees: each encrypted node content now also includes its own NID (encrypted with the corresponding node contents), and each row stores the NID, the encrypted node, and the signature s0, ..., s8
In more detail, Figure 4 sketches settings for verifying completeness guarantees of the system. First, the encrypted value with respect to the attribute EncryptedNode also includes the NID of its corresponding node (for example, in the first row, the encrypted value also includes value 0). Second, the data
owner signs each encrypted node using the RSA signature scheme, then stores the signature (e.g., s0) together with the NID and its corresponding encrypted value, as described in the previous section. Note that verifying the completeness and the correctness must be carried out together, i.e., the user cannot omit either of them. This is also true for the freshness guarantees presented below.
4.3 Freshness Guarantees
As discussed previously, with dynamic outsourced databases, ensuring only the correctness and completeness of the result set is not enough. Apart from those, the system must also provide a means for users to verify that the received nodes come from the most recent database state, not from older one(s). Either motivated by clear cost incentives for dishonest behavior or due to intrusions/viruses, the server may return obsolete nodes to users, which do not truly reflect the state of the outsourced database at query time. This is a no less important problem that also needs to be sorted out to make the ODBS model viable. Actually, in [NaT06] the authors did mention this problem and outlined a possible solution based on MHTs, but no cost evaluation was given (note that MHT-based approaches to the ODBS model are quite expensive, especially for dynamic outsourced tree-indexed data [NaT06]). In this section, we propose a vanguard solution to this problem; a comprehensive evaluation for all concerned security objectives will be presented in the next section.
Figure 5. Settings for verifying freshness guarantees: each encrypted node content now includes its NID and the timestamps of its child nodes, and each row again stores the NID, the encrypted node, and the signature s0, ..., s8
To solve the problem of freshness guarantees, users must be able to verify that the server returned the most up-to-date versions of the required tree nodes (at the time it processed the query). Our solution is also quite simple, but sound and complete, and is based on timestamps: a timestamp of each child node is stored at its parent node. This timestamp changes whenever the child node is updated. In other words, a node keeps the timestamps of all of its child nodes, and a user
can then check (starting from the root node) whether the server returned the latest version of the required node: when accessing the root, the user knows in advance the timestamps of all its child nodes, and as a child node is returned s/he can check whether this node's timestamp equals the known value, and so on. One question arises, however: How can users check the root's timestamp? The answer is quite simple: in the settings for the access redundancy and node swapping techniques, there is a special node called SNODE that keeps some meta-data and the root's address. The SNODE's address and its decryption key are known to all qualified users. Therefore, SNODE keeps the timestamp of the root, and each qualified user is also informed about the timestamp of SNODE. With these settings, freshness guarantees of the query result can be effectively verified. Note that the encrypted value representing the corresponding node contents now includes not only its NID, but also the timestamps of its child nodes. The corresponding signature is computed over this final encrypted value. Figure 5 illustrates this, and a small sketch of the freshness check follows.
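A minimal sketch of this check along one root-to-leaf access path is given below; the node layout (nid, timestamp, child timestamps), the SNODE fields and the fetch_node helper are illustrative assumptions, not the exact record format of our prototype.

```python
def check_freshness(fetch_node, snode, path_nids):
    """Verify that every node on an access path is the most recent version.

    fetch_node(nid) : returns one decrypted node after the correctness and
                      completeness checks of Sections 4.1 and 4.2 have passed
    snode           : the special SNODE, with snode["root_nid"], snode["root_ts"]
    path_nids       : NIDs on the path, starting with the root
    A decrypted node is assumed to look like:
      {"nid": ..., "timestamp": ..., "children": {child_nid: child_timestamp}}
    """
    expected = {snode["root_nid"]: snode["root_ts"]}   # SNODE vouches for the root
    for nid in path_nids:
        node = fetch_node(nid)
        if node["timestamp"] != expected.get(nid):
            return False                               # stale or replayed node
        expected.update(node["children"])              # parent vouches for children
    return True
```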
5 Experimental Results
To confirm the theoretical analyses carried out in the previous sections and to establish the practical applicability of our approach, we implemented a prototype system and evaluated the proposed solutions with real datasets. For all experiments, we used 2-dimensional datasets extracted from the SEQUOIA dataset at http://www.rtreeportal.org/spatial.html. The SEQUOIA dataset consists of 2-dimensional points of the format (x, y), representing the locations of 62556 California place names. We extracted 5 sub-datasets of 10K, 20K, 30K, 40K and 50K points from the SEQUOIA dataset for the experiments. To manage the spatial points, we employed 2-dimensional kd-trees due to their simplicity. For all the trees, we set the maximum number M of data items that a leaf node can keep to 50 and the minimum fill factor to 4%. This means that each tree leaf node must contain at least 2 points and can store up to 50 points. Furthermore, the tree was stored in a data storage space with a 22500-node capacity, divided into 15 levels of 1500 nodes each (see [Dan06a, LiC04] for the detailed meaning of these settings). Our prototype system consisted of a single P4 CPU 2.8GHz/1GB RAM PC running Windows 2003 Server. Both client and server were accommodated in the same computer, so for all experiments we report the averaged time to complete a user request, which represents the averaged CPU-cost of each client request, and analyze the averaged IO- and communication-cost. In addition, all programs were implemented using C#/Visual Studio .NET 2003, and we employed the DES algorithm for the encryption of data and the RSA signature scheme (1024-bit key) with SHA-1 hashing for the digital signatures. We did experiments with all major basic operations, including search (both point and range queries) and updates (inserts and deletes). Note that modify operations are combinations of inserts and deletes [Dan05, Dan06a].
In addition, because there is no previous work built on the same or a similar scheme that addresses the same problem, we had to build our scheme from scratch and ran experiments to evaluate our solutions to the query assurance issue on the basis of the condensed-RSA signature scheme and the naive/standard RSA signature scheme (cf. Sections 2, 4). All the security objectives of the query assurance issue (i.e., correctness, completeness, and freshness guarantees) were taken into account. The details are as follows. Initially, we ran experiments with the biggest dataset, 50K points, for insert, delete, point, and range queries in order to see the performance of both the naive RSA and the condensed-RSA based solutions. The redundancy set size was set to 4 for these tests. Figure 6 shows the experimental results concerning the CPU-cost. It is clearly shown that the CPU-cost of the condensed-RSA scheme is much better than that of the naive RSA scheme. Note that the averaged number of accessed nodes (i.e., the IO-cost) of the two is the same, but the communication cost of the condensed-RSA scheme is also better by a factor of (redundancy set size − 1) × RSA signature size. This is due to the fact that with the condensed-RSA scheme the server has to send the user only one condensed signature, while it has to send 'redundancy set size' signatures with the naive RSA scheme. Verifying more signatures is the main reason for the higher CPU-cost of the latter.
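As a rough, illustrative calculation under the settings above (a hedged estimate, not a measured figure): with a redundancy set size of 4 and 1024-bit RSA signatures (128 bytes each), the naive scheme returns 4 × 128 = 512 bytes of signature data per request, whereas the condensed scheme returns a single 128-byte signature, saving (4 − 1) × 128 = 384 bytes of signature traffic per access.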
Figure 6. Condensed RSA signature scheme vs. naive RSA signature scheme: CPU-time (sec) for point, range, insert and delete queries on the 50K 2-d points (kd-tree) dataset
Furthermore, to see the effect of different database sizes on the performance, for each of the sub-datasets we ran 100 separate queries with the redundancy set size set to 4 and calculated averaged values for the CPU-time. For inserts, deletes, and point queries we randomly chose 100 points from the corresponding dataset as the queries. For range queries, we randomly chose 100 squares as the queries. The side of each square was chosen to be 1% of the norm of the data space side (if the dataset is uniformly distributed, this value maintains a selectivity of 0.01% for these range queries). The experimental results are shown in Figure 7. As we can see, the CPU-cost saving of
all kinds of queries is high, over 30% at the minimum, between the condensed-RSA scheme and the naive RSA scheme. Again, as mentioned above, although the averaged number of accessed nodes is equal for both schemes, the communication cost of the condensed-RSA scheme is better than that of the naive RSA scheme.
Figure 7. A variety of dataset sizes: computational cost savings (CPU-time saving, %) of the condensed-RSA scheme for point queries, range queries, inserts and deletes over dataset sizes 10K to 50K
To conclude this section, we emphasize that it has been mathematically proven in [LiC04, Dan06a] that our approach based on the access redundancy and node swapping techniques is computationally secure in protecting both the queries and the tree structure from a polynomial-time server. Therefore, it is quite safe to claim that our proposed solutions in this paper, which extend the previous work, are full-fledged and can be applied to real-world ODBS models.
6 Conclusion and Future Work
In this paper, we explored the problem of query assurance in the outsourced database service (ODBS) model. Concretely, we extended our previous work, see e.g. [Dan06a], and presented a full-fledged solution to the problem of ensuring correctness, completeness, and freshness for basic operations (insert, delete, modify, point and range queries) on dynamic outsourced tree-indexed data. Experimental results with real multidimensional datasets have confirmed the efficiency of our proposed solution. Notably, to the best of our knowledge, none of the previous works has dealt with all three of the above security issues of query assurance in the ODBS model with respect to dynamic outsourced trees. Our work therefore provides a vanguard solution for this problem. Moreover, this work can also be applied to non-tree-indexed data outsourced to untrusted servers (with settings like those of [DVJ+03, Dan06a]).
Our future work will focus on evaluating the efficiency of the proposed solutions in real-world applications and on addressing the related open research issues. In particular, supporting multiple data owners' signatures is a generalization of the solution proposed in this paper, and an efficient solution to this problem is still open (cf. Section 2). Moreover, as discussed in Section 2, auditing and accountability for the ODBS model, as well as computer-crime-related issues, must be addressed, and this will be one of our future research activities of great interest. Another problem that attracts us is how to deal with over-redundancy of the result set returned from the server, i.e., the server sends the user more than what should be returned in the answers. This may cause a user to pay more for the communication cost and to incur a higher computation cost, so this issue needs to be investigated carefully.
References
[Aso01] Asonov, D.: Private Information Retrieval: An Overview and Current Trends. Proc. ECDPvA Workshop, Informatik, Vienna, Austria (2001)
[BDW+04] Burmester, M., Desmedt, Y., Wright, R. N., Yasinsac, A.: Accountable Privacy. Proc. 12th International Workshop on Security Protocols, Cambridge, UK (2004)
[BGL+03] Boneh, D., Gentry, C., Lynn, B., Shacham, H.: Aggregate and Verifiably Encrypted Signatures from Bilinear Maps. Proc. International Conference on the Theory and Applications of Cryptographic Techniques, May 4-8, Warsaw, Poland, pp. 416-432 (2003)
[BoP02] Bouganim, L., Pucheral, P.: Chip-Secured Data Access: Confidential Data on Untrusted Servers. Proc. 28th International Conference on Very Large Data Bases, Hong Kong, China, pp. 131-142 (2002)
[CFM+95] Castano, S., Fugini, M. G., Martella, G., Samarati, P.: Database Security. Addison-Wesley and ACM Press, ISBN 0-201-59375-0 (1995)
[CGK+95] Chor, B., Goldreich, O., Kushilevitz, E., Sudan, M.: Private Information Retrieval. Proc. 36th Annual IEEE Symposium on Foundations of Computer Science, Milwaukee, Wisconsin, USA, pp. 41-50 (1995)
[ChM04] Chang, Y-C., Mitzenmacher, M.: Privacy Preserving Keyword Searches on Remote Encrypted Data. Cryptology ePrint Archive: Report 2004/051 (2004)
[Dan05] Dang, T. K.: Privacy-Preserving Basic Operations on Outsourced Search Trees. Proc. International Workshop on Privacy Data Management (PDM2005, in conjunction with ICDE2005), IEEE Computer Society Press, April 8-9, 2005, Tokyo, Japan (2005)
[Dan06a] Dang, T. K.: A Practical Solution to Supporting Oblivious Basic Operations on Dynamic Outsourced Search Trees. Special Issue of International Journal of Computer Systems Science and Engineering, CRL Publishing Ltd, UK, 21(1), 53-64 (2006)
[Dan06b] Dang, T. K.: Security Protocols for Outsourcing Database Services. Information and Security: An International Journal, ProCon Ltd., Sofia, Bulgaria, 18, 85-108 (2006)
[Dan03] Dang, T. K.: Semantic Based Similarity Searches in Database Systems: Multidimensional Access Methods, Similarity Search Algorithms. PhD Thesis, FAW-Institute, University of Linz, Austria (2003)
[DuA00] Du, W., Atallah, M. J.: Protocols for Secure Remote Database Access with Approximate Matching. Proc. 7th ACM Conference on Computer and Communications Security, 1st Workshop on Security and Privacy in E-Commerce, Greece (2000)
[DVJ+03] Damiani, E., Vimercati, S. D. C., Jajodia, S., Paraboschi, S., Samarati, P.: Balancing Confidentiality and Efficiency in Untrusted Relational DBMSs. Proc. 10th ACM Conference on Computer and Communication Security, Washington, DC, USA, pp. 93-102 (2003)
[Fon03] Fong, K. C. K.: Potential Security Holes in Hacigümüs' Scheme of Executing SQL over Encrypted Data (2003) http://www.cs.siu.edu/~kfong/research/database.pdf
[GIK+98] Gertner, Y., Ishai, Y., Kushilevitz, E., Malkin, T.: Protecting Data Privacy in Private Information Retrieval Schemes. Proc. 30th Annual ACM Symposium on Theory of Computing, USA (1998)
[HIL+02] Hacigümüs, H., Iyer, B. R., Li, C., Mehrotra, S.: Executing SQL over Encrypted Data in the Database-Service-Provider Model. Proc. ACM SIGMOD Conference, Madison, Wisconsin, USA, pp. 216-227 (2002)
[HMI02] Hacigümüs, H., Mehrotra, S., Iyer, B. R.: Providing Database as a Service. Proc. 18th International Conference on Data Engineering, San Jose, CA, USA, pp. 29-40 (2002)
[JoN01] Joux, A., Nguyen, K.: Separating Decision Diffie-Hellman from Diffie-Hellman in Cryptographic Groups. Cryptology ePrint Archive: Report 2001/003 (2001)
[LiC04] Lin, P., Candan, K. S.: Hiding Traversal of Tree Structured Data from Untrusted Data Stores. Proc. 2nd International Workshop on Security in Information Systems, Porto, Portugal, pp. 314-323 (2004)
[Mer80] Merkle, R.: Protocols for Public Keys Cryptosystems. Proc. IEEE Symposium on Research in Security and Privacy (1980)
[MNT04] Mykletun, E., Narasimha, M., Tsudik, G.: Authentication and Integrity in Outsourced Databases. Proc. 11th Annual Network and Distributed System Security Symposium, February 5-6, San Diego, California, USA (2004)
[NaT06] Narasimha, M., Tsudik, G.: Authentication of Outsourced Databases Using Signature Aggregation and Chaining. Proc. 11th International Conference on Database Systems for Advanced Applications, April 12-15, Singapore, pp. 420-436 (2006)
[PaT04] Pang, H. H., Tan, K-L.: Authenticating Query Results in Edge Computing. Proc. 20th International Conference on Data Engineering, March 30-April 2, Boston, MA, USA, pp. 560-571 (2004)
[PJR+05] Pang, H. H., Jain, A., Ramamritham, K., Tan, K-L.: Verifying Completeness of Relational Query Results in Data Publishing. SIGMOD Conference, pp. 407-418 (2005)
[Sio05] Sion, R.: Query Execution Assurance for Outsourced Databases. Proc. 31st International Conference on Very Large Data Bases, August 30-September 2, Trondheim, Norway, pp. 601-612 (2005)
[SmS01] Smith, S. W., Safford, D.: Practical Server Privacy with Secure Coprocessors. IBM Systems Journal 40(3), 683-695 (2001)
[Uma04] Umar, A.: Information Security and Auditing in the Digital Age: A Managerial and Practical Perspective. NGE Solutions, ISBN 0-97274147-X (2004)
An Adaptive Space-Sharing Scheduling Algorithm for PC-Based Clusters
Viet Hung Doan, Nam Thoai, and Nguyen Thanh Son
Ho Chi Minh City University of Technology
[email protected], [email protected], [email protected]
Abstract In recent years, the PC-based cluster has become a mainstream branch of high performance computing (HPC) systems. To improve the performance of PC-based clusters, various scheduling algorithms have been proposed. However, they focus only on systems in which all jobs are rigid or all jobs are moldable. This paper fills the gap by building a scheduling algorithm for PC-based clusters running both rigid and moldable jobs. As an extension of existing adaptive space-sharing solutions, the proposed scheduling algorithm helps to reduce the turnaround time. In addition, the algorithm satisfies job-priority requirements. Evaluation results show that even in extreme cases, where all jobs are rigid or all jobs are moldable, the performance of the algorithm is competitive with the original algorithms.
1 Introduction
PC-based clusters are getting more and more popular these days as they provide extremely high execution rates with great cost effectiveness. On par with the development of PC-based clusters, scheduling on PC-based clusters has been an interesting research topic in recent years. Most of the research on scheduling to date has focused on the scheduling of rigid jobs, i.e., jobs that require a fixed number of processors. However, many parallel jobs are moldable [2, 4], i.e., they adapt to the number of processors that the scheduler sets at the beginning of job execution. Due to this flexibility, schedulers can choose an effective number of processors for each job. If most processors are free, jobs may be allocated a large number of processors and their execution time is reduced. On the other hand, if the load is heavy, the number of processors allocated to each job will be smaller, which reduces the wait time. Both scenarios help to reduce the turnaround time. Older and recent studies [1-3, 8, 12, 14, 16, 17] have shown the effectiveness of scheduling for moldable jobs. Based on the scheduling mechanisms developed separately for rigid or moldable jobs, this paper proposes a scheduling solution for a class of PC-based clusters running both rigid and moldable jobs. This study
is motivated by the fact that today PC-based clusters are commonly used as shared resources for different purposes. The jobs run on these PC-based clusters are usually both rigid and moldable. The rest of this paper is organized as follows: Section 2 describes background knowledge on scheduling. Section 3 states the research problem addressed in this paper. Section 4 gives an overview of previous approaches related to the problem. Sections 5 and 6 describe the details of the proposed solution. Section 7 describes the evaluation of the new scheme. Section 8 concludes the paper.
2 Background Knowledge
Scheduling algorithms are widely disparate in both their aims and their methods. From the user's point of view, a good scheduler should minimize the turnaround time, which is the time elapsed between job submission and its completion. The turnaround time is calculated by summing up the wait time and the execution time [1]. Another expectation is that jobs with higher priority should finish sooner than jobs with lower priority. This paper focuses on these two objectives. In PC-based clusters, to improve the performance of fine-grained parallel applications, the systems are usually space-shared, and each job has dedicated access to some number of processors [1]. The set of dedicated processors is called a partition and the number of processors in each partition is called the partition size. Space-sharing policies can be divided into four categories [12]: (i) fixed partitioning, in which partitions are created before the system operates and cannot be changed later on; (ii) variable partitioning, in which partitions are created whenever jobs come into the system; (iii) adaptive partitioning, in which partitions are created when jobs begin to be executed; and (iv) dynamic partitioning, in which partitions may change their sizes during job execution time to adapt to the current system status. To reach the goals of scheduling, there are many different ways to schedule jobs; however, they are all based on a few mechanisms. A study in [7] shows that in batch scheduling, where jobs are not preempted, two popular approaches are First Come First Serve (FCFS) and backfilling. FCFS considers jobs in order of arrival. If there are enough processors to run a job, the scheduler allocates the processors and the job can start. Otherwise, the first job must wait, and all subsequent jobs also have to wait. This mechanism may lead to a waste of processing power. However, the FCFS algorithm has been implemented in many real systems for two main reasons: it is quite simple to implement and it does not require users to estimate the run time of jobs. Backfilling is a mechanism that tries to balance utilization against maintaining the FCFS order. It allows small jobs to move ahead and run on processors that would otherwise remain idle [7]. The drawbacks of backfilling are that it is more complex than FCFS and that an estimate of the execution time is required. A small sketch of the FCFS mechanism is given below.
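The following is a minimal sketch of FCFS space-sharing as described above, assuming a simple job record with a requested number of processors; it is illustrative only and ignores priorities and moldability, which the later sections address.

```python
from collections import deque

def fcfs_schedule(queue: deque, free_pes: int):
    """Start queued jobs strictly in arrival order (FCFS).

    queue    : deque of jobs, each a dict with a "pes" field (requested processors)
    free_pes : number of currently idle processors
    Returns the list of started jobs and the remaining number of idle processors.
    """
    started = []
    while queue and queue[0]["pes"] <= free_pes:
        job = queue.popleft()          # head of the queue starts...
        free_pes -= job["pes"]
        started.append(job)
    # ...but if the head does not fit, everything behind it must wait too.
    return started, free_pes
```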
The characteristics of the jobs also have a great impact on the performance of scheduling algorithms. From the viewpoint of processor allocation, parallel jobs can be classified according to their run-time flexibility. There are four types of parallel jobs by this criterion [8]: (i) rigid: the required number of processors for job execution is fixed; (ii) evolving: the job may change its resource requirement during execution; (iii) moldable: the job allows the scheduler to set the number of processors at the beginning of execution, and the job initially configures itself to adapt to this number; (iv) malleable: the job can adapt to changes in the number of processors that the scheduler allocates during execution.
3 Problem Definition
The problem tackled in this paper comes from our experience in operating Supernode II [18], a PC-based cluster at Ho Chi Minh City University of Technology supporting high performance computing applications. All jobs run in batch mode. The two classes of jobs running on the system are (i) rigid jobs, e.g., sequential jobs or MPI jobs for which the required number of processors is fixed, and (ii) moldable jobs, e.g., MPI jobs which can run on any number of processors. Because Supernode II is used as a shared resource, each Supernode II user has a priority for executing their jobs. Additionally, all users of the system want their jobs to finish as soon as possible. Therefore, the goal of the scheduler is to minimize the turnaround time and to prefer users with high priority. Moreover, the scheduler should not require a good estimate of the job execution time. It can be seen that the requirements above are common to other PC-based clusters; therefore a solution for Supernode II can be applied to other PC-based clusters with similar characteristics.
4 Related Works
Scheduling for rigid jobs has been studied extensively. The two main mechanisms for scheduling rigid jobs, FCFS and backfilling, have been described in Sect. 2. This section therefore focuses only on scheduling moldable jobs. When classifying parallel jobs from the viewpoint of processor allocation, it is pointed out in [8] that adaptive partitioning is a suitable processor allocation policy for moldable jobs. Current research on scheduling moldable jobs [1, 3, 12, 14, 16, 17] also uses the adaptive space-sharing scheme. In [12], Rosti et al. try to reduce the impact of fragmentation by attempting to generate equally-sized partitions while adapting to the transient workload. The partition size is computed by (1). Let TotalPEs be the number of processing elements in the system and QueueLength be the number of jobs currently in the
queue. The policy always reserves one additional partition for jobs that might come in the future by using QueueLength + 1 instead of QueueLength.

PartitionSize = max(1, ceil(TotalPEs / (QueueLength + 1)))    (1)
An analysis in [14] shows that (1) considers only the queued jobs when determining the partition size. This can lead to situations that contravene the equal allocation principle of [10, 11]. Therefore, Sivarama et al. [14] suggest a modified policy as in (2), where S is the number of executing jobs. The best value of f depends on system utilization and job structure, but a value of f between 0.5 and 0.75 appears to be a reasonable choice.

PartitionSize = max(1, ceil(TotalPEs / (1 + QueueLength + f × S))), 0 ≤ f ≤ 1    (2)
The adaptive policy in [1, 3] is more restrictive, in that users must specify a range for the number of processors for each job; the scheduler will select a number that gives the best performance. The schedulers in [1, 3] use a submit-time greedy strategy to schedule moldable jobs. In [16], Srinivasan et al. improve on [1] by (i) deferring the choice of partition size until the actual job start time instead of the job submission time and (ii) using aggressive backfilling instead of conservative backfilling. In [17], Srinivasan et al. argue that an equivalent-partition strategy tends to benefit jobs with a small computation size (light jobs). On the other hand, allocating processors to jobs in proportion to the job computation size tends to benefit heavy jobs significantly. A compromise policy is that each job gets a partition size proportional to the square root of its computation size (Weight) as in (3). This equation is used to calculate the partition size in an enhanced backfilling scheme proposed in [17].

WeightFraction_i = sqrt(Weight_i) / Σ_{i ∈ {ParallelJobInSystem}} sqrt(Weight_i)    (3)
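As a concrete illustration of these partitioning functions, the sketch below implements (1)-(3) directly; the function and argument names are ours, and the square-root-weight fractions are simply rounded down to at least one processor, a detail the cited papers handle with their own allocation rules.

```python
import math

def partition_size_rosti(total_pes: int, queue_length: int) -> int:
    """Equation (1): reserve one extra partition for future arrivals."""
    return max(1, math.ceil(total_pes / (queue_length + 1)))

def partition_size_sivarama(total_pes: int, queue_length: int,
                            running_jobs: int, f: float = 0.5) -> int:
    """Equation (2): also count a fraction f of the executing jobs."""
    assert 0.0 <= f <= 1.0
    return max(1, math.ceil(total_pes / (1 + queue_length + f * running_jobs)))

def partition_sizes_sqrt_weight(total_pes: int, weights: dict) -> dict:
    """Equation (3): share processors proportionally to sqrt(Weight_i)."""
    total_sqrt = sum(math.sqrt(w) for w in weights.values())
    return {job: max(1, int(total_pes * math.sqrt(w) / total_sqrt))
            for job, w in weights.items()}

# Example: 64 processors, 3 queued jobs, 2 running jobs
print(partition_size_rosti(64, 3))                      # 16
print(partition_size_sivarama(64, 3, 2, f=0.5))         # 13
print(partition_sizes_sqrt_weight(64, {"A": 100, "B": 400, "C": 2500}))
```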
5 Partitioning-Function Based on the analysis of FCFS and backfilling in Sect. 2, a scheduler using backfilling is not suitable for the system in Sect. 3. Because backfilling requires information about the running time of jobs, which cannot be provided precisely by users [9], a new scheme is needed; the objective of this paper is to enhance the FCFS scheme. As analyzed in [10, 11, 14], a scheduling policy should share the processing power equally among jobs. To verify this, we use simulation. A virtual PC-based cluster and its scheduler are built using SimGrid [15]. This virtual system has the same characteristics as Supernode II [18]. Three partitioning-functions are used in this experiment: (2), (3), and a new partitioning-function similar to (3) but with Weight replaced by Priority × Weight. The third partitioning-function gives a larger partition size to jobs with higher priority. These three functions are named Sivarama, Sudha and Sudha with priority, respectively. Although there has been some research on models for moldable jobs [2, 4], these models are not yet accepted as a standard for simulation. Therefore, we used a subset of 4000 jobs from the CTC workload log in [6]; this workload is also used in [16]. To determine the speedup and execution time of a job for any desired partition size, the experiment uses Downey's model [5]. Because the workload log does not contain user priority information, we assign priorities by user ID, using five levels of priority ranging from 1.0 (lowest) to 1.4 (highest). To show the performance details of the scheduling algorithms, jobs are categorized by their workload-based weight, i.e., the product of their requested number of processors and their actual run time in seconds. In all charts, the vertical axis presents time in seconds and the horizontal axis shows the workload-based weight.
Figure 1. Turnaround time when priority = 1.4 (comparison of the Sivarama, Sudha and Sudha with priority partitioning-functions; horizontal axis: job categories by workload-based weight; vertical axis: sum of turnaround time)
Figure 1 shows that the turnaround time increases progressively from Sivarama through Sudha to Sudha with priority. In other words, turnaround time tends to be reduced when jobs have similar partition sizes. The reason is that in a heavily loaded system, if the scheduler assigns a large partition to a heavy job, its
execution time will be smaller. However, because the partition is large, the job must wait a long time until enough processors are available to execute it. In this situation, the delay caused by waiting overwhelms the benefit of the reduced execution time. Therefore, in a system where job execution times are not known in advance, an equivalent partitioning is preferred. This result is consistent with [10, 11]. With this observation, the partitioning-function chosen in this paper is an enhancement of the partitioning-function in [14]. For rigid jobs, the partition size of each job must be equal to the partition size it requires. For moldable jobs, the partition size is calculated in a way similar to (2). However, the numerator is now the number of processors available for parallel jobs (PEsForParallel). Furthermore, the denominator only counts the number of parallel jobs in the queue (ParallelJobInQueue). These differences arise from the fact that a portion of the processors is reserved for rigid jobs.

PartitionSize = max(1, ceil(PEsForParallel / (1 + ParallelJobInQueue + f × S))),  0 ≤ f ≤ 1    (4)

PEsForParallel = TotalPEs − NumberOfSequentialJobInQueue    (5)
Equations (4) and (5) imply that rigid-parallel jobs and moldable-parallel jobs have the same effect on the partition size of moldable jobs. An evaluation of this partitioning-function is conducted in combination with the job-selection rules in the next sections.
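A minimal sketch of the proposed partitioning rule (4)-(5) might look as follows (hypothetical names and data structures; rigid jobs simply keep their requested size):

import math

def proposed_partition_size(job, total_pes, seq_jobs_in_queue,
                            parallel_jobs_in_queue, running_jobs, f=0.5):
    """Partition size following (4)-(5): rigid jobs keep their requested size,
    moldable jobs share the PEs left over after sequential jobs are accounted for."""
    if job["rigid"]:
        return job["requested_pes"]
    pes_for_parallel = total_pes - seq_jobs_in_queue           # equation (5)
    return max(1, math.ceil(                                   # equation (4)
        pes_for_parallel / (1 + parallel_jobs_in_queue + f * running_jobs)))

# Example: a moldable job on a 64-PE system with 4 sequential and 6 parallel jobs queued
job = {"rigid": False, "requested_pes": None}
print(proposed_partition_size(job, 64, 4, 6, running_jobs=2))  # -> 8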
6 Job-Selection Rules The job-selection rules use the partition size information from Sect. 5 and the status of the system to decide which job will execute next. The scheduling policy must utilize all the processing power of the system to reduce turnaround time, and it should also reduce the waiting time of jobs with high priority. To reduce the complexity of the scheduling algorithm, two queues are implemented in the scheduler. The first one is a fixed-size queue; the scheduling algorithm only uses this queue, and its fixed size helps to bound the scheduling complexity. The second one is a dynamic-size queue that smooths the stream of jobs entering the system; it contains jobs overflowing from the fixed-size queue, ordered by the time they arrive in the system. Each job X in the fixed-size queue has a priority value called Priority(X), initialised to the priority of job X when it is submitted. The job-selection rules are as follows: Step 1: Classify the jobs in the fixed-size queue into groups based on their priority value. Each group Group_i is distinguished by its index i; groups with a higher index contain jobs with higher priority.
Step 2: Select the next job(s) to execute:

  k = MaxGroupIndex
  while there are still PEs available for jobs do
    Select a subset of jobs in Group_k that best fits the available PEs
    Execute that subset
    k = k - 1
  end while

The subset of jobs in Group_k that best fits the available PEs is selected using a scheme proposed in [18], consisting of three steps: (i) a best-fit loop, which selects the job in Group_k whose partition size best fits the available PEs and repeats this selection until no further job can be selected, (ii) a worst-fit loop, similar to the best-fit loop but selecting the job that fits worst instead of best, and (iii) an exhaustive search on the remaining set. The authors of [18] have shown that the scheme is a compromise between correctness and complexity. With the job-selection rules above, the job execution order is not strictly FCFS; instead, the set of jobs is selected based on their priority and their fitness to the available processors. In this way, system utilization increases and jobs with higher priority are served first. However, like other priority scheduling policies, the algorithm above may lead to starvation: a steady stream of high-priority jobs can prevent a low-priority job from ever getting processors. To cope with this problem, an aging policy is used. Priority(X) is increased whenever a job enters the fixed-size queue after X. When Priority(X) > MaxPriority, X is chosen to execute first; if there are not enough PEs at that time, the scheduler reserves PEs for such jobs. These job-selection rules, together with the partitioning-function of Sect. 5, satisfy all the requirements of Sect. 3. The next section presents simulation results for the proposed algorithm; a sketch of the selection procedure is given below.
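The following Python sketch illustrates Step 2 and the best-fit part of the subset selection (hypothetical data structures; the worst-fit loop and the exhaustive search of [18] are omitted for brevity):

def select_best_fit_subset(group, available_pes):
    """Greedy best-fit loop: repeatedly pick the job whose partition size
    leaves the smallest non-negative remainder of available PEs."""
    selected, remaining = [], available_pes
    candidates = list(group)
    while True:
        fitting = [j for j in candidates if j["partition_size"] <= remaining]
        if not fitting:
            return selected
        best = min(fitting, key=lambda j: remaining - j["partition_size"])
        selected.append(best)
        candidates.remove(best)
        remaining -= best["partition_size"]

def select_next_jobs(groups_by_priority, available_pes):
    """Step 2: walk the priority groups from the highest to the lowest index."""
    to_run = []
    for k in sorted(groups_by_priority, reverse=True):
        subset = select_best_fit_subset(groups_by_priority[k], available_pes)
        to_run.extend(subset)
        available_pes -= sum(j["partition_size"] for j in subset)
        if available_pes <= 0:
            break
    return to_run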
7 Overall Evaluation To our knowledge, there is no existing algorithm that handles both rigid and moldable jobs. Therefore, in this section we only present simulation results for the proposed algorithm in the two extreme cases in which all jobs are rigid or all jobs are moldable. The system and workload are the same as in Sect. 5. When all jobs are rigid, the partitioning-function gives each job a partition size equal to the number of processors it requires. Figures 2 and 3 show the turnaround time of the proposed algorithm (labelled Priority) and of basic FCFS for rigid jobs (labelled FIFO). In general, the proposed algorithm is slightly better than FIFO because it uses the best-fit policy in the job-selection rules. For jobs with high priority, the turnaround time under the Priority scheme is smaller than under FIFO (Fig. 3). That result
demonstrates the effectiveness of the proposed algorithm in meeting the priority requirement.
Figure 2. Turnaround time when all jobs are rigid (FIFO vs. Priority; horizontal axis: job categories by workload-based weight; vertical axis: sum of turnaround time)
Figure 3. Turnaround time of jobs having Priority = 1.4 when all jobs are rigid (FIFO vs. Priority; horizontal axis: job categories by workload-based weight; vertical axis: sum of turnaround time)
When all jobs are moldable, the proposed partitioning-function reduces to the partitioning-function of [14]. Figure 4 shows that, summed over all jobs, the turnaround time of the proposed algorithm (labelled Priority) is as good as that of the algorithm in [14] (labelled FIFO). However, for jobs with high priority the turnaround time is smaller than when the algorithm in [14] is used (Fig. 5).
Figure 4. Turnaround time when all jobs are moldable (FIFO vs. Priority; horizontal axis: job categories by workload-based weight; vertical axis: sum of turnaround time)
Figure 5. Turnaround time of jobs having Priority = 1.4 when all jobs are moldable (FIFO vs. Priority; horizontal axis: job categories by workload-based weight; vertical axis: sum of turnaround time)
8 Conclusions and Future Work This work describes an adaptive space-sharing scheduling algorithm for PC-based clusters. Besides the usual advantages of adaptive space-sharing, such as adapting to the status of the system and improving system utilization, the proposed algorithm makes two significant contributions. First, it supports both rigid and moldable jobs. Second, it supports a priority mechanism that reduces the turnaround time of jobs with high priority while guaranteeing that indefinite postponement cannot happen. The performance of the algorithm has been tested by simulation with a real workload trace. In future work, the robustness of the algorithm will be tested by further simulation and by analysis on real PC-based systems.
References
1. W. Cirne and F. Berman. Adaptive Selection of Partition Size for Supercomputer Requests. Lecture Notes in Computer Science, Springer, Vol. 1911, 187-208 (2000).
2. W. Cirne and F. Berman. A Model for Moldable Supercomputer Jobs. Proc. of IPDPS-01 (2001).
3. W. Cirne and F. Berman. Using Moldability to Improve the Performance of Supercomputer Jobs. Journal of Parallel and Distributed Computing, Vol. 62, 1571-1601 (2002).
4. W. Cirne and F. Berman. A Comprehensive Model of the Supercomputer Workload. Proc. of IEEE 4th Annual Workshop on Job Scheduling Strategies for Parallel Processing (2005).
5. A. B. Downey. A Model for Speedup of Parallel Programs. U.C. Berkeley Technical Report CSD-96-933 (1997).
6. D. G. Feitelson. The Parallel Workloads Archive. http://www.cs.huji.ac.il/labs/parallel/workload/
7. D. G. Feitelson and L. Rudolph. Parallel Job Scheduling - A Status Report. Lecture Notes in Computer Science, Springer, Vol. 3277 (2005).
8. D. G. Feitelson and L. Rudolph. Toward Convergence in Job Schedulers for Parallel Supercomputers. Lecture Notes in Computer Science, Springer, Vol. 1162, 1-26 (1996).
9. C. B. Lee, Y. Schwartzman, J. Hardy, and A. Snavely. Are User Runtime Estimates Inherently Inaccurate? Proc. of the 10th Job Scheduling Strategies for Parallel Processing (2004).
10. S. T. Leutenegger and M. K. Vernon. The Performance of Multiprogrammed Multiprocessor Scheduling Policies. Proc. of ACM Sigmetrics Conference, 226-236 (1990).
11. S. Majumdar, D. L. Eager, and R. B. Bunt. Scheduling in Multiprogrammed Parallel Systems. Proc. of ACM Sigmetrics Conference, 104-113 (1988).
12. E. Rosti, E. Smirni, L. W. Dowdy, G. Serazzi, and B. M. Carlson. Robust Partitioning Policies for Multiprocessor Systems. Performance Evaluation, Vol. 19, 141-265 (1994).
13. U. Schwiegelshohn and R. Yahyapour. Analysis of First-Come-First-Serve Parallel Job Scheduling. Proc. of the 9th SIAM Symposium on Discrete Algorithms, 629-638 (1998).
14. P. D. Sivarama and H. Yu. Performance Sensitivity of Space-Sharing Processor Scheduling in Distributed-Memory Multicomputers. Proc. of IPPS/SPDP '98, 403-409 (1998).
15. SimGrid project. http://simgrid.gforge.inria.fr
16. S. Srinivasan, V. Subramani, R. Kettimuthu, P. Holenarsipur, and P. Sadayappan. Effective Selection of Partition Sizes for Moldable Scheduling of Parallel Jobs. Lecture Notes in Computer Science, Springer, Vol. 2552, 174-183 (2002).
17. S. Srinivasan, S. Krishnamoorthy, and P. Sadayappan. A Robust Scheduling Strategy for Moldable Scheduling of Parallel Jobs. Proc. of 2003 IEEE International Conference on Cluster Computing (2003).
18. N. Thoai, T. D. Toan, and T. V. N. Tuong. Resource Management and Scheduling on Supernode II. Proc. of COSCI'05, 137-147 (2005).
Fitting Multidimensional Data Using Gradient Penalties and Combination Techniques Jochen Garcke and Markus Hegland Mathematical Sciences Institute, Australian National University Canberra ACT 0200 jochen.garcke,
[email protected]
Abstract Sparse grids, combined with gradient penalties, provide an attractive tool for regularised least squares fitting. It has earlier been found that the combination technique, which allows the approximation of the sparse grid fit by a linear combination of fits on partial grids, is not as effective here as it is in the case of elliptic partial differential equations. We argue that this is due to the irregular and random data distribution, as well as the ratio of the number of data points to the grid resolution. These effects are investigated both in theory and in experiments. The application of modified "optimal" combination coefficients provides an advantage over the coefficients used originally for the numerical solution of PDEs, which in this case simply amplify the sampling noise. As part of this investigation we also show how overfitting arises when the mesh size goes to zero.
1 Introduction In this paper we consider the regression problem arising in machine learning. A set of data points xi in a d-dimensional feature space is given, together with associated values yi. We assume that a function f∗ describes the relation between the predictor variables x and the response variable y, and we want to (approximately) reconstruct the function f∗ from the given data. This allows us to predict the function value of any newly given data point for future decision-making. In [4] a discretisation approach to the regularised least squares ansatz [10] was introduced. An independent grid with associated local basis functions is used to discretise the minimisation problem. This way the data information is transferred into the discrete function space defined by the grid and its corresponding basis functions. Such a discretisation approach is similar to the numerical treatment of partial differential equations by finite element methods. To cope with the complexity of grid-based discretisation methods in higher dimensions, Garcke et al. [4] apply the sparse grid combination technique [5]
to the regression problem. Here, the regularised least squares ansatz is discretised and solved on a certain sequence of anisotropic grids, i.e. grids with different mesh sizes in each coordinate direction. The sparse grid solution is then obtained from the (partial) solutions on these different grids by their linear combination using combination coefficients which depend on the employed grids. The curse of dimensionality for conventional “full” grid methods affects the sparse grid combination technique much less; currently up to around 20 dimensions can be handled. Following empirical results in [3], which show instabilities of the combination technique in certain situations, we investigate in this article the convergence behaviour of full and sparse grid discretisation of the regularised regression problem. The convergence behaviour of the combination technique can be analysed using extrapolation arguments where a certain error expansion for the partial solutions is assumed. Alternatively one can view the combination technique as approximation of a projection into the underlying sparse grid space, which is exact only if the partial projections commute. We will study how both these assumptions do not hold for the regularised regression problem and how the combination technique can actually diverge. Applying the optimised combination technique, introduced in [7], repairs the resulting instabilities of the combination technique to a large extent. The combination coefficients now not only depend on the grids involved, but on the function to be reconstructed as well, resulting in a non-linear approximation approach.
2 Regularised Least Squares Regression We consider the regression problem in a possibly high-dimensional space. Given is a data set

S = \{(x_i, y_i)\}_{i=1}^{m}, \qquad x_i \in \mathbb{R}^d, \; y_i \in \mathbb{R},

where we denote with x a d-dimensional vector or index with entries x_1, \ldots, x_d. We assume that the data has been obtained by sampling an unknown function f_* which belongs to some space V of functions defined over \mathbb{R}^d. The aim is to recover the function f_* from the given data as well as possible. To achieve a well-posed (and uniquely solvable) problem, Tikhonov-regularisation theory [9, 10] imposes a smoothness constraint on the solution. We employ the gradient as a regularisation operator, which leads to the variational problem f_V = \operatorname{argmin}_{f \in V} R(f) with

R(f) = \frac{1}{m} \sum_{i=1}^{m} (f(x_i) - y_i)^2 + \lambda \|\nabla f\|^2, \qquad (1)
where y_i = f_*(x_i). The first term in (1) measures the error and therefore enforces closeness of f to the labelled data, the second term \|\nabla f\|^2 enforces smoothness of f, and the regularisation parameter \lambda balances these two terms. Let us define the following semi-definite bi-linear form

\langle f, g \rangle_{RLS} = \frac{1}{m} \sum_{i=1}^{m} f(x_i) g(x_i) + \lambda \langle \nabla f, \nabla g \rangle \qquad (2)
and choose V so that \langle \cdot, \cdot \rangle_{RLS} is a scalar product on it. With respect to this scalar product the minimisation (1) is an orthogonal projection of f_* into V [7], i.e. if \|f - f_*\|^2_{RLS} \le \|g - f_*\|^2_{RLS} then R(f) \le R(g). As the point evaluations f \mapsto f(x) are not continuous in the Sobolev space H^1 for d \ge 2, we do not get an H^1-elliptic problem. We suggest choosing a finite dimensional subspace V \subset H^1 of continuous functions containing the constant function. In the following we restrict the problem explicitly to a finite dimensional subspace V_N \subset V with an appropriate basis \{\varphi_j\}_{j=1}^{N}. A function f \in V is then approximated by

f_N(x) = \sum_{j=1}^{N} \alpha_j \varphi_j(x).
We now plug this representation of a function f \in V_N into (1) and obtain the linear equation system

(B^T B + \lambda m \cdot C)\,\alpha = B^T y \qquad (3)

and are therefore able to compute the unknown vector \alpha of the solution f_N of (1) in V_N. C is a symmetric N \times N matrix with entries C_{j,k} = \langle \nabla\varphi_j, \nabla\varphi_k \rangle, j, k = 1, \ldots, N, and corresponds to the smoothness operator. B^T is a rectangular N \times m matrix with entries (B^T)_{j,k} = \varphi_j(x_k), j = 1, \ldots, N, k = 1, \ldots, m; it transfers the information from the data into the discrete space, and B correspondingly works in the opposite direction. The vector y contains the data labels y_i and has length m. In particular we now employ a finite element approach. Using the general form of anisotropic mesh sizes h_t = 2^{-l_t}, t = 1, \ldots, d, the grid points are numbered using the multi-index j, j_t = 0, \ldots, 2^{l_t}. We use piecewise d-linear functions

\phi_{l,j}(x) := \prod_{t=1}^{d} \phi_{l_t, j_t}(x_t), \qquad j_t = 0, \ldots, 2^{l_t},
where the one-dimensional basis functions φl,j (x) are the so-called hat functions. We denote with Vn the finite element space which has the mesh size hn in each direction.
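As an illustration of (3), a minimal dense-matrix sketch in NumPy is given below. It only mirrors the linear system; the names are hypothetical and the actual grid-based implementation will of course exploit sparsity rather than form dense matrices.

import numpy as np

def fit_regularised_ls(basis_vals, grad_gram, y, lam):
    """Solve (B^T B + lambda*m*C) alpha = B^T y for the coefficient vector alpha.

    basis_vals: (m, N) matrix B with B[i, j] = phi_j(x_i)
    grad_gram:  (N, N) matrix C with C[j, k] = <grad phi_j, grad phi_k>
    """
    B = np.asarray(basis_vals, dtype=float)
    C = np.asarray(grad_gram, dtype=float)
    m = B.shape[0]
    lhs = B.T @ B + lam * m * C
    rhs = B.T @ np.asarray(y, dtype=float)
    return np.linalg.solve(lhs, rhs)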
3 Combination Technique The sparse grid combination technique [5] is an approach to approximating functions in higher dimensional spaces. Following this ansatz we discretise and
solve the problem (1) on a sequence of small anisotropic grids \Omega_l = \Omega_{l_1, \ldots, l_d}. For the combination technique we now in particular consider all grids \Omega_l with

|l|_1 := l_1 + \ldots + l_d = n - q, \qquad q = 0, \ldots, d-1, \qquad l_t \ge 0,
set up and solve the associated problems (3). The original combination technique [5] now linearly combines the resulting discrete solutions f_l(x) \in V_l from the partial grids \Omega_l according to the formula

f_n^c(x) := \sum_{q=0}^{d-1} (-1)^q \binom{d-1}{q} \sum_{|l|_1 = n-q} f_l(x).
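A minimal sketch of how the partial grids and the classical combination coefficients could be enumerated (plain Python, illustrative only; the sparse grid codes of [4, 5] are of course far more involved):

from itertools import product
from math import comb

def combination_terms(n, d):
    """Yield (coefficient, level multi-index) pairs of the classical
    combination technique: c_l = (-1)^q * binom(d-1, q) for |l|_1 = n - q."""
    for q in range(d):                      # q = 0, ..., d-1
        coeff = (-1) ** q * comb(d - 1, q)
        for l in product(range(n - q + 1), repeat=d):
            if sum(l) == n - q:
                yield coeff, l

# Example: in 2-d at level n = 3 the coefficients are +1 on |l|_1 = 3 and -1 on |l|_1 = 2
for c, l in combination_terms(3, 2):
    print(c, l)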
The function f_n^c lives in the sparse grid space

V_n^s := \sum_{\substack{|l|_1 = n - q \\ q = 0, \ldots, d-1, \; l_t \ge 0}} V_l.

The space V_n^s has a dimension of order O(h_n^{-1} (\log(h_n^{-1}))^{d-1}), in contrast to O(h_n^{-d}) for conventional grid based approaches. Using extrapolation arguments it can be shown that the approximation property of the combination technique is of the order O(h_n^2 \cdot \log(h_n^{-1})^{d-1}) as long as error expansions of the form

f - f_l = \sum_{i=1}^{d} \; \sum_{\{j_1, \ldots, j_i\} \subset \{1, \ldots, d\}} c_{j_1, \ldots, j_i}(h_{j_1}, \ldots, h_{j_i}) \cdot h_{j_1}^p \cdots h_{j_i}^p
for the partial solutions hold [5]. Viewing the minimisation of (1) as a projection, one can show that the combination technique is an exact projection into the underlying sparse grid space (and therefore of approximation order O(h_n^2 \cdot \log(h_n^{-1})^{d-1})) only if the partial projections commute, i.e. the commutator [P_{V_1}, P_{V_2}] := P_{V_1} P_{V_2} - P_{V_2} P_{V_1} is zero for all pairs of involved grids [6]. In the following we will show that both these assumptions do not hold for the regularised regression problem and that the combination technique can actually diverge.

3.1 Empirical convergence behaviour

We now consider the convergence behaviour of full grid solutions for a simple regression problem, measured against a highly refined grid (due to the lack of an exact solution). As in [3] we consider the function

f(x, y) = e^{-(x^2 + y^2)} + x \cdot y
Figure 1. Left: Convergence of full grid solutions against a highly refined solution, measured using (1). Right: Value of the residual (1) and the least squares error for 5000 data points using the combination technique with λ = 10^{-6}. (Horizontal axes: level; vertical axes: error/residual on a logarithmic scale; left panel: curves for 100 to 1000000 data points; right panel: ct ls-error and ct functional.)
in the domain [0, 1]^2, where the data positions are chosen randomly. To study the behaviour with different numbers of data points we take one hundred, one thousand, ten thousand, one hundred thousand and one million data points. In Figure 1, left, we show the error of a full grid solution of level l measured against the one of level n = 12, using the functional (1) as a norm. We see that the error shows two different types of convergence behaviour: after some discretisation level the error decreases more slowly than with the usual h^2. Furthermore, the more data is used, the later this change in the error reduction rate takes place. These observations do not depend on the regularisation parameter λ. A different picture arises if we employ the sparse grid combination technique. We measure the residual and the least squares error of the approximation using m = 5000 data points and λ = 10^{-6}; the results are presented in Figure 1, right. One observes that after level n = 3 both error measurements increase on the training data, which cannot happen with a true variational discretisation ansatz. This effect is especially observed for small λ; already with λ = 10^{-4} the now stronger influence of the smoothing term results in a (more) stable approximation method. Note that the instability is more common and significant in higher dimensions.

3.2 Asymptotics and errors of the full grid solution

If the number m of data points is large, the data term in R(f) approximates an integral. For simplicity, we discuss only the case of Ω = (0, 1)^d and p(x) = 1; however, most results hold for more general domains and probability distributions. Then, if f_*(x) is a square integrable random field with f_*(x_i) = y_i and

J(f) = \lambda \int_\Omega |\nabla f(x)|^2 \, dx + \int_\Omega (f(x) - f_*(x))^2 \, dx \qquad (4)

then J(f) ≈ R(f). Consider a finite element space V_N \subset C(\Omega) with rectangular elements Q of side lengths h_1, \ldots, h_d and multilinear element functions.
The number k of data points x_i contained in any element Q is a binomially distributed random variable with expectation m · h_1 ··· h_d. When mapped onto a reference element I = (0, 1)^d, the data points ξ_1, \ldots, ξ_k are uniformly distributed within I. Let φ be a continuous function on Q with expectation \bar{φ} = \int_I φ(ξ)\, dξ and variance σ(φ)^2 = \int_I (φ(ξ) - \bar{φ})^2\, dξ. By the central limit theorem, the probability that the inequality

\left| \frac{1}{k} \sum_{i=1}^{k} φ(ξ_i) - \int_I φ(ξ)\, dξ \right| \le \frac{c\, σ(φ)}{\sqrt{k}}

holds is, in the limit k → ∞, equal to \frac{1}{\sqrt{2π}} \int_{-c}^{c} e^{-t^2/2}\, dt. As we will apply the first lemma of Strang [1] to the bilinear forms corresponding to J(f) and R(f), we need this bound for the case φ(ξ) = u(ξ)v(ξ). Using a variant of the Poincaré-Friedrichs inequality [1] with the observation that the average of w := φ - \bar{φ} equals zero, the product rule, the triangle inequality, and the Cauchy-Schwarz inequality we obtain

σ(φ)^2 \le C \int_I |\nabla φ(ξ)|^2\, dξ \le C \|v \nabla u + u \nabla v\|^2 \le C \|u\|_1^2 \|v\|_1^2.

Transforming this back onto the actual elements Q, summing up over all the elements and applying the Cauchy-Schwarz inequality gives, with high probability for large m, the bound

\left| \int_\Omega u(x)v(x)\, dx - \frac{1}{m} \sum_{i=1}^{m} u(x_i)v(x_i) \right| \le \frac{c \|u\|_1 \|v\|_1}{\sqrt{k}}.

A similar bound can be obtained for the approximation of the right hand side in the Galerkin equations. We can now apply the first lemma of Strang to get the bound

\|f - f_N\|_1 \le C \left( \|f - f_N^{best}\|_1 + \frac{\|f\|_1}{\sqrt{k}} \right),

where f_N^{best} is the best approximation of f in the \|\cdot\|_1-norm. This bound is very flexible and holds for any intervals I – it does not depend on the particular h_i, just on their product. This is perfectly adapted to a sparse grid situation where one has on average k_l = 2^{-|l|} m data points per element on level |l|. It is known that the combination technique acts like an extrapolation method for the Poisson problem. This is not the case in the regression problem as there is no cancellation of the random errors. Assuming that the errors e_l are i.i.d., we conjecture that the error of an approximation using the sparse grid combination technique (for large enough k) satisfies a bound of the form

\|f - f_{sg}\|_1 \le C \left( \|f - f_{sg}^{best}\|_1 + \frac{\sum_l c_l\, 2^{|l|/2} \|f\|_1}{\sqrt{m}} \right) \qquad (5)
where, as usual, c_l are the combination coefficients. To study this effect experimentally let us consider (4) with

f_*(x, y) = -100\lambda \cdot \left[ (2x - 1)\left(\tfrac{1}{4}y^4 - \tfrac{1}{3}y^3\right) + (3y^2 - 2y)\left(\tfrac{1}{3}x^3 - \tfrac{1}{2}x^2\right) \right] + 100 \cdot \left(\tfrac{1}{3}x^3 - \tfrac{1}{2}x^2\right)\left(\tfrac{1}{4}y^4 - \tfrac{1}{3}y^3\right).

The function f(x, y) = 100 \cdot \left(\tfrac{1}{3}x^3 - \tfrac{1}{2}x^2\right)\left(\tfrac{1}{4}y^4 - \tfrac{1}{3}y^3\right) is the solution of the resulting continuous problem. As indicated, if we now assume that a Monte-Carlo approach is used to compute the integrals \int_\Omega f(x)g(x)\,dx and \int_\Omega f(x)f_*(x)\,dx in the Galerkin equations, we obtain the regularised least squares formulation (1). We measure the difference between the resulting discrete solutions for a fixed number of data points and the above continuous solution. In Figure 2, left, we show how this error, measured in the H^1-seminorm, behaves. At first we have the "usual" decrease in the error, but after about one sample point per element the error increases instead. The bound (5) holds only asymptotically in k, and thus for fixed k and very small mesh size it will break down. In the following we consider what happens asymptotically in this case for a regular grid (h_i = h). Recall that the full grid solution f_N satisfies the Galerkin equations

\lambda \int_\Omega \nabla f_N^T \nabla g\, dx + \frac{1}{m} \sum_{i=1}^{m} (f_N(x_i) - y_i)\, g(x_i) = 0, \quad \text{for all } g \in V_N. \qquad (6)

Using an approximate Green's kernel G_N(x, x_i) for the Laplacian one can write the solution as

f_N(x) = \frac{1}{m\lambda} \sum_{i=1}^{m} (y_i - f_N(x_i))\, G_N(x, x_i).

One can show that for i \ne j the values G_N(x_i, x_j) are bounded as h \to 0, and that G_N(x_i, x_i) = O(|\log(h)|) for d = 2 and G_N(x_i, x_i) = O(h^{2-d}) for d > 2. One then gets:
Figure 2. Left: H^1-seminorm difference of the solutions of J(f) and R(f) plotted against the number k of data points per cell. Right: Decrease of the functional R. (Curves for 100 to 1000000 data points; logarithmic vertical axes.)
Proposition 1. Let f_N be the solution of equation (6). Then f_N(x_i) \to y_i for h \to 0 and there exists a C > 0 such that

|y_i - f_N(x_i)| \le C \frac{m\lambda}{|\log h|}, \quad \text{if } d = 2,

and

|y_i - f_N(x_i)| \le C m\lambda h^{d-2}, \quad \text{if } d > 2.
Proof. Using the Green's kernel matrix G_N with components G_N(x_i, x_j), one has for the vector f_N of function values f_N(x_i) the system (G_N + m\lambda I) f_N = G_N y, where y is the data vector with components y_i. It follows that

f_N - y = (G_N + m\lambda I)^{-1} G_N y - (G_N + m\lambda I)^{-1} (G_N + m\lambda I) y = -m\lambda (G_N + m\lambda I)^{-1} y.

The bounds for the distance between the function values and the data then follow when the asymptotic behaviour of G_N mentioned above is taken into account. □

It follows that one gets an asymptotic overfitting in the data points, and the data term in R(f) satisfies the same bound:

\sum_{i=1}^{m} (f_N(x_i) - y_i)^2 \le C \frac{m\lambda}{|\log h|}, \quad \text{if } d = 2,

and

\sum_{i=1}^{m} (f_N(x_i) - y_i)^2 \le C m\lambda h^{d-2}, \quad \text{if } d \ge 3,

as h \to 0. The case d = 2 is illustrated in Figure 2, right. While the approximations f_N do converge at the data points, they do so very locally. In an area outside a neighbourhood of the data points the f_N tend to converge to a constant function, so that the fit picks up fast oscillations near the data points but only slow variations further away. It is seen that the value of R(f_N) \to 0 for h \to 0. In the following we give a bound for this for d \ge 3.

Proposition 2. The value of the functional J converges to zero on the estimator f_N and

J(f_N) \le C m\lambda h^{d-2}

for some C > 0. In particular, one has \|\nabla f_N\| \le C \sqrt{m\lambda h^{d-2}}.

Proof. While we only consider regular partitionings with hyper-cubical elements Q, the proof can be generalised to other elements. First, let b_Q be a member of the finite element function space such that b_Q(x) = 1 for x \in Q and b_Q(x) = 0 for x in any element which is not a neighbour of Q. One can see that
\int_\Omega |\nabla b_Q|^2\, dx \le C h^{d-2}.

Choose h such that for the k-th component of the x_i one has

|x_{i,k} - x_{j,k}| > 3h, \quad \text{for } i \ne j.

In particular, any element contains at most one data point. Let furthermore Q_i be the element containing x_i, i.e., x_i \in Q_i. Then one sees that the function g defined by

g(x) = \sum_{i=1}^{m} y_i\, b_{Q_i}(x)

interpolates the data, i.e., g(x_i) = y_i. Consequently,

R(g) = \lambda \int_\Omega |\nabla g|^2\, dx.

Because of the condition on h one has for the supports \operatorname{supp} b_{Q_i} \cap \operatorname{supp} b_{Q_j} = \emptyset for i \ne j, and so

R(g) = \lambda \sum_{i=1}^{m} y_i^2 \int_\Omega |\nabla b_{Q_i}|^2\, dx

and, thus, R(g) \le C m\lambda h^{d-2}. It follows that

\inf R(f) \le R(g) \le C m\lambda h^{d-2}. \qquad □
We conjecture that in the case of d = 2 one has J(fN ) ≤ Cmλ/| log h|. We would also conjecture, based on the observations, that fN converges very slowly towards a constant function.
4 Projections and the Combination Technique It is well known that finite element solutions of V-elliptic problems can be viewed as Ritz projections of the exact solution into the finite element space, satisfying the following Galerkin equations:

\langle f_N, g \rangle_{RLS} = \langle f_*, g \rangle_{RLS}, \qquad g \in V_N.

The projections are orthogonal with respect to the energy norm \|\cdot\|_{RLS}. Let P_l : V \to V_l denote the orthogonal projection with respect to the norm \|\cdot\|_{RLS} and let P_n^S be the orthogonal projection into the sparse grid space V_n^S = \sum_{|l| \le n} V_l. If the projections P_l form a commutative semigroup, i.e., if for all l, l' there exists an l'' such that P_l P_{l'} = P_{l''}, then there exist c_l such that
P_n^S = \sum_{|l| \le n} c_l P_l.
We have seen in the previous section why the combination technique may not provide good approximations, as the quadrature errors do not cancel in the same way as the approximation errors. The aspect considered here is that the combination technique may break down if there are angles between spaces which are sufficiently smaller than π/2 and for which the commutator may not be small. For illustration, consider the case of three spaces V_1, V_2 and V_3 = V_1 \cap V_2. The cosine of the angle \alpha(V_1, V_2) \in [0, \pi/2] between the two spaces V_1 and V_2 is defined as

c(V_1, V_2) := \sup \left\{ (f_1, f_2) \mid f_i \in V_i \cap (V_1 \cap V_2)^\perp, \; \|f_i\| \le 1, \; i = 1, 2 \right\}.

The angle can be characterised in terms of the orthogonal projections P_{V_i} into the closed subspaces V_i and the corresponding operator norm; it holds [2] that

c(V_1, V_2) = \|P_{V_1} P_{V_2} P_{V_3^\perp}\|. \qquad (7)

If the projections commute then one has c(V_1, V_2) = 0 and \alpha(V_1, V_2) = \pi/2, which in particular is the case for orthogonal V_i. However, one also gets \alpha(V_1, V_2) = \pi/2 in the case where V_2 \subset V_1 (which might be contrary to the notion of an "angle"). Numerically, we estimate the angle between two spaces using a Monte Carlo approach and the definition of the operator norm, as one has

c(V_1, V_2) = \|P_{V_1} P_{V_2} - P_{V_1 \cap V_2}\| = \sup_g \frac{\|P_{V_1} P_{V_2} g - P_{V_1 \cap V_2} g\|}{\|P_{V_2} g\|}. \qquad (8)
For the energy norm the angle between the spaces depends substantially on the positions of the data points x_i. In the following we consider several different layouts of points and choose the function values y_i randomly. The ratio \|P_{V_1} P_{V_2} g - P_{V_1 \cap V_2} g\| / \|P_{V_2} g\| is then determined for these function values and data points, and the experiment is repeated many times. The estimate chosen is the maximal quotient. In the experiments we choose \Omega = (0, 1)^2, and the subspaces V_1 and V_2 were chosen such that the functions were linear with respect to one variable while the mesh size h of the grid in the other variable was varied. In a first example, the data points are chosen to be the four corners of the square \Omega. In this case, the angle turns out to be between 89.6 and 90 degrees. Lower angles corresponded here to higher values of \lambda. In the case of \lambda = 0 one has the interpolation problem at the corners; these interpolation operators, however, do commute, so in this case the penalty term is actually the only source of non-orthogonality. A very similar picture evolves if one chooses the four data points from \{0.25, 0.75\}^2. The angle is now between 89 and 90 degrees, where the higher angles are now obtained for larger \lambda, and so the regulariser improves the orthogonality.
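A small NumPy sketch of this Monte Carlo estimate is given below. It is illustrative only: it assumes the projections are available as dense matrices P1, P2 and P12 for V_1, V_2 and V_1 ∩ V_2 on a common discrete space, and it uses the Euclidean norm as a stand-in for the RLS norm actually used in the experiments.

import numpy as np

def estimate_angle_cosine(P1, P2, P12, n_samples=1000, seed=0):
    """Monte Carlo estimate of c(V1, V2) = sup_g ||P1 P2 g - P12 g|| / ||P2 g||, cf. (8)."""
    rng = np.random.default_rng(seed)
    n = P1.shape[0]
    best = 0.0
    for _ in range(n_samples):
        g = rng.standard_normal(n)
        num = np.linalg.norm(P1 @ (P2 @ g) - P12 @ g)
        den = np.linalg.norm(P2 @ g)
        if den > 1e-12:
            best = max(best, num / den)
    return best  # cosine of the angle; the angle itself is arccos of this value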
A very different picture emerges for the case of four randomly chosen points. In our experiments we now observe angles between 45 and 90 degrees, and the larger angles are obtained for the case of large \lambda. Thus the regulariser again makes the problem more orthogonal. We would therefore expect that for a general fitting problem a larger angle \alpha leads to higher accuracy (with regard to the sparse grid solution) of the combination technique. A very similar picture was seen if the points were chosen as the elements of the set 0.2i(1, 1) for i = 1, \ldots, 4. In all cases mentioned above the angles decrease when smaller mesh sizes h are considered.

4.1 Optimised combination technique

In [7] a modification of the combination technique is introduced where the combination coefficients not only depend on the spaces as before, which gives a linear approximation method, but also depend on the function to be reconstructed, resulting in a non-linear approximation approach. In [6] this ansatz is presented in more detail and the name "opticom" for this optimised combination technique is suggested. Assume in the following that the generating subspaces of the sparse grid are suitably numbered from 1 to s. To compute the optimal combination coefficients c_i one minimises the functional

\theta(c_1, \ldots, c_s) = \Big\| P f - \sum_{i=1}^{s} c_i P_i f \Big\|_{RLS}^2,

where one uses the scalar product corresponding to the variational problem, \langle \cdot, \cdot \rangle_{RLS}, defined on V to generate a norm. By simple expansion one gets

\theta(c_1, \ldots, c_s) = \sum_{i,j=1}^{s} c_i c_j \langle P_i f, P_j f \rangle_{RLS} - 2 \sum_{i=1}^{s} c_i \|P_i f\|_{RLS}^2 + \|P f\|_{RLS}^2.

While this functional depends on the unknown quantity P f, the location of its minimum does not. By differentiating with respect to the combination coefficients c_i and setting each of these derivatives to zero, we see that minimising this norm corresponds to finding c_i which satisfy

\begin{bmatrix} \|P_1 f\|_{RLS}^2 & \cdots & \langle P_1 f, P_s f \rangle_{RLS} \\ \langle P_2 f, P_1 f \rangle_{RLS} & \cdots & \langle P_2 f, P_s f \rangle_{RLS} \\ \vdots & \ddots & \vdots \\ \langle P_s f, P_1 f \rangle_{RLS} & \cdots & \|P_s f\|_{RLS}^2 \end{bmatrix} \begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_s \end{bmatrix} = \begin{bmatrix} \|P_1 f\|_{RLS}^2 \\ \|P_2 f\|_{RLS}^2 \\ \vdots \\ \|P_s f\|_{RLS}^2 \end{bmatrix} \qquad (9)

The solution of this small system creates little overhead. However, in general an increase in computational complexity arises from the need to determine the scalar products \langle P_i f, P_j f \rangle_{RLS}. Their computation is often difficult
Figure 3. Value of the functional (1) and the least squares error on the data, i.e. \frac{1}{M}\sum_{i=1}^{M} (f(x_i) - y_i)^2, for the reconstruction of e^{-x^2} + e^{-y^2} using the combination technique and the optimised combination technique for the grids Ω_{i,0}, Ω_{0,i}, Ω_{0,0}, and the optimised combination technique for the grids Ω_{j,0}, Ω_{0,j}, 0 ≤ j ≤ i, with λ = 10^{-4} (left) and 10^{-6} (right). (Horizontal axes: level; vertical axes: least squares error and functional on a logarithmic scale.)
as it requires an embedding into a bigger discrete space which contains both V_i and V_j. Using these optimal coefficients c_i the combination formula is now

f_n^c(x) := \sum_{q=0}^{d-1} \sum_{|l|_1 = n-q} c_l f_l(x). \qquad (10)

Now let us consider one particular additive function, u = e^{-x^2} + e^{-y^2}, which we want to reconstruct from 5000 random data samples in the domain [0, 1]^2. We use the combination technique and the optimised combination technique for the grids Ω_{i,0}, Ω_{0,i}, Ω_{0,0}. For λ = 10^{-4} and λ = 10^{-6} we show in Figure 3 the value of the functional (1); in Table 1 the corresponding numbers for the residuals and the cosine of γ = ∠(P_{U_1} u, P_{U_2} u) are given. We see that both methods diverge for higher levels of the employed grids; nevertheless, as expected, the optimised combination technique is always better than the normal one. We also show in Figure 3 the results for an optimised combination technique which involves all intermediate grids, i.e. Ω_{j,0}, Ω_{0,j} for 1 ≤ j < i, as well. Here we do not observe rising values of the functional for higher levels but a saturation, i.e. higher refinement levels do not substantially change the value of the functional.
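To illustrate how the opticom coefficients of (9) could be obtained once the partial solutions are available, here is a small NumPy sketch. It assumes the RLS scalar products between the projected partial solutions can be evaluated and are passed in as a dense Gram matrix; this is only an illustration of the small linear system, not the actual implementation.

import numpy as np

def opticom_coefficients(gram):
    """Solve the small system (9): gram[i, j] = <P_i f, P_j f>_RLS.
    The right hand side consists of the diagonal entries ||P_i f||^2_RLS."""
    gram = np.asarray(gram, dtype=float)
    rhs = np.diag(gram).copy()
    return np.linalg.solve(gram, rhs)

def combine(partial_solutions, coefficients):
    """Optimised combination (10): weighted sum of the partial solutions,
    here represented as arrays of nodal values on a common evaluation grid."""
    return sum(c * f for c, f in zip(coefficients, partial_solutions))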
5 Conclusions Here we consider a generalisation of the usual kernel methods used in machine learning as the “kernels” of the technique considered here have singularities on the diagonal. However, only finite dimensional approximations
level   cos(γ)      e_c^2              e_o^2
1       -0.012924   3.353704 · 10^-4   3.351200 · 10^-4
2       -0.025850   2.124744 · 10^-5   2.003528 · 10^-5
3       -0.021397   8.209228 · 10^-6   7.372946 · 10^-6
4       -0.012931   1.451818 · 10^-5   1.421387 · 10^-5
5        0.003840   2.873697 · 10^-5   2.871036 · 10^-5
6        0.032299   5.479755 · 10^-5   5.293952 · 10^-5
7        0.086570   1.058926 · 10^-4   9.284347 · 10^-5
8        0.168148   1.882191 · 10^-4   1.403320 · 10^-4
9        0.237710   2.646455 · 10^-4   1.706549 · 10^-4
10       0.285065   3.209026 · 10^-4   1.870678 · 10^-4

Table 1. Residual for the normal combination technique e_c^2 and the optimised combination technique e_o^2, as well as the cosine of the angle γ = ∠(P_{U_1} u, P_{U_2} u)
are considered. The overfitting effect which occurs for fine grid sizes is investigated. We found that the method (using the norm of the gradient as a penalty) did asymptotically (in grid size) overfit the data, but did this very locally, only close to the data points. It appeared that the information in the data was concentrated at the data points and only the null space of the penalty operator (in this case constants) was fitted for fine grids. Except for the overfitting in the data points one thus has the same effect as when choosing very large regularisation parameters, so that the overfitting in the data points arises together with an "underfitting" in other points away from the data. Alternatively, one could say that the regularisation technique acts like a parametric fit away from the data points for small grid sizes and overall for large regularisation parameters. The effect of the data samples is akin to a quadrature method if there are enough data points per element. In practice, it was seen that one required at least one data point per element to get reasonable performance. In order to understand the fitting behaviour we analysed the performance both on the data points and in terms of the Sobolev norm. The results do not directly carry over to results about errors in the sup norm, which is often of interest for applications. However, the advice to have at least one data point per element is equally good advice for practical computations. In addition, the insight that the classical combination technique amplifies the sampling errors and thus needs to be replaced by an optimal procedure is also relevant to the case of the sup norm. The method considered here is in principle a "kernel method" [8] when combined with a finite dimensional space. However, the arising kernel matrix does have diagonal elements which are very large for small grids and, in the limit, is a Green's function with a singularity along the diagonal. It is well known in the machine learning literature that kernels with large diagonal elements lead to overfitting; however, the case of families of kernels which approximate a singular kernel is new.
References
1. D. Braess. Finite elements. Cambridge University Press, Cambridge, second edition, 2001.
2. F. Deutsch. Rate of convergence of the method of alternating projections. In Parametric optimization and approximation (Oberwolfach, 1983), volume 72 of Internat. Schriftenreihe Numer. Math., pages 96–107. Birkhäuser, Basel, 1985.
3. J. Garcke. Maschinelles Lernen durch Funktionsrekonstruktion mit verallgemeinerten dünnen Gittern. Doktorarbeit, Institut für Numerische Simulation, Universität Bonn, 2004.
4. J. Garcke, M. Griebel, and M. Thess. Data mining with sparse grids. Computing, 67(3):225–253, 2001.
5. M. Griebel, M. Schneider, and C. Zenger. A combination technique for the solution of sparse grid problems. In P. de Groen and R. Beauwens, editors, Iterative Methods in Linear Algebra, pages 263–281. IMACS, Elsevier, North Holland, 1992.
6. M. Hegland, J. Garcke, and V. Challis. The combination technique and some generalisations. Linear Algebra and its Applications, 420(2-3):249–275, 2007.
7. M. Hegland. Additive sparse grid fitting. In Proceedings of the Fifth International Conference on Curves and Surfaces, Saint-Malo, France 2002, pages 209–218. Nashboro Press, 2003.
8. B. Schölkopf and A. Smola. Learning with Kernels. MIT Press, 2002.
9. A. N. Tikhonov and V. A. Arsenin. Solutions of ill-posed problems. W. H. Winston, Washington D.C., 1977.
10. G. Wahba. Spline models for observational data, volume 59 of Series in Applied Mathematics. SIAM, Philadelphia, 1990.
Mathematical Modelling of Chemical Diffusion through Skin using Grid-based PSEs
Christopher Goodyer 1, Jason Wood 1, and Martin Berzins 2
1 School of Computing, University of Leeds, Leeds, UK
[email protected], [email protected]
2 SCI Institute, School of Computing, University of Utah, Salt Lake City, USA
[email protected]
Abstract A Problem Solving Environment (PSE) with connections to remote distributed Grid processes is developed. The Grid simulation is itself a parallel process and allows steering of individual or multiple runs of the core computation of chemical diffusion through the stratum corneum, the outer layer of the skin. The effectiveness of this Grid-based approach in improving the quality of the simulation is assessed.
1 Introduction The use of Grid technologies [8] has enabled the automated use of remote distributed computers for many new applications. Many of the larger Grid projects around the world are not simply using high performance computers (HPC) but are making better use of the distributed networks of machines within their own companies and institutions. In many of these cases, the problems being tackled on these systems are not so large that parallel computing is being used for single cases, but instead, multiple production runs, sometimes within the context of a design optimization process, are run on a multiprocessor architecture, e.g. [11]. The key elements of any computation are that the results obtained are useful, accurate, and have been obtained as efficiently as possible. Accuracy of the results is an issue that is resolved by an appropriate choice of mathematical model and computational method. The usefulness of the results is usually governed by choice of appropriate input parameters, ranging from those representing the solution case to those concerning the methods used during the computation. A common method for getting better control of the relationship between input parameters and output results has been through the use of Problem Solving Environments (PSEs). These typically combine the inputs, the computation, and visualisation of the outputs into one workflow, where the user is said to “close the loop” by feeding back from the visualisation of the output, to changes to the inputs. A key element of PSEs is that they
must be accessible to non-expert computer users, since the users are typically scientists focused more on their own fields than on the intricacies of using the Grid. PSEs are discussed in general in Sect. 3, together with the PSE developed in this work. The application considered here is that of chemical diffusion through the skin. In this work we have simulated how the application of a chemical on the outside of the body gets through the outer layer into the body. The use of multiple simulations is very important for situations such as this, since each calculation represents only one particular case. For transient cases using the true full 3-d heterogeneous skin structure it can take hours or even days for solutions to fully converge. Through the use of multiple instances of the solver it is possible to reduce this to the maximum individual runtime, provided enough resources are available. Brief details of the solver and the range of cases being considered are given in Sect. 2. The interaction of the PSE with the remote processes is the most important part of this work. The software described here has allowed transparent use of the Grid. For example, making the process of running a large parallel job on a remote resource as easy as running a small one locally is very important for non-expert users. We have used the gViz library [2, 22] to provide the Grid middleware to handle the communication between the user's desktop and the remote Grid processes. How these components are joined together is discussed in Sect. 4, where the use of a directory service of running jobs is also described. The main advantages of using a PSE are the ability to visualise the output datasets and to steer the calculations; these are discussed in Sect. 5. The paper is summarised in Sect. 6 along with a summary of the advantages of the Grid-based approach described, and suggested necessary extensions for future work.
2 Chemical Diffusion Through Skin The motivating problem in this work is that of numerical modelling of chemical diffusion through the skin. Such a situation might arise either purposefully or accidentally. For example, a person accidentally exposed to a chemical at work may hope for minimal adsorption into the body, but application of a drug transdermally could have great therapeutic benefits. The barrier function of the skin comes almost entirely from the outermost layer, the stratum corneum. This is a highly lipophilic layer only about 10µm thick. Once a chemical has got through the stratum corneum it is into the viable epidermis, which is hydrophilic, and hence effectively into the blood stream. The stratum corneum itself is made up of between six and 40 layers of corneocytes. Each corneocyte is hexagonal in shape, and is typically about 40µm across and 1µm high. They are surrounded by a lipid layer about 0.1µm wide, both between the tessellating corneocytes and between the individual layers. It is through this lipid that almost all of the chemical diffuses, since the corneocytes themselves are almost impermeable. The diffusion path through
the lipid from surface to body is very tortuous due to the aspect ratio of the corneocytes. The mathematical model of this Fickian diffusion process has been well understood for some time, although the modelling is often only done on 1-d homogeneous membranes. There has been some 2-d work on "brick and mortar" shaped domains, notably that of Heisig et al. [13] and Frasch [9], but this work is the first to tackle the true three dimensional nature of the skin. The model of Fickian diffusion is given in non-dimensional form by

\frac{\partial \theta}{\partial \tau} = \frac{\partial}{\partial X}\left(\gamma \frac{\partial \theta}{\partial X}\right) + \frac{\partial}{\partial Y}\left(\gamma \frac{\partial \theta}{\partial Y}\right) + \frac{\partial}{\partial Z}\left(\gamma \frac{\partial \theta}{\partial Z}\right), \qquad (1)

where X, Y and Z are the coordinates, τ is the time, θ is the concentration, and γ the diffusion coefficient, with boundary conditions

\theta = 1 \text{ on } Z = L, \qquad \theta = 0 \text{ on } \Gamma : Z = 0,

and periodic boundary conditions between opposing faces perpendicular to the X-Y plane. In this work we are assuming that the corneocytes themselves are impermeable, and hence on these boundaries we are assuming symmetry conditions perpendicular to the boundary. The domains have been meshed using unstructured tetrahedra, and discretised and solved using a Galerkin linear finite element solver, based on that originally developed by Walkley [19]. The quantities of interest are the concentration profile throughout the domain, the total flux out through the bottom face and the lag time, a measure of the relative time taken to reach steady state. These are defined precisely as

\text{Flux out, } F_{out} = \frac{1}{A_\Gamma} \int_\Gamma \frac{\partial \theta}{\partial Z} \, d\Gamma \qquad (2)

\text{Cumulative mass out} = \int_{\tau=0}^{T} F_{out} \, d\tau, \qquad (3)
where the lag time is the X-intercept of the steady state rate of cumulative mass out, extrapolated backwards. The key questions considered with the solver thus far have been concerned with the physical geometric model of the skin. The differences in the alignment of 2-d corneocytes were considered in [12], and these differences are further complicated by the move to 3-d. We have been considering the effects of alignment, and how these are affected by the aspect ratio of the corneocytes and the thickness of the lipids. Part of this work is to verify the previously published geometric estimates for the effect of "slits, wiggles and necking" around the corneocytes [5, 14]. The effect of the number of layers of corneocytes is also of great importance. The idea of layer independence of quantities needs rigorous proof, as the relative difference between having two or three layers is much greater than the difference between ten and eleven layers. Another case of great importance concerns the effect of the size of
application patches to the skin. Since any patch spreads out sideways as well as down, even through homogeneous membranes, the separation of the patches compared to the depth of the stratum corneum makes significant changes to both the mass of chemical getting into the body and the lag time.
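For illustration, the lag time defined above can be extracted from a computed cumulative-mass-out curve by fitting the late-time (steady state) part with a straight line and taking its intercept with the time axis. A small NumPy sketch under these assumptions (names and the choice of the fitted fraction are hypothetical, not part of the solver described here):

import numpy as np

def lag_time(times, cumulative_mass, steady_fraction=0.25):
    """Estimate the lag time: extrapolate the steady-state part of the
    cumulative mass curve backwards and return its intercept with the time axis."""
    t = np.asarray(times, dtype=float)
    q = np.asarray(cumulative_mass, dtype=float)
    n_tail = max(2, int(len(t) * steady_fraction))      # use the last part of the curve
    slope, intercept = np.polyfit(t[-n_tail:], q[-n_tail:], 1)
    return -intercept / slope                           # time where the fitted line crosses zero

# For a homogeneous membrane the classical analytic result is lag time = L^2 / (6 D),
# which can serve as a sanity check for such an estimate.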
3 Problem Solving Environments PSEs allow a user to break away from the traditional batch mode of processing. Traditionally, the user of a scientific research code would set all the parameters inside the program, compile the software, and run it with the results saved to disk. These results could then be loaded into a visualisation package, processed and rendered as a screen output. Only once this sequence had been completed would the user be able to decide if the original set of parameters needed alteration, and the whole process would begin again. PSEs are typically built in or around visualisation packages. Some commercial visualisation products have added PSE capabilities during their development, such as IRIS Explorer [21] and AVS [18]. Other systems that were built around particular application areas have expanded into more general products, such as SCIRun [16]. These all have the same key elements of modular environments where graphical blocks represent the tasks (modules), with wires joining the blocks representing dataflow from one task to another. The arrangements of modules and wires are variously referred to as workflow, dataflow networks and maps. With a PSE the simulation is presented as a pre-compiled module in the workflow, and all the variables that may be changed are made available to the user through a graphical user interface (GUI). An example of the interface of the standard PSE for chemical diffusion through the skin, with the simulation running completely on the local machine, is shown with IRIS Explorer in Fig. 1. The key parts of the environment are, clockwise from top left, the librarian (a repository of all available modules), the map editor (a workspace for visually constructing maps), the output visualisation, and the input modules. IRIS Explorer has a visual programming interface where modules are dragged from the librarian into the map editor and then input and output ports are wired together between different modules to represent the flow of data through the pipeline, typically ending up with a rendered 3-d image. The inputs to the modules are all given through GUIs, enabling parameter modification without the need to recompile. In the map shown the first module is the interface to the mesh generator, Netgen [17]. This module takes inputs concerning the alignment of the corneocytes, their sizes and separation, and how much their alignment is staggered. It then passes the appropriately formatted mesh structure onto the second module which performs the simulation. This module takes user inputs such as the location and size of the chemical patch on the skin, whether the solve is to be steady state or transient, and computational parameters such as the level of initial grid adaptation and the frequency of generation of output datasets.
Figure 1. IRIS Explorer PSE for skin
4 Remote Grid Simulation One of the drawbacks to using a PSE used to be that the simulation was part of the environment and therefore had to be run locally within the PSE. This assumption, however, is not necessarily true as was shown by the ‘Pollution Demonstrator’ of Walkley et al. [3,20] which extended the approach to that of a remote simulation connected to a local (IRIS Explorer) PSE which still handled the steering and visualisation needs. This bespoke approach has been extended into a generic thread-based library, called gViz [2] providing an interface to the remote simulation. The local PSE environment is now independent and hence no longer needs to be IRIS Explorer. The gViz library itself operates as a collection of threads attached to the simulation code which manage the connections between the simulation and any connected users. Thus, once the code is running, the simulation initialises the library making itself available for connections from multiple users. When a user connects to the simulation new threads are started to handle steering and output datastreams. The library does not limit the number of connected users making it possible for multiple users to be collaborating by visualising the output results and modifying input parameters of the same simulation. To launch the simulation onto a remote resource it is necessary to both select the desired machine and an appropriate job launching mechanism. The launching module can interrogate a GIIS service (part of the Globus toolkit [7]) to discover information about available resources. Once the simulation starts, the gViz thread opens a port to which connections can be made.
The gViz library supports several communication methods between the simulation and other processes. Here we are using a web service style directory service communication built around SOAP using the gSOAP library [6]. In this method, the simulation contacts the directory service at startup and deposits its connection details. These can then be retrieved by one or more PSEs when convenient to the user. The ability to undertake collaborative work is provided at no extra cost through the use of the gViz library. Since the directory service can be advertised to collaborators as running in a consistent location, they will all be able to connect to it to discover the location of all the running simulations. This also assists users who may wish only to visualise the output rather than steer. When multiple simulations are involved there are additional job submission considerations. Many providers of Grid resources are actively limiting the number of submissions per person able to be executed simultaneously, and hence running large numbers concurrently is often not possible on the chosen resource. In the work of Goodyer et al. [11] it was seen how the use of a parallel environment was beneficial for the solution time of a previously serial optimisation application, through concurrent simulation of independent cases with similar parameter sets. Here we extend this idea to take the multiple independent simulations inside one large MPI job. This means that co-allocation of all runs is handled by the resource's own job scheduler and reduces the number of submitted jobs to just one, thus avoiding any resource limits set. When the MPI job starts, all simulations register separately with the directory service, with unique identifiers allowing each to communicate back to the PSE separately if required.
5 Visualization and Steering

With multiple simulations being controlled by the same PSE there are issues concerning effective management of both input and output data [4, 22]. In this section we address these in turn, relating to how they have been used for the skin cases solved to date. The cases discussed here are all concerned with assessing how the geometric makeup of the skin affects the calculated values of flux out and lag time. To that end we have already generated a large selection of 3-d meshes for the solver to use. On start-up, each parallel process uses standard MPI [15] commands to discover its unique rank in the simulation and to load the appropriate mesh.

The steering control panel has a collection of input variables, known as steerables, and output values, known as viewables. The steerables are the inputs as described in Sect. 2, and the viewables are output variables calculated by the simulation to which the module is currently attached, including the quantities of interest and the current time of the solver through a transient simulation.
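As a rough illustration of the start-up logic described in this and the previous section, the sketch below uses mpi4py to discover the process rank, pick a mesh for that rank and register with a directory service. The mesh naming scheme and the register_with_directory helper are hypothetical stand-ins for illustration only; they are not part of gViz or gSOAP.

```python
# Hedged sketch of per-process start-up inside one large MPI job.
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

mesh_file = f"skin_geometry_{rank:03d}.vol"   # hypothetical per-case mesh name
# mesh = load_mesh(mesh_file)                 # solver-specific, omitted here

def register_with_directory(url, sim_id, host, port):
    """Placeholder for depositing connection details with a directory service."""
    print(f"[{sim_id}] would register {host}:{port} at {url}")

register_with_directory("http://directory.example.org/gviz",
                        sim_id=f"skin-{rank}", host="node0", port=9000 + rank)
```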
Figure 2. MultiDisplay visualising many output streams
Another difference between the PSE shown in Fig. 1 and the Grid-enabled version is that the steering module now has two location bars along the bottom, and the buttons ‘Connect’ and ‘Steer all’. The two location bars contain the connection information retrieved from the directory service: the first is an individual location, the second a multiline list of simulations. Steering updates can therefore be given to anywhere from one to all of the simulations simultaneously. In this manner it is possible to extend the experimentation approach discussed in Sect. 3 to many simulations, rather than one specific case at a time. In addition to the ability to ‘Steer all’ it is similarly possible to retrieve all the current viewable parameters from the simulations currently running, and hence to produce summary visualisations of all the cases.

The visualisation of remote gViz processes for both 2-d and 3-d datasets has been done in IRIS Explorer, SCIRun, VTK and Matlab [1, 2, 10, 20]. The data sent is generic, enabling the client PSE end to convert it into a native format. For concurrent simulations visualisation is conceptually no harder than for a single simulation; the realities, however, are slightly different. In the work of Handley et al. [1] this has been tackled for the first time: multiple datastreams from multiple simulations are combined, and the 2-d output is tiled across the display. In 3-d this is a significantly harder problem, because the quantities of data being returned by the simulation are potentially orders of magnitude larger. To address this issue we have developed a ‘MultiDisplay’ module, as shown in Fig. 2. This module receives images rather than the usual 3-d data. The individual datastreams from the simulations are rendered
off-screen, with an appropriate colourmap and camera position, to produce a 2-d image, and these images are fed into the MultiDisplay module. The image pane on the left shows tiles of 16 simulations at a time, and these may be enlarged into the right-hand pane for closer inspection. With multiple simulations connected to the PSE the quantity of data being received is the same regardless of how it is visualised; it is only the quality of the rendering process used to produce the final images which makes a difference to the time and memory used.
6 Conclusions

In this work we have demonstrated how an intensive finite element solution code can be run through a Problem Solving Environment. It has been seen that this simulation can be launched remotely onto a Grid resource and still remain interactive for both steering and output visualisation. By running multiple simulations within one large MPI job it is possible to have concurrent calculation of different cases, for example with the same set of steerable inputs. Steering has been shown to be possible on both individual cases and sets of cases, up to a ‘steer all’ capacity. The output visualisations for these cases have been shown to be possible in a highly detailed manner, through the use of the visualisation environment's processing power on individual cases, or in a group processing fashion to render the simultaneous cases all in the same window.

The use of the PSE has enabled two forms of steering for the skin scientist user. Firstly, the ability of the scientist to ask “What if...?” questions on individual cases has been seen to be very beneficial in getting quick answers to individual questions. Secondly, through the use of multiple concurrent simulations it has been possible to get more general answers over a wider range of cases than had been possible with just a single run. An important part of running the simulations through the PSE and on the Grid has been the ability to detach the local processes from the remote simulations, hence enabling monitoring on demand rather than being constantly connected. It is true that when potentially hundreds of cases are being run for hours and days the contact time with the simulations is probably only a small percentage of that time; however, the added flexibility enables the user to see instantly when a parameter has been set up incorrectly. This means that erroneous computations can be minimised and all cases adjusted accordingly.

There are several issues arising out of this work that need further consideration. The main ones concern how to efficiently handle the visualisation of large numbers of large datastreams. For the skin geometries we have considered here the obvious first step would be to give the user the option of only retrieving the surface mesh. This would reduce the necessary transmission time from the simulation to the PSE, and also reduce the amount of work necessary locally to render the final image. By retaining the
full dataset at the simulation end, it would be possible to examine the data of an individual simulation in greater depth than would normally be done for multiple simulations. Another possibility would be to perform far more of the visualisation work remotely, hence reducing the local load even further. This could be done by using products such as the Grid-enabled version of IRIS Explorer discussed by Brodlie et al. [2] to put the visualisation process closer to the simulation than to the desktop, hence removing the need for the data to reach the local machine before the final rendering.
Acknowledgments

This work is funded through the EPSRC grant GR/S04871 and builds on the “gViz Grid Middleware” UK e-Science project. We also acknowledge Annette Bunge of the Colorado School of Mines for her collaboration and expertise.
References

1. Aslanidi, O. V., Brodlie, K. W., Clayton, R. H., Handley, J. W., Holden, A. V. and Wood, J. D. Remote visualization and computational steering of cardiac virtual tissues using gViz. In: Cox, S. and Walker, D. W., eds.: Proceedings of the 4th UK e-Science All Hands Meeting (AHM'05), EPSRC (2005)
2. Brodlie, K., Duce, D., Gallop, J., Sagar, M., Walton, J. and Wood, J. Visualization in Grid Computing Environments. IEEE Visualization (2004)
3. Brodlie, K. W., Mason, S., Thompson, M., Walkley, M. A., and Wood, J. W. Reacting to a crisis: benefits of collaborative visualization and computational steering in a Grid environment. In: Proceedings of the All Hands Meeting 2002 (2002)
4. Brooke, J. M., Coveney, P. V., Harting, J., Jha, S., Pickles, S. M., Pinning, R. L., and Porter, A. R. Computational steering in RealityGrid. In: Cox, S., ed.: Proceedings of the All Hands Meeting 2003, EPSRC, 885–888 (2003)
5. Cussler, E. L., Hughes, S. E., Ward III, W. J. and Rutherford, A. Barrier Membranes. Journal of Membrane Science, 38:161–174 (1988)
6. van Engelen, R. A. and Gallivan, K. A. The gSOAP Toolkit for Web Services and Peer-To-Peer Computing Networks. In: Proceedings of IEEE CCGrid (2002)
7. Foster, I. and Kesselman, C. Globus: A metacomputing infrastructure toolkit. International Journal of Supercomputer Applications, 11, 115–128 (1997)
8. Foster, I. and Kesselman, C. The Grid 2: The Blueprint for a New Computing Infrastructure. Elsevier (2004)
9. Frasch, H. F. and Barbero, A. M. Steady-state flux and lag time in the stratum corneum lipid pathway: results from finite element models. Journal of Pharmaceutical Sciences, Vol 92(11), 2196–2207 (2003)
10. Goodyer, C. E. and Berzins, M. Solving Computationally Intensive Engineering Problems on the Grid using Problem Solving Environments. In: Cuhna, J. C. and Rana, O. F., eds., Grid Computing: Software Environments and Tools. Springer Verlag (2006)
11. Goodyer, C. E., Berzins, M., Jimack, P. K., and Scales, L. E. A Grid-enabled Problem Solving Environment for Parallel Computational Engineering Design. Advances in Engineering Software, 37(7):439–449 (2006)
12. Goodyer, C. E. and Bunge, A. What If...? Mathematical Experiments on Skin. In: Proceedings of the Perspectives in Percutaneous Penetration, La Grande Motte, France (2004)
13. Heisig, M., Lieckfeldt, R., Wittum, G., Mazurkevich, G. and Lee, G. Non steady-state descriptions of drug permeation through stratum corneum. I. The biphasic brick-and-mortar model. Pharmaceutical Research, Vol 13(3), 421–426 (1996)
14. Johnson, M. E., Blankschtein, D. and Langer, R. Evaluation of solute permeation through the stratum corneum: Lateral bilayer diffusion as the primary transport mechanism. Journal of Pharmaceutical Sciences, 86:1162–1172 (1997)
15. Message Passing Interface Forum: MPI: A message-passing interface standard. International Journal of Supercomputer Applications 8 (1994)
16. Parker, S. G. and Johnson, C. R. SCIRun: A scientific programming environment for computational steering. In: Meuer, H. W., ed.: Proceedings of Supercomputer '95, New York, Springer-Verlag (1995)
17. Schöberl, J. Netgen mesh generation package, version 4.4 (2004). http://www.hpfem.jku.at/netgen/
18. Upson, C., Faulhaber, T., Kamins, D., Laidlaw, D., Schlegel, D., Vroom, J., Gurwitz, R. and van Dam, A. The application visualization system: A computational environment for scientific visualization. IEEE Computer Graphics and Applications, 9(4):30–42 (1989)
19. Walkley, M. A., Jimack, P. K. and Berzins, M. Anisotropic adaptivity for finite element solutions of 3-d convection-dominated problems. International Journal of Numerical Methods in Fluids, Vol 40, 551–559 (2002)
20. Walkley, M. A., Wood, J., Brodlie, K. W. A distributed collaborative problem solving environment. In: Sloot, P. M. A., Tan, C. J. K., Dongarra, J. J., Hoekstra, A. G., eds.: Computational Science, ICCS 2002 Part I, Lecture Notes in Computer Science, Volume 2329, 853–861, Springer (2002)
21. Walton, J. P. R. B. Now you see it – interactive visualisation of large datasets. In: Brebbia, C. A., Power, H., eds.: Applications of Supercomputers in Engineering III. Computational Mechanics Publications/Elsevier Applied Science (1993)
22. Wood, J. W., Brodlie, K. W., Walton, J. P. R. gViz: Visualization and computational steering for e-Science. In: Cox, S., ed.: Proceedings of the All Hands Meeting 2003, EPSRC, 164–171 (2003) ISBN: 1-904425-11-9
Modelling Gene Regulatory Networks Using Galerkin Techniques Based on State Space Aggregation and Sparse Grids

Markus Hegland^a, Conrad Burden^b, and Lucia Santoso^a

^a Mathematical Sciences Institute, ANU and ARC Centre in Bioinformatics
^b Mathematical Sciences Institute and John Curtin School of Medical Research, ANU
Abstract

An important driver of the dynamics of gene regulatory networks is noise generated by transcription and translation processes involving genes and their products. As relatively small numbers of copies of each substrate are involved, such systems are best described by stochastic models. With these models, the stochastic master equations, one can follow the time development of the probability distributions for the states defined by the vectors of copy numbers of each substance. Challenges are posed by the large discrete state spaces, and are mainly due to high dimensionality. In order to address this challenge we propose effective approximation techniques, and, in particular, numerical techniques to solve the master equations. Two theoretical results show that the numerical methods are optimal. The techniques are combined with sparse grids to give an effective method to solve high-dimensional problems.
1 Introduction

Biological processes can involve large numbers of proteins, RNA, and genes which interact in complex patterns. Modern biology has uncovered many of the components involved and identified many basic patterns of their interactions. The large number of components does, in itself, pose a major challenge to the investigation of their interactions. Consequently, one often studies subsystems in isolation both in vitro and in silico. These subsystems are localised in time and space and may themselves display complex behavioural patterns. In order to be able to deal with this complexity, computational tools need to be scalable in the number of components and their interactions. In the following, we will consider some new computational tools which have the potential to deal with high complexity. We will, in particular, discuss tools which are suitable to investigate transcriptional regulation processes.

Transcription is the process of replicating genetic information to messenger RNA. The main machinery doing transcription is the RNA polymerase which binds at promoter sites of the DNA and polymerises messenger RNA
(mRNA) encoding the same sequence as the gene. Later, the mRNA is used as a blueprint for the synthesis of proteins by the ribosomes. The process of transcription (and translation) is relatively slow compared to other biochemical processes like polymerisation and protein-DNA interactions. This simple observation will allow approximations which then lead to substantial computational simplifications.

Transcription can be both negatively and positively regulated. In an example of negative regulation, a repressor protein (or complex) binds to an operator site (on the DNA) which overlaps a promoter site belonging to a gene. This prevents the RNA polymerase from binding to the promoter site and consequently transcription of this gene is blocked. Positive regulation, on the other hand, may be achieved by proteins which are bound to operator sites close to a promoter site. These proteins may interact with the RNA polymerase to attract it to the promoter site and thus initiate transcription. Following transcription, the mRNA is translated by ribosomes to form proteins. As transcription factors consist of proteins, which themselves have been synthesised in the transcription/translation process, feedback loops emerge where the transcription factors control their own synthesis. The regulation processes discussed above thus give rise to complex regulatory networks which are the target of the tools discussed in the following. The reader interested in a more in-depth discussion and, in particular, a more comprehensive coverage of the principles of regulation of transcriptional control, can consult [12, 14].

An accurate, albeit computationally infeasible, model of the transcriptional regulation processes is provided by the Schrödinger equations modelling the electrons and atomic nuclei of the proteins, DNA, water molecules and many other substances involved. A much simpler, but still infeasible, model is based on molecular dynamics where the state of a system is described by the locations (and conformations) of all the molecules involved. In a next stage one could model a cell by the number of molecules in each compartment of the cell. The models considered here have only one compartment and the state is modelled by the number of copies of each component.

Transcriptional control processes are modelled as biochemical reactions. Chemical reactions are characterised by their stoichiometry and their kinetic properties. Consider first the stoichiometry. A typical process is the process of two components $A$ and $B$ (e.g., proteins, or transcription factor/DNA operator) forming a complex:
\[ A + B \rightleftharpoons A \cdot B. \]
In the very simplified model considered here the system (cell) is modelled by the copy numbers of substances or species $A_1, \dots, A_s$. These include proteins, RNA, DNA operator sites and complexes. The state of the system is characterised by the copies of each species. These copy numbers $x_1, \dots, x_s$ can range from one (in the case of DNA operator sites) to several hundreds in the case
of proteins. The copy numbers define the “state” of the system, which is thus described by the vector $x = (x_1, \dots, x_s) \in \mathbb{N}^s$. Chemical reactions are then interpreted as “state transitions”. We assume that the $s$ species are involved in $r$ reactions. Each chemical reaction is described by two stoichiometric vectors $p_j, q_j$ with which one can write the reaction as
\[ \sum_{i=1}^{s} p_{j,i} A_i \;\longrightarrow\; \sum_{i=1}^{s} q_{j,i} A_i, \qquad j = 1, \dots, r. \tag{1} \]
Note that most components $p_{j,i}, q_{j,i}$ of the stoichiometric vectors $p_j, q_j$ are zero; the most typical nonzero components are one, and higher numbers may occur as well (e.g. two in the case of dimerisation). The $j$-th chemical reaction then gives rise to a state transition from the state $x$ to the state $x + z_j$ where $z_j = q_j - p_j$.

The species considered consist of “elementary” species like proteins, RNA and DNA operator sites and of “compound” species which are formed as complexes of the elementary species. The overall numbers of elementary species (bound and free) are constant over time and form an invariant vector $y$ which is linearly dependent on $x$: $y = Bx$, where the matrix $B$ describes the compositions of the compounds. In particular, for the $j$-th reaction one has $Bp_j = Bq_j$, or
\[ Bz_j = 0, \qquad j = 1, \dots, r. \]
Note that the $p_j$ and $q_j$ can be regarded as the positive and negative parts of the $z_j$. A consequence of the invariant $Bx = y$ is also that it defines the feasible domain
\[ \Omega := \{ x \mid Bx = y,\; x \in \mathbb{N}^s \}. \tag{2} \]
In the computations we take as the computational domain the smallest rectangular grid which contains $\Omega$. In some cases one considers open systems where the domain $\Omega$ is unbounded and some of the species are fed through an external “reservoir”.

Stochastic models are described by their probability distributions $p(x; t)$, where $x \in \Omega$ and where $t > 0$ is the time. Equivalently, one may describe the systems by random variables $X(t)$, i.e., stochastic processes. The stochastic system is assumed to be Markovian and is characterised by the conditional probability distribution $p(x_2; t_2 \mid x_1, t_1)$, which is the probability that the system is in state $x_2$ at time $t_2$ if it was in state $x_1$ at time $t_1$. This is basically a transition probability. It is assumed that the system is stationary, i.e., that
\[ p(x_2; t + s \mid x_1, t) = p(x_2; s \mid x_1, 0). \tag{3} \]
It follows that
\[ p(x_2; t_2) = \sum_{x_1 \in \Omega} p(x_2; t_2 \mid x_1, t_1)\, p(x_1; t_1). \tag{4} \]
Taking the derivative with respect to s and evaluating at s = 0 one gets a system of linear differential equations:
\[ \frac{\partial p(x; t)}{\partial t} = \sum_{y \in \Omega} A(x \mid y)\, p(y; t), \tag{5} \]
where
\[ A(x \mid y) = \left. \frac{\partial p(x; s \mid y, 0)}{\partial s} \right|_{s=0}. \]
As one has $\sum_{x \in \Omega} p(x; t) = 1$ it follows that $\sum_{x \in \Omega} A(x \mid y) = 0$. In a very small time step $\Delta t$ one gets a reasonable approximation with
\[ p(x; \Delta t) = p(x; 0) + \Delta t \sum_{y \in \Omega} A(x \mid y)\, p(y; 0). \]
Consider now that the system is in state $x_0$ at time zero, so that $p(x_0; 0) = 1$ and $p(x; 0) = 0$ for $x \neq x_0$. As the probabilities cannot be negative, one sees from the approximation formula that $A(x \mid y) \geq 0$ for $y \neq x$, and, as the columns of $A$ have to sum to zero, one has $A(x \mid x) \leq 0$. In the case of one reaction (or state transition) $x \to x + z_j$, for (asymptotically) small $\Delta t$ only one reaction will occur, and so one sees that $A$ can have only one nonzero off-diagonal element in every column. Consider now reaction $j$ in isolation and set for this case $a_j(x) = -A(x \mid x)$. As the columns sum up to zero one has $A(x + z_j \mid x) = a_j(x)$. The differential equations are then
\[ \frac{\partial p(x; t)}{\partial t} = a_j(x - z_j)\, p(x - z_j; t) - a_j(x)\, p(x; t). \]
Note that by definition the propensity $a_j(x) \geq 0$. For the case of multiple reactions, one can then derive the following differential equations:
\[ \frac{\partial p(x; t)}{\partial t} = \sum_{j=1}^{r} \bigl( a_j(x - z_j)\, p(x - z_j; t) - a_j(x)\, p(x; t) \bigr). \tag{6} \]
These are the fundamental master equations. The propensities $a_j(x)$ depend on the numbers of particles on the left-hand side of the reaction; more specifically, they depend on the $p_j$ and only on the components of $x$ which also occur in $p_j$. Essentially, they are polynomial in $x$ and are known up to a multiplicative constant from the law of mass action. The master equations can be written as a linear initial value problem with differential equations
\[ \frac{dp}{dt} = Ap, \tag{7} \]
initial conditions $p(\cdot\,; 0)$ and, formally, one has the solution
\[ p(\cdot\,; t) = e^{At} p(\cdot\,; 0). \tag{8} \]
The $p_j$, $q_j$ and $a_j(x)$ fully determine the reaction. Using them and the chemical master equations one can now determine $p(x; t)$ for different times and a given initial condition $p(x; 0)$.
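As a concrete, self-contained illustration of equations (5)-(8), the sketch below assembles the generator $A$ for a single birth-death reaction pair on a truncated state space and evolves $p(\cdot\,;t) = e^{At} p(\cdot\,;0)$. The rates, the truncation level and the reaction system itself are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch: one production reaction (rate alpha) and one decay reaction
# (rate delta*x) on the truncated state space {0, ..., N}.
import numpy as np
from scipy.linalg import expm

N, alpha, delta = 50, 10.0, 0.5
reactions = [(+1, lambda x: alpha),            # z = +1, a(x) = alpha
             (-1, lambda x: delta * x)]        # z = -1, a(x) = delta*x

A = np.zeros((N + 1, N + 1))
for z, a in reactions:
    for x in range(N + 1):
        if 0 <= x + z <= N:
            A[x + z, x] += a(x)                # inflow into state x + z
            A[x, x] -= a(x)                    # outflow from state x

p0 = np.zeros(N + 1)
p0[0] = 1.0                                    # start with zero copies
p_t = expm(A * 5.0) @ p0                       # p(.; t) = exp(At) p(.; 0)
print(p_t.sum(), p_t.argmax())                 # mass stays ~1; mode approaches alpha/delta
```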
Assume that we are able to determine $p(x; t)$. How does this help us understand the gene regulatory network? With the probability distribution one may determine the probability of any subset $\Omega' \subset \Omega$ at any time. Such a subset may relate to certain disease states, development or differentiation, the cell cycle, or, in the case of the lambda phage, whether the phage is in lysogeny or lysis. In the particular example of the lambda phage the levels of the proteins cro and cI (the repressor) indicate whether the phage lyses, so that the death of the host is imminent, or is in lysogeny, where the host is not threatened. The probability of the set $\Omega'$ is
\[ p(\Omega'; t) = \sum_{x \in \Omega'} p(x; t). \]
Other questions can also be answered which relate to the levels of proteins at the stationary state $p_\infty(x) = \lim_{t \to \infty} p(x; t)$, in particular the average protein levels, which are expectations of the components $x_i$:
\[ E(x_i) = \sum_{x \in \Omega} x_i\, p(x; t), \]
and, of course, their variances. Other questions which can be addressed relate to the marginal distributions of the components involved.

As the probability $p(x; t)$ is so useful, one would like an efficient way to determine it. This is hampered by the size of the feasible set $\Omega$, which depends exponentially on the dimension $d$. This is a manifestation of the curse of dimensionality. One way to address this curse is to use simulation [6]. Here, sample paths $x^{(i)}(t)$ are generated and the sample statistics are used to estimate probabilities. Examples of regulatory networks where this approach has been used include the lac operon and the lambda phage, see [1]. The error of sampling methods is of order $O(n^{-1/2})$, where $n$ is the number of sample paths used, and this approach amounts to a Monte Carlo method for the computation of the underlying integrals. Often one requires large numbers of simulations in order to obtain accurate estimates. The direct determination of $p(x; t)$ would be preferable if it were feasible. Several groups [4, 10] have made efforts to achieve this. In the following we will show how some earlier work by biologists can lead to one approach. In the next section we review this earlier approach. In the following main section we provide the two main convergence results of the Galerkin approach. Then we show how this can be combined with sparse grids.
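For comparison with the direct approach developed below, the following sketch shows the sampling alternative mentioned above: a basic Gillespie-type stochastic simulation of the same illustrative birth-death system used earlier. Averaging $n$ such paths estimates $p(x;t)$ with the quoted $O(n^{-1/2})$ Monte Carlo error; the rates and sample size are again illustrative assumptions.

```python
# Hedged sketch of a stochastic simulation (SSA) estimate of p(x; T).
import numpy as np

rng = np.random.default_rng(0)
alpha, delta, T = 10.0, 0.5, 5.0

def ssa_path(x0=0):
    x, t = x0, 0.0
    while True:
        a = np.array([alpha, delta * x])       # propensities of the two reactions
        a0 = a.sum()
        t += rng.exponential(1.0 / a0)         # time to the next reaction
        if t > T:
            return x
        x += 1 if rng.random() < a[0] / a0 else -1

samples = np.array([ssa_path() for _ in range(2000)])
print(samples.mean(), samples.var())           # compare with the CME solution
```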
2 Approximation of Fast Processes

Biological processes often involve subprocesses at multiple time scales. At least two time scales have been identified for gene regulatory processes: the slow scale of protein production involving transcription and translation and
the fast scales of polymerisation and of protein/gene interactions. The scales correspond to a blocking of the state variables into a fast and a slow part, i.e., $x = (x_f, x_s)$. Protein numbers correspond to slow variables while the states of operator switches correspond to fast variables.

By definition of the conditional probability distribution $p(x_f \mid x_s; t)$ one has $p(x_f, x_s; t) = p(x_f \mid x_s; t)\, p(x_s; t)$. We assume now that the fast processes are so fast that $p(x_f \mid x_s; t)$ can be replaced by its stationary limit, which we denote by $p(x_f \mid x_s)$. If $p_s$ denotes the probability distribution over the slow spaces then one can restate the above equation as $p = F p_s$, where the components of $F$ are $F(x_f, x_s \mid y_s) = 0$ for $x_s \neq y_s$ and $F(x_f, x_s \mid x_s) = p(x_f \mid x_s)$. Let the aggregation operator $E$ be defined by $E(x_s \mid y_f, y_s) = 0$ if $y_s \neq x_s$ and $E(y_s \mid y_f, y_s) = 1$. Then one has $EF = I$ and $F$ is usually called the disaggregation operator. Furthermore, one has $p_s = Ep$ and so
\[ \frac{dp_s}{dt} = E\,\frac{dp}{dt} = EAp = EAF p_s. \]
Having determined the $p_s$ one can then compute the $p$. This is not the only way to do aggregation/disaggregation; for other operators based on piecewise constant approximations of smooth probability distributions, see [10].

The required conditional probabilities $p(x_f \mid x_s)$ can be obtained experimentally as in Shea/Ackers [15]. The discussion is based on statistical thermodynamics. Corresponding to each state $x = (x_f, x_s)$ there is an energy $E_x$ and a “redundancy” $\omega_x$ such that the (canonical ensemble) partition function is
\[ Q(x_s, T) = \sum_{x_f} \omega_{(x_f, x_s)} \exp\bigl(-E_{(x_f, x_s)}/(kT)\bigr) \]
and the conditional probability follows a Boltzmann distribution
\[ p(x_f \mid x_s) = \frac{\omega_{(x_f, x_s)} \exp\bigl(-E_{(x_f, x_s)}/(kT)\bigr)}{Q(x_s, T)}. \]
Here the redundancies are known and the energies are determined from experiments. Shea and Ackers apply this method of determining the stationary distributions to three examples from the $\lambda$ phage, namely, the cooperative binding occurring for the cI repressor on the operator sites OR2 and OR3, the interactions of repressors and RNA-polymerase, and the balance between monomers and dimers. In the first example, one has $x_f \in \{0, 1\}^2$, which describes the binding state of the cI$_2$ repressor on the operator sites OR1 and OR2, and $x_s$ is the number of copies of cI$_2$. In this case the energies $E_x$ only depend on $x_f$ and the redundancy factors are $\omega_{(0,0,x_s)} = 1$, $\omega_{(1,0,x_s)} = \omega_{(0,1,x_s)} = x_s$ and $\omega_{(1,1,x_s)} = x_s(x_s - 1)$, respectively.
The matrix $A$ can be decomposed into two parts as $A = A_f + A_s$, where $A_f$ corresponds to the fast processes and $A_s$ to the slow processes. Overall, one has
\[ A = \sum_{z} (S_z - I) D_z \]
and it is assumed that the shift $z$ either only involves variables $x_s$ or “mainly” $x_f$. Corresponding to the decomposition into $x_f$ and $x_s$ the aggregation operator is $E = e^T \otimes I$, where $e^T = (1, \dots, 1)$, and for the slow processes one has
\[ E S_{(0,z_s)} = (e^T \otimes I)(I \otimes S_{z_s}) = S_{z_s}(e^T \otimes I) = S_{z_s} E. \]
If one introduces the “reduced” diagonal propensity matrix $D_{z_s} = E D_{(z_f,z_s)} F$ then one gets
\[ E A_s F = \sum_{(z_f, z_s)} (S_{z_s} - I)\, E D_{(z_f,z_s)} F. \]
Note that the last matrix is again a diagonal matrix, so that one gets the ordinary reduced form. For the fast term, if $S_{(z_f,z_s)} = S_{z_f} \otimes I$ then one gets $E A_f F = 0$, and, in the more general case, we use here the approximation $E A_f F = 0$. For the solution of the reduced system one now needs to compute the reduced diagonal matrices $E D_{(z_f,z_s)} F$.

Consider the example of a simple feedback switch which is used to control upper levels of proteins. Here $x_f$ is binary and $x_s$ is an integer. Any two by two block of $D_z$ corresponding to a fixed $x_s$ is diagonal and has diagonal elements $\alpha$ (the rate of production) and $0$, as for $x_f = 1$ the translation and hence production is suppressed. The partition function is in this case $Q(x_s) = e^{-E_1/kT} + x_s e^{-E_2/kT}$ and, with $\Delta E = E_1 - E_2$ ($E_1 > E_2$ are the energies of the unbound and bound states, respectively), an element of the diagonal matrix $E D_z F$ is $1/(1 + x_s e^{\Delta E/kT})$. One can see that the production decreases with high $x_s$ and, as the decay increases, one gets an equilibrium between decay and production at a certain level of $x_s$.

In the $\lambda$ phage one also has cooperative binding. In this case one has two operator sites and the same approach as above now gives a diagonal element of $1/(1 + 2 x_s e^{\Delta E_1/kT} + x_s(x_s - 1) e^{\Delta E_2/kT})$. This provides a much faster switching to zero than the single operator. If there is no cooperativity, one has $\Delta E_2 = 2\Delta E_1$; the interesting case is where there is cooperative binding, i.e., where the energy of the case where both operator sites are bound to a transcription factor is lower than the sum of the individual energies, i.e., where $\Delta E_2 > 2\Delta E_1$. In this case the suppression for large protein numbers, which is dominated by the case of both operator sites bound, is even much stronger compared to the small protein number case, which is dominated by the cases where only one operator is bound to a transcription factor. Using the aggregation in this case has two advantages: first, it helps reduce the computational complexity substantially; secondly, one does not need to determine the reaction rates for this case, instead, one only needs to measure the Gibbs energies $\Delta E_i$. These macroscopic properties of an equilibrium have been determined for the $\lambda$ phage, see [15].
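A small numerical sketch of the reduction just described for the single-operator feedback switch: the aggregated production propensity behaves like $\alpha/(1 + x_s e^{\Delta E/kT})$ and switches off as the protein copy number $x_s$ grows. The values of $\alpha$ and $\Delta E/kT$ below, and the inclusion of the factor $\alpha$, are illustrative assumptions.

```python
import numpy as np

alpha = 5.0        # production rate when the operator is unbound (assumed value)
dE_kT = 2.0        # (E1 - E2)/kT > 0: the bound state has lower energy (assumed value)

def reduced_production(x_s):
    """Aggregated production propensity for the single-operator switch."""
    w_unbound = 1.0                      # Boltzmann weight of the unbound state (relative)
    w_bound = x_s * np.exp(dE_kT)        # weight of the bound state: redundancy x_s
    return alpha * w_unbound / (w_unbound + w_bound)   # = alpha / (1 + x_s e^{dE/kT})

for x_s in (0, 1, 10, 100):
    print(x_s, reduced_production(x_s))  # production shuts off as x_s grows
```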
3 Aggregation Errors

In the following, the approximation error incurred when numerically determining the probability distribution is discussed. Rather than errors in function values, which are naturally bounded by the $L_\infty$ norm, errors now occur in probabilities of sets $M \subset \Omega$ as $p(M) - \tilde{p}(M)$. An upper bound is
\[ |p(M) - \tilde{p}(M)| \leq \sum_{x \in M} |p(x) - \tilde{p}(x)| \leq \|p - \tilde{p}\|_1, \]
which motivates the usage of the $L_1$ norm here. However, as in the case of function approximation, one often resorts to Euclidean-type norms for convenience and computation. Mostly we will consider the finite $\Omega$ case, where the choice of the norm used is less of an issue. In addition to the norm in the space of distributions $p$ we will also use the norm for elements $q$ in the dual space. These norms are chosen such that one has
\[ (p, q) \leq \|p\|\,\|q\|, \qquad p \in X,\; q \in X', \]
where in the finite $\Omega$ case $X = X' = \mathbb{R}^{\Omega}$ and where $(p, q) = \sum_{x \in \Omega} p(x) q(x)$. The norms are also such that the operator $A$ in the master equations is bounded, i.e.,
\[ a(p, q) = -(Ap, q) \leq C \|p\|\,\|q\| \]
for some $C > 0$. Furthermore, it will be assumed that the norms are such that an inf-sup condition for $a$ holds:
\[ \inf_{p} \sup_{q} \frac{a(p, q)}{\|p\|\,\|q\|} \geq \alpha \]
for some $\alpha > 0$. In the following consider $X = \mathbb{R}^{\Omega}$ to be the fundamental linear space, $M = \{p \mid \sum_{x \in \Omega} p(x) = 1\}$ the subspace of probability distributions and $V = \{p \mid \sum_{x \in \Omega} p(x) = 0\}$. The stationary distribution $p_\infty \in M$ then satisfies $a(p_\infty, q) = 0$ for all $q \in X'$. More generally, consider the problem of finding a $p \in M$ (not necessarily positive) from a given $f \in V$ (a necessary and sufficient condition for the existence of a solution) such that
\[ a(p, q) = (f, q), \qquad q \in V, \tag{9} \]
where $(f, q) = \sum_{x \in \Omega} f(x) q(x)$. Let $p_0$ be any element in $M$; then $p' = p - p_0 \in V$ and it satisfies the equations
\[ a(p', q) = (f, q) - a(p_0, q), \qquad q \in V. \]
One can invoke a variant of the Lax-Milgram lemma (see [2]) to show that problem (9) is well-posed.

The numerical aggregation-disaggregation method considered here approximates probability distributions $p \in M$ by distributions $p_h \in F(M)$ (where $F(M)$ is the image of $M$ under the disaggregation operator $F$) which satisfy
\[ a(p_h, q) = (f, q), \qquad q \in E^*(V). \tag{10} \]
The approximation is of optimal order (a variant of Céa's/Strang's lemma, see, e.g., [2]):
Proposition 1. Let $p$ and $p_h$ be the solutions of equations (9) and (10), respectively, and let $a(\cdot, \cdot)$ be a bounded bilinear form satisfying the inf-sup conditions. If, in addition, the inf-sup conditions also hold on the subspaces $F(V)$ and $E^*(V)$, i.e.,
\[ \sup_{q \in E^*(V)} \frac{a(p, q)}{\|q\|} \geq \alpha \|p\|, \qquad p \in F(V), \]
the error is bounded by
\[ \|p - p_h\|_1 \leq \left(1 + \frac{C}{\alpha}\right) \inf_{q_h \in F(M)} \|p - q_h\|_1. \]

Proof. By the triangle inequality one has for any $q_h \in F(M)$:
\[ \|p - p_h\| \leq \|p - q_h\| + \|p_h - q_h\|. \]
As $p_h - q_h \in F(V)$, the inf-sup condition provides the bound
\[ \|p_h - q_h\| \leq \alpha^{-1} \sup_{r \in E^*(V)} \frac{a(p_h - q_h, r)}{\|r\|}. \]
As $a(p_h, r) = (f, r) = a(p, r)$ one gets $a(p_h - q_h, r) = a(p - q_h, r)$ and, by the boundedness, $a(p - q_h, r) \leq C \|p - q_h\|\,\|r\|$, so it follows that
\[ \|p_h - q_h\| \leq \frac{C}{\alpha} \|p - q_h\|. \]
Combining this bound with the first inequality one gets
\[ \|p - p_h\| \leq \left(1 + \frac{C}{\alpha}\right) \|p - q_h\| \]
for any $q_h \in F(M)$, and so the inequality is valid for the infimum as well. $\square$

The question of the approximation order of aggregation itself has been discussed elsewhere, see [10]. This shall not be further considered here; it depends on the operators $E$ and $F$. Let the general approximation class $A_h$ be defined as
\[ A_h = \{ p \in X \mid \inf_{q \in F(V)} \|p - q\| \leq h \}. \]
The above proposition can then be restated in the short form as
\[ \|p - p_h\| \leq (1 + C/\alpha)\, h, \qquad p \in A_h. \]
When applying this to the determination of the stationary distribution $p_\infty$ one gets
\[ \|p_\infty - p_{h,\infty}\| \leq \left(1 + \frac{C}{\alpha}\right) \|p_\infty - FE p_\infty\|. \]
In particular, the Galerkin approximation is of optimal error order relative to the $\|\cdot\|$ norm.

In the following, we will make use of the operator $T_h : V \to F(V)$ defined by $a(T_h f, q) = (f, q)$ for all $q \in E^*(V)$. If one knows an element $p_0 \in F(M)$ then the solution of the Galerkin problem is $p_h = p_0 + T_h(f - Ap_0)$. By the inf-sup condition, the operator $T_h$ is bounded and one has $\|T_h\| \leq 1/\alpha$. Moreover, the restriction $T_h|_{F(V)}$ is invertible and one has, by continuity of $a(\cdot, \cdot)$, that $\|T_h|_{F(V)}^{-1}\| \leq C$. For simplicity, we will denote the inverse of the restriction by $T_h^{-1}$.

The method considered here for the solution of the time-dependent master equations is a semidiscrete Galerkin method. If $p_t$ denotes the time derivative then the solution of the master equations amounts to determining $p(t) \in M$ for $t \geq 0$ which satisfies
\[ (p_t(t), q) + a(p(t), q) = 0, \qquad q \in V. \tag{11} \]
Under the conditions above on $a$ the solution of these equations exists for any initial condition $p(0) = p_0 \in M$. The typical theorems for time-dependent systems are based on two ingredients: the stability of the underlying differential equations (in time) and the approximation order of the scheme involved.

Consider first the stability. In the following we will often make some simplifying assumptions; more complex situations will be treated in future papers. One assumption made is that the domain $\Omega$ is finite. In this case the operator $A$ is a matrix. Each of the components of $A$ is of the form $(S - I)D$ for a nonnegative diagonal matrix $D$ and a matrix $S \geq 0$ which contains at most one nonzero element (which is equal to one) per column. Using the generalised inverse $D^+$ it follows that $(S - I)DD^+ = P - I$ for some nonnegative matrix $P$ which has only zero and one eigenvalues. Thus the matrix $I - P$ is a singular $M$-matrix (see, e.g., [11, p. 119]) and so the product $(P - I)D = (I - S)D$ is positive semistable, i.e., it has no eigenvalues with positive real parts [11]. We will furthermore assume that $A$ itself is also semistable, such that $A$ has only one eigenvalue $0$, with eigenvector $p_\infty$, and all the other eigenvalues have strictly negative real parts.

Consider now the solution of the initial value problem in $V$. Here the corresponding matrix $A$ is stable and, in particular, the logarithmic norm introduced by Dahlquist [3, 16],
\[ \mu = \lim_{s \to 0} \frac{\|I + sA\| - 1}{s}, \]
is less than zero. The logarithmic norm provides an estimate of the behaviour of the norm of the solution as $\|p(t)\| \leq e^{\mu t} \|p(0)\|$. As $t$ increases, $p(t)$ gets “smoother” and so the aggregation approximation should intuitively provide better approximations. Formally we model this by assuming that the smoothness class $A_h$ is invariant over time, meaning that $p(0) \in A_h \Rightarrow p(t) \in A_h$.
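The logarithmic norm used above is easy to evaluate numerically: for the Euclidean norm it equals the largest eigenvalue of the symmetric part $(A + A^T)/2$, and the bound $\|e^{At}\| \leq e^{\mu t}$ can be checked directly, as in the small sketch below. The test matrix is an arbitrary stable example, not a master equation operator.

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[-2.0, 1.0],
              [0.5, -1.0]])
mu2 = np.linalg.eigvalsh(0.5 * (A + A.T)).max()   # logarithmic norm for the 2-norm

for t in (0.5, 1.0, 2.0):
    lhs = np.linalg.norm(expm(A * t), 2)          # ||exp(At)|| in the spectral norm
    print(t, lhs, np.exp(mu2 * t), lhs <= np.exp(mu2 * t) + 1e-12)
```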
The semidiscrete Galerkin method specifies a $p_h(t) \in F(M)$ such that
\[ (p_{h,t}(t), q) + a(p_h(t), q) = 0, \qquad q \in E^*(V), \tag{12} \]
where $p_{h,t}$ is the derivative of $p_h$ with respect to time $t$. For the error of the semidiscrete Galerkin method one has

Proposition 2. Let $p(t)$ be the solution of the initial value problem (11) with initial condition $p(0) = p_0$. Furthermore, let the approximations be such that the approximation class $A_h$ is invariant over time, and that $p_0 \in A_h$. Then, let the bilinear form $a(\cdot, \cdot)$ be bounded with constant $C$ and satisfying an inf-sup condition, and let it be stable on $V = \{p \mid a(p, q) = 0,\ \forall q\}$ with logarithmic norm $\mu < 0$. Then
\[ \|e(t)\| \leq C (1 + e^{-\mu t})\, h \]
for some $C > 0$.

Proof. The proof will use two operators, namely $T_h : V \to F(V)$ defined earlier and $R_h : M \to F(M)$ such that
\[ a(R_h p, q) = a(p, q), \qquad q \in F(V). \]
Note that $\|R_h\| \leq C/\alpha$. In the symmetric case the operator $R_h$ is sometimes called the Ritz projection. The approximation error is $e(t) = p_h(t) - p(t)$. Consider the time derivative $e_t$. By definition one has for $q \in F(V)$:
\[ (e_t, q) = (p_{h,t} - p_t, q) = -a(p_h, q) + a(p, q) = -a(p_h, q) + a(R_h p, q) = a(-p_h + R_h p, q). \]
On the other hand, $T_h$ is such that $(e_t, q) = a(T_h e_t, q)$. As the Galerkin problem is uniquely solvable on $V$, one has $T_h e_t = -p_h + R_h p = p - p_h + R_h p - p$ and, with $r = R_h p - p$, one gets
\[ T_h e_t + e = r. \tag{13} \]
Let $s = p_h - R_h p$; then $e = s + r$ and thus $\|e\| \leq \|r\| + \|s\|$. As $p(t) \in A_h$ by our assumption, one has $\|r(t)\| \leq h$. From equation (13) one gets
\[ s_t + A_h s = -A_h T_h r_t, \]
where $A_h$ is a mapping from $V_h$ onto $V_h$ such that $a(p, q) = (A_h p, q)$ for all $q, p \in V_h$. It follows that $\|A_h\| \leq C$. The differential equation for $s$ has the solution
\[ s(t) = e^{-A_h t} s(0) - \int_0^t e^{-A_h (t - s)} A_h T_h r_t \, ds \]
and integrating by parts one gets
\[ s(t) = e^{-A_h t} s(0) - \Bigl[ e^{-A_h (t - s)} A_h T_h r \Bigr]_0^t + \int_0^t e^{-A_h (t - s)} A_h^2 T_h r \, ds. \]
Using the bound on $\|e^{-A_h t} x\|$ introduced above one gets (also using that $A_h$ and $T_h$ are bounded uniformly in $h$) that for some $C > 0$: $\|s(t)\| \leq C (1 + e^{-\mu t})\, h$. Combining this bound with $\|r(t)\| \leq h$ provides the claimed bound. $\square$
4 Sparse Grids

Sparse grids provide an effective means to approximate functions with many variables and their application to the master equations for gene regulatory networks has been suggested in [10]. Sparse grids can be constructed as the superposition of regular grids. Here this corresponds to constructing $m$ aggregations with disaggregation operators $F_1, \dots, F_m$. The sparse grid approximation space is then given by the sum of the ranges of the disaggregation operators
\[ V_{SG} = \sum_{i=1}^{m} R(F_i). \]
For the solution of the Galerkin equations for both the stationary and the time-dependent problem the same general approximation results hold as above. While for ordinary aggregations one sees that the resulting approximation is still a probability distribution, this is not necessarily the case here any more. Approximation with sparse grids is based on tensor product smoothness spaces. Here we will only briefly describe the method; further analysis will be provided in forthcoming papers.

The operators $P_i = F_i E_i$ form projections onto the spaces $R(F_i)$ such that the residuals are orthogonal to $R(E_i)$. In the case where the set $\{P_1, \dots, P_m\}$ forms a commutative semigroup, it has been shown in [7, 9] that there exist constants $c_1, \dots, c_m$ (the combination coefficients) such that
\[ P_{CT} = \sum_{i=1}^{m} c_i P_i \]
is a projection onto $V_{SG}$ such that the residual is orthogonal to all the spaces $R(E_1), \dots, R(E_m)$. Consider now the Galerkin approximations $p_{i,\infty}$ of the stationary distribution in the spaces $R(F_i)$. Using these, a sparse grid combination approximation of the stationary distribution is obtained as
\[ p_{CT,\infty} = \sum_{i=1}^{m} c_i\, p_{i,\infty}. \]
It is known from practice that such approximations provide good results; the theory, however, requires a detailed knowledge of the structure of the error which goes beyond what can be done with the Céa-type results provided above. So far, one knows that the combination technique acts as an extrapolation technique in the case of the Poisson problem [13] by cancelling lower order error terms. It is also known, however, that in some cases the combination technique may not give good results, see [5]. This is in particular the case where the operators $P_i$ do not commute. In order to address this, a method has been suggested which adapts the combination coefficients $c_i$ to the data, see [8].

The method used to determine the stationary distribution can also be used to solve the instationary problem. Let $p_i(t)$ be the Galerkin approximation discussed above in the space $R(F_i)$. Then one obtains a sparse grid combination approximation as
\[ p_{CT}(t) = \sum_{i=1}^{m} c_i\, p_i(t). \]
In terms of the error analysis the error can again be decomposed into two terms, namely the best possible error in the sparse grid space and the difference between the finite element solution and the best possible error. For the second error a similar result as for the aggregation/disaggregation method provides optimality but as was suggested above, the first type of error is still not well understood.
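To make the combination formula concrete in its classical two-dimensional form, the sketch below applies the coefficients $c_i \in \{+1, -1\}$ of the standard combination technique to tensor-product trapezoidal quadrature, where the cancellation of low-order error terms can be observed directly. The quadrature setting and the test integrand are illustrative assumptions and are much simpler than the Galerkin solves used in the paper.

```python
import numpy as np

def trap_rule(level):
    # nodes and composite trapezoidal weights on [0, 1] with 2**level + 1 points
    x = np.linspace(0.0, 1.0, 2**level + 1)
    w = np.full(x.size, x[1] - x[0])
    w[0] *= 0.5
    w[-1] *= 0.5
    return x, w

def tensor_trapezoid(l1, l2, f):
    x, wx = trap_rule(l1)
    y, wy = trap_rule(l2)
    return wx @ f(x[:, None], y[None, :]) @ wy

def combination_quadrature(n, f):
    # classical 2-d combination: +1 on levels l1 + l2 = n, -1 on l1 + l2 = n - 1
    q = sum(tensor_trapezoid(l1, n - l1, f) for l1 in range(1, n))
    q -= sum(tensor_trapezoid(l1, n - 1 - l1, f) for l1 in range(1, n - 1))
    return q

f = lambda x, y: np.exp(x + y)
exact = (np.e - 1.0) ** 2
for n in (4, 6, 8):
    print(n, abs(combination_quadrature(n, f) - exact))   # error decreases with n
```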
5 Conclusion

The numerical solution of the chemical master equations can be based on aggregation/disaggregation techniques. These techniques have roots both in statistical thermodynamics and approximation theory. While this leads to substantially different error estimates of the corresponding approximation, the Galerkin techniques based on these approximation spaces are the same for both approaches and a unified error theory for the Galerkin approach is developed above which demonstrates the optimality of the Galerkin techniques for these applications. Sparse grids are used successfully in many applications to address the curse of dimensionality and further analysis of their performance will be done in future work. In addition, we plan to consider the case of infinite Ω and a further discussion of implementation and results.
6 Acknowledgments

We would like to thank Mike Osborne for pointing out the importance of the logarithmic norm for stability discussions.
References

1. A. Arkin, J. Ross, and H. H. McAdams. Stochastic kinetic analysis of developmental pathway bifurcation in phage λ-infected Escherichia coli. Genetics, 149:1633–1648, 1998.
2. D. Braess. Finite Elements. Cambridge, second edition, 2005.
3. G. Dahlquist. Stability and error bounds in the numerical integration of ordinary differential equations. Kungl. Tekn. Högsk. Handl. Stockholm. No., 130:87, 1959.
4. L. Ferm and P. Lötstedt. Numerical method for coupling the macro and meso scales in stochastic chemical kinetics. Technical Report 2006-001, Uppsala University, January 2006.
5. J. Garcke, M. Griebel, and M. Thess. Data mining with sparse grids. Computing, 67(3):225–253, 2001.
6. D. T. Gillespie. Markov Processes: an introduction for physical scientists. Academic Press, San Diego, USA, 1992.
7. M. Griebel, M. Schneider, and C. Zenger. A combination technique for the solution of sparse grid problems. In Iterative methods in linear algebra (Brussels, 1991), pages 263–281. North-Holland, Amsterdam, 1992.
8. M. Hegland. Adaptive sparse grids. ANZIAM J., 44(C):C335–C353, 2002.
9. M. Hegland. Additive sparse grid fitting. In Curve and surface fitting (Saint-Malo, 2002), Mod. Methods Math., pages 209–218. Nashboro Press, Brentwood, TN, 2003.
10. M. Hegland, C. Burden, L. Santoso, S. MacNamara, and H. Booth. A solver for the stochastic master equation applied to gene regulatory networks. J. Comp. Appl. Math., 205:708–724, 2007.
11. R. A. Horn and C. R. Johnson. Matrix analysis. Cambridge University Press, Cambridge, 1990. Corrected reprint of the 1985 original.
12. B. Lewin. Genes VIII. Pearson Prentice Hall, 2004.
13. C. Pflaum and A. Zhou. Error analysis of the combination technique. Numer. Math., 84(2):327–350, 1999.
14. M. Ptashne and A. Gann. Genes and Signals. Cold Spring Harbor Laboratory Press, 2002.
15. M. A. Shea and G. K. Ackers. The OR control system of bacteriophage lambda, a physical-chemical model for gene regulation. Journal of Molecular Biology, 181:211–230, 1985.
16. T. Ström. On logarithmic norms. SIAM J. Numer. Anal., 12(5):741–753, 1975.
A Numerical Study of Active-Set and Interior-Point Methods for Bound Constrained Optimization∗

Long Hei^1, Jorge Nocedal^2, and Richard A. Waltz^2

^1 Department of Industrial Engineering and Management Sciences, Northwestern University, Evanston IL 60208, USA
^2 Department of Electrical Engineering and Computer Science, Northwestern University, Evanston IL 60208, USA

∗ This work was supported by National Science Foundation grant CCR-0219438, Department of Energy grant DE-FG02-87ER25047-A004 and a grant from the Intel Corporation.
Abstract

This paper studies the performance of several interior-point and active-set methods on bound constrained optimization problems. The numerical tests show that the sequential linear-quadratic programming (SLQP) method is robust, but is not as effective as gradient projection at identifying the optimal active set. Interior-point methods are robust and require a small number of iterations and function evaluations to converge. An analysis of computing times reveals that it is essential to develop improved preconditioners for the conjugate gradient iterations used in SLQP and interior-point methods. The paper discusses how to efficiently implement incomplete Cholesky preconditioners and how to eliminate ill-conditioning caused by the barrier approach. The paper concludes with an evaluation of methods that use quasi-Newton approximations to the Hessian of the Lagrangian.
1 Introduction

A variety of interior-point and active-set methods for nonlinear optimization have been developed in the last decade; see Gould et al. [12] for a recent survey. Some of these algorithms have now been implemented in high quality software and complement an already rich collection of established methods for constrained optimization. It is therefore an appropriate time to evaluate the contributions of these new algorithms in order to identify promising directions of future research. A comparison of active-set and interior-point approaches is particularly interesting given that both classes of algorithms have matured.

A practical evaluation of optimization algorithms is complicated by details of implementation, heuristics and algorithmic options. It is also difficult
to select a good test set because various problem characteristics, such as nonconvexity, degeneracy and ill-conditioning, affect algorithms in different ways. To simplify our task, we focus on large-scale bound constrained problems of the form
\[
\begin{aligned}
\min_{x} \quad & f(x) & \text{(1a)} \\
\text{subject to} \quad & l \leq x \leq u, & \text{(1b)}
\end{aligned}
\]
where f : Rn → R is a smooth function and l ≤ u are both vectors in Rn . The simple geometry of the feasible region (1b) eliminates the difficulties caused by degenerate constraints and allows us to focus on other challenges, such as the effects of ill-conditioning. Furthermore, the availability of specialized (and very efficient) gradient projection algorithms for bound constrained problems places great demands on the general-purpose methods studied in this paper. The gradient projection method can quickly generate a good working set and then perform subspace minimization on a smaller dimensional subspace. Interior-point methods, on the other hand, never eliminate inequalities and work on an n-dimensional space, putting them at a disadvantage (in this respect) when solving bound constrained problems. We chose four active-set methods that are representative of the best methods currently available: (1) The sequential quadratic programming (SQP) method implemented in snopt [10]; (2) The sequential linear-quadratic programming (SLQP) method implemented in knitro/active [2]; (3) The gradient projection method implemented in tron [15]; (4) The gradient projection method implemented in l-bfgs-b [4, 19]. SQP and gradient projection methods have been studied extensively since the 1980s, while SLQP methods have emerged in the last few years. These three methods are quite different in nature. The SLQP and gradient projection methods follow a so-called EQP approach in which the active-set identification and optimization computations are performed in two separate stages. In the SLQP method a linear program is used in the active-set identification phase, while the gradient projection performs a piecewise linear search along the gradient projection path. In contrast, SQP methods follow an IQP approach in which the new iterate and the new estimate of the active set are computed simultaneously by solving an inequality constrained subproblem. We selected two interior-point methods, both of which are implemented in the knitro software package [5]: (5) The primal-dual method in knitro/direct [18] that (typically) computes steps by performing a factorization of the primal-dual system; (6) The trust region method in knitro/cg [3] that employs iterative linear algebra techniques in the step computation.
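The box geometry makes the projection used by gradient projection methods trivial to write down: componentwise clipping onto [l, u]. The sketch below shows a much simplified version of the idea, trying a few step lengths along the projected steepest-descent path and reading off the bounds that become active as a working-set estimate; the test objective, the fixed step lengths and the crude search are illustrative simplifications of the piecewise search along the gradient projection path described above.

```python
import numpy as np

def project(x, l, u):
    """Componentwise projection onto the box [l, u]."""
    return np.minimum(np.maximum(x, l), u)

def gradient_projection_step(x, f, grad, l, u, steps=(1.0, 0.5, 0.25, 0.125)):
    # try a few points along the projected steepest-descent path, keep the best
    trials = [project(x - a * grad, l, u) for a in steps]
    x_new = min(trials, key=f)
    active = np.flatnonzero((x_new <= l) | (x_new >= u))   # working-set estimate
    return x_new, active

f = lambda z: 0.5 * z @ z                 # illustrative strictly convex objective
l, u = np.zeros(5), np.ones(5)
x = np.array([0.9, 0.2, 0.5, 0.05, 0.7])
x_new, active = gradient_projection_step(x, f, x.copy(), l, u)  # grad of f is x
print(x_new, active)
```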
The algorithm implemented in knitro/direct is representative of various line search primal-dual interior-point methods developed since the mid 1990s (see [12]), whereas the algorithm in knitro/cg follows a trust region approach that is significantly different from most interior-point methods proposed in the literature. We have chosen the two interior-point methods available in the knitro package, as opposed to other interior-point codes, to minimize the effect of implementation details. In this way, the same type of stop tests and scalings are used in the two interior-point methods and in the SLQP method used in our tests. The algorithms implemented in (2), (3) and (6) use a form of the conjugate gradient method in the step computation. We study these iterative approaches, giving particular attention to their performance in interior-point methods where preconditioning is more challenging [1, 8, 13]. Indeed, whereas in active-set methods ill-conditioning is caused only by the objective function and constraints, in interior-point methods there is an additional source of ill-conditioning caused by the barrier approach. The paper is organized as follows. In Section 2, we describe numerical tests with algorithms that use exact Hessian information. The observations made from these results set the stage for the rest of the paper. In Section 3 we describe the projected conjugate gradient method that plays a central role in several of the methods studied in our experiments. A brief discussion on preconditioning for the SLQP method is given in Section 4. Preconditioning in the context of interior-point methods is the subject of Section 5. In Section 6 we study the performance of algorithms that use quasi-Newton Hessian approximations.
2 Some Comparative Tests

In this section we report test results for four algorithms, all using exact second derivative information. The algorithms are: tron (version 1.2), knitro/direct, knitro/cg and knitro/active (versions 5.0). The latter three were not specialized in any way to the bound constrained case. In fact, we know of no such specialization for interior-point methods, although advantage can be taken at the linear algebra level, as we discuss below. A modification of the SLQP approach that may prove to be effective for bound constraints is investigated by Byrd and Waltz [6], but was not used here. We do not include snopt in these tests because this algorithm works more effectively with quasi-Newton Hessian approximations, which are studied in Section 6. Similarly, l-bfgs-b is a limited memory quasi-Newton method and will also be discussed in that section. All the test problems were taken from the CUTEr collection [11] using versions of the models formulated in Ampl [9]. We chose all the bound constrained CUTEr problems available as Ampl models for which the sizes could be made large enough for our purposes, while excluding some of the repeated models (e.g., we only used torsion1 and torsiona from the group of torsion models).
Table 1. Comparative results of four methods that use exact second derivative information. For each problem, the column groups give: tron (iter, feval, CPU, actv@sol, aveCG); knitro/direct (iter, feval, CPU); knitro/cg (iter, feval, CPU, aveCG, endCG/n); knitro/active (iter, feval, CPU, aveCG).

problem  n      | tron                     | knitro/direct   | knitro/cg                         | knitro/active
biggsb1  20000  | X1 X1 X1 X1 X1           | 12 13 2.61      | 12 13 245.48 942.50 0.1046        | X1 X1 X1 X1
bqpgauss 2003   | 206 206 6.00 95 6.93     | 20 21 0.88      | 42 43 183.85 3261.14 2.0005       | 232 234 65.21 1020.46
chenhark 20000  | 72 72 4.30 19659 1.00    | 18 19 2.57      | 20 21 1187.49 4837.60 0.7852      | 847 848 1511.60 1148.74
clnlbeam 20000  | 6 6 0.50 9999 0.83       | 11 12 2.20      | 12 13 2.60 3.67 0.0001            | 3 4 0.41 1.00
cvxbqp1  20000  | 2 2 0.11 20000 0.00      | 9 10 51.08      | 9 10 3.60 6.33 0.0003             | 1 2 0.18 0.00
explin   24000  | 8 8 0.13 23995 0.88      | 24 25 6.79      | 26 27 16.93 16.46 0.0006          | 13 14 1.45 3.08
explin2  24000  | 6 6 0.10 23997 0.83      | 26 27 6.39      | 25 26 16.34 16.72 0.0005          | 12 13 1.26 2.17
expquad  24000  | X2 X2 X2 X2 X2           | X4 X4 X4        | X4 X4 X4 X4 X4                    | 183 663 56.87 1.42
gridgena 26312  | 16 16 14.00 0 1.75       | 8 23 17.34      | 7 8 43.88 160.86 0.0074           | 6 8 9.37 77.71
harkerp2 2000   | X3 X3 X3 X3 X3           | 15 16 484.48    | 27 28 470.76 12.07 0.0010         | 7 8 119.70 0.86
jnlbrng1 21904  | 30 30 6.80 7080 1.33     | 15 16 6.80      | 18 19 163.62 632.33 0.1373        | 39 40 27.71 92.23
jnlbrnga 21904  | 30 30 6.60 7450 1.37     | 14 16 6.31      | 18 19 184.75 708.67 0.1608        | 35 36 30.05 122.03
mccormck 100000 | 6 7 2.60 1 1.00          | 9 10 11.60      | 12 13 20.89 4.17 0.0001           | X5 X5 X5 X5
minsurfo 10000  | 10 10 2.10 2704 3.00     | 367 1313 139.76 | X1 X1 X1 X1 X1                    | 8 10 4.32 162.33
ncvxbqp1 20000  | 2 2 0.11 20000 0.00      | 35 36 131.32    | 32 33 10.32 4.63 0.0006           | 3 4 0.36 0.67
ncvxbqp2 20000  | 8 8 0.50 19869 1.13      | 75 76 376.01    | 73 74 58.65 26.68 0.0195          | 30 39 5.90 4.26
nobndtor 32400  | 34 34 10.00 5148 2.85    | 15 16 8.66      | 13 14 6817.52 24536.62 2.0000     | 66 67 78.85 107.42
nonscomp 20000  | 8 8 0.82 0 0.88          | 21 23 5.07      | 129 182 81.64 12.60 0.0003        | 10 11 1.37 4.20
obstclae 21904  | 31 31 4.90 10598 1.84    | 17 18 7.66      | 17 18 351.83 846.00 0.3488        | 93 116 40.36 37.01
obstclbm 21904  | 25 25 4.20 5262 1.64     | 12 13 5.52      | 11 12 562.34 2111.64 0.1819       | 43 50 16.91 39.08
pentdi   20000  | 2 2 0.17 19998 0.50      | 12 13 2.24      | 14 15 3.40 5.36 0.0005            | 1 2 0.21 1.00
probpenl 5000   | 2 2 550.00 1 0.50        | 3 4 733.86      | 3 4 6.41 1.00 0.0002              | 1 2 2.79 1.00
qrtquad  5000   | 28 58 1.60 5 2.18        | 39 63 1.56      | X5 X5 X5 X5 X5                    | 783 2403 48.44 2.02
qudlin   20000  | 2 2 0.02 20000 0.00      | 17 18 2.95      | 24 25 12.74 16.08 0.0004          | 3 4 0.20 0.67
reading1 20001  | 8 8 0.78 20001 0.88      | 16 17 6.11      | 14 15 5.64 7.21 0.0001            | 3 4 0.44 0.33
scond1ls 2000   | 592 1748 18.00 0 2.96    | 1276 4933 754.57| 1972 2849 10658.26 2928.17 0.3405 | X1 X1 X1 X1
sineali  20000  | 11 15 1.30 0 1.27        | 9 12 3.48       | 18 61 13.06 4.57 0.0001           | 34 112 8.12 1.58
torsion1 32400  | 59 59 14.00 9824 1.86    | 11 12 8.13      | 7 8 359.78 1367.14 0.2273         | 65 66 57.53 64.65
torsiona 32400  | 59 59 16.00 9632 1.88    | 10 11 7.62      | 6 7 80.17 348.33 0.0279           | 62 63 62.43 74.06

X1: iteration limit reached. X2: numerical result out of range. X3: solver did not terminate. X4: current solution estimate cannot be improved. X5: relative change in solution estimate < 10^{-15}.
The results are summarized in Table 1, which reports the number of variables for each problem, as well as the number of iterations, function evaluations and computing time for each solver. For tron we also report the number of active bounds at the solution; for those solvers that use a conjugate gradient (CG) iteration, we report the average number of CG iterations per outer iteration. In addition, for knitro/cg we report the number of CG iterations performed in the last iteration of the optimization algorithm divided by the number of variables (endCG/n). We use a limit of 10000 iterations for all solvers. Unless otherwise noted, default settings were used for all solvers, including default stopping tests and tolerances, which appeared to provide comparable solution accuracy in practice. We also provide in Figures 1 and 2 performance profiles based, respectively, on the number of function evaluations and computing time. All figures plot the logarithmic performance profiles described in [7]. We now comment on these results.

In terms of robustness, there appears to be no significant difference between the four algorithms tested, although knitro/direct is slightly more reliable.

Function Evaluations. In terms of function evaluations (or iterations), we observe some significant differences between the algorithms. knitro/active requires more iterations overall than the other three methods; if we compare it with tron—the other active-set method—we note that tron is almost uniformly superior. This suggests that the SLQP approach implemented in knitro/active is less effective than gradient projection at identifying the optimal active set. We discuss this issue in more detail below. As expected, the interior-point methods typically perform between 10 and 30 iterations to reach convergence. Since the geometry of bound constraints is simple, only nonlinearity and nonconvexity in the objective function cause interior-point methods to perform a large number of iterations. It is not surprising that knitro/cg requires more iterations than knitro/direct given that it uses an inexact iterative approach in the step computation. Figure 1 indicates that the gradient projection method is only slightly more efficient than interior-point methods in terms of function evaluations. As in any active-set method, tron sometimes converges in a very small number of iterations (e.g. 2), but on other problems it requires significantly more iterations than the interior-point algorithms.

CPU Time. It is clear from Table 1 that knitro/cg requires the largest amount of computing time among all the solvers. This test set contains a significant number of problems with ill-conditioned Hessians, ∇²f(x), and the step computation of knitro/cg is dominated by the large number of CG steps performed. tron reports the lowest computing times; the average number of CG iterations per step is rarely greater than 2. This method uses an incomplete Cholesky preconditioner [14], whose effectiveness is crucial to the success of tron.
an incomplete Cholesky preconditioner [14], whose effectiveness is crucial to the success of tron. The high number of CG iterations in knitro/cg is easily explained by the fact that it does not employ a preconditioner to remove ill-conditioning caused by the Hessian of the objective function. What is not so simple to explain is the higher number of CG iterations in knitro/cg compared to knitro/active. Both methods use an unpreconditioned projected CG method in the step computation (see Section 3), and therefore one would expect that both methods would suffer equally from ill-conditioning. Table 1 indicates that this is not the case. In addition, we note that the average cost of a CG iteration is higher in knitro/cg than in knitro/active. One possible reason for this difference is that the SLQP method applies CG to a smaller problem than the interior-point algorithm. The effective number of variables in the knitro/active CG iteration is n − t_k, where t_k is the number of constraints in the working set at the kth iteration. On the other hand, the interior-point approach applies the CG iteration in n-dimensional space. This, however, accounts only partly for the differences in performance. For example, we examined some runs in which t_k is about n/3 to n/2 during the run of knitro/active and noticed that the differences in CG iterations between knitro/cg and knitro/active are significantly greater than a factor of 2 or 3 toward the end of the run. As we discuss in Section 5, it is the combination of barrier and Hessian ill-conditioning that can be very detrimental to the interior-point method implemented in knitro/cg.

Active-set identification. The results in Table 1 suggest that the SLQP approach will not be competitive with gradient projection on bound constrained problems unless the SLQP method can be redesigned so as to require fewer outer iterations. In other words, it needs to improve its active-set identification mechanism. As already noted, the SLQP method in knitro/active computes the step in two phases. In the linear programming phase, an estimate of the optimal active set is computed. This linear program takes a simple form in the bound constrained case, and can be solved very quickly. Most of the computing effort goes into the EQP phase, which solves an equality constrained quadratic program in which the constraints in the working set are imposed as equalities (i.e., fixed variables in this case) and all other constraints are ignored. This subproblem is solved using a projected CG iteration. Assuming that the cost of this CG phase is comparable in tron and knitro/active (we can use the same preconditioners in the two methods), the SLQP method needs to perform a similar number of outer iterations to be competitive.

Comparing the detailed results of tron versus knitro/active highlights two features that provide tron with superior active-set identification properties. First, the active set determined by SLQP is given by the solution of one LP (whose solution is constrained by an infinity-norm trust region), whereas the gradient projection method minimizes a quadratic model along the gradient projection path to determine an active-set estimate. Because it explores a
whole path as opposed to a single point, this often results in a better active-set estimate for gradient projection. An enhancement to SLQP proposed in [6] mimics what is done in gradient projection by solving a parametric LP (parameterized by the trust-region radius) rather than a single LP to determine an active set, with improved results. Second, the gradient projection implementation in tron has a feature which allows it to add bounds to the active set during the unconstrained minimization phase, if inactive bounds are encountered. On some problems this significantly decreases the number of iterations required to identify the optimal active set. In the bound constrained case, it is easy to do something similar for SLQP. In [6], this feature was added to an SLQP algorithm and shown to improve performance on bound constrained problems. The combination of these two features may result in an SLQP method that is competitive with tron. However, more research is needed to determine whether this goal can be achieved.

In the following section we give attention to the issue of preconditioning. Although in this paper we are interested in preconditioners applied to bound constrained problems, we will first present our preconditioning approach in the more general context of constrained optimization, where it is also applicable.
3 The Projected Conjugate Gradient Method

Both knitro/cg and knitro/active use a projected CG iteration in the step computation. To understand the challenges of preconditioning this iteration, we now describe it in some detail. The projected CG iteration is a method for solving equality constrained quadratic programs of the form

    minimize_x   (1/2) x^T G x + h^T x        (2a)
    subject to   A x = b,                     (2b)

where G is an n × n symmetric matrix that is positive definite on the null space of the m × n matrix A, and h is an n-vector. Problem (2) can be solved by eliminating the constraints (2b), applying the conjugate gradient method to the reduced problem of dimension (n − m), and expressing this solution process in n-dimensional space. This procedure is specified in the following algorithm. We denote the preconditioning operator by P; its precise definition is given below.

Algorithm PCG. Preconditioned Projected CG Method.
Choose an initial point x0 satisfying Ax0 = b. Set x ← x0, compute r = Gx + h, z = Pr and p = −z.
Repeat the following steps, until the preconditioned residual z is smaller than a given tolerance:
    α = r^T z / p^T G p
    x ← x + α p
    r+ = r + α G p
    z+ = P r+
    β = (r+)^T z+ / r^T z
    p ← −z+ + β p
    z ← z+ and r ← r+
End

The preconditioning operation is defined indirectly, as follows. Given a vector r, we compute z = P r as the solution of the system
    [ D   A^T ] [ z ]   [ r ]
    [ A   0   ] [ w ] = [ 0 ],        (3)

where D is a symmetric matrix that is required to be positive definite on the null space of A, and w is an auxiliary vector. A preconditioner of the form (3) is often called a constraint preconditioner. To accelerate the convergence of Algorithm PCG, the matrix D should approximate G in the null space of A and should be sparse, so that solving (3) is not too costly. It is easy to verify that since initially Ax0 = b, all subsequent iterates x of Algorithm PCG also satisfy the linear constraints (2b).

The choice D = I gives an unpreconditioned projected CG iteration. To improve the performance of Algorithm PCG, we consider some other choices for D. One option is to let D be a diagonal matrix; see, e.g., [1, 16]. Another option is to define D by means of an incomplete Cholesky factorization of G, but the challenge is how to implement it effectively in the setting of constrained optimization. An implementation that computes the incomplete factors L and L^T of G, multiplies them to give D = LL^T, and then factors the system (3), is of little interest; one might as well use the perfect preconditioner D = G. However, for special classes of problems, such as bound constrained optimization, it is possible to rearrange the computations and compute the incomplete Cholesky factorization on a reduced system, as discussed in the next sections.

We note that the knitro/cg and knitro/active algorithms actually solve quadratic programs of the form (2) subject to a trust region constraint ||x|| ≤ ∆; in addition, G may not always be positive definite on the null space of A. To deal with these two requirements, Algorithm PCG can be adapted by following Steihaug's approach: we terminate the iteration if the trust region is crossed or if negative curvature is encountered [17]. In this paper, we will ignore these additional features and consider preconditioning in the simpler context of Algorithm PCG.
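As an illustration only, Algorithm PCG with a constraint preconditioner of the form (3) can be sketched in a few lines of NumPy. This is our own sketch, not the knitro implementation: the dense factorization of the augmented matrix stands in for whatever sparse factorization a production code would use, and the function names are ours.

    import numpy as np

    def make_constraint_preconditioner(D, A):
        # Apply P by solving the augmented system (3): [D A^T; A 0][z; w] = [r; 0].
        n, m = D.shape[0], A.shape[0]
        K = np.block([[D, A.T], [A, np.zeros((m, m))]])
        K_inv = np.linalg.inv(K)            # dense; acceptable for a small sketch
        def apply_P(r):
            rhs = np.concatenate([r, np.zeros(m)])
            return (K_inv @ rhs)[:n]        # return only the z component
        return apply_P

    def projected_cg(G, h, A, b, x0, apply_P, tol=1e-8, max_iter=200):
        # Algorithm PCG: x0 must satisfy A x0 = b; all iterates then stay feasible.
        x = x0.copy()
        r = G @ x + h
        z = apply_P(r)
        p = -z
        for _ in range(max_iter):
            if np.linalg.norm(z) <= tol:
                break
            alpha = (r @ z) / (p @ (G @ p))
            x = x + alpha * p
            r_new = r + alpha * (G @ p)
            z_new = apply_P(r_new)
            beta = (r_new @ z_new) / (r @ z)
            p = -z_new + beta * p
            r, z = r_new, z_new
        return x

With D = I this reduces to the unpreconditioned projected CG iteration used in the experiments of Table 1.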
4 Preconditioning the SLQP Method

In the SLQP method implemented in knitro/active, the equality constraints (2b) are defined as the linearization of the problem constraints belonging to the working set. We have already mentioned that this working set is obtained by solving an auxiliary linear program. In the experiments reported in Table 1, we used D = I in (3), i.e. the projected CG iteration in knitro/active was not preconditioned. This explains the high number of CG iterations and computing time for many of the problems. Let us therefore consider other choices for D.

Diagonal preconditioners are straightforward to implement, but are often not very effective. A more attractive option is incomplete Cholesky preconditioning, which can be implemented as follows. Suppose for the moment that we use the perfect preconditioner D = G in (3). Since z satisfies Az = 0, we can write z = Zu, where Z is a basis matrix such that AZ = 0 and u is some vector of dimension (n − m). Multiplying the first block of equations in (3) by Z^T and recalling the condition Az = 0, we have that

    Z^T G Z u = Z^T r.        (4)

We now compute the incomplete Cholesky factorization of the reduced Hessian,

    L L^T ≈ Z^T G Z,          (5)

solve the system

    L L^T û = Z^T r,          (6)

and set z = Z û. This defines the preconditioning step. Since for nonconvex problems Z^T G Z may not be positive definite, we can apply a modified incomplete Cholesky factorization of the form L L^T ≈ Z^T (G + δI) Z, for some positive scalar δ; see [14].

For bound constrained problems, the linear constraints (2b) are defined to be the bounds in the working set. Therefore the columns of Z are unit vectors and the reduced Hessian Z^T G Z is obtained by selecting appropriate rows and columns from G. This preconditioning strategy is therefore practical and efficient, since the matrix Z need not be computed and the reduced Hessian Z^T G Z is easy to form. In fact, this procedure is essentially the same as that used in tron. The gradient projection method selects a working set (a set of active bounds) by using a gradient projection search, and computes a step by solving a quadratic program of the form (2). To solve this quadratic program, the gradient projection method in tron eliminates the constraints and applies a preconditioned CG method to the reduced problem

    minimize_u   u^T Z^T G Z u + h^T Z u.
The preconditioner is defined by the incomplete Cholesky factorization (5). Thus the only difference between the CG iterations in tron and the preconditioned projected CG method based on Algorithm PCG is that the latter works in R^n while the former works in R^(n−m). (It is easy to see that the two approaches are equivalent and that the computational costs are very similar.) Numerical tests of knitro/active using the incomplete Cholesky preconditioner just described will be reported in a forthcoming publication. In the rest of the paper, we focus on interior-point methods and report results using various preconditioning approaches.
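Before moving on, the bound constrained preconditioner of this section can be summarized in code: because the rows of A are unit vectors, Z^T G Z is simply the submatrix of G indexed by the free variables. The sketch below is ours, with a dense Cholesky factorization standing in for the incomplete factor L of (5).

    import numpy as np

    def reduced_hessian_preconditioner(G, fixed, delta=0.0):
        # fixed: boolean mask of the variables whose bounds are in the working set.
        free = ~fixed
        B = G[np.ix_(free, free)] + delta * np.eye(int(free.sum()))   # Z^T (G + delta I) Z
        L = np.linalg.cholesky(B)       # stands in for the incomplete factor of (5)
        def apply_P(r):
            # Solve L L^T u = Z^T r as in (6), then set z = Z u (zeros on fixed variables).
            u = np.linalg.solve(L.T, np.linalg.solve(L, r[free]))
            z = np.zeros_like(r)
            z[free] = u
            return z
        return apply_P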
5 Preconditioning the Interior-Point Method

The interior-point methods implemented in knitro solve a sequence of barrier problems of the form

    minimize_{x,s}   f(x) − µ Σ_{i∈I} log s_i        (7a)
    subject to       c_E(x) = 0                      (7b)
                     c_I(x) − s = 0,                 (7c)
where s is a vector of slack variables, µ > 0 is the barrier parameter, and c_E(x), c_I(x) denote the equality and inequality constraints, respectively. knitro/cg finds an approximate solution of (7) using a form of sequential quadratic programming. This leads to an equality constrained subproblem of the form (2), in which the Hessian and Jacobian matrices are given by

    G = [ ∇²_xx L   0 ]          A = [ A_E    0 ]
        [ 0         Σ ],             [ A_I   −I ],        (8)

where L(x, λ) is the Lagrangian of the nonlinear program, Σ is a diagonal matrix, and A_E and A_I denote the Jacobian matrices corresponding to the equality and inequality constraints, respectively. (In the bound constrained case, A_E does not exist and A_I is a simple sparse matrix whose rows are unit vectors.) The matrix Σ is defined as Σ = S^{-1} Λ_I, where

    S = diag{s_i},   Λ_I = diag{λ_i},   i ∈ I,

and where the s_i are slack variables and the λ_i, i ∈ I, are Lagrange multipliers corresponding to the inequality constraints. Hence there are two separate sources of ill-conditioning in G: one caused by the Hessian ∇²_xx L and the other by the barrier effects reflected in Σ. Any ill-conditioning due to A is removed by the projected CG approach. Given the block structure (8), the preconditioning operation (3) takes the form
    [ D_x    0     A_E^T   A_I^T ] [ z_x ]   [ r_1 ]
    [ 0      D_s   0       −I    ] [ z_s ]   [ r_2 ]
    [ A_E    0     0       0     ] [ w_1 ] = [ 0   ]
    [ A_I   −I     0       0     ] [ w_2 ]   [ 0   ].        (9)
The matrix D_s will always be chosen as a diagonal matrix, given that Σ is diagonal. In the experiments reported in Table 1, knitro/cg was implemented with D_x = I and D_s = S^{-2}. This means that the algorithm does not include preconditioning for the Hessian ∇²_xx L, and applies a form of preconditioning for the barrier term Σ (as we discuss below). The high computing times of knitro/cg in Table 1 indicate that this preconditioning strategy is not effective for many problems, and therefore we discuss how to precondition each of the two terms in G.

5.1 Hessian Preconditioning

Possible preconditioners for the Hessian ∇²_xx L include diagonal preconditioning and incomplete Cholesky. Diagonal preconditioners are simple to implement; we report results for them in the next section. To design an incomplete Cholesky preconditioner, we exploit the special structure of (9). Performing block elimination on (9) yields the condensed system
    [ D_x + A_I^T D_s A_I   A_E^T ] [ z_x ]   [ r_1 + A_I^T r_2 ]
    [ A_E                   0     ] [ w_1 ] = [ 0               ];        (10)

the eliminated variables z_s, w_2 are recovered from the relations z_s = A_I z_x, w_2 = D_s z_s − r_2. If we define D_x = LL^T, where L is the incomplete Cholesky factor of ∇²_xx L, we still have to face the problem of how to factor (10) efficiently. However, for problems without equality constraints, such as bound constrained problems, (10) reduces to

    (D_x + A_I^T D_s A_I) z_x = r_1 + A_I^T r_2.        (11)

Let us assume that the diagonal preconditioning matrix D_s is given. For bound constrained problems, A_I^T D_s A_I can be expressed as the sum of two diagonal matrices. Hence, the coefficient matrix in (11) is easy to form. Setting D_x = ∇²_xx L, we compute the (possibly modified) incomplete Cholesky factorization

    L L^T ≈ ∇²_xx L + A_I^T D_s A_I.        (12)

The preconditioning step is then obtained by solving

    L L^T z_x = r_1 + A_I^T r_2        (13)

and by defining z_s = A_I z_x.
One advantage of this approach is apparent from the structure of the matrix on the right hand side of (12). Since we are adding a positive diagonal matrix to ∇²_xx L, it is less likely that a modification of the form δI must be introduced in the course of the incomplete Cholesky factorization. Minimizing the use of the modification δI is desirable because it can introduce undesirable distortions in the Hessian information. We note that the incomplete factorization (12) is also practical for problems that contain general inequality constraints, provided the term A_I^T D_s A_I is not costly to form and does not lead to severe fill-in.

5.2 Barrier Preconditioning

It is well known that the matrix Σ = S^{-1} Λ_I becomes increasingly ill-conditioned as the iterates of the optimization algorithm approach the solution. Some diagonal elements of Σ diverge while others converge to zero. Since Σ is a diagonal matrix, it can always be preconditioned adequately using a diagonal matrix. We consider two preconditioners:

    D_s = Σ     and     D_s = µ S^{-2}.

The first is the natural choice corresponding to the perfect preconditioner for the barrier term, while the second choice is justified because near the central path Λ_I ≈ µ S^{-1}, so that Σ = S^{-1} Λ_I ≈ S^{-1} (µ S^{-1}) = µ S^{-2}.

5.3 Numerical Results

We test the preconditioners discussed above using a MATLAB implementation of the algorithm in knitro/cg. Our MATLAB program does not contain all the features of knitro/cg, but is sufficiently robust and efficient to study the effectiveness of various preconditioners. The results are given in Table 2, which reports the preconditioning option (option), the final objective function value, the number of iterations of the interior-point algorithm, the total number of CG iterations, the average number of CG iterations per interior-point iteration, and the CPU time. The preconditioning options are labeled as option = (a, b), where a denotes the Hessian preconditioner and b the barrier preconditioner. The options are:

    a = 0: no Hessian preconditioning (current default in knitro)
    a = 1: diagonal Hessian preconditioning
    a = 2: incomplete Cholesky preconditioning
    b = 0: D_s = S^{-2} (current default in knitro)
    b = 1: D_s = µ S^{-2}
    b = 2: D_s = Σ.
Since our MATLAB code is not optimized for speed, we have chosen test problems with a relatively small number of variables.
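To make the options concrete, the following sketch builds the preconditioning operation (11)-(13) for a problem whose only constraints are the bounds x ≥ 0, so that A_I = I and A_I^T D_s A_I is diagonal. It is an illustration only, with a dense Cholesky factorization in place of the (modified) incomplete one; it is not the MATLAB code used for Table 2.

    import numpy as np

    def make_barrier_preconditioner(hess_L, s, lam, a, b, mu=None, delta=0.0):
        # Option (a, b) as above; bounds x >= 0 only, hence A_I = I.
        n = hess_L.shape[0]
        if b == 0:
            Ds = 1.0 / s**2            # D_s = S^{-2}
        elif b == 1:
            Ds = mu / s**2             # D_s = mu * S^{-2}
        else:
            Ds = lam / s               # D_s = Sigma = S^{-1} Lambda_I
        if a == 0:
            Dx = np.eye(n)             # no Hessian preconditioning
        elif a == 1:
            Dx = np.diag(np.diag(hess_L))
        else:
            Dx = hess_L                # D_x = Hessian, then factor as in (12)
        L = np.linalg.cholesky(Dx + np.diag(Ds) + delta * np.eye(n))
        def apply_P(r1, r2):
            zx = np.linalg.solve(L.T, np.linalg.solve(L, r1 + r2))   # (13), with A_I = I
            return zx, zx              # z_s = A_I z_x = z_x
        return apply_P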
Table 2. Results of various preconditioning options

problem (n)        option  final objective     #iteration  #total CG  #average CG  time
biggsb1 (n=100)    (0,0)   +1.5015971301e-02   31          3962       1.278e+02    3.226e+01
                   (0,1)   +1.5015971301e-02   29          2324       8.014e+01    1.967e+01
                   (0,2)   +1.5015971301e-02   28          2232       7.971e+01    1.880e+01
                   (1,0)   +1.5015971301e-02   30          3694       1.231e+02    3.086e+01
                   (1,1)   +1.5015971301e-02   30          2313       7.710e+01    2.010e+01
                   (1,2)   +1.5015971301e-02   30          2241       7.470e+01    2.200e+01
                   (2,0)   +1.5015971301e-02   31          44         1.419e+00    1.950e+00
                   (2,1)   +1.5015971301e-02   29          42         1.448e+00    1.870e+00
                   (2,2)   +1.5015971301e-02   28          41         1.464e+00    1.810e+00
cvxbqp1 (n=200)    (0,0)   +9.0450040000e+02   11          91         8.273e+00    4.420e+00
                   (0,1)   +9.0453998374e+02   8           112        1.400e+01    4.220e+00
                   (0,2)   +9.0450040000e+02   53          54         1.019e+00    1.144e+01
                   (1,0)   +9.0454000245e+02   30          52         1.733e+00    9.290e+00
                   (1,1)   +9.0450040000e+02   30          50         1.667e+00    9.550e+00
                   (1,2)   +9.0454001402e+02   47          48         1.021e+00    1.527e+01
                   (2,0)   +9.0450040000e+02   11          18         1.636e+00    2.510e+00
                   (2,1)   +9.0454000696e+02   8           15         1.875e+00    1.940e+00
                   (2,2)   +9.0450040000e+02   53          53         1.000e+00    1.070e+01
jnlbrng1 (n=324)   (0,0)   -1.7984674056e-01   29          5239       1.807e+02    8.671e+01
                   (0,1)   -1.7984674056e-01   27          885        3.278e+01    1.990e+01
                   (0,2)   -1.7984674056e-01   29          908        3.131e+01    2.064e+01
                   (1,0)   -1.7984674056e-01   29          5082       1.752e+02    9.763e+01
                   (1,1)   -1.7984674056e-01   27          753        2.789e+01    3.387e+01
                   (1,2)   -1.7988019171e-01   26          677        2.604e+01    2.917e+01
                   (2,0)   -1.7984674056e-01   30          71         2.367e+00    6.930e+00
                   (2,1)   -1.7984674056e-01   27          59         2.185e+00    6.390e+00
                   (2,2)   -1.7984674056e-01   29          66         2.276e+00    6.880e+00
obstclbm (n=225)   (0,0)   +5.9472925926e+00   28          7900       2.821e+02    1.919e+02
                   (0,1)   +5.9473012340e+00   18          289        1.606e+01    1.268e+01
                   (0,2)   +5.9472925926e+00   31          335        1.081e+01    1.618e+01
                   (1,0)   +5.9472925926e+00   27          6477       2.399e+02    1.620e+02
                   (1,1)   +5.9472925926e+00   29          380        1.310e+01    2.246e+01
                   (1,2)   +5.9473012340e+00   18          197        1.094e+01    1.192e+01
                   (2,0)   +5.9472925926e+00   27          49         1.815e+00    7.180e+00
                   (2,1)   +5.9473012340e+00   17          32         1.882e+00    4.820e+00
                   (2,2)   +5.9472925926e+00   25          49         1.960e+00    6.650e+00
pentdi (n=250)     (0,0)   -7.4969998494e-01   27          260        9.630e+00    6.490e+00
                   (0,1)   -7.4969998502e-01   25          200        8.000e+00    5.920e+00
                   (0,2)   -7.4969998500e-01   28          205        7.321e+00    5.960e+00
                   (1,0)   -7.4969998494e-01   28          256        9.143e+00    1.111e+01
                   (1,1)   -7.4992499804e-01   23          153        6.652e+00    9.640e+00
                   (1,2)   -7.4969998502e-01   26          132        5.077e+00    9.370e+00
                   (2,0)   -7.4969998494e-01   27          41         1.519e+00    3.620e+00
                   (2,1)   -7.4969998502e-01   25          39         1.560e+00    3.350e+00
                   (2,2)   -7.4969998500e-01   28          42         1.500e+00    3.640e+00
torsion1 (n=100)   (0,0)   -4.8254023392e-01   26          993        3.819e+01    9.520e+00
                   (0,1)   -4.8254023392e-01   25          298        1.192e+01    4.130e+00
                   (0,2)   -4.8254023392e-01   24          274        1.142e+01    3.820e+00
                   (1,0)   -4.8254023392e-01   26          989        3.804e+01    9.760e+00
                   (1,1)   -4.8254023392e-01   25          274        1.096e+01    4.520e+00
                   (1,2)   -4.8254023392e-01   25          250        1.000e+01    3.910e+00
                   (2,0)   -4.8254023392e-01   25          52         2.080e+00    1.760e+00
                   (2,1)   -4.8254023392e-01   25          53         2.120e+00    1.800e+00
                   (2,2)   -4.8254023392e-01   24          51         2.125e+00    1.660e+00
torsionb (n=100)   (0,0)   -4.0993481087e-01   25          1158       4.632e+01    1.079e+01
                   (0,1)   -4.0993481087e-01   25          303        1.212e+01    4.160e+00
                   (0,2)   -4.0993481087e-01   23          282        1.226e+01    3.930e+00
                   (1,0)   -4.0993481087e-01   25          1143       4.572e+01    1.089e+01
                   (1,1)   -4.0993481087e-01   24          274        1.142e+01    4.450e+00
                   (1,2)   -4.0993481087e-01   23          246        1.070e+01    3.700e+00
                   (2,0)   -4.0993481087e-01   24          49         2.042e+00    1.720e+00
                   (2,1)   -4.0993481087e-01   24          49         2.042e+00    1.700e+00
                   (2,2)   -4.0993481087e-01   23          48         2.087e+00    1.630e+00
Note that for all the test problems, except cvxbqp1, the number of interior-point iterations is not greatly affected by the choice of preconditioner. Therefore, we can use Table 2 to measure the efficiency of the preconditioners, but we must exercise caution when interpreting the results for problem cvxbqp1.

Let us consider first the case when only barrier preconditioning is used, i.e., where option has the form (0, *). As expected, the options (0, 1) and (0, 2) generally decrease the number of CG iterations and the computing time with respect to the standard option (0, 0), and can therefore be considered successful in this context. From these experiments it is not clear whether option (0, 1) is to be preferred over option (0, 2).

Incomplete Cholesky preconditioning is very successful. If we compare the results for options (0, 0) and (2, 0), we see substantial reductions in the number of CG iterations and computing time for the latter option. When we add barrier preconditioning to incomplete Cholesky preconditioning (options (2, 1) and (2, 2)) we do not see further gains. Therefore, we speculate that the standard barrier preconditioner D_s = S^{-2} may be adequate, provided the Hessian preconditioner is effective. Diagonal Hessian preconditioning, i.e., options of the form (1, *), rarely provides much benefit. Clearly this preconditioner is of limited use.

One might expect that preconditioning would not affect much the number of iterations of the interior-point method, because it is simply a mechanism for accelerating the step computation procedure. The results for problem cvxbqp1 suggest that this is not the case (we have seen a similar behavior on other problems). In fact, preconditioning changes the form of the algorithm in two ways: it changes the shape of the trust region and it affects the barrier stop test. We introduce preconditioning in knitro/cg by defining the trust region as

    || ( D_x^{1/2} d_x ,  D_s^{1/2} d_s ) ||_2 ≤ ∆.
The standard barrier preconditioner D_s = S^{-2} gives rise to the trust region

    || ( D_x^{1/2} d_x ,  S^{-1} d_s ) ||_2 ≤ ∆,        (14)

which has proved to control well the rate at which the slacks approach zero. (This is the standard affine scaling strategy used in many optimization methods.) On the other hand, the barrier preconditioner D_s = µ S^{-2} results in the trust region

    || ( D_x^{1/2} d_x ,  sqrt(µ) S^{-1} d_s ) ||_2 ≤ ∆.        (15)
When µ is small, (15) does not penalize a step approaching the bounds s ≥ 0 as severely as (14). This allows the interior-point method to approach the boundary of the feasible region prematurely and can lead to very small steps.
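As a hypothetical numerical illustration (the numbers are ours, not taken from the experiments): for a single slack with s = 10^-3, barrier parameter µ = 10^-6 and ∆ = 1, the scaled trust region (14) limits the slack step to |d_s| ≤ s∆ = 10^-3, whereas (15) only requires sqrt(µ)|d_s|/s ≤ ∆, i.e. |d_s| ≤ s∆/sqrt(µ) = 1, a bound a thousand times looser and large enough to push the slack straight to its bound.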
An examination of the results for problem cvxbqp1 shows that this is indeed the case. The preconditioner D_s = Σ = S^{-1} Λ_I can be ineffective for a different reason: when the multiplier estimates λ_i are inaccurate (too large or too small), the trust region will not properly control the step d_s. These remarks reinforce our view that the standard barrier preconditioner D_s = S^{-2} may be the best choice and that our effort should focus on Hessian preconditioning.

Let us consider the second way in which preconditioning changes the interior-point algorithm. Preconditioning amounts to a scaling of the variables of the problem; this scaling alters the form of the KKT optimality conditions. knitro/cg uses a barrier stop test that determines when the barrier problem has been solved to sufficient accuracy. This strategy forces the iterates to remain in a (broad) neighborhood of the central path. Each barrier problem is terminated when the norm of the scaled KKT conditions is small enough, where the scaling factors are affected by the choice of D_x and D_s. A poor choice of preconditioner, including diagonal Hessian preconditioning, introduces an unwanted distortion in the barrier stop test, and this can result in a deterioration of the interior-point iteration. Note, in contrast, that the incomplete Cholesky preconditioner (option (2, *)) does not adversely affect the overall behavior of the interior-point iteration on problem cvxbqp1.
6 Quasi-Newton Methods

We now consider algorithms that use quasi-Newton approximations. In recent years, most of the numerical studies of interior-point methods have focused on the use of exact Hessian information. It is well known, however, that in many practical applications second derivatives are not available, and it is therefore of interest to compare the performance of active-set and interior-point methods in this context. We report results with 5 solvers: snopt version 7.2-1 [10], l-bfgs-b [4, 19], knitro/direct, knitro/cg and knitro/active version 5.0. Since all the problems in our test set have more than 1000 variables, we employ the limited memory BFGS quasi-Newton options in all codes, saving m = 20 correction pairs. All other options in the codes were set to their defaults.

snopt is an active-set SQP method that computes steps by solving an inequality constrained quadratic program. l-bfgs-b implements a gradient projection method. Unlike tron, which is a trust region method, l-bfgs-b is a line search algorithm that exploits the simple structure of limited memory quasi-Newton matrices to compute the step at small cost. Table 3 reports the results on the same set of problems as in Table 1. Performance profiles are provided in Figures 3 and 4.
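None of the solvers' code is reproduced here; as a generic illustration of what "saving m = 20 correction pairs" provides, the standard L-BFGS two-loop recursion computes a quasi-Newton step from the stored pairs without ever forming a Hessian approximation explicitly. This is a sketch under that assumption, not the implementation of any of the codes tested.

    import numpy as np

    def lbfgs_direction(grad, s_list, y_list):
        # Two-loop recursion: returns -H_k * grad using the stored (s, y) pairs,
        # with the usual initial scaling H_0 = gamma * I.
        q = grad.copy()
        rhos = [1.0 / (y @ s) for s, y in zip(s_list, y_list)]
        alphas = []
        for s, y, rho in zip(reversed(s_list), reversed(y_list), reversed(rhos)):
            a = rho * (s @ q)
            alphas.append(a)
            q -= a * y
        if s_list:
            gamma = (s_list[-1] @ y_list[-1]) / (y_list[-1] @ y_list[-1])
        else:
            gamma = 1.0
        r = gamma * q
        for (s, y, rho), a in zip(zip(s_list, y_list, rhos), reversed(alphas)):
            b = rho * (y @ r)
            r += (a - b) * s
        return -r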
Table 3. Comparative results of five methods that use quasi-Newton approximations (m = 20 for all solvers)

                    | snopt                | l-bfgs-b             | knitro/direct        | knitro/cg            | knitro/active
problem     n       | iter  feval  CPU     | iter  feval  CPU     | iter  feval  CPU     | iter  feval  CPU     | iter  feval  CPU
biggsb1     20000   | X1    X1     X1      | X1    X1     X1      | 6812  6950   1244.05 | 3349  3443   1192.32 | X1    X1     X1
bqpgauss    2003    | 5480  6138   482.87  | 9686  10253  96.18   | X1    X1     X1      | X1    X1     X1      | X4    X4     X4
chenhark    20000   | X1    X1     X1      | X1    X1     X1      | X1    X1     X1      | X1    X1     X1      | X1    X1     X1
clnlbeam    20000   | 41    43     45.18   | 22    28     0.47    | 19    20     31.42   | 14    15     11.94   | 16    17     6.42
cvxbqp1     20000   | 60    65     139.31  | 1     2      0.04    | 29    30     89.03   | 25    26     71.44   | 2     3      0.59
explin      24000   | 72    100    28.08   | 29    36     0.52    | 50    51     239.29  | 47    48     76.84   | 32    34     35.84
explin2     24000   | 63    72     25.62   | 20    24     0.30    | 33    34     133.65  | 40    41     69.74   | 23    28     17.32
expquad     24000   | X4    X4     X4      | X2    X2     X2      | X4    X4     X4      | X5    X5     X5      | 206   645    513.75
gridgena    26312   | X6    X6     X6      | X7    X7     X7      | X5    X5     X5      | X5    X5     X5      | 20    97     120.59
harkerp2    2000    | 50    57     7.05    | 86    102    4.61    | 183   191    76.26   | 164   168    58.82   | 10    11     1.48
jnlbrng1    21904   | 1223  1337   8494.55 | 1978  1992   205.02  | 1873  1913   992.23  | 1266  1309   1968.66 | 505   515    1409.80
jnlbrnga    21904   | 1179  1346   1722.60 | 619   640    59.24   | 2134  2191   10929.97| 1390  1427   221.73  | 395   417    1236.32
mccormck    100000  | 1019  1021   10820.22| X8    X8     X8      | 53    166    1222.38 | X4    X4     X4      | X5    X5     X5
minsurfo    10000   | 904   1010   8712.90 | 1601  1648   97.66   | 3953  3980   801.87  | 1633  1665   16136.98| 497   498    743.37
ncvxbqp1    20000   | 41    43     60.54   | 1     2      0.04    | 85    86     382.62  | X1    X1     X1      | 9     10     2.03
ncvxbqp2    20000   | X6    X6     X6      | 151   191    4.76    | 3831  3835   20043.27| 8118  8119   993.03  | 124   125    178.97
nobndtor    32400   | 1443  1595   12429   | 1955  1966   314.42  | 1100  1129   8306.03 | 1049  1069   27844.21| 873   886    3155.06
nonscomp    20000   | 233   237    1027.41 | X8    X8     X8      | 31    34     99.36   | 1098  1235   2812.25 | 87    92     123.82
obstclae    21904   | 547   597    4344.33 | 1110  1114   109.11  | 982   1009   1322.69 | 618   639    11489.74| 1253  1258   2217.58
obstclbm    21904   | 342   376    1332.14 | 359   368    35.94   | 383   391    2139.91 | 282   286    1222.99 | 276   279    641.07
pentdi      20000   | 2     6      0.57    | 1     3      0.05    | 59    61     221.98  | 60    62     67.39   | 3     7      0.72
probpenl    5000    | 3     5      8.86    | 2     4      0.03    | 4     8      0.53    | 4     5      0.30    | 2     4      0.10
qrtquad     5000    | X6    X6     X6      | 241   308    4.85    | X4    X4     X4      | X5    X5     X5      | X1    X1     X1
qudlin      20000   | 41    43     19.80   | 1     2      0.02    | 17    18     27.78   | 24    25     34.81   | 4     5      0.43
reading1    20001   | 81    83     114.18  | 7593  15354  234.93  | 359   625    1891.24 | 66    69     150.48  | 15    16     5.17
scond1ls    2000    | X1    X1     X1      | X1    X1     X1      | X1    X1     X1      | X1    X1     X1      | X1    X1     X1
sineali     20000   | 466   553    918.33  | 14    19     0.63    | X5    X5     X5      | X4    X4     X4      | X1    X1     X1
torsion1    32400   | 662   733    4940.83 | 565   579    86.39   | 696   716    1564.78 | 336   362    15661.85| 300   303    1251.40
torsiona    32400   | 685   768    5634.62 | 490   496    77.42   | 625   643    950.16  | 349   370    15309.47| 296   306    1272.50

X1: iteration limit reached   X2: numerical result out of range   X4: current solution estimate cannot be improved
X5: relative change in solution estimate < 10^-15   X6: dual feasibility cannot be satisfied   X7: rounding error   X8: line search error
[Figure 3. Number of Function Evaluations: logarithmic performance profile (percentage of problems vs. factor x slower than the best) for snopt, l-bfgs-b, knitro/direct, knitro/cg and knitro/active.]
[Figure 4. CPU Time: the corresponding logarithmic performance profile for computing time.]
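Figures 1-4 are logarithmic performance profiles in the sense of Dolan and Moré [7]. As an illustration only (our own sketch, not the script used to produce the figures), such a profile can be computed from a problems-by-solvers cost table, with failures encoded as infinity:

    import numpy as np

    def performance_profile(costs):
        # costs: array of shape (n_problems, n_solvers); np.inf marks a failure.
        best = costs.min(axis=1, keepdims=True)
        ratios = costs / best                  # "x times slower than the best"
        taus = np.unique(ratios[np.isfinite(ratios)])
        # rho_s(tau): fraction of problems solver s solves within a factor tau of the best
        return taus, np.array([(ratios <= t).mean(axis=0) for t in taus])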
A sharp drop in robustness and speed is noticeable for the three knitro algorithms; compare with Table 1. In terms of function evaluations, l-bfgs-b and knitro/active perform the best. snopt and the two interior-point methods require roughly the same number of function evaluations, and this number is often dramatically larger than that obtained by the interior-point solvers using exact Hessian information. In terms of CPU time, l-bfgs-b is by far the best solver and knitro/active comes in second. Again, snopt and the two interior-point methods require a comparable amount of CPU time, and for some of these problems the times are unacceptably high.

In summary, as was the case with tron when exact Hessian information was available, the specialized quasi-Newton method for bound constrained problems, l-bfgs-b, has an edge over the general purpose solvers. The use of preconditioning has helped bridge the gap in the exact Hessian case, but in the quasi-Newton case, improved updating procedures are clearly needed for general purpose methods.
References

1. L. Bergamaschi, J. Gondzio, and G. Zilli, Preconditioning indefinite systems in interior point methods for optimization, Tech. Rep. MS-02-002, Department of Mathematics and Statistics, University of Edinburgh, Scotland, 2002.
2. R. H. Byrd, N. I. M. Gould, J. Nocedal, and R. A. Waltz, An algorithm for nonlinear optimization using linear programming and equality constrained subproblems, Mathematical Programming, Series B, 100 (2004), pp. 27–48.
3. R. H. Byrd, M. E. Hribar, and J. Nocedal, An interior point algorithm for large scale nonlinear programming, SIAM Journal on Optimization, 9 (1999), pp. 877–900.
4. R. H. Byrd, P. Lu, J. Nocedal, and C. Zhu, A limited memory algorithm for bound constrained optimization, SIAM Journal on Scientific Computing, 16 (1995), pp. 1190–1208.
5. R. H. Byrd, J. Nocedal, and R. Waltz, KNITRO: An integrated package for nonlinear optimization, in Large-Scale Nonlinear Optimization, G. di Pillo and M. Roma, eds., Springer, 2006, pp. 35–59.
6. R. H. Byrd and R. A. Waltz, Improving SLQP methods using parametric linear programs, tech. rep., OTC, 2006. To appear.
7. E. D. Dolan and J. J. Moré, Benchmarking optimization software with performance profiles, Mathematical Programming, Series A, 91 (2002), pp. 201–213.
8. A. Forsgren, P. E. Gill, and J. D. Griffin, Iterative solution of augmented systems arising in interior methods, Tech. Rep. NA 05-3, Department of Mathematics, University of California, San Diego, 2005.
9. R. Fourer, D. M. Gay, and B. W. Kernighan, AMPL: A Modeling Language for Mathematical Programming, Scientific Press, 1993. www.ampl.com.
10. P. E. Gill, W. Murray, and M. A. Saunders, SNOPT: An SQP algorithm for large-scale constrained optimization, SIAM Journal on Optimization, 12 (2002), pp. 979–1006.
11. N. I. M. Gould, D. Orban, and P. L. Toint, CUTEr and SifDec: A Constrained and Unconstrained Testing Environment, revisited, ACM Trans. Math. Softw., 29 (2003), pp. 373–394.
12. N. I. M. Gould, D. Orban, and P. L. Toint, Numerical methods for large-scale nonlinear optimization, Technical Report RAL-TR-2004-032, Rutherford Appleton Laboratory, Chilton, Oxfordshire, England, 2004.
13. C. Keller, N. I. M. Gould, and A. J. Wathen, Constraint preconditioning for indefinite linear systems, SIAM Journal on Matrix Analysis and Applications, 21 (2000), pp. 1300–1317.
14. C. J. Lin and J. J. Moré, Incomplete Cholesky factorizations with limited memory, SIAM Journal on Scientific Computing, 21 (1999), pp. 24–45.
15. C. J. Lin and J. J. Moré, Newton's method for large bound-constrained optimization problems, SIAM Journal on Optimization, 9 (1999), pp. 1100–1127.
16. M. Roma, Dynamic scaling based preconditioning for truncated Newton methods in large scale unconstrained optimization: The complete results, Technical Report R. 579, Istituto di Analisi dei Sistemi ed Informatica, 2003.
17. T. Steihaug, The conjugate gradient method and trust regions in large scale optimization, SIAM Journal on Numerical Analysis, 20 (1983), pp. 626–637.
18. R. A. Waltz, J. L. Morales, J. Nocedal, and D. Orban, An interior algorithm for nonlinear optimization that combines line search and trust region steps, Mathematical Programming, Series A, 107 (2006), pp. 391–408.
19. C. Zhu, R. H. Byrd, P. Lu, and J. Nocedal, Algorithm 778: L-BFGS-B: Fortran subroutines for large-scale bound constrained optimization, ACM Transactions on Mathematical Software, 23 (1997), pp. 550–560.
Word Similarity In WordNet

Tran Hong-Minh and Dan Smith
School of Computing Sciences, University of East Anglia, Norwich, UK, NR4 7TJ
[email protected]

Abstract. This paper presents a new approach to measuring the semantic similarity between concepts. By exploiting the advantages of the distance (edge-based) approach for taxonomic, tree-like concept hierarchies, we enhance the strength of the information theoretic (node-based) approach. Our measure therefore gives a complete view of word similarity, which cannot be achieved by applying the node-based approach alone. Our experimental measure achieves 88% correlation with human ratings.
1 Introduction

Understanding concepts expressed in natural language is a challenge in Natural Language Processing and Information Retrieval. It is often decomposed into comparing semantic relations between concepts, which can be done by using Hidden Markov models and Bayesian networks for part-of-speech tagging. Alternatively, a knowledge-based approach can be applied, but it has not been well explored due to the lack of machine readable dictionaries (such as lexicons, thesauri and taxonomies) [12]. However, more dictionaries have been developed recently (e.g., Roget, Longman, WordNet [5, 6], etc.) and the amount of research in this direction has increased. The task of understanding and comparing the semantics of concepts then becomes one of understanding and comparing such relations by exploiting machine readable dictionaries.

We propose a new information theoretic measure to assess the similarity of two concepts on the basis of exploring a lexical taxonomy (e.g., WordNet). The proposed formula is domain-independent: it could be applied to either a generic or a specialized lexical knowledge base. We use WordNet as an example of a lexical taxonomy.

The rest of the paper is organized as follows. In Section 2 we give an overview of the structure of a lexical hierarchy and use WordNet as a specific example. In Section 3 we analyze the two approaches (distance (edge) based and information theoretic (node) based) for measuring the degree of similarity. Based on this analysis, we present our measure
which combines the advantages of both approaches, in Section 4. In Section 5 we discuss our comparative experiments. Finally, we outline our future work in Section 6.
2 Lexical Taxonomy

A taxonomy is often organized as a hierarchical, directional structure, in which nodes represent concepts (nouns, adjectives, verbs) and edges represent relations between concepts. The hierarchical structure seldom has more than 10 levels of depth. Although the hierarchies in such a system vary widely in size, each hierarchy covers a distinct conceptual and lexical domain. They are also not mutually exclusive, as some cross-references are required. The advantage of the hierarchical structure is that information common to many items need not be stored with every item. In other words, all characteristics of a superordinate are assumed to be characteristics of all its subordinates as well. The hierarchical system therefore is an inheritance system, possibly with multiple inheritance, but without cycles. Consequently, nodes at deeper levels are more informative and specific than nodes that are nearer to the root. In principle, the root would be semantically empty. The number of leaf nodes is, obviously, much larger than the number of upper nodes.

In a hierarchical system there are three types of nodes: concept nodes representing nouns (a.k.a. Noun nodes), attribute nodes representing adjectives, and function nodes representing verbs. Nodes are linked together by edges to give full information about concepts: a node, together with the set of nodes linked to it by incoming edges, is what distinguishes it. Edges represent the relations between nodes. They are commonly categorized into a few types (such as is-a, equivalence, antonymy, modification, function and meronymy). Among them, the IS-A relation, connecting a Noun node to another Noun node, is the dominant and most important one. Like the IS-A relation, the meronymy relation, which also connects two Noun nodes, plays an important role in the system. Besides these two popular relations, there are four more types of relations: the antonymy relation connects opposite concepts (e.g., man-woman, wife-husband), the equivalence relation connects synonyms, the modification relation indicates attributes of a concept by connecting a Noun node to an Adjective node, and the function relation indicates the behaviour of a concept by linking a Verb node to a Noun node. Table 1 briefly summarizes the characteristics of these relations.

In practice, one example of such a lexical hierarchical system is WordNet, which is currently one of the most popular and largest online lexical databases, produced by Miller et al. at Princeton University in the 1990s. It supports multiple inheritance between nodes and implements the largest number of relations. The WordNet hierarchical system includes 25 different branches rooted at 25 distinct concepts. Each of these 25 concepts can be considered
Table 1. Characteristic of relations in the lexical hierarchical system Is-A Meronymy Equivalence Modification Function Antonymy √ √ √ √ √ Transitive × √ √ √ Symmetric × × ×
as the beginners of the branches and regarded as a primitive semantic component of all concepts in its semantic hierarchy. Table 2 shows such beginners.
Table 2. List of 25 unique beginners for WordNet nouns

{act, action, activity}   {animal, fauna}         {artifact}
{attribute, property}     {body, corpus}          {cognition, knowledge}
{communication}           {event, happening}      {feeling, emotion}
{food}                    {group, collection}     {location, place}
{motive}                  {natural object}        {natural phenomenon}
{person, human being}     {plant, flora}          {possession}
{process}                 {quantity, amount}      {relation}
{shape}                   {state, condition}      {substance}
{time}
[Figure 1. Fragments of WordNet noun taxonomy: an IS-A tree containing, among other nodes, object, substance, asset, artifact, money, wealth, cash, treasure, coin, nickel, dime, gold, chemical element, metal, nickel*, gold*, solid, crystal, instrumentality, conveyance, vehicle, motor vehicle, car, cycle, bicycle, ware, tableware, cutlery, fork and article.]
Like many other lexical inheritance systems, WordNet fully supports the IS-A and meronymy relations. Although the modification and function relations have not been implemented, the antonymy relation and the synonym sets are implemented in WordNet. Figure 1 shows fragments of the WordNet noun hierarchy. With a hierarchical structure, similarity can be assessed not only by comparing the common semantics of two nodes in the system (the information theoretic, node-based approach) but also by measuring their positions in the structure and their relations (the distance-based approach).
3 Information Theoretic vs. Conceptual Distance Approach for Measuring Similarity

Based on different underlying assumptions about taxonomies and definitions of similarity (e.g., [1-3, 7, 10], etc.), there are two main trends for measuring the semantic similarity between two concepts: the node-based approach (a.k.a. the information content approach) and the edge-based approach (a.k.a. the conceptual distance approach). The most distinguishing characteristic of the node-based approach is that the similarity between nodes is measured directly and solely by their common information content. Since a taxonomy is often represented as a hierarchical structure (a special case of a network structure), similarity between nodes can also make use of the structural information embedded in the network, especially that of the links between nodes. This is the main idea of edge-based approaches.

3.1 Conceptual Distance Approach

The conceptual distance approach is natural, intuitive, and addresses the problem of measuring the similarity of concepts in the lexical hierarchical system presented in Section 2. The similarity between concepts is related to the conceptual distance between them: the more differences they have, the less similar they are. The conceptual distance between concepts is measured by the geometric distance between the nodes representing the concepts.

Definition 1 Given two concepts c1 and c2 and dist(c1, c2) as the distance between c1 and c2, the difference between c1 and c2 is equal to the distance dist(c1, c2) between them [3].

Definition 2 The distance dist(c1, c2) between c1 and c2 is the sum of the weights wt_i of the edges e_i on the shortest path from c1 to c2:

    dist(c1, c2) = Σ_{wt_i ∈ {wt_i of e_i | e_i ∈ shortestPath(c1, c2)}} wt_i        (1)
Being a distance, Formula (1) should satisfy the properties of a metric [10], such as the zero property, the positivity property and the triangle inequality. However, the symmetry property may not be satisfied, i.e., dist(c1, c2) ≠ dist(c2, c1), because different types of relations contribute differently to the weight of the edge connecting two nodes. For example, within the meronymy type of relation, an aggregative relation may contribute differently from a part-of relation, even though they are the reverse of each other. Most contributions to the weight of an edge come from characteristics of the hierarchical network, such as the local network density, the depth of a node in the hierarchy, the type of link and the strength of a link:

• Network density of a node can be measured by the number of its children. Richardson et al. [8] suggest that the greater the density, the closer the distance between parent-child nodes or sibling nodes.
• The distance between parent-child nodes is also smaller at deeper levels, since there is less differentiation at such levels.
• The strength of a link is based on how close a child node is to its direct parent, compared with its siblings. This is the most important factor in the weight of an edge, but it is still an open and difficult issue.
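Whatever concrete edge weighting is adopted (such as Sussna's, given next), Definition 2 amounts to a weighted shortest-path computation. The following is a generic sketch, where neighbors is an assumed adjacency function over the taxonomy and concepts are assumed hashable:

    import heapq

    def conceptual_distance(c1, c2, neighbors):
        # Formula (1): sum of edge weights along the shortest path from c1 to c2.
        # neighbors(c) yields (next_concept, edge_weight) pairs; weights must be non-negative.
        dist = {c1: 0.0}
        heap = [(0.0, 0, c1)]
        tie = 0
        while heap:
            d, _, c = heapq.heappop(heap)
            if c == c2:
                return d
            if d > dist.get(c, float("inf")):
                continue
            for nxt, w in neighbors(c):
                nd = d + w
                if nd < dist.get(nxt, float("inf")):
                    dist[nxt] = nd
                    tie += 1
                    heapq.heappush(heap, (nd, tie, nxt))
        return float("inf")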
There are studies on conceptual similarity that use the distance approach with the above characteristics of the hierarchical network (e.g., [1, 11]). Most of this research focuses on proposing an edge-weighting formula and then applying Formula (1) to measure the conceptual distance. For instance, Sussna [11] considers depth, relation type and network density in his weight formula as follows:

    wt(c1, c2) = ( wt(c1 →_r c2) + wt(c2 →_r' c1) ) / (2d)        (2)

in which

    wt(x →_r y) = max_r − (max_r − min_r) / n_r(x)        (3)
where →_r and →_r' are respectively a relation of type r and its reverse, d is the deeper of the depths of c1 and c2 in the hierarchy, min_r and max_r are respectively the minimum and maximum weights of relations of type r, and n_r(x) is the number of relations of type r at node x, which is viewed as the network density of node x. The conceptual distance is then given by applying Formula (1). This gives good results in a word sense disambiguation task with multiple-sense words. However, the formula does not take into account the strength of the relation between nodes, which is still an open issue for the distance approach.

In summary, the distance approach obviously requires a lot of information on the detailed structure of the taxonomy. Therefore it is difficult to apply or manipulate directly on a generic taxonomy that was not originally designed for similarity computation.

3.2 Information Theoretic Approach

The information theoretic approach is more theoretically sound. Therefore it is generic and can be applied to many taxonomies regardless of their underlying structure. In a conceptual space, a node represents a unique concept and contains a certain amount of information. The similarity between concepts is related to the information the nodes have in common: the more commonality they share, the more similar they are.

Given concepts c1 and c2, let IC(c) be the information content value of concept c, and let w be the word denoting concept c. For example, in Figure 1, the word nickel has three senses:
• “a United States coin worth one twentieth of a dollar” (concept coin)
• “atomic number 28” (concept chemical element)
• “a hard malleable ductile silvery metallic element that is resistant to corrosion; used in alloys; occurs in pentlandite and smaltite and garnierite and millerite” (concept metal).
Let s(w) be the set of concepts in the taxonomy that are senses of word w. For example, in Figure 1, the words nickel, coin and cash are all members of the set s(nickel). Let Words(c) be the set of words subsumed by concept c.

Definition 3 The commonality between c1 and c2 is measured by the information content value used to state the commonalities between c1 and c2 [3]:

    IC(common(c1, c2))        (4)

Assumption 1 The maximum similarity between c1 and c2 is reached when c1 and c2 are identical, no matter how much commonality they share.

Definition 4 In information theory, the information content value of a concept c is generally measured by

    IC(c) = − log P(c)        (5)

where P(c) is the probability of encountering an instance of concept c. For implementation, the probability is in practice measured by the concept frequency. Resnik [7] suggests a method of calculating the concept probabilities in a corpus on the basis of word occurrences. Given count(w) as the number of occurrences of a word belonging to concept c in the corpus and N as the number of concepts in the corpus, the probability of a concept c in the corpus is defined as follows:

    P(c) = (1/N) Σ_{w ∈ Words(c)} count(w)        (6)

In a taxonomy, the shared information of two concepts c1 and c2 is measured by the information content value of the concepts that subsume them. Given sim(c1, c2) as the similarity degree of two concepts c1 and c2 and Sup(c1, c2) as the set of concepts that subsume both c1 and c2, the formal definition of the similarity degree between c1 and c2 is given as follows:

    sim(c1, c2) = max_{c ∈ Sup(c1, c2)} IC(c),   if c1 ≠ c2,
    sim(c1, c2) = 1,                             if c1 = c2.        (7)

The word similarity between w1 and w2 is formally defined as:

    sim(w1, w2) = max_{c1 ∈ s(w1), c2 ∈ s(w2)} [ sim(c1, c2) ]        (8)
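For illustration, formulas (5)-(8) translate almost line by line into code. In the sketch below, word_count, words_of, subsumers and senses are assumed lookup functions into the corpus and taxonomy; they are not part of any particular WordNet API.

    import math

    def concept_probability(c, word_count, words_of, N):
        # P(c) from (6): summed counts of every word subsumed by c, divided by N.
        return sum(word_count[w] for w in words_of(c)) / N

    def information_content(c, word_count, words_of, N):
        return -math.log(concept_probability(c, word_count, words_of, N))   # (5)

    def concept_similarity(c1, c2, ic, subsumers):
        # (7): the most informative common subsumer; identical concepts get similarity 1.
        if c1 == c2:
            return 1.0
        return max(ic(c) for c in subsumers(c1, c2))

    def word_similarity(w1, w2, senses, concept_sim):
        # (8): maximize over all sense pairs of the two words.
        return max(concept_sim(c1, c2) for c1 in senses(w1) for c2 in senses(w2))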
When applying the above formulae to a hierarchical concept space, a few details must be specified. The set of words Words(c), directly or indirectly subsumed by the concept c, is taken to consist of all nodes in the sub-tree rooted at c, including c itself. Therefore, as we move from the leaves to the root of the hierarchy, Formula (6) gives a higher probability of encountering a concept at the upper levels; the probability of the root is obviously 1. Consequently, the information content value given by Formula (5) monotonically decreases in the bottom-up direction, and the information content value of the root is 0. This means that concepts at the upper levels are less informative, in agreement with the characteristic of the lexical hierarchical structure discussed in Section 2.

In a lexical hierarchical concept space, Sup(c1, c2) contains all superordinates of c1 and c2. For example, in Figure 1, coin, cash and money are all members of Sup(nickel, dime). However, as analyzed above, IC(coin) gives the highest information content value among them. The similarity sim(nickel, dime) computed by Formula (7) is therefore equal to the information content value of the direct superordinate, IC(coin). The direct superordinate of a node in a hierarchy (e.g., coin is the direct superordinate of nickel and dime) is called the minimum upper bound of the node. Similarly, for a multiple inheritance system, the similarity between concepts sim(c1, c2) is equal to the maximum information content value among those of their minimum upper bounds. For example, in Figure 1,

    sim(nickel*, gold*) = max[ IC(chemical element), IC(metal) ].

To conclude, unlike the distance approach, the information theoretic approach requires less structural information about the taxonomy. Therefore it is generic and flexible and has wide application to many types of taxonomies. However, when it is applied to hierarchical structures it does not differentiate the similarity of concepts as long as their minimum upper bounds are the same. For example, in Figure 1, sim(bicycle, fork) and sim(bicycle, tableware) are equal.
4 A Measure for Word Similarity

We propose a combined model for measuring word similarity which is derived from the node-based notion by adding structural information. We put a depth factor and a link strength factor into the node-based approach. By adding such structural information about the taxonomy, the node-based approach can exploit all the typical characteristics of a hierarchical structure when it is applied to such a taxonomy. Moreover, this information can be tuned via parameters. The method is therefore flexible enough for many types of taxonomy (e.g., hierarchical or plain structures).
Definition 5 The strength of a link is defined to be P(c_i | p), the conditional probability of encountering a child node c_i given an instance of its parent node p. Using Bayes' rule, we have:

    P(c_i | p) = P(c_i ∩ p) / P(p) = P(c_i) / P(p)        (9)
The information content value of a concept c with regard to its direct parent p, which is a modification of Formula (5), is given by:

    IC(c | p) = − log P(c | p) = − log ( P(c) / P(p) ) = IC(c) − IC(p)        (10)

As we discussed in Section 2, concepts at upper levels of the hierarchy have less semantic similarity between them than concepts at lower levels. This characteristic should be taken into account as a constraint when calculating the similarity of two concepts with depth taken into consideration. Therefore, the depth function should give a higher value when applied to nodes at lower levels. The contribution of the depth to the similarity is modelled as an exponential-growth function:

    f_{c1,c2}(d) = (e^{αd} − e^{−αd}) / (e^{αd} + e^{−αd}),        (11)

where d = max(depth(c1), depth(c2)) and α is a tuning parameter. The optimal value of the parameter is α = 0.3057, based on our numerous experiments. Function (11) is monotonically increasing with respect to the depth d; therefore it satisfies the constraint above. Moreover, by employing an exponential-growth function rather than an exponential-decay function, it is an extension of Shepard's Law [2, 9], which claims that exponential-decay functions are a universal law of stimulus generalisation for psychological science. The function given in Formula (7) then becomes a function of the depth and of the information content, taking the strength of a link into account, as follows:

    sim(c1, c2) = max_{c ∈ Sup(c1, c2)} ( IC(c | p) × f_{c1,c2}(d) ),   if c1 ≠ c2,
    sim(c1, c2) = 1,                                                    if c1 = c2.        (12)
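A compact sketch of the proposed measure (10)-(12) follows, with α = 0.3057 as reported above; ic, parent, depth and subsumers are assumed lookups into the taxonomy, not a specific API.

    import math

    ALPHA = 0.3057   # tuning parameter reported above

    def depth_factor(c1, c2, depth):
        d = max(depth(c1), depth(c2))
        return math.tanh(ALPHA * d)          # (11): (e^{ad} - e^{-ad}) / (e^{ad} + e^{-ad})

    def ic_given_parent(c, parent, ic):
        return ic(c) - ic(parent(c))         # (10)

    def similarity(c1, c2, ic, parent, depth, subsumers):
        # (12): weight the conditional information content of each common subsumer
        # by the depth factor of the pair being compared.
        if c1 == c2:
            return 1.0
        f = depth_factor(c1, c2, depth)
        return max(ic_given_parent(c, parent, ic) * f for c in subsumers(c1, c2))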
5 Experiments

Although there is no standard way to evaluate computational measures of semantic similarity, one reasonable way to judge would seem to be agreement with human similarity ratings. This can be assessed by measuring and rating the similarity of each word pair in a set and then looking at how well these ratings correlate with human ratings of the same pairs.
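The agreement itself can be quantified by the Pearson correlation coefficient between a measure's scores and the human ratings, for example:

    import numpy as np

    def correlation_with_humans(measure_scores, human_ratings):
        # Pearson correlation coefficient between the two rating vectors.
        return float(np.corrcoef(measure_scores, human_ratings)[0, 1])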
We use the human ratings obtained by Miller and Charles [4] and revised by Resnik [7] as our baseline. In their study, 38 undergraduate subjects were given 30 pairs of nouns and were asked to rate the similarity of meaning of each pair on a scale from 0 (dissimilar) to 4 (fully similar). The average rating of each pair represents a good estimate of how similar the two words are. Furthermore, we compare our similarity values with those produced by a simple edge-count measure and by Lin's measure [3]. We use WordNet 2.0 as the hierarchical system to exploit the relationships among the pairs.

Table 3. Results obtained evaluating with human judgement and WordNet 2.0

Word pairs (word1-word2), in order: car-automobile, gem-jewel, journey-voyage, boy-lad, coast-shore, asylum-madhouse, magician-wizard, midday-noon, furnace-stove, food-fruit, bird-cock, bird-crane, tool-implement, brother-monk, crane-implement, lad-brother, journey-car, monk-oracle, cemetery-woodland, food-rooster, coast-hill, forest-graveyard, shore-woodland, monk-slave, coast-forest, lad-wizard, chord-smile, glass-magician, noon-string, rooster-voyage. The final entry of each column below is the correlation with the human ratings.

Human:    3.92 3.84 3.84 3.76 3.70 3.61 3.50 3.42 3.11 3.08 3.05 2.97 2.95 2.82 1.68 1.66 1.16 1.10 0.95 0.89 0.87 0.84 0.63 0.55 0.42 0.42 0.13 0.11 0.08 0.08; correlation 1.00
sim_edge: 1.00 1.00 0.50 0.50 0.50 0.50 1.00 1.00 0.13 0.13 0.50 0.25 0.50 0.50 0.20 0.20 0.07 0.13 0.10 0.07 0.20 0.10 0.17 0.20 0.14 0.20 0.09 0.13 0.08 0.05; correlation 0.77
sim_Lin:  1.00 1.00 0.69 0.82 0.97 0.98 1.00 1.00 0.22 0.13 0.80 0.92 0.25 0.29 0.00 0.23 0.08 0.10 0.71 0.08 0.14 0.25 0.13 0.27 0.27 0.13 0.00 0.00; correlation 0.80
ours:     1.00 1.00 0.92 0.87 1.00 0.90 1.00 1.00 0.32 0.73 0.85 0.85 0.73 0.54 0.80 0.27 0.00 0.26 0.07 0.26 0.71 0.13 0.27 0.31 0.38 0.21 0.07 0.07 0.00 0.00; correlation 0.88
6 Conclusion

We have presented a review of the two main trends in measuring the similarity of words in a generic, hierarchical corpus. Based on this review, we proposed a modification of the node-based approach to capture the structural information of a hierarchical taxonomy. Our approach therefore gives a complete view of the similarity of words.
References

1. J. J. Jiang and D. W. Conrath. Semantic similarity based on corpus statistics and lexical taxonomy. In the International Conference on Research in Computational Linguistics, 1997.
2. Y. Li, Z. A. Bandar, and D. McLean. An approach for measuring semantic similarity between words using multiple information sources. IEEE Transactions on Knowledge and Data Engineering, 15:871–882, 2003.
3. D. Lin. An information-theoretic definition of similarity. In ICML '98: Proceedings of the Fifteenth International Conference on Machine Learning, 296–304, San Francisco, CA, USA, 1998. Morgan Kaufmann Publishers Inc.
4. G. Miller and W. Charles. Contextual correlates of semantic similarity. Language and Cognitive Processes, 6(1):1–28, 1991.
5. G. A. Miller. Nouns in WordNet: A lexical inheritance system. International Journal of Lexicography, 3(4):245–264, 1990.
6. G. A. Miller, C. Fellbaum, R. Beckwith, D. Gross, and K. Miller. Introduction to WordNet: An online lexical database. International Journal of Lexicography, 3(4):235–244, 1990.
7. P. Resnik. Using information content to evaluate semantic similarity in a taxonomy. In IJCAI, 448–453, 1995.
8. R. Richardson and A. F. Smeaton. Using WordNet in a knowledge-based approach to information retrieval. Technical Report CA-0395, Dublin, Ireland, 1995.
9. R. N. Shepard. Toward a universal law of generalization for psychological science. Science, 237(4820):1317–1323, September 1987.
10. R. Rada, H. Mili, E. Bicknell, and M. Blettner. Development and application of a metric on semantic nets. IEEE Transactions on Systems, Man and Cybernetics, 19:17–30, 1989.
11. M. Sussna. Word sense disambiguation for free-text indexing using a massive semantic network. In CIKM '93: Proceedings of the Second International Conference on Information and Knowledge Management, 67–74, New York, NY, USA, 1993. ACM Press.
12. W. Gale, K. Church, and D. Yarowsky. A method for disambiguating word senses in a large corpus. Common Methodologies in Humanities Computing and Computational Linguistics, 26:415–439, 1992.
Progress in Global Optimization and Shape Design

D. Isebe¹, B. Ivorra¹, P. Azerad¹, B. Mohammadi¹, and F. Bouchette²

¹ I3M - Universite de Montpellier II, Place Eugene Bataillon, CC051, 34095 Montpellier, France
  [email protected]
² ISTEEM - Universite de Montpellier II, Place Eugene Bataillon, 34095 Montpellier, France
  [email protected]
Abstract In this paper, we reformulate global optimization problems in terms of boundary value problems. This allows us to introduce a new class of optimization algorithms. Indeed, many optimization methods can be seen as discretizations of initial value problems for differential equations or systems of differential equations. We apply a particular algorithm of this class to the shape optimization of coastal structures.
1 Introduction

Many optimization algorithms can be viewed as discrete forms of Cauchy problems for a system of ordinary differential equations in the space of control parameters [1, 2]. We will see that, if one introduces extra information on the infimum, solving global optimization problems with these algorithms is equivalent to solving Boundary Value Problems (BVPs) for the same equations. A motivating idea is therefore to apply algorithms for solving BVPs to perform this global optimization. In this paper we present a reformulation of global minimization problems in terms of over-determined BVPs, discuss the existence of their solutions and present some algorithms for solving these problems. A further aim is to show the importance of global optimization algorithms for shape optimization. Indeed, because of the excessive cost of global optimization approaches, usually only local minimization algorithms are used for the shape optimization of distributed systems, especially with fluids [3, 4]. Our semi-deterministic algorithm permits the global optimization of systems governed by PDEs at reasonable cost. Section 2 presents our global optimization method and its mathematical background. In Section 3, this approach is applied to the optimization problems under consideration.
2 Optimization Method

We consider the following minimization problem:

  min_{x ∈ Ω_ad} J(x)    (1)
where J : Ω_ad → IR is called the cost function and the optimization parameter x belongs to a compact admissible space Ω_ad ⊂ IR^N, with N ∈ IN. We make the following assumptions [2]: J ∈ C²(Ω_ad, IR) and J is coercive. The infimum of J is denoted by J_m.

2.1 BVP Formulation of Optimization Problems

Many minimization algorithms which perform the minimization of J can be seen as discretizations of continuous first or second order dynamical systems with associated initial conditions [1]. A numerical global optimization of J with one of those algorithms, called here the core optimization method, is possible if the following BVP has a solution:

  first or second order initial value problem
  |J(x(Z)) − J_m| < ε    (2)

where x(Z) is the solution of the considered dynamical system at a given finite time Z ∈ IR and ε is the approximation precision. In practice, when J_m is unknown, we set J_m to a lower value (for example J_m = 0 for a non-negative function J) and look for the best solution for a given complexity and computational effort. This BVP is over-determined, as it includes more conditions than derivatives. This over-determination can be removed, for instance, by considering one of the initial conditions of the dynamical system as a new variable, denoted by v. Then we can use what is known from BVP theory, for example a shooting method [1], in order to determine a suitable v solving (2).

2.2 General Method for the Resolution of BVP (2)

In order to solve BVP (2), we consider the following general method. We consider the function h : Ω_ad → IR given by:

  h(v) = J(x(Z, v))    (3)

where x(Z, v) is the solution of the dynamical system considered in (2), starting from the initial condition v defined previously, at a given time Z ∈ IR. Solving BVP (2) is equivalent to minimizing the function (3) over Ω_ad. Depending on the selected optimization method, h is usually a discontinuous plateau function.
For example, if a steepest descent method is used as the core optimization method, the associated dynamical system reaches, in theory, the same local minimum when it starts from any point included in the same attraction basin. In other words, if Z is large enough, h(v) is piecewise constant, with values corresponding to the local minima of J(x(Z, v)). Furthermore, h(v) is discontinuous where the functional reaches a local maximum or has a plateau. In order to minimize such a function we propose a multi-layer algorithm based on line search methods [1]. We first consider the following algorithm A_1(v_1, v_2):

- (v_1, v_2) ∈ Ω_ad × Ω_ad given
- Find v ∈ argmin_{w ∈ O(v_2)} h(w), where O(v_2) = {v_1 + t(v_2 − v_1), t ∈ IR} ∩ Ω_ad, using a line search method
- Return v
The line search minimization in A_1 is defined by the user. It might fail: for instance, a secant method [1] degenerates on plateaus and at critical points. In this case, in order to have a multidimensional search, we add an external layer to the algorithm A_1 by minimizing h' : Ω_ad → IR defined by:

  h'(v') = h(A_1(v', w'))    (4)

with w' chosen randomly in Ω_ad. This leads to the following two-layer algorithm A_2(v_1, v_2):

- (v_1, v_2) ∈ Ω_ad × Ω_ad given
- Find v ∈ argmin_{w ∈ O(v_2)} h'(w), where O(v_2) = {v_1 + t(v_2 − v_1), t ∈ IR} ∩ Ω_ad, using a line search method
- Return v
The line search minimization in A_2 is defined by the user.

N.B. Here we have only described the two-layer algorithm structure, but this construction can be pursued by building recursively h_i(v_1^i) = h_{i-1}(A_{i-1}(v_1^i, v_2^i)), with h_1(v) = h(v) and h_2(v) = h'(v), where i = 1, 2, 3, ... denotes the external layer. In this paper, we call this general recursive algorithm the Semi-Deterministic Algorithm (SDA). For each class of method used as core optimization method, we describe the corresponding SDA implementation more precisely below.

2.3 1st Order Dynamical System Based Methods

We consider optimization methods that come from the discretization of the following dynamical system [1, 2]:

  M(ζ, x(ζ)) x_ζ(ζ) = −d(x(ζ))
  x(ζ = 0) = x_0    (5)

where ζ is a fictitious time, x_ζ = dx/dζ, M is an operator, and d : Ω_ad → IR^N is a function giving a suitable direction. For example:
• If d = ∇J, the gradient of J, and M(ζ, x(ζ)) = Id, the identity operator, we recover the classical steepest descent method.
• If d = ∇J and M(ζ, x(ζ)) = ∇²J(x(ζ)), the Hessian of J, we recover the Newton method.
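As a minimal illustration (not part of the original paper), a single explicit Euler step of system (5) makes the two cases above concrete; the step size dt and the function handles grad and hess are placeholders.

import numpy as np

def euler_step(x, grad, hess=None, dt=1.0):
    # One explicit Euler step of system (5): with M = Id this is a steepest
    # descent step; with M = the Hessian of J it is a (damped) Newton step.
    if hess is None:
        return x - dt * grad(x)
    return x - dt * np.linalg.solve(hess(x), grad(x))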
In this case, BVP (2) can be rewritten as:

  M(ζ, x(ζ)) x_ζ = −d(x(ζ))
  x(0) = x_0
  |J(x(Z)) − J_m| < ε    (6)
This BVP is over-determined by x_0; i.e., the choice of x_0 determines whether BVP (6) admits a solution or not. For instance, in the case of a steepest descent method, BVP (6) generally has a solution if x_0 is in the attraction basin of the global minimum. In order to determine such an x_0, we consider the implementation of the algorithms A_i with i = 1, 2, 3, ... (here we limit the presentation to i = 2).

The first layer A_1 is applied with a secant method in order to perform the line search. The output is denoted by A_1(v_1, J, ε), and the algorithm reads:

  Input: v_1, J, ε; v_2 chosen randomly
  For l from 1 to J
      o_l = D(v_l, ε); o_{l+1} = D(v_{l+1}, ε)
      If J(o_l) = J(o_{l+1}), exit the For loop
      If min{J(o_m), m = 1, ..., l} < ε, exit the For loop
      v_{l+2} = v_{l+1} − J(o_{l+1}) (v_{l+1} − v_l) / (J(o_{l+1}) − J(o_l))
  EndFor
  Output: A_1(v_1, J, ε) = argmin{J(o_m), m = 1, ..., l}

where v_1 ∈ Ω_ad, ε ∈ IR+ and J ∈ IN are respectively the initial condition, the stopping criterion and the iteration number, and D(v, ε) denotes the output of the core optimization method started from v with precision ε.

The second layer A_2 is also applied with a secant method in order to perform the line search. The output is denoted by A_2(w_1, K, J, ε), and the algorithm reads:

  Input: w_1, K, J, ε; w_2 chosen randomly
  For l from 1 to K
      p_l = A_1(w_l, J, ε); p_{l+1} = A_1(w_{l+1}, J, ε)
      If J(p_l) = J(p_{l+1}), exit the For loop
      If min{J(p_m), m = 1, ..., l} < ε, exit the For loop
      w_{l+2} = w_{l+1} − J(p_{l+1}) (w_{l+1} − w_l) / (J(p_{l+1}) − J(p_l))
  EndFor
  Output: A_2(w_1, K, J, ε) = argmin{J(p_m), m = 1, ..., l}
where w_1 ∈ Ω_ad, ε ∈ IR+ and (K, J) ∈ IN² are respectively the initial condition, the stopping criterion and the iteration numbers.

2.4 2nd Order Dynamical System Based Methods

In order to keep an exploratory character during the optimization process, allowing us to escape from attraction basins, we can use variants of the previous methods obtained by adding second order derivatives. For instance, we can reformulate BVP (2) considering methods coming from the discretization of the following 'heavy ball' dynamical system [2]:

  η x_ζζ(ζ) + M(ζ, x(ζ)) x_ζ(ζ) = −d(x(ζ))
  x(0) = x_0,  x_ζ(0) = x_{ζ,0}
  |J(x(Z)) − J_m| < ε    (7)

with η ∈ IR. System (7) can be solved by considering x_0 (as previously) or x_{ζ,0} as the new variable. In the first case the existence of a solution for BVP (7) is trivial. In the second case, under particular hypotheses of interest in numerical analysis, it can be proved that when x_0 is fixed there exists an x_{ζ,0} such that BVP (7) admits numerical solutions:

Theorem 1. Let J : IR^n → IR be a C² function such that min_{IR^n} J exists and is reached at x_m ∈ IR^n. Then for every (x_0, δ) ∈ IR^n × IR+ there exists (σ, t) ∈ IR^n × IR+ such that the solution of the following dynamical system:

  η x_ζζ(ζ) + x_ζ(ζ) = −∇J(x(ζ))
  x(0) = x_0
  x_ζ(0) = σ    (8)

with η ∈ IR, passes at time ζ = t into the ball B_δ(x_m).

Proof: We assume x_0 ≠ x_m. Let ε > 0; we consider the dynamical system:

  η y_ττ(τ) + ε y_τ(τ) = −ε² ∇J(y(τ))
  y(0) = x_0
  y_τ(0) = (x_m − x_0)    (9)

with ε in IR+\{0}.
• Assume that ε = 0. We obtain the following system:

  η y_ττ,0(τ) = 0
  y_0(0) = x_0
  y_τ,0(0) = (x_m − x_0)    (10)

System (10) describes a straight line with origin x_0, passing at a time θ' through the point x_m, i.e. y_0(θ') = x_m.
• Assume that ε ≠ 0. System (9) can be rewritten as:

  ( y(τ), η y_τ(τ) )_τ = ( y_τ(τ), −ε y_τ(τ) − ε² ∇J(y(τ)) )
  y(0) = x_0
  y_τ(0) = (x_m − x_0)    (11)
System (11) is of the form y_τ = f(τ, y, ε), with f satisfying the Cauchy-Lipschitz conditions. Applying the Cauchy-Lipschitz theorem [5], |y_ε(θ') − y_0(θ')| → 0 as ε → 0, uniformly. Thus for every δ ∈ IR+\{0} there exists ε_δ such that for every ε ≤ ε_δ:

  |y_ε(θ') − x_m| < δ    (T.1)

Let δ ∈ IR+\{0}. We consider the change of variables ζ = ε_δ τ and x(ζ) = y_{ε_δ}(ζ/ε_δ). System (9) becomes:

  η x_ζζ(ζ) + x_ζ(ζ) = −∇J(x(ζ))
  x(0) = x_0
  ẋ(0) = (x_m − x_0)/ε_δ    (12)

Let ϑ = ε_δ θ'. Under this change of variables, x(ϑ) = y_{ε_δ}(θ'). Thus, due to (T.1), |x(ϑ) − x_m| < δ. We have found σ = (x_m − x_0)/ε_δ ∈ IR^n and t = ϑ ∈ IR+ such that the solution of system (8) passes at time t into the ball B_δ(x_m). ◦

In order to determine a suitable x_0 or x_{ζ,0} solving BVP (7), we can consider, for instance, the same algorithms A_1 and A_2 introduced in Section 2.3.

2.5 Other Hybridizations with SDA

In practice, any user-defined, black-box or commercial minimization package starting from an initial condition can be used to build the core optimization sequences in the SDA presented in Section 2.2. In that sense, the algorithm permits the user to exploit his knowledge of the optimization problem at hand and to improve on it. In the same way, preconditioning can be introduced at any layer, and in particular at the lowest one.
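To make the layered construction of Sections 2.2-2.4 concrete, the following minimal Python sketch is included here; it is not the authors' implementation. It assumes a steepest descent core method D, a secant line search in each layer as in A_1 and A_2, and an illustrative test function; iteration counts and step sizes are arbitrary choices.

import numpy as np

def core_descent(grad, v, steps=200, dt=0.05):
    # Core optimization method D: steepest descent (explicit Euler on system (5)).
    x = np.asarray(v, dtype=float)
    for _ in range(steps):
        x = x - dt * grad(x)
    return x

def secant_layer(h, v1, v2, n_iter=5, eps=1e-8):
    # One SDA layer: secant-type search on the plateau function h, iterating
    # v_{l+2} = v_{l+1} - h(v_{l+1}) (v_{l+1} - v_l) / (h(v_{l+1}) - h(v_l)).
    v_prev, v_curr = np.asarray(v1, float), np.asarray(v2, float)
    h_prev, h_curr = h(v_prev), h(v_curr)
    best_v, best_h = (v_prev, h_prev) if h_prev <= h_curr else (v_curr, h_curr)
    for _ in range(n_iter):
        if h_curr == h_prev or best_h < eps:
            break  # secant step undefined on a plateau / target precision reached
        v_next = v_curr - h_curr * (v_curr - v_prev) / (h_curr - h_prev)
        v_prev, h_prev = v_curr, h_curr
        v_curr = v_next
        h_curr = h(v_curr)
        if h_curr < best_h:
            best_v, best_h = v_curr, h_curr
    return best_v

def sda_two_layer(J, grad, w1, w2, v2):
    h = lambda v: J(core_descent(grad, v))      # h(v) = J(x(Z, v)), cf. (3)
    hp = lambda w: h(secant_layer(h, w, v2))    # h'(w) = h(A_1(w, v2)), cf. (4)
    w_best = secant_layer(hp, w1, w2)           # outer secant layer (A_2)
    v_best = secant_layer(h, w_best, v2)        # inner layer from the best start
    return core_descent(grad, v_best)           # best point found

# Illustrative multi-modal test function:
J = lambda x: float(np.sum(x**2 + 2.0 * (1.0 - np.cos(3.0 * x))))
grad = lambda x: 2.0 * x + 6.0 * np.sin(3.0 * x)
x_opt = sda_two_layer(J, grad, np.array([2.5, -3.0]), np.array([-1.0, 2.0]), np.array([0.5, 0.5]))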
3 Application to Shape Optimization of Coastal Structures To our knowledge, despite the fact that beach protection becomes a major problem, shape optimization techniques have never been used in coastal
engineering. Groins, breakwaters and many other structures are used to attenuate water waves or to control sediment flows, but their shapes are usually determined using simple hydrodynamical assumptions, structural strength laws and empirical considerations. In this section, we present two examples of shape optimization in coastal engineering. First, we solve a problem concerning the minimization of the water waves scattered by reflective vertical structures in deep water. Secondly, we study the protection of the lido of Sète (NW Mediterranean sea, France) by optimizing geotextile tubes with a wave refraction-diffraction model. All of these works are part of the COPTER research project (2006-2009) NT05 - 2-42253 funded by the French National Research Agency. Both optimization problems are solved using the two-layer algorithm A_2 introduced in Section 2.3, with a steepest descent algorithm [1] as core optimization method. The iteration number of each steepest descent run is set to 100, and the layer iteration numbers are set to 5 (i.e. K = J = 5).

3.1 Minimization of Water Waves Impact in Deep Water

In deep water, the main scattering phenomenon is the reflection, so we compute the solution of a boundary value problem describing the water waves ξ_r scattered by a fully reflective vertical structure in deep water and modify its shape accordingly, in order to minimize a pre-defined cost function taking into account the strength (energy) of the water waves [6, 7]. The optimization procedure relies on the global semi-deterministic algorithm detailed in the preceding section, able to pursue beyond local minima. For the control space, we consider a free individual parameterization of the structure which allows different original and non-intuitive shapes. Practically, a generic structure is a tree, described by its trunk, represented by a set of connected principal edges, and by a number of secondary branches leaving from each node. This parameterization gives a large freedom in the considered shapes. The cost function to minimize is the L² energy norm of the water waves free surface in an admissible domain representing a ten meter wide coastal strip located between two successive structures. We present here an optimized shape for the structures in the case of a north-western incident wave. Optimization reveals an original and non-intuitive optimized shape, represented in Fig. 1-(Left). It is a monotonous structure which provides superior results for the control of the free surface along the coastline. To highlight the effectiveness of this structure, we compare it with a traditional structure (rectangular and perpendicular to the coastline) (see Fig. 2). The cost function decreases by more than 93% compared to rectangular structures perpendicular to the coastline, as can be seen in the cost function convergence during the optimization process (Fig. 1-(Right)).
Figure 1. (Left) Initial (dashed line) and optimized (solid line) structures. (Right) Cost function evolution during the optimization process (history of convergence): value of J versus the number of cost function evaluations
Figure 2. Free surface elevation ξ resulting from a reflection (a) on rectangular structures perpendicular to the coastline, (b) on optimized structures with no feasibility constraints
3.2 Minimization of the Sediment Mobilization Energy in a Coastal Zone

The objective is to prevent the erosion phenomenon in the region of the lido of Sète (NW Mediterranean sea, France) by minimizing the sediment mobilization energy, for a set of given periods, directions and heights of the water waves, with the help of a refraction-diffraction model [8, 9]. In short, the incident water waves can be divided into two categories, the destructive water waves and the advantageous water waves. The cost function considered aims to reduce the energy of the destructive water waves and, in addition, to be transparent to the advantageous water waves.
The solution proposed is the use of geotextile tubes as wave-attenuating devices. Initially, a preliminary draft proposed to put geotextile tubes, with a height of 3 m, at a distance of 550 m from the coast. We could optimize the distance, the shape, the angle with the coast and the height of the geotextile tubes, but in this paper, in order to show that global optimization is of great interest in coastal engineering, we fix all the dimensioning quantities and optimize only the distance from the coast. First, we sample the offshore distance between 150 m and 750 m and compute the value of J for a geotube placed at each sampled distance. The results are shown in Fig. 3 (Left).
Figure 3. (Left) Cost function value J with respect to the position of the geotube (distance from the coastline, in m). The admissible domain for the geotubes is 350–800 m. (Right) Cost function evolution during the optimization (value of J versus the number of evaluations of J). We see the importance of using global minimization
We see clearly that the minimum is obtained for a geotube placed at an offshore distance of 350 m. What we would like to stress, however, is that the cost function is clearly non-convex, which suggests that the use of a global optimization algorithm is necessary in order to prevent the optimization process from being caught in a local attraction basin. We therefore apply the global algorithm described in the previous section with a starting point corresponding to the initial position (550 m). The optimization process recovers the optimal case found by the sampling; more precisely, we obtain an optimal position of 353 m. The cost function evolution is shown in Fig. 3 (Right).
4 Conclusions

A new class of semi-deterministic methods has been introduced. This approach allows us to improve both deterministic and non-deterministic optimization algorithms. Various algorithms of this class have been validated on benchmark functions; the results obtained outperform those given by a classical genetic algorithm in terms of computational complexity and precision. One of them has been applied with success to the design of coastal structures. These algorithms have also been applied to various other industrial optimization problems: multichannel optical filter design, shape optimization of a fast microfluidic protein folding device, and flame temperature and pollutant control [10].
References

1. B. Mohammadi and J.-H. Saiac. Pratique de la simulation numérique. Dunod, 2002.
2. H. Attouch and R. Cominetti. A dynamical approach to convex minimization coupling approximation with the steepest descent method. J. Differential Equations, 128(2):519–540, 1996.
3. B. Mohammadi and O. Pironneau. Applied Shape Optimization for Fluids. Oxford University Press, 2001.
4. A. Jameson, F. Austin, M. J. Rossi, W. Van Nostrand, and G. Knowles. Static shape control for adaptive wings. AIAA Journal, 32(9):1895–1901, 1994.
5. F. Verhulst. Nonlinear Differential Equations and Dynamical Systems. Springer-Verlag, 1990.
6. D. Colton and R. Kress. Inverse Acoustic and Electromagnetic Scattering Theory. Springer-Verlag, 1992.
7. D. Isebe, P. Azerad, B. Ivorra, B. Mohammadi, and F. Bouchette. Optimal shape design of coastal structures minimizing coastal erosion. In Proceedings of the Workshop on Inverse Problems, CIRM, Marseille, 2005.
8. J. T. Kirby and R. A. Dalrymple. A parabolic equation for the combined refraction-diffraction of Stokes waves by mildly varying topography. J. Fluid Mech., 136:443–466, 1983.
9. J. T. Kirby and R. A. Dalrymple. Combined refraction/diffraction model REF/DIF 1, user's manual. Coastal and Offshore Engineering and Research, Inc., Newark, DE, January 1985 (revised June 1986).
10. B. Ivorra, B. Mohammadi, D. E. Hertzog, and J. G. Santiago. Semi-deterministic and genetic algorithms for global optimization of microfluidic protein folding devices. International Journal for Numerical Methods in Engineering, 66:319–333, 2006.
EDF Scheduling Algorithm for Periodic Messages On Switched Ethernet

Myung-Kyun Kim and Dao Manh Cuong

School of Computer Engineering and Information Communication, University of Ulsan, Nam-Gu, 680749 Ulsan, Republic of Korea
[email protected] Abstract The switched Ethernet offers many attractive features for real-time communications such as traffic isolation, providing large bandwidth, and full-duplex links, but the real-time features may be affected due to the collisions on an output port. This paper analyzes the schedulability condition for real-time periodic messages on a switched Ethernet where all nodes operate in a synchronized mode. This paper also proposes a EDF (Earliest Deadline First)-based scheduling algorithm to support the real-time features of the periodic traffic over switched Ethernet without any change in the principles of switched Ethernet. The proposed algorithm allows dynamic addition of new periodic messages during system running, which gives more flexibility to designing the real-time systems.
1 Introduction

Switched Ethernet is the most widely used technology in data communications. It has many attractive features for real-time communications, such as traffic isolation, large bandwidth, and full-duplex links [1]. However, switches have some problems that have to be solved to support real-time features for industrial communications. The main disadvantage of switched Ethernet is the collision of messages at the output port. If two or more messages are destined to the same output port at the same time, this causes variable message delay in the output buffer or message loss in the case of buffer overflow, which affects the timely delivery of real-time messages. This paper analyzes the schedulability condition for real-time periodic messages on a switched Ethernet where the switch does not require any modification and only the end nodes handle messages according to the EDF policy. This paper also proposes an EDF-based scheduling algorithm to support the real-time features of the periodic traffic over the switched Ethernet. The transmission of the periodic messages is handled by a master node to guarantee the real-time communication. The master node determines which messages can be transmitted without violating the real-time requirements of industrial communication
by checking the schedulability condition, and makes a feasible schedule on demand. The rest of the paper is organized as follows. In Section 2, the previous works on industrial switched Ethernet and the message transmission model on the switched Ethernet are described. Section 3 describes the schedulability condition for real-time periodic messages on the switched Ethernet, and an EDF-based scheduling algorithm for the real-time messages on the switched Ethernet is described in Section 4. Section 5 concludes and summarizes the paper.
2 Backgrounds

2.1 Related Works

Over the last few years, a large amount of work has been done to analyze the applicability of switched Ethernet to industrial communication. The first idea was using Network Calculus theory for evaluating the real-time performance of switched Ethernet networks. Network Calculus was introduced by Cruz [2, 3] and describes a theory for obtaining delay bounds and buffer requirements. Georges et al. [4] paid attention to the architectures of switched Ethernet networks and presented a method to design them which aimed to minimize end-to-end delays by using Network Calculus theory. This method may be effective when designing static applications to guarantee the real-time performance of switched Ethernet networks. Other work has been done by Loser and Hartig [5], who used a traffic shaper (smoother) to regulate the traffic entering the switched Ethernet and to bound the end-to-end delay based on the Network Calculus theory refined by Le Boudec and Thiran [6]. In this method, the traffic pattern must satisfy burstiness constraints to guarantee that the delay bounds are met. Another way to support real-time communication is modifying the original switches to have extra functionality that provides a more efficient scheduling policy and admission control. Hoang et al. [7] attempted to support real-time communication over switched Ethernet by adding a real-time layer in both the end nodes and the switch. Instead of using FIFO queuing, packets are queued in the order of deadline in the switch. A source node, before sending a real-time periodic message to the destination node, has to establish a real-time channel that is controlled by the switch for guaranteeing the deadline. The main contribution of this paper is proposing a schedulability condition for EDF scheduling of periodic messages and an EDF-based message scheduling algorithm to support hard real-time communication over switched Ethernet without any modification of the original switches, so they can be directly applied to industrial communications.
2.2 Message Transmission Model

We assume that the switched Ethernet operates in full-duplex mode, where each node is connected to the switch through a pair of links: a transmission link and a reception link, as shown in Fig. 1-(a). Both the transmission and reception links operate independently, and the switches use store-and-forward or cut-through switching modes for forwarding frames from the input ports to the output ports. In cut-through switching, which is widely used nowadays for fast delivery of frames, the switch decides the output port right after receiving the header of a frame and forwards the frame to the output port. When a switch uses the cut-through switching mode, if a message from node i to node j begins to be transmitted at time t0, the first bit of the message arrives at node j after a delay of TL = 2*tp + tsw from t0, if there is no collision at the output port and the output queue is empty. This is shown in Fig. 1-(b). In the figure, tp is the propagation delay on a link between a node and the switch, and tsw is the switching latency (destination port look-up and switch fabric set-up time), which depends on the switch vendor. Normally, tsw is about 11 µs in a 100 Mbps switch.
Figure 1. (a) Switched Ethernet and (b) message transmission on a switch
All the transmission links and reception links are slotted into a sequence of fundamental time units, called Elementary Cycles (ECs), similar to that of FTT-CAN [9]. Message transmission on the switched Ethernet is triggered by a master node which sends a TM (Trigger Message) at the beginning of every EC. An EC consists of a TM, an SMP (Synchronous Message Period) and an AMP (Asynchronous Message Period), as shown in Fig. 2-(a). In this paper, however, we only consider the scheduling of periodic messages, so for the simplicity of the analysis we assume there is no AMP in each EC, as shown in Fig. 2-(b). When using the EC of Fig. 2-(b), we have to increase the length of each periodic message by multiplying by E/Eo for the correct analysis, because the length of the SMP is increased by the same factor. The TM is used to synchronize all nodes and contains a schedule for the periodic messages that are transmitted in the respective EC. The SMP is the time duration for transmitting the real-time periodic messages in this EC, and the AMP is the time duration for transmitting aperiodic messages. In Fig. 2, LTM is the
length of the trigger message and E′ = E − TL − LTM is the available time for transmitting messages in an EC.
Figure 2. Message transmission model
The operation of the real-time communication network can be described as follows. Firstly, all the slave nodes send their real-time requirements to the master node; these are characterized by SM_ij(D_ij, P_ij, O_ij, C_ij^o), where SM_ij is the synchronous message from node i to node j and D_ij, P_ij, O_ij and C_ij^o are the deadline, period, initial offset and amount of data of SM_ij, respectively. As mentioned before, the amount of data of SM_ij has to be modified for the correct analysis as follows: C_ij = C_ij^o * E/Eo. In addition, we assume that all D_ij, P_ij and O_ij are integer multiples of E and that P_ij = D_ij. After receiving all the request frames from the slaves, the master node checks the feasibility of the synchronous messages and broadcasts the result to all the slaves to indicate which messages can be transmitted over the switched Ethernet and meet their deadlines. The real-time messages are sorted in increasing order of their deadlines when they arrive at the slave nodes. Then, at the beginning of every EC, the master node broadcasts the trigger message with scheduling information that announces the set of messages that can be transmitted in the respective EC.
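A compact way to hold these per-message requirements, together with the E/Eo scaling, is sketched below; the class and field names are illustrative and not part of the paper.

from dataclasses import dataclass

@dataclass
class PeriodicMessage:
    src: int      # sending node i
    dst: int      # receiving node j
    D: int        # deadline (multiple of the EC length E)
    P: int        # period (multiple of E; P = D assumed)
    O: int        # initial offset (multiple of E)
    C0: float     # original amount of data (transmission time) per period

def scaled_length(msg: PeriodicMessage, E: float, E_o: float) -> float:
    # C_ij = C_ij^o * E / Eo, compensating for the enlarged SMP in the simplified EC model.
    return msg.C0 * E / E_o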
3 Schedulability Condition on Switched Ethernet

By applying the EDF scheduling condition for preemptive tasks proposed by Liu and Layland [8], Pedreiras et al. [9] showed that a set of periodic messages is schedulable if

  U = Σ_i C_i/P_i ≤ (E′ − max(C_i)) / E′    (1)

where E′ is the available time to transmit periodic messages on the shared medium. For real-time communication over switched Ethernet, the transmission of periodic messages must be considered both on the transmission links and on the reception links. In our proposed scheduling algorithm, we consider the periodic
messages in the order of their deadlines. UT_i and UR_j denote the utilizations of TL_i and RL_j, defined as

  UT_i = Σ_{j ∈ ST_i} C_ij/P_ij,    UR_j = Σ_{i ∈ SR_j} C_ij/P_ij    (2)
where ST_i is the set of nodes to which node i sends messages and SR_j is the set of nodes from which node j receives messages. UT_max,j is the maximum utilization over the transmission links that transmit messages to node j, that is,

  UT_max,j = max_{i ∈ SR_j} (UT_i)    (3)

The main idea of our scheduling algorithm is that the periodic messages which are transmitted in each EC on the transmission links must be delivered completely within the respective EC on the reception links, which means that they meet their deadlines. According to the schedulability condition (1), for the periodic messages on a transmission link i we can make a schedule that satisfies the following condition:

  T_{i,n} ≤ UT_i · E′ + max(C_ij)    (4)

where T_{i,n} is the total time for transmitting messages on TL_i in the nth EC. We call T_max,i = UT_i · E′ + max(C_ij) the maximum total time to transmit messages on TL_i in every EC.

For hard real-time communication, all the periodic messages have to meet their deadlines in the worst case. In our message transmission model, the worst-case situation occurs on RL_j when, cumulatively, i) all the periodic messages arrive at a reception link at the same time, and ii) the arrival time of the first message in an EC on RL_j is latest, which means there is the shortest time left for transmitting messages in the current EC.
Figure 3. Worst case situation on RLj
Fig. 3 shows the worst-case situation on RLj when the messages are transmitted over switched Ethernet. The figure also shows that by this scheduling
algorithm, the temporal constraints of the periodic messages can be satisfied if we bound the total time to transmit messages on a reception link by R_max,j; they are then not affected by the FIFO queues of the original switches.

Now we can express the worst-case situation on RL_j: it occurs when i) the utilizations of all transmission links that transmit messages to RL_j are equal,

  UT_i = UT_max,j   ∀ i ∈ SR_j    (5)

which leads to

  T_max,i = UT_max,j · E′ + max(C_ij)   ∀ i ∈ SR_j    (6)

and ii) all the messages arrive at RL_j in the nth EC at the latest time. As shown in Fig. 3, the latest finishing time of the periodic messages on RL_j is T_max,i (∀ i ∈ SR_j). So the latest arrival time of messages on RL_j is

  AT_max,j = T_max,i − min(C_ij) = UT_max,j · E′ + max(C_ij) − min(C_ij)    (7)

which is attained when the size of all messages SM_ij (∀ i ∈ SR_j) is smallest. If this worst-case situation happens, the available time to transmit messages on RL_j is

  R_max,j = E′ − AT_max,j = E′ − UT_max,j · E′ − max(C_ij) + min(C_ij).    (8)
Because our proposed scheduling algorithm considers the messages in the order of their deadlines, we can apply condition (1) to analyze the schedulability of messages on the reception links. Thus, a set of periodic messages on RL_j is schedulable if

  UR_j ≤ (R_max,j − max(C_ij)) / E′    (9)

If we replace R_max,j using (8), we have

  UR_j ≤ (E′ − UT_max,j · E′ − max(C_ij) + min(C_ij) − max(C_ij)) / E′    (10)

which leads to

  UR_j + UT_max,j ≤ (E′ − 2·max(C_ij) + min(C_ij)) / E′    (11)

Finally, we have the schedulability condition as follows. A given set of messages SM_ij from node i to node j is schedulable if

  UT_i + UR_j ≤ (E′ − 2·max(C_ij) + min(C_ij)) / E′   ∀ i, j    (12)
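Condition (12) can be checked mechanically once the per-link utilizations are accumulated. The sketch below is an illustration rather than the authors' code; it takes max(C_ij) and min(C_ij) over all periodic messages, which is one conservative reading of (12), and assumes message objects with src, dst, C and P fields and the available time E′ per EC as inputs.

from collections import defaultdict

def edf_schedulable(messages, E_avail):
    # messages: iterable of objects with fields src, dst, C (length) and P (period).
    UT, UR = defaultdict(float), defaultdict(float)
    Cmax, Cmin = 0.0, float("inf")
    for m in messages:
        UT[m.src] += m.C / m.P
        UR[m.dst] += m.C / m.P
        Cmax, Cmin = max(Cmax, m.C), min(Cmin, m.C)
    bound = (E_avail - 2.0 * Cmax + Cmin) / E_avail   # right-hand side of (12)
    return all(UT[m.src] + UR[m.dst] <= bound for m in messages)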
Figure 4. Processing flow of master node
4 Message Scheduling Algorithm on Switched Ethernet

The master node receives a new periodic message request from the slave nodes in the AMP, checks the schedulability condition, and makes a transmission schedule for the message if it is schedulable. At the beginning of each EC, the master transmits a TM which includes the number of messages that each node can transmit in the EC. The processing flow of the master node is shown in Fig. 4. The operation of the master is triggered by three events, EC_Trigger, SM_Trigger, and AM_Trigger, which are enabled at the start of the EC, SMP, and AMP, respectively. A new periodic message request PM_req includes the real-time requirements of the periodic message. In response to the request, the master replies with PM_rep(accept) or PM_rep(reject) according to the schedulability check result. The master node can also play the role of a slave node; in that case, the master node performs the operations shown in Fig. 5 during the SMP and AMP, respectively.

Each slave node maintains a message buffer which contains the periodic message instances allowed to be transmitted according to their deadlines. When a slave node receives a TM, it interprets the message and transmits the number of messages specified in the TM. When there is a new periodic message to transmit, each slave node sends a new message request to the master in the AMP. The processing flow of a slave node is shown in Fig. 5.
Figure 5. Processing flow of slave node
next EC. Finally, the master node considers the initial offset Oij of message SMij at line 15 when establishing the list of ready messages. // Algorithm1: EDF-based Scheduling Algorithm // N Ti,n : the number of messages that node i can transmit at nth EC 1. for (k = 1;k ≤ N ;k + +){rk,1 = 0;} 2. for(n = 0;n ≤ LCM (Pij );n + +){ 3. Ti,n = 0, N Ti,n = 0 for all i; Rj,n = 0 for all j; 4. {sort messages in increasing order of deadline}; 5. for (k = 1;k ≤ N ;k + +){ 6. rk,n+1 = rk,n ; 7. if (rk,n = 1) { 8. read SMij ; 9. if ((Ti,n + Cij ≤ Tmax,i ) and (Rj,n + Cij ) ≤ Rmax,j )) { 10. Ti,n = Ti,n + Cij ; 11. Rj,n = Rj,n + Cij ; 12. N Ti,n + +; rk,n+1 = 0; 13. } 14. } 15. if ((n-1) mod Pij /E = Oij ) rk,n+1 = 1; 16. } 17. }
Now we prove that when a new message SM_ij satisfies condition (12) and is scheduled by this scheduling algorithm, the message can be transmitted completely within the same EC on both TL_i and RL_j, which means that it meets its
deadline. The latest finishing time of SM_ij on TL_i is T_max,i, so the latest arrival time of SM_ij on RL_j is

  AT_max,ij = T_max,i − min(C_ij) = UT_i · E′ + max(C_ij) − min(C_ij)    (13)

attained when SM_ij has the smallest size. After arriving at RL_j, in the worst-case situation, SM_ij may have to wait for other messages in the switch buffer before being transmitted. But the total delay of this message on RL_j, according to the scheduling algorithm, is bounded by

  R_max,j = E′ − UT_max,j · E′ − max(C_ij) + min(C_ij)    (14)

so the finishing time F_{ij,n} of SM_ij in the nth (current) EC satisfies

  F_{ij,n} ≤ AT_max,ij + R_max,j    (15)

which leads to

  F_{ij,n} ≤ E′ − (UT_max,j − UT_i) · E′    (16)

Because UT_i ≤ UT_max,j,

  F_{ij,n} ≤ E′    (17)

so SM_ij is transmitted in the same EC on both TL_i and RL_j, which means it meets its deadline.
5 Conclusions and Future Work

Real-time distributed control systems are increasingly widely used in industrial applications such as process control, factory automation, and vehicles. In those applications, each task must be executed within a specified deadline, and the communications between the tasks also have to be completed within their deadlines to satisfy the real-time requirements of the tasks. Switched Ethernet, which is the most widely used network technology in office environments, has good operational features for real-time communications. The switched Ethernet, however, needs some mechanisms to regulate the traffic on the network in order to satisfy the hard real-time communication requirements of industrial applications. In this paper, an EDF-based scheduling algorithm for hard real-time communication over switched Ethernet was proposed. With this scheduling algorithm, there is no need to modify the original principles of switches to support hard real-time communication in an industrial environment. This paper also analyzed the schedulability condition for real-time periodic messages and showed that the proposed scheduling algorithm correctly reflects the feasibility condition of the periodic messages on the switched Ethernet. Under our assumption that changes in the synchronous requirements are carried by aperiodic messages, we will analyze the real-time features of aperiodic messages as well as the level of flexibility of the scheduling algorithm in future work.
Acknowledgment

The authors would like to thank the Ministry of Commerce, Industry and Energy and Ulsan Metropolitan City, which partly supported this research through the Network-based Automation Research Center (NARC) at the University of Ulsan.
References

1. K. Lee and S. Lee, Performance evaluation of switched Ethernet for real-time industrial communications, Comput. Stand. Interfaces, vol. 24, no. 5, pp. 411–423, Nov. 2002.
2. R. L. Cruz, A calculus for network delay, Part I: Network elements in isolation, IEEE Trans. Inform. Theory, vol. 37, no. 1, pp. 114–131, Jan. 1991.
3. R. L. Cruz, A calculus for network delay, Part II: Network analysis, IEEE Trans. Inform. Theory, vol. 37, no. 1, pp. 132–141, Jan. 1991.
4. J.-P. Georges, N. Krommenacker, T. Divoux, and E. Rondeau, A design process of switched Ethernet architectures according to real-time application constraints, Eng. Appl. of Artificial Intelligence, vol. 19, no. 3, pp. 335–344, April 2006.
5. J. Loser and H. Hartig, Low-latency hard real-time communication over switched Ethernet, in Proc. 16th Euromicro Conf. Real-Time Systems (ECRTS 2004), pp. 13–22, July 2004.
6. J. Y. Le Boudec and P. Thiran, Network Calculus. Berlin, Germany: Springer Verlag, LNCS, vol. 2050, July 2001.
7. H. Hoang, M. Jonsson, U. Hagstrom, and A. Kallerdahl, Real-time switched Ethernet with earliest deadline first scheduling protocols and traffic handling, in Proc. 10th Int. Workshop on Parallel and Distributed Real-Time Systems, FL, Apr. 2002.
8. C. L. Liu and J. W. Layland, Scheduling algorithms for multiprogramming in a hard real-time environment, J. ACM, vol. 20, no. 1, pp. 46–61, 1973.
9. L. Almeida, P. Pedreiras, and J. A. Fonseca, The FTT-CAN protocol: Why and how, IEEE Trans. Ind. Electron., vol. 49, no. 6, pp. 1189–1201, Dec. 2002.
Large-Scale Nonlinear Programming for Multi-scenario Optimization

Carl D. Laird∗ and Lorenz T. Biegler

Chemical Engineering Department, Carnegie Mellon University, Pittsburgh, PA 15213
[email protected]

∗ Carl D. Laird ([email protected]) is currently with the Department of Chemical Engineering, Texas A&M University, College Station, TX 77843-3122.
Abstract Multi-scenario optimization is a convenient way to formulate design optimization problems that are tolerant to disturbances and model uncertainties and/or need to operate under a variety of different conditions. Moreover, this problem class is often an essential tool to deal with semi-infinite problems. Here we adapt the IPOPT barrier nonlinear programming algorithm to provide efficient parallel solution of multi-scenario problems. The recently developed object oriented software, IPOPT 3.1, has been specifically designed to allow specialized linear algebra in order to exploit problem specific structure. Here, we discuss the high level design principles of IPOPT 3.1 and develop a parallel Schur complement decomposition approach for large-scale multi-scenario optimization problems. A large-scale example for contaminant source inversion in municipal water distribution systems is used to demonstrate the effectiveness of this approach, and parallel results with up to 32 processors are shown for an optimization problem with over a million variables.
1 Introduction

This study deals with the development of specialized nonlinear programming algorithms for large, structured systems. Such problems often arise in the optimal design, planning and control of systems described by nonlinear models, and where the model structure is repeated through the discretization of time, space or uncertainty distributions. We focus on the important application of optimal design with unknown information. Here the goal is to incorporate the effects of variable process inputs and uncertain parameters at the design stage. For general nonlinear models, many approaches rely on the solution of multi-scenario problems. To develop the multi-scenario formulation, we consider optimal design problems with two types of unknown information [13, 15]. First, we consider uncertainty, i.e., what is not known well. This includes model parameters (kinetic and transport coefficients, etc.) as well as unmeasured
and unobservable disturbances (e.g., ambient conditions). Second, we could also incorporate process variability (process parameters that are subject to change, but measurable at run time) which can be compensated by operating variables. Examples of variability include feed flow rates, changing process conditions, product demands and measured process disturbances. While nonlinear multi-scenario optimization formulations can be solved directly with general purpose NLP solvers, the problem size can easily become intractable with these off-the-shelf tools. Traditionally, large-scale structured optimization problems like this one have been handled by specialized problem level decomposition algorithms. In contrast, this study develops the concept of internal linear decomposition for a particular NLP algorithm, IPOPT. The dominant computational expense in this algorithm is the solution of a large linear system at each iteration. With the internal linear decomposition approach, the fundamental interior point algorithm is not altered, but the mathematical operations performed by the algorithm are made aware of the problem structure. Therefore, we can develop decomposition approaches for these mathematical operations that exploit the induced problem structure, preserving the desirable convergence properties of the overall NLP algorithm. Similar concepts have also been advanced by Gondzio and coworkers [9, 10], primarily for linear, quadratic, and convex programming problems. In this work, we exploit the structure of large multi-scenario problems with a parallel Schur complement decomposition strategy and efficiently solve the problems on a distributed cluster. In the next section we provide a general statement of the optimal design problem with uncertainty and variability, along with the derivation of multi-scenario formulations to deal with this task. Section 3 then reviews Newton-based barrier methods and discusses their adaptation to multiscenario problems. Section 4 presents the high level design of the IPOPT 3.1 software package and describes how the design enables development of internal linear decomposition approaches without changes to the fundamental algorithm code. The parallel Schur complement decomposition is implemented within this framework. This approach is demonstrated in Section 5 for a source inversion problem in municipal water networks with uncertain demands. Scaling results are shown for a parallel implementation of the multiscenario algorithm with up to 32 processors. Section 6 then concludes the paper and discusses areas for future work.
2 Optimization Formulations

Optimal design with unknown information can be posed as a stochastic optimization problem that includes varying process parameters and uncertain model parameters. For generality, we write the optimization problem with unknown input parameters as:
  min_{d,z,y} E_{θ∈Θ} [ P(d, z, y, θ^v, θ^u)  s.t.  h(d, z, y, θ^v, θ^u) = 0,  g(d, z, y, θ^v, θ^u) ≤ 0 ]    (1)
where E_{θ∈Θ} is the expected value operator and Θ = Θ^v ∪ Θ^u is the space of unknown parameters, with Θ^v containing the "variability" parameters that are measurable at run time and Θ^u containing the "uncertainty" parameters. In addition, d ∈ R^{n_d} are the design variables, z ∈ R^{n_z} are the control variables, y ∈ R^{n_y} are the state variables, and the models are represented by the inequality and equality constraints, g : R^{n_d+n_z+n_y} → R^m and h : R^{n_d+n_z+n_y} → R^{n_y}, respectively.

To develop the multi-scenario formulation, we select discrete points from Θ. We define the index set K for discrete values of the varying process parameters θ_k^v ∈ Θ^v, k ∈ K, and the index set I for discrete values of the uncertain model parameters, θ_i^u ∈ Θ^u, i ∈ I. Both sets of points can be chosen by performing a quadrature for the expectation operator in (1). We assume that the control variables z can be used to compensate for the measured process variability, θ^v, but not for the uncertainty associated with the model parameters, θ^u. Thus, the control variables are indexed over k in the multi-scenario design problem, while the state variables, y, determined by the equality constraints, are indexed over i and k. With these assumptions, the multi-scenario design problem is given by:

  min_{d,z,y} P = f_0(d) + Σ_{k∈K} Σ_{i∈I} ω_ik f_ik(d, z_k, y_ik, θ_k^v, θ_i^u)
  s.t.  h_ik(d, z_k, y_ik, θ_k^v, θ_i^u) = 0,  g_ik(d, z_k, y_ik, θ_k^v, θ_i^u) ≤ 0,   k ∈ K, i ∈ I    (2)
While d from the solution vector of (2) may provide a reasonable approximation to the solution of (1), additional tests are usually required to ensure constraint feasibility for all θ ∈ Θ. These feasibility tests require the global solution of nested optimization problems and remain challenging for general problem classes. Moreover, they may themselves require the efficient solution of multi-scenario problems [13, 14]. Detailed discussion of feasibility tests is beyond the scope of this study and the reader is directed to [5] for further information.
3 NLP Solution Algorithm

In principle, multi-scenario problems, such as (2), can be solved directly with general purpose NLP algorithms, but specialized solution approaches are necessary for large problems. SQP-based strategies have been developed that exploit the structure of these problems [4, 16]. In this study, we present an improved multi-scenario strategy based on a recently developed, primal-dual barrier NLP method called IPOPT.
IPOPT [17] applies a Newton strategy to the optimality conditions that result from a primal-dual barrier problem, and adopts a novel filter based line search strategy. Under mild assumptions, the IPOPT algorithm has global and superlinear convergence properties. Originally developed in FORTRAN, the IPOPT algorithm was recently redesigned to allow for structure dependent specialization of the fundamental linear algebra operations. This new package is implemented in C++ and is freely available through the COIN-OR foundation from the following website: http://projects.coin-or.org/Ipopt.

The key step in the IPOPT algorithm is the solution of linear systems derived from the linearization of the first order optimality conditions (in primal-dual form) of a barrier subproblem. More information on the algorithm and analysis is given in [17]. Here, we derive the structured form of these linear systems for the multi-scenario optimization problem and present a specialized decomposition for their solution. To simplify the derivation, we consider problem (2) with only a single set of discrete values for the uncertain parameters, replacing i ∈ I and k ∈ K with the indices q ∈ Q. This simplification forces a discretization of z over both of the sets I and K instead of K alone, and constraints can be added to enforce equality of the z variables across each index of I. A more efficient nested decomposition approach that recognizes the structure resulting from the two types of unknown information can be formulated and will be the subject of future work.

By adding appropriate slack variables s_q ≥ 0 to the inequalities, and by defining linking variables and equations in each of the individual scenarios, we write a generalized form of the multi-scenario problem as:

  min_{x_q, d} Σ_{q∈Q} f_q(x_q)
  s.t.  c_q(x_q) = 0,  S_q x_q ≥ 0,  D_q x_q − D̄_q d = 0,   q ∈ Q    (3)
where x_q^T = [z_q^T y_q^T s_q^T d_q^T] and the D_q, D̄_q and S_q matrices extract suitable components of the x_q vector to deal with linking variables and variable bounds, respectively. If all of the scenarios have the same structures, we can set S_1 = · · · = S_|Q|, D_q = [0 | 0 | 0 | I] and D̄_q = I (where |·| indicates the cardinality of the set). On the other hand, indexing these matrices also allows scenarios with heterogeneous structures to be used, where D̄_q may not even be square. Using a barrier formulation, this problem can be converted to:

  min_{x_q, d} Σ_{q∈Q} { f_q(x_q) − µ Σ_j ln[(S_q x_q)^(j)] }
  s.t.  c_q(x_q) = 0,  D_q x_q − D̄_q d = 0,   q ∈ Q    (4)
where the indices j correspond to scalar elements of the vector (S_q x_q). Defining the Lagrange function of the barrier problem (4),

  L(x, λ, σ, d) = Σ_{q∈Q} L̄_q(x_q, λ_q, σ_q, d)
                = Σ_{q∈Q} { L_q(x_q, λ_q, σ_q, d) − µ Σ_j ln[(S_q x_q)^(j)] }
                = Σ_{q∈Q} { f_q(x_q) − µ Σ_j ln[(S_q x_q)^(j)] + c_q(x_q)^T λ_q + (D_q x_q − D̄_q d)^T σ_q }    (5)

with the multipliers λ_q and σ_q. Defining G_q = diag(S_q x_q) leads to the primal-dual form of the first order optimality conditions for this equality constrained problem, written as:

  ∇_{x_q} f_q(x_q) + ∇_{x_q} c_q(x_q) λ_q + D_q^T σ_q − S_q^T ν_q = 0
  c_q(x_q) = 0
  D_q x_q − D̄_q d = 0                                 q ∈ Q    (6)
  G_q ν_q − µ e = 0
  − Σ_{q∈Q} D̄_q^T σ_q = 0
where we define e^T = [1, 1, . . . , 1]. Writing the Newton step for (6) at iteration ℓ leads to:

  ∇_{x_q x_q} L_q^ℓ ∆x_q + ∇_{x_q} c_q^ℓ ∆λ_q + D_q^T ∆σ_q − S_q^T ∆ν_q = −(∇_{x_q} L_q^ℓ − S_q^T ν_q^ℓ)
  ∇_{x_q} c_q^ℓ ∆x_q = −c_q^ℓ
  D_q ∆x_q − D̄_q ∆d = −D_q x_q^ℓ + D̄_q d^ℓ                        q ∈ Q    (7)
  V_q^ℓ S_q ∆x_q + G_q^ℓ ∆ν_q = µ e − G_q^ℓ ν_q^ℓ
  − Σ_{q∈Q} D̄_q^T ∆σ_q = Σ_{q∈Q} D̄_q^T σ_q^ℓ

where the superscript ℓ indicates that the quantity is evaluated at the point (x_q^ℓ, λ_q^ℓ, σ_q^ℓ, ν_q^ℓ, d^ℓ). Eliminating ∆ν_q from the resulting linear equations gives the primal-dual augmented system

  H_q^ℓ ∆x_q + ∇_{x_q} c_q^ℓ ∆λ_q + D_q^T ∆σ_q = −∇_{x_q} L̄_q^ℓ
  ∇_{x_q} c_q^ℓ ∆x_q = −c_q^ℓ
  D_q ∆x_q − D̄_q ∆d = −D_q x_q^ℓ + D̄_q d^ℓ                        q ∈ Q    (8)
  − Σ_{q∈Q} D̄_q^T ∆σ_q = Σ_{q∈Q} D̄_q^T σ_q^ℓ
where Hq = ∇xq xq Lq +SqT (Gq )−1 Vq Sq , and Vq = diag(νq ). According to the IPOPT algorithm [17], the linear system (8) is modified as necessary by adding diagonal terms. Diagonal elements are added to the block Hessian terms in the augmented system to handle nonconvexities and to the lower right corner in each block to handle temporary dependencies in the constraints. The linear system (8), with these modifications, can be written with a block bordered diagonal (arrowhead) structure given by:
  [ W_1                                A_1   ] [ u_1 ]   [ r_1 ]
  [      W_2                           A_2   ] [ u_2 ]   [ r_2 ]
  [           W_3                      A_3   ] [ u_3 ]   [ r_3 ]
  [                ...                 ...   ] [ ... ] = [ ... ]
  [                      W_N           A_N   ] [ u_N ]   [ r_N ]
  [ A_1^T A_2^T A_3^T ... A_N^T       δ_1 I  ] [ ∆d  ]   [ r_d ]    (9)
where r_q^T = −[(∇_{x_q} L_q^ℓ)^T, (c_q^ℓ)^T, (D_q x_q^ℓ − D̄_q d^ℓ)^T], u_q^T = [∆x_q^T ∆λ_q^T ∆σ_q^T], A_q^T = [ 0  0  −D̄_q ],

  W_q = [ H_q^ℓ + δ_1 I       ∇_{x_q} c_q^ℓ    D_q^T  ]
        [ (∇_{x_q} c_q^ℓ)^T   −δ_2 I           0      ]
        [ D_q                 0                −δ_2 I ]

for q ∈ Q, and r_d = Σ_{q∈Q} D̄_q^T σ_q^ℓ.

The IPOPT algorithm requires the solution of the augmented system (9) at each iteration. IPOPT also requires the inertia (the number of positive and negative eigenvalues), limiting the available linear solvers that can be used efficiently. Conceptually, (9) can be solved with any serial direct linear solver configured with IPOPT. However, as the problem size grows, the time and memory requirements can make this approach intractable. Instead, applying a Schur complement decomposition allows an efficient parallel solution technique. Eliminating each W_q from (9), we get the following expression for ∆d:

  [ δ_1 I − Σ_{q∈Q} A_q^T (W_q)^{−1} A_q ] ∆d = r_d − Σ_{q∈Q} A_q^T (W_q)^{−1} r_q    (10)
which requires forming the Schur complement, B = δ_1 I − Σ_{q∈Q} A_q^T (W_q)^{−1} A_q, and solving this dense symmetric linear system for ∆d. Once a value for ∆d is known, the remaining variables can be found by solving the following system

  W_q u_q = r_q − A_q ∆d    (11)
for each q ∈ Q. This approach is formally described by the following algorithm.

1. Form the Schur complement B = δ_1 I − Σ_{q∈Q} A_q^T (W_q)^{−1} A_q:
   initialize B = δ_1 I
   for each scenario q ∈ Q
     for each column j = 1..M of A_q
       solve the linear system W_q p_j = A_q^{<j>} for p_j
       add to column j of the Schur complement: B^{<j>} = B^{<j>} − A_q^T p_j

2. Solve the Schur complement system B ∆d = r_d − Σ_{q∈Q} A_q^T (W_q)^{−1} r_q for ∆d:
   initialize r̄ = r_d
   for each scenario q ∈ Q
     solve the linear system W_q p = r_q for p
     add to the right hand side: r̄ = r̄ − A_q^T p
   solve the dense linear system B ∆d = r̄ for ∆d

3. Solve for the remaining variables:
   for each scenario q ∈ Q
     solve the linear system W_q u_q = r_q − A_q ∆d for u_q

This decomposition strategy applies specifically to the solution of the augmented system within the overall IPOPT algorithm and simply replaces the default linear solver. The sequence of steps in the overall IPOPT algorithm is not altered, and as such, this specialized strategy inherits all of the convergence properties of the IPOPT algorithm for general purpose nonlinear programs [17]. As discussed in [17], the method is globally and superlinearly convergent under mild assumptions.

Furthermore, this decomposition strategy is straightforward to parallelize, with excellent scaling properties. Let N = |Q| be the number of scenarios and M = dim(d) the number of common or design variables in the problem. The number of linear solves with the W_q blocks required by the decomposition approach is N·M + 2N. If the number of available processors in a distributed cluster is equal to N (one processor for each scenario), then the number of linear solves required by each processor is only M + 2, independent of the number of scenarios. This implies an approach that scales well with the number of scenarios. As we increase the number of scenarios under consideration, the cost of the linear solve remains fairly constant (with minimal communication overhead) as long as an additional processor is available for each new scenario. More importantly, the memory required on each processor is also nearly constant, allowing us to expand the number of scenarios and, using a large distributed cluster, move beyond the memory limitation of a standard single-processor machine. The efficient use of a distributed cluster to solve large problems that were previously not possible on a single standard machine is a major driving force of this work.
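The three steps above are standard block linear algebra. The following dense NumPy sketch shows the serial logic only; it is illustrative, and a real implementation would factorize each sparse W_q once, reuse the factors for all right-hand sides, and distribute the loop over q across processors.

import numpy as np

def schur_solve(W_blocks, A_blocks, r_blocks, r_d, delta1=0.0):
    # Solve the arrowhead system (9) by the Schur complement steps 1-3.
    # W_blocks[q]: (n_q, n_q), A_blocks[q]: (n_q, M), r_blocks[q]: (n_q,), r_d: (M,).
    M = r_d.shape[0]
    B = delta1 * np.eye(M)            # step 1: B = delta1*I - sum_q A_q^T W_q^{-1} A_q
    rbar = r_d.copy()                 # step 2 rhs: r_d - sum_q A_q^T W_q^{-1} r_q
    for W, A, r in zip(W_blocks, A_blocks, r_blocks):
        WinvA = np.linalg.solve(W, A)
        Winvr = np.linalg.solve(W, r)
        B -= A.T @ WinvA
        rbar -= A.T @ Winvr
    delta_d = np.linalg.solve(B, rbar)                 # dense Schur complement solve
    u = [np.linalg.solve(W, r - A @ delta_d)           # step 3: back-substitution per block
         for W, A, r in zip(W_blocks, A_blocks, r_blocks)]
    return u, delta_d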
4 Implementation of Internal Linear Decomposition

The Schur complement algorithm described above is well-known. Nevertheless, the implementation of this linear decomposition in most existing NLP software requires a nontrivial modification of the code. In many numerical codes, the particular data structures used for storing vectors and matrices are exposed to the fundamental algorithm code. With this design it is straightforward to perform any necessary mathematical operations efficiently within the
algorithm code. However, changing the underlying data representation (e.g. storing a vector in block form across a distributed cluster instead of storing it as a dense array) requires that the algorithm code be altered every place it has access to the individual elements of these vectors or matrices. Developing algorithms that exploit problem specific structure through internal linear decomposition requires the use of efficient (and possibly distributed) data structures that inherently represent the structure of the problem. As well, it requires the implementation of mathematical operations that can efficiently exploit this structure. If the fundamental algorithm code is intimately aware of the underlying data representation (primarily of vectors and matrices), then altering that representation for a particular problem structure can require a significant refactoring of the code.

In the recent redesign of IPOPT, special care was taken to separate the fundamental algorithm code from the underlying data representations. The high level structure of IPOPT is described in Figure 1. The fundamental algorithm code communicates with the problem specification through a well-defined NLP interface. Moreover, the fundamental algorithm code is never allowed access to individual elements in vectors or matrices and is purposely unaware of the underlying data structures within these objects. It can only perform operations on these objects through various linear algebra interfaces. While the algorithm is independent of the underlying data structure, the NLP implementation needs to have access to the internal representation so it can fill the necessary data (e.g. specify the values of Jacobian entries). Therefore, the NLP implementation is aware of the particular linear algebra implementation, but only returns interface pointers to the fundamental algorithm code. The IPOPT package comes with a default linear algebra representation (using dense arrays for vectors and a sparse structure for matrices) and a default set of NLP interfaces. However, this design allows the data representations and mathematical operations to be modified for a particular problem structure without changes to the fundamental algorithm code. Similar ideas have also been used in the design of reduced-space SQP codes, particularly for problems constrained by partial differential equations [1–3].

In this work, we tested the redesigned IPOPT framework by implementing the Schur complement decomposition approach for the multi-scenario design problem. This implementation makes use of the Message Passing Interface (MPI) to allow parallel execution on a distributed cluster. The implementation uses the composite design pattern and implements a composite NLP that forms the overall multi-scenario problem by combining individual specifications for each scenario. This implementation has also been interfaced to AMPL [8], allowing the entire problem to be specified using individual AMPL models for each scenario and AMPL suffixes to describe the connectivity (implicitly defining the D_q, D̄_q, and S_q matrices). This allows the formulation of large multi-scenario problems with relative ease. Furthermore, when solving the problem in parallel, each processor only evaluates functions for its own
Figure 1. Redesigned IPOPT structure, allowing for specialized linear algebra
scenarios, allowing distribution of data and parallelization of these computations across processors. Parallel implementations for vectors and matrices have also been developed that distribute individual blocks across processors. All the necessary vector operations (e.g. BLAS operations) have been implemented for efficient calculation in parallel. Finally, a distributed solver has been written for the augmented system that uses a parallel version of the algorithm described in the previous section. This distributed solver uses a separate linear solver instance for the solution of each of the $W_q$ blocks (and can use any of the linear solvers already interfaced with IPOPT). This separation allows solution of heterogeneous multi-scenario problems where the individual scenarios may have different structures. The distributed solver calls LAPACK routines for the dense linear solve of the Schur complement.
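To make the linear algebra concrete, the following is a minimal serial sketch (our illustration, not IPOPT's internal API) of a Schur complement solve for a block-bordered linear system of the kind that arises here; the block matrices are small dense NumPy stand-ins and all names are hypothetical.

import numpy

def schur_solve(W_blocks, B_blocks, C, r_blocks, s):
    # Solve the block-bordered system
    #   W_q x_q + B_q y = r_q              (one block per scenario q)
    #   sum_q B_q^T x_q + C y = s          (coupling to the common variables y)
    # by forming the dense Schur complement S = C - sum_q B_q^T W_q^{-1} B_q.
    S = C.copy()
    rhs = s.copy()
    WinvB = []
    Winvr = []
    for W, B, r in zip(W_blocks, B_blocks, r_blocks):
        WB = numpy.linalg.solve(W, B)          # W_q^{-1} B_q (independent per block)
        Wr = numpy.linalg.solve(W, r)          # W_q^{-1} r_q
        S = S - numpy.dot(B.T, WB)
        rhs = rhs - numpy.dot(B.T, Wr)
        WinvB.append(WB)
        Winvr.append(Wr)
    y = numpy.linalg.solve(S, rhs)             # small dense solve of the Schur complement
    x_blocks = [Wr - numpy.dot(WB, y) for WB, Wr in zip(WinvB, Winvr)]
    return x_blocks, y

In the parallel setting described above, each processor would form the contributions $B_q^T W_q^{-1} B_q$ and $B_q^T W_q^{-1} r_q$ for its local scenarios, the small dense Schur complement would be assembled and solved with LAPACK, and the block back-solves for the $x_q$ would again proceed independently.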
5 Source Detection Application and Results

To illustrate these concepts on a large-scale application, we consider the determination of contamination sources in large municipal water distribution systems. Models for these can be represented by a network where edges represent pipes, pumps, or valves, and nodes represent junctions, tanks, or reservoirs. Assuming that contaminant can be injected at any network node, the goal of this source inversion approach is to use information from a limited number of sensors to calculate injection times and locations. Identifying the contamination source enables security or utilities personnel to stop the contamination and propose effective containment and cleanup strategies. This problem can be formulated as a dynamic optimization problem which seeks to find the unknown injection profiles (at every network node) that
minimize the least-squares error between the measured and calculated network concentrations. For the water quality model, the pipes are modeled with partial differential equations in time and space, and the network nodes are modeled with dynamic mass balances that assume complete mixing. This produces an infinite-dimensional optimization problem that can be discretized to form a large-scale algebraic problem. A naive approach requires a discretization in both time and space due to the constraints from the pipe model. Instead, following the common assumption of plug flow in the main distribution lines, we developed an origin tracking algorithm which uses the network flow values to precalculate the time delays across each of the network pipes. This leads to a set of algebraic expressions that describe the time-dependent relationship between the ends of each of the pipes. The discretized dynamic optimization problem was solved using IPOPT [17]. This large-scale problem had over 210000 variables and 45000 degrees of freedom. Nevertheless, solutions were possible in under 2 CPU minutes on a 1.8 GHz Pentium 4 machine and the approach was very successful at determining the contamination source [11]. Furthermore, the approach has been extended to address very large networks by formulating the problem on a small window or subdomain of the entire network [12]. In previous work, we assumed that the network flows were known. In this work, we assume that the network flows are not known exactly, but can only be loosely characterized based on a demand generation model. Following the approach of Buchberger and coworkers, we use the PRPSym software [6, 7] to generate reasonable residential water demands. Varying parameters in the Poisson Rectangular Pulse (PRP) model allows us to generate numerous scenarios for the water demands. To account for uncertain water demands in the source inversion, we formulate a multi-scenario problem over the set of possible flow scenarios as

\[
\min_{p_q,\, c_q,\, m_q,\, \bar{m}} \;\; \sum_{q \in Q} \left( [c_q - c^m]^T \Omega_q [c_q - c^m] + \frac{\rho}{|Q|}\, m_q^T m_q \right)
\]
\[
\text{s.t.} \quad \varphi_q(p_q, c_q, m_q) = 0, \quad m_q \ge 0, \quad m_q - \bar{m} = 0, \qquad q \in Q, \tag{12}
\]

where $q \in Q$ represents the set of possible realizations or scenarios, $c^m$ are the measured concentrations at the sensor nodes (a subset of the entire domain), and $c_q$ and $m_q$ are the vectors of calculated concentrations and injection variables for scenario $q$ at each of the nodes (both discretized over all nodes in the subdomain and all timesteps). The vector of pipe inlet and outlet concentrations for scenario $q$ is given by $p_q$ and is discretized over all pipes in the subdomain and all timesteps. The weighting matrix $\Omega_q$ and the discretized model equations $\varphi_q(p_q, c_q, m_q)$ are discussed in more detail in previous publications [11, 12]. The variables $\bar{m}$ are the aggregated injection profiles that are common across each of the scenarios.
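As a small illustration of the objective in (12), the sketch below (ours, with hypothetical names and a diagonal $\Omega_q$ purely for simplicity) evaluates the multi-scenario least-squares function for given per-scenario data.

import numpy

def multi_scenario_objective(c, c_meas, m, Omega_diag, rho):
    # c, m, Omega_diag: dictionaries mapping scenario q to NumPy arrays;
    # c_meas: measured concentrations c^m at the sensor nodes;
    # rho: float regularization weight.
    nQ = len(c)
    total = 0.0
    for q in c:
        resid = c[q] - c_meas                           # c_q - c^m
        total += numpy.dot(resid, Omega_diag[q]*resid)  # [c_q - c^m]^T Omega_q [c_q - c^m]
        total += (rho/nQ)*numpy.dot(m[q], m[q])         # (rho/|Q|) m_q^T m_q
    return total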
To demonstrate the multi-scenario decomposition approach on this problem, we selected a real municipal network model with 400 nodes and 50 randomly placed sensors. We then restricted ourselves to a 100 node subdomain of this model. Three variable parameters in the PRP model were assumed to be unknown, but with reasonable bounds shown in Table 1.

Table 1. Range of parameters used in the PRP model in order to generate reasonable flow patterns for the distribution system. Different scenarios were formulated by randomly selecting values assuming uniform distributions. The total base demand is the sum of the base demands over all the nodes in the network

  Uncertain Parameter            Lower Bound   Upper Bound
  Avg. Demand Duration (min)         0.5           2.0
  Avg. Demand Intensity (gpm)        1.0           4.0
  Total Base Demand (gpm)            500           700

Assuming uniform
distributions for the PRP parameters, random values were used to produce the “true” water demands and 32 different possible realizations of the uncertain water demands. Using the “true” demands, a pulse contamination injection was simulated from one of the nodes in the network to generate time profiles for the concentration measurements. Hydraulic simulations were performed for each of the 32 realizations to calculate the network flows. We formulate the optimization problem with a 6 hour time horizon and 5 minute integration timesteps. This generates an individual scenario with 36168 variables. Across scenarios, we aggregate the common variables and discretize m ¯ over 1 hour timesteps. This gives 600 common variables, one for each node across each of the 6 aggregate time discretizations. We then test the scalability of our parallel decomposition implementation by formulating multi-scenario optimization problems with 2 to 32 scenarios and solve the problems using 16 nodes of our in-house Beowulf cluster. Each node has 1.5 GB of RAM and dual 1 GHz processors. Timing results for these runs are shown in Figure 2 where the number of scenarios is shown on the abscissa, and the wall clock time (in seconds) is shown on the left ordinate. It is important to note that for each additional scenario, an additional processor was used (i.e. 2 processors were used for the 2 scenario formulation and 16 for the 16 scenario formulation, etc). The right ordinate shows the total number of variables in the problem. The timing results for 2 to 16 processors show nearly perfect scaleup where the additional time required as we add scenarios (and processors) is minimal (the same is true from 17 to 32 scenarios). Tests were performed on a 16 node cluster and the jump in time as we switch from 16 to 17 scenarios corresponds to the point where both processors on a single machine were first utilized. When two processors on the same machine are each forming their own local contribution to the Schur complement, the process appears
Figure 2. Timing Results for the Multi-Scenario Problem with 600 Common Variables: This figure shows the scalability results of the parallel interior-point implementation on the multi-scenario problem. The number of processors used was equal to the number of scenarios in the formulation. The total number of variables in the problem is shown with the secondary axis
to be limited by the memory bandwidth of the dual processor machine. This observation, coupled with the scaleup results in Figure 2, demonstrates that the approach scales well and remains effective as we increase the number of nodes and scenarios. Furthermore, it implies that the distributed cluster model is appropriate for this problem and that the scaling of communication overhead is quite reasonable.
6 Conclusions

This study deals with the formulation and efficient solution of multi-scenario optimization problems that often arise in the optimal design of systems with unknown information. Discretizing the uncertainty sets leads to large multi-scenario optimization problems, often with few common variables. For the solution of these problems we consider the barrier NLP algorithm IPOPT and have developed an efficient parallel Schur complement approach that exploits the block-bordered structure of the KKT matrix. The formulation and implementation are demonstrated on a large-scale multi-scenario problem with over 30000 variables in each block and 600 common variables linking the blocks. Testing up to 32 scenarios, we observe nearly perfect scaleup with additional scenarios using a distributed Beowulf cluster. Furthermore, this implementation is easily facilitated by the software structure of the redesigned IPOPT code, because of the separation of the fundamental algorithm code and the linear algebra code. The MPI implementation of the parallel Schur complement solver and the parallel vector and
matrix classes are possible without any changes to the fundamental algorithm code. Finally, this implementation has been interfaced with the AMPL [8] modeling language to allow straightforward specification of the multi-scenario problem. Individual scenarios can be specified as AMPL models of their own, with the connectivity described using AMPL suffixes. This easily allows the development of both homogeneous and heterogeneous problem formulations. The decomposition presented in this work was formulated using a single discretized set for the unknown parameters. A formulation which explicitly includes both forms of unknown information, uncertainty and variability, leads to a nested block-bordered structure in the KKT system. Developing a recursive decomposition strategy for problems of this type will be the subject of future work. Also, while the motivation for this work was the multi-scenario optimization problem arising from design under uncertainty, other problems produce similar KKT systems. Large-scale nonlinear parameter estimation problems have a similar structure, with an optimization of large models over many data sets where the unknown parameters are the common variables. This problem and others are excellent candidates for this solution approach.

Acknowledgments

Funding from the National Science Foundation (under grants ITR/AP0121667 and CTS-0438279) is gratefully acknowledged.
References

1. Bartlett, R. A. (2001). New Object-Oriented Approaches to Large-Scale Nonlinear Programming for Process Systems Engineering, Ph.D. Thesis, Chemical Engineering Department, Carnegie Mellon University.
2. Bartlett, R. A. and van Bloemen Waanders, B. G. (2002). A New Linear Algebra Interface for Efficient Development of Complex Algorithms Independent of Computer Architecture and Data Mapping, Technical Report, Sandia National Laboratories, Albuquerque, NM.
3. Bartlett, R. A. (2002). rSQP++, An Object-Oriented Framework for Reduced Space Successive Quadratic Programming, Technical Report, Sandia National Laboratories, Albuquerque, NM.
4. Bhatia, T. and Biegler, L. (1999). Multiperiod design and planning with interior point methods, Comp. Chem. Eng. 23(7): 919–932.
5. Biegler, L. T., Grossmann, I. E., and Westerberg, A. W. (1997). Systematic Methods of Chemical Process Design, Prentice-Hall, Upper Saddle River, NJ.
6. Buchberger, S. G. and Wells, G. J. (1996). Intensity, duration and frequency of residential water demands, Journal of Water Resources Planning and Management, ASCE, 122(1):11-19.
7. Buchberger, S. G. and Wu, L. (1995). A model for instantaneous residential water demands. Journal of Hydraulic Engineering, ASCE, 121(3):232-246.
8. Fourer, R., Gay, D. M., and Kernighan, B. W. (1992). AMPL: A Modeling Language for Mathematical Programming. Belmont, CA: Duxbury Press.
9. Gondzio, J. and Grothey, A. (2004). Exploiting Structure in Parallel Implementation of Interior Point Methods for Optimization, Technical Report MS-04-004, School of Mathematics, The University of Edinburgh.
10. Gondzio, J. and Grothey, A. (2006). Solving Nonlinear Financial Planning Problems with $10^9$ Decision Variables on Massively Parallel Architectures, Technical Report MS-06-002, School of Mathematics, The University of Edinburgh.
11. Laird, C. D., Biegler, L. T., van Bloemen Waanders, B., and Bartlett, R. A. (2005). Time Dependent Contaminant Source Determination for Municipal Water Networks Using Large Scale Optimization, ASCE Journal of Water Resource Management and Planning, 131, 2, p. 125.
12. Laird, C. D., Biegler, L. T. and van Bloemen Waanders, B. (2007). Real-time, Large Scale Optimization of Water Network Systems using a Subdomain Approach, in Real-Time PDE-Constrained Optimization, SIAM, Philadelphia.
13. Ostrovsky, G. M., Datskov, I. V., Achenie, L. E. K., Volin, Yu. M. (2003). Process uncertainty: the case of insufficient process data at the operation stage. AIChE Journal 49, 1216-1240.
14. Ostrovsky, G., Volin, Y. M. and Senyavin, N. M. (1997). An approach to solving a two-stage optimization problem under uncertainty, Comp. Chem. Eng. 21(3): 317.
15. Rooney, W. and Biegler, L. (2003). Optimal Process Design with Model Parameter Uncertainty and Process Variability, Nonlinear confidence regions for design under uncertainty, AIChE Journal, 49(2), 438.
16. Varvarezos, D., Biegler, L. and Grossmann, I. (1994). Multi-period design optimization with SQP decomposition, Comp. Chem. Eng. 18(7): 579–595.
17. Wächter, A., and Biegler, L. T. (2006). On the Implementation of an Interior Point Filter Line Search Algorithm for Large-Scale Nonlinear Programming, Mathematical Programming, 106(1), 25-57.
On the Efficiency of Python for High-Performance Computing: A Case Study Involving Stencil Updates for Partial Differential Equations

Hans Petter Langtangen¹,² and Xing Cai¹,²

1 Simula Research Laboratory, P.O. Box 134, N-1325 Lysaker, Norway, {hpl,xingca}@simula.no
2 Department of Informatics, University of Oslo, P.O. Box 1080, Blindern, N-0316 Oslo, Norway
Abstract

The purpose of this paper is to assess the loss of computational efficiency that may occur when scientific codes are written in the Python programming language instead of Fortran or C. Our test problems concern the application of a seven-point finite stencil for a three-dimensional, variable coefficient, Laplace operator. This type of computation appears in lots of codes solving partial differential equations, and the variable coefficient is a key ingredient to capture the arithmetic complexity of stencils arising in advanced multi-physics problems in heterogeneous media. Different implementations of the stencil operation are described: pure Python loops over Python arrays, Psyco-acceleration of pure Python loops, vectorized loops (via shifted slice expressions), inline C++ code (via Weave), and migration of stencil loops to Fortran 77 (via F2py) and C. The performance of these implementations is compared against codes written entirely in Fortran 77 and C. We observe that decent performance is obtained with vectorization or migration of loops to compiled code. Vectorized loops run between two and five times slower than the pure Fortran and C codes. Mixed-language implementations, Python-Fortran and Python-C, where only the loops are implemented in Fortran or C, run at the same speed as the pure Fortran and C codes. At present, there are three alternative (and to some extent competing) implementations of Numerical Python: numpy, numarray, and Numeric. Our tests uncover significant performance differences between these three alternatives. Numeric is fastest on scalar operations with array indexing, while numpy is fastest on vectorized operations with array slices. We also present parallel versions of the stencil operations, where the loops are migrated to C for efficiency, and where the message passing statements are written in Python, using the high-level pypar interface to MPI. For the current test problems, there is hardly any efficiency loss by doing the message passing in Python. Moreover, adopting the Python interface of MPI gives a more elegant parallel
implementation, both due to a simpler syntax of MPI calls and due to the efficient array slicing functionality that comes with Numerical Python.
1 Introduction

The Python language has received significant attention in the scientific computing community over the last few years. Many research projects have experienced the gain in using Python as a scripting language, i.e., to administer simulation, data analysis, and visualization tasks as well as archiving simulation results [5]. For this purpose, shell languages were used in the past, but Python serves as a much more powerful and advanced programming language with comprehensive standard libraries. For example, built-in heterogeneous lists and hash maps can conveniently hold data of various kinds; the strong support for text manipulation comes in handy for interpreting data formats; and graphical or web-based user interfaces are well supported. Such features make it easy to bring new life to old, less user-friendly code, by gluing applications together and equipping them with modern interfaces. Important features of Python, compared to competing scripting languages like Perl and Ruby, are the very clean syntax and the wide range of add-on modules for numerical computations and graphics.

Although Python without doubt has demonstrated a great potential as a scripting language in scientific computing, Python also has a potential as a main language for writing new numerical codes. Such codes are traditionally implemented in compiled languages, such as Fortran, C, or C++. Python is known to be very much slower than compiled languages and may hence seem unsuitable for numerical computations. However, most of a scientific code deals with user interfaces (files, command line, web, graphical windows), I/O management, data analysis, visualization, file and directory manipulation, and report generation, to mention some items for which Python is much better suited than the traditional compiled languages. The reason is that Python has more support for such tasks, and the resulting code is much more compact and convenient to write. Usually, only a small portion of the total code volume deals with intensive numerics where high performance matters. This paper addresses how to write these number crunching snippets when Python is chosen to be the main language of a scientific code.

A lot of computational scientists have moved from compiled languages to Matlab. Despite the chance for experiencing decreased performance, scientists choose Matlab because it is a more productive and convenient computing environment. The recent popularity of Python is a part of the same trend. Python shares many of Matlab’s features: a simple and clean syntax of the command language; integration of simulation and visualization; interactive execution of commands, with immediate feedback; lots of built-in functions operating efficiently on arrays in compiled code; satisfactory performance of everyday operations on today’s computers; and good documentation and support.
In addition, Python is a full-fledged programming language that supports all major programming styles (procedural, functional, object-oriented, and generic programming). Simple programs are concise, as in Matlab, while the concepts of classes, modules, and packages are available to users for developing huge codes. Many widely used Java-type tools for documentation in source code files and testing frameworks are also mirrored in the Python world. Packing and distributing Python applications are well supported by several emerging tools. We should also mention that Python code can easily be made truly cross-platform, even when graphical user interfaces and operating system interaction are present. To all these advantages adds the feature that Python is free and very actively developed by an open source community.

The many advantages and promises of Python make it an interesting platform for building scientific codes. A major question is, however, the performance. We shall address this question by doing performance tests on a range of codes that solve partial differential equations (PDEs) numerically via finite difference methods. More precisely, we are concerned with standard stencil-type operations on large three-dimensional arrays. The various implementations differ in the way they use Python and other languages to implement the numerics. It turns out that naive loops in pure Python are extremely slow for operations on large three-dimensional arrays. Vectorization of expressions may help a lot, as we shall show in this paper, but for optimal performance one has to combine Python with compiled languages. Python comes with strong support for calling C code. To call a C function from Python, one must write wrapper code that converts function arguments and return values between Python and C. This may be quite tedious, but several tools exist for automating the process, e.g., F2PY, SWIG, Boost.Python, and SIP. We shall apply F2PY to automate gluing Python with Fortran code, and we shall write complete extension modules by hand in C for comparison.

Modern scientific codes for solving PDEs frequently make use of parallel computing. In this paper, we advocate that the parallelization can be done more conveniently at the “high” Python level than at the “low” C/MPI level. Parallelization at the Python level utilizes modules like pypar, which gives access to MPI functionality, but at a higher abstraction level than in C. We shall demonstrate how our PDE solvers can be parallelized with pypar, and carefully evaluate the potential performance loss.

There are few studies of the performance of Python for numerical computing. Ramachandran [11] ran some tests of a two-dimensional Laplace equation (solved by an SOR method) with different combinations of Python tools and compiled languages (Weave, F2PY, Pyrex, Psyco) in a pure serial setting. Some simpler operations and case studies appear in [5], while [2] contains a study quite similar to the present paper, but with simpler PDE and stencil models. These models, arising from constant-coefficient PDEs, involve so few arithmetic operations inside the loops that the conclusions on performance may be misleading for applications involving heterogeneous media and/or more complicated constitutive relations in the PDEs. We therefore saw the
need to repeat the PDE-related tests from [2] in a more physically demanding context in order to reach conclusions of wider relevance. These new applications and tests constitute the core of the present paper.

In the remaining text of this section we present the model problems used in the performance tests. Next, we present various implementation strategies and their efficiency for stencil operations on three-dimensional arrays. Thereafter, the most promising implementation strategies are employed in a parallel context, where the parallelization is carried out at the Python level. Finally, we give some concluding remarks.

1.1 Two Simple PDEs with Variable Coefficients

To study the performance of Python in realistic scientific applications, while keeping the mathematical and numerical details to a minimum, we will in the upcoming text heavily use the following two simple PDEs in three space dimensions:

\[
\frac{\partial u}{\partial t} = \nabla \cdot (a(x, y, z)\nabla u), \tag{1}
\]
\[
\frac{\partial^2 u}{\partial t^2} = \nabla \cdot (a(x, y, z)\nabla u). \tag{2}
\]

These PDEs arise in a number of physical applications, either as stand-alone equations for diffusion and wave phenomena, respectively, or as part of more complicated systems of PDEs for multi-physics problems. Both (1) and (2) need to be accompanied with suitable boundary and initial conditions, which we will not discuss in the present paper, but rather use simple choices (such as u = 0 on the boundary) in the numerical experiments. The numerical operations associated with solving (1) and (2) enter not only codes solving these specific equations, but also many other PDE codes where variable-coefficient Laplace operators are present. For example, an explicit “Forward Euler” finite difference update of (1) and (2) at a time step involves (to a large extent) the same code and numerical operations as encountered in, e.g., a multigrid solver for a Poisson equation. This means that a performance analysis of such a finite difference update gives a good idea of how Python will perform in real-life PDE applications.

1.2 Finite Difference Discretization

The finite difference discretization of the common right-hand sides of (1) and (2) takes the following form on a uniform three-dimensional mesh with grid spacings $\Delta x$, $\Delta y$, $\Delta z$:

\[
L_{i,j,k} \equiv \frac{1}{\Delta x}\left[ a_{i+\frac{1}{2},j,k}\,(u_{i+1,j,k} - u_{i,j,k})/\Delta x - a_{i-\frac{1}{2},j,k}\,(u_{i,j,k} - u_{i-1,j,k})/\Delta x \right]
+ \frac{1}{\Delta y}\left[ a_{i,j+\frac{1}{2},k}\,(u_{i,j+1,k} - u_{i,j,k})/\Delta y - a_{i,j-\frac{1}{2},k}\,(u_{i,j,k} - u_{i,j-1,k})/\Delta y \right]
+ \frac{1}{\Delta z}\left[ a_{i,j,k+\frac{1}{2}}\,(u_{i,j,k+1} - u_{i,j,k})/\Delta z - a_{i,j,k-\frac{1}{2}}\,(u_{i,j,k} - u_{i,j,k-1})/\Delta z \right].
\]
Here, $a_{i+\frac{1}{2},j,k}$ denotes the value of $a(x, y, z)$ at the mid-point $((i + \frac{1}{2})\Delta x, j\Delta y, k\Delta z)$, likewise for $a_{i-\frac{1}{2},j,k}$, $a_{i,j+\frac{1}{2},k}$, $a_{i,j-\frac{1}{2},k}$, $a_{i,j,k+\frac{1}{2}}$, and $a_{i,j,k-\frac{1}{2}}$. In real-life applications, it is not unusual that the values of $a(x, y, z)$ are only available at the mesh points, thus not known on the mid-points such as required by $a_{i+\frac{1}{2},j,k}$ etc. We therefore assume in this paper that values of $a(x, y, z)$ have to be approximated at these mid-points. The harmonic mean is a robust technique for approximating $a_{i+\frac{1}{2},j,k}$, especially in the case of strong discontinuities in $a$, such as those often met in geological media. The harmonic mean between $a_{i,j,k}$ and $a_{i+1,j,k}$ is defined as follows:

\[
a_{i+\frac{1}{2},j,k} \approx \frac{2}{\dfrac{1}{a_{i,j,k}} + \dfrac{1}{a_{i+1,j,k}}}. \tag{3}
\]
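A quick numerical illustration of (3), with values chosen by us: for a coefficient jumping from a = 1 to a = 100 across a cell face, the harmonic mean stays close to the smaller value, which is what makes it robust for strongly discontinuous media.

a_left, a_right = 1.0, 100.0
a_face = 2.0/(1.0/a_left + 1.0/a_right)   # harmonic mean, Eq. (3)
print a_face                              # approximately 1.98 (the arithmetic mean is 50.5)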
Similarly, we can use the harmonic mean to approximate $a_{i-\frac{1}{2},j,k}$, $a_{i,j+\frac{1}{2},k}$, $a_{i,j-\frac{1}{2},k}$, $a_{i,j,k+\frac{1}{2}}$, and $a_{i,j,k-\frac{1}{2}}$.

Our numerical tests will involve sweeps of finite difference stencils over three-dimensional uniform box grids. To this end, it makes sense to build test software that solves (1) and (2) by explicit finite difference schemes. Even with a more suitable implicit scheme for (1), the most time-consuming part of a typical multigrid algorithm for the resulting linear system will closely resemble an explicit finite difference update. The explicit schemes for (1) and (2) read, respectively,

\[
\frac{u^{\ell+1}_{i,j,k} - u^{\ell}_{i,j,k}}{\Delta t} = L_{i,j,k}, \tag{4}
\]
\[
\frac{u^{\ell+1}_{i,j,k} - 2u^{\ell}_{i,j,k} + u^{\ell-1}_{i,j,k}}{\Delta t^2} = L_{i,j,k}. \tag{5}
\]

Both equations can be solved with respect to the new and only unknown value $u^{\ell+1}_{i,j,k}$. Any implementation essentially visits all grid points at a time level and evaluates the formula for $u^{\ell+1}_{i,j,k}$.
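For orientation, the sketch below (our illustration, not part of the benchmark codes) shows how an interior update routine for (4) is typically driven in time; the stencil routine itself, here passed in as the argument scheme, is the subject of Section 2, and the wave scheme (5) would additionally carry the previous time level $u^{\ell-1}$.

def advance(scheme, u, unew, a, dx, dy, dz, dt, nsteps):
    # Explicit time stepping for (4): each call to scheme fills the interior
    # of unew from u; boundary values are left untouched (u = 0 on the boundary).
    for step in range(nsteps):
        unew = scheme(unew, u, a, dx, dy, dz, dt)
        u, unew = unew, u        # swap time levels and reuse storage
    return u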
2 Python and High-Performance Serial Computing

The basic computational kernel for our model PDEs consists of loops over three-dimensional arrays, with arithmetic operations and array look-up of neighboring points in the underlying grid. We shall in this section study different types of implementation of this computational kernel, keeping a focus on both performance and programming convenience.

2.1 Alternative Numerical Python Implementations

The fundamental data structure is a contiguous three-dimensional array. Python has an add-on package, called Numerical Python, which offers a
multi-dimensional array object encapsulating a plain C array. Because the underlying data structure is just a pointer to a stream of numbers in memory, it is straightforward to feed C, C++, and Fortran functions with the data in Numerical Python arrays. Numerical Python consists of several modules for efficient computations with arrays. These include overloaded arithmetic operators for array objects, standard mathematical functions (trigonometric, exponential, etc.), random number generation, and some linear algebra (eigenvalues, eigenvectors, solution of dense linear systems). Numerical Python makes extensive use of LAPACK, preferably through the highly optimized ATLAS library. In short, Numerical Python adds “basic numerical Matlab functionality” to Python. At the time of this writing, there are three alternative implementations of Numerical Python. These are named after the name of the fundamental module that defines the array object: Numeric, numarray, and numpy. Numeric is the “old”, original implementation from the mid 1990s, developed by Jim Hugunin, David Ascher, Paul Dubois, Konrad Hinsen, and Travis Oliphant. Much existing numerical Python software makes heavy use of Numeric. Now the source code of Numeric is not maintained any longer, and programmers are encouraged to port code using Numeric to the new numpy implementation. This is happening to a surprisingly large extent. The numarray implementation, mainly developed by Perry Greenfield, Todd Miller, and Rick White, appeared early in this century and was meant to replace Numeric. This did not happen, partly because so much software already depended on Numeric and partly because some numerical operations were faster with Numeric. To avoid two competing Numerical Python implementations, Travis Oliphant decided to merge ideas from Numeric and numarray into a new implementation, numpy (often written as NumPy, but this shortform is also widely used as a synonym for “Numerical Python”). The hope is that numpy can form the basis of the single, future Numerical Python source, which can be distributed as part of core Python. The new numpy implementation was released in the beginning of 2006, with several deficiencies with respect to performance, but many improvements have taken place lately. The experiments later will precisely report the relative efficiency of the three Numerical Python versions. The interface to Numeric, numarray, and numpy is almost the same for the three packages, but there are some minor annoying differences. For example, modules for random number generation, linear algebra, etc., have different names. Also, some implementations contain a few features that the others do not have. If one has written a numerical Python program using Numeric, most of the statements will work if one changes the Numeric import with numarray or numpy, but it is unlikely that all statements work. One way out of this is to define a common (“least common denominator”) interface such that application writers can use this interface and afterwards transparently switch between the three implementations. This is what we have done in all software developed for the tests in the present paper. The interface, called numpytools,
is available on the Web as part of the SciTools package [4, 5]. As we will show later in the paper, the performance of these three implementations differs, so many application developers may want to write their code such that the user can trivially switch between underlying array implementations.

The SciPy package [12] builds on Numerical Python and adds a wide range of useful numerical utilities, including improved Numerical Python functions, a library of mathematical functions (Bessel functions, Fresnel integrals, and many, many more), as well as interfaces to various Netlib [7] packages such as QUADPACK, FITPACK, ODEPACK, and similar. Weave is a part of the SciPy package that allows inline C++ code in Python code. This is the only use of SciPy that we make in the present paper. SciPy version 0.3 depended on Numeric, while version 0.4 and later require numpy. Our use of Weave employs numpy arrays.

All experiments reported in this section (see Table 1) are collected in software that can be downloaded from the Web [13] and executed in the reader’s own hardware and software environment. We have run the serial experiments on a Linux laptop, using Linux 2.6 and GNU compilers version 4.1.3 with -O3 optimization.

2.2 Plain Python Loops

The schemes (4) and (5) are readily coded as a loop over all internal grid points. Using Numerical Python arrays and standard Python for loops, with xrange as a more efficient way than range to generate the indices, the code for (4) takes the following form:

def scheme(unew, u, a, dx, dy, dz, dt):
    nx, ny, nz = u.shape;  nx -= 1;  ny -= 1;  nz -= 1
    dx2 = dx*dx;  dy2 = dy*dy;  dz2 = dz*dz
    for i in xrange(1,nx):
        for j in xrange(1,ny):
            for k in xrange(1,nz):
                a_c  = 1.0/a[i,j,k]
                a_ip = 2.0/(a_c + 1.0/a[i+1,j,k])
                a_im = 2.0/(a_c + 1.0/a[i-1,j,k])
                a_jp = 2.0/(a_c + 1.0/a[i,j+1,k])
                a_jm = 2.0/(a_c + 1.0/a[i,j-1,k])
                a_kp = 2.0/(a_c + 1.0/a[i,j,k+1])
                a_km = 2.0/(a_c + 1.0/a[i,j,k-1])
                unew[i,j,k] = u[i,j,k] + dt*( \
                    (a_ip*(u[i+1,j,k] - u[i  ,j,k]) - \
                     a_im*(u[i  ,j,k] - u[i-1,j,k]))/dx2 + \
                    (a_jp*(u[i,j+1,k] - u[i,j  ,k]) - \
                     a_jm*(u[i,j  ,k] - u[i,j-1,k]))/dy2 + \
                    (a_kp*(u[i,j,k+1] - u[i,j,k  ]) - \
                     a_km*(u[i,j,k  ] - u[i,j,k-1]))/dz2)
    return unew
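As a hypothetical usage example (the grid size matches Table 1, but the harness itself, including the use of time.clock, is our assumption and not the authors' benchmark setup), timings of the scheme function just defined can be collected like this:

import time
import numpy

n = 80                              # 80^3 grid cells, as in Table 1
shape = (n+1, n+1, n+1)
u    = numpy.zeros(shape)
unew = numpy.zeros(shape)
a    = numpy.ones(shape)            # stand-in for the variable coefficient a(x,y,z)
dx = dy = dz = 1.0/n
dt = 0.25*dx*dx                     # an explicit-scheme sized time step (assumption)

t0 = time.clock()
unew = scheme(unew, u, a, dx, dy, dz, dt)
print 'CPU time for one stencil sweep:', time.clock() - t0, 'seconds'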
The three-dimensional arrays unew and u correspond to $u^{\ell+1}$ and $u^{\ell}$, whereas nx, ny, nz represent the numbers of grid cells in the three spatial directions. We remark that xrange(1,nx) returns indices from 1 to nx-1 (nx is not included) according to the Python convention. Thus the above code kernel will update the interior mesh points at one time step. (The boundary point layers $i = 0$, $i = n_x$, $j = 0$, $j = n_y$, $k = 0$, $k = n_z$ require additional computations according to the actual boundary conditions – here we just assume $u = 0$ on the boundary.) Moreover, dt, dx2, dy2, dz2 contain the values of $\Delta t$, $\Delta x^2$, $\Delta y^2$, $\Delta z^2$.

The CPU time of the best performance of all implementations addressed in this paper is scaled to have value 1.0. Time consumption by the above function is 70 CPU time units in case of Numeric arrays, 147 for numarray objects, and 151 for numpy (v1.0.3.1) arrays. First, these results show how much slower the loops run in Python compared with Fortran, C, or C++. Second, the newer implementations numarray and numpy are about a factor of two slower than the old Numeric on such scalar operations. Similar loops implemented in Matlab (from version 6.5 and onwards, when loops are optimized by a just-in-time compiler) run much faster, in fact as fast as the vectorized Python implementation (Section 2.4).

The syntax u[i,j,k] implies that a tuple object (i,j,k) is used as index argument. In older versions of Numeric the array look-up could be faster if each dimension was indexed separately, as in plain C arrays: u[i][j][k]. We have tested the efficiency of u[i,j,k] versus u[i][j][k] in the loops above, and it appears that the latter is always slower than the former for all the most recent Numerical Python implementations.

Iterators have recently become popular in Python code. One can think of an iterator³ with name it over a Numerical Python array u such that a three-dimensional loop can be written as

it, dummy = NumPy_array_iterator(u, offset_stop=1, offset_start=1)
for i,j,k, value in it(u):
    <same code as in function scheme>
The overhead of using such an iterator is almost negligible in the present test problems, regardless of the Numerical Python implementation. The numpy implementation comes with its own iterator, called ndenumerate, such that the loops can be written as

for index, value in numpy.ndenumerate(u):
    i,j,k = index
    if i == 0 or i == u.shape[0]-1 or \
       j == 0 or j == u.shape[1]-1 or \
       k == 0 or k == u.shape[2]-1:
        continue  # next pass
    <same code as in function scheme>
³ The iterator is defined as NumPyArray_iterator in the numpyutils module of the SciTools package [4, 5].
From Table 1 we see that ndenumerate is clearly slower than the it iterator above or plain nested loops. To increase the efficiency of scalar array operations, numpy offers functions for setting/getting array entries instead of using subscripts [i,j,k]:

u.itemset(i,j,k, b)   # means u[i,j,k] = b
b = u.item(i,j,k)     # means b = u[i,j,k]
The code now requires only 51 time units. That is, numpy is faster than Numeric for scalar array operations if item and itemset are used instead of subscripting.

2.3 Psyco-Accelerated Loops

Psyco [8] is a Python module, developed by Armin Rigo, that acts as a kind of just-in-time compiler. The usage is trivial:

import psyco
scheme = psyco.proxy(scheme)
The psyco.proxy function here returns a new version of the scheme function where Psyco has accelerated various constructions in the code. Unfortunately, the effect of Psyco in the present type of intensive loops-over-arrays applications is often limited. For indexing with numpy and item/itemset the CPU time is halved, but for standard subscripting the gain is only a reduction of 15%. With Numeric, the speed-up of plain loops with Psyco is almost 30%. Since a CPU time reduction of 2-3 orders of magnitude is actually required, we conclude that Psyco is insufficient for improving the performance significantly.

The CPU time results so far show that array indexing in Python is slow. For one-dimensional PDE problems it may often be fast enough on today’s computers, but for three-dimensional problems one must avoid explicit indexing and use instead the techniques discussed in the forthcoming text.

2.4 Vectorized Code

Users of problem-solving environments such as Maple, Mathematica, Matlab, R, and S-Plus, which offer interpreted command languages, know that loops over arrays run slowly and that one should attempt to express array computations as a set of basic operations, where each basic operation applies to (almost) the whole array, and the associated loop is implemented efficiently in a C or Fortran library. The rewrite of “scalar” code in terms of operations on arrays such that explicit loops are avoided, is often referred to as vectorization [6]. For example, instead of looping over all array indices and computing the sine function of each element, one can just write sin(x). The whole array x is then sent to a C routine that carries out the computations and returns the result as an array.

In the present test problems, we need to vectorize the scheme function, using Numerical Python features. First, the harmonic mean computations
can be carried out on the whole array at once (utilizing overloaded arithmetic operators for array objects). Second, the combination of geometric neighboring values must be expressed as array slices shifted in the positive and negative space directions. A slice in Python is expressed as a range of indices, e.g., 1:n, where the first number is the lowermost index and the upper number is the last index minus one (i.e., 1:n corresponds to indices 1,2,...,n-1). Negative indices count reversely from the end, i.e., 1:-1 denotes all indices from 1 up to, but not including, the last index. If the start or end of the slice is left out, the lowermost or uppermost index is assumed as limit. Now, u[1:-1,1:-1,1:-1] represents a view to all interior points, while u[2:,1:-1,1:-1] has a shift “i+1” in the first index. Vectorization of the code in Section 2.2 can take the following form, using shifted array slices:

def schemev(unew, u, a, dx, dy, dz, dt):
    nx, ny, nz = u.shape;  nx -= 1;  ny -= 1;  nz -= 1
    dx2 = dx*dx;  dy2 = dy*dy;  dz2 = dz*dz

    a_c  = 1.0/a[1:-1,1:-1,1:-1]
    a_ip = 2.0/(a_c + 1.0/a[2:,1:-1,1:-1])
    a_im = 2.0/(a_c + 1.0/a[:-2,1:-1,1:-1])
    a_jp = 2.0/(a_c + 1.0/a[1:-1,2:,1:-1])
    a_jm = 2.0/(a_c + 1.0/a[1:-1,:-2,1:-1])
    a_kp = 2.0/(a_c + 1.0/a[1:-1,1:-1,2:])
    a_km = 2.0/(a_c + 1.0/a[1:-1,1:-1,:-2])

    unew[1:-1,1:-1,1:-1] = u[1:-1,1:-1,1:-1] + dt*( \
        (a_ip*(u[2:,1:-1,1:-1]   - u[1:-1,1:-1,1:-1]) - \
         a_im*(u[1:-1,1:-1,1:-1] - u[:-2,1:-1,1:-1]))/dx2 + \
        (a_jp*(u[1:-1,2:,1:-1]   - u[1:-1,1:-1,1:-1]) - \
         a_jm*(u[1:-1,1:-1,1:-1] - u[1:-1,:-2,1:-1]))/dy2 + \
        (a_kp*(u[1:-1,1:-1,2:]   - u[1:-1,1:-1,1:-1]) - \
         a_km*(u[1:-1,1:-1,1:-1] - u[1:-1,1:-1,:-2]))/dz2)
    return unew
All the slow loops have now been removed. However, since all arithmetic operations on array objects are necessarily binary (pair-wise) operations, a lot of temporary arrays are created to store intermediate results in compound arithmetic expressions. Python is good at removing such temporaries fast, but their allocation and deallocation imply some overhead. The vectorized version of the finite difference scheme, still implemented solely in Python code, requires only 2.3 CPU time units when numpy arrays are used. The corresponding numbers are 3.1 for Numeric arrays and 4.6 for numarray objects.⁴ In this case, the recent numpy implementation outperforms the older Numeric. For many applications where other parts of the code, e.g. involving
⁴ On a MacBook Pro running Mac OS X 10.5, vectorization was even less efficient, the CPU time numbers being 4.1, 5.2, and 5.5 for numpy, Numeric, and numarray, respectively. Most other timing results on the Mac were compatible with those on the IBM computer.
I/O, consume significant CPU time, vectorized implementation of finite difference stencils may exhibit satisfactory performance.

2.5 Inline C++ Code Using Weave

The Weave tool [15], developed by Eric Jones, comes with the SciPy package and allows us to write the loops over the arrays in C++ code that will be automatically compiled, linked, and invoked from Python. The C++ code employs Blitz++ arrays, and the corresponding subscripting syntax must be applied in the C++ code snippet. Let us exemplify the use in our case:

from scipy import weave    # Weave comes with SciPy

def schemew(unew, u, a, dx, dy, dz, dt):
    nx, ny, nz = u.shape;  nx -= 1;  ny -= 1;  nz -= 1
    dx2 = dx*dx;  dy2 = dy*dy;  dz2 = dz*dz
    code = r"""
    int i,j,k;
    double a_c, a_ip, a_im, a_jp, a_jm, a_kp, a_km;
    for (i=1; i<nx; i++) {
      for (j=1; j<ny; j++) {
        for (k=1; k<nz; k++) {
          /* the same harmonic-mean and stencil update as in the function
             scheme of Section 2.2, written with Blitz++ indexing:
             a(i,j,k), u(i,j,k), unew(i,j,k)
             (the loop body is abbreviated in this excerpt) */
        }
      }
    }
    """
    weave.inline(code, ['unew', 'u', 'a', 'nx', 'ny', 'nz',
                        'dx2', 'dy2', 'dz2', 'dt'],
                 type_converters=weave.converters.blitz,
                 compiler='gcc')
    return unew
The weave.inline function accepts the C++ code as a string, followed by a list of all variables in the Python code that must be transferred to the C++ code, and finally some information on the type of C++ array and the compiler we want to use. The C++ code string is embedded in a C++ function in an extension module, and this function is automatically called. No compilation or linking is performed if not strictly necessary. The Weave version of our stencil runs as fast as a pure Fortran or C implementation of the whole problem. Migration to C++ via Weave might
therefore be an attractive strategy since all program statements are kept in the Python files and the creation of extension modules is automatic. This can be convenient when developing comprehensive PDE solver packages where intensive array numerics constitutes a small fraction of the total code volume.

2.6 Migration of Loops to Fortran

The F2PY [3] tool, by Pearu Peterson, makes it very easy to combine Fortran code with Python. In our case we may implement the loops over the arrays as a Fortran 77 subroutine and call that routine from Python. The Fortran subroutine may be implemented as follows:

      subroutine stencil(unew, u, a, nx, ny, nz, dx, dy, dz, dt)
      integer nx, ny, nz
      real*8 unew(0:nx, 0:ny, 0:nz), u(0:nx, 0:ny, 0:nz)
      real*8 a(0:nx, 0:ny, 0:nz)
      real*8 dx, dy, dz, dt, dx2, dy2, dz2
Cf2py intent(in, out) unew
      integer i, j, k
      real*8 a_ip, a_im, a_jp, a_jm, a_kp, a_km, a_c
      dx2 = dx*dx
      dy2 = dy*dy
      dz2 = dz*dz
      do k = 1, nz-1
        do j = 1, ny-1
          do i = 1, nx-1
            a_c  = 1.0/a(i,j,k)
            a_ip = 2.0/(a_c + 1.0/a(i+1,j,k))
            a_im = 2.0/(a_c + 1.0/a(i-1,j,k))
            a_jp = 2.0/(a_c + 1.0/a(i,j+1,k))
            a_jm = 2.0/(a_c + 1.0/a(i,j-1,k))
            a_kp = 2.0/(a_c + 1.0/a(i,j,k+1))
            a_km = 2.0/(a_c + 1.0/a(i,j,k-1))

            unew(i,j,k) = u(i,j,k) + dt*(
     &        (a_ip*(u(i+1,j,k) - u(i,j,k)) -
     &         a_im*(u(i,j,k) - u(i-1,j,k)))/dx2 +
     &        (a_jp*(u(i,j+1,k) - u(i,j,k)) -
     &         a_jm*(u(i,j,k) - u(i,j-1,k)))/dy2 +
     &        (a_kp*(u(i,j,k+1) - u(i,j,k)) -
     &         a_km*(u(i,j,k) - u(i,j,k-1)))/dz2)
          end do
        end do
      end do
      return
      end
The line starting with Cf2py is a special comment line to tell F2PY that unew is both an input and output argument. F2PY tries to make the Fortran routine look as “Pythonic” as possible from the Python side: output arguments are returned, and array sizes that can be extracted from the supplied Numerical Python arrays become optional arguments with correct default values.
Say this routine is saved in a file stencil.f. The F2PY command

f2py -m floops -c stencil.f
parses the Fortran code, generates wrapper code in C, and compiles and links an extension module floops that can be called from Python:

import floops
unew = floops.stencil(unew, u, a, dx, dy, dz, dt)
Notice that the array sizes nx, ny, and nz are left out of the call (as one would do in pure Python). The idea of having unew as input and output argument is important. If it were just an output argument, which is natural from the mathematical formula, F2PY would generate code that allocates the returned array each time the routine is invoked. To eliminate this overhead, we supply the array from the Python side.

The storage of arrays in Python and Fortran differs. Numerical Python employs C arrays, which are stored “row by row”, i.e., the last index varies fastest. Fortran arrays, on the other hand, are stored “columnwise”, i.e., the first index varies fastest. We therefore need to transpose a Numerical Python array before Fortran operates on it. This transpose operation is automatically done by the F2PY-generated wrapper code, but it may be time consuming and thus degrade the performance. We therefore strongly recommend programmers to explicitly convert their arrays to Fortran storage before Fortran routines are called. The conversion is done by
or the order argument can be given to the functions that construct numpy arrays. (In older versions of F2PY the conversion was done by a function in the compiled extension module.) A related issue of concern is type incompatibility: if a Numerical Python array argument does not have the same element type in Python and Fortran, F2PY generates code that automatically takes a copy of the input array and converts all elements to the type accepted by Fortran. Such a copy may lead to significant overhead, so careful checking of type compatibility of arrays is of utmost importance for performance. When the loops are migrated to Fortran and the Fortran routine is called from Python, the speed of the program is optimal, in the sense that a standalone pure Fortran code runs at (approximately) the same speed. These tests employ the new numpy-based version of F2PY. Earlier versions, combined with Numeric, show a similar optimal performance. 2.7 Migrating Loops to C The classical way of importing C code into Python is to write a complete extension module by hand. This requires much more work and competence
350
H.P. Langtangen and X. Cai
than using Fortran and F2PY. Fortunately, there is extensive documentation on how to write extension modules [5, 14]. First, the tuple of function arguments received from Python must be unpacked, and each argument, which is a Python object, must be converted to a C built-in type or a C struct. Numerical Python arrays are typically accessible as C structs (of type PyArrayObject) in C code. Parts of the source code of such an extension module is shown below. #include "Python.h" #include "Numeric/arrayobject.h" static PyObject *stencil (PyObject *self, PyObject *args) { PyArrayObject *unew_array, *u_array, *a_array; double *unew, *u, *a; int i, j, k, nx, ny, nz, offset0, offset1; int ijk, ip1jk, im1jk, ijp1k, ijm1k, ijkp1, ijkm1; double a_ip, a_im, a_jp, a_jm, a_kp, a_km, a_c; double dx, dy, dz, dt, dx2, dy2, dz2; if (!PyArg_ParseTuple(args, "O!O!O!dddd:stencil", &PyArray_Type, &unew_array, &PyArray_Type, &u_array, &PyArray_Type, &a_array, &dx, &dy, &dz, &dt)) { return NULL; /* in case of an exception */ } nx = u_array->dimensions[0]-1; ny = u_array->dimensions[1]-1; nz = u_array->dimensions[2]-1; /* access to the data arrays in Numerical Python objects */ unew = (double*)unew_array->data; u = (double*)u_array->data; a = (double*)a_array->data; /* beginning of core code */ dx2 = dx*dx; dy2 = dy*dy; dz2 = dz*dz; offset0 = (ny+1)*(nz+1); offset1 = nz+1; ijk = offset0 + offset1; for (i=1; i
1.0/a[ip1jk]); 1.0/a[im1jk]); 1.0/a[ijp1k]); 1.0/a[ijm1k]);
Python for High-Performance Computing
351
a_kp = 2.0/(a_c + 1.0/a[ijkp1]); a_km = 2.0/(a_c + 1.0/a[ijkm1]); unew[ijk] = u[ijk] + dt*( \ (a_ip*(u[ip1jk] - u[ijk]) - \ a_im*(u[ijk] - u[im1jk]))/dx2 + \ (a_jp*(u[ijp1k] - u[ijk]) - \ a_jm*(u[ijk] - u[ijm1k]))/dy2 + \ (a_kp*(u[ijkp1] - u[ijk]) - \ a_km*(u[ijk] - u[ijkm1]))/dz2); } ijk += 2; } ijk += 2*offset1; } /* end of core code */ return PyArray_Return(unew_array); }
When writing the above function we have utilized the fact that a threedimensional Numerical Python array has a contiguous underlying memory layout, with the k-index varying most rapidly (then j and finally i). An initialization routine and a table are also needed to complete the code of the extension module. The source code must be compiled and linked with Numerical Python to form a shared library that can act as a module in Python. The usage of this module is exactly the same as demonstrated in the Fortran case. We also mention that F2PY can be used to simplify the writing of C extension modules, because F2PY can wrap C functions in the same manner as Fortran subroutines. In the present example we can write a C function with the arguments as in the Fortran case, except that a multi-dimensional array must be a single pointer since Numerical Python array data are available as a single pointer only. The function signature becomes void stencil(double *unew, double *u, int nx, int ny, int nz, double dx, double dy, double dz, double dt)
The function body consists of the lines between the comments /* beginning of core code */ and /* end of core code */ in the previous stencil function with PyObjects* arguments (variable declarations of dx2 etc. are also needed). F2PY cannot automatically figure out which integer variables that describe the sizes of arrays in C so we need to write an equivalent Fortran subroutine signature in F2PY’s Fortran 90 module syntax. This can be written directly in a .pyf file (see the F2PY manual), or one can automatically generate the .pyf file from a Fortran 77 file with the subroutine signature. The rest of the task with building the extension module in C is as easy as building a Fortran extension module with F2PY. The performance of both types of C function (complete hand-written extension module or F2PY-generated module from C code) is the same as for the F2PY-wrapped Fortran subroutine, i.e., all implementations run at the same speed as the corresponding pure Fortran or C stand-alone program.
2.8 Stand-Alone C and Fortran Programs

We have also developed pure C and Fortran programs to check the performance when no Python code is involved. The Fortran version merely calls the previously shown Fortran subroutine, while the C version applies native three-dimensional C arrays and indexing like u[i][j][k]. This gives a slightly different implementation than in the stencil function in the hand-written C extension module (where the arrays are Numerical Python array structs). Nevertheless, we could not observe any differences in the speed compared with the similar Fortran program – or the Python programs that called Fortran and C implementations of the loops.

Table 1. Scaled CPU times for various implementations of a finite difference scheme (4), combined with different types of Numerical Python arrays. The problem size corresponds to a grid with 80³ cells. The experiments are run with Python 2.5, Numeric 24.2, numarray 1.5.2, and numpy 1.0.5

  Implementation                                                CPU time
  stand-alone C program                                            1.0
  stand-alone Fortran 77 program                                   1.0
  Python code, but loops in Fortran 77; Numeric, numarray
    and numpy                                                      1.0
  Python code, but loops in C; Numeric and numarray                1.0
  schemew: C++ loops with weave.inline (and numpy)                 1.0
  schemev: vectorization via slices, numpy                         2.3
  schemev: vectorization via slices, Numeric                       3.1
  schemev: vectorization via slices, numarray                      4.6
  scheme w/Psyco: plain loops w/item/itemset, numpy                 22
  scheme: plain loops w/item/itemset, numpy                         51
  scheme w/Psyco: plain loops w/u[i,j,k], numpy                    129
  scheme: plain loops w/u[i,j,k], numpy                            151
  scheme w/Psyco: plain loop w/it iterator, numpy                  133
  scheme: plain loop w/it iterator, numpy                          158
  scheme w/Psyco: plain loop w/ndenumerate, numpy                  135
  scheme: plain loop w/ndenumerate, numpy                          167
  scheme w/Psyco: plain loops w/u[i,j,k], Numeric                   70
  scheme: plain loops w/u[i,j,k], Numeric                           97
  scheme w/Psyco: plain loop w/it iterator, Numeric                 72
  scheme: plain loop w/it iterator, Numeric                         98
  scheme w/Psyco: plain loops w/u[i,j,k], numarray                 120
  scheme: plain loops w/u[i,j,k], numarray                         147
3 Python and High-Performance Parallel Computing

To push the pursuit of performance even further, it is obvious that we should adopt parallel computing. As mentioned before, having the vast portion
of a scientific code implemented in Python is advantageous, since large parts of the code often deal with managing tasks for which Python is superior to traditional compiled languages. If the computationally intensive loops are migrated to Fortran and C, as described in the previous section, close-to-optimal execution speed is obtained. Nevertheless, on parallel computers the loops must be partitioned, and data must be communicated using the message passing paradigm and the MPI library. It would be ideal if the compiled loops developed for a scalar computer could be reused in a parallel context and if the message passing could be invoked from high-level Python statements. This section thus aims to examine how much extra communication overhead will be induced by using Python in the context of parallel computing. The attention is restricted to distributed memory and message passing, because shared-memory parallel programming via Python is so far not an option.

3.1 Parallel Programming via Python

Several Python extension modules exist for calling MPI commands. With respect to efficiency, the best known MPI module is pypar [10], which provides an interface to a subset of the MPI commands. High performance of pypar is due to the fact that Numeric arrays can be handled directly using the underlying C arrays, not as general Python objects. (For access to more MPI functionality, the reader is referred to pyMPI [9].) Writing MPI programs in Python follows the same principles as in C or Fortran. Take the three-dimensional wave equation (2) for example. Parallelism arises from dividing the entire computational mesh into subdomains, such that each subdomain is responsible for computing $u^{\ell+1}$ on its local mesh. More specifically, each subdomain computes on its assigned interior points by the following triple for-loop:

for i in xrange(1,nx_loc):
    for j in xrange(1,ny_loc):
        for k in xrange(1,nz_loc):
            <same code as in function scheme>
We remark that the only difference is that nx_loc, ny_loc, nz_loc have replaced nx, ny, nz from Section 2.2. Between neighboring subdomains, communication takes the following form:

• unew_loc[1,:,:] is sent to the negative-x neighbor.
• unew_loc[0,:,:] is received from the negative-x neighbor.
• unew_loc[-2,:,:] is sent to the positive-x neighbor.
• unew_loc[-1,:,:] is received from the positive-x neighbor.
• unew_loc[:,1,:] is sent to the negative-y neighbor.
• unew_loc[:,0,:] is received from the negative-y neighbor.
• unew_loc[:,-2,:] is sent to the positive-y neighbor.
• unew_loc[:,-1,:] is received from the positive-y neighbor.
• unew_loc[:,:,1] is sent to the negative-z neighbor.
• unew_loc[:,:,0] is received from the negative-z neighbor.
• unew_loc[:,:,-2] is sent to the positive-z neighbor.
• unew_loc[:,:,-1] is received from the positive-z neighbor.
The actual pypar commands for exchanging a message between two neighbors can be as follows:

pypar.send(unew_loc[1,:,:], negative_x_neigh_id, bypass=True)
pypar.receive(negative_x_neigh_id, buffer=x_buffer, bypass=True)
unew_loc[0,:,:] = x_buffer
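For completeness, the exchange in one space direction might be wrapped as in the sketch below; the neighbor bookkeeping (negative_x_neigh_id and friends, with -1 meaning a physical boundary) and the pre-allocated receive buffers are our assumptions, not code from the benchmark, and a production code would typically stagger the send/receive order between neighbors.

import pypar

def exchange_x(unew_loc, negative_x_neigh_id, positive_x_neigh_id,
               x_buffer_lo, x_buffer_hi):
    # Halo exchange in the x direction only; a neighbor id of -1 means that
    # this subdomain touches the physical boundary on that side.
    if negative_x_neigh_id >= 0:
        pypar.send(unew_loc[1,:,:], negative_x_neigh_id, bypass=True)
        pypar.receive(negative_x_neigh_id, buffer=x_buffer_lo, bypass=True)
        unew_loc[0,:,:] = x_buffer_lo
    if positive_x_neigh_id >= 0:
        pypar.send(unew_loc[-2,:,:], positive_x_neigh_id, bypass=True)
        pypar.receive(positive_x_neigh_id, buffer=x_buffer_hi, bypass=True)
        unew_loc[-1,:,:] = x_buffer_hi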
Notice that the pypar commands are simpler in syntax than their C counterparts. The option bypass=True is essential for skipping unnecessary checks of the message, thereby securing efficiency. Likewise, the option buffer=x_buffer is important for efficiency in the receive command. Here we assume that x_buffer is a pre-allocated array of dimension (ny_loc+1, nz_loc+1). For more details about parallelizing finite difference codes using pypar, we refer to [1, 2].

3.2 Migration of Computations to C

To secure high performance of the serial computations in the parallel code, it is important to migrate the Python code kernel to C or Fortran (as explained in the previous section). The serial C function stencil can be reused in a parallel context (by interpreting nx as nx_loc, and similarly for ny and nz).

3.3 Measurements

We compare the performance of the above mixed Python-C code with that of a matching pure C code, for which the wave equation (2) is solved on a three-dimensional 200 × 200 × 200 mesh using 692 time steps. In Table 2, wall-clock time measurements are listed together with the corresponding speedup results obtained on different numbers of processors. The parallel hardware platform used is a cluster of 3.4 GHz Itanium2 processors, interconnected through 1 Gbit Ethernet. The C compiler used is GNU gcc version 3.4.5, whereas the version of pypar is 1.9.1.

To show that the computation-communication ratio has a much larger impact than the Python-induced communication overhead, we also report in Table 2 the performance of simplified Python-C and C codes used to solve (5) when a is constant. (Measurements for these simplified codes were reported in [1] for a much smaller cluster using up to 16 processors.) For this simplified case, the finite difference discretization results in the following numerical scheme:

\[
\frac{u^{\ell+1}_{i,j,k} - 2u^{\ell}_{i,j,k} + u^{\ell-1}_{i,j,k}}{\Delta t^2}
= a^2\,\frac{u^{\ell}_{i+1,j,k} - 2u^{\ell}_{i,j,k} + u^{\ell}_{i-1,j,k}}{\Delta x^2}
+ a^2\,\frac{u^{\ell}_{i,j+1,k} - 2u^{\ell}_{i,j,k} + u^{\ell}_{i,j-1,k}}{\Delta y^2}
+ a^2\,\frac{u^{\ell}_{i,j,k+1} - 2u^{\ell}_{i,j,k} + u^{\ell}_{i,j,k-1}}{\Delta z^2}.
\]
Table 2. Wall-clock time measurements (in seconds) and speedup results associated with solving two three-dimensional wave equations with constant and variable coefficients, respectively. For each equation, the performance of a mixed Python-C code is compared with that of a pure C code
                 Wave equation with constant a              Wave equation with a(x, y, z)
       Mixed Python-C        Pure C             Mixed Python-C        Pure C
P      Time     Speedup      Time     Speedup   Time      Speedup     Time      Speedup
1      88.241   N/A          74.140   N/A       1045.15   N/A         1050.81   N/A
2      48.074   1.84         40.826   1.82      532.849   1.96        532.715   1.97
4      26.507   3.33         21.824   3.40      271.093   3.86        268.518   3.91
8      16.717   5.28         13.354   5.55      140.105   7.46        136.611   7.69
16     9.576    9.21         7.954    9.32      73.599    14.20       70.147    14.98
32     6.103    14.46        5.583    13.28     38.743    26.99       38.816    27.07
64     4.718    18.70        3.900    19.01     21.556    48.49       21.162    49.66
128    3.107    28.40        2.417    30.67     11.960    87.39       11.459    91.70
The corresponding Python code kernel for the above numerical scheme, which is much less computation intensive than that from Section 2.2, is shown below, with Cx2, Cy2, Cz2 having the constant values a²∆t²/∆x², a²∆t²/∆y², a²∆t²/∆z². Of course, the actual computation has been migrated from Python to C for efficiency.

for i in xrange(1,nx):
    for j in xrange(1,ny):
        for k in xrange(1,nz):
            unew[i,j,k] = 2.0*u[i,j,k] - uold[i,j,k] + \
                          Cx2*(u[i+1,j,k] - 2*u[i,j,k] + u[i-1,j,k]) + \
                          Cy2*(u[i,j+1,k] - 2*u[i,j,k] + u[i,j-1,k]) + \
                          Cz2*(u[i,j,k+1] - 2*u[i,j,k] + u[i,j,k-1])
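As a side note, the same update can be written without loops through array slicing, which is the kind of vectorization referred to in Section 4; the following one-statement sketch is ours and assumes that u, uold, and unew are Numerical Python arrays holding nx+1, ny+1, and nz+1 points in the three directions:

# vectorized stencil over all interior points (assumed array shape (nx+1, ny+1, nz+1))
unew[1:-1,1:-1,1:-1] = 2.0*u[1:-1,1:-1,1:-1] - uold[1:-1,1:-1,1:-1] + \
    Cx2*(u[2:,1:-1,1:-1] - 2*u[1:-1,1:-1,1:-1] + u[:-2,1:-1,1:-1]) + \
    Cy2*(u[1:-1,2:,1:-1] - 2*u[1:-1,1:-1,1:-1] + u[1:-1,:-2,1:-1]) + \
    Cz2*(u[1:-1,1:-1,2:] - 2*u[1:-1,1:-1,1:-1] + u[1:-1,1:-1,:-2])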
We can observe from Table 2 that solving the wave equation with variable coefficient a(x, y, z) is drastically more computation intensive than solving the wave equation with constant a, mainly because of the calculation of a_{i+1/2,j,k}, a_{i−1/2,j,k}, a_{i,j+1/2,k}, a_{i,j−1/2,k}, a_{i,j,k+1/2}, and a_{i,j,k−1/2}. This makes the computation-communication ratio more favorable for the case with variable coefficient a(x, y, z), and thereby better speedup results are obtained. We argue that real-life numerical codes have a computational intensity similar to the case of a Laplacian operator with a variable coefficient. Therefore, the extra communication overhead induced by Python will not be a real obstacle for performance. The results in Table 2 substantiate this important conclusion.
4 Conclusions

Many computational scientists have recently noticed that Python is emerging as a very interesting platform for writing scientific codes. To assist
scientists with measurements of the performance of Python for numerical computing, we have extended previous performance tests in the literature in order to provide more relevant data. The new tests concern PDEs involving variable-coefficient Laplace operators, discretized by finite difference methods. The resulting stencil operations constitute time-consuming building blocks in many numerical methods for solving PDEs. We have looked at several ways of implementing a finite difference stencil in a Python program. Pure Python loops over Numerical Python arrays run very slowly, requiring two orders of magnitude longer CPU times than a similar pure Fortran program. This slowdown depends on the type of implementation of Numerical Python. The new numpy package is inferior to the old Numeric package for standard subscripting, but much faster if the special new functions item and itemset are used to get and set array values. Several techniques exist for speeding up Python loops. The just-in-time compiler Psyco is the simplest technique, but showed modest improvements in the present application (15% to 50%). Performance at the level of Matlab and even compiled languages requires either vectorization of array expressions or migration of the loops to native Fortran, C, or C++ code. In the latter case, performance at the same level as pure Fortran and C codes can be obtained, while in the former case we observed an efficiency loss of a factor of 2.3 for numpy. Our tests indicate that numpy is superior to Numeric and numarray for vectorized expressions. The scalar codes mentioned above have also been extended to parallel platforms with distributed memory. All communication, based on the message passing technique, is conveniently programmed (using a simpler syntax) in Python. Our comparison with a similar parallel program written entirely in C/MPI shows that Python can be used for parallelization with negligible loss of computational efficiency, at least for the present type of stencil operations. In comparison with our earlier findings reported in [1, 2], the better computation-communication ratio, which is due to the variable coefficient, gives us more reasons to use Python-interfaced MPI commands. This is because the speedup differences between a mixed Python-C implementation and a pure C implementation are considerably smaller than those in the case of a constant coefficient. Moreover, the advantages of a simpler MPI syntax and access to the elegant and efficient array slicing functionality prevail for the Python-C (or Python-Fortran) combination. We conclude that Python is a very interesting language for writing scientific codes. Python is superior to Fortran, C, and C++ for all non-numerical tasks in a simulation code. For the computationally intensive tasks, plain nested loops over multi-dimensional arrays run slowly in Python, but with the simple techniques listed in the previous sections, one can easily achieve the performance of compiled Fortran and C code for both serial and parallel applications.
Acknowledgement. The staff at USIT, University of Oslo, is acknowledged for providing technical support when using their parallel computing platform. The authors would also like to thank Ola Skavhaug and Rolv Bredesen for interesting discussions on performance issues in Python.
References

1. X. Cai and H. P. Langtangen. Parallelizing PDE solvers using the Python programming language. In A. M. Bruaset and A. Tveito, editors, Numerical Solution of Partial Differential Equations on Parallel Computers, volume 51 of Lecture Notes in Computational Science and Engineering, pages 295–325. Springer-Verlag, 2006.
2. X. Cai, H. P. Langtangen, and H. Moe. On the performance of the Python programming language for serial and parallel scientific computations. Scientific Programming, 13(1):31–56, 2005.
3. F2PY software package. http://cens.ioc.ee/projects/f2py2e.
4. H. P. Langtangen. Scripting utilities for [5], 2006. http://folk.uio.no/hpl/scripting.
5. H. P. Langtangen. Python Scripting for Computational Science. Springer, 3rd edition, 2008.
6. Matlab code vectorization guide. http://www.mathworks.com/support/tech-notes/1100/1109.html, 2004.
7. Netlib repository of numerical software. http://www.netlib.org.
8. Psyco home page. http://psyco.sourceforge.net/, 2004.
9. PyMPI software package. http://sourceforge.net/projects/pympi, 2004.
10. PyPar software package. http://datamining.anu.edu.au/~ole/pypar, 2004.
11. P. Ramachandran. Performance of various Python implementations for solving the 2D Laplace equation. http://www.scipy.org/PerformancePython.
12. SciPy software package. http://www.scipy.org.
13. Software for running the computational experiments in the present paper. http://folk.uio.no/xingca/python/efficiency2/.
14. G. van Rossum and F. L. Drake. Extending and Embedding the Python Interpreter. http://docs.python.org/ext/ext.html.
15. Weave. http://www.scipy.org/documentation/weave. Part of the SciPy package.
Designing Learning Control that is Close to Instability for Improved Parameter Identification

Richard W. Longman¹, Kevin Xu², and Benjamas Panomruttanarug³

¹ Columbia University, New York, NY, USA, [email protected]
² Columbia University, New York, NY, USA, [email protected]
³ Columbia University, New York, NY, USA, [email protected]
Abstract Iterative learning control (ILC) uses an iteration in hardware, adjusting the input to a system in order to converge to zero tracking error while following a desired system output. Convergence is sensitive to model error, and errors that are sufficiently large to cause divergence produce inputs that particularly excite unmodeled or poorly modeled dynamics, producing experimental data that is focused on what is wrong with the current model. A separate paper studied the overall concept, and specifically addressed issues of model order error. The first purpose of this paper is to develop modified ILC laws that are intentionally non-robust to model errors, as a way to fine tune the use of ILC for identification purposes. The second purpose is to study this non-robustness with respect to its ability to improve identification of system parameters when the model order is correct. It is demonstrated that in many cases the approach makes the learning particularly sensitive to relatively small parameter errors in the model, but the sensitivity is sometimes limited to parameter errors of a specific sign.
1 Introduction

The optimal experiment design field most often makes use of the Fisher information matrix, and develops methods to generate a sequence of inputs to experiments aiming to maximize a likelihood function [1–3]. This type of approach is based on stochastic modeling, and necessarily uses the current model in deciding how to optimize the next experiment. In [4] a totally different approach is considered that also develops a sequence of experiments, each one based on the results of the previous one. But this time, use is made of iterative learning control (ILC), which is a relatively new field that develops iterative methods to adjust the input to a system aiming to converge to zero error in the system output following a desired trajectory [5–10]. A major objective in the ILC field is to find ways to make the decay of the errors robust to model errors. When using ILC for experiment design, one instead takes
advantage of the lack of robustness to model errors, and lets the iterations progress, going unstable and creating larger and larger signals that isolate and amplify information concerning what is sufficiently wrong with the model to produce instability. Then system identification algorithms such as in [11–13] can be used on the data to correct the model. There are very many approaches to designing iterative learning control laws. Since our experiment design objective is different from that of the ILC designer, we want a learning law whose stability is particularly sensitive to model errors. The main purpose of this paper is to develop ILC design approaches that deliberately make the stability of the iterations very sensitive to errors in the parameters of a model. Reference [4] uses an ILC design that is based on linear-quadratic optimal control theory, and this design is somewhat more robust to model errors than most. The approach used here starts from the phase cancellation ILC design method of [9] and investigates several ways to modify it so that the stability robustness is small. The emphasis in [4] was on model errors that relate to the order of the model, i.e. model errors that relate to missing dynamics such as parasitic poles or residual second order modes. Here we put the emphasis on model errors when the model order is correct but the model coefficients are inaccurate.
2 A Condition for Decay or Growth of Error with ILC Iterations

This section presents the basic formulation and certain important properties of ILC. See [7] for more detail using the same approach. Let y^*(k) be a chosen desired system output that is p time steps long. In the first run or iteration, one applies a chosen p step input and records the response. After each iteration the system is reset to the same initial starting conditions, and the input is updated according to an ILC law and applied to the system. Let subscript j denote the iteration number, and write the real world dynamics as a single input, single output difference equation

x_j(k + 1) = A x_j(k) + B u_j(k);   y_j(k) = C x_j(k)   (1)
Define the output error e_j(k) = y^*(k) − y_j(k). The actual dynamics may be governed by a linear differential equation, and the input is applied through a zero order hold, holding u_j(k) constant throughout time step k. Then (1) can represent this differential equation without approximation, and the original differential equation can be recovered uniquely provided the sample rate is high enough to avoid aliasing. Use underbars to indicate a column vector of the history of a variable for iteration j. Then u_j is a column vector of inputs u(k) for time steps 0 through p − 1, and y_j, e_j are similar except that they start and end one time step later. This one time step shift is incorporated into the definitions to account for the usual one time step delay between a
change in the input and the first time step a change is seen in the sampled output. A general linear ILC law takes the form u_{j+1} = u_j + L e_j, where L is a p by p matrix of learning control gains. One can write the solution to (1) for p time steps in terms of the convolution sum, making use of the lower triangular Toeplitz matrix P of Markov parameters, whose diagonal elements are all CB, all elements in the first subdiagonal are CAB, and continuing in this manner to the element CA^{p−1}B in the lower left corner. Then one can give the error history evolution with iterations as

e_{j+1} = (I − PL) e_j;   max_i |λ_i(I − PL)| < 1   (2)
where I is the p by p identity matrix, and the second equation in (2) gives the stability boundary, i.e. satisfying it guarantees convergence to zero tracking error for all possible initial inputs. Suppose that the matrix learning law L has a Toeplitz structure so that all entries along any given diagonal are the same. Then L is a finite time version of a transfer function, which we denote as L(z). Let G(z) be the z-transfer function of system (1). Then one can take transforms of the system and the learning law to obtain

E_{j+1}(z) = [1 − G(z)L(z)] E_j(z)   (3)
The square brackets represent a transfer function from the error in one iteration to that in the next. Set z = exp(iωT) in this expression to form the frequency transfer function, where T is the sample time and ω is the radian frequency. If the inequality

|1 − G(e^{iωT}) L(e^{iωT})| < 1   ∀ω   (4)
is satisfied for all frequencies, then every steady state frequency component of the error will decay monotonically with iterations. This does not guarantee stability because of the transient parts of the trajectory. But if one makes p large enough compared to the time constants of the system and picks an appropriate y^*, the behavior described by (4) will dominate the responses for the early iterations, even if the learning process is actually unstable [10]. We will often study the learning behavior of (4) by plotting G(z)L(z) for z = exp(iωT) with ωT from zero to π, i.e. from zero frequency to the Nyquist frequency. When plotted as in Fig. 1, if the curve stays inside the unit circle centered at +1, then (4) is satisfied at all frequencies. The radial distance from +1 to a point on the curve for a specific frequency indicates the factor by which the amplitude of a component of the error at that frequency will be multiplied every iteration. If that number is greater than one, that frequency component is amplified every iteration. The design problem for ILC requires producing a compensator that moves the curve inside the unit circle for all frequencies in order to produce decaying error. Our objective is to make an ILC law that is deliberately non-robust, so that small errors in the parameters of the model
used for the ILC design will make the ILC iterations become unstable. We will do this by designing a learning law that moves all frequencies to points that are inside the unit circle based on the current model. But the points are chosen to be very near the stability boundary, so that small model errors are likely to put the learning process outside the unit circle. Then the error components for those frequencies for which the model was inaccurate enough to make the ILC unstable will grow with iterations. With enough iterations the errors will be arbitrarily amplified, so that system identification will be able to see and correct the parameter error.
Figure 1. Definitions of circles and points (A, B, C, D, E, F, H, I, J) for polar plots of frequency response of learning law times system
3 Creating a Deliberately Non Robust ILC Law Reference [9] develops a phase cancellation ILC law. The error is decomposed into its frequency components, a phase lead (or lag) is introduced in each component such that when it goes through the system, the system supplies the opposite phase lag (or lead). In this way every component of the error after going through the ILC law and then the system will be real and positive. This means that the plot of G(z)L(z) for z = exp(iωT ) is on the positive real axis in Fig. 1. And an appropriately chosen gain will keep it smaller than 2 so that (4) is satisfied. Experiments in [9] on a robot performing a high speed maneuver decreased the root mean square of the tracking error by a factor of nearly 1000 in about 15 to 20 iterations. Note that numerical studies suggest that this learning law is actually unstable, i.e. it does not satisfy inequality (2). Simulations will be documented elsewhere that show small wiggles near the end of the trajectory start to become evident by iteration 1000. The onset of these wiggles can be delayed by including a constant section of trajectory at the end. In any case the instability takes many iterations to appear, and
it appears starting from the end of the trajectory, while lack of satisfying (4) creates growth of error for all time steps after a settling time of the system, i.e. once steady state frequency response thinking applies. These different signatures make the two sources of growth easily distinguishable if the trajectory is chosen substantially longer than one settling time of the system. It is this phase cancellation law that we seek to alter: instead of placing the plot of G(z)L(z) on the positive real axis, we seek to place it on a circle of chosen radius r_1 (length DF in Fig. 1) which is inside the unit circle. This is done using our current model, which we denote by G_n(z) (with corresponding magnitude r_n(ω) and phase angle θ_n(ω) made with the positive real axis, for z = exp(iωT)), creating the learning law L_n(z). When we study how it behaves when applied to a real world that is different from our current model, we denote the real world behavior by G_r(z) (with corresponding r_r, θ_r). Since the circle of radius r_1 is inside the unit circle, inequality (4) is satisfied for the nominal model, but if the radius r_1 is near unity, then one expects that small model errors will send the learning process unstable. A question to be addressed is how to choose which frequency between zero and Nyquist should be placed at each point on this circle. Two methods will be suggested and studied. The most common source of instability in ILC comes from phase inaccuracy of the model at high frequency, often due to missing high frequency dynamics. A missing high frequency pole introduces more phase lag at high frequency than in the nominal model. This suggests that one place all frequencies on the lower half of the circle. Here we are interested in parameter errors, and one expects that parameter errors of one sign will produce a positive phase error, and of the opposite sign a negative phase error. For this reason, we investigate using two ILC iterations, one mapping points to the lower half of the circle, and the other mapping to the upper half. The degree to which this is effective will be investigated. The learning law is developed as follows. Given the p time step history of the error, one can see approximately p/2 discrete frequencies. These frequencies are related to (2π/(pT)) j for j = 0, 1, 2, ..., p − 1, corresponding to these numbers up through the Nyquist frequency and then folding onto existing frequencies below Nyquist. Define z_0 = exp(i2π/p) and construct the matrix H whose (α, β) component is given by H_{αβ} = z_0^{−(α−1)(β−1)}. Then He produces the discrete Fourier transform (DFT) of the error vector, where the first element is related to DC, and the next element and the last element combine to form the frequency component related to discrete frequency 2π/(pT), etc. In the frequency domain, our objective is to create L_n(z) to satisfy

L_n(e^{iωT}) G_n(e^{iωT}) = r_2(ω) e^{iθ_2(ω)};   L_n(e^{iωT}) = [r_2(ω)/r_n(ω)] e^{i(θ_2(ω) − θ_n(ω))}   (5)
Here, the chosen location for frequency ω is point F in Fig. 1, with AF being of length r_2(ω) and the angle ∠DAF being θ_2(ω), except that we wish to measure this angle in a manner consistent with the associated phase lag and
hence make it measured positive in the counterclockwise direction (the angle is negative for the point F as pictured). In matrix form we can produce this change in magnitude and phase for each of the elements related to its discrete frequency, by premultiplying He by a diagonal matrix given by
Λ = diag{ [r_2(ω_j)/r_n(ω_j)] e^{i[θ_2(ω_j) − θ_n(ω_j)]} };   ω_j = (2π/(pT)) j,   j = 0, 1, 2, ..., p − 1   (6)
Note that j = 1 and j = p − 1, which are associated with the same frequency, have opposite signs in the exponential. The frequencies in (6) go up to twice the Nyquist frequency, doing so in such a way as to accomplish the desired phase change with the right sign in both terms. Then one must convert back to the time domain using H^{−1} = (1/p)(H^*)^T, where the asterisk denotes the complex conjugate and T the transpose. The resulting learning law is

u_{j+1} = u_j + L_n e_j = u_j + (1/p)(H^*)^T Λ H e_j   (7)
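To make the construction in (6)–(7) concrete, the following NumPy sketch (ours, not from the paper) assembles the gain matrix L_n for a nominal model supplied as a callable Gn_of_z and a frequency-to-circle mapping supplied as a callable target_point; the mapping can be either of the two choices developed in Sections 3.1 and 3.2 below. All function and variable names are ours.

import numpy as np

def learning_matrix(Gn_of_z, target_point, p, r1, lower=True):
    # Assemble L_n = (1/p) (H*)^T Lambda H of equation (7); H_ab = z0^{-(a-1)(b-1)}.
    idx = np.arange(p)
    H = np.exp(-2j * np.pi * np.outer(idx, idx) / p)      # DFT matrix
    lam = np.zeros(p, dtype=complex)
    for j in range(p // 2 + 1):                           # discrete frequencies up to Nyquist
        wT = 2.0 * np.pi * j / p                          # omega_j * T = 2*pi*j/p
        Gn = Gn_of_z(np.exp(1j * wT))                     # nominal model at this frequency
        r2, th2 = target_point(wT, r1, lower)             # chosen point on the circle of radius r1
        lam[j] = (r2 / abs(Gn)) * np.exp(1j * (th2 - np.angle(Gn)))
    # Frequencies above Nyquist carry the complex conjugate entries (cf. j = 1 and j = p - 1),
    # which makes the resulting gain matrix real.
    lam[p // 2 + 1:] = np.conj(lam[1:(p + 1) // 2][::-1])
    return (H.conj().T @ np.diag(lam) @ H).real / p

# One learning update:  u_next = u + learning_matrix(Gn_of_z, target_point, p, r1) @ e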
3.1 Mapping linearly with central angle

One choice for the mapping onto the circle of radius r_1 is to map these points linearly with frequency to the angle θ_1 corresponding to ∠EDF, measured positive clockwise for placement on the lower half of the circle and positive counterclockwise on the upper half, starting at zero frequency at angle zero, and ending at Nyquist at angle 180 degrees. This produces θ_1(ω) = ωT. In order to use this statement to produce the control law (7), we need to compute for each frequency the polar coordinates r_2, θ_2 of the chosen point F. We will need the law of cosines for general triangles, which says that the square of the length of one side is equal to the sum of the squares of the other two sides minus two times the product of these two sides times the cosine of the angle between them. Use triangle ADF to compute r_2, which is AF; here AD is one, DF is r_1, and ∠ADF is π − θ_1. To find θ_2, which is ∠DAF (but adjusted for the sign convention), again use triangle DAF, but this time adjust the choice of sides in the law of cosines so that the angle involved is ∠DAF, and then solve for this angle. The results are

r_2 = sqrt[1 + r_1^2 − 2 r_1 cos(π − θ_1)];   θ_2 = ± arccos[(1 − r_1^2 + r_2^2)/(2 r_2)]   (8)
The negative sign is used for mapping to the bottom half of the circle, and the plus sign is used for mapping to the top half.
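A hedged sketch (ours) of this mapping, implementing (8) with the sign convention just described and compatible with the learning_matrix sketch given after (7):

import numpy as np

def target_point_central_angle(wT, r1, lower=True):
    # theta_1 grows linearly with frequency: theta_1 = omega*T in [0, pi]
    theta1 = wT
    r2 = np.sqrt(1.0 + r1**2 - 2.0 * r1 * np.cos(np.pi - theta1))    # law of cosines in triangle ADF
    theta2 = np.arccos((1.0 - r1**2 + r2**2) / (2.0 * r2))           # angle DAF
    return r2, (-theta2 if lower else theta2)                        # minus sign maps to the lower half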
3.2 Mapping linearly with horizontal component

Consider a second choice that maps frequencies onto the chosen circle starting with zero frequency at point E and progressing to Nyquist at point B, with the frequency made linear in the horizontal component x of point F, i.e. x = 1 + r_1 corresponds to ωT = 0 and x = 1 − r_1 corresponds to ωT = π. Given this x, which is the horizontal component of point F, we can compute the vertical component y by using (x − 1)^2 + y^2 = r_1^2, picking y as negative for mapping to the bottom half and positive for the top half. Then the needed polar coordinates are r_2 = sqrt(x^2 + y^2) and θ_2 = atan2(y, x), where the two-argument arc tangent function is used in order to obtain the right quadrant for the result.

3.3 Computing stability limits on phase and gain error

Consider that we have designed L_n(z) based on our current system model G_n(z) according to equation (5) (and in matrix form (6)–(7)), and we apply it to the real-world hardware whose transfer function is G_r(z). The intended point for frequency ω is point F, given by r_2(ω) exp[iθ_2(ω)]. The actual point produced is given by

L_n(e^{iωT}) G_r(e^{iωT}) = L_n G_n (G_r/G_n) = r_2(ω) e^{iθ_2(ω)} [r_r(ω)/r_n(ω)] e^{i(θ_r(ω) − θ_n(ω))}   (9)
It is of interest to determine the limits on the phase angle error θ_r − θ_n before inequality (4) is violated and the iteration makes certain frequency components of the error grow. Similarly, we are interested in the maximum value of r_r/r_n before magnitude error produces growth. Two limits on phase error for point F are an additional phase lag corresponding to ∠FAI and a phase lead corresponding to ∠HAF. Of course, if we are interested in the limits when mapping to the upper half of the circle, the same limits apply but with reversed sign, so we only consider mapping to the lower half. We need to find ∠DAI; knowing ∠DAF from the θ_2 computation above then allows one to sum the angles for the positive tolerance ∠FAH and difference them for the negative tolerance ∠FAI. First we find the horizontal component x_I of point I (or H) by noting that this point is on two circles: x_I² + y_I² = r_2² and (x_I − 1)² + y_I² = 1. Substituting the left hand side of the first into the second produces x_I = r_2²/2, and substituting this into the first produces y_I = −r_2 sqrt(1 − r_2²/4). Then ∠DAI is given by the arc tangent of y_I/x_I. Figure 2 plots the results. The solid lines are the upper and lower limits for phase error using r_1 = 0.95, and the dashed lines are for r_1 = 0.9. The top left plot uses linear in central angle, and the top right uses linear in horizontal component. Figure 3 gives a detailed view. This time the solid lines are for linear in horizontal component, with the upper of the two being for radius 0.95. The dashed lines are the corresponding curves for 0.90. We see that linear in horizontal component is much more uniform in its sensitivity to phase lag, and hence is to be preferred. Both approaches have a minimum tolerance of −2.865 degrees for radius 0.95, and of 5.732 degrees
for radius 0.90. One would expect that such a tight robustness limit would make ILC very effective at producing data that is amplified where the model is wrong even by a small amount. To study the magnitude tolerance, we need to find point J. Triangle DAJ is an isosceles triangle and we know ∠DAJ. Bisecting this angle forms a right triangle whose base is half the maximum r_r allowed, and whose hypotenuse is unity. Hence, r_r = 2 cos θ_2. The bottom left plot in Fig. 2 gives r_r/r_n versus percent Nyquist frequency for linear in central angle, and the bottom right plot gives the corresponding plot for linear in horizontal component. Note that the only possibility for going unstable is an increase in the amplitude of the output.

Figure 2. Stability limits on phase error (top) and magnitude error (bottom) for linear in central angle (left) and linear in horizontal component (right)
4 Numerical Investigation of Sensitivity to Parameter Error A simple but rather good model of the input to output transfer function for the feedback control systems for each link of the robot used in experiments in [9] is given in Laplace transfer function form as
Figure 3. Detail of the stability limit on phase error using linear in horizontal component law
G(s) = K / [(s + a)(s^2 + 2ζΩs + Ω^2)]   (10)
where a = 8.8, ζ = 0.5, Ω = 37, and K = 8.8 × 37². We choose to discretize this as fed by a zero order hold sampling every T = 0.01 second, and then regard the resulting transfer function as the current model G_n(z). Consider the linear in horizontal component ILC law above with r_1 = 0.95. Then Fig. 4 plots equation (9), applying this learning law to real world models G_r(z) that correspond to having the parameters a, ζ, Ω individually changed by +10% and by −10%. The lower half of each plot corresponds to mapping onto the lower half of the circle of radius r_1, and the upper half of the figures gives the corresponding results for mapping onto the upper half of the circle. Any frequency component that plots outside the unit circle centered at +1 will grow with iterations by an amplification factor equal to the radial distance to that point on the curve. Letting it grow for enough iterations makes this part of the system response, which is not properly predicted by the nominal model, arbitrarily large. Hence, it can pull the errors out of the noise level and produce data that is rich in information about the model error. For the case of the actual frequency Ω being 10% smaller than in the current model, i.e. the lower right plot in Fig. 4, the curve reaches a radial distance from +1 of 1.47 at frequency 3.5 Hz and hence will grow large within a few iterations. The original objective of doing one iteration with the ILC mapping to the lower half of the circle and one mapping to the upper half was to have one of the two be sent unstable when there is a relatively small phase error. Examining Fig. 4 we see that when a or ζ is increased by 10%, both mappings send the plot in the stable direction, so that neither one produces data that helps identify errors in this direction for these parameters. In the case of increasing Ω, using both mappings has the intended effect: when using the upper mapping the ILC becomes unstable, while it stays stable with the lower mapping. When the parameters are decreased by 10%, all cases result in instability whether mapping to the lower half or the upper half.
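A brief sketch (ours, not from the paper) of how such polar curves can be generated with SciPy: the nominal model (10) is discretized with a zero-order hold at T = 0.01 s, equation (9) is evaluated with the linear-in-horizontal-component mapping, and the largest radial distance from +1 approximates the per-iteration amplification factor read off Fig. 4. The helper names are ours.

import numpy as np
from scipy.signal import cont2discrete

def make_Gz(a=8.8, zeta=0.5, Om=37.0, K=8.8 * 37.0**2, T=0.01):
    # ZOH discretization of G(s) = K / ((s + a)(s^2 + 2*zeta*Om*s + Om^2)), eq. (10)
    den = np.polymul([1.0, a], [1.0, 2.0 * zeta * Om, Om**2])
    numd, dend, _ = cont2discrete(([K], den), T, method='zoh')
    numd, dend = np.ravel(numd), np.ravel(dend)
    return lambda z: np.polyval(numd, z) / np.polyval(dend, z)

def mapped_curve(Gn, Gr, r1=0.95, lower=True, n=400):
    # Equation (9): Ln(e^{iwT}) Gr(e^{iwT}), with Ln from (5) and the Section 3.2 mapping
    wT = np.linspace(0.0, np.pi, n)
    z = np.exp(1j * wT)
    x = (1.0 + r1) - (2.0 * r1 / np.pi) * wT
    y = np.sqrt(np.clip(r1**2 - (x - 1.0)**2, 0.0, None))
    y = -y if lower else y
    r2, th2 = np.hypot(x, y), np.arctan2(y, x)
    Ln = (r2 / np.abs(Gn(z))) * np.exp(1j * (th2 - np.angle(Gn(z))))
    return Ln * Gr(z)

curve = mapped_curve(make_Gz(), make_Gz(Om=0.9 * 37.0))     # Omega 10% low, lower-half mapping
print("largest per-iteration amplification factor:", np.abs(curve - 1.0).max())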
The behavior for parameter a can be understood by considering that it is not only the phase that changes, but also the magnitude. When a is increased, the phase angle is less than expected at all frequencies, which should send the plot in the stable direction when mapping to the lower half, and toward the unstable direction when mapping to the upper half. However, when a is increased by 10% in the real world compared to the model, the learning law has placed the DC gain smaller than anticipated, which pulls both plots in the stable direction. The same effect is happening at other frequencies as well. Based on the top left plot in Fig. 4, it appears that the second effect overpowers the first and prevents the mapping to the upper half from producing instability.
Figure 4. Polar plots with a, ζ, Ω changed
5 Conclusions

The use of iterative learning control was suggested in a previous publication that used a specific form of ILC and studied the ability to pull unmodeled residual modes or poles out of the noise level in data used for identification. The ILC law used there is easy to apply without any typical design process being needed, but it is also a law that is relatively robust to model error. In this paper we have introduced a new law that is intentionally non-robust to model error, and we have studied its ability to identify parameter errors such as pole locations and damping factors. It is seen that the ILC law is very sensitive to phase errors in the model. But this sensitivity is often offset by a correlated change in the magnitude response, with the result that the sensitivity is often limited to model errors of a given sign. Hence, direct application of the methods can be very effective, but is not guaranteed to produce data that helps with the parameter identification. One can address this issue by modifying the nominal model parameters of interest, going both up and down in value, when designing the learning laws, so that one of the two ILC iterations will result in the desired data. Of course the method will maintain the sensitivity to missing residual modes or poles that was demonstrated in the previous work.
References

[1] Goodwin G. C., Payne R.: Dynamic System Identification, Experiment Design, and Data Analysis. Chapter 6, Experiment Design, Academic Press, NY (1977)
[2] Bauer I., Bock H. G., Körkel S., Schlöder J. P.: Numerical Methods for Optimum Experimental Design in DAE Systems. Journal of Computational and Applied Mathematics, Vol. 120, pp. 1-25 (2000)
[3] Körkel S., Kostina E., Bock H. G., Schlöder J. P.: Numerical Experiments for Nonlinear Dynamic Processes. Optimization Methods and Software Journal, Vol. 19(3-4), pp. 327-338 (2004)
[4] Longman R. W., Phan P. Q.: Iterative Learning Control as a Method of Experiment Design for Improved System Identification. Optimization Methods and Software, Vol. 21(6), pp. 919-941 (2006)
[5] Moore K. L., Xu J.-X., Guest Editors: Special Issue on Iterative Learning Control, International Journal of Control, 73(10) (July 2000)
[6] Bien Z., Xu J.-X., Editors: Iterative Learning Control: Analysis, Design, Integration and Applications. Kluwer Academic Publishers, Boston (1998)
[7] Longman R. W.: Iterative Learning Control and Repetitive Control for Engineering Practice. International Journal of Control, Special Issue on Iterative Learning Control, Vol. 73(10), pp. 930-954 (2000)
[8] Longman R. W.: Designing Iterative Learning and Repetitive Controllers. In: Bien and Xu, editors, Iterative Learning Control: Analysis, Design, Integration and Applications, Kluwer Academic Publishers, Boston, pp. 107-146 (1998)
[9] Elci H., Longman R. W., Phan M. Q., Juang J.-N., Ugoletti R.: Automated Learning Control through Model Updating for Precision Motion Control. Adaptive Structures and Composite Materials: Analysis and Applications, ASME, AD-Vol. 45/MD-Vol. 54, pp. 299-314 (1994)
[10] Longman R. W., Huang Y.-C.: The Phenomenon of Apparent Convergence Followed by Divergence in Learning and Repetitive Control. Intelligent Automation and Soft Computing, Special Issue on Learning and Repetitive Control, Guest Editor: H. S. M. Beigi, Vol. 8(2), pp. 107-128 (2002)
[11] Juang J.-N., Phan M. Q., Horta L. G., Longman R. W.: Identification of Observer/Kalman Filter Markov Parameters: Theory and Experiments. Journal of Guidance, Control, and Dynamics, Vol. 16(2), pp. 320-329 (1993)
[12] Juang J.-N.: Applied System Identification. Prentice Hall, Englewood Cliffs, NJ (1994)
[13] Van Overschee P., De Moor B.: Subspace Identification for Linear Systems: Theory, Implementation, and Applications. Kluwer Academic Publishers, Boston (1996)
Fast Numerical Methods for Simulation of Chemically Reacting Flows in Catalytic Monoliths

Hoang Duc Minh¹*, Hans Georg Bock¹, Hoang Xuan Phu², and Johannes P. Schlöder¹

¹ Interdisciplinary Center for Scientific Computing, University of Heidelberg, Im Neuenheimer Feld 368, 69120 Heidelberg, Germany
[email protected], [email protected], [email protected]
² Institute of Mathematics, Vietnamese Academy of Science and Technology, 18 Hoang Quoc Viet Road, 10307 Hanoi, Vietnam
[email protected]
Abstract Chemically reacting flows in catalytic monoliths are investigated. The fluid dynamics are modelled by the boundary layer equations (BLEs), which are a large system of parabolic partial differential equations (PDEs) with highly nonlinear boundary conditions arising from the coupling of surface processes with the flow field inside the channel. The BLEs are obtained by simplifying the comprehensive model described by the Navier-Stokes equations and applying the boundary-layer approximation. The surface and gas-phase chemical reactions are described by detailed models. The PDEs are semi-discretized using the method of lines, leading to a structured system of differential-algebraic equations (DAEs). The DAEs are solved by an implicit method based on the backward differentiation formulas (BDF). The solution of DAEs by BDF methods requires the partial derivatives of the DAE model functions with respect to the state variables. By exploiting the structure of the DAEs, we develop efficient methods for the computation of the partial derivatives in the framework of automatic differentiation and of finite differences. Applying these methods, we obtain a significant improvement in computing time. Moreover, the results also show that for the solution of our DAE systems the computation of the derivatives by automatic differentiation always outperforms the computation of derivatives by finite differences. Numerical results for a practical application, the catalytic oxidation of methane with a complex reaction mechanism, are presented.
* Supported by the International Research Training Program “Complex Processes: Modelling, Simulation and Optimization” and DFG-Sonderforschungsbereich 359 “Reactive Flows, Diffusion and Transport”.
1 Introduction

Theoretical analysis and simulation of the complicated processes in catalytic monoliths, which are used widely in industry, are currently active research topics. The aim of this work is modeling and developing robust numerical methods for simulating the physical-chemical processes in a single channel, as a first step towards studying the complex processes in the whole catalytic monolith. Earlier investigations can be found in [4–6, 9]. In the report [9], boundary layer theory is applied to approximate the flow field. The Navier-Stokes equations are simplified to obtain a system of large-scale parabolic partial differential equations (PDEs). This paper is organized as follows. The mathematical model using the boundary layer equations is stated in Section 2. In Section 3, numerical methods are discussed. Some computational results for practical reactive flows are given in Section 4. The paper concludes in Section 5 with a summary of the obtained results and a discussion of further work.
2 Mathematical Model

2.1 Modeling of the fluid dynamical process

To model flows in a channel, one can employ the boundary layer equations, which are simplified from the Navier-Stokes equations. Since our considered channel is an axisymmetrical cylinder, we assume the simplification that the investigated flow in it is also axisymmetrical, which can be described by two spatial coordinates, namely the axial one z and the radial one r. By applying the von Mises transformation

ψ = ∫_0^r ρur dr,

where ψ is the stream variable and u is the axial velocity, there arises the following equation system, which we use for modeling reactive flow in a channel of monoliths [10].

Momentum:

ρu ∂u/∂z + dp/dz = ρu ∂/∂ψ (ρuµr² ∂u/∂ψ),   (1)

∂p/∂ψ = 0.   (2)

Energy:

ρu c_p ∂T/∂z = ρu ∂/∂ψ (ρuλr² ∂T/∂ψ) − Σ_{k=1}^{Ng} ω̇_k W_k h_k − ρur Σ_{k=1}^{Ng} J_{kr} c_{pk} ∂T/∂ψ.   (3)
Species:

ρu ∂Y_k/∂z = ω̇_k W_k − ρu ∂/∂ψ (r J_{kr}),   (k = 1, . . . , Ng).   (4)

State:

p = ρRT/W.   (5)
where
z, r : cylindrical coordinates,
u, v : axial and radial components of the velocity vector,
p : pressure,
T : temperature,
Y_k : mass fraction of the kth species,
µ : viscosity,
ρ : mass density,
c_p : heat capacity of the mixture,
λ : thermal conductivity of the mixture,
c_{pk} : specific heat capacity of the kth species,
Ng : total number of gas phase species,
Kg : total number of elementary reactions,
J_{kr} : radial component of the mass flux vector of the kth species,
ω̇_k : rate of production of the kth species by the gas phase reactions,
h_k : specific heat enthalpy of the kth species,
W_k : molecular weight of the kth species,
W : mixture mean molecular weight,
R : universal gas constant.

The diffusion mass flux J_{kr} appearing in (3) and (4) is given by

J_{kr} = −D_{km} (W_k/W) ρ² u r ∂X_k/∂ψ − D_k^T (ρur/T) ∂T/∂ψ,   (6)
where D_{km} and D_k^T are diffusion coefficients. As mass fractions, the Y_k must satisfy

0 ≤ Y_k ≤ 1   (k = 1, . . . , Ng),   Σ_{k=1}^{Ng} Y_k = 1.   (7)

The relation between ψ and r will be treated later in the differential form

0 = ∂r²/∂ψ − 2/(ρu).   (8)
2.2 Modeling of the chemical process

Modeling of gas-phase chemistry

To model the chemical kinetics of a moving gas mixture, we use detailed chemistry describing elementary reactions on the molecular level (see, e.g., [8, 11] for more details). The rate of production ω̇_k of the kth species appearing in equations (3) and (4) is given by

ω̇_k = Σ_{i=1}^{Kg} ν_{ki} k_{fi} Π_{j=1}^{Ng} [X_j]^{ν′_{ji}}   (k = 1, . . . , Ng),

where
Kg : total number of elementary reactions,
Ng : total number of gas phase species,
ν′_{ki} : stoichiometric coefficient of the kth species in the ith reaction on the left side,
ν″_{ki} : stoichiometric coefficient of the kth species in the ith reaction on the right side,
ν_{ki} : = ν″_{ki} − ν′_{ki},
[X_j] : concentration of the jth species,
k_{fi} : forward rate coefficient of the ith reaction.

The forward rate coefficient k_{fi} is calculated by the Arrhenius expression

k_{fi} = A_i T^{β_i} exp(−E_{ai}/(RT))   (i = 1, . . . , Kg),

where
A_i : pre-exponential factor of the ith reaction [the units are given in terms of m, mol, and s],
β_i : temperature exponent of the ith reaction,
E_{ai} : activation energy of the ith reaction [J],
R : universal gas constant, 8.314 [J/(mol · K)],
T : gas temperature [K].
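As an illustration of these formulas (a sketch of ours, not part of the paper's software), the gas-phase production rates and the Arrhenius rate coefficients can be evaluated as follows; nu_reac and nu_prod stand for the assumed stoichiometric matrices ν′ and ν″ of shape (Ng, Kg), A, beta, Ea for the Arrhenius parameters of the Kg reactions, and X for the species concentrations.

import numpy as np

R = 8.314  # universal gas constant [J/(mol K)]

def forward_rate_coeffs(A, beta, Ea, T):
    # Arrhenius expression: k_fi = A_i * T**beta_i * exp(-Ea_i / (R*T))
    return A * T**beta * np.exp(-Ea / (R * T))

def gas_production_rates(nu_reac, nu_prod, A, beta, Ea, T, X):
    # omega_dot_k = sum_i (nu''_ki - nu'_ki) * k_fi * prod_j [X_j]**nu'_ji
    kf = forward_rate_coeffs(A, beta, Ea, T)                 # shape (Kg,)
    progress = kf * np.prod(X[:, None] ** nu_reac, axis=0)   # rate of progress of each reaction
    return (nu_prod - nu_reac) @ progress                    # shape (Ng,)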
Modeling of surface chemistry

The chemistry source term ṡ_i, appearing later in the boundary conditions (12) and (14), states the creation or depletion rate of the ith species due to the adsorption/desorption process, and is given by

ṡ_i = Σ_{k=1}^{Ks} ν_{ik} k_{fk} Π_{j=1}^{Ng+Ns} [X_j]^{ν′_{jk}},   (9)
where Ks and Ns are the numbers of elementary surface reactions and of adsorbed species, respectively. If the jth species is a gas, then [X_j] is its concentration; otherwise, if it is a surface species, then [X_j] = Θ_j Γ, where Θ_j is the surface coverage and Γ is the surface site density.
In some reactions, the rate coefficient k_{fk} is calculated by the standard Arrhenius formula

k_{fk} = A_k T^{β_k} exp(−E_{ak}/(RT)),

and in some other reactions it is described by the modified Arrhenius formula

k_{fk} = A_k T^{β_k} exp(−E_{ak}/(RT)) Π_{i=1}^{Ns} Θ_i^{µ_{ik}} exp(ε_{ik} Θ_i/(RT)),   (10)

where µ_{ik} and ε_{ik} are surface parameters, and Θ_i is the surface coverage, which must satisfy

0 ≤ Θ_i ≤ 1   (i = 1, . . . , Ns),   Σ_{i=1}^{Ns} Θ_i = 1.   (11)
2.3 Initial and boundary conditions

We want to explicitly mention that we consider the system only at steady state. For initial conditions, the values of u, p, T, and Y_k are specified at the inlet of the channel. At ψ = 0, which corresponds to the centerline of the cylindrical channel, we can deduce the following boundary conditions from the assumed axisymmetry: r = 0, ∂u/∂ψ = 0, ∂p/∂ψ = 0, ∂T/∂ψ = 0, ∂Y_k/∂ψ = 0. At ψ = ψ_max = ∫_0^{r_max} ρur dr |_{z=0}, which corresponds to the channel wall, it holds that r = r_max, u = 0, T = T_wall. In addition, since an essential part of the chemical reactions takes place at the catalytic wall and the system is in a steady state, the gas species mass flux produced by heterogeneous chemical reactions and the mass flux of that species in the gas must be balanced there, i.e.

ṡ_k W_k = J_{kr}   (k = 1, . . . , Ng),   (12)

where ṡ_k is the rate of creation/depletion of the kth gas-phase species by surface reactions (see Section 2.2). Note that J_{kr} represents the diffusive flux and normally we still have to take the convective flux ρY_k v_st into account, where v_st is the Stefan velocity calculated by

v_st = (1/ρ) Σ_{k=1}^{Ng} ṡ_k W_k.
But in our case the convective flux is neglected because it actually vanishes. Due to the steady-state assumption, the surface coverage Θ_i does not depend on time, i.e. ∂Θ_i/∂t = 0. By definition,

∂Θ_i/∂t = ṡ_i/Γ   (i = Ng + 1, . . . , Ng + Ns).   (13)

Hence, we have

ṡ_i = 0   (i = Ng + 1, . . . , Ng + Ns).   (14)
Equations (12)–(14) stand for the reactions at the catalytic surface, which play a crucial role in the whole physical-chemical process. They must be solved with high accuracy, since their solution strongly influences the solution of the whole system. By (6), (9), (10), and (12), these equations are highly nonlinear with respect to the unknowns Yk and Θi . This fact causes major difficulties in the numerical treatment and makes an essential difference between our problem and the one without catalytic surface. Note that the boundary conditions of our PDEs are not standard ones, such as a Dirichlet condition where the dependent variables are explicitly specified at the boundary or a Neumann condition where the first-order derivatives of dependent variables are known, but they are given as the algebraic equations (12) and (14).
3 Numerical Methods

The PDE model equations presented in Section 2 are semi-discretized using the method of lines, leading to a structured system of differential-algebraic equations (DAEs). The DAEs are solved by an implicit method based on the backward differentiation formulas (BDF). For the practical computation, based on the code DAESOL [1, 2], we develop a new code that allows us to solve this problem. Features of this code are variable step size and variable order controlled by error estimation, a modified Newton's method for the solution of the implicit nonlinear problems, and an efficient monitoring strategy to control the computation and decomposition of the Jacobian. For more details, see [10]. With a suitable arrangement of variables and model functions (e.g., a natural order scheme), the iteration matrices of the Newton iteration at each step of the integration of the DAEs have block tridiagonal structure. The solution of DAEs by BDF methods requires the partial derivatives of the DAE model functions with respect to the state variables. For the computation of the (partial) derivatives of the model functions with respect to the state variables, we take the block tridiagonal structure into account. In particular, we employ the CPR method [7], which exploits the sparsity structure of the Jacobian J by identifying the structurally orthogonal columns of J. Instead of computing the full dense Jacobian, which requires n + 1 model function evaluations (n is the number of components of the variable vector) by forward finite differences, we compute a compressed Jacobian,
whose number of columns is nb (nb is the full bandwidth of the Jacobian). This requires only nb + 1 function evaluations, which is much smaller than n. Then the full Jacobian is extracted from the compressed one. Alternatively, the partial derivatives can also be computed by using automatic differentiation [3], which gives more accurate numerical derivatives. Here, we use ADIFOR 2.0 [3] for generating Fortran 77 derivative code from the model function code in Fortran 77. Similarly to the finite difference method, instead of computing the full Jacobian by using the identity matrix as the seed matrix, we compute a compressed Jacobian using a compressed seed matrix. This requires computing only nb directional derivatives instead of n.
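To illustrate the compression idea (a simplified sketch of ours, not the actual DAESOL/ADIFOR implementation), assume the Jacobian nonzeros lie within a band of full width nb; columns that are nb apart are then structurally orthogonal and can share a single finite-difference perturbation:

import numpy as np

def compressed_fd_jacobian(F, y, nb, eps=1e-7):
    # CPR-style finite differences: nb + 1 residual evaluations instead of n + 1,
    # assuming a banded Jacobian with full bandwidth nb (nb odd, symmetric band).
    n = y.size
    F0 = F(y)
    S = np.zeros((n, nb))
    S[np.arange(n), np.arange(n) % nb] = 1.0              # compressed seed matrix
    Jc = np.empty((n, nb))
    for g in range(nb):                                   # one perturbation per column group
        Jc[:, g] = (F(y + eps * S[:, g]) - F0) / eps
    J = np.zeros((n, n))
    half = nb // 2
    for j in range(n):                                    # scatter the band back into the full Jacobian
        lo, hi = max(0, j - half), min(n, j + half + 1)
        J[lo:hi, j] = Jc[lo:hi, j % nb]
    return J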
4 Numerical Results

We investigate a practical application, the catalytic oxidation of methane. The chemical processes are described by 21 gas phase species, 11 surface species, 128 gas phase reactions, and 23 surface reactions. The gas-phase and surface reactions are given in [10]. The gas mixture flows in a channel with the following setting.
– Channel geometry: radius r_max = 2.5 × 10^{−4} m, channel length z_max = 0.01 m.
– Initial conditions: at the inlet, three species are present, with initial mole fractions X_CH4 = 0.5, X_O2 = 0.3, X_N2 = 0.2; the other species are absent at the inlet. The initial gas temperature is T_gas = 298 K, the initial pressure is p = 1.2 × 10^5 Pa, and the initial velocity is u = 0.5 m/s.
– Boundary conditions: the temperature at the wall is T_wall = 1373 K.
The following computations are performed on a Pentium 4, 2.6 GHz, running Linux, with the Intel Fortran compiler using double precision. The integration error is controlled with the relative error tolerance RTOL = 10^{−4} and the absolute error tolerance ATOL = 10^{−12}. FD and AD are the abbreviations for Finite Differences and Automatic Differentiation, respectively; LA is the abbreviation for linear solver. Figure 1 shows the results of a simulation using 20 grid points for the semi-discretization of the stream domain ψ. Because of space restrictions, here we only show some major product and source species. The source species, methane and oxygen, are consumed completely within the first millimeter of the reactor, giving mainly CO, CO2, H2 and H2O. Let us define the Speedup as the ratio between the CPU time for solving a problem using the standard method in DAESOL (finite differences for the computation of derivatives), which is referred to as standard FD, and the CPU time for solving the problem by the indicated other method. Tables 1 and 2 show CPU times for running the simulation code and speedups with different modes of derivatives computation. The block tridiagonal FD/AD
Figure 1. Profiles of some major species: mass fractions Y_CH4, Y_O2, Y_CO, Y_CO2, Y_H2, and Y_H2O as functions of the radial coordinate r and the axial coordinate z
option is a few times faster than the dense FD/AD option. This is due to the large amount of time spent on model function calls for computing the Jacobian by FD in the dense case. On the other hand, only a fixed number of model function calls (independent of the number of grid points) is needed for approximating the Jacobian with the block tridiagonal FD. The results shown in Tables 1 and 2 also reflect the prediction that when the number of grid points is increased, the computing times for the dense cases are much greater than for the block tridiagonal cases. Moreover, they also show that for simulation, the computation of derivatives by automatic differentiation outperforms the computation of derivatives by finite differences.
Table 1. Timings with different modes of derivatives computation

Nodes   Standard FD (secs)   Blk.Tri. FD (secs)   Standard AD (secs)   Blk.Tri. AD (secs)
12      13.62                4.72                 3.50                 2.40
20      73.43                13.89                12.30                7.50
28      216.66               27.84                27.40                14.20
36      440.55               47.96                46.60                24.30

Table 2. Speedup gained by different methods

Nodes (#DAE)   Blk.Tri. AD   Standard AD   Blk.Tri. FD
12 (286)       5.66          3.88          2.77
20 (486)       9.78          5.96          5.09
28 (686)       15.25         7.90          7.44
36 (886)       28.23         9.45          8.79
5 Conclusion

We have shown efficient numerical methods for the simulation of chemically reacting flows in a single channel of catalytic monoliths. By exploiting the structure of the semi-discretization of the PDE model, i.e., the block tridiagonal structure, we obtain a speedup of the simulation code of up to 28. The new simulation software is applied to a practical application, the catalytic oxidation of methane, and the simulation results are presented. Further developments of the methods towards the optimization of chemically reacting flows in catalytic monoliths have also been considered and will be published elsewhere.
Acknowledgments

The first author would like to thank Prof. Dr. Olaf Deutschmann and Dr. Steffen Tischer, Institute for Chemical Technology and Polymer Chemistry, University of Karlsruhe, for supplying the reaction mechanisms and for helpful discussions. He also gratefully acknowledges many helpful suggestions and comments of Prof. Dr. Robert J. Kee, Division of Engineering, Colorado School of Mines, USA, especially during the time he visited the IWR. The first author very much appreciates the financial support from the German Science Foundation (DFG - Deutsche Forschungsgemeinschaft) within the Graduiertenkolleg program “Complex Processes: Modeling, Simulation and Optimization”, and SFB (Sonderforschungsbereich) 359 “Reactive Flow, Diffusion and Transport”.
References

1. I. Bauer, H. G. Bock, and J. P. Schlöder. DAESOL – a BDF-code for the numerical solution of differential algebraic equations. Technical report, SFB 359, IWR, University of Heidelberg, 1999.
2. I. Bauer, F. Finocchi, W. Duschl, H. Gail, and J. Schlöder. Simulation of chemical reactions and dust destruction in protoplanetary accretion disks. Astronomy & Astrophysics, 317:273–289, 1997.
3. C. Bischof, A. Carle, P. Hovland, P. Khademi, and A. Mauer. ADIFOR 2.0 User's Guide, 1995.
4. M. E. Coltrin, R. J. Kee, and J. A. Miller. A mathematical model of the coupled fluid mechanics and chemical kinetics in a chemical vapor deposition reactor. Journal of The Electrochemical Society, 131(2):425–434, 1984.
5. M. E. Coltrin, R. J. Kee, and J. A. Miller. A mathematical model of silicon chemical vapor deposition. Journal of The Electrochemical Society, 133(6):1206–1213, 1986.
6. M. E. Coltrin, H. K. Moffat, R. J. Kee, and F. M. Rupley. CRESLAF (version 4.0): A Fortran program for modelling laminar, chemically reacting, boundary-layer flow in the cylindrical or planar channels. Technical Report SAND93–0478, Sandia National Laboratories, Apr 1993.
7. A. R. Curtis, M. J. D. Powell, and J. K. Reid. On the estimation of sparse Jacobian matrices. Journal of the Institute of Mathematical Applications, 13:117–119, 1974.
8. O. Deutschmann. DETCHEM - User manual, version 1.4. IWR, University of Heidelberg, 2000.
9. R. J. Kee and J. A. Miller. A computational model for chemically reacting flow in boundary layers, shear layers, and ducts. Technical Report SAND81-8241, Sandia National Laboratories, Albuquerque, NM, 1981.
10. H. D. Minh. Numerical Methods for Simulation and Optimization of Chemically Reacting Flows in Catalytic Monoliths. PhD thesis, Faculty of Mathematics and Computer Science, University of Heidelberg, December 2005.
11. J. Warnatz, R. Dibble, and U. Maas. Combustion, Physical and Chemical Fundamentals, Modeling and Simulation, Experiments, Pollutant Formation. Springer-Verlag, New York, 1996.
A Deterministic Optimization Approach for Generating Highly Nonlinear Balanced Boolean Functions in Cryptography

Le Hoai Minh¹, Le Thi Hoai An¹, Pham Dinh Tao², and Pascal Bouvry³

¹ Laboratory of Theoretical and Applied Computer Science (LITA EA 3097), University of Paul Verlaine - Metz, Ile du Saulcy, 57045 Metz, France
[email protected], [email protected]
² Laboratory of Modelling, Optimization & Operations Research, National Institute for Applied Sciences-Rouen, BP 08, Place Emile Blondel, 76131 Mont Saint Aignan Cedex, France
[email protected]
³ Computer Science Research Unit, University of Luxembourg, Campus Kirchberg, 6 Rue Richard Coudenhove-Kalergi, L-1359 Luxembourg
[email protected]
Abstract We propose in this work a deterministic continuous approach for constructing highly nonlinear balanced Boolean functions, which is an interesting and open question in cryptography. Our approach is based on DC (Difference of Convex functions) programming and DCA (DC optimization Algorithms). We first formulate the problem in the form of a combinatorial optimization problem, more precisely a mixed 0-1 linear program. By using an exact penalty technique in DC programming, this problem is reformulated as a polyhedral DC program. We next investigate DC programming and DCA for solving this latter problem. Preliminary numerical results show that the proposed algorithm is promising and more efficient than some heuristic algorithms.
1 Introduction

Boolean functions play an important role in cryptography, especially in S-box analysis. They are elementary building blocks for various cryptographic algorithms - stream ciphers, block ciphers, hash functions, etc. Cryptography needs ways to find good Boolean functions so that ciphers can resist cryptanalytic attack. The main properties required are high nonlinearity and low autocorrelation, so that linear cryptanalysis and differential cryptanalysis do not succeed faster than exhaustive key search. These properties have been widely studied in the literature (see e.g. [5, 12] and references therein).
The purpose of this paper is to construct highly nonlinear balanced Boolean functions, which is an interesting and open question in cryptography. To our knowledge, while heuristic optimization approaches, in particular genetic algorithms, are useful for this problem (see e.g. [6, 11] and references therein), there are no deterministic models and methods for it. In this work we attempt to use a deterministic continuous optimization approach based on DC (Difference of Convex functions) programming and DCA (DC optimization Algorithms) for the above purpose. DC programming - which deals with the minimization of a DC function over a convex set - and DCA were introduced by Pham Dinh Tao in their preliminary form in 1986 and have been extensively developed since 1994 by Le Thi Hoai An and Pham Dinh Tao; they have now become classic and increasingly popular (see e.g. [1-3, 9, 10] and references therein). DCA has been successfully applied to many large-scale (smooth or nonsmooth) nonconvex programs in various domains of applied sciences, among them data analysis and data mining (see [3, 13] and references therein), for which it very often provides a global solution and proves to be more robust and efficient than standard methods. This is the main motivation for our present work. We first formulate the problem in the form of a combinatorial optimization problem, more precisely a mixed 0-1 convex polyhedral program. The problem is then reformulated as a polyhedral DC program via exact penalty techniques in DC programming [4]. We then investigate DC programming and DCA for solving the related polyhedral DC program.
The paper is organized as follows. Section 2 introduces some notations and properties concerning Boolean functions that will be used in the sequel. Section 3 deals with the optimization formulation and reformulations of the problem. Section 4 is devoted to DC programming and DCA for solving the resulting polyhedral DC program. Preliminary computational results are reported in the last section. They show that the proposed algorithm is promising and more efficient than the heuristic algorithms presented in [6].
2 Preliminaries

Definition 1: Denote by B := {0, 1} and B^n := {0, 1}^n. A Boolean function f : B^n -> B is a function which produces a Boolean result. The binary truth table of a Boolean function of n variables contains 2^n elements corresponding to all possible combinations of the n binary inputs. If one fixes the order of the inputs x = (x_1, x_2, ..., x_n), the Boolean function f is determined by the last column of its binary truth table, namely a binary vector with components f(x) in dimension 2^n. In this work we therefore consider a Boolean function f as a vector in {0, 1}^{2^n}. The set F of Boolean functions, F := {f : B^n -> B}, is exactly B^{2^n}.

Definition 2: The polarity truth table of a Boolean function, denoted \hat{f}, is defined by \hat{f}(x) = (-1)^{f(x)} = 1 - 2 f(x), where \hat{f}(x) \in {1, -1}.
Definition 3: A linear Boolean function L_w(x), selected by w \in Z_2^n, is a Boolean function given by (\oplus denotes the Boolean operation "XOR")

L_w(x) = wx = w_1 x_1 \oplus w_2 x_2 \oplus ... \oplus w_n x_n.   (1)

Definition 4: An affine Boolean function A_w(x) is a Boolean function which can be represented in the form A_w(x) = wx \oplus c, where c \in Z_2.

Definition 5: The Hamming weight of a Boolean function, denoted hwt(f), is the number of ones in the binary truth table, or equivalently the number of -1 entries in the polarity truth table:

hwt(f) := \sum_{x \in B^n} f(x) = \frac{1}{2} \Big( 2^n - \sum_{x \in B^n} \hat{f}(x) \Big).   (2)

Definition 6: The Hamming distance between two Boolean functions f and g is the number of positions in which their truth tables differ. It can be computed as

d(f,g) := \sum_{x \in B^n} f(x) \oplus g(x) = \frac{1}{2} \Big( 2^n - \sum_{x \in B^n} \hat{f}(x) \hat{g}(x) \Big).   (3)

Definition 7: The imbalance is defined as I_f := \frac{1}{2} \big| \sum_{x \in B^n} \hat{f}(x) \big|. For cryptographic Boolean functions it is usually desired that there are an equal number of 0's and 1's in the binary truth table (balanced function). Balance is a primary cryptographic criterion: an imbalanced function has sub-optimum unconditional entropy (i.e., it is correlated to a constant function).

Definition 8: The nonlinearity of a Boolean function is the Hamming distance to the closest affine function, that is, the number of bits which must change in the truth table of a Boolean function to reach the closest affine function. Note that if "linearity" is considered a significant cryptographic weakness, the nonlinearity is an explicit measure of the lack of that weakness.

Definition 9: The Walsh-Hadamard Transform (WHT) of a Boolean function is defined as F(w) := \sum_{x \in B^n} \hat{f}(x) \hat{L}_w(x).

The nonlinearity of a Boolean function f, denoted N_f, is related to the maximum magnitude of the WHT values and is given by

N_f := 2^{n-1} - \frac{1}{2} \max_{w \in B^n} |F(w)|.   (4)
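The following is a minimal illustrative sketch (not part of the paper) of how Definitions 2 and 9 and formula (4) can be evaluated numerically: the nonlinearity of a Boolean function given by its truth table is obtained from the fast Walsh-Hadamard transform of the polarity truth table. The function name and the use of NumPy are assumptions made only for this example.

```python
import numpy as np

def nonlinearity(truth_table):
    """truth_table: array of 0/1 values of length 2**n; returns N_f from (4)."""
    f = np.asarray(truth_table, dtype=int)
    n = int(np.log2(f.size))
    assert f.size == 2 ** n
    w = 1 - 2 * f            # polarity form \hat f(x) = 1 - 2 f(x)  (Definition 2)
    w = w.copy()
    h = 1
    # in-place fast Walsh-Hadamard transform: computes F(w) of Definition 9
    while h < w.size:
        for i in range(0, w.size, 2 * h):
            for j in range(i, i + h):
                a, b = w[j], w[j + h]
                w[j], w[j + h] = a + b, a - b
        h *= 2
    # formula (4): N_f = 2^{n-1} - (1/2) max_w |F(w)|
    return 2 ** (n - 1) - np.max(np.abs(w)) // 2

# example: the AND function of two variables has nonlinearity 1
print(nonlinearity([0, 0, 0, 1]))
```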
3 Optimization Formulation and Reformulation

The aim of our work is to find a balanced Boolean function which has maximal nonlinearity. Mathematically, according to the above notations and properties, the problem can be written as

\max_{f \in F} N_f = \max_{f \in F} \min_{w \in B^n} \frac{1}{2} \Big( 2^n - \big| \sum_{x \in B^n} \hat{f}(x) \hat{L}_w(x) \big| \Big).

With

\Phi(\hat{f}) := \max_{w \in B^n} \big| \sum_{x \in B^n} \hat{f}(x) \hat{L}_w(x) \big|

we have

\max_{f \in F} N_f = 2^{n-1} - \frac{1}{2} \min_{f \in F} \Phi(\hat{f}).   (5)

In this formulation the presence of the operation "XOR" in L_w (cf. Definition 3) is not suitable for applying continuous optimization techniques. We therefore express L_w in another way. By recursion we get

\hat{L}_w(x) = 1 - 2wx = 1 - 2(w_1 x_1 \oplus w_2 x_2 \oplus ... \oplus w_n x_n) = 1 if \langle w, x \rangle = \sum_{i=1}^n w_i x_i is an even number, and -1 otherwise.

Denoting a_{wx} := \hat{L}_w(x) \in {-1, 1} for w, x \in B^n, we can write the function \Phi in the form \Phi(\hat{f}) = \max_{w \in B^n} | \sum_{x \in B^n} a_{wx} \hat{f}(x) |. Consequently, according to Definition 2, we have

\Phi(\hat{f}) = \max_{w \in B^n} \big| \sum_{x \in B^n} a_{wx} (2 f_x - 1) \big| = 2 \max_{w \in B^n} \big| \sum_{x \in B^n} a_{wx} f_x - \frac{1}{2} \sum_{x \in B^n} a_{wx} \big|.   (6)

Hence, from (5) it follows that

\max_{f \in F} N_f = 2^{n-1} - \min_{f} \Psi(f),  with  \Psi(f) := \max_{w \in B^n} \big| \sum_{x \in B^n} a_{wx} f_x - \frac{1}{2} \sum_{x \in B^n} a_{wx} \big|.   (7)

Thus, maximizing N_f amounts to minimizing \Psi(f) on B^n. For finding a balanced Boolean function we add the constraint | \sum_{x \in B^n} f_x - 2^{n-1} | \le b, with b a nonnegative number. Clearly, if b = 0, then the function is balanced. Finally we get the following optimization problem

\beta := \min \{ \Psi(f) : 2^{n-1} - b \le \sum_{x \in B^n} f_x \le 2^{n-1} + b,  f \in B^n \}.   (8)
It is easy to see that the function \Psi is polyhedral convex (by definition, a function is polyhedral if it is a pointwise supremum of a finite collection of affine functions). We are then faced with the minimization of a convex polyhedral function with binary variables under linear constraints, which is in fact equivalent to a mixed zero-one linear program (with exactly one continuous variable). For the convenience of our DC programming approach, we consider the problem, for the moment, in the form (8).

Reformulation. We now reformulate (8) in the form of a continuous optimization problem. Let p : IR^{2^n} -> IR be the function defined by

p(f) := \sum_{x \in B^n} \min \{ f_x, 1 - f_x \}.

It is clear that p is a nonnegative concave function on [0, 1]^{2^n}. Moreover p(f) = 0 iff f \in B^n. Hence problem (8) can be expressed as

\min \{ \Psi(f) : 2^{n-1} - b \le \sum_{x \in B^n} f_x \le 2^{n-1} + b,  p(f) \le 0 \}.

Using an exact penalty technique in DC programming ([1, 4]) leads us to the more tractable continuous optimization problem (t > 0 is the penalty parameter):

(Q)   \min \{ \Psi(f) + t p(f) : 2^{n-1} - b \le \sum_{x \in B^n} f_x \le 2^{n-1} + b,  0 \le f_x \le 1, \forall x \in B^n \}.

We will prove in Section 4 that problem (Q) can be reformulated as a DC program and show how to use DCA for solving it.
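As a small illustration (not from the paper), the two pieces of the penalized objective of (Q), the polyhedral convex function \Psi of (7) and the concave penalty p, can be evaluated directly from the coefficient matrix with entries a_{wx}. The helper names and the random test point below are assumptions made for this sketch only.

```python
import numpy as np

def walsh_matrix(n):
    """Matrix A with A[w, x] = (-1)^{<w, x>} = \hat L_w(x), i.e. a_{wx}."""
    idx = np.arange(2 ** n)
    popcount = np.vectorize(lambda v: bin(v).count("1"))
    return (-1) ** popcount(idx[:, None] & idx[None, :])

def Psi(f, A):
    # formula (7): max_w | sum_x a_{wx} f_x - (1/2) sum_x a_{wx} |
    return np.max(np.abs(A @ f - 0.5 * A.sum(axis=1)))

def penalty(f):
    # concave penalty p(f) = sum_x min{f_x, 1 - f_x}; zero exactly on binary points
    return np.minimum(f, 1.0 - f).sum()

n, t = 3, 10.0                      # t is the penalty parameter of (Q)
A = walsh_matrix(n)
f = np.random.rand(2 ** n)          # a point in the box [0, 1]^{2^n}
print(Psi(f, A) + t * penalty(f))   # objective value of (Q) at f
```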
4 Generating Highly Nonlinear Balanced Boolean Functions by DCA

4.1 An introduction to DC programming and DCA

Let \Gamma_0(IR^n) denote the convex cone of all lower semicontinuous proper convex functions on IR^n. Consider the general DC program

(P_dc)   \alpha = \inf \{ F(x) := g(x) - h(x) : x \in IR^n \}

with g, h \in \Gamma_0(IR^n). Such a function F is called a DC function, g - h a DC decomposition of F, and the convex functions g and h the DC components of F. If g or h is a polyhedral convex function, then (P_dc) is called a polyhedral DC program. It should be noted that a constrained DC program whose feasible set C is convex can always be transformed into an unconstrained DC program by adding the indicator function \chi_C of C (\chi_C(x) = 0 if x \in C, +\infty otherwise) to the first DC component g. Let g^*(y) := \sup \{ \langle x, y \rangle - g(x) : x \in IR^n \} be the conjugate function of g. Then the following program is called the dual program of (P_dc):

(D_dc)   \alpha_D = \inf \{ h^*(y) - g^*(y) : y \in IR^n \}.   (9)

Under the natural convention in DC programming that +\infty - (+\infty) = +\infty, and by using the fact that every function h \in \Gamma_0(IR^n) is characterized as a pointwise supremum of a collection of affine functions, say

h(x) := \sup \{ \langle x, y \rangle - h^*(y) : y \in IR^n \},
one can prove that α = αD . We observe the perfect symmetry between primal and dual DC programs: the dual to (Ddc ) is exactly (Pdc ). The necessary local optimality condition for the primal program (Pdc ) is: ∂g(x∗ ) ⊃ ∂h(x∗ ).
(10)
A point x^* that verifies the condition \partial h(x^*) \cap \partial g(x^*) \ne \emptyset is called a critical point of g - h. The condition (10) is also sufficient for many classes of DC programs. In particular, it is sufficient for the following cases, quite often encountered in practice (see [1-3, 9] and references therein):
• polyhedral DC programs with h being a polyhedral convex function;
• the case where the function F is locally convex at x^* ([3]).
Based on local optimality conditions and duality in DC programming, DCA consists in the construction of two sequences {x^k} and {y^k}, candidates to be optimal solutions of the primal and dual programs respectively, such that the sequences {g(x^k) - h(x^k)} and {h^*(y^k) - g^*(y^k)} are decreasing, and {x^k} (resp. {y^k}) converges to a primal feasible solution x~ (resp. a dual feasible solution y~) verifying local optimality conditions and

x~ \in \partial g^*(y~),   y~ \in \partial h(x~).   (11)

These two sequences {x^k} and {y^k} are determined in such a way that x^{k+1} (resp. y^k) is a solution to the convex program (P_k) (resp. (D_k)) defined by

(P_k)   \inf \{ g(x) - h(x^k) - \langle x - x^k, y^k \rangle : x \in IR^n \}
(D_k)   \inf \{ h^*(y) - g^*(y^{k-1}) - \langle y - y^{k-1}, x^k \rangle : y \in IR^n \}.

In fact, at each iteration one replaces in the primal DC program (P_dc) the second component h by its affine minorization h_k(x) := h(x^k) + \langle x - x^k, y^k \rangle in a neighbourhood of x^k to give birth to the convex program (P_k), whose solution set is nothing but \partial g^*(y^k). Likewise, the second DC component g^* of (D_dc) is replaced by its affine minorization (g^*)_k(y) := g^*(y^k) + \langle y - y^k, x^{k+1} \rangle in a neighbourhood of y^k to obtain the convex program (D_k), for which \partial h(x^{k+1}) is the solution set. DCA thus performs a double linearization with the help of the subgradients of h and g^*, and yields the scheme

y^k \in \partial h(x^k);   x^{k+1} \in \partial g^*(y^k).   (12)
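To make the scheme (12) concrete, here is a toy illustration (not from the paper) on a one-dimensional DC function F(x) = g(x) - h(x) with g(x) = 0.5 x^2 and h(x) = |x|, whose global minimizers are x = +1 and x = -1; DCA reaches one of them in a single iteration. The function name and stopping rule are assumptions for this sketch only.

```python
import numpy as np

def dca_toy(x0, iters=10):
    """DCA scheme (12): y^k in dh(x^k), then x^{k+1} minimizes g(x) - x*y^k."""
    x = float(x0)
    for _ in range(iters):
        y = np.sign(x) if x != 0 else 0.0   # a subgradient of h(x) = |x|
        x_new = y                            # argmin_x 0.5*x**2 - x*y  (since g'(x) = x)
        if abs(x_new - x) < 1e-12:
            break
        x = x_new
    return x

print(dca_toy(0.3))   # -> 1.0
print(dca_toy(-2.5))  # -> -1.0
```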
It is worth noting that DCA works with the convex DC components g and h but not with the DC function F itself. Moreover, a DC function F has infinitely many DC decompositions, which have crucial impacts on the qualities (speed of convergence, robustness, efficiency, globality of computed solutions, ...) of DCA. For a given DC program, the choice of an optimal DC decomposition is still open. Convergence properties of DCA can be found in [1, 3, 9, 10]; for instance, it is important to mention that
• DCA is a descent method (the sequences {g(x^k) - h(x^k)} and {h^*(y^k) - g^*(y^k)} are decreasing) without linesearch;
• if the optimal value \alpha of problem (P_dc) is finite and the infinite sequences {x^k} and {y^k} are bounded, then every limit point x~ (resp. y~) of the sequence {x^k} (resp. {y^k}) is a critical point of g - h (resp. h^* - g^*);
• DCA has linear convergence for general DC programs;
• DCA has finite convergence for polyhedral DC programs.
For a complete study of DC programming and DCA the reader is referred to [1-3, 9, 10] and references therein.
The solution of a nonconvex program by DCA must be composed of two stages: the search for an appropriate DC decomposition and that for a good initial point. We shall apply all these DC enhancement features to solve problem (Q) in its equivalent DC program given in the next subsection.

4.2 Solving Problem (Q) by DCA

We first reformulate (Q) in the form of a DC program. Let K be the feasible set of problem (Q), namely

K := \{ f : 2^{n-1} - b \le \sum_{x \in B^n} f_x \le 2^{n-1} + b,  0 \le f_x \le 1, \forall x \in B^n \}.

Let \chi_K be the indicator function of K, i.e. \chi_K(f) = 0 if f \in K, +\infty otherwise. Since K is a convex set, \chi_K is a convex function on IR^{2^n}. A natural DC decomposition of the objective function of (Q) is \Psi(f) + t p(f) := G(f) - H(f), with G(f) := \chi_K(f) + \Psi(f) and H(f) := -t p(f). It is clear that G and H are convex functions. Thus problem (Q) is a DC program of the form

(Q_dc)   \beta := \min \{ G(f) - H(f) : f \in IR^{2^n} \}.

Let \psi_w be the function defined by \psi_w(f) := \sum_{x \in B^n} a_{wx} f_x - \frac{1}{2} \sum_{x \in B^n} a_{wx}. Then \psi_w is an affine function, and \Psi(f) = \max_{w \in B^n} |\psi_w(f)|. Therefore, as mentioned above, \Psi is a convex polyhedral function. Likewise, the function H is also convex polyhedral. So (Q_dc) is a polyhedral DC program where all DC decompositions are polyhedral. This property enhances DCA, as will be shown later in the convergence theorem of our algorithm. Applying DCA to (Q_dc) amounts to computing, at each iteration k:

v^k \in \partial H(f^k);   f^{k+1} \in \partial G^*(v^k).
By the very definition of H, we can take v^k as follows:

v^k_x := -t if f^k_x \le 0.5,  t otherwise.   (13)

On the other hand, according to the above results concerning DC programming and DCA, the condition f^{k+1} \in \partial G^*(v^k) is equivalent to

f^{k+1} = argmin \{ \Psi(f) - \langle v^k, f \rangle : f \in K \}.   (14)

Since \Psi is a convex polyhedral function, problem (14) is equivalent to a linear program. Indeed,

\min \{ \Psi(f) - \langle v^k, f \rangle : f \in K \} = \min_{f \in K} \max_{w \in B^n} |\psi_w(f)| - \langle v^k, f \rangle
\Leftrightarrow \min \{ \xi - \langle v^k, f \rangle : \psi_w(f) \le \xi, -\psi_w(f) \le \xi, \forall w \in B^n,  f \in K \}   (15)
\Leftrightarrow \min \xi - \langle v^k, f \rangle  s.t.  \sum_{x \in B^n} a_{wx} f_x - \frac{1}{2} \sum_{x \in B^n} a_{wx} \le \xi, \forall w \in B^n,
      -\sum_{x \in B^n} a_{wx} f_x + \frac{1}{2} \sum_{x \in B^n} a_{wx} \le \xi, \forall w \in B^n,
      2^{n-1} - b \le \sum_{x \in B^n} f_x \le 2^{n-1} + b,  0 \le f_x \le 1, \forall x \in B^n.   (16)

The DCA applied to (Q_dc) can now be described as follows.

Algorithm DCA.
Initialisation: Let f^0 \in IR^{2^n}, and let \epsilon be a sufficiently small positive number.
Repeat
- Set v^k \in \partial H(f^k) via formula (13);
- Solve the linear program (16) to obtain f^{k+1};
- Set k := k + 1.
Until ||f^k - f^{k-1}|| < \epsilon.

Denote by \Omega the feasible set of the linear program (16), and by V(\Omega) the vertex set of \Omega. Let f^* be a solution computed by DCA. The convergence of DCA can be summarized in the next theorem, whose proof is essentially based on the convergence theorem of a polyhedral DC program.

Theorem 1. (Convergence properties of Algorithm DCA)
(i) DCA generates a sequence {f^k} contained in V(\Omega) such that the sequence {\Psi(f^k) + t p(f^k)} is decreasing.
(ii) For t sufficiently large, if at iteration r we have f^r \in {0, 1}^{2^n}, then f^k \in {0, 1}^{2^n} for all k \ge r.
(iii) The sequence {f^k} converges to f^* \in V(\Omega) after a finite number of iterations. The point f^* is a critical point of problem (Q_dc). Moreover, if f^*_x \ne 1/2 for all x \in B^n, then f^* is a local solution to (Q_dc).
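The following hedged sketch (not the authors' code) shows one possible implementation of Algorithm DCA: v^k from formula (13), then the linear program (16) solved with SciPy's linprog over the variable z = (f, xi), where xi is the epigraph variable. The matrix A with entries a_{wx} can be built, for instance, as in the earlier walsh_matrix sketch; all names, solver settings and the rounding at the end are assumptions for this illustration. The paper itself solves (16) with CPLEX 7.5.

```python
import numpy as np
from scipy.optimize import linprog

def dca_boolean(A, t=100.0, b=0, max_iter=50, tol=1e-6, f0=None):
    N = A.shape[0]                      # N = 2^n
    c_w = 0.5 * A.sum(axis=1)           # (1/2) sum_x a_{wx} for each w
    f = np.random.rand(N) if f0 is None else np.asarray(f0, float)
    for _ in range(max_iter):
        v = np.where(f <= 0.5, -t, t)   # formula (13): a subgradient of H at f
        # LP (16): minimize xi - <v, f> subject to +-(A f - c_w) <= xi, balance, box
        c = np.concatenate([-v, [1.0]])
        A_ub = np.vstack([
            np.hstack([ A, -np.ones((N, 1))]),        #  A f - c_w <= xi
            np.hstack([-A, -np.ones((N, 1))]),        # -A f + c_w <= xi
            np.hstack([ np.ones((1, N)), [[0.0]]]),   # sum_x f_x <= 2^{n-1} + b
            np.hstack([-np.ones((1, N)), [[0.0]]]),   # sum_x f_x >= 2^{n-1} - b
        ])
        b_ub = np.concatenate([c_w, -c_w, [N / 2 + b, -(N / 2 - b)]])
        bounds = [(0.0, 1.0)] * N + [(None, None)]
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
        f_new = res.x[:N]
        if np.linalg.norm(f_new - f) < tol:
            f = f_new
            break
        f = f_new
    # by Theorem 1(ii), for t large enough the iterates become binary vertices
    return np.round(f).astype(int)
```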
Proof. (i) is a consequence of the convergence theorem of DCA for a general DC program.
(ii) Let t > t_1 := \max \{ (\Psi(f) - \eta)/\theta : f \in V(\Omega), p(f) \le 0 \}, where \eta := \min \{ \Psi(f) : f \in V(\Omega) \} and \theta := \min \{ p(f) : f \in V(\Omega), p(f) > 0 \}.
Let {f^k} \subset V(\Omega) (k \ge 1) be generated by DCA. If V(\Omega) \subset {0, 1}^{2^n}, then the assertion is trivial. Otherwise, let f^r \in {0, 1}^{2^n} and let f^{r+1} \in V(\Omega) be an optimal solution of (16). Then from (i) of this theorem we have

\Psi(f^{r+1}) + t p(f^{r+1}) \le \Psi(f^r) + t p(f^r).

Since p(f^r) = 0, it follows that t p(f^{r+1}) \le \Psi(f^r) - \Psi(f^{r+1}) \le \Psi(f^r) - \eta. If p(f^{r+1}) > 0, then t \le (\Psi(f^r) - \Psi(f^{r+1}))/p(f^{r+1}) \le (\Psi(f^r) - \eta)/\theta \le t_1, which contradicts the fact that t > t_1.
(iii) Since (Q_dc) is a polyhedral DC program, DCA has finite convergence (Subsection 4.1), i.e. the sequence {f^k} converges to f^* \in V(\Omega) after a finite number of iterations. Moreover, f^* is a critical point of G - H, i.e.

\partial G(f^*) \cap \partial H(f^*) \ne \emptyset.   (17)

If f^*_x \ne 1/2 for all x \in B^n, then H is differentiable at f^* and the condition (17) becomes \partial H(f^*) \subset \partial G(f^*). This subdifferential inclusion is the necessary and sufficient local optimality condition for a polyhedral DC program whose second component H is a polyhedral convex function. The proof is then complete.

From Theorem 1 we see that, starting with a feasible solution to the combinatorial problem (8), DCA provides a better feasible solution, although it works on the continuous feasible set of (Q_dc). It is therefore important to find a good feasible point of (8) for starting DCA. For this, we again use DCA, applied to the concave quadratic programming problem developed in [2]:

0 = \min \{ \sum_{x \in B^n} f_x (1 - f_x) : f \in K \}.   (18)
5 Preliminary Computational Results

We ran DCA and compared the computational results with two algorithms: a combination of random search for Boolean functions with high nonlinearity and a Hill Climbing algorithm (R HC), and GA HC, the best genetic algorithm studied in [6]. In [6] several versions of Genetic Algorithms with and without Hill Climbing are proposed; among them, the genetic algorithm with Hill Climbing (GA HC) is the best. We take b = 0 in (8) (the balance constraint) and \epsilon = 10^{-6}. The CPLEX code (version 7.5) has been used for solving the linear program (16). In the table below we present the best nonlinearity achieved after testing 10000 functions with R HC and GA HC, and the nonlinearity given by DCA, with n varying from 8 to 12 (No. it denotes the number of iterations).
 n    DCA Nf   DCA No. it   GA HC Nf   R HC Nf
 8      120        8           114        114
 9      236        4           236        232
10      486       12           482        476
11      984        8           980        968
12     1984       10          1980       1961
From the preliminary computational results we see that DCA always gives the best results. It is shown to be significantly superior to the other algorithms, and all Boolean functions generated by DCA are balanced. Moreover, the results of the methods GA HC and R HC are obtained after testing 10000 Boolean functions, while the maximum number of Boolean functions generated by our algorithm is only 14 (which corresponds to the case where the number of iterations is 12).

Conclusion. For constructing highly nonlinear balanced Boolean functions in cryptography we have proposed a new and efficient approach based on DC programming and DCA. The considered combinatorial problem has been formulated as a DC program, and the resulting DCA consists of a sequence of linear programs. DCA is original because it can give an integer solution while it works in a continuous domain. The preliminary results show that DCA is significantly superior to some heuristic approaches. The computational experiments could be extended to higher dimensions.
References

1. Le Thi Hoai An and Pham Dinh Tao, Solving a class of linearly constrained indefinite quadratic problems by DC algorithms, Journal of Global Optimization, Vol. 11, No. 3, pp. 253-285, 1997.
2. Le Thi Hoai An and Pham Dinh Tao, A continuous approach for globally solving linearly constrained quadratic zero-one programming problems, Optimization, Vol. 50, pp. 93-120, 2001.
3. Le Thi Hoai An and Pham Dinh Tao, The DC (difference of convex functions) programming and DCA revisited with DC models of real world nonconvex optimization problems, Annals of Operations Research, Vol. 133, pp. 23-46, 2005.
4. Le Thi Hoai An, Pham Dinh Tao, Huynh Van Ngai, Exact penalty techniques in DC programming, SIAM Conference on Optimization, 2005.
5. P. Charpin, A. Canteau, C. Carlet and C. Fontaine, Propagation characteristics and correlation-immunity of highly nonlinear Boolean functions, Lecture Notes in Computer Science, No. 1807, Springer Verlag, 2000, pp. 507-522.
6. A. J. Clark, Optimisation Heuristics for Cryptology, PhD Thesis, 1998.
7. W. Millan, A. Clark, and E. Dawson, Smart Hill Climbing Finds Better Boolean Functions, Workshop on Selected Areas in Cryptology 1997, Workshop Record, pp. 50-63, 1997.
8. J. A. Clark, J. L. Jacob, S. Stepney, The design of S-boxes by simulated annealing, CEC 2004: International Conference on Evolutionary Computation, Portland, OR, USA, June 2004, pp. 1533-1537, IEEE 2004.
9. Pham Dinh Tao and Le Thi Hoai An, Convex analysis approach to d.c. programming: Theory, algorithms and applications, Acta Mathematica Vietnamica (dedicated to Professor Hoang Tuy on the occasion of his 70th birthday), Vol. 22, No. 1, pp. 289-355, 1997.
10. Pham Dinh Tao and Le Thi Hoai An, DC optimization algorithms for solving the trust region subproblem, SIAM Journal on Optimization, Vol. 8, pp. 476-505, 1998.
11. P. Sarkar and S. Maitra, Construction of nonlinear Boolean functions with important cryptographic properties, Lecture Notes in Computer Science, No. 1807, Springer Verlag, 2000, pp. 485-506.
12. J. Seberry, X. M. Zhang, and Y. Zheng, Nonlinearly balanced Boolean functions and their propagation characteristics, Advances in Cryptology - CRYPTO'93, Springer Verlag, 1994, pp. 49-60.
13. S. Weber, T. Schüle, C. Schnörr, Prior learning and convex-concave regularization of binary tomography, Electronic Notes in Discrete Mathematics, 20:313-327, 2005.
Project-Oriented Scheduler for Cluster Systems

T. N. Minh, N. Thoai, N. T. Son, and D. X. Ky

Faculty of Information Technology, HoChiMinh City University of Technology, 268 Ly Thuong Kiet, District 10, HoChiMinh City, Vietnam
{minhtran, nam, sonsys}@cse.hcmut.edu.vn

Abstract Parallel processing is the key to fulfilling the high demands on computational resources in scientific computing. This has further pushed research in High Performance Computing into the mainstream, and numerous powerful computer systems have appeared. In particular, low-cost powerful clusters, which are set up by connecting many personal computers/workstations via a high speed network, have developed rapidly during the last decade. Batch scheduling systems for clusters are very important for transparent access to cluster resources. Most batch scheduling systems have essentially focused on maximizing the use of computing resources like processors, but not on improving quality of service (QoS). This paper presents a batch scheduler called Project-Oriented Scheduler (POS), which schedules jobs from projects with different priorities. The higher the priority level of a project, the more service time it is assigned. Moreover, starvation is also considered. POS has been evaluated using SimGrid, a simulation tool that provides core functions for the simulation of distributed applications in distributed environments, and the results show that POS improves not only the utilization of the system but also the satisfaction of the projects as compared with other scheduling strategies.
1 Introduction

Low-cost PC-based clusters with high-speed communication networks have become the mainstream of parallel and distributed platforms. To enable effective resource management on those clusters, numerous resource management and scheduling systems have been built. The scheduler is a very important component of cluster management systems. Users submit their jobs to the system together with their requirements, and the mission of the scheduler is to find an effective allocation that satisfies these requirements. However, most schedulers nowadays have focused on maximizing the system utilization [15] to reduce the overall execution time of all users, but not on satisfying other requirements such as reservation or user/job priority, while users may want to run their jobs on the system in such a way that not only the total execution time is reduced but also other
demands are satisfied. In reality, users may run many projects simultaneously, each containing a number of batch jobs. Certain projects are more important than others. Therefore, users want to have more resources allocated to these important projects so that they can be finished earlier. Unfortunately, current systems such as the Portable Batch System (PBS) [1-4, 14], Sun Grid Engine (SGE) [8], and Load Sharing Facility (LSF) [13] provide limited means for users to express these kinds of demands. In this paper we present our POS scheduler, which allows users to specify the priority of each project. This priority parameter is very important in determining the number of resources to be allocated to each project. At the same time, our scheduler assures high efficiency for the whole system. Moreover, less important projects still have resources allocated, so that starvation, i.e. when a certain job is prevented from executing because there are no resources allocated to it, does not happen. The rest of this paper is divided into four sections. Section 2 describes related work. The Project-Oriented Scheduler (POS) is presented in Section 3. Experimental results are shown in Section 4. We conclude the paper with some discussion of our future work in Section 5.
2 Related Work

We will discuss in this section some well-known schedulers together with their commonly used Cluster Management Systems (CMS).

2.1 Portable Batch System

The Portable Batch System (PBS) [1-4, 14] is a workload management system developed by Veridian Systems. PBS operates in UNIX environments and on networks of heterogeneous workstations. The purpose of PBS is to provide additional controls over initiating or scheduling execution of batch jobs [15]. The default scheduler in PBS is FIFO, which starts any job from the queued job list that fits the available resources. This mechanism can prevent large jobs from executing and leads to the starvation problem. This problem is solved completely in our POS.

2.2 Libra Scheduler

Libra [11, 15-17] is a computational economy-based job scheduling system for clusters. It has been designed to support resource allocations based on the users' QoS requirements. The scheduler offers a market-based, economy-driven service for managing batch jobs on clusters by scheduling CPU time according to user-perceived value, determined by budget and deadline rather than system performance considerations. Libra is intended to work as an add-on
to the existing queueing and resource management system. The first version of Libra has been implemented as a plugin scheduler to the Portable Batch System. POS also schedules jobs based on a user-perceived value. However, different from Libra, this value is determined by the priority of each project instead of budget and deadline.

2.3 Maui Scheduler

Like Libra, the Maui scheduler [4, 9, 10, 12] is also implemented as a plugin scheduler to PBS. The primary purpose of the Maui scheduler is to provide an advanced reservation infrastructure allowing sites to control exactly when, how, and by whom resources are used, while still assuring fairness and fairshare policies. Different from Maui, POS does not focus on advanced reservation but provides QoS in terms of project priority, so that important projects can run quickly.

2.4 REXEC Scheduler

REXEC [6, 7] is a remote execution environment for a campus-wide network of workstations, which is a part of the Berkeley Millennium Project. Users input the credits-per-minute rate they pay for CPU time as a parameter. REXEC allocates resources to user jobs according to the user valuation irrespective of their job needs, while POS allocates resources according to the priority of each project, which is taken as a user input parameter.
3 Project-Oriented Scheduler

3.1 Problem Description

Several companies need to run many different projects concurrently on their shared cluster. Each project contains a number of batch jobs. These projects have different importance, and users submit each project together with its specific priority. POS generates a schedule based on these priorities. For example, when serving the field of film rendering, this system allows many films to be rendered simultaneously. Each film is considered as a project with its priority. The higher priority projects should be finished before the lower priority projects due to constraints from their customers.

3.2 Goal of POS

When the available resources are limited, POS is designed and implemented in such a way that a higher priority project is allocated more resources than a lower priority project, so that it can be finished earlier. Secondly, POS is starvation free, i.e. no job is prevented from executing. Finally, the overall execution time of all projects remains acceptable in comparison with other scheduling strategies.
3.3 POS Algorithm

Assume that we have N projects JSET_1, JSET_2, ..., JSET_N with priorities Pri_1, Pri_2, ..., Pri_N, where each Pri_i is a positive integer. Each project JSET_i includes a number of jobs J_i. Also assume that Pri_1 >= Pri_2 >= ... >= Pri_N. In Algorithm 3, POS uses an execution queue to do the scheduling. The execution queue is an intermediate place into which jobs of the projects are put, and the scheduler extracts jobs one by one from the beginning of the queue to assign resources. When the execution queue is empty, POS uses Algorithm 2 to build the execution queue, and at this time new projects that have just been submitted are taken into account. The key part of our algorithm is the building of the execution queue. The number of jobs in the execution queue is determined by the sum of the priorities of the projects, i.e. sum(Pri_1, Pri_2, ..., Pri_N). Each project JSET_i provides Pri_i jobs to the execution queue. Jobs in the queue are arranged alternately in the following order: J_1, J_2, ..., J_N, J_1, J_2, ..., J_N, ..., J_1, J_2, ..., J_{i-1}, J_{i+1}, ..., J_N, ..., where J_i is skipped once it has appeared Pri_i times in the queue. Fig. 1 illustrates how the execution queue is built. As JSET_1 has priority 3, it takes 3 slots in the execution queue. Similarly, the numbers of slots in the execution queue of JSET_2 and JSET_3 are 2 and 1, respectively. Therefore, the execution queue receives 6 jobs (3 jobs from JSET_1, 2 jobs from JSET_2, and 1 job from JSET_3) in the following order: J_1, J_2, J_3, J_1, J_2, J_1.
Figure 1. POS scheduling
From Algorithm 2, in line 10 we see that the number of jobs of a project taken into the execution queue is determined by the priority of that project. The higher the priority of a project, the more of its jobs are in the execution queue, meaning that its jobs will be assigned more resources, so its finish time is shorter. Moreover, jobs of projects with low priorities are still chosen to be put in the execution queue. This assures that no job starvation occurs in the system.
Algorithm 2 Build ExecQueue
1: Input: A global list including N projects sorted in decreasing order of their priorities. The list is JSET_i (with the corresponding priority Pri_i), i = 1, ..., N
2: Output: The ExecQueue
3: Create an array named TmpList including N empty lists, TmpList_i corresponds to JSET_i, i = 1, ..., N
4: NumJob <= 0 {NumJob is the number of jobs taken into the ExecQueue}
5: for i = 1 to N do
6:   Count <= number of jobs remaining in JSET_i
7:   if Count >= Pri_i then
8:     Take Pri_i jobs from JSET_i to TmpList_i
9:     Delete these Pri_i jobs from JSET_i
10:    NumJob <= NumJob + Pri_i
11:  else
12:    Take Count jobs from JSET_i to TmpList_i
13:    Delete these Count jobs from JSET_i
14:    NumJob <= NumJob + Count
15:  end if
16: end for
17: j <= 1, i <= 0
18: while i < NumJob do
19:   if TmpList_j is not empty then
20:     Insert one job from TmpList_j into the ExecQueue
21:     Remove this job from TmpList_j
22:     i <= i + 1
23:   end if
24:   j <= (j mod N) + 1
25: end while
26: return ExecQueue
Algorithm 3 POS Scheduling
1: while true do
2:   if ExecQueue is empty then
3:     Call Build ExecQueue
4:   end if
5:   if no resource is free then
6:     The scheduler waits until there is a free resource
7:   end if
8:   while there are free resources and there are jobs in ExecQueue do
9:     Extract one job from the ExecQueue
10:    Assign this job to one of the free resources
11:  end while
12: end while
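As a complement to the pseudocode above, the following is a compact Python rendering (illustrative only, not the authors' implementation) of the queue-building step of Algorithm 2: each project contributes at most Pri_i jobs, and the temporary lists are drained in round-robin order. The data structures (deques of job identifiers) are simplifying assumptions.

```python
from collections import deque

def build_exec_queue(projects, priorities):
    """projects: list of deques of pending jobs, sorted by decreasing priority."""
    tmp = []
    for jobs, pri in zip(projects, priorities):
        take = min(pri, len(jobs))                      # lines 6-15 of Algorithm 2
        tmp.append(deque(jobs.popleft() for _ in range(take)))
    exec_queue = deque()
    num_jobs = sum(len(t) for t in tmp)
    j = 0
    while len(exec_queue) < num_jobs:                   # lines 17-25: round robin
        if tmp[j]:
            exec_queue.append(tmp[j].popleft())
        j = (j + 1) % len(tmp)
    return exec_queue

# Example matching Fig. 1: three projects with priorities 3, 2, 1
projects = [deque(f"J1.{k}" for k in range(5)),
            deque(f"J2.{k}" for k in range(5)),
            deque(f"J3.{k}" for k in range(5))]
print(list(build_exec_queue(projects, [3, 2, 1])))
# -> ['J1.0', 'J2.0', 'J3.0', 'J1.1', 'J2.1', 'J1.2']
```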
The algorithm complexity is O(M), where M is the total number of jobs submitted to the system. However, this algorithm is only suitable for
homogeneous systems. At the moment, the project priority has to be tuned manually by the users, but it could also be computed automatically based on other user requirements such as cost functions or deadlines.
4 Experimental Results

Most evaluations were carried out using SimGrid [5]. SimGrid is a toolkit that provides core functions for the simulation of distributed applications in distributed environments. We chose SimGrid because it facilitates research in the area of distributed and parallel application scheduling on distributed computing platforms. In our experiment, we simulate two homogeneous cluster systems, one 8-node cluster and one 16-node cluster, and we repeatedly run 8 projects (120 jobs in total, 15 jobs each) simultaneously on these systems to measure the execution time as well as the submission time of the last job of each project. The results in Fig. 2 and Fig. 3 are as we expected: higher priority projects have smaller submission and execution times than lower priority projects. In Fig. 2 and Fig. 3, the notation Pi(j) means that project i has priority j.
Figure 2. 8 Projects on 8-node cluster (execution time and submission time in seconds for projects P1(1) to P8(7))
From the diagrams in Fig. 2 and Fig. 3, we can see that the submission and execution times decrease as the project priority increases. Moreover, low priority projects still finish within a finite duration, which means that no starvation happens.
Figure 3. 8 Projects on 16-node cluster (execution time and submission time in seconds for projects P1(1) to P8(8))
Besides satisfying user demands, our experimental results also show that the total execution time of all projects is acceptable compared with other scheduling strategies such as FIFO and RSS.

RSS strategy. RSS is a space-sharing scheduling strategy. RSS divides the resource set into a number of subsets, each reserved for one project. The number of resources in each subset depends on the priority of the corresponding project: the higher the priority of a project, the bigger the subset of resources reserved for it. In Fig. 4, RSET1, RSET2, and RSET3 are the subsets of resources corresponding to the three projects JSET1, JSET2, and JSET3.

FIFO strategy. The FIFO strategy operates in such a way that jobs that come first are served first, regardless of which project the jobs belong to.

From the diagrams in Fig. 6 and Fig. 7, we can see that the POS algorithm gives pretty good results. In most cases, the ratio between the execution time of POS and the execution time of the other algorithms is smaller than one. This means POS assures high efficiency for the whole system.
5 Conclusions and Further Study

We have presented a scheduler called POS for cluster systems which takes project priorities specified by users into account. POS not only satisfies user
Figure 4. RSS strategy (projects JSET1, JSET2, JSET3 with priorities 3, 2, 1 are mapped to reserved resource subsets RSET1, RSET2, RSET3)
Figure 5. FIFO strategy (a single execution list feeds the shared resource set)
Figure 6. Comparing execution time on 8-node cluster (ratios POS/FIFO and POS/RSS versus the number of projects)
Figure 7. Comparing execution time on 16-node cluster (ratios POS/FIFO and POS/RSS versus the number of projects)
demands regarding the priorities of projects but also assures high efficiency for the whole system. POS is designed and implemented in such a way that a higher priority project is allocated more resources than lower priority projects, so that it can be finished earlier. Besides, starvation is guaranteed not to happen. The experimental results show that POS is useful, suitable and applicable in practice. Although POS has many advantages, its performance depends strongly on the choice of the project priorities, which is not yet considered in this paper. In practice, we have implemented a CMS that uses POS as its scheduler. This CMS is developed on the Windows platform. At present, project priorities are provided by the users, but in the future we will enhance POS to choose priorities for projects automatically based on evaluating other parameters such as deadlines or project requirements. Simultaneously, we will integrate POS into PBS as a plugin scheduler to contribute to the open source community.
References

[1] Bayucan A., Lesiak C., Mann B., Henderson R. L., Proett T., and Tweten D., External Reference Specification, Release 1.1.12, August 1998, http://www-unix.mcs.anl.gov/openpbs/
[2] Bayucan A., Lesiak C., Mann B., Henderson R. L., Proett T., and Tweten D., Internal Design Specification, Release 1.1.12, August 1998, http://www-unix.mcs.anl.gov/openpbs/
[3] Bayucan A., Lesiak C., Mann B., Henderson R. L., Proett T., Tweten D., and Jasinskyj L. T., Portable Batch System Administrator Guide, Release 1.1.12, August 1998, http://www.compsci.wm.edu/SciClone/documentation/software/OpenPBS/
[4] Bode B., Halstead D. M., Kendall R., and Lei Z., The Portable Batch Scheduler and the Maui Scheduler on Linux Clusters, Proceedings of the 4th Annual Linux Showcase and Conference, Atlanta, October 2000.
[5] Casanova H., SimGrid: A Toolkit for the Simulation of Application Scheduling, http://grail.sdsc.edu/papers/simgrid
[6] Chun B., and Culler D., Market-Based Proportional Resource Sharing for Clusters, Technical Report CSD-1092, University of California at Berkeley, Computer Science Division, January 2000.
[7] Chun B., and Culler D., Rexec: A Decentralized, Secure Remote Execution Environment for Clusters, 4th Workshop on Communication, Architecture, and Applications for Network-Based Parallel Computing (CANPC), France, January 2000, pp. 1-14.
[8] Gentzsch W., Sun Grid Engine (SGE): A Cluster Resource Manager, October 2002, http://gridengine.sunsource.net/
[9] Jackson D., Maui Scheduler: A Multifunction Cluster Scheduler, Beowulf Cluster Computing with Linux, November 2001, pp. 351-368.
[10] Jackson D., Snell Q., and Clement M., Core Algorithms of the Maui Scheduler, Lecture Notes in Computer Science, Vol. 2221, June 2001, pp. 87-102.
[11] Libra, http://www.gridbus.org/libra/
[12] Maui, http://www.clusterresources.com/pages/resources/documentation.php
[13] Platform Load Sharing Facility (LSF), October 2002, http://www.platform.com/Products/Platform.LSF.Family/Platform.LSF/
[14] Portable Batch System (PBS), http://www.pbsgridworks.com/
[15] Sherwani J., Ali N., Lotia N., Hayat Z., and Buyya R., Libra: A Computational Economy-Based Job Scheduling System for Clusters, Software: Practice and Experience, Vol. 34, Issue 6, May 2004, pp. 573-590.
[16] Sherwani J., Ali N., Lotia N., Hayat Z., and Buyya R., Libra: An Economy-Driven Job Scheduling System for Clusters, Proceedings of the 6th International Conference on High Performance Computing in Asia-Pacific Region (HPC Asia 2002), India, December 2002.
[17] Yeo C. S., and Buyya R., Pricing for Utility-Driven Resource Management and Allocation in Clusters, Proceedings of the 12th International Conference on Advanced Computing and Communication (ADCOM 2004), India, December 2004.
Optimizing Spring-Damper Design in Human Like Walking that is Asymptotically Stable Without Feedback

Katja D. Mombaur1, Richard W. Longman2, Hans Georg Bock1, and Johannes P. Schlöder1

1 IWR, Universität Heidelberg, Im Neuenheimer Feld 368, 69120 Heidelberg, Germany
[email protected]
2 Dept. of Mechanical Engineering, Columbia University, New York, 10027, USA
[email protected]
1 Introduction A fundamental understanding of the principles underlying dynamic walking and running motions is important in many fields of research and industry, e.g. for the construction of better walking robots, in prosthetics and rehabilitation, or for the visualization of motions in computer graphics. In order to achieve this understanding, model-based numerical optimization and optimal control is a very helpful tool making it possible to ”look inside the dynamics” of the gait or to improve its performance. One crucial property of real life walking and running motions is stability, and the faster a motion, the harder it is to make it stable. We are especially interested in exploiting the natural stability of the system as much as possible since this should facilitate the task of any feedback control system to cope with disturbances. The idea of exploiting the natural stability has been introduced in robotics in the form of the passive-dynamic walkers (e.g. Mochon & McMahon [10], McGeer [9]), which are purely mechanical devices without
any actuators or sensors, moving down inclined slopes. More recent robots of this type are capable of free 3D dynamic walking on level ground, powered by very simple actuation and sensors (Collins et al. [5]). If these ideas are to be transferred to more complex walking and running robots with general forms of actuation, this will not be possible based on intuition and experimental testing only; instead, guidelines produced by a thorough theoretical analysis should be used to support the design process of these robots. In our opinion, numerical optimization is the right tool to approach this issue. In our previous research, we have developed special stability optimization techniques and we were able to determine, for a range of robot models, limit cycle gaits that are asymptotically stable without any feedback at all [11, 12, 14]. There are different properties of a robot that can be tuned in order to improve its stability, namely:
• the kinematic and inertial properties of all robot segments,
• the passive force elements, in particular springs and dampers,
• the force/torque input histories to the robot produced by its actuators,
• the robot trajectories, i.e. the motion performed by the robot.

In this paper, we pay particular attention to the second category, the optimal selection of springs and dampers. It is generally accepted that springs play an important role in increasing the efficiency of motions, especially at high speeds [2], since they help to store energy and release it at a later instant. Biological running depends heavily on the springy properties of muscles, tendons and soft tissue. Many authors have pursued the goal of introducing more elasticity in robots and robotic actuators, e.g. [1, 15, 16]. Dampers and the associated loss of energy can be useful to produce asymptotic stability, but on the other hand they destroy efficiency. In combination, springs and dampers acting in parallel to a torque motor have the same effect as a simple proportional controller with rate feedback and thus in general increase stability and robustness. We note that it is important to carefully adjust the spring and damper constants and the motion, and to determine both at the same time.
Walking motions are modeled as hybrid dynamical systems involving both continuous and discrete phases. The continuous phases may have different degrees of freedom and are described by nonlinear differential equations, and the discrete phases describe sudden changes, i.e. there are discontinuities in the state variables. Phase changes do not take place at given time points but depend on the state of the system. A specific gait type is described by a certain order and certain types of continuous and discrete phases and a set of complex additional constraints. Periodicity of walking and running is also imposed as a constraint. In this paper, the example of a stiff-legged walker is investigated, which has two torque actuators (one at the hip and one at the ankle to represent toe-off) in parallel with two spring-damper elements. More details about gait models in general and this particular biped model will be given in section 2 of this paper.
The stability of a periodic walking motion is defined in terms of the spectral radius of the Jacobian of the Poincaré map, which leads to a difficult nondifferentiable optimization criterion or constraint. In section 3 we describe special optimal control techniques based on the direct boundary value problem approach using multiple shooting that allow a very efficient solution of this complex problem class. The following sections 4 and 5 present the optimization results for the stiff-legged biped walker with and without spring-damper elements, and it is shown that spring-damper elements can considerably improve the stability of walking. Different problem formulations and suitable objective functions are discussed. In section 6 the relationships between spring-damper elements and feedback controllers are discussed.
2 Mathematical Models of Biped Walking

2.1 General Form of Walking Models

Mathematical models of gaits involve distinct model phases with possibly different degrees of freedom, each described, in the general form, by a different set of differential equations. These can be ordinary differential equations (ODEs)

\dot{q}(t) = v(t)   (1)
\dot{v}(t) = a(t) = M^{-1}(q(t), p) \cdot f(q(t), v(t), w(t), p)   (2)

In these equations, the vector q contains the position variables of the system, and v the corresponding velocities; together they form the vector of state variables x^T = (q^T, v^T). t is the physical time, a the vector of accelerations, w(t) are the input torques or forces of the robot, and p are the free model parameters. M denotes the mass matrix, and f the vector of forces. Alternatively, depending on the choice of coordinates, one may obtain a system of differential-algebraic equations (DAE) of index 3 for some or all phases

M(q(t), p) \cdot a = f(q(t), v(t), u(t), p) - G^T(q(t), p) \lambda   (3)
g_{pos}(q(t), p) = 0   (4)

with the Lagrange multipliers \lambda, the constraint equations g_{pos}, and their partial derivatives G = \partial g_{pos}/\partial q. We formulate the DAEs in the equivalent index-1 form with invariants
\dot{q}(t) = v(t)   (5)
\dot{v}(t) = a(t)   (6)
\begin{pmatrix} M(q(t),p) & G^T(q(t),p) \\ G(q(t),p) & 0 \end{pmatrix} \begin{pmatrix} a \\ \lambda \end{pmatrix} = \begin{pmatrix} f(q(t),v(t),w(t),p) \\ \gamma(q(t),v(t),p) \end{pmatrix}   (7)
g_{pos} = g(q(t),p) = 0   (8)
g_{vel} = G(q(t),p) \cdot \dot{q}(t) = 0   (9)

with the abbreviation

\gamma(q(t),v(t),p) = -v^T \frac{d G(q(t),p)}{dq} v.   (10)
Phase boundaries are implicitly defined by the roots of switching functions si (t, q(t), v(t), p) = 0.
(11)
At these switching points, there may be discontinuities in the right hand side of the linear system, i.e. \Delta f(q, v, w, p), \Delta\gamma(q, v, p) (which translates into discontinuities in the accelerations \Delta a), or even in the velocities, \Delta v(t, q, v, w, p), i.e. in the state variables themselves. Walking problems also involve a number of complex linear and nonlinear, coupled and decoupled equality and inequality constraints, such as the periodicity constraints on the state variables (or a subset thereof) x~(T_cycle) = x~(0). The cycle time T_cycle is generally a priori unknown.
the mass of each leg, denoted by m the leg length l
Optimizing Spring-Damper Design in Human Like Walking
407
Figure 1. Sketch of stiff-legged biped with springs and dampers
•
the relative location of the leg’s center of mass measured from the hip, c.
Using these three parameters, the moment of inertia Θ of a leg is defined as Θ=
1 2 ml (1 + 2c2 − 2c). 6
(13)
In addition, there are six more parameters describing the characteristics of the two spring-damper elements (i = 1, 2):
• the spring constants k_i,
• the zero position offsets \Delta_i of the springs,
• the damper constants b_i.

One cycle of the gait model includes one step of the robot followed by a leg switch, and not a full physical gait cycle consisting of two steps (as presented in Fig. 2). Applying periodicity constraints to this model assures the generation of equal right and left steps, which would not necessarily be the case otherwise. One cycle of this model consists of one continuous phase describing the forward swing and a discrete phase including the sudden change of velocities at foot contact and the leg switch.

Figure 2. Two steps of stiff-legged biped gait

The dynamic equations of this robot model are

M \cdot \begin{pmatrix} \ddot{\phi}_1 \\ \ddot{\phi}_2 \end{pmatrix} = F   (14)

with mass matrix

M = \begin{pmatrix} m_{11} & m_{12} \\ m_{21} & m_{22} \end{pmatrix}   (15)

with

m_{11} = 2 m l^2 + \Theta - 2 m l^2 c + m l^2 c^2   (16)
m_{12} = m l^2 c \sin\phi_2 \sin\phi_1 - m l^2 c \cos\phi_2 \cos\phi_1   (17)
m_{21} = m l^2 c \sin\phi_2 \sin\phi_1 - m l^2 c \cos\phi_2 \cos\phi_1   (18)
m_{22} = m l^2 c^2 + \Theta   (19)

and forces

F = F_1 + F_{sd}   (20)

with

F_1 = \begin{pmatrix} -m \dot{\phi}_2^2 l^2 c \sin\phi_2 \cos\phi_1 + m \dot{\phi}_2^2 l^2 c \cos\phi_2 \sin\phi_1 + 2 m g l \sin\phi_1 - m g l c \sin\phi_1 + u_1 + u_2 \\ -m \dot{\phi}_1^2 l^2 c \sin\phi_1 \cos\phi_2 + m \dot{\phi}_1^2 l^2 c \cos\phi_1 \sin\phi_2 - m g l c \sin\phi_2 - u_1 \end{pmatrix}   (21)

and spring-damper forces

F_{sd} = \begin{pmatrix} k_1(\phi_2 - \phi_1 - \Delta_1) + b_1(\dot{\phi}_2 - \dot{\phi}_1) - (k_2(\phi_1 - \Delta_2) + b_2 \dot{\phi}_1) \\ -(k_1(\phi_2 - \phi_1 - \Delta_1) + b_1(\dot{\phi}_2 - \dot{\phi}_1)) \end{pmatrix}   (22)

where obviously F_{sd} = 0 for the model version without springs and dampers. The end of the step is determined by the equation

s(x(t)) = \phi_1 + \phi_2 = 0   (23)

with a negative vertical velocity of the swing foot point.
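The following is a hedged numerical sketch (not the authors' code) of how the swing-phase dynamics (14)-(22) can be evaluated: given the state, the torques and the model parameters, it solves the 2x2 linear system for the angular accelerations. The gravitational constant g = 9.81, the dictionary of parameter names and the function name are assumptions made only for this example.

```python
import numpy as np

def accelerations(x, u, par):
    """State x = (phi1, phi2, dphi1, dphi2), torques u = (u1, u2)."""
    phi1, phi2, dphi1, dphi2 = x
    u1, u2 = u
    m, l, c = par["m"], par["l"], par["c"]
    k1, k2 = par["k1"], par["k2"]
    d1, d2 = par["Delta1"], par["Delta2"]
    b1, b2 = par["b1"], par["b2"]
    g = 9.81                                                     # assumed value
    Theta = m * l**2 * (1 + 2*c**2 - 2*c) / 6.0                  # (13)
    m_off = m*l**2*c*(np.sin(phi2)*np.sin(phi1) - np.cos(phi2)*np.cos(phi1))
    M = np.array([[2*m*l**2 + Theta - 2*m*l**2*c + m*l**2*c**2, m_off],   # (16)-(17)
                  [m_off, m*l**2*c**2 + Theta]])                          # (18)-(19)
    F1 = np.array([-m*dphi2**2*l**2*c*np.sin(phi2)*np.cos(phi1)
                   + m*dphi2**2*l**2*c*np.cos(phi2)*np.sin(phi1)
                   + 2*m*g*l*np.sin(phi1) - m*g*l*c*np.sin(phi1) + u1 + u2,
                   -m*dphi1**2*l**2*c*np.sin(phi1)*np.cos(phi2)
                   + m*dphi1**2*l**2*c*np.cos(phi1)*np.sin(phi2)
                   - m*g*l*c*np.sin(phi2) - u1])                          # (21)
    Fsd = np.array([k1*(phi2 - phi1 - d1) + b1*(dphi2 - dphi1)
                    - (k2*(phi1 - d2) + b2*dphi1),
                    -(k1*(phi2 - phi1 - d1) + b1*(dphi2 - dphi1))])       # (22)
    return np.linalg.solve(M, F1 + Fsd)                                   # (14)
```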
Velocities after the discrete phase, i.e. after the touchdown discontinuity and the leg shift, are determined by the set of equations

\dot{\phi}_1^+ = ((C_1 - A_1) \dot{\phi}_1 + D_1 \dot{\phi}_2)/B_1   (24)
\dot{\phi}_2^+ = \dot{\phi}_1   (25)

using the abbreviations

A_1 = m l^2 c (1 - \cos(\phi_1 - \phi_2)) + \Theta   (26)
B_1 = m l^2 (1 - c \cos(\phi_1 - \phi_2) + (1 - c)^2) + \Theta   (27)
C_1 = m l^2 (1 - c)(2 \cos(\phi_1 - \phi_2) - c) + \Theta   (28)
D_1 = m l^2 (c - 1) c + \Theta   (29)
Periodicity conditions are applied to all position and velocity variables x(T ) = x(0).
(30)
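Building on the accelerations() helper sketched above, the hybrid structure of one gait cycle (continuous swing phase, switching condition (23), touchdown map (24)-(25)) can be simulated for instance as follows. This is only an illustrative sketch under simplifying assumptions: constant torques, SciPy's event detection as a stand-in for the swing-foot-velocity condition, and the relabelling of the position variables at the leg switch is not spelled out in the text and is therefore omitted here.

```python
import numpy as np
from scipy.integrate import solve_ivp

def one_step(x0, u, par, t_max=2.0):
    def rhs(t, x):
        return np.concatenate([x[2:], accelerations(x, u, par)])
    def touchdown(t, x):                 # switching function (23): s = phi1 + phi2
        return x[0] + x[1]
    touchdown.terminal = True
    touchdown.direction = -1             # s crossing zero from above (simplification)
    sol = solve_ivp(rhs, (0.0, t_max), x0, events=touchdown, max_step=1e-2)
    phi1, phi2, d1, d2 = sol.y[:, -1]
    m, l, c = par["m"], par["l"], par["c"]
    Theta = m * l**2 * (1 + 2*c**2 - 2*c) / 6.0
    A1 = m*l**2*c*(1 - np.cos(phi1 - phi2)) + Theta              # (26)
    B1 = m*l**2*(1 - c*np.cos(phi1 - phi2) + (1 - c)**2) + Theta  # (27)
    C1 = m*l**2*(1 - c)*(2*np.cos(phi1 - phi2) - c) + Theta      # (28)
    D1 = m*l**2*(c - 1)*c + Theta                                # (29)
    d1_plus = ((C1 - A1)*d1 + D1*d2) / B1                        # (24)
    d2_plus = d1                                                 # (25)
    return np.array([phi1, phi2, d1_plus, d2_plus]), sol.t[-1]
```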
3 Formulation and Solution of Optimal Control Problems Involving Stability

3.1 General Form of Optimal Control Problem

The problem of generating an optimal periodic gait based on the complex models described in the previous section leads to the formulation of a multiphase optimal control problem of the following form:

\min_{x,u,p,T} \int_0^T \phi(x(t),u(t),p)\,dt + \Phi(T,x(T),p)   (31)
s.t.  \dot{x}(t) = f_j(t,x(t),u(t),p)  for t \in [\tau_{j-1},\tau_j],  j = 1,...,n_{ph},  \tau_0 = 0,  \tau_{n_{ph}} = T   (32)
      x(\tau_j^+) = \tilde{J}(x(\tau_j^-),p)  for j = 1,...,n_{ph}   (33)
      g_j(t,x(t),u(t),p) \ge 0  for t \in [\tau_{j-1},\tau_j]   (34)
      r_{eq}(x(0),...,x(T),p) = 0   (35)
      r_{ineq}(x(0),...,x(T),p) \ge 0.   (36)
There are many possible objective functions for gaits, e.g. related to the stability, efficiency or energy consumption of the motion.

3.2 Stability of a Periodic Gait Solution with Discontinuities

Stability is an important property of gaits. A gait that is not stable by itself or cannot be stabilized by means of appropriate feedback control algorithms
is of no practical validity. When computing periodic gait solutions, we always take into account the open-loop stability of the gait, either as an objective function (31) of the problem, or as a constraint (36). This section serves to discuss the stability criterion that we apply.
According to Lyapunov's first method (see e.g. Hsu & Meyer [6]), a periodic solution x(T) = x(0) of a periodic system of differential equations

\dot{x} = f(t,x(t))  with  f(t+T,\cdot) = f(t,\cdot)   (37)

is asymptotically stable if all eigenvalues of the monodromy matrix X = \partial x(T)/\partial x(0) (also called the sensitivity matrix, or Jacobian of the Poincaré map) of the solution are smaller than one in magnitude,

|\lambda_i(X)| < 1   (38)

or equivalently

\rho(X) < 1  with  \rho(X) = \max_i |\lambda_i(X)|,   (39)

where \rho is the spectral radius. For the application of Lyapunov's first method, which in its original form is for smooth systems of type (37) only, to the type of non-smooth models discussed in this paper, two things are required:
• a proof that the method is also valid for hybrid multi-phase systems with discontinuities: this has been given in a previous publication by the same authors [13];
• a formula for the computation of the monodromy matrix in the presence of discontinuities, the times t_s of which change in the presence of perturbations (compare [3]):

U_x = \frac{1}{\dot{s}} (\Delta RHS - J_t - J_x f_{left}(t_s)) \cdot (s_x)^T + I + J_x   (40)
X = X(0,T) = X(t_s,T)\, U_x\, X(0,t_s)   (41)
(42)
ρi ≤ c < 1
(43)
or as a constraint The reason to use the latter is that one may not be interested in “the most stable solution” (in terms of the spectral radius) but only in one that is “sufficiently stable”, e.g. has a spectral radius below 0.8 or 0.7. It is also interesting to note that the absolute size of the eigenvalue is a measure of the speed of decay of the perturbation (in some norm), but not a measure for the robustness of the solution, so a smaller value is not necessarily better in this sense. We will discuss both problem formulations for our example robot.
Minimizing the maximum eigenvalue of the monodromy matrix - a nonsymmetric matrix - involves certain difficulties. This objective function is nondifferentiable at points where several eigenvalues coalesce, and in certain cases even non-Lipschitz. (Obviously the same problems can occur in the constraintbased formulation). In addition, the evaluation of the objective function (or constraint function) already implies computing first order derivatives of the discontinuous trajectory, thus requiring second order derivatives for the gradients, etc. 3.3 Numerical Methods for Optimal Control Problems Involving Stability This section briefly describes the numerical techniques that we use in order to be able to solve optimal control problems involving the stability criterion outlined above. In order to address the issue of generating higher order derivatives, we use an augmented state variable vector x ˜ in the optimal control problem formulation (31) - (36), where ¯) x ˜T = (x, x
(44)
which contains the four original state variables of the model (compare equation (12)), as well as the subvector x ¯ containing the 16 entries of the sensitivity matrix X. The vector x ¯ has to satisfy dynamic constraints of the form (32) described by the variational differential equation ˙ X(t) = fj,x (t, x(t), u(t), p)X(t)
(45)
requiring partial derivatives of the right hand side of the original dynamics for x. For the discrete phases (33), updates of form (40) have to be computed for the sensitivity matrix. Objective functions or constraints using the eigenvalues of the monodromy matrix can now easily be formulated using the values of x ¯ at the end of the cycle. We solve the resulting optimal control problem with the augmented vector of state variables using a variant of the optimal control code MUSCOD (Bock & Plitt [4], Leineweber [7]) suited for periodic gait problems. MUSCOD is based on a direct method for the solution of the optimal control problem, restricting the controls to a discretized space described by a finite set of parameters. For numerical efficiency, we use functions with local support, in this case piecewise constant functions on a grid with m intervals. In order to handle the dynamics of the problem, a multiple shooting state parameterization is applied: the basic idea of this technique is to split the long integration interval [0, T ] into many smaller ones and to introduce the values of the state variables x at all those grid points as new variables sij . The original boundary value problem is thus transformed into a set of initial value problems with corresponding continuity conditions between the integration intervals.
412
K.D. Mombaur et al.
Using these discretization techniques, the original infinite dimensional optimal control problem is transformed into a finite dimensional nonlinear programming problem (NLP). This NLP is large but very structured and can therefore be solved efficiently by a tailored sequential quadratic programming algorithm exploiting the structure of the problem (Leineweber et al. [7], [8]). Note that this approach even works in the presence of non-differentiabilities introduced by the spectral radius criterion despite the fact that SQP methods assume a smooth twice differentiable objective function. The gradients of objective functions and constraint in this context are produced by means of finite differences. This method is much more efficient and produces better results than our previous methods taking into account the non-differentiable nature of the problem [13]. More detailed investigations on descent and convergence properties of the algorithms are currently being performed. Note also that this approach splits the problems of NLP solution and treatment of the dynamic model equations. Integration and sensitivity generation must be handled in parallel, since the results are required for the evaluation of objective functions, continuity constraints of the NLP, and the derivatives thereof. For this task, fast and reliable integrators are used that also include a computation of sensitivities based on the techniques of internal numerical differentiation (for details see Bock [3]).
4 Open-Loop Stable Walking without Springs and Dampers At first we investigate the simpler version of the stiff-legged biped without spring-damper elements. The question to be addressed is if open-loop stable walking can be achieved by this robot configuration without springs and dampers, just tuning geometry and inertia of the legs and the input torques. As our investigations show this is in fact possible: by means of stability optimization we were able to bring the spectral radius down to 0.486. However, a closer look at this solution shows that it is not very useful for a practical implementation since it does not at all resemble a natural gait. The stance leg which would be expected to perform a continuous forward rotation during one step performs a strange and very significant wiggle about the upright position, i.e. rotates forward, then backward and then forward again (compare figure 3). The torque histories responsible for that gait are also very unusual, constantly changing signs i.e alternating between accelerating and decelerating the bodies, and producing torques opposed to the direction of motion - a mode of actuation that does not at all seem to be efficient. It is highly unlikely that a robotics hardware would sustain the execution of such a gait. We can conclude that for this problem the eigenvalue optimization does not deliver satisfying results. We therefore look at a different problem formulation. We use the previous solution as a staring solution, but this time us stability as a constraint and
Optimizing Spring-Damper Design in Human Like Walking
413
Figure 3. State and torque histories of result of eigenvalue optimization for biped robot without springs and dampers which is not useful for practical purposes
allow the spectral radius to go up to 0.7. As optimization objective, we apply a minimization of torques squared T u2i dt (46) min x,p,u,T
0
with the intention to reduce the large torque effort required for the previous solution. The results of this optimization are shown in figure 4. The torques are reduced to a much smaller size, and the spectral radius is driven to its limit 0.7. The resulting motion looks very natural. the corresponding model parameters are m = 1kg (fixed), l = 0.2, and c = 0.25, and the cycle time is T = 0.459s. The initial values of this solution are xT0 = (0.25, −0.25, −1.94, −2.68). We can summarize that there are useful open-loop stable solutions for the stiff-legged biped without springs and dampers. We also expect that there are more practically applicable solutions for lower bounds on the spectral radius (e.g. 0.6), but we have not yet done any further investigations. For the example investigated here, this second optimization problem formulation using stability in the constraints definitely is to be preferred.
5 Can We Do Better with Springs and Dampers? The answer definitely is yes. Springs and dampers help to increase both stability and efficiency of dynamic walking motions. The most stable solution we found for the robot configuration with springs and dampers in
414
K.D. Mombaur et al.
Figure 4. State and torque histories of result of torque minimization with bounded spectral radius for biped robot without springs and dampers
parallel to both actuators has a spectral radius of only 0.102. The parameters of this solution are m = 1kg (fixed), l = 0.202m, c = 0.289, k1 = 0.158N m, ∆1 = 1.05, b1 = 0.981N ms, k2 = 0.363N m, ∆2 = 1.035, b2 = 1.09N ms. The solution has a cycle time of T = 0.722s and initial values of xT = (0.25, −0.25, −1.082, −1.505). As the graphs in figure 5 (as well as the corresponding animations that we produced) show, this solution does not suffer from the same drawback as the solution of stability optimization for the model version without springs and dampers: the resulting motions and the corresponding torques are very natural without the repetitive changes between accelerations and decelerations as observed before. Springs and dampers have a smoothing effect; the dampers which always produce a force opposed to the direction of motion perform some of the stabilizing action produced by the quickly changing torques of the first solution above. It is also interesting to check how much the overall torque input can be reduced in the presence of springs and dampers by using objective function (46). Again, we formulate stability as a constraint (|λ(C)|max < 0.5), and start from the previous stability optimization solution. We find that torques can be reduced to very small quantities, barely visible in the graphics at the previously used scale (figure 6, bottom). The torques are however not zero; some torque input is still required in order to compensate the loss of energy in the dampers (which are not driven to zero) and in the inelastic impact events. The resulting parameters of this optimization are m = 1kg (fixed), l = 0.2m, c = 0.25, k1 = 0.14N m, ∆1 = 1.5, b1 = 0.048N ms,
Optimizing Spring-Damper Design in Human Like Walking
415
Figure 5. State and torque histories resulting from stability optimization for biped robot with springs and dampers
k2 = 0.632N m, ∆2 = −1.07, b2 = 0.781N ms. The initial values are xT = (0.25, −0.25, −1.16, −1.560), and the cycle time is T = 0.682s. Trajectories are shown in figure 6, top.
6 Relationship of Springs and Dampers and Feedback Control Springs and dampers introduce inherent feedback in the system. The spring part is producing backdriving forces/or torques if the spring is not in its rest length (which is one of the parameters to be tuned in our robot), and the damper part always leads to a force/torque opposing the direction of relative motion. If one looks at the equations of the torques produced by a spring-damper element in parallel to an actuator in more detail Ti,spring,damper = ui + ki (∆i − θi ) − bi θ˙i
(47)
(where θi is the (relative) angle on which the torque acts) and compares this to the torques produced by a proportional controller with rate feedback, as shown in figure 7 Ti,f eedback = κi (θcomm,i − θi ) − βi θ˙i ,
(48)
it becomes clear that there is a equivalence if the characteristic parameters in both equations satisfy the following equalities:
416
K.D. Mombaur et al.
Figure 6. States and torques for a minimization of torques squared and bounded spectral radius for the biped robot with springs and dampers
θ comm + -
θ
Motor
+
c2
c1
-
Robot
c3
θ
Figure 7. Feedback controller with proportional control and rate feedback
κi = ki
(49)
βi = bi θcomm,i = θ¯comm,i + θ0,i
(50) (51)
θ0,i = ∆i ui θ¯comm,i = κi
(52) (53)
The command to the feedback controller can be split in one constant part corresponding to the spring offset, and a variable part corresponding the motor torque divided by a constant. This also means that the command to the feedback controller θcomm,i can be interpreted as a command to change the zero location of the spring. If we look at two identical trajectories - one produced by a spring-damper element in parallel to a torque actuator, the other one produced by a proportional controller with rate feedback, satisfying the above equivalences -
Optimizing Spring-Damper Design in Human Like Walking
417
the stability properties of both solutions are the same since the dynamics are equivalent. Note however that in the first case we are talking about open-loop stability while in the second case it is closed-loop stability. The stability properties of both versions are the same since with the equivalences listed above the dynamics are the same. However, if one looks at the amount of energy consumed, there is a significant difference between the feedback version and the spring-damper version. The latter can store energy in the spring when the angle is not the zero position angle for the spring, and this energy can be given back to the motion - while the feedback version has no means of temporarily storing and releasing energy. On the other hand, the damper consumes energy while there is no such energy destruction in the feedback version. It is not clear a priori which version is better from the energy point of view. In both cases, the energy consumption could be minimized with different results for the two cases.
7 Conclusions In this paper, we have investigated the role of passive spring and damper elements for the stabilization of a motion. As our computations for the example biped robot show, it is possible to achieve stable walking with and without springs and dampers As expected, the use of springs and dampers seems to be favorable: if measured in terms of the spectral radius criterion, much better - i.e. more stable - solutions can be achieved if springs and damper are included. The effect of different objective functions is studied, and it is shown that it is sometimes favorable to use stability as a constraint instead of an objective function. In order to achieve natural and stable walking motions it is important to optimally select parameters, controls and trajectories all at once. Open-loop sytems with springs and dampers in parallel to torque actuator can be compared to closed-loop systems with proportional controllers with rate feedback. For adjusted sets of parameters, two identical solutions of the two versions are equally stable since the have the same dynamic properties, but may be very different in terms of energy consumption with the possibility of either version being more favorable.
Acknowledgments The first author would like to gratefully acknowledge financial support by the Margarete von Wrangell program of the State of Baden-W¨ urttemberg and by the Foundation Landesstiftung Baden-W¨ urttemberg. The second author would like to thank the Alexander von Humboldt Foundation for a research award for partial support of the research presented here.
418
K.D. Mombaur et al.
References 1. M. Ahmadi and M. Buehler. Stable control of a simulated one-legged running robot with hip and leg compliance. IEEE Transactions on Robotics and Automation, 13(1):96–104, Feb. 1997. 2. R. McNeill Alexander. Principles of Animal Locomotion. Princeton University Press, 2002. 3. H. G. Bock. Randwertproblemmethoden zur Parameteridentifizierung in Systemen nichtlinearer Differentialgleichungen. In Bonner Mathematische Schriften 183. Universit¨ at Bonn, 1987. 4. H. G. Bock and K.-J. Plitt. A multiple shooting algorithm for direct solution of optimal control problems. In Proceedings of the 9th IFAC World Congress, Budapest, 242–247. International Federation of Automatic Control, 1984. 5. S. H. Collins, A. L. Ruina, R. Tedrake, and M. Wisse. Efficient bipedal robots based on passive-dynamic walkers. Science, 307:1082 – 1085, 2005. 6. J. C. Hsu and A. U. Meyer. Modern Control Principles and Applications. McGraw-Hill, 1968. 7. D. B. Leineweber, I. Bauer, H. G. Bock, and J. P. Schl¨ oder. An efficient multiple shooting based reduced SQP strategy for large-scale dynamic process optimization - part I: theoretical aspects. Comput. Chem. Engng, 27:157 – 166, 2003. 8. D. B. Leineweber, A. Sch¨ afer, H. G. Bock, and J. P. Schl¨ oder. An efficient multiple shooting based reduced SQP strategy for large-scale dynamic process optimization - part II: software aspects and applications. Comput. Chem. Engng, 27:167 – 174, 2003. 9. T. McGeer. Passive dynamic walking. International Journal of Robotics Research, 9:62–82, 1990. 10. S. Mochon and T. S. McMahon. Ballistic walking. Biomechanics, 13:49 – 57, 1980. 11. K. D. Mombaur. Performing open-loop stable flip-flops - an example for stability optimization and robustness analysis of fast periodic motions. In Fast Motions in Robotics and Biomechanics - Optimization and Feedback Control, Lecture Notes in Control and Information Science. Springer, 2006. 12. K. D. Mombaur, H. G. Bock, J. P. Schl¨ oder, and R. W. Longman. Open-loop stability – a new paradigm for periodic optimal control and analysis of walking mechanisms. In Proceedings of IEEE CIS-RAM 04, Singapore, 2004. 13. K. D. Mombaur, H. G. Bock, J. P. Schl¨ oder, and R. W. Longman. Open-loop stable solution of periodic optimal control problems in robotics. ZAMM - Journal of Applied Mathematics and Mechanics/Zeitschrift f¨ ur Angewandte Mathematik und Mechanik, 85(7):499 – 515, July 2005. 14. K. D. Mombaur, H. G. Bock, J. P. Schl¨ oder, and R. W. Longman. Self-stabilizing somersaults. IEEE Transactions on Robotics, 21(6), Dec. 2005. 15. D. Papadopoulos and M. Buehler. Stable running in a quadruped robot with compliant legs. In Proceedings of IEEE International Conference on Robotics and Automation, 444–449, San Francisco, USA, April 2000. 16. G. A. Pratt, M. M. Williamson, P. Dillworth, J. Pratt, K. Uhland, and A. Wright. Stiffness isn’t everything. In Proceedings of International Symposium of Experimental Robotics ISER, Stanford, 1995.
Stability Optimization of Juggling Katja Mombaur1 , Peter Giesl2 , and Heiko Wagner3 1 2 3
IWR, Universit¨ at Heidelberg, Im Neuenheimer Feld 368, 69120 Heidelberg, Germany
[email protected] Department of Mathematics, Mantell Building, University of Sussex, Falmer, Brighton, United Kingdom
[email protected] Institut f¨ ur Sportwissenschaft, Universit¨ at M¨ unster, Horstmarer Landweg 62b, 48149 M¨ unster
[email protected]
Abstract Biological systems like humans or animals have remarkable stability properties allowing them to perform fast motions which are unparalleled by corresponding robot configurations. The stability of a system can be improved if all characteristic parameters, like masses, geometric properties, springs, dampers etc. as well as torques and forces driving the motion are carefully adjusted and selected exploiting the inherent dynamic properties of the mechanical system. Biological systems exhibit another possible source of self-stability which are the intrinsic mechanical properties in the muscles leading to the generation of muscle forces. These effects can be included in a mathematical model of the full system taking into account the dependencies of the muscle force on muscle length, contraction speed and activation level. As an example for a biological motion powered by muscles, we present periodic single-arm self-stabilizing juggling motions involving three muscles that have been produced by numerical optimization. The stability of a periodic motion can be measured in terms of the spectral radius of the monodromy matrix. We optimize this stability criterion using special purpose optimization methods and leaving all model parameters, control variables, trajectory start values and cycle time free to be determined by the optimization. As a result we found a self-stable solution of the juggling problem.
1 Introduction 1.1 Stability of Biological Motions In this paper, we investigate the stability of biological motion. Stability is the ability of a system to return to its original state after a disturbance. In the present study, we are not so much interested in static stability of postures or slow gaits of multi-legged creatures but rather in the dynamic stability of periodic trajectories belonging to fast dynamic motions like running or juggling. Biological systems like humans or animals have remarkable stability properties which allow them to perform motions at speeds much faster than
420
K. Mombaur et al.
their technicfal counterparts. Stability of these motions is not only achieved by the controlling actions of the neuronal system, but also by the intrinsic stabilizing mechanical properties of the musculoskeletal system. It has been shown, that biological movements are, at least partly, self-stabilizing – selfstability is comparable with open-loop stability used for technical systems. The self-stability of a system can be improved by different measures. On the one hand, stability can be enhanced if all characteristic parameters, like masses, inertia, lengths, passive spring and damper constants are properly adjusted. On the other hand, it has been discovered recently, that for biological system the force generating mechanisms inside the muscles can be another source of self-stability. It is especially this second possibility that we will investigate in this paper in more detail on the basis of a simple example. 1.2 A Simple Juggling Example As a first step we have chosen to study a simple example with a small number of degrees of freedom and muscles, cf. figure 1. In order to understand the self-stabilizing properties of muscles such an example may be more illuminating than complex multi-degree of freedom, multi-actuator models with many interactions and interdependencies. We study the most basic version of juggling using just one arm and one ball which is thrown in the air and then caught again resulting in a cyclic motion. We assume that the humerus is fixed at some chosen position and that the ulna can rotate freely in the elbow joint. The model has one degree of freedom – the angle at the elbow – and it is powered by three muscles – two flexor muscles, m. biceps, m. brachioradialis, and one extensor muscle m. triceps – which are the three most important muscles for this motion. Muscle forces depend on the current length, contraction speed and activation level. A detailed description of the arm and muscle models is given in section 2. An investigation of self-stability comes down to the question if the motion can be performed with eyes closed and touch sensors etc. “turned off”. All three muscles receive an invariable periodic input; if the system is self-stable, then the motion of the arm, hand and ball will always return to the original periodic orbit under the effect of perturbations. 1.3 A Mathematical Approach Based on Optimization In order to produce self-stable motions for this juggling example we use a mathematical approach based on dynamic models of the system and special numerical optimization techniques. The question to be addressed is how to choose all free parameters of the models, i.e. the parameters of the rigid body system and of the muscles, and the command inputs of the system, i.e. the muscle activations, such that self-stability is optimized. A brief review of the stability criterion based on Lyapunov’s first method as well as of the optimization algorithms that we developed to be able to perform this task is given in section 3. The advantage of such an optimization based approach is
Stability Optimization of Juggling
421
Figure 1. One-armed juggling example involving three muscles
that it can in principle be applied to systems of any level of complexity, if a correct mathematical description is given. 1.4 Review of Related Work Several authors have considered juggling motions with bouncing balls to investigate how the kinematics chosen by the subjects influenced the global stability of this task [23,24,26]. They showed that bouncing the ball while the racket head decelerated enhanced the stability of ball-bouncing. It has been known since several years that muscle actuators can contribute to stability and robustness of biomechanical motions. Based on a biomechanical model of the human leg, Bobbert and van Zandwijk [4] showed that, compared with a simple moment-driven model, muscle properties supported the robustness against perturbations of simulated squat-jumps. These results agreed with simulations performed with a human arm model (Loeb [15], Brown and Loeb [7], Cheng et al. [8]). Encouraged by these investigations, the self-stabilizing properties of muscles and their interaction with the skeletal system have been proved by applying a Lyapunovian stability analysis based on biomechanical models of a single muscle (Siebert et al. [25]), the human leg (Wagner & Blickhan [28,29]) the human arm (Giesl et al. [10]), and the human trunk (Wagner et al. [27]). The self-stabilizing properties of musculoskeletal systems in humans and animals are embedded within the motor control system and support the global stability of complex movements [2,3,9]. Mombaur et al. [19] have developed numerical stability optimization techniques and applied them to improve the open-loop stability properties of robotic configurations [17, 18, 20, 21] tuning rigid body system parameters as well as motor torque histories. 1.5 What is New in This Paper? This paper presents the first stability optimization for a biomechanical system including muscles, determining at the same time optimal parameters -
422
K. Mombaur et al.
for the mechanical system as well as for the muscles, within the possible ranges of variation - and optimal muscle inputs. While previous work in biomechanics was usually focused on analyzing stability of a given static position, or a given orbit, here the periodic motion is not a priori fixed but determined by the optimization. In contrast to previous investigations on the sensitivity of stability with respect to single parameter or state variations, this time many free quantities are varied and optimized at the same time. From the juggling point of view, the considered example is quite simple, but the present paper serves to demonstrate the principal feasibility of stability optimization to biomechanical problems. Since the applied techniques are by far not driven to their limits by this problem size, they can in principle be applied to much more complex juggling examples, or other types of biomechanical motions.
2 Mathematical Models of Juggling and Muscle Forces In this section, we give an explanation of the mathematical models used for this investigation. We start by describing the coordinates and equations of motion for the full system in section 2.1, and then discuss the equations for computing muscle forces and torques in section 2.2. The model is a modified version of the model used for the static analysis in Giesl et al. [10] and Giesl and Wagner [11]. 2.1 Equations of Motion for the Juggling Arm The arm consists of three rigid bodies, the upper arm (humerus), the lower arm (ulna/radius) and the hand, cf. figure 1. We assume for this study, that the position of the upper arm is fixed throughout the motion, but can be freely chosen (parameter α). The orientation of the catching and throwing hand is assumed to be always horizontal such that the only degree of freedom to be considered for the arm is the angle at the elbow β between ulna and humerus. For the ball, we only consider the motion in vertical direction described by the coordinate z. For the motion of the system in state space, we therefore use ˙ z). ˙ The the following vector, containing positions and velocities, xT = (β, z, β, motion consists of two phases: a flight phase where the ball is in the air and ball and arm move independently, and a contact phase in which the motions of arm and ball are coupled. The second-order equations of motion during the flight phase are: θu+h,e β¨ = Tmuscles − 0.5 mu+h g lu+h sin(α − β) z¨ = −g
(1) (2)
where mu+h and lu+h are the mass and the length of the lower arm, including the hand with (3) lu+h = lulna + lhand
Stability Optimization of Juggling
423
and θu+h,e is the corresponding moment of inertia about the elbow θu+h,e =
1 2 mu+h lu+h . 3
(4)
The computation of Tmuscles , the sum of torques of all muscles acting on the lower arm, will be discussed in the next section. During the contact phase, there is an additional algebraic constraint coupling the two degrees of freedom, leading to a third equation for the Lagrange multiplier λ. The equations of motion for this contact phase become ⎞ ⎛ ⎞ ⎛ 0 −lulna sin(α − β) θu+h,e β¨ ⎠ · ⎝ z¨ ⎠ = ⎝ 1 0 mb 0 −lulna sin(α − β) 1 λ ⎛ ⎞ Tmuscles − 0.5 mu+h g lu+h sin(α − β) ⎝ ⎠ −mb g (5) 2 ˙ −lulna cos(α − β)β with mb being the mass of the ball. At the time of the inelastic impact of the ball in the palm ts , i.e. at the start of the contact phase, there generally is a discontinuity in the velocities (before impact, the ball moves downward faster than the hand, and afterwards they move at a common speed). Velocities (β˙ + , v˙ + ) after impact are computed by means of the equations β˙ + =
b1 z˙ + θu+h,e β˙ b1 lulna sin(α − β) + θu+h,e
(6)
with b1 = (lulna sin(α − β) − 0.5 lhand )mb
(7)
z˙ + = lulna sin α − β β˙ +
(8)
and Lift-off of the ball is assumed to be smooth. We aim at finding periodic solutions for this model such that periodicity constraints are applied x(T ) = x(0)
(9)
with the cycle time T = Tcontact + Tf light being the sum of the two phase times. 2.2 Modeling Muscle Forces In this section, we formulate the equations to compute the torques generated by all three muscles Tmuscles = Tbiz + Tbra + Ttri
(10)
424
K. Mombaur et al.
where in all three cases the torques are the product of the muscle force and the lever arm h. The muscle models to compute the force of the muscle consist of the force-velocity relation, the force-length relation and an activation term. There are certain differences between the flexors (f lex = biz, bra) and the extensors (ext = tri) such that we treat them separately. The torque of the flexors is computed as Tf lex = hf lex · act · fl · fv
(11)
hf lex = −kh · ku · sin(β)/lm .
(12)
with moment arm This results from the assumption that the muscles and tendons form a straight line between the origins kh and insertion points ku which, for the two flexor muscles in this problem, take the following values: kh,biz = lhum
(13)
kh,bra = 0.104lhum ku,biz = 0.186lulna ku,bra = lulna .
(14) (15) (16)
The current muscle length lm is computed as lm = kh2 + ku2 − 2 · kh · ku · cos β.
force−length relation
(17)
force−velocity relation
1
3000
0.9 2500
0.8 0.7
2000 force [N]
force [%]
0.6 0.5
1500
0.4 1000
0.3 0.2
500
0.1 0 0.23
0.24
0.25
0.26 0.27 muscle length [m]
0.28
0.29
0.3
0 −1.5
−1
−0.5
0 velocity [m/s]
0.5
1
1.5
Figure 2. Force-length and force-velocity relationship
The qualitative force-length and force-velocity relationships are shown in figure 2. The corresponding equation for the flexor muscles is fl = ga · arctan(gb · (zz − gc )) + gd with ga = 1.09/π, gb = 15, gc = 0.6, gd = 0.49 and
(18)
Stability Optimization of Juggling
425
=
8 kh2 + ku2 − 2 · kh · ku · cos( π) 9 = 1 l3 = kh2 + ku2 − 2 · kh · ku · cos( π) 3 lm − l2 zz = z 0 · +1 l2 − l 3 l2 =
(19) (20) (21)
As a model for the force-velocity dependency we use so-called Hill-type functions which are very common in literature [12, 13]. The excentric and concentric parts of the motions follow different rules, but the transition is twice continuously differentiable in the model we chose: c − a if v ≤ 0 (−v + b) C B + fv = A + (−v − D) (−v − D)2
(22)
fv =
else
(23)
with contraction speed v = −hf lex · β˙
(24)
and the abbreviations √
fiso · pmax √ √ √ √ pmax + fiso · vmax ) · vmax √ pmax · vmax √ b= √ √ √ fiso · (−2 · pmax + fiso · vmax ) √ 2 √ √ − pmax + fiso · vmax √ c = pmax · √ √ −2 · pmax + fiso · vmax A = fexz · fiso = b3 · fiso · (fexz − 1) D = −b + b2 + c c B = (3b + 2D) · (fiso (fexz − 1) − 2D 2 ) b C = D · (fiso · (fexz − 1) · (D − b) + 2 · D · c/b) a=
(−2 ·
(25) (26) (27) (28) (29) (30) (31)
In the case of the extensor muscle, the torque produced is computed as Text = hext · act · fl · fv
(32)
The muscle and tendon can obviously not form a straight line since they must be bent at the elbow bone. The moment arm hext can be computed by the equation hext = lulna ·(0.0993+0.0371·(π−β)−0.0524·(π−β)2 +0.0147·(π−β)3 ) (33)
426
K. Mombaur et al.
Following [22], the force of the extensor muscle does not depend on the length of the muscle, and thus we set fl = 1. The force-velocity relationship obey the same rules (eqns. (22) - (31)). For all three muscles, the activation levels are used as free controls, i.e. input variables the optimal history of which is to be determined via the solution of the optimal control problem. The activation levels denote fractions of the maximal forces and thus may only attain values between 0 and 1.
3 Numerical Stability Optimization of Periodic Motions The task of determining optimal juggling motions is formulated as multiphase optimal control problem of the following form: T φ(x(t), u(t), p) dt + Φ(T, x(T ), p) (34) min x,u,p,T
s. t.
0
x(t) ˙ = fj (t, x(t), u(t), p) for t ∈ [τj−1 , τj ], j = 1, ..., nph , τ0 = 0, τnph = T (35) for t ∈ [τj−1 , τj ] gj (t, x(t), u(t), p) ≥ 0 + − x(τj ) = h(x(τj )) for j = 1, ..., nph req (x(0), .., x(T ), p) = 0 rineq (x(0), .., x(T ), p) ≥ 0
(36) (37) (38) (39)
x denotes the state variables, u the control variables, p the parameters, t the physical time, T the duration of a juggling cycle, and nph the number of model phases (in this case two). In the juggling model investigated in this paper, there are altogether 18 parameters p to be modified (within suitable bounds) by the optimization: mb , α, mu+h , lulna , lhum , fiso,biz , fiso,bra , fiso,ext , vmax,biz , vmax,bra , vmax,ext , pmax,biz , pmax,bra , pmax,ext , fexz,biz , fexz,bra , fexz,ext , z0,biz , z0,bra (for an explanation of the physical meaning of these parameters, see previous section). There are three free control variables u, namely the activation levels of all three muscles. Besides, the phase times and the initial values of all four state variables are free for optimization. We have evaluated several different objective functions for the juggling problem. While achieving self-stable juggling is the main goal of our computations, we have observed that using stability as sole objective functions does in general not lead to natural and efficient solution. We therefore use either a combination of stability and some energy or efficiency related measure in the objective function, or formulate stability as a constraint. Energy input can either be qualitatively described as the integral over the squared muscle activations or computed via the mechanical energy produced by the muscles. Stability is defined in terms of the spectral radius of the Jacobian X of the Poincar´e map associated with the periodic solution.
Stability Optimization of Juggling
|λ(X(T ))|max < 1 with X(T ) =
dx(T ) dx(0)
427
(40)
This criterion describes the robustness of the periodic solution against small perturbations. If the spectral radius is smaller than one, then the solution is asymptotically stable, and if it is larger than one, then the solution is unstable. We have proven that this criterion based on linear theory and typically applied to simple smooth systems can also be used to characterize the stability of solutions of a nonlinear multiphase system with discontinuities (Mombaur [16], Mombaur et al. [19]). For the computation of this stability criterion, first order derivatives of the trajectory are required. These are generated by augmenting the model dynamics x(t) ˙ = f (t, x(t), u(t), p) (41) (i.e. the first order versions of eqns. (1)/(2) and (5) respectively, by the corresponding variational differential equation ˙ X(t) = fx (t, x(t), u(t), p) · X(t, x(t), u(t), p),
(42)
for which, at the discontinuity points, an update of the following form has to be performed (compare [5]): X + (ts ) = Ux · X − (ts ) 1 with Ux = ∆RHS − Jt − Jx f − (ts ) · (sx )T + I + Jx s˙
(43) (44)
We solve this optimal control problem using a variant of the optimal control code MUSCOD (Bock & Plitt [6], Leineweber [14]) suited for periodic gait problems. It is based on • •
a direct method for the optimal control problem discretization (also known as “first-discretize-then-optimize” approach) using control base functions with local support, in this case piecewise constant functions and a multiple shooting state parameterization which transforms the original boundary value problem into a set of initial value problems with corresponding continuity and boundary conditions and allows to introduce knowledge about the trajectory as initial guesses for the solution.
Both discretization steps are performed on identical grids, leading to a large but very structured non-linear programming problem (NLP). For the solution of this discretized problem an efficient tailored SQP algorithm is used that exploits the particular structure of the problem (also compare Leineweber [14]). We would like to point out that this approach still includes a simulation of the full problem dynamics on each of the multiple shooting intervals. This is performed simultaneously to the NLP solution using fast and reliable integrators also capable of an efficient and accurate computation of sensitivity information (Bock [5]). Although the optimization may sometimes lead to points at which the stability objective function/constraint is non-differentiable, we
428
K. Mombaur et al.
have observed that the SQP approach works well and delivers descent based on finite-difference gradients despite the violation of its assumptions on the differentiability of functions. It is much more efficient and produces better optimization results than our previous two-level optimization methods that took into account the non-differentiability of the objective function and used a split of variables [19]. The first author is currently performing more detailed studies on descent and convergence properties of the algorithms for spectral radius optimization.
4 Numerical Solution: Self-Stable Juggling In this section we present the optimal solution found using problem formulation and numerical solution techniques as outlined in the previous sections. We have chosen a piecewise continuous discretization of the control functions and the same grid of 15 intervals for flight phase and contact phase for both control and state discretization (the grids represent relative equidistant splits of each phase, however the grid points are not a priori fixed timewise since both phase times are free variables in the optimization). For the results presented here we have applied the following optimization objectives: in a first optimization run, an optimally stable solution has been determined which resulted in a spectral radius of 0.045. Since this turned out to be much lower than required, in a second run “some stability” has been sacrificed in order to produce a more efficient solution in terms of muscle activations. Starting from the first result, the spectral radius was allowed to go up to 0.5, and control inputs were minimized. The spectral radius of the resulting solution was in fact at the imposed boundary of 0.5 which is still safely enough below the critical border of one. The computations show that self-stable single handed juggling is possible, and can even be achieved with little effort. The results presented in the text and the figures of this paper are those of the second optimization run.
Figure 3. Muscle activation levels (0 ≤ ui ≤ 1) for biceps, brachioradialis and triceps muscles for optimal juggling solution
Stability Optimization of Juggling
429
Figure 4. State variable histories for optimal solution (lower arm angle β in rad, ball height z in m, and corresponding velocities β˙ in rad/s and z˙ in m/s) for one cycle consisting of a contact phase and a flight phase (the line at ≈ 0.292s marks lift-off)
The required control inputs for the optimal solution are shown in figure 3. Figure 4 shows the resulting periodic trajectories of all four state variables. We recall that the observed cycle starts with the contact phase, followed by the flight phase of the ball (the start of which is marked by the vertical line in the middle of all plots), and ends with the touchdown discontinuity which in particular becomes visible in the jumps of β˙ and z. ˙ The initial values of the juggling cycle are xT (0) = (1.712, −0.0364, 2.496, −0.6425)T . The phase times of this optimal solution are Tcontact = 0.292s, Tf light = 0.370s It is important to note that the solution has a pronounced flight phase which is longer than the contact phase and a large throwing height. In contrast to what one might expect, stability optimization does make the ball stay as close as possible to the hand and does not lead to a reduced throwing height. The solution shows certain characteristics observed in juggling motions of skillful jugglers, cf. [1]: • •
the ball is caught while the hand is moving down the point where a ball is released is higher than the point where it is caught
430
K. Mombaur et al.
Caused by the choice of the optimization criterion, the motion is characterized by very little co-activation. The optimal values of all model parameters are given in the following table:
Parameter name m∗b α m∗u+h ∗ lulna ∗ lhum fiso,biz fiso,bra fiso,ext vmax,biz vmax,bra
value 0.1 0.0004 1.628 0.26 0.27 2149.8 1678.1 3346.9 1.481 1.562
Parameter name vmax,ext pmax,biz pmax,bra pmax,ext fexz,biz fexz,bra fexz,ext z0,biz z0,bra
value 0.984 305.5 346.2 500.1 1.347 1.317 1.494 0.395 0.271
Table 1. Parameter values for optimal juggling solution (fixed quantities are marked by an asterisk∗)
All free parameters are modified, but none reaches its specified bounds. It is hard to explain intuitively why the parameters change during optimization in the observed manner and why the result is this particular combination of parameters. From our experience with stability optimization we know that the developments of single parameters sometimes even may seem against intuition, but this is caused by the cross-effects in the multi-dimensional parameter space.
Figure 5. One cycle of optimal juggling
Stability Optimization of Juggling
431
5 Conclusions In this paper we have presented a stability optimization of a biomechanical model including a rigid body system and three muscles driving this motion. Time constant parameters of both the bodies and the actuators, as well as the muscle activations and characteristics of the periodic trajectories have been modified during optimization in order to produce self-stable motions. This shows that the self-stabilizing properties of muscle actuators can also be exploited in the dynamic case. For the simple juggling example investigated, we have shown that self-stable solutions exist, and the resulting optimal motions reveal several characteristics of skillful juggling. Since the numerical techniques applied in this study are capable to solve much larger problems, we plan to extend the investigation of stable juggling to more complex problems involving both hands and more balls.
Acknowledgments The work of Katja Mombaur is supported by the Margarete von Wrangell program of the State of Baden-W¨ urttemberg and by the Foundation Landesstiftung Baden-W¨ urttemberg.
References 1. R. J. Beek. Juggling Dynamics. Free University Press, Amsterdam, 1989. 2. R. Blickhan, A. Seyfarth, H. Geyer, S. Grimmer, H. Wagner, and M. G¨ unther. Intelligence by mechanics. Philos Transact A Math Phys Eng Sci, 2007. 3. R. Blickhan, H. Wagner, and A. Seyfarth. Brain or muscles? recent research developments in biomechanics. S. G. Pandalai. Trivandrum, India, Transworldresearch, 1:215 – 245, 2003. 4. M. F. Bobbert and J. P. van Zandwijk. Sensitivity of vertical jumping performance to changes in muscle stimulation onset times: a simulation study. Biol Cyb, 81(2):101 – 108, 1999. 5. H. G. Bock. Randwertproblemmethoden zur Parameteridentifizierung in Systemen nichtlinearer Differentialgleichungen. In Bonner Mathematische Schriften 183. Universit¨ at Bonn, 1987. 6. H. G. Bock and K.-J. Plitt. A multiple shooting algorithm for direct solution of optimal control problems. In Proceedings of the 9th IFAC World Congress, Budapest, pages 242–247. International Federation of Automatic Control, 1984. 7. I. E. Brown and G. E. Loeb. A reductionist approach to creating and using neuromusculoskeletal models, In Neuro-Control of posture and movement, J. Winters, P. Crago (eds.), Springer-Verlag, 148–163, 2000. 8. E. J. Cheng, I. E. Brown, and G. E. Loeb. Virtual muscle: a computational approach to understanding the effects of muscle properties on motor control. J Neurosci Methods, 101(2):117 – 130, 2000. 9. M. H. Dickinson, C. T. Farley, R. J. Full, M. A. Koehl, R. Kram, and S. Lehman. How animals move: an integrative view. Science, 288(5463):100 – 106, 2000. 10. P. Giesl, D. Meisel, J. Scheurle, and H. Wagner. Stability analysis of the elbow with a load. J Theor Bio, 228:115 – 125, 2004.
432
K. Mombaur et al.
11. P. Giesl and H. Wagner. Lyapunov functions and the basin of attraction for a single joint muscle-skeletal model. J Math Biol., 54(4):453 – 464, 2007. 12. A. V. Hill. The heat of shortening and the dynamic constants of muscle. Proc. Royal Society (B), 126:136 – 195, 1938. 13. A. V. Hill. First and last experiments in muscle mechanics. Cambridge University Press, 1970. 14. D. B. Leineweber. Efficient Reduced SQP Methods for the Optimization of Chemical Processes Described by Large Sparse DAE Models. PhD thesis, University of Heidelberg, 1999. VDI-Fortschrittbericht, Reihe 3, No. 613. 15. G. E. Loeb. Control implications of musculosceletal mechanics. In Annual International Conference IEEE-EMBS, volume 17, pages 1393 – 1394, 1995. 16. K. D. Mombaur. Stability Optimization of Open-loop Controlled Walking Robots. PhD thesis, University of Heidelberg, 2001. www.ub.uni-heidelberg.de/ archiv/1796 VDI-Fortschrittbericht, Reihe 8, No. 922. 17. K. D. Mombaur. Performing open-loop stable flip-flops - an example for stability optimization and robustness analysis of fast periodic motions. In Fast Motions in Robotics and Biomechanics - Optimization and Feedback Control, Lecture Notes in Control and Information Science. Springer, 2006. 18. K. D. Mombaur, H. G. Bock, J. P. Schl¨ oder, and R. W. Longman. Humanlike actuated walking that is asymptotically stable without feedback. In Proceedings of IEEE International Conference on Robotics and Automation, pages 4128 – 4133, Seoul, Korea, May 2001. 19. K. D. Mombaur, H. G. Bock, J. P. Schl¨ oder, and R. W. Longman. Open-loop stable solution of periodic optimal control problems in robotics. ZAMM - Journal of Applied Mathematics and Mechanics/Zeitschrift f¨ ur Angewandte Mathematik und Mechanik, 85(7):499 – 515, July 2005. 20. K. D. Mombaur, H. G. Bock, J. P. Schl¨ oder, and R. W. Longman. Self-stabilizing somersaults. IEEE Transactions on Robotics, 21(6), Dec. 2005. 21. K. D. Mombaur, R. W. Longman, H. G. Bock, and J. P. Schl¨oder. Open-loop stable running. Robotica, 23(01):21 – 33, 2005. 22. W. Murray, T. Buchanan, and D. Scott. The isometric functional capacity of muscle that cross the elbow. Journal of Biomechanics, 33:943 – 952, 2000. 23. S. Schaal and C. G. Atkeson. Open loop stable control strategies for robot juggling. In IEEE International Conference on Robotics and Automation, pages 913 – 918, 1993. 24. S. Schaal, D. Sternad, and C. G. Atkeson. One-handed juggling: A dynamical approach to a rhythmic movement task. Journal of Motor Behavior, 28(2):165 – 183, 1996. 25. T. Siebert, H. Wagner, and R. Blickhan. Not all oscillations are rubbish: Forward simulation of quick-release experiments. JMBB, 3(1):107 – 122, 2003. 26. D. Sternad, M. Duarte, H. Katsumata, and S. Schaal. Bouncing a ball: tuning into dynamic stability. J Exp Psychol Hum Percept Perfo, 2001. 27. H. Wagner, C. Anders, C. Puta, A. Petrovitch, F. Morl, N. Schilling, H. Witte, and R. Blickhan. Musculoskeletal support of lumbar spine stability. Pathophysiology, 12(4):257 – 265, 2005. 28. H. Wagner and R. Blickhan. Stabilizing function of sceletal muscles: an analytical investigation. J Theor Biol, 199(2):163 – 179, 1999. 29. H. Wagner and R. Blickhan. Stabilizing function of antagonistic neuromusculoskeletal systems - an analytical investigation. Biol. Cybernetics, 89:71 – 79, 2003.
Numerical Model of Far Turbulent Wake Behind Towed Body in Linearly Stratified Media N. P. Moshkin1 , G. G. Chernykh2 , and A. V. Fomina3 1 2 3
Suranaree University of Technology, School of Mathematics, Nakhon Ratchasima, 30000, Thailand
[email protected], Institute of Computational Technologies, Siberian Division of Russian Academy of Sciences, Novosibirsk, 630090, Russia
[email protected] Kuzbass State Academy of Pedagogy, Novokuznetsk, 654066, Russia
[email protected]
Abstract Based on three-dimensional parabolized system of averaged equations for motion, continuity, incompressibility, the energy of turbulence, the rate of dissipation transfer and the splitting method the numerical model of dynamics of turbulent wake behind towed body in linearly stratified fluid has been constructed. The turbulent wake behind towed body has significantly larger size and excess pressure compared with momentumless wake. This phenomenon is explained by production of turbulent energy due to the gradient of longitudinal velocity vector components in drag wake. The anisotropy decay of axial values of the dispersion of turbulent fluctuations of the horizontal and vertical velocity components in the case of drag and momentumless wakes is compared. The constructed model is applied to numerical modelling of dynamics of turbulent wake with small total excess momentum in linearly stratified media.
1 Introduction A flow that arises in a turbulent wake behind a body that moves in a stratified fluid is rather peculiar. With a relatively weak stratification a turbulent wake first develops essentially in the same way as in a homogeneous fluid and extends symmetrically. However, buoyancy forces oppose vertical turbulent diffusion. Therefore a wake has flattened form at large distances from the body and, finally, ceases to extend in a vertical direction. Because of turbulent mixing the fluid density within the wake is distributed more uniformly than outside it. Buoyancy forces tend to restore the former unperturbed state of a stable stratification. As a result, convective flows, which give rise to internal waves in an ambient fluid, arise in the plane perpendicular to the wake axis.
434
N.P. Moshkin et al.
Turbulent wakes behind bodies of revolution in stratified fluids have been considered in many publications. The literature abounds in studies of this field. A number of articles have sufficiently complete references review on this issue. Some of the more recent are [1–5]. The analysis of known mathematical models of turbulent wakes past bodies in stratified fluid specifies insufficient completeness of models of turbulent wake past towed bodies for distance t ≤ 10T (T −period of Brunt-Vaisala) behind the body. There are no detailed comparison of wake characteristics behind towed and self-propelled bodies. In the present work the numerical approach which is based on the finite difference splitting methods for the 3-D parabolized system of equations is utilized to fill some gaps in the numerical modeling.
2 Problem Formulation To describe the flow in a far turbulent wake of a body of revolution in a stratified medium the following parabolized system of the averaged equations for the motion, continuity and incompressibility in the Oberbeck-Boussinesq approach is used: ∂Ud ∂Ud ∂Ud ∂ ∂ +V +W = u v + u w , ∂x ∂y ∂z ∂y ∂z
(1)
∂V ∂V ∂V 1 ∂ p1 ∂ > 2 ? ∂ +V +W =− − v − v w , ∂x ∂y ∂z ρ0 ∂y ∂y ∂z
(2)
U0 U0 U0
∂W ∂W 1 ∂ p1 ∂ ∂W ρ1 ∂ > 2 ? +V +W =− − v w − w −g , (3) ∂x ∂y ∂z ρ0 ∂z ∂y ∂z ρ0
U0
∂ ρ1 ∂ ρ1 dρs ∂ ∂ ρ1 ∂ +V +W +W =− v ρ − w ρ , ∂x ∂y ∂z dz ∂y ∂z
(4)
∂V ∂W ∂Ud + = . ∂y ∂z ∂x
(5)
In equations (1)-(5), U0 is the free stream velocity; Ud = U0 − U is the defect of the mean free stream velocity component; U, V, W, are velocity components of the mean flow in the direction of the axes x, y, z; p1 is the deviation of the pressure from the hydrostatic one conditioned by the stratification ρs ; g is the gravity acceleration; ρ1 is the mean density defect: ρ1 = ρ − ρs , ρs = ρs (z) is the undisturbed fluid density: dρs dz ≤ 0 (stable stratification), ρ0 = ρs (0); the dash denotes the pulsation components; the symbol · denotes the averaging. The coordinate system is related to the moving body in such a way that the velocity of its motion is equal to −U0 , and the z-axis is directed vertically upwards, in the counter-gravity direction. The fluid density is assumed to be a linear function of temperature and the stratification is assumed to be weak.
Numerical Model of Far Turbulent Wake
435
Both small items involving the derivative with respect to the variable x and the factors in the form of a coefficient of laminar viscosity or diffusion have been omitted in the right hand sides of equations (1)-(4). The modified e − ε model of turbulence is used for the system to be closed. 2 In this model the unknown values of the Reynolds stresses ui , i = 1, 2, 3, u v = u1 u2 , u w = u1 u3 and the turbulent fluxes ui ρ , i = 1, 2, 3, are determined by the algebraic approximations (see [1, 6, 7] ): ui uj 2 2 P 2 G 1 − c2 Pij 1 − c3 Gij = δij + − δij − δij + , (6) e 3 c1 ε 3 ε c1 ε 3 ε ) ∂Uj ∂Ui + uj uk , (7) Pij = − ui uk ∂xk ∂xk Gij =
1 (ui ρ gj + uj ρ gi ), i, j, k = 1, 2, 3; ρ0
g = (0, 0, −g), 2P = Pii , 2G = Gii , U1 = U, U2 = V, U3 = W,
e ∂ρ ∂U −u ρ = + (1 − c2T )w ρ u w , c1T ε ∂z ∂z −v ρ =
=
(9) (10)
∂ρ v 2 e ∂ρ = Ky , c1T ε ∂y ∂y
(11)
2 e ∂ρ w ρ , cT ε ∂z
(12)
ρ2 = − −w ρ =
(8)
∂ρ g + (1 − c2T ) ρ2 = w2 C1T ε ∂z ρ0 e
(13)
∂ρ ew2 ∂ρ = Kρz . 2 ∂z 1 − c2T g e ∂ρ ∂z c1T ε 1 − 2 c1T cT ρ0 ε2 ∂z
Here and below the summation is assumed over repeating indices. To determine the values of the turbulent kinetic energy e, the dissipation ε and the shear Reynolds stress v w we make use of the differential equations U0 U0
∂e ∂e ∂ ∂ ∂e ∂e ∂e +V +W = Key + Kez + P + G − ε, ∂x ∂y ∂z ∂y ∂y ∂z ∂z
(14)
∂ε ∂ε ∂ ∂ ∂ε ∂ε ∂ε ε ε2 +V +W = Kεy + Kεz + cε1 (P + G) − cε2 , (15) ∂x ∂y ∂z ∂y ∂y ∂z ∂z e e U0 +
∂v w ∂v w ∂v w ∂v w ∂ +V +W = Key + ∂x ∂y ∂z ∂y ∂y
∂ ∂v w ε Kez + (1 − c2 )P23 + (1 − c3 )G23 − c1 v w , ∂z ∂z e
(16)
436
N.P. Moshkin et al.
where the turbulent viscosity coefficients are defined from simplified relation (6) as follows 2 1 − c2 ev Key , Kεy = , Key = · c1 ε σ
(1 − c3 )(1 − c2T ) e2 g 2 w ρ (1 − c2 )ew − Kez c1T ε ρ0 Kez = , ; Kεz = σ (1 − c3 ) g e2 ∂ρ c1 ε 1 − c1 c1T ρ0 ε2 ∂z So that −u v = Key
∂U ∂U , −u w = Kez . ∂y ∂z
The quantities c1 , c2 , c3 , c1T , c2T , cT , cε1 , cε2 , σ are empirical constants. Their values are taken to be equal to 2.2, 0.55, 0.55, 3.2, 0.5, 1.25, 1.45, 1.9, 1.3, respectively. The choice of this model of turbulence is due to the following reasons: it is close to the standard e − ε model of turbulence and we can take into account the anisotropy of the turbulence characteristics in the wakes in stratified fluid. The marching variable x in equations (1)-(4), (14)-(16) plays the role of time. At the distance x = x0 from the body the following initial conditions were specified Ud (x0 , y, z) = Θ1 (r), e(x0 , y, z) = Θ2 (r), ε(x0 , y, z) = Θ3 (r), r2 = y 2 + z 2 , 2 e, −∞ < y, z < ∞. 3 Here Θ1 (r), Θ2 (r) and Θ3 (r) are the functions consistent with the experimental data of Lin and Pao [8, 9] in the homogeneous fluid. At r → ∞ the free stream conditions were specified v w = ρ1 = V = W = 0, u2 = v 2 = w2 =
Ud = V = W = ρ1 = e = ε = v w = 0,
x ≥ x0 .
(17)
From the symmetry considerations the solution was determined only in the first quadrant of the (y, z) plane with using the following boundary conditions v w =
∂W ∂Ud ∂e ∂ε ∂ ρ1 =V = = = = = 0, y = 0, z ≥ 0, ∂y ∂y ∂y ∂y ∂y
∂Ud ∂e ∂ε ∂V = = = = 0, z = 0, y ≥ 0. ∂z ∂z ∂z ∂z In the numerical solution of the problem the boundary conditions (17) corresponding to r → ∞ were translated to the boundaries of a sufficiently large rectangle 0 ≤ y ≤ y∗ ; 0 ≤ z ≤ z∗ . The problem variables can be made dimensionless by using the characteristic length D, the body diameter, and the velocity scale U0 . As a result the v w = ρ1 = W =
Numerical Model of Far Turbulent Wake
437
value 4π 2 /F r2 will appear in the dimensionless equations instead of g, where F r is the density Froude number defined as 1 dρs 2π 1 U0 T , T = √ = , a=− , Fr = D ag N ρ0 dz where T , N are the Brunt-Vaisala period and frequency. For the interpretation of the computational results, it is convenient to introduce the time t related to the distance from the body t=
x xD x∗ t = = . , t∗ = U0 T U0 DT Fr
The symbol ∗ in the upper position here and everywhere further denotes dimensionless quantities.
3 Algorithm of Problem Solution For the construction of a finite difference algorithm the new independent variables are introduced x = x, ξ = χ1 (y), η = χ2 (z), (x = x , y = φ1 (ξ), z = φ2 (η)).
(18)
This mapping is used to transform the nonuniform mesh in physical space (x, y, z) into a uniform rectangular mesh in a computational domain (x , ξ, η). The functions φ1 and φ2 were specified in a tabular form; their choice enabled us to condense the grid nodes in the turbulent wake neighborhood. In the computational domain (x , ξ, η) the nodes of the mesh in the (ξ, η) plane are distributed uniformly: ξi = i·ξ, ηj = j·η, i = 0, . . . , M1 , j = 0, . . . , M2 ,, ϕ1 (ξM1 ) = y∗ , ϕ2 (ηM2 ) = z∗ . It should be mentioned again that variable x plays the role of time and the equations are integrated by marching in x direction. The stepsize x in the marching direction is chosen a priori (not adaptively) variable. The upper index n denotes the values of the variables in the marching direction. Algorithm of the problem solution is based on the implicit splitting into space variables for the equations (1), (4), (14)-(16). Equations (2), (3), and (5) for the mean deviation of pressure and velocity vector components are similar to 2-D incompressible Navier-Stokes equations. These equations are solved by an explicit splitting into physical processes method [10]. The calculations are effected in following sequential manner. First, velocity defect Ud calculated by means of integration of equation (1). In the second place, the values V n+1 , W n+1 , and p1 n+1 were determined on the basis of equations (2), (3), (5). In the solution of these equations the terms ρ1 , e, ε, v w appearing in the equations were set from the previous level on x. This stage is based on explicit method of the splitting into the physical processes. At the last stage, the functions ρ1 , e, ε, v w were
438
N.P. Moshkin et al.
calculated from appropriate equations. The finite difference analogs of these equations are not shown here due to their bulky form. To make computer implementation simpler we have used the idea of well-known “block” analog of the Gauss-Seidel. At the computation of the functions Ud , ρ1 , e, ε, v w at the new level n + 1 we have used the quantities already known at this level; the remaining functions were taken from the level n.
4 Computational Results In order to check the accuracy and the efficiency of the mathematical model and numerical algorithm we carried out a series of numerical experiments. The calculations are conducted on a grid sequence and are compared with experimental data of Lin and Pao [8, 9] on the decay of the momentumless and drag turbulent wakes in a linearly stratified medium. At x = x0 the initial conditions were specified according to Hassid [9], which agreed with the experimental data of Lin and Pao on the decay of a turbulent wake in a homogeneous fluid. The main calculations were performed on a grid with 72 × 37 nodes in the yz-plane. The nodes of the grid in domain were distributed as follows yi = i · hy , i = 0, . . . , 31; yi = yi−1 · qy , i = 32, . . . , 72, qy = 1.06, zj = j · hz , j = 0, . . . , 11; zj = zj−1 · qz , j = 12, . . . , 37, qz = 1.113. where hy = hz = 0.075. The step in marching direction hnx was varied from = 2.0 by the formula hn+1 = hnx + 0.055 and was further h0x = 0.055 to hmax x x assumed to be constant. The refinement of the mesh cell sizes in the wake √ neighborhood has led to the deviations in the quantities e0 , UD0 which did not exceed 1 − 3%. Figure 1 shows the variation of the axial values of the turbulence energy e0 (x) = e(x, 0, 0) and the computed axial values of velocity defect Ud (x, 0, 0) = UD (x) versus the distance from a body in the drag wake. The experimental data of Lin and Pao are represented by circle symbols, for homogeneous fluid and filled circles correspond to stratified fluid. The calculations of Hassid [9] are represented by dashed lines. As seen from Figure 1 the numerical model is in satisfactory agreement with the experimental data. The dynamics of the turbulent wakes in a linearly stratified fluid are illustrated by Figures 2 — 5. Figure 2 shows the lines of equal energy e/e0 (t) = const, e0 (t) = e(t, 0, 0) obtained for the time values t/T = 1, 2, 4, 6 in the momentumless and drag wakes. These isolines are shown only in the first quadrant of (y, z)−plane. It is possible to observe in Figure 2 that the region of turbulent wake behind a towed body is significantly larger than the region of turbulent wake behind a self-propelled body. This phenomenon is explained by production of turbulent energy due to the gradient of longitudinal velocity vector components in drag wake. In the momentumless wake,
Numerical Model of Far Turbulent Wake
439
UD / U0
0.07
1/2
e0 / U0
Hassid Lin & Pao [7] Lin & Pao [7] present present
Nonstratified Fr=31 Nonstratified Fr=31
(a)
0.06
Hassid Lin & Pao [7] Lin & Pao [7] present present
Nonstratified Fr=31 Nonstratified Fr=31
(b)
0.25
0.2
0.05
0.04
0.15
0.03 0.1
0.02 0.05
0.01
0
0
10
20
30
40
50
60
70
80
90
x/D
100
0
0
50
150
100
x/D
Figure 1. Variation of the axis value of the turbulence energy e0 (x) = e(x, 0, 0) (a) and the defect of the longitudinal velocity component UD (x) = Ud (x, 0, 0) (b) versus the distance from the body
Figure 2. Isolines of turbulent energy e/e0 (t) = const, e0 (t) = e(t, 0, 0) in the momentumless and drag wakes for different values of time; Fr = 280
the effect of the longitudinal velocity vector component is insignificant. The anisotropy decay of axial values of the dispersion of turbulent fluctuations of the horizontal and vertical velocity components in the case of drag and momentumless wakes is compared in Figure 3. Unfortunately, in the work of Lin and Pao [8] there is no experimental data on decay of dispersion of turbulent fluctuations in the drag wake. Detailed comparison of the decay of intensities
440
N.P. Moshkin et al. −3
−3
10 〈 w’ 2 〉0
10 2 〈 w’ 〉0
momentumless
〈 v’ 2 〉
drag
〈 v’ 2 〉
0
0
−4
10
−4
10 −5
10
−5
2
−6
10
−7
10
〈 v’ 〉0, 2 〈 v’ 〉0, 2 〈 w’ 〉0, 2 〈 w’ 〉 , 0
−1
10
10
g=0 Fr =280 g=0 F =280 r
−6
0
10
1
t / T 10
10
2
〈 v’ 〉0, g=0 2 〈 v’ 〉0, Fr =280 2 〈 w’ 〉0, g=0 2 〈 w’ 〉0, Fr =280 −1
10
0
10
1
t / T 10
Figure 3. Anisotropy decay of axial values of the dispersion of turbulent fluctuations of horizontal and vertical velocity components
of the turbulent fluctuations with experimental data [8] is carried out in [1] for the case of the momentumless wake. As already it has been noted above, there is significant distinction between evolution of axisymmetric turbulent wakes past towed and self-propelled bodies. In the wakes behind towed bodies in the homogeneous fluid the production of turbulent energy due to the average flow has significant effect. In experiments and computations an almost shearless flow has been observed within momentumless wakes downstream at the distance of several dozens of the body diameter. In the wakes of self-propelled bodies the turbulence decay faster compared with wakes past towed bodies. As a result, in the wakes behind towed bodies in the stratified fluid turbulence gives rise to mixing of greater volume of fluid and action of gravity is a reason of generation of internal waves of greater amplitude [7,10] and greater pressure defect. This phenomenon can be illustrated by analysis of the behavior of the total turbulent energy and energy of internal waves in the wake cross-section + ∞ ∞ * ∗2 ∗2 4π 2 ρ1 V + W ∗2 ∗ ∗ ∗ ∗ ∗ + 2 e dy dz , Pt (t) = e∗ dy ∗ dz ∗ . Et (t) = 2 Fd 2 0 0 The variation of these quantities vs. time is shown in Figure 4. Solid lines correspond to the momentumless wake. The dash-doted lines correspond to the case of the wake past towed body. It can be seen that quantity Et∗ (t) monotonously decreases with time. The reason of this decreasing is the dissipation of turbulent energy into the heat under molecular viscosity action and the part of turbulent energy converts into the potential energy of internal waves. The magnitude of the total energy Pt∗ (t) of internal waves increases up to the values of time t/T 1. At longer periods of time Pt∗ (t) remains almost constant. The magnitude of the total energy Pt∗ (t) of internal waves increases up to the values of time t/T 1. Such a behavior of Pt∗ (t) and Et∗ (t) was pointed out in the case of the dynamics of the localized turbulent mixed region in stratified fluids [11] and in the case of the momentumless
Numerical Model of Far Turbulent Wake
441
*
Et
−4
E*t
10
P* t −5
10
P* t
−6
10
−7
10
momentumless
drag −8
10
−1
0
10
1
10
10
t/T
Figure 4. Time variation of the values of the total kinetic turbulence energy Et∗ (t), and the energy of internal waves Pt∗ (t); Fr = 280 −6
∗
−6
x 10
〈 p1〉
6
t/T=1 t/T=3 t/T=5
momentumless
∗
〈 p1〉
6
4
4
2
2
0
0
−2
−2
−4
−4
−6
−6
t/T=1 t/T=3 t/T=5
drag
−8
−8 −10 0
x 10
10
20
30
40
50
∗ 60
y
−10 0
10
20
30
40
50
∗ 60
y
Figure 5. Time evolution of curve p1 ∗ (y ∗ , z0∗ , t), z0∗ = 2 in momentumless and drag wakes; Fr = 280
turbulent wake in stratified media [1]. Distinctions in dynamics of the drag and momentumless turbulent wakes in a linearly stratified fluid can be shown by distribution of excess pressure in a wake. Figure 5 shows the time evolution (t/T = 1.0, 3.0, 5.0) of curves p1 ∗ (y ∗ , z0∗ , t), z0∗ = 2. It is necessary to note the wave behavior of field p1 ∗ . The peak characteristics of p1 are essentially greater in comparison with the wake after self-propelled body. If the thrust of the body is not exactly equal to its drag then a deviation of momentum or excess momentum above zero momentum occurs in the wake. The calculation results show that the wake’s excess momentum of order of ±5 ÷ ±10% from the total momentum in drag wake has weak influence on the decay of the turbulent energy. As in the case of homogeneous fluid, a more significant influence of total excess momentum on the axis value of the defect of the longitudinal velocity component was observed. The internal waves generated by the wake behind towed body have essentially greater amplitude compared with waves behind self-propelled body or waves generated by the wake with small excess momentum.
442
N.P. Moshkin et al.
5 Conclusion The main results of the present work can be formulated as follows: •
• •
Based on three-dimensional parabolized system of averaged equations for motion, continuity, incompressibility, the energy of turbulence, the rate of dissipation transfer and the splitting method the numerical model of dynamics of turbulent wake behind towed body in linearly stratified fluid has been constructed. The turbulent wake behind towed body has significantly larger size and excess pressure compared with momentumless wake. The physical explanation of the observed effects has been given. The constructed model is applied to numerical modelling of dynamics of turbulent wake with small total excess momentum in linearly stratified media.
6 Acknowledgments This work was partially supported by the Russian Foundation for the Basic Research (04-01-00209).
References 1. Chernykh, G. G., Voropayeva, O. F.: Numerical modeling of momentumless turbulent wake dynamics in a linearly stratified medium. Computers and Fluids, 28, 281–306 (1999) 2. Spedding, G. R.: Anisotropy in turbulence profiles of stratified wakes. Physics of Fluids, 13, 8, 2361–2372 (2001) 3. Gourlay, M. J., Arendt, S. C., Fritts, D. C. and Werne, J.: Numerical modeling of initially turbulent wakes with net momentum. Physics of Fluids, 13, 12, 3783–3802 (2001) 4. Dommermuth, D. G., Rottman, J. W., Innis, G. E. and Novikov, E. A.: Numerical simulation of the wake of a towed sphere in a weakly stratified fluid. J. Fluid Mech, 473, 83–101 (2002) 5. Meunier, P., Spedding, G. R.: Stratified propelled wakes. J. Fluid Mech., 552, 229-256 (2006) 6. Rodi, W.: Examples of calculation method for flow and mixing in stratified fluids. J. Geophys. Res., 92, C5, 5305–5328 (1987) 7. Chernykh, G. G., Moshkin, N. P. and Voropayeva, O. F.: Numerical modeling of internal waves generated by turbulent wakes behind self-propelled and towed bodies in stratified media. In: Satofuka (Ed.), Proceedings of the First international Conference on Computational Fluid dynamics, ICCFD, Kyoto, Japan, 10-14 July 2000: Springer, 455–460 (2001) 8. Lin, J. T., Pao, Y. H.: Wakes in stratified fluids. Annu. Rev. Fluid Mech., 11, 317–336 (1979)
Numerical Model of Far Turbulent Wake
443
9. Hassid, S.: Collapse of turbulent wakes in stratified media. J. Hydronautics, 14, 25–32 (1980) 10. Voropayeva, O. F., Moshkin, N. P., Chernykh, G. G.: Internal waves generated by turbulent wakes behind towed and self-propelled bodies in linearly stratified medium. Mat. Model., 12, 10, 77–94 (2000) (in Russian) 11. Chernykh, G. G., Lytkin, Y. M., Sturova, I. V.: Numerical simulation of internal waves induced by the collapse of turbulent mixed region in stratified medium, In: Proceedings of the International Symposium on Refined Modeling of Flows, Paris, Ecole Nationale des ponts et chausses, 671–679 (1982)
A New Direction to Parallelize Winograd’s Algorithm on Distributed Memory Computers D. K. Nguyen1 , I. Lavallee2 , and M. Bui3 1 2 3
CHArt - Ecole Pratique des Hautes Etudes & Universit´e Paris 8, France
[email protected] LaISC - Ecole Pratique des Hautes Etudes, France
[email protected] LaISC - Ecole Pratique des Hautes Etudes, France
[email protected]
Abstract Winograd’s algorithm to multiply two n × n matrices reduces the asymptotic operation count from O(n3 ) of the traditional algorithm to O(n2.81 ), hence on distributed memory computers, the combination of Winograd’s algorithm and the parallel matrix multiplication algorithms always gives remarkable results. Within this combination, the application of Winograd’s algorithm at the inter-processor level requires us to solve more difficult problems but it leads to more effective algorithms. In this paper, a general formulation of these algorithms will be presented. We also introduce a scalable method to implement these algorithms on distributed memory computers. This work also opens a new direction to parallelize Winograd’s algorithm based on the generalization of Winograd’s formula for the case where the matrices are partitioned into 2k parts (the case k = 2 gives us the original formula).
1 Introduction Matrix multiplication (MM) is one of the most fundamental operations in linear algebra and serves as the main building block in many different algorithms, including the solution of systems of linear equations, matrix inversion, evaluation of the matrix determinant and the transitive closure of a graph. In several cases the asymptotic complexities of these algorithms depend directly on the complexity of matrix multiplication - which motivates the study of possibilities to speed up matrix multiplication. Also, the inclusion of matrix multiplication in many benchmarks points out its role as a determining factor for the performance of high speed computations. Strassen was the first to introduce a better algorithm [14] for MM with O(N log2 7 ) than the traditional one (hereafter referred as T-algo) which needs O(N 3 ) operations. Then Winograd variant [16] of Strassen’s algorithm (hereafter referred as W-algo) has the same exponent but a slightly lower constant as the number of additions/subtractions is reduced from 18 down to 15. The
446
D.K. Nguyen et al.
record of complexity owed to Coppersmith and Winograd is O(N 2.376 ), resulted from arithmetic aggregation [5]. However, only W-algo and Strassen’s algorithm offer better performance than T-algo for matrices of practical sizes, say, less than 1020 [11], hence in this paper, we concentralize only on the parallelization of W-algo. In fact, our method is applicable with all the fast matrix multiplication algorithms, which are always in the recursive form [13]. There have been mainly three approaches to parallelize W-algo. The first approach is to use T-algo at the top level (between processors) and W-algo at the bottom level (within one processor). The most commonly algorithms used T-method between processors include 1D-systolic [7], 2D-systolic [7], Fox (BMR) [6], Cannon [1], PUMMA [3], BiMMeR [9], SUMMA [15], DIMMA [2]. Since W-algo is most efficient for large matrices (because of the great difference of complexity between the operation multiplication and the operation addition/subtraction of matrix), it is well suited to use at the top level, not at the bottom level. The second approach is to use W-algo at both the top and the bottom level. The first implementation applying this approach [4] on Intel Paragon reached better performance than T-algo. However, W-algo in [4] requires that the number of processors used in the computation to be a power of seven. This is a severe restriction since many MIMD computers use hypercube or mesh architecture and powers of seven numbers of processors are not a natural grouping. Therefore, the algorithm presented in [4] is not scalable. Moreover, it requires a large working space, with each matrix to be multiplied being duplicated 3 or 4 times. For these reasons, in [12] Luo and Drake explored the possibility of other parallel algorithms with more practical potential: they introduced an algorithm which uses W-algo at the top level and Fox algorithm at the bottom level. This is the first work that represents the third approach: use W-algo at the top level (between processors) and T-algo at the bottom level (also between processors). To continue, an improvement is introduced in [8]: algorithm SUMMA is used in the place of Fox algorithm at the bottom level. The third approach is more complicated than the others, but it gives the scalable and effective algorithms in multiplying large matrices [12]. In this paper, we will generalize these algorithms by using Cannon algorithm at the bottom level and show that the total running time for the Winograd-Cannon algorithm decreases when the recursion level r increases. This result is also correct when we replace Cannon algorithm at the bottom level with the other parallel MM algorithms. To use W-algo at the top level, the most significant point is to determine the sub matrices after having recursively executed r time the Winograd’s formula (these sub matrices correspond to the nodes of level r in the execution tree of W-algo) and then to find the resulting matrix from these sub matrices (corresponding to the process of backtracking the execution tree). It is easy to solve this problem for a sequential machine, but it’s much harder for a parallel machine. With a definite value of r, we can manually do it for r = 1, 2, 3, following [4,8,12], but the solution for the general case has not been found. In this paper, we present
A New Direction to Parallelize Winograd’s Algorithm
447
the method to determine all nodes at an unspecified level r in the execution tree of Winograd’s algorithm, and to define the relation between the resulting matrix and the sub matrices at the level recursion r; which allows us to calculate directly the resulting matrix from the sub matrices calculated by parallel matrix multiplication algorithms at the bottom level. By combining this result with a good storage map of sub matrices to processor, and with the parallel matrix multiplication algorithms based on T-algo (1D-systolic, 2D-systolic, Fox (BMR), Cannon, PUMMA, BiMMeR, SUMMA, DIMMA etc) we have a general scalable parallelization of Winograd’s algorithm on distributed memory computers. This result gives us a completely new direction to parallelize Winograd’s algorithm based on the generalization of Winograd’s formula for the case where the matrices are partitioned into 2k parts (the case k = 2 gives us the original formula).
2 Background 2.1 Winograd Algorithm We start by considering the formation of the matrix product Q = XY , where Q ∈ m×n , X ∈ m×k , and Y ∈ k×n . We will assume that m, n, and k are all even integers. By partitioning X00 X01 Y00 Y01 Q00 Q01 X= ,Y = ,Q = X10 X11 Y10 Y11 Q10 Q11 where
Qij ∈ 2 × 2 , Xij ∈ 2 × 2 , and Yij ∈ 2 × 2 , m
n
m
k
k
n
it can be shown [7, 16] that the following computations compute Q = XY : S0 = X10 + X11 S3 = X01 − S1 S6 = Y11 − Y01 M0 = S 1 S 5 M3 = S 2 S 6
S1 = S0 − X00 S4 = Y01 − Y00 S7 = S5 − Y10 M1 = X00 Y00 M4 = S 0 S 4 M6 = X11 S7
S2 = X00 − X10 S5 = Y11 − S4 M2 = X01 Y10 M5 = S3 Y11
T 0 = M0 + M 1 T 1 = T 0 + M 3 Q00 = M1 + M2 Q01 = T0 + M4 + M5 Q10 = T1 − M6 Q11 = T1 + M4 W-algo does the above computation recursively until one of the dimensions of the matrices is 1.
448
D.K. Nguyen et al.
2.2 Cannon Algorithm Cannon algorithm [1] is a commonly used parallel matrix multiply algorithm based on the T-algo. It can be used on any rectangular processor templates and on matrices of any dimensions [3]. For simplicity of discussion, we only consider square processor templates and square matrices. Suppose we have p2 processors logically organized in a p × p mesh. The processor in ith row and j th column has coordinates (i, j), where 0≤ i, j ≤ p-1. Let matrices X, Y , and Q be of size m × m. For simplicity of discussion we assume m is divisible by p. Let s = m/p. All matrices are partitioned into p × p blocks of s × s sub matrices. The block with coordinates (i, j) is stored in the corresponding processor with the same coordinates. With the addition of a link between processors on opposite sides of the mesh (a torus interconnection), the mesh can be thought of as composed of rings of processors both in the horizontal and vertical directions. Let Xij , Yij , Qij stand for the blocks of X, Y , Q respectively stored in the processor with coordinates (i, j). The following pseudo code describes the Cannon algorithm. The complete ith row of X is shifted leftward i times (i.e., Xij ← Xi,j+i ) The complete j th column of Y is shifted upward j times (i.e., Yij ← Yi+j,j ) Qij = Xij Yij for all processors (i, j) DO (p − 1) times Shift X leftwards and Y upwards (i.e., Xij ← Xi,j+1 ; Yij ← Yi+1,j ) Qij = Qij + Xij Yij for all processors ENDDO The running time of the Cannon algorithm consists of two parts: the communication time Tshif t and the computation time Tcomp . On the distributed memory computer, the communication time for a single message is T = α+βn, where α is the latency, β is the byte-transfer rate, and n is the number of bytes in the message. In the Cannon method, both matrices X and Y are shifted p times. There is a total of 2p shifts. The total latency is 2pα. In each shift a sub matrix of order (m/p × m/p) is passed from one processor to another, where m is the dimension of the matrices. Therefore the total byte transfer time is 2pβB (m/p)2 , where B is the number of bytes used to store one entry 2 of the matrices. The total communication time is Tshif t = 2pα + 2Bβ p m . The 2t
3 computation time is Tcomp = comp p2 m , where tcomp is the execution time for one arithmetic operation. Here we assume that floating point addition and multiplication has the same speed. The total running time is
T (m) =
2tcomp 3 2Bβ 2 m + 2pα. m + p2 p
(1)
A New Direction to Parallelize Winograd’s Algorithm
449
3 New Direction to Parallelize Winograd’s Algorithm 3.1 Winograd-Cannon algorithm and storage pattern of matrices The motivation for the Winograd-Cannon algorithm comes from the observation that W-algo is most efficient for large matrices and therefore should be used at the top level (between processors) instead of the bottom level (within one processor). For a distributed memory parallel algorithm the storage map of sub matrices to processors is a primary concern. Here we have a pattern to store the matrices which is based on the result of Luo and Drake in [12]. Figures 1 and 2 show the pattern of storing matrix X with 6 x 6 blocks when the recursion level is 1. Figure 1 is from a matrix point of view. Figure 2 is from a processor point of view. Each processor stores one block of the four sub matrices. Figures 3 and 4 show the pattern when the recursion level is 2. The four sub matrices with 6 x 6 blocks are stored in the same pattern, as well as the 16 sub matrices with 3 x 3 blocks. This pattern can be easily replicated for higher recursion levels.
Figure 1. Matrix X with 6 × 6 blocks is distributed over a 3 × 3 processor template from a matrix point-of-view. The 9 processors are labeled from 0 to 8
Figure 2. Same as Figure 1, but from a processor point-of-view
These patterns of storing matrices make it possible for all the processors to act as one processor. Each processor has a portion of each sub matrix at each recursion level. The addition (or subtraction) of sub matrices performed in W-algo at all recursion levels can thus be performed in parallel without any inter processor communication.
450
D.K. Nguyen et al.
Figure 3. Matrix X with 12×12 blocks is distributed over a 3×3 processor template from a matrix point-of-view. The 9 processors are numbered from 0 to 8
Figure 4. Same as Figure 3, but from a processor point-of-view
Suppose the recursion level in W-algo is r. Let n = m/p, m0 = m/2, and n0 = m0 /p. Assume n, m0 , n0 ∈ N. Since there are 15 sub matrix additions and subtractions and 7 sub matrix multiplications in each recursion, the total running time for the Winograd-Cannon algorithm is: T (m) = 15Tadd
m m + 7T 2 2
(2)
where Tadd m 2 is the running time to add or subtract sub matrices of order m/2. Note that there are p2 processors running in parallel. Therefore Tadd m 2 = 2 (m 2 ) tcomp . Substituting ! p2 " 15tcomp m2 + 7T ( m 4p2 2)
the above formula into equation (2) we have T (m) = 15t
comp = sm2 + 7T ( m 2 ) where s = 4p2 . Use the above formula recursively to obtain ! " m 2 2 T (m) = sm2 + 7T ( m + 7T m = ··· 2 ) = sm + 7T s 2 4 (3) 7 r 1−( ) r = sm2 1−47 + 7r T (m0 ) ≈ 43 s 74 + 7r T (m0 ) 4
At the bottom level, the Winograd-Cannon algorithm uses the Cannon algorithm for sub matrix multiplications. Therefore we can use equation (1) to
A New Direction to Parallelize Winograd’s Algorithm
451
find T (m0 ). Substituting the value of T (m0 ) and s we have !
r
Tm ≈ =
5( 74 ) tcomp 2 2tcomp 3 2Bβ 2 m + 7r p2 p2 m0 + p m0 r 7 r 2tcomp 3 5( 74 ) tcomp 7 r + 4 2pα 8 p2 m + p2
" + 2pα
(4)
Since the first term is the dominant cubic term, the Winograd-Cannon algorithm should be faster than the Cannon algorithm when m is large enough. 3.2 Recursion Removal in Fast Matrix Multiplication In formula (4), we showed that the total running time for the WinogradCannon algorithm decreases when the recursion level r increases. The following part presents our method to determine all the nodes at the unspecified level r in the execution tree of Winograd’s algorithm and to determine the direct relation between the result matrix and the sub matrices at the level recursion r. We represent the Winograd’s formula: xij SX(l, i, j) × yij SY (l, i, j) ml = i,j=0,1
l = 0···6
6
and qij =
i,j=0,1
(5) ml SQ(l, i, j)
l=0
with
At the recursion level r, ml can be represented as in the following: xij SXk (l, i, j) × yij SYk (l, i, j) ml = i,j=0,n−1
l = 0...7k − 1 7k −1 ml SQk (l, i, j) and qij =
i,j=0,n−1
(6)
l=0
In fact, SX = SX1 , SY = SY1 , SQ = SQ1 . Now we have to determine values of matrices SXk , SYk , and SQk from SX1 , SY1 , and SQ1 . In order to obtain this, we extend the definition of tensor product in [10] for arrays of arbitrary dimensions as followed:
452
D.K. Nguyen et al.
Definition 1. Let A and B are arrays of same dimension l and of size m1 × m2 × . . . × ml , n1 × n2 × . . . × nl respectively. Then the tensor product (TP) is an array of same dimension and of size m1 n1 × m2 n2 × . . . × ml nl defined by replacing each element of A with the product of the element and B. P = A⊗B where P [i1 , i2 , ..., il ] = A [k1 , k2 , ..., kl ] B [h1 , h2 , ..., hl ] , ij = kj nj + hj with ∀1 ≤ j ≤ l; n
Let P = ⊗ Ai = (...(A1 ⊗ A2 ) ⊗ A3 )... ⊗ An ) with Ai is array of dimension i=1
l and of size mi1 × mi2 × . . . × mil . The following theorem allows computing directly elements of P (see the proof of this theorem in appendix): Theorem 1. P [j1 , j2 , ..., jl ] =
n @
n
Ai [hi1 , hi2 , ..., hil ] where jk =
hsk
s=1
i=1
n @
mrk
r=s+1
Proof. We prove the theorem by induction. With n = 1, the proof is trivial. With n = 2, it is true by the definition. Suppose it is true with n − 1. We show that it is true with n. n−1 @ Ai [hi1 , hi2 , ..., hil ] where We have Pn−1 [t1 , t2 , ..., tl ] = i=1
tk =
n−1
* hsk
s=1
n−1 7
+ mrk
r=s+1
with ∀1 ≤ k ≤ l; and then Pn = Pn−1 ⊗ An . By definition Pn [j1 , j2 , ..., jl ] = Pn−1 [p1 , p2 , ..., pl ] An [hn1 , hn2 , ..., hnl ] n 7 = Ai [hi1 , hi2 , ..., hil ] i=1 n−1
n−1 @
where jk = pk mnk + hnk = mnk × mrk hsk s=1 r=s+1 n n n−1 n @ @ = mrk + hnk = mrk hsk hsk s=1
r=s+1
s=1
+ hnk
r=s+1
The theorem is proved. In particular, the theorem implies that if all Ai have the same size m1 × n @ m2 × . . . × ml , we have P [j1 , j2 , ..., jl ] = Ai [hi1 , hi2 , ..., hil ] where jk = n s=1
hsk mn−s k
i=1
.
A New Direction to Parallelize Winograd’s Algorithm
Remark 1. jk =
n s=1
453
hsk mn−s is a jk ’s factorization in base mk . We denote k
by a = a1 a2 ...al(b) the a’s factorization in base b. Since P [j1 , j2 , ..., jl ] =
n 7
Ai [hi1 , hi2 , ..., hil ]
i=1
then jk = hi1 hi2 ...hin(mk ) . Now we return to our algorithm. We have following theorem: Theorem 2. k
k
k
i=1
i=1
i=1
SXk = ⊗ SX, SYk = ⊗ SY, SQk = ⊗ SQ
(7)
Proof. We prove the theorem by induction. Clearly it is true with k = 1. Suppose it is true with k − 1. The algorithm’s execution tree is balanced with depth k and degree 7. According to (6), we have at the level k − 1 of the tree: M *l =
+
×
Xk−1,ij SXk−1 (l, i, j)
0≤i,j≤2k−1 −1 k−1
0≤l≤7
*
−1
+
Yk−1,ij SYk−1 (l, i, j)
0≤i,j≤2k−1 −1
Then according to (5) at the level k we have Ml [l ] = ** 0≤i ,j ≤1
0≤i ,j ≤1
0≤i,j≤2k−1 −1
**
0≤i ,j ≤1
+
Xk−1,ij [i , j ]SXk−1 (l, i, j) SX(l , i , j ) × +
+
Yk−1,ij [i , j ]SYk−1 (l, i, j) SY (l , i , j )
0≤i,j≤2k−1 −1 k−1
0≤l≤7 0 ≤ l ≤ 6 = * 0≤i ,j ≤1
+
−1
0≤i,j≤2k−1 −1
*
+
(8)
Xk−1,ij [i , j ]SXk−1 (l, i, j)SX(l , i , j ) × +
Yk−1,ij [i , j ]SYk−1 (l, i, j)SY (l , i , j )
0≤i,j≤2k−1 −1 k−1
0≤l≤7 0 ≤ l ≤ 6
−1
where Xk−1,ij [i , j ], Yk−1,ij [i , j ] are 2k ×2k matrices obtained by partitioning Xk−1,ij , Yk−1,ij into 4 sub matrices (i , j indicate the sub matrix’s quarter).
454
D.K. Nguyen et al.
We represent l, l& in the 'base 7, and i, j, i , j in the base 2 and note that Xk−1,ij [i , j ] = Xk ii2 , jj2 . Then (8) becomes M [ll (7) ] = ⎛ ⎝ 0≤ii
⎛
(2)
,jj
Xk [ii (2) , jj (2) ]SXk−1 (l, i, j)SX(l , i , j )⎠ × ≤2k−1 −1
⎝ 0
(2)
⎞
⎞
(9)
Yk [ii (2) , jj (2) ]SYk−1 (l, i, j)SY (l , i , j )⎠
0≤ii (2) ,jj (2) ≤2k−1 −1 ≤ ll (7) ≤ 7k−1 − 1
In addition, we have directly from (6): M ⎛ [ll (7) ] = ⎝ 0≤ii
⎛ ⎝ 0
(2)
,jj
(2)
!
"
⎞
Xk [ii (2) , jj (2) ]SXk ll (7) , ii (2) , jj (2) ⎠ × ≤2k−1 −1
!
"
⎞
(10)
Yk [ii (2) , jj (2) ]SYk ll (7) , ii (2) , jj (2) ⎠
0≤ii (2) ,jj (2) ≤2k−1 −1 ≤ ll (7) ≤ 7k−1 − 1
Compare (9) and (10) we have ! " SXk ll7 , ii2 , jj2 = SXk−1 (l, i, j) SX (l , i , j ) ! " SYk ll7 , ii2 , jj2 = SYk−1 (l, i, j) SY (l , i , j ) By definition, we have k
SXk = SXk−1 ⊗ SX = ⊗ SX i=1 k
SYk = SYk−1 ⊗ SY = ⊗ SY i=1
Similarly k
SQk = SQk−1 ⊗ SQ = ⊗ SQ i=1
The theorem is proved. According to Theorem 2 and Remark 1 we have SXk (l, i, j) = =
k @ r=1
k @
SX (lr , ir , jr ), SYk (l, i, j)
r=1
SY (lr , ir , jr ), SQk (l, i, j) =
k @ r=1
(11) SQ (lr , ir , jr )
A New Direction to Parallelize Winograd’s Algorithm
455
Apply (11) in (6) we have nodes leafs ml and all the elements of result matrix. To implement a fast matrix multiplication algorithm on distributed memory computers, we stop at the recursion level r and according to (11) and (6), we have the entire corresponding sub matrices: + * r 7 Ml = SX (lt , it , jt ) Xij i, j = 0, 2r − 1 ×
t=1
* Yij
i, j = 0, 2 − 1
r 7
+ SY (lt , it , jt )
(12)
t=1
r
l = 0...7r − 1 with
⎞ ... xi∗2k−r ,j∗2k−r +2k−r −1 xi∗2k−r ,j∗2k−r ⎠ ... ... Xij = ⎝ ... ... x x k−r k−r k−r k−r k−r k−r k−r i∗2 +2 −1,j∗2 +2 −1⎞ ⎛ i∗2 +2 −1,j∗2 ... yi∗2k−r ,j∗2k−r +2k−r −1 yi∗2k−r ,j∗2k−r ⎠ ... ... Yij = ⎝ ... ... y y k−r k−r k−r k−r k−r k−r k−r i∗2 +2 −1,j∗2 +2 −1 ⎞ ⎛ i∗2 +2 −1,j∗2 ... qi∗2k−r ,j∗2k−r +2k−r −1 qi∗2k−r ,j∗2k−r ⎠ ... ... Qij = ⎝ ... qi∗2k−r +2k−r −1,j∗2k−r ... qi∗2k−r +2k−r −1,j∗2k−r +2k−r −1 i = 0, 2r − 1, j = 0, 2r − 1 ⎛
Because of the storage map of⎛ sub matrices to processors that we⎞have ⎜ ⎟ r ⎜ ⎟ @ ⎜ just presented, the submatrices ⎜ Xij SX (lt , it , jt ) ⎟ ⎟ and t=1 ⎝ i = 0, 2r − 1 ⎠ r j = 0, 2 − 1 ⎛ ⎞ ⎜ ⎟ r ⎜ ⎟ @ ⎜ ⎟ are locally determined within each Y SY (l , i , j ) ij t t t ⎜ ⎟ t=1 ⎝ i = 0, 2r − 1 ⎠ j = 0, 2r − 1 processor. Their product Ml will be calculated by parallel algorithms based on T-algo like Fox algorithm, Cannon algorithm, SUMMA, PUMMA, DIMMA etc. Finally, according to (11) and (6) we have directly sub matrix elements of resulting matrix by applying matrix additions instead of backtracking manually the recursive tree to calculate the root as in [8, 12]: r r r 7 −1 7 −1 @ (13) Ml SQr (l, i, j) = Ml SQ (lt , it , jt ) Qij = l=0
l=0
t=1
456
D.K. Nguyen et al.
Performance tests for all the algorithms discussed here are carried out on a Fujitsu Siemens Computers/hpcLine, 16 nodes (32 CPUs), switched Ethernet 1Gbit. Each processor is an Intel Xeon 2.4 GHz with 2GB memory. All the algorithms discussed here are easily scalable to any number of processors and matrices of any dimensions. For simplicity only the results from the square processor templates and square matrices are presented. The entries of all the matrices tested here are random double precision numbers uniformly distributed between -1 and 1. Here we can see that the Winograd-Cannon (Fox) is very much faster than the Cannon method when the matrix size is large enough.
4 Conclusion We have presented a general scalable parallelization for all matrix multiplication algorithms on distributed memory computers that uses Winograd’s algorithm at the inter-processor level. The running time for these algorithms decreases when the recursion level increases, hence this general solution enables us to find optimal algorithms (which correspond to a definite value of the recursive level and a definite parallel matrix multiplication algorithm at the bottom level) for all particular cases. From a different point of view, we generalized the formula of Winograd for the case where the matrices are partitioned into 2k parts (the case k = 2 gives us the original formula), thus we have a completely new direction to parallelize Winograd’s algorithm. In addition, we applied the ideas presented in this paper to generalize the algorithm presented in [4].
References 1. L. E. Cannon. A cellular computer to implement the kalman filter algorithm. Ph.d. thesis, Montana State University, 1969.
A New Direction to Parallelize Winograd’s Algorithm
457
2. J. Choi. A fast scalable universal matrix multiplication algorithm on distributedmemory concurrent computers. In 11th International Parallel Processing Symposium, pages 310–317, Geneva, Switzerland, April 1997. IEEE CS. 3. J. Choi, J. J. Dongarra, and D. W. Walker. Pumma: Parallel universal matrix multiplication algorithms on distributed memory concurrent computers. Concurrency: Practice and Experience, 6(7):543–570, 1994. 4. C.-C. Chou, Y. Deng, G. Li, and Y. Wang. Parallelizing strassen’s method for matrix multiplication on distributed memory mimd architectures. Computers and Math. with Applications, 30(2):4–9, 1995. 5. D. Coppersmith and S. Winograd. Matrix multiplication via arithmetic progressions. Journal of Symbolic Computation, 9(3):251–280, 1990. 6. G. Fox, S. Otto, and A. Hey. Matrix algorithms on a hypercube i: Matrix multiplication. Parallel Computing, 4:17–31, 1987. 7. G. H. Golub and C. F. V. Loan. Matrix Computations. Johns Hopkins University Press, 2nd edition, 1989. 8. B. Grayson, A. Shah, and R. van de Geijn. A high performance parallel Strassen implementation. Parallel Processing Letters, 6(1):3–12, 1996. 9. S. Huss-Lederman, E. M. Jacobson, A. Tsao, and G. Zhang. Matrix multiplication on the intel touchstone delta. Concurrency: Practice and Experience, 6(7):571–594, 1994. 10. B. Kumar, C.-H. Huang, R. W. Johnson, and P. Sadayappan. A tensor product formulation of Strassen’s matrix multiplication algorithm. Applied Mathematics Letters, 3(3):67–71, 1990. 11. J. Laderman, V. Y. Pan, and H. X. Sha. On practical algorithms for accelerated matrix multiplication. Linear Adgebra and Its Applications, 162:557–588, 1992. 12. Q. Luo and J. B. Drake. A scalable parallel Strassen’s matrix multiplication algorithm for distributed memory computers. In Proceedings of the 1995 ACM symposium on Applied computing, pages 221 – 226, Nashville, Tennessee, United States, 1995. ACM Press. 13. V. Y. Pan. How can we speed up matrix multiplication? SIAM Review, 26(3):393–416, 1984. 14. V. Strassen. Gaussian elimination is not optimal. Numer. Math., 13:354–356, 1969. 15. R. van de Geijn and J. Watts. Summa: Scalable universal matrix multiplication algorithm. Concurrency: Practice and Experience, 9(4):255–274, April 1997. 16. S. Winograd. On multiplication of 2 x 2 matrices. Linear Algebra and its Applications, 4:381–388, 1971.
Stability Problems in ODE Estimation Michael R. Osborne Mathematical Sciences Institute, Australian National University, Canberra 0200, Australia
[email protected]
Abstract The main question addressed is how does the stability of the underlying differential equation system impact on the computational performance of the two major estimation methods, the embedding and simultaneous algorithms. It is shown there is a natural choice of boundary conditions in the embedding method, but the applicability of the method is still restricted by the requirement that this optimal formulation as a boundary value problem be stable. The most attractive implementation of the simultaneous method would appear to be the null space method. Numerical evidence is presented that this is at least as stable as methods that depend on stability of the boundary value formulation.
1 Introduction The description of the estimation problem begins with a system of differential equations depending explicitly on a fixed vector of parameters together with data obtained by sampling solution trajectories typically in the presence of noise. The system of differential equations is written: dx = f (t, x, β) , dt
(1)
where the state vector x ∈ Rm , the parameter vector β ∈ Rp , and it is assumed that f ∈ R × Rm × Rp → Rm is smooth enough. The data is assumed to have the form: yi = Hx(ti , β ∗ ) + εi ,
i = 1, 2, · · · , n, (2) where H : Rm → Rk , and the observational error εi ∼ N 0, σ 2 I . The problem is to estimate β by making use of the given data and the structural information contained in the differential equation statement. An alternative formulation of the estimation problem as a smoothing problem by incorporating the parameter vector into the state vector is also useful in certain
460
M.R. Osborne
circumstances. This approach expands the system of differential equations by making the substitutions:
x(t) f (t, x) x← ,f ← (3) β 0 The standard estimation methods of least squares and maximum likelihood are equivalent in this problem context. The basic idea is that β is to be estimated by minimizing the objective: F (xc , β) =
n
yi − Hx (ti , β) 2
(4)
i=1
over all allowable values of the state variables x(ti , β), i = 1, 2, · · · , n. Methods differ in the manner of generating these comparison function values. Two well defined classes are considered here. 1. Embedding method: The differential equation solutions are restricted to the class of boundary value problems satisfying the conditions: dx = f (t, x, β), dt
B0 x(0) + B1 x(1) = b.
(5)
Here the boundary matrices B0 , B1 are imposed and b becomes an extra vector of parameters to be determined. The boundary matrices must be chosen in such a way that the boundary value problem has a well determined solution for the range of parameter values of interest. These methods require that the boundary value problem be solved explicitly each time a new value of the state variable is required. 2. Simultaneous method: The idea here is that differential equation discretization information is incorporated as explicit constraints on the state variables leading to a constrained optimization problem. In the case of the trapezoidal rule this gives ci (xc ) = xi+1 − xi −
h (f i+1 + f i ) = 0, 2
i = 1, 2, · · · , n − 1,
(6)
with xi = x(ti , β), xc the composite vector with sub-vector components xi , and h the discretization mesh spacing. A feature of these methods is that the state and parameter vectors are corrected simultaneously. Mesh selection for integrating the ODE system or defining the constraint equations would typically take the data points {ti , i = 1, 2, · · · , n} as a starting configuration. These could be expected to be required to cluster in regions where the solution trajectory is changing rapidly. Their choice is further conditioned by two important considerations: •
The asymptotic analysis of the effects of noisy data on the parameter estimates shows that this gets small typically no faster than O n−1/2 .
Stability Problems
•
461
It is not difficult to obtain differential equation discretizations that give errors at most O n−2 .
This suggests that selection of the data points is a more serious consideration than reducing discretization error. Consequences include: • •
That the trapezoidal rule provides an adequate integration method. As linear interpolation has an accuracy comparable with the trapezoidal rule it should be easily possible to integrate the differential equation on a mesh coarser than that provided by the observation points .
The basic assumption made is that the estimation problem has a well determined solution for n, the number of observations, large enough. This requirement takes slightly different forms for the two problem approaches. It becomes a stability requirement for the boundary formulation in the embedding method. This is discussed in the next section where it is shown that an “optimal” choice of boundary matrices is possible. However, the connection between stability and dichotomy suggest possible limitations to the embedding method. In the third section it is shown that the simultaneous method is capable of a number of implementations and that these can give rise to different stability considerations. It is concluded that there is likely a preferred implementation
2 ODE Stability The basic idea is that a system is stable if small changes to its inputs leads to small changes in its outputs. Computational considerations enter through the requirement that the discretized scheme mimic the structural properties of the original. Also this requirement could hold for all discretization scales or only for those scales small enough. These cases could be summarized as types of structurally stable discretization. In addition, in suitably controlled circumstances, it may be possible to obtain useful information by applying computational schemes to follow bounded solutions in unstable situations. Control is needed because even if the desired solution could be followed precisely in exact arithmetic it is likely unstable modes will be introduced by rounding errors and eventually swamp the computation. This is an example of numerical instability. Initial value stability (IVS) Here the problem considered is: dx = f (t, x) , dt
x(0) = b.
The classical stability requirement is that solutions with close initial conditions x1 (0), x2 (0) remain close in an appropriate sense. For example:
462
• •
M.R. Osborne
x1 (t) − x2 (t) → 0, t → ∞. Strong IVS. x1 (t) − x2 (t) remains bounded as t → ∞. Weak IVS.
In this context structurally stable discretizations which place only weak conditions on the discretization scale are described as stiffly stable. Numerical instability is an important consideration in multiple shooting [6]. Control must be exercised to ensure reasonably accurate fundamental matrices can be computed over short enough time intervals. Example 1. Constant coefficient case: Here f (t, x) = Ax − q If the constant matrix A is non-defective then weak IVS requires that the eigenvalues λi (A) satisfy Reλi ≤ 0, while this inequality must be strict for strong IVS. A one-step discretization of the ODE (ignoring the q contribution) can be written xk+1 = Th (A) xk . where Th (A) is the amplification matrix. Here a stiff discretization requires the stability inequalities to map into the condition |λi (Th ) | ≤ 1. For the trapezoidal rule 1 + hλi (A)/2 , |λi (Th )| = 1 − hλi (A)/2 ≤ 1 if Re {λi (A)} ≤ 0.
Boundary value stability (BVS) Here the problem is dx = f (t, x) , B (x) = B0 x(0) + B1 x(1) = b. dt Behaviour of perturbations about a solution trajectory x∗ (t) is governed to first order by the linearized equation dz − ∇x f (t, x∗ (t)) z = 0. (7) dt Here stability is closely related to the existence of a modest bound for the Green’s matrix: L (z) =
G (t, s) = Z(t) [B0 Z(0) + B1 Z(1)]
−1
= −Z(t) [B0 Z(0) + B1 Z(1)]
B0 Z(0)Z −1 (s),
−1
B1 Z(1)Z
−1
(s),
t > s, t < s.
Where Z(t) is a fundamental matrix for the linearised equation (7). Let α be a bound for |G(t, s)|. The dependence of this stability bound on the behaviour of the possible solutions Zd of (7) is explained by the idea of dichotomy:
Stability Problems
463
Definition 1. Dichotomy (weak form): ∃ projection P depending on the choice of Z such that, given S1 ← {ZP w, w ∈ Rm } ,
S2 ← {Z (I − P ) w, w ∈ Rm } ,
it follows that |φ(t)| ≤ κ, |φ(s)| |φ(t)| φ ∈ S2 ⇒ ≤ κ, |φ(s)|
φ ∈ S1 ⇒
t ≥ s, t ≤ s.
These conditions can always be satisfied if t, s ∈ [0, 1] . The computational context requires modest κ. If Z satisfies B0 Z(0) + B1 Z(1) = I then P = B0 Z(0) is a suitable projection in the sense that for separated boundary conditions an allowable setting is κ = α. There is a basic equivalence between stability and dichotomy. The key paper is [2]. BVS has implications for the structural stability of possible discretizations. • • •
The dichotomy projection separates increasing and decreasing solutions. Compatible boundary conditions pin down decreasing solutions at 0, growing solutions at 1. Discretization needs similar property so that the given boundary conditions exercise the same control. This requires solutions of (7) which are increasing (decreasing) in magnitude to be mapped into solutions of the discretization which are increasing (decreasing) in magnitude.
This property is called di-stability in [3]. They note that the trapezoidal rule is di-stable in the constant coefficient case. 1 + hλ(A)/2 > 1. λ(A) > 0 ⇒ 1 − hλ(A)/2 Example 2. The importance of compatible boundary conditions is well illustrated by the following differential equation [1]. ⎡ ⎤ 1 − 19 cos 2t 0 1 + 19 sin 2t ⎦, 0 19 0 A(t) = ⎣ (8) −1 + 19 sin 2t 0 1 + 19 cos 2t ⎡ t ⎤ e (−1 + 19 (cos 2t − sin 2t)) ⎦. −18et q(t) = ⎣ (9) et (1 − 19 (cos 2t + sin 2t)) Here the right hand side is chosen so that z(t) = et e satisfies the differential equation. The fundamental matrix displays the fast and slow solutions:
464
M.R. Osborne
⎤ e−18t cos t 0 e20t sin t ⎦. 0 0 e19t Z(t, 0) = ⎣ −18t sin t 0 e20t cos t −e ⎡
For boundary data with two terminal conditions and one initial condition: ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ 000 100 e B0 = ⎣ 0 0 0 ⎦ , B1 = ⎣ 0 1 0 ⎦ , b = ⎣ e ⎦ , 100 000 1 the trapezoidal rule discretization scheme gives the following results. These computations are apparently satisfactory. Table 1. Boundary point values – stable computation ∆t = .1 ∆t = .01 x(0) 1.0000 .9999 .9999 1.0000 1.0000 1.0000 x(1) 2.7183 2.7183 2.7183 2.7183 2.7183 2.7183
In contrast, for two initial and one terminal condition: ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ 001 000 1 B0 = ⎣ 0 0 0 ⎦ , B1 = ⎣ 0 1 0 ⎦ , b = ⎣ e ⎦ . 100 000 1 The results are given in following Table. The effects of instability are seen clearly in the first and third solution components. Table 2. Boundary point values – unstable computation ∆t = .1 ∆t = .01 x(0) 1.0000 .9999 1.0000 1.0000 1.0000 1.0000 x(1) -7.9+11 2.7183 -4.7+11 2.03+2 2.7183 1.31+2
Nonlinear stability There are well known examples of forms of stability associated with systems of differential equations which cannot be classified as BVS. Any realization of (7) in which the labelling of solutions as fast or slow cannot be done unambiguously over the interval of interest, and which clearly has a local stability property provides a counterexample. One fruitful source corresponds to systems with stable limit cycles.
Stability Problems
465
Example 3. The FitzHugh-Nagumo equations V3 dV =γ V − +R , dt 3
(10)
dR 1 = − (V − α − βR) . dt γ
(11)
The limit cycle is exemplified in the case α = .2, β = .2, γ = 1. in figure 1. Figure 2 gives the sum of squares of discrepancies between this solution and the solution for perturbed values of the α and β parameters. It shows that the minimum is well determined in a neighbourhood of the target values, but it also shows that there are definite restrictions on the size of this neighbourhood, and that changes in solution structure would render global searching very difficult. These figures are taken from [8].
Figure 1. Limit cycle trajectory
This example can be solved numerically as a boundary value problem by transforming the range of a complete cycle to [0, 1], introducing the unknown range as an extra variable as in the smoothing approach, imposing periodic boundary conditions, and using (11) to impose a zero derivative condition at one boundary to fix the extra unknown. Thus it does not show a severe instability.
3 The Embedding Method First problem is to set suitable boundary conditions. Expect good boundary conditions should lead to a relatively well conditioned linear system.
466
M.R. Osborne
Figure 2. Objective function as function of α and β
Assume the ODE discretization is ci (xi , xi+1 ) = cii (xi ) + ci(i+1) (xi+1 ). Consider the factorization of the difference equation (gradient) matrix C = ∇x cc with first column permuted to end: ⎡
C12 ⎢ C21 C22 ⎢ ⎣
C11
⎤
⎥ U V ⎥→Q ⎦ 0 ··· H G
(12)
C(n−1)(n−1) C(n−1)n 0 This step is independent of the boundary conditions.
Inserting the boundH G ary conditions gives the system with matrix to solve for x1 , xn . B1 B0 Orthogonal factorization again provides a useful strategy.
1 2 1 2 S1T HG = L0 S2T It follows that the system determining x1 , xn is best conditioned by choosing 2 1 B1 B0 = S2T . (13) These conditions depend only on the differential equation. For the Mattheij example (8) the “optimal” boundary matrices for h = .1 are given in Table 3. These confirm the importance of weighting the boundary data to reflect the
Stability Problems
467
Table 3. Optimal boundary matrices when h = .1 B1 B2 .99955 0.0000 .02126 -.01819 0.0000 -.01102 0.0000 0.0000 0.0000 0.0000 1.0000 0.0000 .02126 0.0000 .00045 .85517 0.0000 .51791
stability requirements of a mix of fast and slow solutions. The solution does not differ from that obtained when the split into fast and slow was correctly anticipated. Example 4. Solution of the embedding problem would typically use the GaussNewton method [7]. Consider the modification of the Mattheij problem (8) with parameters β1∗ = γ, and β2∗ = 2 corresponding to the solution x (t, β ∗ ) = et e: ⎡ ⎤ 1 − β1 cos β2 t 0 1 + β1 sin β2 t ⎦, 0 0 β1 A(t) = ⎣ −1 + β1 sin β2 t 0 1 + β1 cos β2 t ⎤ ⎡ t e (−1 + γ (cos 2t − sin 2t)) ⎦. −(γ − 1)et q(t) = ⎣ t e (1 − γ (cos 2t + sin 2t)) In the numerical experiments optimal boundary conditions are set at the first iteration. The aim is to recover estimates of β ∗ , b∗ from simulated data eti He+εi , εi ∼ N (0, .01I) using Gauss-Newton, stopping when ∇F h < 10−8 . Results are given in Table 4. There is relatively little change observed in the 1 2 1 2T optimum boundary conditions (13) as B1 B2 1 B1 B2 k − IF < 10−3 , k > 1. Thus no updating was deemed to be necessary. Table 4. Embedding method: Gauss-Newton results for the Mattheij problem
1 2 .5 0 .5 H = 1/3 1/3 1/3 H= 0 1 0 n = 51, γ = 10, σ = .1 n = 51, γ = 10, σ = .1 14 iterations 5 iterations n = 51, γ = 20, σ = .1 n = 51, γ = 20, σ = .1 11 iterations 9 iterations n = 251, γ = 10, σ = .1 n = 251, γ = 10, σ = .1 9 iterations 4 iterations n = 251, γ = 20, σ = .1 n = 251, γ = 20, σ = .1 8 iterations 5 iterations
468
M.R. Osborne
4 The Simultaneous Method Associated with the equality constrained problem is the Lagrangian L = F (xc ) +
n−1
λTi ci .
(14)
i=1
The necessary conditions for a stationary point give: ∇xi L = 0, i = 1, 2, · · · , n,
c (xc ) = 0.
The Newton equations determining corrections dxc , dλc are: ∇2xx Ldxc + ∇2xλ Ldλc = −∇x LT ,
(15)
∇x c (xc ) dxc = Cdxc = −c (xc ) ,
(16)
Note sparsity! ∇2xx L is block diagonal, ∇2xλ L = C T is block bidiagonal. The Newton equations also correspond to necessary conditions for the quadratic program: 1 min ∇x F dxc + dxTc M dxc ; dx 2
c + Cdxc = 0,
in case M = ∇2xx L, λu = λc + dλc [5]. A standard approach is to use the constraint equations to eliminate variables (see [4] and references given there). This can use the factorization (12) to give dxi = vi + Vi dx1 + Wi dxn ,
i = 2, 3, · · · , n − 1.
The reduced constraint equation is Gdx1 + Hdxn = w. This variable elimination would appear to be restricted by BVS considerations; but there is an alternative approach called the null space method in [5]. 2 U 1 T Let C = Q1 Q2 then the Newton equations (15), (16) can be written 0
⎤ ⎡
T
T U T 2 Q ∇x F T c ⎣ Q ∇xx LQ 0 ⎦ Q dx . = − 1 T 2 λu c U 0 0 These can be solved in sequence U T QT1 dxc = −c, QT2 ∇2xx LQ2 QT2 dxc = −QT2 ∇2xx LQ1 QT1 dxc − QT2 ∇x F T , U λu = −QT1 ∇2xx Ldxc − QT1 ∇x F T . A direct stability test is possible using the Mattheij problem data (8) as QT1 dxc estimates QT1 vec {exp ti } when xc = 0. Computed and exact results are compared in Table 5.
Stability Problems
469
Table 5. Stability test: comparison of exact and computed values test results n = 11 .87665 -.97130 -1.0001 .74089 -1.0987 -1.3432 .47327 -1.2149 -1.6230 .11498 -1.3427 -1.8611 -.32987 -1.4839 -2.0366 -.85368 -1.6400 -2.1250 -1.4428 -1.8125 -2.1018 -2.0773 -2.0031 -1.9444 -2.7309 -2.2137 -1.6330 -3.3719 -2.4466 -1.1526
particular integral QT1 x .87660 -.97134 -1.0001 .74083 -1.0988 -1.3432 .47321 -1.2150 -1.6231 .11491 -1.3428 -1.8612 -.32994 -1.4840 -2.0367 -.85376 -1.6401 -2.1250 -1.4429 -1.8125 -2.1019 -2.0774 -2.0032 -1.9444 -2.7310 -2.2138 -1.6331 -3.3720 -2.4467 -1.1527
5 In Conclusion • •
•
Embedding makes use of carefully constructed, explicit boundary conditions. Thus BVS restrictions must apply. The variable eliminations form of the simultaneous method partitions variables into sets {x1 , xn }, and {x2 , · · · , xn−1 } which are found in a sequential order corresponding to a fixed pivoting sequence. This approach relies implicitly on a form of BVS . 5 T 6 The 5 T null 6 space variant partitions the variables into the sets Q1 xc , Q2 xc . It appears at least as stable as the variable elimination procedure. Sparsity preserving implementation is straightforward.
Acknowledgment I am indebted to Giles Hooker and Jim Ramsay for their preprint [8] which gives a rather different approach to the estimation problem, and for permission to use the insightful FitzHugh-Nagumo figures taken from this paper.
References 1. U. M. Ascher, R. M. M. Mattheij, and R. D. Russell, Numerical solution of boundary value problems for ordinary differential equations, SIAM, Philadelphia, 1995. 2. F. R. de Hoog and R. M. M. Mattheij, On dichotomy and well-conditioning in bvp, SIAM J. Numer. Anal. 24 (1987), 89–105. 3. R. England and R. M. M. Mattheij, Boundary value problems and dichotomic stability, SIAM J. Numer. Anal. 25 (1988), 1037–1054. 4. Z. Li, M. R. Osborne, and T. Prvan, Parameter estimation of ordinary differential equations, IMA J. Numer. Anal. 25 (2005), 264–285. 5. J. Nocedal and S. J. Wright, Numerical optimization, Springer Verlag, 1999.
470
M.R. Osborne
6. M. R. Osborne, On shooting methods for boundary value problems, J. Math. Analysis and Applic. 27 (1969), 417–433. 7. M. R. Osborne, An approach to parameter estimation and model selection in differential equations, Modeling, Simulation and Optimization of Complex Processes (H. G. Bock, E. Kostina, H. X. Phu, and R. Rannacher, eds.), Springer, 2005, pp. 393–408. 8. J. O. Ramsay, G. Hooker, C. Cao, and C. Campbell, Estimating differential equations, preprint, Department of Psychology, McGill University, Montreal, Canada, 2005, p. 40.
A Fast, Parallel Performance of Fourth Order Iterative Algorithm on Shared Memory Multiprocessors (SMP) Architecture M. Othman1∗ and J. Sulaiman2 1
2
Department of Communication Technology and Network, University Putra Malaysia, 43400 UPM Serdang, Selangor D.E., Malaysia
[email protected] School of Science and Technology, Universiti Malaysia Sabah, Locked Bag 2073, 88999 Kota Kinabalu, Sabah, Malaysia
[email protected]
Abstract The rotated fourth order iterative algorithm of O(h4 ) accuracy which was applied to the linear system was introduced by Othman et al. [OTH01] and it was shown to be the fastest compared to the standard fourth order iterative algorithm. Meanwhile the parallel standard fourth order iterative algorithms with difference strategies were implemented successfully by many researchers for solving large scientific and engineering problems. In this paper, the implementation of the parallel rotated fourth order iterative algorithm on SMP architecture is discussed. The performance results of all the parallel algorithms were compared in order to show their outstanding performances.
1 Introduction The parallel fourth order iterative algorithm which incorporate the standard fourth order scheme known as compact high order scheme for solving a large and sparse linear system was implemented successfully by many researchers, see [YOU95, SPO98]. One of the most outstanding parallel algorithm was proposed by Spotz, et al., [SPO98]. Theoretically, the standard fourth order scheme of O(h4 ) was derived by Coltaz, [COL60]. Based on the scheme, several experiments were carried out and the results showed it has high accuracy, [GUP84]. In 2001, Othman et al. derived a new nine points scheme, known as a rotated fourth order scheme for solving the 2D Poisson’s equation. From the experimental results, they found that the new scheme ∗
The author is also an associate researcher at the Laboratory of Computational Sciences and Informatics, Institute of Mathematical Science Research (INSPEM), University Putra Malaysia
472
M. Othman and J. Sulaiman
has a drastic improvement in execution time and relatively good accuracy compared to the standard fourth order scheme, see [OTH01, ALI02, ZHA02].
2 Derivation of A Rotated Fourth Order Scheme Let us consider the 2D Poisson’s equation, which can be represented mathematically as (1) uxx + uyy = f (x, y), for all (x, y) ∈ Ω h subject to the Dirichlet boundary condition and satisfying the exact solution, u(x, y) = g(x, y) for (x, y) ∈ Ω h . The discretization resulted in a large and sparse linear system. Hence, the iterative method is the best approach for solving such a linear system. Consider Eq. (1) on a unit square, Ω h with the grid spacing h in both directions, xi = x0 +ih and yj = y0 +jh for all i, j = 0, 1, . . . , n. Assume that due to the continuity of u(x, y) on Ω h . Based on the cross orientation approximation and central difference formula, the displacements i and j which correspond √ with ∆x and ∆y respectively change to 2h. Eq. (2) can be approximated at any points u(xi , yj ) using the finite difference formula and yields ui+1,j+1 + ui−1,j+1 + ui+1,j−1 + ui−1,j−1 + 4ui,j ∼ = 2h2 (uxx + uyy )i,j h4 + 6 (uxxxx + 6uxxyy + uyyyy )i,j + O(h6 ).
(2)
Eq. (2) is known as a rotated five points scheme of O(h2 ) provided the second and third terms on the right side are ignored. Since the accuracy of the scheme is not good, it is better to derive a higher order of accuracy. Again the finite difference formula is used to derive a high order approximation. By taking the width 2h, approximation to Eq. (1) at the point u(xi , yj ) takes the form, ui+2,j + ui−2,j + ui,j−2 + ui,j+2 + 4ui,j ∼ = 4h2 (uxx + uyy )i,j 4h4 + 3 (uxxxx + uyyyy )i,j + O(h6 ).
(3)
Multiplying Eq. (2) by 4 and adding it with Eq. (3), the result becomes ui+2,j + ui−2,j + ui,j−2 + ui,j+2 +4(ui+1,j+1 + ui−1,j+1 + ui+1,j−1 + ui−1,j−1 ) − 20ui,j ∼ = 12h2 (uxx + uyy )i,j + 2h4 (uxxxx + uyyyy )i,j + 4h4 uxxyy + O(h6 ).
(4)
The double derivatives of Eq. (1) with respect to x and y, are represented by (uxxxx + uxxyy )i,j = (fxx )i,j
(5)
(uxxyy + uyyyy )i,j = (fyy )i,j ,
(6)
and respectively. Multiply both Eqs. (5) and (6) by 2h4 and add, we can write Eq. (4) as,
A Fast, Parallel Performance of Fourth Order Iterative Algorithm
473
ui+2,j + ui−2,j + ui,j−2 + ui,j+2 + 4(ui+1,j+1 + ui−1,j+1 + ui+1,j−1 + ui−1,j−1 ) − 20ui,j ∼ = 12h2 fi,j + 2h4 (fxx + fyy )i,j + O(h6 ).
(7)
For higher order approximation by replacing second term on the right side of Eq. (7) by h2 (fi+1,j+1 + fi−1,j+1 + fi+1,j−1 + fi−1,j−1 − 4fi,j ) and ignoring the third term, the result becomes ui+2,j + ui−2,j + ui,j−2 + ui,j+2 + 4(ui+1,j+1 + ui−1,j+1 + ui+1,j−1 + ui−1,j−1 ) − 20ui,j ∼ = Fi,j
(8)
where Fi,j = h2 (8fi,j + fi+1,j+1 + fi−1,j+1 + fi+1,j−1 + fi−1,j−1 ). Eq. (8) is called a rotated fourth order scheme with the accuracy of O(h4 ). Details of the scheme can be obtained in [OTH01, ZHA02].
3 Implementation of A Parallel Rotated Fourth Order Algorithm Assume the Ω h is large with n = 2(i + 1) for any integer i = 1, 2, . . .,. Several ordering strategies of parallelizing all the points in iterative algorithms have been investigated, [ABD96, OTH01, OTH04] and only the optimal strategy is reported. Let Ω h be discretized and labelled into three different types of mesh points, •, ◦ and 2, see Fig. 1a.
Figure 1. (a) and (b) show the 4C ordering strategy, indicated by T_1, T_2, ..., T_25, and the remaining □ points in Ω^h, respectively, for n = 10
All the ◦ points (or tasks Ti ) are allocated to the available processors in a four color (4C) ordering strategy, white (w), yellow (y), green (g) and red
(r). Note that all points of type • are computed first in parallel using the rotated five point scheme with the natural ordering strategy. Applying the 4C strategy to Eq. (8) for each task T_i leads to the linear system

    ⎡ D_w  E    F    J   ⎤ ⎡ u_w ⎤   ⎡ f_w ⎤
    ⎢ E^T  D_y  F    J   ⎥ ⎢ u_y ⎥ = ⎢ f_y ⎥                                            (9)
    ⎢ F^T  F^T  D_g  G   ⎥ ⎢ u_g ⎥   ⎢ f_g ⎥
    ⎣ J^T  J^T  G^T  D_r ⎦ ⎣ u_r ⎦   ⎣ f_r ⎦

where the blocks D_w, D_y, D_g and D_r are diagonal matrices and hence invertible. Applying the S.O.R. relaxation technique to Eq. (9) results in

    u_w^{(k+1)} = ξ u_w^{(k)} + ω_e D_w^{-1} ( f_w − E u_y^{(k)} − F u_g^{(k)} − J u_r^{(k)} ),
    u_y^{(k+1)} = ξ u_y^{(k)} + ω_e D_y^{-1} ( f_y − E^T u_w^{(k+1)} − F u_g^{(k)} − J u_r^{(k)} ),
    u_g^{(k+1)} = ξ u_g^{(k)} + ω_e D_g^{-1} ( f_g − F^T ( u_w^{(k+1)} + u_y^{(k+1)} ) − G u_r^{(k)} ),          (10)
    u_r^{(k+1)} = ξ u_r^{(k)} + ω_e D_r^{-1} ( f_r − J^T ( u_w^{(k+1)} + u_y^{(k+1)} ) − G^T u_g^{(k+1)} ),

where ξ = (1 − ω_e) and ω_e is the acceleration factor. Since the evaluation of each task T_i within a group is independent of the others, Eq. (10) can be evaluated in parallel in the order

    u_w^{(k+1)} → u_y^{(k+1)} → u_g^{(k+1)} → u_r^{(k+1)}.

In other words, each iteration is split into four parallel sweeps separated by a synchronizing call. This ensures that the updates of one sweep are completed before the updates of the next sweep begin. After an iteration is completed, a local convergence check is made by each processor, followed by a global convergence check; the iteration is terminated only when global convergence is achieved. After global convergence is attained, the solution at the remaining mesh points (the points of type □, see Fig. 1b) is evaluated directly in parallel using the standard five point scheme, with each row assigned to a different processor. The whole process of iteration, testing, exchange of values, and checking of local and global convergence is shown in Algorithm 1.
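The four-colour structure of Eq. (10) maps naturally onto a shared-memory loop in which the colour sweeps are separated by barriers. The sketch below uses OpenMP as a stand-in for the SMP synchronization primitives of the paper's implementation; the Point type, the per-colour task lists and the relax() kernel (the SOR update of Eq. (10) at a single point, returning the local change) are hypothetical placeholders supplied by the caller.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <vector>

struct Point { int i, j; };

// Sketch of the parallel iteration structure of Algorithm 1.  The implicit
// barrier at the end of each "omp for" plays the role of the synchronization
// calls between the colour sweeps.
void four_colour_sor(const std::vector<std::vector<Point>>& colour,  // 4 groups: w, y, g, r
                     const std::function<double(const Point&)>& relax,
                     double tol, int max_iter)
{
    double err = 0.0;         // shared across the thread team
    bool converged = false;   // shared global convergence flag

    #pragma omp parallel
    for (int k = 0; k < max_iter && !converged; ++k) {
        #pragma omp single
        err = 0.0;            // implicit barrier after "single"

        for (std::size_t c = 0; c < colour.size(); ++c) {
            #pragma omp for reduction(max : err) schedule(static)
            for (std::size_t t = 0; t < colour[c].size(); ++t)
                err = std::max(err, relax(colour[c][t]));
            // implicit barrier: next colour starts only after this one is done
        }

        #pragma omp single
        converged = (err < tol);   // global convergence decision, implicit barrier
    }
}
```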
4 Results and Performance Evaluation

All the algorithms were applied to solve the 2D Poisson equation

    u_{xx} + u_{yy} = (x^2 + y^2) e^{xy},

defined on the unit square Ω^h, subject to the Dirichlet boundary condition and with exact solution u(x, y) = e^{xy} for (x, y) ∈ ∂Ω^h. This equation is still used by many researchers to test their algorithms, see [YOU95, ABD96, OTH00, OTH01, ALI02, OTH04]. Throughout the experiments, a tolerance of ε = 10^{-10} was used in the local convergence test. The experimental values of ω_e were determined to within ±0.01 by running the program for different values and choosing the one(s) that gave the minimum number of iterations.
A Parallel Rotated Fourth Order Algorithm() {
1:   Initialization and computation of all initial values f_{i,j}, F_{i,j}, ω_e; h = 1/n;
2:   l_flag = 0; g_flag = 0; id = get_myid(); bit = 2^{id}; ite = 0;
3:   /* Check for global convergence and iterate the following loop in parallel */
4:   while not globally converged {
4.1:   Compute all the • points on the boundaries with the natural ordering strategy
         v_{i,j}^{(k+1)} = 0.25[ v_{i+1,j+1}^{(k)} + v_{i−1,j−1}^{(k+1)} + v_{i−1,j+1}^{(k)} + v_{i+1,j−1}^{(k+1)} − 2h^2 f_{i,j} ];
       <synchronization>
4.2:   Compute all the ◦ points with the 4C ordering strategy
         v_{i,j}^{(k+1)} = ω_e ( v̂_{i,j}^{(k+1)} − u_{i,j}^{(k)} ) + u_{i,j}^{(k)},
       with the Gauss–Seidel value v̂_{i,j}^{(k+1)} given by
         v̂_{i,j}^{(k+1)} = 0.05[ v_{i+2,j}^{(k)} + v_{i−2,j}^{(k+1)} + v_{i,j−2}^{(k+1)} + v_{i,j+2}^{(k)} ]
                         + 0.20[ v_{i+1,j+1}^{(k)} + v_{i−1,j−1}^{(k+1)} + v_{i−1,j+1}^{(k)} + v_{i+1,j−1}^{(k+1)} ] − 0.05 F_{i,j};
       <synchronization>
4.3:   /* Check local convergence for all • and ◦ points */
       if not converged then set l_flag = 0; otherwise l_flag = 1;
       <synchronization>
4.4:   /* Mutual exclusion */
       if (l_flag == 1) { m_lock(); g_flag = g_flag + bit; m_unlock(); }
       <synchronization>
       ite++;
4.5:   /* Exchange the values of all the • and ◦ points from new to old */
       u_{i,j}^{(k)} ← v_{i,j}^{(k+1)};
       <synchronization>
4.6: }
5:   /* Compute at once, in parallel, all the remaining points, i.e. all the □ points */
5.1:   v_{i,j} = 0.25[ v_{i,j+1} + v_{i−1,j} + v_{i+1,j} + v_{i,j−1} − h^2 f_{i,j} ];
       <synchronization>
6:   Kill all the processes
7: }

Algo. 1: A Parallel Rotated Fourth Order Iterative Algorithm
The experiments were carried out on an SMP parallel computer with several mesh sizes (n = 36, 50, 70 and 100). Table 1 shows the optimum value of ω_e, the number of iterations, the ordering strategy and the maximum error, while Table 2 shows the execution time and speedup of the parallel fourth order iterative algorithms. The execution time and speedup of the algorithms are plotted in Figures 2 and 3, respectively.

Table 1. Acceleration factor ω_e, number of iterations, strategy and maximum error of the parallel fourth order iterative algorithms

  n    Algorithm   ω_e    No. of iterations   Strategy   Max error
  36   Standard    1.84   146                 4C         4.55x10^-9
  36   Rotated     1.77   106                 4C         1.33x10^-6
  50   Standard    1.88   201                 4C         1.24x10^-9
  50   Rotated     1.83   160                 4C         3.24x10^-7
  70   Standard    1.91   276                 4C         4.39x10^-10
  70   Rotated     1.87   235                 4C         8.85x10^-8
  100  Standard    1.94   406                 4C         1.12x10^-10
  100  Rotated     1.90   359                 4C         2.19x10^-8
Figure 2. Execution time (seconds) vs. number of processors for the parallel standard and rotated fourth order iterative algorithms, n = 100
Table 2. The numerical results of the parallel fourth order iterative algorithms

                     Parallel Standard Algo.         Parallel Rotated Algo.
  n    Processors    Time    Speedup  Efficiency     Time    Speedup  Efficiency
  36   1             20.8    1.0      1.00           7.6     1.0      1.00
  36   2             11.3    1.8      0.90           4.2     1.8      0.90
  36   3             8.3     2.4      0.80           3.1     2.4      0.80
  36   4             6.4     3.2      0.80           2.4     3.0      0.75
  36   5             5.3     3.9      0.78           2.0     3.7      0.74
  50   1             55.8    1.0      1.00           19.5    1.0      1.00
  50   2             29.9    1.8      0.90           10.6    1.8      0.90
  50   3             20.7    2.6      0.86           7.6     2.5      0.83
  50   4             16.7    3.3      0.83           5.9     3.2      0.80
  50   5             13.6    4.0      0.80           4.8     3.9      0.78
  70   1             156.6   1.0      1.00           65.9    1.0      1.00
  70   2             82.6    1.8      0.90           35.0    1.8      0.90
  70   3             56.1    2.7      0.90           24.8    2.6      0.86
  70   4             43.7    3.5      0.87           18.8    3.4      0.85
  70   5             36.5    4.2      0.84           15.8    4.1      0.82
  100  1             473.8   1.0      1.00           212.3   1.0      1.00
  100  2             241.7   1.9      0.95           111.4   1.9      0.95
  100  3             171.1   2.7      0.90           76.8    2.7      0.90
  100  4             130.0   3.6      0.86           60.6    3.5      0.87
  100  5             108.0   4.3      0.86           49.6    4.2      0.84
Figure 3. Speedup vs. number of processors for the parallel standard and rotated fourth order iterative algorithms (with the ideal speedup shown for reference), n = 100
5 Conclusion and Future Research

As Tables 1 and 2 show, the parallel rotated fourth order iterative algorithm with the 4C ordering strategy was faster than the parallel standard fourth order algorithm for any number of processors. This is also indicated by the graphs of execution time and speedup versus number of processors in Figures 2 and 3, respectively, and is due to the fact that fewer mesh points are involved in the iteration process than in the standard algorithm. In conclusion, the parallel rotated fourth order iterative algorithm was the faster of the two parallel fourth order iterative algorithms for solving the 2D Poisson equation on an SMP architecture. In the future, the algorithm will be implemented on a networked SMP architecture.
References

[ABD96] Abdullah, A.R., Ali, N.H.M.: Comparative Study of Parallel Strategies for the Solution of Elliptic PDEs. Parallel Algorithms and Applications, 10, 93–103 (1996)
[ALI02] Ali, M. N., Yunus, Y., Othman, M.: A New Nine Point Multigrid V-Cycle Algorithm. Sains Malaysiana, 31, 135–147 (2002)
[COL60] Collatz, L.: The Numerical Treatment of Differential Equations. Springer-Verlag, Berlin (1960)
[GUP84] Gupta, M. M.: A Fourth Order Poisson Solver. Journal of Computational Physics, 55, 166–172 (1984)
[OTH00] Othman, M., Abdullah, A. R.: An Efficient Four Points Modified Explicit Group Poisson Solver. International Journal of Computer Mathematics, 76, 203–217 (2000)
[OTH01] Othman, M., Abdullah, A. R.: A Fast Higher Order Poisson Solver. Sains Malaysiana, 30, 77–86 (2001)
[OTH04] Othman, M., Abdullah, A. R., Evans, D. J.: A Parallel Four Point Modified Explicit Group Iterative Algorithm on Shared Memory Multiprocessors. Parallel Algorithms and Applications, 19(1), 1–9 (2004)
[SPO98] Spotz, W. F., Carey, G. F.: Iterative and Parallel Performance of High-Order Compact Systems. SIAM Journal of Scientific Computing, 19(1), 1–14 (1998)
[YOU95] Yousif, W. S., Evans, D. J.: Explicit De-coupled Group Iterative Methods and Their Parallel Implementations. Parallel Algorithms and Applications, 7, 53–71 (1995)
[ZHA02] Zhang, J., Kouatchou, J., Othman, M.: On Cyclic Reduction and Finite Difference Schemes. Journal of Computational and Applied Mathematics, 145(1), 213–222 (2002)
Design and Implementation of a Web Services-Based Framework Using Remoting Patterns

Phung Huu Phu^{1,2}, Dae Seung Yoo^1, and Myeongjae Yi^1

1 Applied Software Engineering Laboratory, School of Computer Engineering and Information Technology, University of Ulsan, Republic of Korea
  {phungphu, ooseyds, ymj}@mail.ulsan.ac.kr
2 Faculty of Information Technology, Ho Chi Minh City University of Technology, Vietnam
Abstract In recent years, Web services technology has played an important role as a middleware for distributed systems such as peer-to-peer computing and grid computing, as well as for interoperability transactions. As the technology continues to evolve, a number of specifications are being proposed to address the areas necessary to support Web services. These specifications are designed in a modular way; therefore, it is necessary to have a framework that provides developers with an efficient way of building Web services-based distributed applications. The aim of our approach is to combine and integrate appropriate Web service specifications within one framework, so that distributed applications can be built on this framework regardless of these specifications. In our previous work, a Web services framework was proposed in which Web services-based interoperability transactions can be executed in a reliable, effective, and secure manner. In this paper we present the design and implementation of modules for the framework based on the remoting pattern approach. Remoting patterns are used since they provide a systematic way of developing distributed object middleware solutions and they can be linked to other patterns in the context of distributed applications. By using the remoting pattern language, our framework can be easily integrated into Web services-based distributed systems and extended with additional functionality in the future. A case study of e-banking transactions based on our framework has been developed to illustrate how our framework can be used in practice.
1 Introduction

Web services are a new model for using the Web, in which transactions are initiated automatically by a program, not necessarily through a browser as in the current World-Wide Web. In this new model, services can be described, published, discovered and invoked dynamically in a distributed computing
environment. Data, invocations, and results are transferred in XML-based formats; therefore, Web services provide a means for interoperability in a heterogeneous environment. Based on Web services technology, Service-Oriented Architecture (SOA) is the new trend in distributed systems, aiming at building loosely-coupled systems that are extendible, flexible and fit well with existing legacy systems [1,2]. The real value of SOA is the ability to automate large-scale business processes, mixing a variety of technologies. Web services and SOA provide a means for interoperability in an effective manner and can overcome the weaknesses of other distributed technologies, such as language and platform dependence, inflexibility, and disruption to the existing interfaces of old systems [3]. Normally, developers use a Web services framework to build SOA-based applications, since such frameworks supply a simple approach to creating, deploying and consuming Web services. As the technology continues to evolve, a number of specifications such as security, reliability and transactionality are being proposed to address the areas necessary to support Web services. Unfortunately, most Web services frameworks do not consider these aspects; therefore, it is very difficult for developers to address them in applications. In our view, to support additional standards and specifications, a framework for Web services should be designed in an extensible and flexible manner supporting Web services transactions. Developers could use the framework by transparently configuring the framework mechanisms and could manage services in concrete scenarios, and the framework could be easily extended without disruption to existing systems. In this paper we present the design of such a framework based on the remoting pattern approach. The proposed framework supports both the client side and the server side and is designed in an extensible and flexible manner based on open standards supporting Web services transactions. The rest of this paper is organized as follows. We discuss the related work in section 2, including some existing frameworks and the concept of remoting patterns in the context of Web services. The design and implementation of the framework are presented in section 3. Section 4 shows how the framework can be used by describing case studies. We conclude the paper in section 5.
2 Related Work

2.1 Existing Web Services Frameworks

A number of Web services frameworks have appeared recently, but they do not consider some parts of the Web service specifications, such as security, reliability, transactionality and so on. For instance, the Apache Axis framework [4] only provides a way of constructing SOAP processors such as clients, servers, and gateways.
Figure 1. Basic remoting patterns (showing CLIENT PROXY, REQUESTOR, CLIENT REQUEST HANDLER, MARSHALLER, INTERFACE DESCRIPTION, REMOTING ERROR, INVOKER, SERVER REQUEST HANDLER and the remote object)
The GLUE framework offers a simpler invocation model by generating an interface automatically and providing it remotely, i.e. it supports only the client side [7]. The Asynchronous Invocation Framework [11], built on top of Axis, provides asynchronous invocation on the client side without using asynchronous message protocols. The Web Services Invocation Framework (WSIF) [5] is a simple Java API for invoking Web services with different protocols and frameworks; WSIF supports only the client side and does not consider the additional aspects for Web services mentioned above. Normally, developers use a Web services framework to build SOA-based applications, since such frameworks supply a simple approach to creating, deploying and consuming Web services. However, most Web services frameworks do not consider the aspects of the additional specifications for Web services QoS; therefore, it is very difficult for developers to address these aspects in applications. The main aim of this work is therefore to give an approach for building a Web services framework which can be easily integrated into, and flexibly extended for, SOA-based applications.

2.2 Remoting Patterns in the Context of Web Services

Proposed by Uwe Zdun et al. [7], the remoting pattern language extends the broker architecture [7] to address the full range of how to use, extend, integrate, or even build distributed object middleware systems. In the broker
architecture, the communication functionalities are separated from the application functionalities of a distributed system. The broker hides and mediates all communication between the objects or components of a system. The broker architecture consists of a client-side requestor, which constructs and forwards invocations, and a server-side invoker, which is responsible for invoking the operations of the target remote object. The marshaller on each side handles the transformation of requests and replies so that they can be sent over the transmission medium. Since the remoting pattern concept captures the general, recurring architecture of successful distributed object middleware (such as Web services), as well as more concrete design and implementation strategies, remoting patterns help system architects to extend the middleware with additional functionality. The remoting pattern language offers a systematic way to reuse software models, designs and implementations to extend, integrate, customize, or build distributed object middleware solutions. Figure 1 shows the basic remoting patterns. In this paper, we give an approach for designing and implementing a framework which supports additional Quality of Service (QoS) properties such as reliability, interoperability and security, and which can be easily extended with additional features. For this purpose, remoting patterns help us to find the best hooks for extending an existing framework to support QoS.
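To make the broker roles just described concrete, the following minimal sketch declares them as abstract interfaces. It is written in C++ for consistency with the other sketches in this volume, whereas the framework discussed here is Java code on top of Axis; all type and member names are illustrative assumptions, not part of the actual framework.

```cpp
#include <string>
#include <vector>

using Bytes = std::vector<unsigned char>;

struct Request { std::string object_id, operation; Bytes payload; };
struct Reply   { Bytes payload; };

// MARSHALLER: transforms requests and replies into a wire format
// (SOAP/XML in the Web services case).
struct Marshaller {
    virtual Bytes   marshal(const Request& r) = 0;
    virtual Request unmarshal(const Bytes& wire) = 0;
    virtual ~Marshaller() = default;
};

// CLIENT REQUEST HANDLER: owns the client-side transport; the REQUESTOR
// builds an invocation and hands the marshalled form to this handler.
struct ClientRequestHandler {
    virtual Reply send(const Bytes& wire_request) = 0;
    virtual ~ClientRequestHandler() = default;
};

// INVOKER: on the server side, dispatches an incoming request to the
// remote object registered under Request::object_id.
struct Invoker {
    virtual Reply invoke(const Request& r) = 0;
    virtual ~Invoker() = default;
};
```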
3 Design and Implementation of the Proposed Framework

In our previous work [6], a framework for Web services was proposed. There are three main blocks in this framework: interoperability control, security control and reliability control. The purpose of the interoperability control is to create a SOAP message based on the service description and business rules. The SOAP message is transferred to the security control module if a secure exchange of messages is required; otherwise, the message is transferred to the reliability control module. In this section, we explain the design and implementation of the framework in terms of remoting patterns. Our framework focuses on the aspects of security and reliability; therefore, the design and implementation are built on top of a general framework. We choose the popular Apache Axis framework and extend it with the additional properties proposed in [6].

3.1 The Framework on the Client Side

In the Axis framework, the classes Service and Call act as the CLIENT PROXY pattern of the remoting pattern language (remoting patterns are set in CAPITALIZED font), which provides a local call within the client process to a remote Web services object. These classes offer the remote object interface and hide networking details. In our framework, we add more classes based on
Figure 2. Interaction diagram on client side
patterns to Axis in order to control the quality of service for Web services transactions. The class ClientQoSHandler is added to Axis to handle the quality of service for the proposed framework; it acts as the CLIENT REQUEST HANDLER pattern. This class instantiates the class Security, which acts as the INVOCATION INTERCEPTOR pattern and handles the security module, and the class Reliability, which acts as the PROTOCOL PLUG-IN pattern and handles the reliability module and performance on the client side of the framework. After building the MessageContext for an invocation, the framework dispatches the MessageContext to the ClientQoSHandler. The handler invokes the security module to encrypt and sign the SOAP message for the invocation; the algorithm for encrypting and signing is loaded from a configuration file which is exchanged by the partners. The message returned from the security module is dispatched to the Reliability class. This class implements the Runnable interface and associates a handler object to process the result back to the client thread. Figure 2 shows the interaction of the classes in the framework.

3.2 The Framework on the Server Side

On the server side, our framework is also implemented on top of the popular Axis framework. In this implementation, the MessageContext received from the AxisServer is dispatched to the ServerQoSHandler, which is responsible for handling the quality-of-service aspects within the framework. This class implements the SERVER REQUEST HANDLER pattern of the remoting pattern language. The
functionalities of the additional modules on the server side are similar to those on the client side. The security module handles the MessageContext dispatched from the ServerQoSHandler and de-encapsulates it (validates the signature and decrypts the SOAP message according to the designed security model). The reliability module interacts with the Handlers on the server side; it implements the Runnable interface and is responsible for the performance considerations shown in the following section. In addition, the framework on the server side implements a QoS monitoring module to monitor Web services execution and performance. Figure 3 shows the interactions of the objects within the proposed framework.
Figure 3. Interaction diagram on server side
3.3 QoS Monitoring

The QoS monitoring module implements the QOS OBSERVER pattern. The module is implemented as a TCP socket listening for events from the framework. It is a daemon program running on the same host as the framework (web server) or on another host to reduce the computational load. On startup, this program registers with the framework by writing parameters to a configuration file. The file is loaded by the framework to determine which events, such as invocation start/end, connection establishment, incoming invocations, return of an invocation, and marshaling start/end, should be reported to the QoS monitoring module.
Figure 4. QoS Monitoring model
Figure 4 shows the implementation of the QoS monitoring model in the framework.

3.4 Performance Considerations

To reduce the overhead in the framework, multi-threading and a thread pooling mechanism are used in the implementation on both the client and the server side. On the client side, the multi-threading technique provides an asynchronous invocation, so the framework can improve performance because the client can resume its work after sending an invocation. On the server side, on receiving a request, the request message is dispatched to the QoS handler before the appropriate remote object is invoked; a new thread is instantiated to process the QoS properties as designed. The multi-threading mechanism can be used to process concurrent invocations; however, it also
Figure 5. Thread pooling used in the framework (classes ThreadPool, ThreadPoolWorker, Thread and ServerQoSHandler, and the Runnable interface)
incurs more overhead due to the instantiation of threads. The thread pooling technique [8] can be used to reduce this overhead. In this technique, threads are shared in a pool: when needed, a handler obtains a worker thread from the pool and releases it back into the pool when it has finished. The pool eagerly acquires a pre-defined number of workers; if the demand exceeds the available resources, the pool instantiates more resources. Figure 5 shows the thread pooling model used in the framework.
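The thread pooling idea can be sketched as follows. The code is a generic C++ illustration of the technique, not the Java implementation used in the framework: a fixed number of workers is acquired eagerly and request-handling jobs are queued for them.

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Minimal thread pool: workers are created up front and reused for queued jobs,
// so no thread is instantiated per incoming request.
class ThreadPool {
public:
    explicit ThreadPool(std::size_t n_workers) {
        for (std::size_t i = 0; i < n_workers; ++i)
            workers_.emplace_back([this] { work(); });
    }
    ~ThreadPool() {
        { std::lock_guard<std::mutex> lk(m_); done_ = true; }
        cv_.notify_all();
        for (auto& w : workers_) w.join();
    }
    void submit(std::function<void()> job) {      // e.g. the QoS handling of one request
        { std::lock_guard<std::mutex> lk(m_); jobs_.push(std::move(job)); }
        cv_.notify_one();
    }
private:
    void work() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lk(m_);
                cv_.wait(lk, [this] { return done_ || !jobs_.empty(); });
                if (done_ && jobs_.empty()) return;
                job = std::move(jobs_.front());
                jobs_.pop();
            }
            job();                                 // run outside the lock
        }
    }
    std::vector<std::thread> workers_;
    std::queue<std::function<void()>> jobs_;
    std::mutex m_;
    std::condition_variable cv_;
    bool done_ = false;
};
```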
4 Case Studies: Building Distributed Applications Based on the Proposed Framework

4.1 The Use of the Framework

Since our framework is built on top of the Axis framework, we can use all functions of Axis. The Axis framework is implemented in Java; therefore, it can run in any application server supporting Java. For the demonstrations, we installed our framework (a customized Axis) on Apache Tomcat Server 5.5 running on the Windows XP Professional platform. The core of the framework is the file axis.jar, packed from the Axis source files and the additional Java classes. In addition, a number of Java libraries (for example, jaxrpc.jar, saaj.jar, commons-logging.jar, commons-discovery.jar, wsdl4j.jar) are needed to run the framework. Once these libraries are declared on a server, such as Tomcat, the framework is ready to use. Developers can use the APIs of the framework on both the client and the server side in a similar way to Axis (see [4] for details).

4.2 A Long-Running Transactions Application Scenario

In e-business systems, transactions are usually transferred across multiple machine boundaries. For example, in an Internet-based credit card payment system, credit card information is transferred via at least three sites. When a customer submits his credit card information for payment via a website, this data is transferred to the web server, and the web application then connects to a payment gateway to perform the credit card charge. The payment gateway does not process this information itself but dispatches the data to the appropriate system of the card issuer. On receiving the data, the card issuer's system may process the data or dispatch it to its partner, depending on the card policy. The result of the transaction is sent back along the reverse path to the customer. This is clearly a long-running transaction in practice. In such long-running transactions, response time and reliability are important aspects. Moreover, the transferred data passes through different organizations' systems; therefore, security should be considered at the message level. Given these requirements, our framework can be applied to such transactions in practice. In this scenario, the systems of the e-commerce site, the payment
Figure 6. A long-running Web services transaction scenario: the user submits a payment to the Web application, which uses the service of the payment gateway, which in turn uses the card charging system
gateway and the card issuer can install our framework and use it for such transactions. Since the security algorithms are set in a configuration file, organizations can negotiate a security policy and set the configuration file without customizing the framework. The framework also provides reliability and other QoS properties, such as availability, accessibility and performance, for Web services transactions. The e-commerce site uses the client side of the framework and constructs a service invocation to the payment gateway system. The payment gateway system uses the server side of the framework to provide the payment service for e-commerce site systems; it then uses the client side of the framework to forward the request to the system of the card issuer. The framework is used on the server side of the card issuer system to provide credit card charging for payment gateway systems. Figure 6 shows the transactions of this scenario. In this scenario, security is guaranteed at the message level and can be configured in a file without customizing the framework. The security properties of integrity, confidentiality and non-repudiation are guaranteed in a transitive manner through point-to-point transactions between each two contiguous systems. Reliability of transactions is also guaranteed in a point-to-point manner in a long-running transaction. The framework provides asynchronous invocation on the client side; therefore the performance penalty of such a long-running transaction can be reduced, since clients can resume their work significantly faster once the transaction is dispatched to the next system. This aspect is especially important in the case of reactive server applications, such as the payment system in this scenario. The multi-threading and thread pooling mechanisms on the server side of the framework improve the performance of Web services transaction processing.
5 Conclusion

This work presents the design and implementation of a proposed framework supporting QoS properties for Web services transactions. The framework is designed with the remoting pattern language for distributed object
frameworks. In this way, the design of the framework allows flexible extension with additional properties and Web services specifications. The implementation of the framework is based on top of the popular Axis framework; thus, the framework automatically inherits Axis's heterogeneity regarding communication protocols and Web services back-ends. Security and reliability are guaranteed for transactions in SOA-based applications built on this framework without extra effort from application developers. In addition, the framework on the client side is designed and implemented using a multi-threading mechanism so that a client can resume its work significantly faster; thus, the performance penalty of Web services can be reduced, especially in long-running Web services applications. On the server side, the QoS properties of availability and performance are also improved by using the thread pooling technique. The case studies and scenarios show that the framework can be used easily and flexibly in practice.
References

1. Hao He: What Is Service-Oriented Architecture, 2003. http://webservices.xml.com/pub/a/ws/2003/09/30/soa.html
2. M. P. Papazoglou and D. Georgakopoulos: Service-oriented computing. Communications of the ACM, Vol. 46, No. 10, 2003, pp. 25-28.
3. D. Booth et al.: Web Services Architecture. W3C Working Group Note, 11 February 2004. http://www.w3.org/TR/2004/NOTE-ws-arch-20040211/
4. Apache Software Foundation: Apache Axis, 2006. http://ws.apache.org/axis/
5. Apache Software Foundation: Web Services Invocation Framework, 2006. http://ws.apache.org/wsif/
6. Phung Huu Phu, Myeongjae Yi: A Service Management Framework for SOA-based Interoperability Transactions. In Proceedings of the 9th Korea-Russia Intl. Symposium on Science and Technology (KORUS 2005, IEEE Press), Novosibirsk, Russia, 2005, pp. 680-684.
7. M. Völter, M. Kircher, U. Zdun: Remoting Patterns. John Wiley & Sons, 2005.
8. M. Kircher and P. Jain: Pooling Pattern. Proc. EuroPLoP 2002, July 2002, Germany.
9. Y. Huang, S. Kumaran, J.-Y. Chung: A service management framework for service-oriented enterprises. Proc. IEEE Intl. Conf. on E-commerce Technology, July 6-9, 2004, California, pp. 181-186.
10. Y. Huang and J.-Y. Chung: A Web services-based framework for business integration solutions. Electronic Commerce Research and Applications 2 (2003), pp. 15-26.
11. U. Zdun, M. Völter, M. Kircher: Design and Implementation of an Asynchronous Invocation Framework for Web Services. In Proceedings of ICWS-Europe (2003), pp. 64-78.
12. G. Wang et al.: Integrated quality of service (QoS) management in service-oriented enterprise architectures. Proc. 8th IEEE Intl. Enterprise Distributed Object Computing Conf. (EDOC 2004), September 20-24, 2004, California, pp. 21-32.
Simulation of Tsunami and Flash Floods

S. G. Roberts^1, O. M. Nielsen^2, and J. Jakeman^1

1 Mathematical Sciences Institute, Australian National University, Canberra, ACT 0200, Australia
  {stephen.roberts, john.jakeman}@maths.anu.edu.au
2 Risk Assessment Methods Project, Geospatial and Earth Monitoring Division, Geoscience Australia, Symonston, ACT 2609, Australia
  [email protected]
Abstract Impacts on the built environment from hazards such as tsunami or flash floods are critical in understanding the economic and social effects on our communities. In order to simulate the behaviour of water flow from such hazards within the built environment, Geoscience Australia and the Australian National University are developing a software modelling tool for hydrodynamic simulations. The tool is based on a finite-volume method for solving the shallow water wave equations. The study area is represented by a large number of triangular cells, and water depth and horizontal momentum are tracked over time by solving the governing equations within each cell, using a central scheme to compute fluxes. An important capability of the software is that it can model the process of wetting and drying as water enters and leaves an area. This means that it is suitable for simulating water flow onto a beach or dry land and around structures such as buildings. It is also capable of resolving hydraulic jumps well, due to the ability of the finite-volume method to handle discontinuities. This paper describes the mathematical and numerical models used, the architecture of the tool and the results of a series of validation studies, in particular a comparison with experiment of a tsunami run-up onto a complex three-dimensional beach.
1 Introduction Floods are the single greatest cause of death due to natural hazards in Australia, causing almost 40% of the fatalities recorded between 1788 and 2003 [1]. Analysis of buildings damaged between 1900 and 2003 suggests that 93.6% of damage is the result of meteorological hazards, of which almost 25% is directly attributable to flooding [1]. Flooding of coastal communities may result from surges of near-shore waters caused by severe storms. The extent of inundation is critically linked to tidal conditions, bathymetry and topography; as recently exemplified in the United States by Hurricane Katrina. While the scale of the impact from such events is not common, the preferential development of Australian coastal
corridors means that storm-surge inundation of even a few hundred metres beyond the shoreline has increased potential to cause significant disruption and loss. Coastal communities also face the small but real risk of tsunami. Fortunately, catastrophic tsunami of the scale of the 26 December 2004 event are exceedingly rare. However, smaller-scale tsunami are more common and regularly threaten coastal communities around the world. Earthquakes which occur in the Java Trench near Indonesia (e.g. [7]) and along the Puysegur Ridge to the south of New Zealand (e.g. [3]) have potential to generate tsunami that may threaten Australia’s northwestern and southeastern coastlines. Hydrodynamic modelling allows flooding, storm-surge and tsunami hazards to be better understood, their impacts to be anticipated and, with appropriate planning, their effects to be mitigated. Geoscience Australia in collaboration with the Mathematical Sciences Institute, Australian National University, is developing a software application called ANUGA to model the hydrodynamics of floods, storm surges and tsunami. In this paper we will describe the computational model used in ANUGA, provide validation of the method using a standard benchmark, and provide the preliminary results of a tsunami simulation of the coastal region of north eastern Australia, near the city of Cairns.
2 Model ANUGA uses a finite-volume method for solving the shallow water wave equations [8]. The study area is represented by a mesh of triangular cells as in Figure 1 in which water depth h, and horizontal momentum (uh, vh), are determined. The size of the triangles may be varied within the mesh to allow greater resolution in regions of particular interest.
Figure 1. Triangular elements in the finite volume method
The shallow water wave equations are a system of differential conservation equations of the form

    ∂U/∂t + ∂E/∂x + ∂G/∂y = S,

where U = [h, uh, vh]^T is the vector of conserved quantities: water depth h, x momentum uh and y momentum vh. Other quantities entering the system are the bed elevation z and the stage (absolute water level) w, where the relation w = z + h holds at all times. The fluxes in the x and y directions, E and G, are given by

    E = [uh, u^2 h + g h^2/2, uvh]^T   and   G = [vh, uvh, v^2 h + g h^2/2]^T,

and the source term (which includes gravity and friction) is given by

    S = [0, gh(S_{0x} − S_{fx}), gh(S_{0y} − S_{fy})]^T,

where S_0 is the bed slope (minus the gradient of the bed elevation) and S_f is the bed friction. The friction term is modelled using Manning's resistance law,

    S_{fx} = u η^2 √(u^2 + v^2) / h^{4/3}   and   S_{fy} = v η^2 √(u^2 + v^2) / h^{4/3},

in which η is the Manning resistance coefficient.

The equations constituting the finite-volume method are obtained by integrating the differential conservation equations over each cell of the mesh. Applying the divergence theorem gives, for each cell, an equation describing the rate of change of the averages of the conserved quantities within the cell in terms of the fluxes across the edges of the cell and the effect of the source terms. In particular, the rate equation associated with each cell has the form

    A_i dU_i/dt + Σ_j (F_{ij} n_{ij1} + G_{ij} n_{ij2}) l_{ij} = A_i S_i,

where the subscript i refers to the ith cell, A_i is the associated cell area, U_i the vector of averaged conserved quantities, and S_i the source term associated with the cell. The subscript ij refers to the jth neighbour of the ith cell (j = 0, 1, 2), i.e. it corresponds to the three edges of the ith cell. We use F_{ij} n_{ij1} + G_{ij} n_{ij2} to denote the approximation of the outward normal flux of material across the ijth edge, and l_{ij} to denote the length of the ijth edge.
492
S.G. Roberts et al.
From the average values of the conserved quantities in each cell, we use a second order reconstruction to produce a representation of the conserved quantities as a piecewise linear (vector) function of x and y. This function is allowed to be discontinuous across the edges of the cells, but its slope is limited to avoid artificially introduced oscillations. Across each edge the reconstructed function is generally discontinuous. The Godunov method (see a good description in [6]) usually involves approximating the flux across an edge by exactly solving the corresponding one-dimensional Riemann problem normal to the edge. We instead use the central-upwind scheme of [2] to calculate an approximation of the flux across each edge. In the computations presented in this paper we use an explicit Euler time stepping method with variable time stepping adapted to the observed CFL condition. The model output consists of values of w, uh and vh at every mesh vertex at every time step.
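A minimal sketch of how the rate equation above can be advanced with an explicit Euler step is given below. It is written in C++ purely for illustration (ANUGA itself is implemented in Python and C), it works directly with cell averages (omitting the second order reconstruction, slope limiting and boundary treatment), and the normal_flux and source callbacks are placeholders supplied by the caller.

```cpp
#include <array>
#include <cstddef>
#include <functional>
#include <vector>

// Conserved quantities per cell: U = (h, uh, vh).
using U3 = std::array<double, 3>;

struct Edge { std::size_t neighbour; double nx, ny, length; };  // outward unit normal, edge length
struct Cell { double area; std::array<Edge, 3> edges; };

// One explicit Euler step of the semi-discrete finite-volume equations
//   A_i dU_i/dt + sum_j (flux . n_ij) l_ij = A_i S_i .
void euler_step(const std::vector<Cell>& cells, std::vector<U3>& U, double dt,
                const std::function<U3(const U3&, const U3&, double, double)>& normal_flux,
                const std::function<U3(const Cell&, const U3&)>& source)
{
    std::vector<U3> Unew = U;
    for (std::size_t i = 0; i < cells.size(); ++i) {
        U3 rhs = source(cells[i], U[i]);
        for (const Edge& e : cells[i].edges) {
            // Numerical normal flux across this edge (e.g. central-upwind),
            // computed from the states on either side of the edge.
            U3 flux = normal_flux(U[i], U[e.neighbour], e.nx, e.ny);
            for (int q = 0; q < 3; ++q)
                rhs[q] -= flux[q] * e.length / cells[i].area;   // outward flux leaves the cell
        }
        for (int q = 0; q < 3; ++q)
            Unew[i][q] = U[i][q] + dt * rhs[q];                 // explicit Euler update
    }
    U = std::move(Unew);
}
```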
3 Software The components of ANUGA are written in the object-oriented language Python, with the exception of computationally intensive routines, which are written in the more efficient language of C and deal directly with Python numerical structures. VPython is used to interactively visualise the computational domain and the evolution of the model over time. The output of ANUGA is also being visualised through a netcdf viewer based on OpenSceneGraph which has been written as part of ANUGA. Figures in this paper have been produced using this viewer.
4 Validation The process of validating the ANUGA application is in its early stages, however initial indications are encouraging. As part of the Third International Workshop on Long-wave Runup Models in 2004 (http://www.cee.cornell.edu/longwave), four benchmark problems were specified to allow the comparison of numerical, analytical and physical models with laboratory and field data. One of these problems describes a wave tank simulation of the 1993 Okushiri Island tsunami off Hokkaido, Japan [4]. A significant feature of this tsunami was a maximum run-up of 32 m observed at the head of the Monai Valley. This run-up was not uniform along the coast and is thought to have resulted from a particular topographic effect. Among other features, simulations of the Hokkaido tsunami should capture this run-up phenomenon.
Figure 2. Comparison of wave tank and ANUGA water stages at gauge 5
The wave tank simulation of the Hokkaido tsunami was used as the first scenario for validating ANUGA. The dataset provided bathymetry and topography along with initial water depth and the wave specifications. The dataset also contained water depth time series from three wave gauges situated offshore from the simulated inundation area. Figure 2 compares the observed wave tank and modelled ANUGA water depth (stage height) at one of the gauges. The plots show good agreement between the two time series, with ANUGA closely modelling the initial draw down, the wave shoulder and the subsequent reflections. The discrepancy between modelled and simulated data in the first 10 seconds is due to the initial condition in the physical tank not being uniformly zero. Similarly good comparisons are evident with data from the other two gauges. Additionally, ANUGA replicates exceptionally well the 32 m Monai Valley run-up, and demonstrates its occurrence to be due to the interaction of the tsunami wave with two juxtaposed valleys above the coastline. The run-up is depicted in Figure 3. This successful replication of the tsunami wave tank simulation on a complex 3D beach is a positive first step in validating the ANUGA modelling capability. Subsequent validation will be conducted as additional datasets become available.
Figure 3. Complex reflection patterns and run-up into Monai Valley simulated by ANUGA and visualised using our netcdf OSG viewer
5 Case Study: Modelling Tsunamis with ANUGA The Great Barrier Reef is an icon of the north eastern Australian coast line. It is over 2,000 km long, stretching from Cape York to Gladstone off the Queensland coast, and consists of around 3,000 reefs and islands. The hydrodynamic module of ANUGA was used to estimate the impact of these geological formations on an incoming tsunami. The Tonga-Kermadec subduction zone poses a potential threat to Australia’s north-east coastline. A coseismic displacement in the subduction zone could displace billions of tonnes of water that would initiate a tsunami that threatened to destroy Cairns and the surrounding towns and communities. The following details the processes and outcomes of modelling such a tsunami using ANUGA. To set up a particular scenario using the hydrodynamic module of ANUGA the user must first specify the bathymetry and topography, the initial water stage, boundary conditions such as tidal fluctuations, and any forcing terms such as atmospheric pressure or wind stress. ANUGA contains a mesh generator which allows the user to construct the problem domain and to identify boundaries and special regions. Within these regions the resolution of the mesh can be coarsened or refined and attributes such as stage or elevation redefined to simulate seafloor displacement or submarine mass failure.
In this study a tsunami was modelled in a small region close to Cairns in northern Australia, between 145 degrees 25 minutes East and 147 degrees East, and between 16 degrees 30 minutes South and 17 degrees 30 minutes South. In our scenario, stage and friction were set to zero and the elevation was taken from a file containing x and y coordinates in Eastings and Northings and an associated altitude. These points were obtained from bathymetry and topography data provided by Geoscience Australia's "2005 Australian Bathymetry and Topography Grid" and converted into Eastings and Northings. The elevation of the computational domain just described is displayed in Figure 4. Note that the reef bathymetry was not explicitly set, but rather included as part of the Geoscience Australia bathymetry data set. All boundaries were defined to be reflective, with the exception of the eastern boundary, through which the tsunami would travel. This boundary allows the stage along the border to be set according to an arbitrary function. In the scenario described here, the stage at the boundary was set to 6 metres between 60 seconds and 660 seconds of the simulation and to zero at all other times. This is hoped to approximate a possible tsunami generated
Figure 4. Elevation of the computational domain. The vertical scale has been exaggerated to accentuate the features on the continental shelf
in the Tonga-Kermadec subduction zone. At this point ANUGA relies on dedicated tsunamigenic models (such as MOST, [5]) to provide, at the boundary, the tsunami events resulting from earthquakes. Figure 5 illustrates the evolution of the modelled tsunami as it propagates over a small section of the Great Barrier Reef adjacent to Cairns. A rainbow scale is used in which red represents the largest negative stage, yellow the next largest, through to green, which corresponds to a stage value of zero, and then up to blue, which represents the largest positive stage values. The colour map was set to display from -10 to 10 metres, with stage values outside this range clipped to the maximum and minimum values. Plots of the domain are shown for times t = 400, 800, 1300, 1800, 2300, 2800, 3200 and 3600 s. The plots of the tsunami at t = 400 s and t = 800 s illustrate the tsunami's decrease in speed and increase in height as it enters shallower water. The plot at t = 1300 s displays an interesting feature: a depression in the ocean surface has formed behind the tsunami as it encounters the continental shelf. At t = 1800 s a depression of varying depth and extent is present behind the entire length of the tsunami. At t = 2300 s the tsunami has passed over the reef, and it appears that the reef has reduced the height of the tsunami in various regions, indicated by the light blue colouring; this behaviour was not evident when the previous colour scheme was used. At t = 2800 s inundation of the Cairns coastline has occurred, with some run-up again occurring in advance of the oncoming wavefront. At t = 3200 s severe inundation has occurred along the entire southern coast. The ANUGA model works well for detailed inundation modelling of small sections like those mentioned above, but currently less well for synoptic scenarios. To capture, for example, the source modelling and propagation across large regions, we use deep water tsunami models such as MOST [5] to provide boundary conditions for ANUGA. In fact, Geoscience Australia has embarked on a program using this approach to predict which coastlines are most at risk, taking into account the return period of submarine earthquakes, tsunami wave propagation and the non-linear effects of local bathymetries.
6 Conclusions

ANUGA is a flexible and robust modelling system that simulates hydrodynamics by solving the shallow water wave equations on a triangular mesh. It can model the process of wetting and drying as water enters and leaves an area and is capable of capturing hydraulic shocks due to the ability of the finite-volume method to accommodate discontinuities in the solution. ANUGA can take as input bathymetric and topographic datasets and simulate the behaviour of riverine flooding, storm surge and tsunami. Initial validation using wave tank data supports ANUGA's ability to model complex scenarios. Further validation will be pursued as additional datasets become available.
Figure 5. The evolution in time of a tsunami, initially 6m above the ocean at the Eastern boundary, over the Great Barrier Reef. Note the orientation of the plots is upside down. The southernmost point of the grid is at the top of the picture
ANUGA is already being used to model the behaviour of hydrodynamic natural hazards. This modelling capability is part of Geoscience Australia’s ongoing research effort to model and understand the potential impact from natural hazards in order to reduce their impact on Australian communities.
7 Acknowledgments

The authors are grateful to Belinda Barnes, National Centre for Epidemiology and Population Health, Australian National University, and to Matt Hayne and Augusto Sanabria, Risk Research Group, Geoscience Australia, for helpful reviews of a previous version of this paper. Author Nielsen publishes with the permission of the CEO, Geoscience Australia.
References

1. R. Blong. Natural hazards risk assessment: an Australian perspective. Issues in Risk Science Series, Benfield Hazard Research Centre, London, 2005.
2. A. Kurganov, S. Noelle, and G. Petrova. Semidiscrete central-upwind schemes for hyperbolic conservation laws and Hamilton-Jacobi equations. SIAM Journal of Scientific Computing, 23(3):707–740, 2001.
3. J. F. Lebrun, G. D. Karner, and J. Y. Collot. Fracture zone subduction and reactivation across the Puysegur ridge/trench system, southern New Zealand. Journal of Geophysical Research, 103:7293–7313, 1998.
4. M. Matsuyama and H. Tanaka. An experimental study of the highest run-up height in the 1993 Hokkaido Nansei-Oki earthquake tsunami. In National Tsunami Hazard Mitigation Program Review and International Tsunami Symposium (ITS), pages 879–889. U.S. National Tsunami Hazard Mitigation Program, 2001.
5. V. V. Titov and F. I. Gonzalez. Implementation and testing of the Method of Splitting Tsunami (MOST) model. Technical Memorandum ERL PMEL-112, NOAA, 1997.
6. E. F. Toro. Riemann problems and the WAF method for solving the two-dimensional shallow water equations. Philosophical Transactions of the Royal Society, Series A, 338:43–68, 1992.
7. Y. Tsuji, S. Matsutomi, F. Imamura, and C. E. Synolakis. Field survey of the East Java earthquake and tsunami. Pure and Applied Geophysics, 144(3/4):839–855, 1995.
8. C. Zoppou and S. Roberts. Catastrophic Collapse of Water Supply Reservoirs in Urban Areas. ASCE J. Hydraulic Engineering, 125(7):686–695, 1999.
Differentiating Fixed Point Iterations with ADOL-C: Gradient Calculation for Fluid Dynamics

Sebastian Schlenkrich^1, Andrea Walther^1, Nicolas R. Gauger^{2,3}, and Ralf Heinrich^2

1 Institute for Scientific Computing, Technische Universität Dresden, Germany
2 Institute of Aerodynamics and Flow Technology, German Aerospace Center (DLR), Braunschweig, Germany
3 Humboldt University Berlin, Department of Mathematics, Germany
Abstract The reverse mode of automatic differentiation allows the computation of gradients at a temporal complexity that is only a small multiple of the temporal complexity of evaluating the function itself. However, the memory requirement of the reverse mode in its basic form is proportional to the operation count of the function to be differentiated. For iterative processes consisting of iterations of uniform complexity, this means that the memory requirement of the reverse mode grows linearly with the number of iterations. For fixed point iterations this is not efficient, since any structure of the problem is neglected. For linearly converging iterations, the method of reverse accumulation offers an alternative, iterative computation of the gradient. The gradient iteration converges at the same rate as the fixed point iteration itself, and the memory requirement of this method is independent of the number of iterations, and hence also independent of the desired accuracy. We integrate the concept of reverse accumulation into the AD-tool ADOL-C to compute gradients of fixed point iterations. This approach decreases the memory requirement of the gradient calculation considerably, yielding an increased range of applications. Runtime results based on the CFD code TAUij are presented.
1 Introduction

Automatic Differentiation (AD) is a technique that provides exact derivative information for a smooth vector-valued function F : R^n → R^m, x ↦ F(x), given as a computer program. The key idea of automatic differentiation is the systematic application of the chain rule. For this purpose, the computation of F is decomposed into a typically very long sequence of simple evaluations,
e.g. additions, multiplications, and calls to elementary functions such as sin() or exp(). The derivatives of these simple operations with respect to their arguments can be easily calculated. Applying the chain rule to the overall decomposition yields derivative information of the whole evaluation sequence with respect to the input variables x ∈ R^n. Depending on the starting point of this process—either at the beginning or at the end of the respective chain of computational steps—one distinguishes between the forward mode and the reverse mode of AD. The forward mode of AD computes the product of the Jacobian at a given point x and a vector v, i.e., directional derivatives of the form F'(x)v. The reverse mode of AD evaluates the product of a given vector w and the Jacobian at a given point x, i.e. gradient information of the form w^T F'(x). Over the last few decades, extensive research activities have led to a thorough understanding and analysis of the basic modes of AD. The book by Griewank [3] gives a comprehensive introduction. The theoretical complexity results obtained there are typically based on the operation count O_F of the vector-valued function F. Using the forward mode of AD, the product F'(x)v can be calculated with an operation count of no more than three times O_F. Similarly, the product w^T F'(x), e.g. the gradient of F(x) if m = 1, can be obtained using the reverse mode in its basic form for an operation count of no more than five times O_F. It is important to realize that this bound for the reverse mode is completely independent of the number of input variables n. The corresponding memory requirement of the reverse mode in its basic form is proportional to O_F. The AD-tool ADOL-C is based on operator overloading. It can be applied for the automatic differentiation of function evaluations written in C or C++. For this purpose, the new class adouble is introduced by ADOL-C. To differentiate a given function evaluation, one has to declare the independent variables and all quantities that directly or indirectly depend on them as adoubles. These variables are called active. Other variables that do not depend on the independent variables but enter, for example, as parameters, may remain one of the passive types double, float, or int. They are called inactive variables. During the function evaluation with adouble variables, an internal function representation is generated. For this purpose, ADOL-C stores, for each operation, the corresponding operator and the variables involved in a data structure called a tape. Subsequently, the required derivatives are calculated on the basis of the internal function representation using the elemental differentiation rules. One has to note that AD computes exact derivative information for the representation of F given by the computer program. Hence, in numerous cases AD provides exactly what the user of AD requires, but one has to keep this difference in mind. This is especially important if the computer program to evaluate F(x) contains an iterative process, as for example a Newton iteration or a fixed point iteration. In these cases, the black-box application of AD may not yield the derivative information with the required accuracy. Furthermore,
the resulting memory requirement may be unacceptable for real-world problems such as large CFD codes simulating the flow over an aircraft wing. For these large scale applications, one frequently introduces a pseudo time stepping scheme to compute a quasi-steady state for a given set of design parameters. This simulation forms one main ingredient for optimizing the airfoil for a particular flight scenario. Hence, derivatives with respect to the design parameters are required if a calculus-based optimization method is applied. For this purpose, the pseudo time stepping can be interpreted as a fixed point iteration. Then it is possible to control the accuracy of the derivatives and to reduce the memory requirement using either a two-phase approach or the piggyback method, as described for example in [3, Section 11.4]. The method of reverse accumulation, proposed and analyzed in detail by Christianson [1], can be seen as one example of a two-phase approach. In this paper, we present the integration of reverse accumulation into the AD-tool ADOL-C [5]. This advanced automatic differentiation is applied to compute gradient information for a large scale application, namely the CFD code TAUij provided by the DLR Braunschweig [6]. We present and analyze corresponding runtime measurements for the derivative computation, performed on a standard PC and on one node of an Opteron cluster.
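The basic ADOL-C usage pattern described above can be sketched as follows: the function is evaluated once with active variables to record a tape, and a driver such as gradient() then operates on this internal representation. The scalar test function used here is only an illustration.

```cpp
#include <adolc/adolc.h>

int main() {
    const int n = 2;
    double xp[n] = {1.0, 2.0}, yp;
    adouble x[n], y;

    trace_on(1);                    // start recording the tape with tag 1
    for (int i = 0; i < n; ++i)
        x[i] <<= xp[i];             // declare independent (active) variables
    y = x[0] * x[0] * sin(x[1]);    // the active computation is recorded
    y >>= yp;                       // declare the dependent variable
    trace_off();                    // stop taping

    double g[n];
    gradient(1, n, xp, g);          // reverse-mode gradient evaluated from the tape
    return 0;
}
```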
2 Iterative Evaluation of Derivatives of Fixed Point Iterations

Quite often, the state of the considered system, denoted by x ∈ R^n, depends on some design parameters, denoted by u ∈ R^q. One example of this setting is the flow over an aircraft wing mentioned in the previous section: here the shape of the wing, defined by the design vector u, determines the flow field x. The desired quasi-steady state x_* fulfills the fixed point equation

    x_* = F(x_*, u)                                                                     (1)

for a given continuously differentiable function F : R^n × R^q → R^n. Fixed point properties of this kind are also exploited by many other applications. Assume that one can apply the iteration

    x_{k+1} = F(x_k, u)                                                                 (2)

to obtain a linearly converging sequence {x_k} for a given control u ∈ R^q. Then the limit point x_* ∈ R^n fulfills the fixed point equation (1). Moreover, suppose that ||dF/dx(x_*, u)|| < 1 holds for any pair (x_*, u) satisfying equation (1). Hence, there exists a differentiable function φ : R^q → R^n such that φ(u) = F(φ(u), u), i.e. the state φ(u) is a fixed point of F for the given control u. To optimize the system described by the state vector x = φ(u) with respect to the design vector u, derivatives of φ with respect to u are of particular interest.
2.1 Forward and reverse derivative iterations

Applying the chain rule of differentiation to equation (1) yields
\[
  \frac{d\phi}{du}(u) = \frac{dx_*}{du}
    = \frac{dF}{dx}(x_*, u)\,\frac{dx_*}{du} + \frac{dF}{du}(x_*, u). \tag{3}
\]
This formula is again a fixed point equation for the evaluation of dφ/du(u) = dx∗/du. Analytically, the derivative dφ/du(u) can be expressed as
\[
  \frac{d\phi}{du}(u)
    = \Bigl(I - \frac{dF}{dx}(\phi(u), u)\Bigr)^{-1} \frac{dF}{du}(\phi(u), u). \tag{4}
\]
However, a numerical computation according to this formula is not appropriate. Using automatic differentiation, directional derivatives φ̇ = (dφ/du) u̇ ∈ IR^n are evaluated for a given u̇ ∈ IR^q with the forward mode. The reverse mode yields adjoints ū^T = x̄^T dφ/du ∈ IR^q for a given x̄ ∈ IR^n. Denoting ẋ_k = (dx_k/du) u̇ for k = 0, 1, . . . and applying the forward mode differentiation, we obtain from equation (3)
\[
  \dot{x}_{k+1} = \frac{dF}{dx}(x_*, u)\,\dot{x}_k + \frac{dF}{du}(x_*, u)\,\dot{u}
               = F'(x_*, u)\begin{pmatrix}\dot{x}_k\\ \dot{u}\end{pmatrix}. \tag{5}
\]
If the derivative dF/dx(x∗, u) describes a contraction, this iteration converges to the unique fixed point ẋ∗ = (dφ(u)/du) u̇. Equation (4) yields for a given x̄ ∈ IR^n the identity
\[
  \bar{x}^T \frac{dx_*}{du}
    = \bar{x}^T \Bigl(I - \frac{dF}{dx}(x_*, u)\Bigr)^{-1} \frac{dF}{du}(x_*, u).
\]
Setting ζ^T := x̄^T (I − dF/dx(x∗, u))^{-1}, we get the fixed point equation
\[
  \zeta_*^T = \zeta_*^T\,\frac{dF}{dx}(x_*, u) + \bar{x}^T
\]
for the adjoint information. Hence, one obtains for k = 0, 1, . . . the iteration
\[
  \bigl(\zeta_{k+1}^T,\ \bar{u}_{k+1}^T\bigr)
    = \Bigl(\zeta_k^T\,\frac{dF}{dx}(x_*, u) + \bar{x}^T,\ \ \zeta_k^T\,\frac{dF}{du}(x_*, u)\Bigr)
    = \zeta_k^T F'(x_*, u) + \bigl(\bar{x}^T,\ 0^T\bigr), \tag{6}
\]
which converges to the unique fixed points ζ∗^T = x̄^T (I − dF/dx(x∗, u))^{-1} and ū∗^T = ζ∗^T dF/du(x∗, u) = x̄^T dx∗/du. As shown in [1], the uniform rate of convergence of the derivative fixed point iteration (6) equals the asymptotic rate of convergence of the original fixed point iteration (2).

Using not a black-box approach but a slightly more advanced technique, AD provides for the fixed point iteration (2), given as a computer program, the efficient evaluation of the terms F'(x∗, u)(ẋ, u̇)^T and ζ^T F'(x∗, u) in equations
(5) and (6), respectively. For this purpose, we apply a two-phase approach, i.e., a splitting of the original iteration (2) and the derivative iterations (5) and (6), respectively. Then the iterations computing the derivatives use information of F'(x∗, u) at the previously computed fixed point x∗. This approach to compute adjoints of the form ū^T = x̄^T dφ/du was proposed in [1] and applied, for example, in [2]. As an alternative, the derivative information can be computed in combination with the original iteration to evaluate x∗. This so-called piggy-back approach was proposed in [4].

2.2 Embedded fixed point iteration

Usually, fixed point iterations are part of a larger computational context. This fact can be described in a mathematical way as a composition of mappings
\[
  p \xrightarrow{\ \tau\ } (x_0, u) \xrightarrow{\ \phi\ } (x_*, u) \xrightarrow{\ \chi\ } y,
  \qquad y = J(p), \tag{7}
\]
where p is a set of given parameters. We refer to τ as the initialization, calculating the initial value x_0 of the fixed point iteration and the design vector u from the parameter values p. The function φ denotes the fixed point iteration, and χ the evaluation of a target function to compute the target value y. In the remainder of this subsection, we describe the application of the reverse mode of AD, i.e., the reverse accumulation, to the computational procedure (7) using the AD-tool ADOL-C. The forward mode can be derived in analogy.

The differentiation of y = J(p) = (χ ∘ φ ∘ τ)(p) using the reverse mode of AD yields for a weight vector ȳ the value p̄ = ȳ^T J'(p). Its computation is given by the composition of the following functions:
\[
  (\bar{x}_*, \bar{u}) = \bar{y}^T \chi'(x_*, u), \qquad
  \bar{u} \mathrel{+}= \bar{\phi}(u, \bar{x}_*), \qquad
  \bar{p} = \bar{\tau}(p, \bar{u}).
\]
Applying ADOL-C in a "black-box" fashion to the computation of J(p) = (χ ∘ φ ∘ τ)(p) is not efficient since it neglects any structure of the fixed point iteration in φ. Alternatively, we employ the reverse accumulation of gradient information. According to the evaluation of φ̄(u, x̄∗) = ū in (6), we then only need the repeated evaluation of the product ζ_k^T F'(x∗, u) for different vectors ζ_k. Therefore the taping process, i.e., the generation of an internal representation, is stopped during the evaluation of the fixed point iteration φ(x_0, u). The taping is started again only for the last iteration, the result of which is assumed to be sufficiently close to x∗. The corresponding internal representation of one iteration of the fixed point process forms a subtape. An illustration of this advanced taping approach in comparison to the basic taping, i.e., the black-box application of ADOL-C, is given in Figure 1.

Figure 1. Different taping approaches (tape generation for the composition τ, φ, χ: basic taping process vs. advanced taping process)

During the evaluation of the adjoint information p̄ = ȳ^T J'(p), the tape is interpreted in the reverse
order, starting at the end. Exploiting the advanced taping mechanism, the original derivative computation is interrupted when the fixed point iteration is reached. The reverse accumulation for the fixed point iteration according to (6) is performed by repeatedly computing the products ζ_k^T F'(x∗, u) for different weight vectors ζ_k. After the computation of ū, the original derivative computation is started again to evaluate the overall derivative information p̄. This nested tape evaluation is illustrated in Figure 2.

Figure 2. Derivative computation exploiting the fixed point structure (the reverse interpretation of the tapes for χ̄ and τ̄ is interrupted by switching the differentiation context to the reverse accumulation for φ̄)
2.3 Implementation

The presented advanced automatic differentiation of fixed point iterations is provided by the recent ADOL-C function fp_iteration(...). It has the following interface:

int fp_iteration( int      sub_tape_num,
                  int      (*double_F)(...),   int     (*adouble_F)(...),
                  double   (*norm)(...),       double  (*norm_deriv)(...),
                  double   epsilon,            double  epsilon_deriv,
                  int      N_max,              int     N_max_deriv,
                  adouble  *x_0,
                  adouble  *x_fix,
                  int      dim_x,
                  adouble  *u,
                  int      dim_u )
The first argument sub_tape_num is an ADOL-C identifier for the subtape. The variables double_F and adouble_F are pointers to functions that compute for x and u a single iteration step x̃ = F(x, u). The passive version double_F has arguments of type double and the active version adouble_F has arguments of type adouble. The parameters norm and norm_deriv are pointers to functions computing the norm of a vector. The latter functions, together with epsilon, epsilon_deriv, N_max, and N_max_deriv, control the iterations of the form

Function evaluation:
    do
        k = k + 1
        x = y
        y = F(x, u)
    while (‖y − x‖ ≥ ε and k ≤ N_max)

Derivative computation:
    do
        k = k + 1
        ζ = ξ
        (ξ^T, ū^T) = ζ^T F'(x∗, u) + (x̄^T, 0^T)
    while (‖ξ − ζ‖ ≥ ε_deriv and k ≤ N_max,deriv)
The initial iterate and the control vector are stored in x_0 and u, respectively. The fixed point is stored in the vector x_fix. Finally, dim_x and dim_u represent the dimensions n and q of the corresponding vectors.
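To make the two loops above concrete, the following self-contained C++ sketch carries them out for a toy linear fixed point map F(x, u) = A x + B u. The map, its dimensions, the weight vector and the tolerances are invented for illustration only; in the ADOL-C setting the products with F'(x∗, u) in the derivative loop would be obtained from the subtape rather than being hand-coded.

#include <cmath>
#include <cstdio>
#include <vector>

// Toy model: F(x,u) = A*x + B*u with a contracting A; A, B, dimensions and
// tolerances are invented for illustration.
typedef std::vector<double> Vec;
const int N = 2, Q = 1;
const double A[N][N] = {{0.5, 0.1}, {0.0, 0.4}};
const double B[N][Q] = {{1.0}, {2.0}};

Vec F(const Vec& x, const Vec& u) {              // one fixed point step
    Vec y(N, 0.0);
    for (int i = 0; i < N; ++i) {
        for (int j = 0; j < N; ++j) y[i] += A[i][j] * x[j];
        for (int j = 0; j < Q; ++j) y[i] += B[i][j] * u[j];
    }
    return y;
}

double dist(const Vec& a, const Vec& b) {        // Euclidean norm of a - b
    double s = 0.0;
    for (int i = 0; i < (int)a.size(); ++i) s += (a[i] - b[i]) * (a[i] - b[i]);
    return std::sqrt(s);
}

int main() {
    const double eps = 1e-12, eps_deriv = 1e-12;
    const int N_max = 1000, N_max_deriv = 1000;
    Vec u(Q, 1.0), x(N, 0.0), y = F(x, u);

    // function evaluation: iterate x_{k+1} = F(x_k, u)
    for (int k = 1; dist(y, x) >= eps && k <= N_max; ++k) { x = y; y = F(x, u); }

    // derivative computation (reverse accumulation, iteration (6)):
    //   xi^T = zeta^T * dF/dx + xbar^T,  and finally  ubar^T = zeta^T * dF/du
    Vec xbar(N, 1.0), zeta(N, 0.0), xi(N, 0.0), ubar(Q, 0.0);
    for (int k = 1; k <= N_max_deriv; ++k) {
        zeta = xi;
        for (int i = 0; i < N; ++i) {
            xi[i] = xbar[i];
            for (int j = 0; j < N; ++j) xi[i] += zeta[j] * A[j][i];
        }
        if (dist(xi, zeta) < eps_deriv) break;
    }
    for (int i = 0; i < Q; ++i) {
        ubar[i] = 0.0;
        for (int j = 0; j < N; ++j) ubar[i] += xi[j] * B[j][i];
    }
    std::printf("x* = (%g, %g), ubar = %g\n", y[0], y[1], ubar[0]);
    return 0;
}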
3 Application to TAUij Code

The CFD code TAUij is a quasi 2D version of TAUijk, which again is based on a cell centered developer version of the DLR TAU code [8]. TAUij solves the quasi 2D Euler equations. For the spatial discretization the MAPS+ [7] scheme is applied. To achieve second order accuracy, gradients are used to reconstruct the values of variables at the cell faces. A slip wall and a far field boundary condition are implemented. For time integration a Runge-Kutta scheme is used. Local time stepping, explicit residual smoothing, and a multigrid method accelerate the convergence. The code TAUij is written in C++ and comprises approximately 6000 lines of code distributed over several files. First, a 2D grid is generated and initializations are done based on a parameter set consisting of 20 design variables, which determine the shape of the airfoil and the corresponding grid. The initial airfoil is the transonic RAE2822 airfoil with Mach number 0.73 and 2° angle of attack. The output value of interest is the drag coefficient. In our notation, this start-up computation to generate the grid and to set up the system corresponds to the evaluation of the function τ(p). Subsequently, the fixed point iteration φ(x_0, u) is performed as integration of 2000 pseudo time steps to compute a quasi-steady state x∗ around the airfoil. Finally, the value of the drag coefficient is computed by the objective function χ(x∗, u). All computations were performed on a standard AMD Athlon 3200 PC with 1 GB RAM and on a cluster consisting of 20 Dual AMD Opteron 240 (1.4GHz) using one node with 12 GB RAM and fast memory access. For the
runtime measurements presented below, we used two different 2D grids. The corresponding numbers of grid points for the two discretizations are shown in Table 1. Furthermore, the table shows the number of active variables, the number of operations recorded on the tape, and the resulting memory required for the evaluation of one Runge-Kutta step during the time integration.

  discretization   grid points   active variables   operations     tape size
  I                161 × 33      ≈ 70 × 10^6        ≈ 20 × 10^6    ≈ 344 MB
  II               321 × 65      ≈ 285 × 10^6       ≈ 116 × 10^6   ≈ 2.3 GB

  Table 1. Configuration details
As can be seen, the black-box application of ADOL-C is not feasible even for the rather coarse discretization I due to the enormous memory requirement of 2000 × 344 MB. The taping of only one Runge-Kutta step, resulting in a tape size of 344 MB and 2.3 GB, respectively, is possible even on a standard PC. However, the complexity results stated in Sec. 1 are valid only for a flat memory hierarchy since the memory accesses are completely ignored in the complexity analysis. This fact is reflected by the runtime measurements for one gradient computation that are shown in Table 2.

  discretization I             PC                  Cluster
  2000 steps forward iter.     2 min 39 sec        1 min 44 sec
  Taping one time step         56.8 sec            2.5 sec
  1000 steps reverse iter.     1 h 21 min 1 sec    15 min 8 sec

  discretization II            PC                  Cluster
  2000 steps forward iter.     11 min 36 sec       7 min 40 sec
  Taping one time step         5 min 49 sec        10.4 sec
  1000 steps reverse iter.     ≈ 5 days            1 h 5 min 3 sec

  Table 2. Runtime results for two configurations

Obviously, the computing
times achieved on the PC do not confirm the complexity results stated in Sec. 1. This is mainly due to the high memory access cost needed to read the information contained in the tape. For the coarser discretization I, the tape can be kept in main memory, but nevertheless swapping is necessary. For the finer discretization II, the tape has to be stored on disc, leading to an unacceptable runtime. This situation changes completely when the derivative computation is performed on one node of the cluster, which has very fast access to the main memory. For both discretizations, the tapes can be kept in main memory due to the 12 GB RAM. Hence, on the node of the cluster the memory
hierarchy is comparably flat. This leads to a very efficient computation of the gradient: the ratio of the runtime needed to compute the gradient and the runtime needed for the function evaluation itself is less than 9 for the two discretizations considered. Hence, the gradient computation using AD is faster than a derivative computation based on finite differences, since the gradient has 20 components. Furthermore, it is comparably close to the theoretical bound stated in Sec. 1. Currently, the derivative calculations presented in this paper are being combined with an optimization tool. A forthcoming paper will report the details and results obtained for the overall optimization process.
4 Conclusions

We discuss the efficient differentiation of fixed point iterations using the reverse accumulation approach. The integration of this method into the AD-tool ADOL-C and the corresponding user interface are described. This new facility enables the differentiation of the CFD code TAUij to compute the gradient of the drag coefficient with respect to 20 input parameters. Using a computer platform with sufficiently large RAM, the resulting temporal complexity of the derivative computation comes close to the theoretical bound. Hence, the advanced AD technique recently integrated into ADOL-C extends the range of applications considerably and provides an efficient derivative calculation for large C and C++ codes.
5 Acknowledgments

The authors thank M. Widhalm from DLR Braunschweig for providing an appropriate geometry generator based on the set of parameters used.
References 1. B. Christianson. Reverse accumulation and attractive fixed points. Optimization Methods and Software, 3:311–326, 1994. 2. R. Giering, T. Kaminski, and T. Slawig. Applying TAF to a Navier-Stokes solver that simulates an Euler flow around an airfoil. Future Generation Computer Systems, 21:1345–1355, 2005. 3. A. Griewank. Evaluating Derivatives: Principles and Techniques of Algorithmic Differentiation. Number 19 in Frontiers in Appl. Math. SIAM, Philadelphia, 2000. 4. A. Griewank and C. Faure. Piggyback differentiation and optimization. In L. Biegler, O. Ghattas, M. Heinkenschloss, and B. van Bloemen Waanders, editors, Large-scale PDE-constrained optimization, volume 30 of Lect. Notes Comput. Sci. Eng., pages 148–164, Berlin, 2003. Springer.
5. A. Griewank, D. Juedes, and J. Utke. ADOL-C: A package for the automatic differentiation of algorithms written in C/C++. ACM Trans. Math. Softw., 22: 131–167, 1996. 6. R. Heinrich. Implementation and usage of structured algorithms within an unstructured CFD-code. Notes on Numerical Fluid Mechanics and Multidisciplinary Design, 92, 2006. 7. C.-C. Rossow. A flux splitting scheme for compressible and incompressible flows. Journal of Computational Physics, 164:104–122, 2000. 8. M. Widhalm and C.-C. Rossow. Improvement of upwind schemes with the least square method in the DLR TAU code. Notes on Numerical Fluid Mechanics, 87, 2004.
Design Patterns for High-Performance Matrix Computations

Hoang M. Son

Hanoi University of Technology, 1 Dai Co Viet, Hanoi, Vietnam
[email protected]
Abstract This paper discusses fundamental issues of developing high-performance matrix computation software in an object-oriented language like C++ and presents key design patterns for solving these problems. These object-oriented design patterns are implemented in FMOL++ (Fundamental Mathematical Object Library), a C++ template library which includes basic algebraic structures and operations as well as common model classes needed for control system analysis and design. Through over a decade of evolution, these patterns have proved to provide best-practice solutions to most common problems in the context of high-performance matrix computation. Benchmarks are made for performance comparisons between FMOL++ and alternative approaches.
1 Introduction

For modeling, simulation, analysis and design of complex systems, high-performance and easy-to-use matrix computation software is always needed. Traditionally, one would make use of well-proven software packages written in FORTRAN, such as LAPACK [1]. This software package has proved to be very efficient and reliable, but its lengthy and unreadable coding style makes it a potential source of errors. There are also translations to the C programming language, but their use is very limited since they do not offer real advantages over the original FORTRAN version except wider portability. Today, the availability of commercial products like MATLAB makes matrix computation almost as simple as writing down mathematical expressions on paper [2]. Although these software products are very powerful and useful for many purposes such as modeling, analysis, design, simulation and off-line optimization, there are many situations where native-language matrix libraries are more appropriate, e.g. real-time (and distributed) simulation, real-time control implementation and on-line optimization, see e.g. [3–5]. Besides the question of licensing price, it is often claimed that program code generated by universal tools is neither efficient enough nor easily embeddable into real-time applications.
The combination of object-oriented programming methods and support from a high-level language like C++ allows the development of very flexible and highly reusable software. Because of its maturity and many excellent features, the C++ programming language is the definitive choice for many users [6, 7]. LAPACK++ [8] and its successor, the so-called TNT [9], are two examples of C++ matrix packages that are widely used. However, object-oriented design is hard, and object-oriented design of a highly efficient, elegant, flexible and reusable matrix library is much harder. In fact, while many other matrix libraries are freely available from the Internet, very few of them are well designed in the author's view. The most critical design challenge involves the trade-offs between diverse requirements (such as between simplicity and flexibility, elegance and efficiency, compactness and openness, and so on), which are often in conflict with each other. For example, while the operator-overloading feature in C++ allows us to simplify coding conventions, without proper design it could penalize the performance due to unnecessary data copying and inefficient submatrix access. This paper makes a contribution to the developer and user community of matrix computation software by presenting object-oriented design patterns [10]. These design patterns have been successfully implemented in FMOL++ (Fundamental Mathematical Object Library), a C++ template library written by the author. Through over a decade of utilization, evolution and continuous improvement, these patterns have proved to provide best-practice solutions to most common problems in the context of high-performance matrix computations. The rest of this paper is organized as follows. Section 2 gives an overview of fundamental design issues of a C++ matrix library and discusses some alternatives for solving these problems. The key design pattern in FMOL++, namely Behavior Template, is presented in Section 3. Performance comparisons between FMOL++ and alternative approaches for typical arithmetic operations are given in Section 4. Section 5 concludes the paper.
2 Fundamental Design Issues

The fundamental design requirements for a C++ matrix package are:
1. Ease-of-use and elegance: It is desirable that we could write elegant C++ code similar to MATLAB for common matrix computations, i.e. just as simple as writing down mathematical equations on paper, without having to remember many names and conventions.
2. Portability and reusability: The package design and implementation should be based only on C++ standard features, in order to keep the code portable across platforms and compilers.
3. Reliability and robustness: The implementation of a linear algebra package depends much on the availability of reliable algorithms, but this is not a
design matter. The design should focus on eliminating potential sources of programming errors. Besides, there must also be a mechanism for the user to detect program errors at an early stage.
4. Flexibility and openness: The package should support various data types and precisions, general as well as special matrix structures and algorithms. It should also allow users to define their own matrix structures and to add more functionality with minimal effort while maintaining the overall consistency.
5. Efficiency: The performance in terms of memory use and computation speed should be at least comparable to hand-coded C/C++ programs.

By adopting standard object-oriented design patterns and utilizing special techniques offered by the C++ language, the first three requirements are not hard to meet. These issues are discussed in more detail in [5]. As an illustrative example, consider the programming problem for the simulation of a linear dynamic system described by the discrete state-space model:

    x(k + 1) = A x(k) + B u(k)
    y(k)     = C x(k) + D u(k)

The C-style code should look more or less similar to the following fragment:

void fsimss(double** A, double** B, double** C, double** D,
            double* x, double* u, double* y,
            int n, int p, int q /* no of states, inputs and outputs */)
{
    /* temporary variables for storing intermediate results */
    double *v1, *v2, *v3, *v4;
    /* allocate memory */
    v1 = (double*) malloc(q*sizeof(double));
    v2 = (double*) malloc(q*sizeof(double));
    v3 = (double*) malloc(n*sizeof(double));
    v4 = (double*) malloc(n*sizeof(double));
    /* calculate outputs */
    multiMatrixWithVector(C,x,v1,q,n);
    multiMatrixWithVector(D,u,v2,q,p);
    addVector(v1,v2,y,q);
    /* update states */
    multiMatrixWithVector(A,x,v3,n,n);
    multiMatrixWithVector(B,u,v4,n,p);
    addVector(v3,v4,x,n);
    /* release memory */
    free(v1); free(v2); free(v3); free(v4);
}
Using object-oriented design in combination with the operator overloading technique, the resulting matrix library should allow us to write simpler and more elegant code as follows:
Vector fsim(Matrix A, Matrix B, Matrix C, Matrix D, Vector& x, Vector u)
{
    Vector y = C*x + D*u;
    x = A*x + B*u;
    return y;
}

We could go a step further by encapsulating the state-space model in an object class (SSModel). The simulation code now looks more compact:

Vector fsim(SSModel& sys, Vector u)
{
    Vector y = sys.output(u);
    sys.update(u);
    return y;
}

Of course, the readability of the program code is not just a matter of coding style; it is also very important for verification and error checking. The more readable the code is, the less error-prone it is.

Consider further the following fundamental problem. We want to define m unary operations on n matrix structures of p element types (e.g. float, double, long double, complex, ...). In a conventional FORTRAN or C library there may be as many as n × p data structures and m × n × p routines, which correspond to the same number of interfaces and implementations. Even worse, if binary operations were to be defined, the number of implementations would explode even further. It is not surprising that the LAPACK software consists of over 1,000 routines and 600,000 lines of FORTRAN 77 source code [1]. The tremendous number of function names and calling conventions also has a negative impact on reusability.

With object-oriented design and advanced language support, there are several ways to solve this problem. First, using simple generic programming techniques (i.e. class and function templates), the number of data structures and routines can be reduced dramatically since many element types can share the same implementation. For the situation mentioned above, n class templates for matrix types and m × n function templates for unary operations would be sufficient. Second, function and operator overloading helps to reduce the number of interfaces while still allowing specialized implementations. Third, using the inheritance mechanism, all special matrix structures are considered as being derived from a common base class (or class template) representing the general matrix. Consequently, the number of interfaces and implementations can be reduced further.

Despite all the advantages of object-oriented programming and advanced language support stated above, many questions remain open with respect to the last two requirements. High flexibility and high efficiency are much tougher to achieve, as they are not only in conflict with the first three requirements, but also often in conflict with each other. Some of the most difficult questions to be answered are:
• How is the code efficiency deteriorated by the inheritance and operator overloading mechanisms, and how can this be avoided? Actually, without special design consideration, the code segments shown above could cost much time and storage efficiency due to excessive copying of function parameters and return values.
• How are the relations between the many matrix structures (e.g. general matrix, symmetric matrix, diagonal matrix and so on) to be defined, by using inheritance or other class relationships?
• How to separate interface and implementations in order to offer universal and simple operations (such as +, −, *, /, ...) while still utilizing special matrix structures for efficient implementations?
• How to support efficient, in-place submatrix operations, and how is the type of a submatrix to be defined?
The main distinction between existing C++ matrix libraries can actually be made based on how elegant the design solutions for these problems are. The three most important design patterns in FMOL++ are:
• Reference Counting or Smart Copy, used for avoiding unnecessary data copying. This pattern is one of the simplest and most reliable techniques and is thus adopted in many matrix libraries. For a detailed description of this pattern, refer to [11, 12]. (A minimal sketch of the idea is given after this list.)
• Behavior Template, used for separating the interface and implementation of a matrix type, reducing the number of interfaces and increasing flexibility without sacrificing performance.
• Reference Class or Alias, used for the efficient definition of many important matrix operations such as submatrix access, access to matrix columns and rows, matrix transpose and so on.
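For illustration only, the following self-contained sketch shows the core of such a reference counting (smart copy) scheme; it is not FMOL++ code, the class and member names are invented, and the copy-on-write step that a full library would add is omitted.

// Minimal reference-counted array: copies share the data block, and only the
// destructor of the last owner releases it.
#include <cstddef>
#include <iostream>

class RCArray {
    double*     data_;
    std::size_t size_;
    int*        count_;               // number of RCArray objects sharing data_
public:
    explicit RCArray(std::size_t n)
        : data_(new double[n]()), size_(n), count_(new int(1)) {}
    RCArray(const RCArray& other)     // "smart copy": share, do not duplicate
        : data_(other.data_), size_(other.size_), count_(other.count_) { ++*count_; }
    RCArray& operator=(const RCArray& other) {
        if (this != &other) {
            release();
            data_ = other.data_; size_ = other.size_; count_ = other.count_;
            ++*count_;
        }
        return *this;
    }
    ~RCArray() { release(); }
    double& operator[](std::size_t i) { return data_[i]; }
    int owners() const { return *count_; }
private:
    void release() {
        if (--*count_ == 0) { delete[] data_; delete count_; }
    }
};

int main() {
    RCArray a(10);
    RCArray b = a;                    // no element copy, just another owner
    b[0] = 3.14;
    std::cout << a[0] << " owners=" << a.owners() << std::endl;  // 3.14 owners=2
    return 0;
}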
Interestingly, with a little extension the Reference Class pattern could be incorporated easily into the Behavior Template. Therefore, the Behavior Template is the key design pattern in FMOL++. This pattern is described in the next section.
3 The Behavior Template Pattern

Let us return to the fundamental problem of defining various matrix types and operations. We want all matrix types (general rectangular matrices and special matrix structures) to support basic linear operations and other common algorithms such as transpose, inverse, norms, LQ decomposition, QR decomposition, SVD, ... Ideally, the interface to an operation or to an algorithm should be the same for all types of matrices, while the implementation should exploit the special characteristics of each matrix type. Actually, this requirement is known as polymorphism. The key design concept is to separate interface from implementation.
One common approach is to use the inheritance mechanism together with virtual member functions. There is a hierarchy of matrix classes (or class templates), where all common operations are declared as 'virtual' in the base class. Some matrix operations can be implemented in the base class, but special matrix structures can change this default behavior by overriding the virtual functions as needed. This is a standard object-oriented design approach, so-called polymorphism based on dynamic binding. However, it is common knowledge that dynamic binding can suffer much from function call overhead. Since many matrix operations rely on appropriate element access, if every element access involved a virtual function call then the performance deterioration would be enormous and clearly unacceptable. In FMOL++, this problem is solved by adopting the Behavior Template design pattern. This pattern consists of the following concepts:
• A matrix class template serves as the single interface to all matrix types.
• A template parameter of the matrix class template is used as a base class for the actual matrix type.
• The template parameter should provide the concrete storage implementation and access operations for a particular matrix type.
• The access operations are implemented on three levels: element-wise (level 1), column-wise and row-wise (level 2) and submatrix access (level 3).
• For efficient and flexible access, column-wise and row-wise operations are based on the Iterator pattern as in the C++ Standard Template Library (STL).
The class diagram in Fig. 1 illustrates the relationship between the diverse constructs involved in the pattern. The class template TMatrix is the root of all matrix types. Together with some non-member operations, it provides the common interface to almost all basic matrix operations. The first template argument represents the element type, which may be parameterized later as float, double, long double, complex, and even as concrete vector, polynomial or matrix types. The second template argument represents the array type that encapsulates the element storage and access mechanism. The default parameter for the second argument is TArray2D, which is a regular two-dimensional array with a built-in reference-counting mechanism for representing general rectangular matrices. The implementation code for TMatrix should look as follows:

template <class ElemT, class ArrayT = TArray2D<ElemT> >
class TMatrix : public ArrayT {
    ...
public:
    ElemT& operator()(int i, int j);
    TSubMatrix<ElemT> operator()(int i1, int i2, int j1, int j2);
    template <class AnyT>
    TMatrix<ElemT,ArrayT>& operator+=(const TMatrix<ElemT,AnyT>&);
    ...
};
Figure 1. Structure of the Behavior Template pattern (class diagram: the class template TMatrix<ElemT, ArrayT> provides the common interface, including size(), element and submatrix access, operator+= and operator-=; TArray2D<ElemT>, TRefArray2D<ElemT, RowIT, ColIT> and TSymArray2D<ElemT> supply the storage and the row/column iterators; TSubMatrix<ElemT> and TSymMatrix<ElemT> are the corresponding derived matrix types)
Many binary operations (such as matrix multiplication) can be defined as non-member operations that support almost any kind of matrix as follows:

template <class ElemT, template <class> class ArrayA>
TMatrix<ElemT> operator*(const TMatrix<ElemT, ArrayA<ElemT> >& A,
                         const TMatrix<ElemT, ArrayA<ElemT> >& B)
{ . . . }
Special matrices, such as submatrices or symmetric matrices, can be represented by a class template derived from TMatrix with a specialized element storage implementation (TRefArray2D or TSymArray2D) as second template argument. Specialized versions of functions and operators can be added to the derived class template as needed. For example, the class template TSymMatrix could define a special version of in-place matrix addition as follows:

template <class ElemT>
class TSymMatrix : public TMatrix<ElemT, TSymArray2D<ElemT> > {
    ...
public:
    TSymMatrix<ElemT>& operator+=(const TSymMatrix<ElemT>&);
    ...
};
For an efficient and elegant implementation of submatrix access, the class template TSubMatrix plays the crucial role. A submatrix is considered a regular matrix except that it does not own element data, but references the data of another matrix instead. The mechanism for this kind of element access
is encapsulated in TRefArray2D. All operations defined in TMatrix are also applicable to TSubMatrix without any further consideration. For example, we could write the following code: ... A(1,5,1,5) += B(2,6,2,6);
where A and B are matrices of any kind (of course, they must have the same element type). The submatrix access operation defined in TMatrix returns an object of type TSubMatrix, which allows the self-assigned addition operation (+=) to be carried out very efficiently in place (that is, without having to copy data). This concept is known as the Reference Class pattern. This pattern can also be adopted to implement other matrix operations such as taking the matrix transpose, the matrix inverse, matrix column and row access, etc.

In short, the Behavior Template pattern and its derived patterns bring the following advantages:
• Generalization and specialization: Various matrix types and their interrelationships can be represented in a hierarchy with a general interface and specialized implementation versions.
• Efficiency: All type resolutions are made at compile time, so there is no run-time overhead for calling virtual functions. By adopting the Reference Counting and Reference Class patterns, element data is only copied when really needed. Many matrix operations can be implemented as in-place versions, so that there is no loss in time or space caused by using intermediate variables.
• Flexibility and openness: The library user can choose the most appropriate matrix types and element types for his application needs. If a particular matrix type or operation has not been defined, the user can add a specialized version with minimal effort.
4 Performance Comparisons

For performance comparisons, we considered three approaches: i) using FMOL++, ii) using TNT and iii) using a C-style implementation. The test program was very simple; it consisted of a loop in which typical basic linear algebra operations were repeated. Two situations were tested: one dealt with normal matrix operations and the other with submatrix operations. The code using FMOL++ is listed as follows:

long double Matrix_ops(int sz, int nloop)
{
    TMatrix<double> A(sz,sz,1.0);
    TMatrix<double> X=A, Y=A;
    rtclock.reset();
    for (int i=0; i < nloop; ++i)
        Y += A*X;
    return rtclock.time();
}
long double Submatrix_ops(int sz, int nloop)
{
    TMatrix<double> A(sz,sz,1.0);
    TMatrix<double> X=A, Y=A;
    rtclock.reset();
    for (int i=0; i < nloop; ++i)
        Y(1,sz-1,1,sz-1) += A(1,sz-1,1,sz-1)*X(2,sz,2,sz);
    return rtclock.time();
}
The code using TNT was very similar to that using FMOL++, except that in the second function the type Array2D was used instead of Matrix, because the class template Matrix of TNT did not support submatrix operations.

long double Matrix_ops(int sz, int nloop)
{
    Matrix<double> A(sz,sz,1.0);
    Matrix<double> X=A, Y=A;
    rtclock.reset();
    for (int i=0; i < nloop; ++i)
        Y = Y + A*X;
    return rtclock.time();
}

long double Submatrix_ops(int sz, int nloop)
{
    Array2D<double> A(sz,sz,1.0);
    Array2D<double> X=A, Y=A;
    rtclock.reset();
    for (int i=0; i < nloop; ++i)
        Y.subarray(0,sz-2,0,sz-2) +=
            matmult(A.subarray(0,sz-2,0,sz-2), X.subarray(1,sz-1,1,sz-1));
    return rtclock.time();
}
The code written in C style was very lengthy, so it is not listed here. Clearly, the source code using FMOL++ was the simplest and most elegant. For a better graphical presentation of the test results, the average execution time for one loop was taken and divided by n^3, where n is the order of the test matrices. The test platform was an IBM-PC, 1.6 GHz, running Windows XP. Several C++ compilers were used, including Visual C++ 6.0, Visual C++ .NET and GNU g++ 3.4.2, but they gave very similar results. Interestingly, all three programs (using FMOL++, TNT and C-style code) produced runtime code of the same size. The speed test results with the Visual C++ compiler are plotted in Figure 2 and Figure 3. As can be seen from these figures, FMOL++ gave by far the best results (solid line), even much better than the C-style version.
Figure 2. Speed test for matrix operations (solid line: FMOL++, dashed line: TNT, dotted line: C-style code); computation time/n^3 in µsec versus the order n of the matrices

Figure 3. Speed test for submatrix operations (solid line: FMOL++, dashed line: TNT, dotted line: C-style code); computation time/n^3 in µsec versus the order n of the matrices

5 Conclusion

In this paper, we have discussed fundamental issues of the design and implementation of a high-performance matrix library. It has been shown that the
adoption of appropriate design patterns could solve the problems in a very elegant and efficient manner. The key design pattern in FMOL++, namely the Behavior Template, has been described in detail. The effectiveness of this pattern has been discussed and demonstrated through several performance tests.
References 1. Anderson, E.; Z. Bai, C. Bischof, J. W. Demmel, J. J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney, S. Ostrouchov, and D. Sorensen: LAPACK User’s Guide. SIAM Philadelphia (1992)
2. The MathWorks: MATLAB 7.0 User Guides, www.mathworks.com (2004) 3. Jobling, C.; P. Grant, H. Barker, P. Townsend: Object-Oriented Programming in Control System Design. Automatica, Vol. 30, Pergamon Press (1994) 4. Son, H.M.: Objektorientierte Techniken für Echtzeit-Simulation komplexer Automatisierungssysteme. Preprint, 31. Regelungstechnik-Kolloquium, Boppard, Germany (1997) 5. Son, H.M.: On the Mathematical Programming Methodology in Development of High-quality Control Computation Software (in Vietnamese). 2nd National Conference on Application of Mathematics, Hanoi (2005). 6. Stroustrup, B.: The Design and Evolution of C++. Addison-Wesley (1994) 7. Stroustrup, B.: Learning Standard C++ as a New Language. C/C++ Users Journal, 43–54, May (1999) 8. Dongarra, J.J.; R. Pozo, D.W. Walker: LAPACK++: A Design Overview of Object-Oriented Extensions for High Performance Linear Algebra. www.netlib.org/lapack++/ (1993) 9. Pozo, R.: Template Numerical Toolkit (TNT), http://math.nist.gov/tnt/ (2004) 10. Gamma, E.; R. Helm, R. Johnson, J. Vlissides: Design Patterns – Elements of Reusable Object-Oriented Software. Addison-Wesley (1995) 11. Stroustrup, B.: The C++ Programming Language. Third Edition, Addison-Wesley (1997) 12. Son, H.M.; P. Rieger: Komponentenbasierte Automatisierungssoftware – Objektorientiert und anwendungsnah. Hanser-Verlag, München (1999)
Smoothing and Filling Holes with Dirichlet Boundary Conditions

Linda Stals and Stephen Roberts

Department of Mathematics, Australian National University, Canberra, ACT, 0200, Australia
[email protected]
Abstract A commonly used method for fitting smooth functions to noisy data sets is the thin-plate spline method. Traditional thin-plate splines use radial basis functions and consequently require the solution of a dense linear system of equations whose size grows with the number of data points. We present a method based instead on low order polynomial basis functions with local support defined on finite element grids. An advantage of such an approach is that the resulting system of equations is sparse and its size depends on the number of nodes in the finite element grid. A potential problem with local basis functions is an inability to fill holes in the data set; by their nature, local basis functions are not defined on the whole domain like radial basis functions. Our particular formulation automatically fills any holes in the data. In this paper we present the discrete thin-plate spline method and explore how Dirichlet boundary conditions affect the way holes are filled in the data set. Theory is developed for general d-dimensional data sets and model problems are presented in 2D and 3D.
1 Introduction

The thin-plate spline method is favoured as a data fitting technique because it is insensitive to noise in the data. The thin-plate spline for a general domain Ω, as formulated by Wahba [13] and Duchon [4], is the function f that minimises the functional
\[
  J_\alpha(f) = \frac{1}{n}\sum_{i=1}^{n}\bigl(f(x^{(i)}) - y^{(i)}\bigr)^2
    + \alpha \sum_{|\nu|=2} \binom{2}{\nu} \int_\Omega \bigl(D^{\nu} f(x)\bigr)^2\, dx, \tag{1}
\]
where ν = (ν_1, ..., ν_d) is a d-dimensional multi-index, |ν| = Σ_{s=1}^{d} ν_s, x is a predictor variable in R^d, and x^{(i)} and y^{(i)} are the corresponding i-th predictor and response data values (1 ≤ i ≤ n). The parameter α controls the trade-off between smoothness and fit. Techniques for choosing α automatically, using generalised cross validation, can be found in [5, 10, 13]. Radial basis functions are often used to represent f as they give an analytical solution of the minimiser of the functional in Equation (1). However, the resulting system of equations is dense, and furthermore its size is directly proportional to the number of data points. Improvements to the radial basis function approach have been proposed in a number of later works, [2, 3, 6, 9] for example, but the techniques still lead to complex data structures and algorithms that usually require O(n) memory. We propose a discrete thin-plate spline method that uses polynomial basis functions with local support defined on a finite element mesh. In particular, the method described in Section 2 uses standard linear finite element basis functions. The advantage of using functions with local support is that the size of the resulting system of sparse equations depends only on the number of grid points in the finite element mesh. Finite element discretisations of equations like Equation (1) have been proposed by other authors such as [1, 7, 12], but these approaches use smooth spaces and therefore lead to denser stiffness matrices. For problems which do not need the smoothness, it is more efficient to use the standard linear basis functions described in this paper. The mathematical foundation of the low-order discrete thin-plate spline method with Neumann boundary conditions has been presented in [8]. The main focus of this paper is the case of Dirichlet boundary conditions, see Section 3. The choice of Dirichlet boundary conditions will influence the way holes are filled in the data sets; this idea is explored experimentally in Section 4, and some 3D examples are discussed in Section 5.
2 Discrete Thin Plate Splines

The smoothing problem from Equation (1) can be approximated with finite elements so that the discrete smoother f is a linear combination of piecewise multi-linear basis functions (hat functions) b_i(x) ∈ H_0^1,
\[
  f(x) = \sum_{i=1}^{m} c_i\, b_i(x) = b(x)^T c.
\]
The idea is to minimise Jα over all f of this form. The smoothing term (the second term in Equation (1)) is not defined for piecewise multi-linear functions, but the non-conforming finite element principle can be used to introduce
piecewise multi-linear functions u = (b(x)^T g_1, ..., b(x)^T g_d) to represent the gradient of f. The functions f and u satisfy the relationship
\[
  \int_\Omega \nabla f(x) \cdot \nabla b_j(x)\, dx = \int_\Omega u(x) \cdot \nabla b_j(x)\, dx \tag{2}
\]
for all of the basis functions b_j. This relationship ensures that the gradient of f is equivalent to u in a weak sense. Substituting f = b(x)^T c and u = (b(x)^T g_1, ..., b(x)^T g_d) into Constraint (2) gives the relationship
\[
  L c = \sum_{s=1}^{d} G_s\, g_s, \tag{3}
\]
where L is a discrete approximation to the negative Laplace operator and (G_1, ..., G_d) is a discrete approximation to the transpose of the gradient operator. We now consider the minimiser of the functional
\[
  J_\alpha(c, g_1, ..., g_d)
    = \frac{1}{n}\sum_{i=1}^{n}\bigl(b(x^{(i)})^T c - y^{(i)}\bigr)^2
      + \alpha \sum_{s=1}^{d} \int_\Omega \nabla(b^T g_s) \cdot \nabla(b^T g_s)\, dx
    = \frac{1}{n}\sum_{i=1}^{n}\bigl(b(x^{(i)})^T c - y^{(i)}\bigr)^2
      + \alpha \sum_{s=1}^{d} g_s^T L\, g_s. \tag{4}
\]
Our smoothing problem consists of minimising this functional over all vectors c, g_1, ..., g_d defined on the domain Ω_h, subject to Constraint (3). By using Lagrange multipliers, the minimisation problem may be rewritten as the solution of a linear system of equations. For example, if d = 2 the corresponding linear system is
\[
  \begin{bmatrix} A & 0 & 0 & L \\ 0 & \alpha L & 0 & -G_1^T \\ 0 & 0 & \alpha L & -G_2^T \\ L & -G_1 & -G_2 & 0 \end{bmatrix}
  \begin{bmatrix} c \\ g_1 \\ g_2 \\ w \end{bmatrix}
  =
  \begin{bmatrix} d \\ 0 \\ 0 \\ 0 \end{bmatrix}
  -
  \begin{bmatrix} h_1 \\ h_2 \\ h_3 \\ h_4 \end{bmatrix}, \tag{5}
\]
where w is a Lagrange multiplier associated with Constraint (3). The vectors h_1, ..., h_4 store the Dirichlet boundary information. Note that the size of the corresponding linear system depends on the discretisation size m instead of the number of data points n. The time required to assemble the matrix does depend on n, but it depends only linearly on n, and the observation data only have to be read from secondary storage once if a uniform finite element grid is used. A more detailed description of the system of equations and the method used to solve them is presented in [10, 11].
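As a small illustration of the local-support representation f(x) = b(x)^T c used above, the following sketch evaluates the discrete smoother at a point of a uniform rectangular grid by combining the four bilinear hat functions whose support contains the point. The grid layout, indexing and coefficient values are invented; the actual implementation referred to in [10, 11] is not reproduced here.

#include <algorithm>
#include <cstdio>
#include <vector>

// Evaluate f(x,y) = sum_i c_i b_i(x,y) for bilinear hat functions on a
// uniform mx-by-my grid over [0,1]^2; only the four basis functions whose
// support contains (x,y) contribute.
double evaluate(const std::vector<double>& c, int mx, int my, double x, double y) {
    double hx = 1.0 / (mx - 1), hy = 1.0 / (my - 1);
    int i = std::min(int(x / hx), mx - 2);                  // cell containing (x,y)
    int j = std::min(int(y / hy), my - 2);
    double sx = (x - i * hx) / hx, sy = (y - j * hy) / hy;  // local coordinates
    // tensor-product hat functions of the four surrounding nodes
    return c[j * mx + i]           * (1 - sx) * (1 - sy)
         + c[j * mx + i + 1]       * sx       * (1 - sy)
         + c[(j + 1) * mx + i]     * (1 - sx) * sy
         + c[(j + 1) * mx + i + 1] * sx       * sy;
}

int main() {
    int mx = 5, my = 5;
    std::vector<double> c(mx * my, 1.0);    // constant coefficients, so f == 1
    std::printf("f(0.3, 0.7) = %g\n", evaluate(c, mx, my, 0.3, 0.7));
    return 0;
}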
3 Dirichlet Boundary Conditions

The theory behind the discrete minimisation problem (4) in the case of Neumann boundary conditions was presented in [8]. In this paper we address
the case of Dirichlet boundary conditions. When using Dirichlet boundaries, the null space of the Laplacian operator L is empty, which simplifies the solution of the linear system of equations.

With the Dirichlet boundary conditions we assume that b(x) ∈ H_0^1(Ω) and let λ(x) = 2 b(x)^T v for some v. Constraint (2) may be rewritten as
\[
  \int_\Omega \nabla f(x) \cdot \nabla \lambda(x)\, dx = \int_\Omega u(x) \cdot \nabla \lambda(x)\, dx \tag{6}
\]
for all functions λ. Define
\[
  \mathcal{L}_\alpha(c, g_1, \cdots, g_d, v) = J_\alpha(c, g_1, \cdots, g_d)
    + 2 \int_\Omega \nabla b(x)^T c \cdot \nabla b(x)^T v \, dx
    - 2 \int_\Omega \bigl(b(x)^T g_1, \cdots, b(x)^T g_d\bigr) \cdot \nabla b(x)^T v \, dx, \tag{7}
\]
which is the discrete form of the minimisation problem with Constraint (2). According to the Karush-Kuhn-Tucker (KKT) condition, if (c_*, g_1^*, \cdots, g_d^*) is a stationary point and there exists a v_* such that ∇L_α(c_*, g_1^*, \cdots, g_d^*, v_*) = 0, then (c_*, g_1^*, \cdots, g_d^*) is a global minimum. Furthermore, calculus of variations then says that if v = v_* + ṽ, then
\[
  0 = \partial_v \mathcal{L}_\alpha(c_*, g_1^*, \cdots, g_d^*, v)\big|_{\tilde{v} = 0}
  \quad \text{for all } \tilde{v}.
\]
Similar equations can be written for c = c_* + c̃ and g_s = g_s^* + g̃_s. Evaluating these partial derivatives for v, c and g_s gives the following system of weak finite element equations:
\[
  0 = \int_\Omega \nabla b(x)^T c \cdot \nabla b(x)^T \tilde{v}\, dx
    - \int_\Omega \bigl(b(x)^T g_1, \cdots, b(x)^T g_d\bigr) \cdot \nabla b(x)^T \tilde{v}\, dx,
    \quad \text{for all } \tilde{v}, \tag{8}
\]
\[
  0 = \frac{1}{n}\sum_{i=1}^{n}\bigl(b(x^{(i)})^T c_* - y^{(i)}\bigr)\bigl(b(x^{(i)})^T \tilde{c}\bigr)
    + \int_\Omega \nabla b(x)^T v_* \cdot \nabla b(x)^T \tilde{c}\, dx
    = \int_\Omega \frac{1}{n}\sum_{i=1}^{n}\bigl(b(x)^T c_* - y(x)\bigr)\,\delta(x - x^{(i)})\, b(x)^T \tilde{c}\, dx
    + \int_\Omega \nabla b(x)^T v_* \cdot \nabla b(x)^T \tilde{c}\, dx,
    \quad \text{for all } \tilde{c}, \tag{9}
\]
and
\[
  0 = \alpha \int_\Omega \nabla b(x)^T g_s^* \cdot \nabla b(x)^T \tilde{g}_s\, dx
    - \int_\Omega \partial_s \bigl(b(x)^T v_*\bigr)\, b(x)^T \tilde{g}_s\, dx,
    \quad \text{for all } \tilde{g}_s. \tag{10}
\]
We have used δ as the delta function. Note that b(x)^T ṽ, b(x)^T c̃, and b(x)^T g̃_s can be viewed as a set of test functions in H_0^1(Ω).

By considering the continuous versions of equations (8), (9) and (10) we can construct a set of equivalent strong equations. Given Equation (8), an equivalent strong equation is
\[
  \Delta \tilde{f}(x) = \nabla \cdot \tilde{u}(x) \quad \text{in } \Omega, \tag{11}
\]
with boundary conditions
\[
  \tilde{f}(x) = h_f(x) \quad \text{and} \quad \tilde{u}(x) = h_u(x) \quad \text{on } \Gamma,
\]
where Γ is the boundary of Ω and ũ = (g̃_1, · · · , g̃_d). The precise form of the boundary functions h_f and h_u is discussed below. The strong equation corresponding to Equation (9) is
\[
  \Delta \tilde{\lambda}(x)
    = \frac{1}{n}\sum_{i=1}^{n}\bigl(\tilde{f}(x) - y(x)\bigr)\,\delta(x - x^{(i)})
    \quad \text{in } \Omega, \tag{12}
\]
with Dirichlet boundary
\[
  \tilde{\lambda}(x) = h_\lambda(x) \quad \text{on } \Gamma.
\]
The boundary function h_λ is also discussed below. The strong form of the gradient equation, Equation (10), is
\[
  -\alpha \Delta \tilde{u}_s(x) = \partial_s \tilde{\lambda}(x) \quad \text{in } \Omega, \tag{13}
\]
with boundary conditions
\[
  \tilde{u}_s(x) = \bigl(h_u(x)\bigr)_s \quad \text{on } \Gamma.
\]
Assuming that f̃ and ũ are smooth enough, we can formally show that the biharmonic of f̃ is zero everywhere except at the data points:
\[
  \Delta\Delta \tilde{f}(x) = \Delta \nabla \cdot \tilde{u}(x) = \nabla \cdot \Delta \tilde{u}(x)
    = \frac{-1}{\alpha}\, \nabla \cdot \bigl(\partial_1 \tilde{\lambda}(x), \cdots, \partial_d \tilde{\lambda}(x)\bigr)
    = \frac{-1}{\alpha}\, \frac{1}{n}\sum_{i=1}^{n}\bigl(\tilde{f}(x) - y(x)\bigr)\,\delta(x - x^{(i)}). \tag{14}
\]
The Dirichlet boundary conditions hf , hu and hλ used in equations (11), (12) and (13) are not specified by the weak formulation given in (8), (9)
and (10); any choice of boundary conditions will give a smoother. Different boundary conditions will give a different form of the smoother f̃ (or f in the discrete case), but irrespective of the boundary condition, f̃ will always be a minimiser of Functional (1) (respectively (4)). The intuitive interpretation of this is given through clamped splines: changing the value at the endpoint of the spline will change the shape of the spline, but it still passes through the nodal points. Experimental examples given in Section 4 show how the boundary conditions affect the shape of the smoother in regions of the domain where there is missing data.

To explore how the choice of Dirichlet boundaries affects the smoother, we set up a test problem containing four data points x^{(1)} = (0.25, 0.25), x^{(2)} = (0.75, 0.25), x^{(3)} = (0.25, 0.75) and x^{(4)} = (0.75, 0.75) and assigned them the values y^{(1)} = 1, y^{(2)} = 0, y^{(3)} = 0 and y^{(4)} = 1, respectively. The parameter α was set to 10^{-3}. We then used the following sets of boundary conditions to fit two different smoothers.

Boundary Condition Test Problem 1: The boundary conditions for f̃ were obtained by fitting a thin plate spline to the four data points. Let r^{(i)}(x) = 4‖x − x^{(i)}‖_2. Then
\[
  h_f(x) = 0.180336880 \Bigl[ \bigl(r^{(1)}(x)\bigr)^2 \ln\bigl(r^{(1)}(x)\bigr)
    - \bigl(r^{(2)}(x)\bigr)^2 \ln\bigl(r^{(2)}(x)\bigr)
    - \bigl(r^{(3)}(x)\bigr)^2 \ln\bigl(r^{(3)}(x)\bigr)
    + \bigl(r^{(4)}(x)\bigr)^2 \ln\bigl(r^{(4)}(x)\bigr) \Bigr] + 0.5,
\]
\[
  h_u(x) = \nabla h_f(x), \qquad h_\lambda(x) = -\alpha \Delta h_f(x).
\]

Boundary Condition Test Problem 2: All of the boundary conditions are zero:
\[
  h_f(x) = 0, \qquad h_u(x) = (0, 0), \qquad h_\lambda(x) = 0.
\]
The plots in Figures 1 and 2 show the resulting finite element solution on a grid containing 4225 nodes.
4 Holes in the Data Set

Smoothing techniques based on local basis functions may require a post-processing step to fill holes in the data set; our finite element formulation automatically fills in the holes. From Equation (14) we see that the discrete thin-plate spline essentially solves the biharmonic equation in regions where there are no data points.
Figure 1. Plot of the smoother for Test Problem 1. The boundary conditions for f̃ were obtained by fitting a thin plate spline to the four data points

Figure 2. Plot of the smoother for Test Problem 2. All of the boundary conditions are zero
To test this idea we generated a set of equally spaced data points x^{(i)} and assigned them the value y^{(i)} = y(x^{(i)}), where y(x, y) = sin(4πx) sin(4πy). We then removed all of the data points where y^{(i)} > 0.0, so a total of n = 179401 points remained. The shadow in Figures 3 and 4 shows the data points. Figure 3 gives the discrete thin-plate spline fit on a uniform finite element grid of size m = 4229 with α = 10^{-6}. The boundary conditions were hf(x) = y(x),
hu = ∇hf (x) and hλ (x) = −α∆hf .
Figure 3. Example with missing data. The grid shows the discrete thin-plate spline fit with boundary condition hf (x) = y(x)
Figure 4. Example with missing data. The grid shows the discrete thin-plate spline fit with the boundary conditions given in Boundary Condition Test Problem 1
To calculate the fit given in Figure 4 we changed the boundary conditions to those given for Boundary Condition Test Problem 1 in Section 3. In future research we plan to look more closely at how the boundary conditions may be used to include a priori information in the smoother.
5 Example Applications

To conclude the discussion we present two examples of data fitting in 3D.
Figure 5. Isosurface representation of the original data set for the sphere example (left) and the finite element approximation on a grid with 2465 nodes (right)
The first example was constructed by randomly generating a million points on a sphere with centre (0.5, 0.5, 0.5) and radius 1/3. The data points were given a value of 1. The left plot in Figure 5 gives a representation of what the original data set looks like. The right plot in Figure 5 shows an isosurface plot of the discrete thin plate spline on a grid with 2465 nodes. The parameter α was set to 10^{-3}.
Figure 6. Isosurface representation of the original data set for the semi-sphere example (left) and the finite element approximation on a grid with 2465 nodes (right)
In the second example, the points on the sphere where the x-coordinate was less than 0.5 were removed to form a semi-sphere; see the left plot in Figure 6. The data set contained approximately 0.5 × 10^6 points. The parameter α was set to 10^{-3}. The right plot in Figure 6 shows an isosurface plot of the thin plate spline on a grid with 2465 nodes.
6 Future Research

Other options that we wish to explore include the use of adaptive grid refinement, different solution techniques, a parallel implementation and the use of different boundary conditions to include a priori information in the system.
References 1. R. Arcangéli. Some applications of discrete D^m splines. In Mathematical methods in computer aided geometric design (Oslo, 1988), pages 35–44. Academic Press, Boston, MA, 1989.
2. R. Beatson and L. Greengard. A short course on fast multipole methods. In Wavelets, multilevel methods and elliptic PDEs (Leicester, 1996), Numer. Math. Sci. Comput., pages 1–37. Oxford Univ. Press, New York, 1997. 3. R. K. Beatson, G. Goodsell, and M. J. D. Powell. On multigrid techniques for thin plate spline interpolation in two dimensions. In The mathematics of numerical analysis (Park City, UT, 1995), volume 32 of Lectures in Appl. Math., pages 77–97. Amer. Math. Soc., Providence, RI, 1996. 4. J. Duchon. Splines minimizing rotation-invariant. In Lecture Notes in Math, volume 571, pages 85–100. Springer-Verlag, 1977. 5. M. F. Hutchinson. A stochastic estimator of the trace of the influence matrix for Laplacian smoothing splines. Comm. Statist. Simulation Comput., 19(2):433– 450, 1990. 6. B. Morse, T. S. Yoo, P. Rheingans, D. T. Chen, and K. R Subramanian. Interpolating implicit surfaces from scattered surface data using compactly supported radial basis functions. In Proceedings of International Conference on Shape Modeling and Applications ’01, pages 89–98. IEEE Computer Society Press, May 2001. 7. T. Ramsay. Spline smoothing over difficult regions. J. R. Stat. Soc. Ser. B Stat. Methodol., 64(2):307–319, 2002. 8. S. Roberts, M. Hegland, and I. Altas. Approximation of a thin plate spline smoother using continuous piecewise polynomial functions. SIAM J. Numer. Anal., 41(1):208–234, 2003. 9. R. Sibson and G. Stone. Computation of thin-plate splines. SIAM J. Sci. Statist. Comput., 12(6):1304–1313, 1991. 10. L. Stals and S. Roberts. Low order smoothers on large data sets with holes. In preparation. 11. L. Stals and S. Roberts. Verifying convergence rates of discrete thin-plate splines in 3D. In Rob May and A. J. Roberts, editors, Proc. of 12th Computational Techniques and Applications Conference CTAC-2002, volume 46, pages C515–C529, June 2005. [Online] http://anziamj.austms.org.au/V46/ CTAC2004/home.html. 12. J. J. Torrens. Discrete smoothing Dm -splines: applications to surface fitting. In Mathematical methods for curves and surfaces, II (Lillehammer, 1997), Innov. Appl. Math., pages 477–484. Vanderbilt Univ. Press, Nashville, TN, 1998. 13. G. Wahba. Spline models for observational data, volume 59 of CBMS-NSF Regional Conference Series in Applied Mathematics. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 1990.
Constraint Hierarchy and Stochastic Local Search for Solving Frequency Assignment Problem

T.V. Su and D.T. Anh

HoChiMinh City University of Technology, 268 Ly Thuong Kiet, Dist. 10, HoChiMinh City, Vietnam
[email protected]

Abstract One approach for modeling over-constrained problems is using constraint hierarchies. In the constraint hierarchy framework, constraints are grouped into levels, and appropriate solutions are selected using a comparator which takes into account the constraints and their levels. The finite domain constraint hierarchy is one of the NP-hard optimization problems that have been studied for a long time due to their relevance to various application areas. Nevertheless, stochastic local search methods for solving finite domain constraint hierarchies remain largely unexplored. In this paper, we develop a variant of the WSAT algorithm that can solve constraint hierarchies over finite domains. We test this generic algorithm on a real world application: solving the Frequency Assignment Problem in the area of wireless communication. Experiments on benchmark Philadelphia instances of realistic sizes show that the proposed algorithm is an efficient heuristic for finding good approximate solutions.
1 Introduction

The goal in solving constraint satisfaction problems (CSPs) is to find a solution which satisfies all given constraints. Approaches to CSPs such as constraint programming have proven successful for a range of problems. While powerful, the CSP schema presents some limitations. In particular, all constraints in a CSP are considered mandatory. However, in many real world problems constraints often appear that could be violated in solutions without causing such solutions to be unacceptable. If these constraints are treated as mandatory, this often causes problems to be unsolvable. If these constraints are ignored, solutions of bad quality are found. This is the motivation to extend the CSP schema to include over-constrained problems. An over-constrained CSP (OCSP for short) is a CSP for which any complete assignment violates some constraints. A solution of the OCSP is the complete assignment that best respects the set of constraints. There are two general approaches to dealing with over-constrained problems. The constraint hierarchy framework [2] models the over-constrainedness by
resolving the conflict using preferences on the importance of some constraints and particular solutions. The other approach, called PCSP (partial constraint satisfaction problem) [4], relaxes the problem definition so that it can be consistent. There exist several exact algorithms for solving constraint hierarchies over finite domains and over continuous domains. These algorithms basically perform systematic search [3, 9]. However, one major limitation of these systematic search algorithms is that the large scale of real-world constraint hierarchies makes it impossible for them to reach globally optimal solutions. The work here aims at devising a stochastic local search algorithm that can solve finite domain constraint hierarchies. This local search algorithm, a variant of WSAT, can solve a finite domain (FD) constraint hierarchy in two steps: finding an initial solution that satisfies all hard constraints and then tuning that solution in order to obtain a better solution that takes the soft constraints into account. This generic solver uses a mechanism of gradually increasing the number of satisfied levels in the hierarchy through a method of wisely selecting an unsatisfied constraint that should be improved. To evaluate the proposed generic solver, we select a real-world benchmark problem, the frequency assignment problem (FAP) from the area of wireless communication. Experiments on benchmark Philadelphia instances of realistic sizes show that the proposed algorithm is an efficient heuristic for finding good near-optimal solutions. The paper is organized as follows. Section 2 presents the framework for constraint hierarchies over finite domains and the local search algorithm for solving these over-constrained systems. Section 3 describes the frequency assignment problem which is used as the experimental benchmark for investigating the hierarchical local search. In Section 4, we report experimental results of solving the FAP. Finally, conclusions and future work are given in Section 5.
2 Local Search for FD Constraint Hierarchies

2.1 FD Constraint Hierarchies

In this section, we briefly describe the definition of constraint hierarchies following [2]. Let X be a set of variables. Each variable x ∈ X takes values from a finite domain D_x. A k-ary constraint c over variables x1, ..., xk is a relation over D_{x1} × ... × D_{xk}. The constraints are organized in a vector H of the form <H0, H1, ..., Hn>, where for each i, 0 ≤ i ≤ n, Hi is a multiset of constraints at level i. Each constraint level represents the importance of the constraints in the set Hi. The constraints in level 0, H0, are required constraints (or hard constraints). The constraints in H1, H2, ..., Hn are non-required constraints (or soft constraints). These non-required constraints range from the strongest level H1 to the weakest level Hn.
A solution θ to a constraint hierarchy H is a valuation for the variables in H, i.e., a function which maps the variables in H to elements of their corresponding domains. The set S0 = {θ | ∀c ∈ H0, cθ holds} contains those valuations that satisfy the required constraints. (In this definition, cθ denotes the Boolean result of applying the valuation θ to c, and we say that "cθ holds" if cθ = true.) In order to take the other constraints in the hierarchy into account, a partial ordering better, called a comparator, is used. The comparator better(θ, σ, H) is true if valuation θ is preferred to σ in the context of the constraint hierarchy H. The solution set of the constraint hierarchy H consists of the solutions that are optimal with respect to better:

S = {θ | θ ∈ S0 ∧ ∀σ ∈ S0, ¬better(σ, θ, H)}     (1)
There are many suitable candidates for comparators. In this paper, we focus on the weighted-sum-better comparator, which is relevant to the frequency assignment problem described in Section 3. The result of a valuation cθ can be described in terms of an error function e(c, θ) which returns a non-negative real number indicating the degree of violation of constraint c under the valuation θ. The error function e has the property that e(c, θ) = 0 iff cθ holds, and e(c, θ) > 0 otherwise. A trivial error function returns 0 when θ satisfies c and 1 if not. The weighted-sum-better comparator can be defined as follows:

weighted-sum-better(θ, σ) ≡ ∃k, 1 ≤ k ≤ n, such that
  ∀i ∈ {1, ..., k − 1}: weighted-sum(θ, Hi) = weighted-sum(σ, Hi)  ∧  weighted-sum(θ, Hk) < weighted-sum(σ, Hk)     (2)

The function weighted-sum requires for each constraint c the definition of a weight w(c), a positive real number. Using these weights, the function weighted-sum combines the error values of the constraints in a given level into a single number:

weighted-sum(θ, Hi) = Σ_{c ∈ Hi} w(c) e(c, θ)     (3)
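To make the comparator concrete, here is a minimal Python sketch of the trivial error function, the weighted-sum aggregation of Eq. (3) and the lexicographic test of Eq. (2). The paper's implementation is in C++; the `constraint.holds(...)` interface and the data layout below are illustrative assumptions only.

```python
def trivial_error(constraint, valuation):
    # trivial error function: 0 if the constraint holds under the valuation, 1 otherwise
    return 0.0 if constraint.holds(valuation) else 1.0

def weighted_sum(valuation, level, weight, error=trivial_error):
    # Eq. (3): combine the weighted errors of all constraints in one level
    return sum(weight[c] * error(c, valuation) for c in level)

def weighted_sum_better(theta, sigma, hierarchy, weight):
    # Eq. (2): theta is better if it ties sigma on the stronger levels and is
    # strictly smaller on the first level where they differ (level 0 is the
    # hard level and is assumed satisfied by both valuations)
    for level in hierarchy[1:]:
        ws_theta = weighted_sum(theta, level, weight)
        ws_sigma = weighted_sum(sigma, level, weight)
        if ws_theta < ws_sigma:
            return True
        if ws_theta > ws_sigma:
            return False
    return False
```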
2.2 Local Search for Solving FD Constraint Hierarchies

The generic algorithm, CH-Solver, for solving constraint hierarchies is based on the WSAT algorithm proposed by Selman et al., 1994 [10]. WSAT is an extension of GSAT, specially developed to solve satisfiability problems. WSAT employs a stochastic local search technique which escapes or avoids local minima by adding a random element to the move selection heuristic that allows cost-increasing moves. WSAT differs from GSAT in restricting the move neighborhood by randomly selecting a constraint in violation and then only
considering the domain values of those variables in the constraint that cause the constraint to be satisfied (or improved). Then, with probability p, a variable is selected at random from the constraint and its value is changed; otherwise the best move is selected from the domains of the constraint variables. We extend WSAT to solve constraint hierarchies by extending soft constraints to hierarchies. The WSAT-based CH-Solver is described in Fig. 1. Let s ⊕ m denote the application of the local move m to the solution s. The CH-Solver is parameterized by Max_moves and various procedures. CH-Solver always works with a full assignment of all variables. It begins with an initial solution that satisfies all the required constraints (level 0). The loop starts a local search by selecting a constraint c with the procedure select-unsatisfied-constraint. Local search continues until Max_moves local repairs have been made or some solution stopping criterion is met. To escape from being stuck in local minima, a random walk probability p is used. CH-Solver returns the best solution found, and this is determined by using the procedure improve(s, s_best, H). CH-Solver leaves three procedures unspecified: improve, select-unsatisfied-constraint and get-valid-moves. The procedure select-unsatisfied-constraint(H, X, s) returns an unsatisfied constraint from the constraint hierarchy H with the set of variables X and the current solution s. The procedure get-valid-moves(c, s) returns the set of valid moves from the current solution s by considering the domains of the variables involved in constraint c. The procedure improve(s, s', H) returns true if s is better than s', and false otherwise.

Constraint Selection Scheme

In the WSAT-based CH-Solver given in Fig. 1, to select an unsatisfied constraint to be improved we have to choose a violated constraint from the different hierarchy levels. This constraint selection scheme consists of two steps: (1) selecting the top-most unsatisfied constraint level and (2) selecting an unsatisfied constraint from either the top-most unsatisfied constraint level or one of the remaining levels. In step 2, if more than 50% of the constraints at the selected level are already satisfied, it becomes more likely that unsatisfied constraints at the remaining levels will be considered. In this constraint selection scheme we keep a balance between choosing an unsatisfied constraint from the top-most unsatisfied constraint level and choosing it from the rest of the levels. The details of the scheme are given in Fig. 2.

Local Repair

A local move is described as a pair <x, v> where a new value v is assigned to variable x. A local move <x, v> is valid if it meets the requirement: "the new value for x does not increase the number of violations at levels more important than the selected level l and does not make any currently satisfied constraint at the selected level violated". From the set of valid moves, we
Figure 1. The generic algorithm for solving constraint hierarchies over finite domains
Figure 2. The procedure select-unsatisfied-constraint
can determine the set of best moves by taking the better comparator of the constraint hierarchy into account. Additionally, the CH-Solver can be equipped with a tabu memory of size t: no local move can be made if it has been performed within the previous t moves.
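The following is a minimal Python sketch of the WSAT-based CH-Solver loop outlined above (Fig. 1). The procedure names mirror those in the text, but they are passed in as callables; their signatures, the solution representation and the move-application function are assumptions for illustration, not the authors' code.

```python
import random

def ch_solver(hierarchy, variables, s0, max_moves, p,
              select_unsatisfied_constraint, get_valid_moves, improve, apply_move):
    """Generic WSAT-based solver; s0 is a full assignment satisfying level 0."""
    s, s_best = dict(s0), dict(s0)
    for _ in range(max_moves):
        c = select_unsatisfied_constraint(hierarchy, variables, s)
        if c is None:                         # all levels satisfied: stop
            break
        moves = get_valid_moves(c, s)         # valid local moves for constraint c
        if not moves:
            continue
        if random.random() < p:               # random-walk step to escape local minima
            move = random.choice(moves)
        else:                                 # greedy step: keep the best candidate move
            move = moves[0]
            for m in moves[1:]:
                if improve(apply_move(s, m), apply_move(s, move), hierarchy):
                    move = m
        s = apply_move(s, move)
        if improve(s, s_best, hierarchy):     # remember the best solution found so far
            s_best = dict(s)
    return s_best
```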
Termination Criteria

The termination conditions of the CH-Solver are based on (1) the satisfaction of all constraints in all levels, or (2) a predefined number of iterations without finding a valid local move, or (3) reaching the maximum predefined number of iterations Max_moves, or (4) a stop request from the user.

Related Work

There has been other work by Henz et al. [6] that also uses a WSAT-based local search solver for solving constraint hierarchies over finite domains. Our CH-Solver has some similarity with their solver. However, the main difference between our CH-Solver and their algorithm is that the local repair strategy in our CH-Solver is more effective. Each local repair step in our CH-Solver is a move which makes the whole constraint hierarchy less violated, rather than just one selected unsatisfied constraint as in their local search solver. This more sophisticated repair strategy remarkably improves the performance of our local search algorithm in solving FD constraint hierarchies. Besides, Henz et al. use a restart mechanism (an outer loop with parameter Max-tries) to escape local minima, while we use the walk probability p.
3 Frequency Assignment Problem

To evaluate the proposed generic CH-Solver, we select a real-world benchmark problem, the frequency assignment problem (FAP). The frequency assignment problem arises in the area of wireless communications. One can find many different models of the FAP (due to many different applications), but they all have two common properties:
• frequencies must be assigned to a set of wireless connections so that communication is possible for each connection;
• interference between two frequencies (and, consequently, a quality loss of the signal) might occur in some circumstances, which depend on:
  – how close the frequencies are on the electromagnetic band,
  – how close the connections are to each other geographically.
There are also many objectives which define the quality of an assignment; the goal is to obtain the highest possible quality. The survey [1] gives an extensive overview of different models, problem classifications, applied methods and results. The cells of a network are modeled as hexagons; each cell requires a number of connections and must be assigned some number of frequencies. The objective is to find frequency assignments which result in no interference and
minimize the span of frequencies used, i.e., the difference between the maximum and minimum frequency used. Since the number of connection demands is large and the number of available frequencies is small, a mechanism of frequency reuse is employed. This mechanism is represented by a matrix M, also called the reuse matrix:
• M[i][j] (i ≠ j) represents the minimum frequency separation between two connections in two adjacent cells i and j that is sufficient to prevent frequency interference.
• M[i][i] represents the minimum frequency separation between two connections in the same cell i.
• M[i][j] = 0 means there is no constraint between the two cells i and j, which are independent of each other.
Given l, the number of available frequencies, C, the set of cells, N, the number of required connections in all the cells according to the set of demands D, and M, the reuse matrix, we have to find a frequency assignment which results in no interference and minimizes the span of frequencies used. We decided to use the so-called Philadelphia instances, which are among the most widely studied sets of sample problems in the FAP literature, see [1].

3.1 Philadelphia Instances

The Philadelphia instances consist of 9 FAP instances that were used for a cellular phone network around Philadelphia (Fig. 3). The cells in each instance are characterized by 21 hexagons; each cell requires some number of frequencies. Tuples (cell number, number of frequencies required) form a demand vector.
Figure 3. Network structure of the Philadelphia problems (left); demand D1 (right)
The considered demand vectors include:
D1 = (8, 25, 8, 8, 8, 15, 18, 52, 77, 28, 13, 15, 31, 15, 36, 57, 28, 8, 10, 13, 8)
D2 = (5, 5, 5, 8, 12, 25, 30, 40, 40, 45, 20, 30, 25, 15, 15, 30, 20, 20, 25)
D3 = (20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20)
D4 = (16, 50, 16, 16, 16, 30, 36, 104, 154, 56, 26, 30, 62, 30, 72, 114, 56, 16, 20, 26, 16)
D5 = (32, 100, 32, 32, 32, 60, 72, 208, 308, 112, 52, 60, 124, 60, 144, 228, 112, 32, 40, 52, 32)

The distance between the centers of two adjacent cells is 1. Frequencies are denoted as positive integer numbers. Interference of the cells is characterized by a reuse distance vector

R = (d0, d1, d2, d3, d4, d5)     (4)
where dk denotes the smallest distance between the centers of two cells which can use frequencies that differ by at least k without interference. For example, for the vector R1 = (√12, √3, 1, 1, 1, 0), if A is a cell at the center, then all the frequencies assigned to the connections in A (distance d = 0) must be separated from each other by at least 5 (d5 = 0), frequencies assigned to any cell adjacent to A should be separated from those in A by at least 2 (d2 = 1), frequencies assigned to the second and third 'ring' of cells (distance d ≥ √3) should be separated from those in A by at least 1 (d1 = √3), and frequencies assigned to farther 'rings' of cells (d ≥ √12) can be the same as those in A (d0 = √12). Fig. 4 shows a graphical representation of the reuse distance vectors R1 = (√12, √3, 1, 1, 1, 0), R2 = (√7, √3, 1, 1, 1, 0) and R3 = (√12, 2, 1, 1, 1, 0).
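As an illustration of how a reuse matrix M can be derived from a reuse distance vector R, the following hedged Python sketch computes the minimum required separation between two cells from the distance of their centres. It follows the definition of dk above; the `cells` input format is an assumption, and the code is not taken from the paper.

```python
import math

def required_separation(R, distance):
    """Smallest k such that cells whose centres are this far apart may use
    frequencies that differ by at least k; R = (d0, d1, ..., d5), non-increasing."""
    for k, dk in enumerate(R):
        if dk <= distance:
            return k
    return len(R) - 1   # unreachable when d5 = 0, kept as a safe fallback

def reuse_matrix(cells, R):
    """cells: dict mapping a cell id to its (x, y) centre (hypothetical input format)."""
    M = {}
    for i, (xi, yi) in cells.items():
        for j, (xj, yj) in cells.items():
            M[i, j] = required_separation(R, math.hypot(xi - xj, yi - yj))
    return M

# sanity check against the R1 example in the text
R1 = (math.sqrt(12), math.sqrt(3), 1, 1, 1, 0)
assert required_separation(R1, 0.0) == 5 and required_separation(R1, 1.0) == 2
```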
Figure 4. Reuse distances R1 (left); R2 (center); R3 (right)
Table 1 defines the Philadelphia instances explored in the literature (named P1-P9) and provides the value of the optimal solution (minimal span). The optimal solutions of the Philadelphia instances have been reported in some papers, [1, 7].
Table 1. Philadelphia instances

Instance  Demand vector  Total demand  Reuse distance  Minimal span
P1        D1             481           R1              426
P2        D1             481           R2              426
P3        D2             470           R1              257
P4        D2             470           R2              252
P5        D3             420           R1              239
P6        D3             420           R2              179
P7        D4             962           R1              855
P8        D1             481           R3              524
P9        D5             1924          R1              1713
3.2 Notations and Model

C: the set of cells, C = {Ci | Ci is a cell in the network, i ∈ {1, 2, ..., 21}}
D: the set of connection demands, D = {Di | Di is the minimum number of frequencies required by cell Ci}
N: the total number of frequencies required by the network (also called the traffic), N = D1 + D2 + ... + D21
R: the reuse vector
M: the reuse matrix
fik: the frequency assigned to connection k in cell Ci
fmin = 1: the smallest frequency used
fmax: the largest frequency used.

The FAP consists of the three following constraint classes:
C1: ∀i ∈ {1, ..., |C|}, ∀k ∈ {1, ..., Di}, ∃x ∈ {1, ..., fmax}: fik = x
C2: ∀i ∈ {1, ..., |C|}, ∀j ∈ {1, ..., |C|}, ∀k ∈ {1, ..., Di}, ∀t ∈ {1, ..., Dj}: if M[i][j] ≠ 0 then |fik − fjt| ≥ M[i][j]
C3: min fmax

C1: some frequency must be assigned to each connection in the network.
C2: interference over the network must be avoided.
C3: the span of frequencies used should be minimized.

The constraints of class C2 are hard constraints. The constraints of class C3 are soft constraints.

Problem Representation

We encode the Philadelphia instances into a FD constraint hierarchy in the following way:
variables: each cell is represented as a single variable; e.g., for demand vector D1 there are 21 variables. Each possible value of a variable is a set of frequencies assigned to the cell represented by that variable.

domains: each variable has the same domain 2^L where L = {1, ..., l} (l is the upper bound of the frequencies for the particular instance).

When checking whether a candidate frequency f is valid to assign to a connection in a cell i, we can make use of the reuse matrix M to speed up the check. In each cell we check the separation between f and the smallest used frequency which is higher than f (fu), and that between f and the largest used frequency which is lower than f (fd). That means f must satisfy the requirement:

∀j ∈ {1, ..., |C|} with M[i][j] ≠ 0:  fu − f ≥ M[i][j]  and  f − fd ≥ M[i][j]     (5)

Constraints of level 0

The constraints of class C2, the interference constraints, are required constraints (level 0). Between each pair of cells there is a constraint of class C2. Let Fi, Fj be the sets of frequencies assigned to two cells i and j, respectively. A required constraint involving two cells i and j is satisfied if

∀f1 ∈ Fi, ∀f2 ∈ Fj: |f1 − f2| ≥ M[i][j]     (6)
However, since whenever a frequency is assigned to a cell it is checked for validity according to the requirement given in (5), we can ensure that the constraints of class C2, i.e. (6), are satisfied.

Constraints of level 1

The constraints of level 1 are related to the requirement that some frequencies may be preassigned to some cells. These constraints are optional, i.e., there may be no constraints of this class in some problems. These constraints come from situations in which the network administrator wishes to extend the network with more connections without modifying the current network configuration. When some frequencies are preassigned to some cells, a minimal separation among frequencies within the same cell, according to the reuse matrix M, must be respected.

Constraints of level 2

The only constraint of level 2, applying to all cells, is that the span of frequencies used should be minimized.
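A hedged sketch of the validity check of requirement (5): before assigning a candidate frequency f to cell i, the separation is verified against the frequencies already used in every cell j with M[i][j] ≠ 0. The data layout (a dict of sorted frequency lists per cell) is an assumption made for illustration.

```python
import bisect

def is_valid(f, i, assigned, M, cells):
    """assigned[j] is a sorted list of frequencies already used in cell j."""
    for j in cells:
        sep = M[i][j]
        if sep == 0:                 # no constraint between independent cells
            continue
        freqs = assigned[j]
        pos = bisect.bisect_left(freqs, f)
        # nearest used frequency above f (f_u) and below f (f_d) in cell j
        if pos < len(freqs) and freqs[pos] - f < sep:
            return False
        if pos > 0 and f - freqs[pos - 1] < sep:
            return False
    return True
```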
3.3 Initial Solution

The method of generating the initial solution is a greedy algorithm. Generating the initial solution consists of two steps:
1. Select the valid preassigned frequencies to assign to some cells (constraints of level 1).
2. Assign the remaining valid frequencies to the cells according to the connection demands of the cells while satisfying the requirements given by the reuse matrix M.

The fmax value of the initial solution generated by the above method is often not high. The selected value of l determines the complexity of finding the initial solution. If l is low, the run time of finding the initial solution will be high; if l is high, the overhead of minimizing fmax will be high. Through experiments, we set l approximately equal to 1.5*N, where N is the total number of frequencies required by the network.

3.4 Local Moves

Given a solution s, s' is a neighbor of s if they differ in the value of a single frequency at a cell. So a local move is a triple <i, f, f'> where i, f and f' are, respectively, a cell, a frequency of the cell and a new frequency value. Depending on the level of the constraints whose satisfiability we wish to improve, we apply different kinds of frequency swaps.

Frequency swap to improve constraints of level 1. We select randomly a frequency already assigned to some cell but not in the set of preassigned frequencies for that cell as the removed frequency. Then we select randomly a frequency from the set of preassigned frequencies for the cell; if no such frequency exists, we select randomly a valid frequency from the frequency domain as the new frequency to be added.

Frequency swap to improve constraints of level 2. Since the value of a variable is a set, a frequency swap move for a cell consists of three steps:
1. Selecting a cell using the heuristic: "a cell with more connection demands will be more likely to be chosen".
2. Selecting a frequency to be removed by picking randomly a frequency already assigned to the cell in consideration.
3. Selecting a new frequency to be added by using the heuristic: "the least reused frequency in the set of valid frequencies should be chosen, to diversify the search space".
All the above mentioned heuristics used in the frequency swap moves help to improve the performance of the algorithm. The heuristics in steps 2 and 3 aim to increase the diversification of the search. Since we do not need the tabu list mechanism in the CH-Solver for solving the FAP, we set the tabu tenure t to 0.

3.5 Computing the Cost Function

When a frequency is changed in a cell Ci of the current solution s to obtain a neighbor s', we can easily determine whether s' is better than s by computing the cost variation between s and s'. For the constraints of level 1, the cost variation may take one of the following values:
−1 if we can add one preassigned frequency to the cell;
0 if the number of preassigned frequencies in the cell stays the same;
1 if we decrease the number of preassigned frequencies in the cell by one.

For the constraint of level 2, which aims to minimize the span of frequencies used, the cost of a solution s is proportional to the value of fmax in s. That means, if fmax decreases after a frequency swap, then the cost of the neighbor solution decreases by a corresponding amount.
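A minimal Python sketch of the level-2 frequency swap described above. The cell-selection weighting and the "least reused frequency" tie-breaking follow the three steps in the text, while the data structures and helper names are illustrative assumptions rather than the authors' C++ implementation.

```python
import random
from collections import Counter

def swap_move_level2(assigned, demands, l, M, cells, is_valid):
    """Return a local move (cell, old_freq, new_freq) or None if no valid move exists."""
    cells = list(cells)
    # Step 1: cells with more connection demands are more likely to be chosen
    cell = random.choices(cells, weights=[demands[c] for c in cells])[0]
    if not assigned[cell]:
        return None
    # Step 2: remove a randomly chosen frequency of that cell
    old = random.choice(assigned[cell])
    tentative = {c: sorted(assigned[c]) for c in cells}   # copy; keep lists sorted
    tentative[cell].remove(old)                           # check validity without the old frequency
    # Step 3: among valid candidates, prefer the least reused frequency (diversification)
    usage = Counter(f for c in cells for f in tentative[c])
    candidates = [f for f in range(1, l + 1)
                  if f != old and is_valid(f, cell, tentative, M, cells)]
    if not candidates:
        return None
    new = min(candidates, key=lambda f: usage[f])
    return (cell, old, new)
```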
4 Experiments

Our WSAT-based local search CH-Solver and the FAP application have been implemented in Visual C++. The experimental tests were run on a PC with a 900 MHz Celeron and 256 MB RAM. Before developing the CH-Solver, we had to build a C++ CH library, a set of classes that can efficiently represent and solve FD constraint hierarchies. This class library approach makes it easy to extend the CH-Solver when using it to solve complex real-world applications. Table 2 shows the experimental results of applying the CH-Solver to the Philadelphia instances P1, P2, ..., P9. Through experiments we found that with a number of iterations equal to about 200*N (where N is the total demand of the network) all the solutions reach a quality of over 94% with respect to the optimal solutions. For the instances on which the solution quality is lower than 94%, we can improve this by increasing the number of iterations (Max_moves) to about 300*N. Fig. 5 illustrates the change of solution quality with the number of iterations for the Philadelphia instance P1. Here we initialize fmax to 580 and the number of iterations to 200*481. In Fig. 5 we can see that after only 10000 iterations the quality of the solution has reached fmax = 464, i.e. 91% of the optimal solution (426).
To improve the performance of the CH-Solver in solving the FAP, we apply a heuristic of dynamically reducing the frequency domain at every iteration. That means that, starting from the initial value of fmax, we reduce the upper bound of the frequency domain from fmax to fmax − 1 at each iteration. With this heuristic we aim to improve the quality of the solution in terms of the soft constraint C3. The run time to find a good quality solution for one Philadelphia instance is very short. With the above mentioned PC configuration, the run time is about 5 minutes for an instance with a total demand N of less than 500, about 17 minutes for an instance with N equal to 1000, and about 1 hour for an instance with N equal to 2000. In comparison with some previous work on solving the FAP using different approaches, for example Iterated Local Search [7] by D. Kapsa and J. Kisynski, Tabu Search [8] by J. K. Hao et al. and an Evolutionary Algorithm [4] by R. Dorne et al., the proposed approach has performed well on problems of larger sizes, in shorter run times and with the same or higher solution quality. In particular, the algorithms employed by D. Kapsa and J. Kisynski [7] could not minimize the frequency span.

Table 2. Performance of the CH-Solver on the Philadelphia instances

Instance  Demand vector  R   No. of cells  N     No. of moves  Runtime (secs)  Span  Opt. Span  %
P1        D1             R1  21            481   96200         267             450   426        94%
P'1       D1             R1  21            481   96200         251             454   426        94%
P2        D1             R2  21            481   96200         281             467   426        90%
P3        D2             R1  21            470   94000         216             269   257        95%
P4        D2             R2  21            470   94000         145             268   252        94%
P5        D3             R1  21            420   84000         225             242   239        99%
P6        D3             R2  21            420   84000         120             201   179        87%
P7        D4             R1  21            962   192400        1009            901   855        95%
P8        D1             R3  21            481   96200         435             550   524        95%
P9        D5             R1  21            1924  384800        4131            1802  1713       95%

Number of moves: 200 * N. P'1: instance P1 with preassigned frequencies.
5 Conclusions and Future Work

In this paper we have presented a WSAT-based local search solver which can solve constraint hierarchies over finite domains. We believe that many practical problems can be modeled in such a framework, and it is natural for users to state the importance of various constraints in terms of a hierarchy. In our WSAT-based local search solver, we use a constraint selection scheme that keeps a balance between choosing an unsatisfied constraint
Figure 5. The change of solution quality in solving P1 (span on the y-axis, 426-566, with Min/Max/Avg curves; x-axis: number of moves × 10000); optimal span 426, maximum iterations = 10.000
from the top-most unsatisfied constraint level and choosing it from the rest of the levels. Besides, our generic solver employs an effective local repair, i.e., a move that can make the whole hierarchy less violated. To evaluate the CH-Solver, we have applied it to a difficult benchmark problem, the frequency assignment problem (FAP), expressed as a finite domain constraint hierarchy. Experiments on Philadelphia instances of realistic sizes show that the proposed solver is an efficient heuristic to find good near-optimal solutions. Our main conclusion from solving the FAP is that efficiently solving an FD constraint hierarchy such as the FAP requires not only a good generic solver but also several additional domain-specific heuristics. In future work, to evaluate the applicability and effectiveness of our CH-Solver, we plan to test it on other real-world applications such as the set covering problem, nurse scheduling and sports timetabling.
References
[1] K. I. Aardal, S. P. M. van Hoesel, A. M. C. S. Koster, C. Mannino and A. Sassano, Models and Solution Techniques for Frequency Assignment Problems, Technical Report ZIB-Report 01-40, Konrad-Zuse-Zentrum für Informationstechnik, Berlin, Germany, December 2001.
[2] A. Borning, B. Freeman-Benson, and M. Wilson, Constraint Hierarchies, Lisp and Symbolic Computation, 5:223-270, 1992.
[3] A. Borning, R. Anderson, B. Freeman-Benson, Indigo: A Local Propagation Algorithm for Inequality Constraints, Proc. of ACM Symposium on User Interface Software Technology, Seattle, Washington, USA, November 6-8, 1996, pp. 129-136.
[4] R. Dorne and J. Hao, An Evolutionary Approach for Frequency Assignment in Cellular Radio Networks, Technical Report, LGI2P, EMA-EERIE, Parc Scientifique Georges Besse, France, 1995.
[5] E. Freuder, R. Wallace, Partial Constraint Satisfaction, Artificial Intelligence, Vol. 58, 1992, pp. 21-71.
[6] M. Henz, L. Y. Fong, L. S. Chong, S. X. Ping, J. P. Walser, R. H. Yap, Solving Hierarchical Constraints over Finite Domains, Annals of Mathematics and Artificial Intelligence, 2000.
[7] D. Kapsa, J. Kisynski, Solving the Weighted Maximum Constraint Satisfaction Problem using Dynamic and Iterated Local Search, 2003. Available: http://www.cs.ubc.ca/labs/beta/Div/CPSC532D-WS-03/
[8] J. K. Hao, R. Dorne and P. Galinier, Tabu Search for Frequency Assignment in Mobile Radio Networks, Journal of Heuristics, no. 4, 1998, pp. 47-62.
[9] F. Menezes, P. Barahona, P. Codognet, An Incremental Hierarchical Constraint Solver, Proc. of Principles and Practice of Constraint Programming, April 28-30, 1993, Newport, Rhode Island, pp. 201-210.
[10] B. Selman, H. Kautz, and B. Cohen, Noise Strategies for Local Search, Proc. of the 12th National Conference on Artificial Intelligence (AAAI'94), pp. 337-343, 1994.
Half-Sweep Algebraic Multigrid (HSAMG) Method Applied to Diffusion Equations

J. Sulaiman1, M. Othman2,*, and M. K. Hasan3

1 School of Science and Technology, Universiti Malaysia Sabah, Locked Bag 2073, 88999 Kota Kinabalu, Sabah, Malaysia, [email protected]
2 Department of Communication Technology and Network, University Putra Malaysia, 43400 UPM Serdang, Selangor D.E., Malaysia, [email protected]
3 Department of Industrial Computing, Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, 43600 UKM Bangi, Selangor, Malaysia, [email protected]
Abstract In previous studies, the Half-Sweep Multigrid (HSMG) method has been shown to be very fast compared with the standard multigrid method. This is due to its ability to reduce the computational complexity of the standard method. In this paper, the primary goal is to propose the Half-Sweep Algebraic Multigrid (HSAMG) method using the HSCN finite difference scheme for solving two-dimensional diffusion equations. The formulation of the HSAMG scheme is derived by borrowing the concept of the HSMG method. Results of the numerical experiments conducted show that the HSAMG method is superior to the standard algebraic multigrid method.
1 Introduction

Recently, the concept of the Half-Sweep Multigrid (HSMG) method was initiated by Othman and Abdullah [19] for solving two-dimensional Poisson equations by using the rotated finite difference approximation equation. The concept of this method was developed by applying the concept of the half-sweep iterative method [1]. Consequently, both methods have low computational complexity compared to the full-sweep methods. Furthermore, Othman et al. [20] have proposed performing this method on shared memory multiprocessors, and Ali et al. [2] have also extended this concept of the HSMG by introducing the Nine-Point HSMG method by using the rotated
* The author is also an associate researcher at the Lab of Computational Science and Informatics, Institute of Mathematical Research, University Putra Malaysia.
nine-point finite difference scheme. All these HSMG methods, however, still use the geometric approach. Therefore the standard multigrid and the HSMG methods can also be considered as the Full-Sweep Geometric Multigrid (FSGMG) and the Half-Sweep Geometric Multigrid (HSGMG) methods, respectively. This means that both GMG schemes need to know the positions of all grid points on the finer and coarser grids before implementing one GMG cycle. Apart from this approach, the Algebraic Multigrid (AMG) method, which is not geometrically oriented, has also been discussed by Brandt [5] and Chang et al. [8]. Nevertheless, this AMG method is still categorized as the Full-Sweep Algebraic Multigrid (FSAMG) method. In order to reduce the computational complexity of each linear system involved in implementing one AMG cycle, the concept of the half-sweep iterative method is applied to the AMG method. Here, we introduce a new AMG algorithm called the Half-Sweep Algebraic Multigrid (HSAMG) method. To indicate the effectiveness of the HSAMG method in terms of efficiency and accuracy, let us consider the two-dimensional diffusion equation given by

∂U/∂t = α (∂²U/∂x² + ∂²U/∂y²),  (x, y) ∈ D = [a, b] × [a, b],  0 ≤ t ≤ T     (1)

subject to the initial condition

U(x, y, 0) = g1(x, y),  a ≤ x, y ≤ b,

and the boundary conditions

U(a, y, t) = g2(y, t),  U(b, y, t) = g3(y, t),  a ≤ y ≤ b, 0 ≤ t ≤ T,
U(x, a, t) = g3(x, t),  U(x, b, t) = g4(x, t),  a ≤ x ≤ b, 0 ≤ t ≤ T,

where α is a diffusion parameter. Before describing the formulation of the finite difference approximation equation for the full- and half-sweep iterations over problem (1), we assume that the solution domain of (1) can be uniformly divided into (n + 1) subintervals in the x and y directions and M subintervals in the t direction. The subinterval widths in the x, y and t directions are denoted Δx, Δy and Δt, respectively; they are uniform and defined as

Δx = Δy = (b − a)/m = h,  m = n + 1,
Δt = (T − 0)/M.     (2)
2 The Half-Sweep Finite Element Approximation

Based on Fig. 1, the networks of triangle finite elements need to be built in order to derive the full- and half-sweep triangle finite element approximation equations for problem (1). Following the concept of the half-sweep iteration applied to the finite difference method (Abdullah [1], Othman & Abdullah [19], Sulaiman et al. [22]), each triangle element for the full- and half-sweep cases will involve only the three red node points of type •, as shown in Fig. 2. Therefore, implementations of the full- or half-sweep iteration will also be carried out at the same node points until the iterative convergence test is met. It is obvious that the implementation of the half-sweep iterative method involves only half of all the inner points, as shown in Fig. 1b, compared with the full-sweep iterative method. The approximate solution at the remaining points (the black points) can then be computed directly, see [1, 17, 22, 23, 27].
Figure 1. (a) and (b) show the distribution of uniform node points at any time level t for the full- and half-sweep cases, respectively, at n = 7
By considering the three node points of type • only, the general approximation of the function U(x, y, t) in the form of an interpolation function for an arbitrary triangle element e is given by (Fletcher [9], Lewis & Ward [18], Zienkiewicz [28]):

U[e](x, y, t) = N1(x, y) U1(t) + N2(x, y) U2(t) + N3(x, y) U3(t)     (3)

and the shape functions Nk(x, y), k = 1, 2, 3, can generally be written as

Nk(x, y) = (1 / det A) (ak + bk x + ck y),  k = 1, 2, 3,     (4)

where

det A = x1 (y2 − y3) + x2 (y3 − y1) + x3 (y1 − y2),
Figure 2. (a) and (b) show the networks of triangle elements for the full- and half-sweep cases, respectively, at n = 7
(a1, a2, a3) = (x2 y3 − x3 y2,  x3 y1 − x1 y3,  x1 y2 − x2 y1),
(b1, b2, b3) = (y2 − y3,  y3 − y1,  y1 − y2),
(c1, c2, c3) = (x3 − x2,  x1 − x3,  x2 − x1).

Then, based on Fig. 2, the approximation of the function U(x, y, t) for the full-sweep and half-sweep cases over the entire domain is defined, respectively, as (Vichnevetsky [25]):

U(x, y, t) = Σ_{r=0}^{m} Σ_{s=0}^{m} R_{r,s}(x, y) U_{r,s}(t)     (5)

and

U(x, y, t) = Σ_{r=0,2,4,...}^{m} Σ_{s=0,2,4,...}^{m} R_{r,s}(x, y) U_{r,s}(t) + Σ_{r=1,3,5,...}^{m−1} Σ_{s=1,3,5,...}^{m−1} R_{r,s}(x, y) U_{r,s}(t)     (6)

where R_{r,s}(x, y) is a hat function. Thus, Eqs. (5) and (6) are approximate solutions for problem (1). To derive the full-sweep and half-sweep linear finite element approximation equations for problem (1), this paper proposes the Galerkin finite element scheme. Thus, let the Galerkin residual method (Fletcher [9, 10], Lewis & Ward [18]) be defined as

∫∫_D R_{i,j}(x, y) E(x, y, t) dx dy = 0,  i, j = 0, 1, 2, ..., m     (7)

where E(x, y, t) = ∂U/∂t − α (∂²U/∂x² + ∂²U/∂y²) is the residual function. By applying Green's theorem and using the modified Euler scheme, it can be shown that Eq. (7) generates a linear system for both cases. Generally, both linear systems at any (k + 1) time level can be stated as

A U^{k+1} = f^{k}     (8)
3 The Half-Sweep Algebraic Multigrid Method

As mentioned in the previous section, the multigrid method is one of the most efficient iterative methods for solving systems of linear equations generated from discretizing partial differential equations. Further discussions of the basic concept of the multigrid method can be found in Brandt [3, 4], Hackbusch [13, 15, 16], Wesseling [26], and Briggs [7]. In implementing the parabolic multigrid method, two strategies can be considered to solve problem (1): the equal time level and the coarser time level cases (Hackbusch [14]; Brandt & Greenwald [6]). In this paper, the coarser time level case is implemented in order to solve the linear system in Eq. (8) at any time level. This means that the numerical solution of the parabolic problem is found by solving a sequence of the respective elliptic problems. In this paper, however, we shall restrict our discussion to the Full-Sweep and Half-Sweep Algebraic Multigrid methods using the V-cycle approach. In terms of implementation, both the GMG and the AMG methods use a sequence of different systems of linear equations, defined as

A^{2^k h} U^{2^k h} = f^{2^k h},  k = 0, 1, 2, ..., q.     (9)
To implement an algebraic multigrid V-cycle, two operators are used to transfer values between different linear systems: the restriction and prolongation operators. Generally, the prolongation operator I_{2h}^{h} is used to transfer the values of V^{2h} from the coarser linear system to the values of V^{h} at the finer linear system, and it is defined as

V^{h} = I_{2h}^{h} V^{2h}.     (10)
The stencil form of the nine-point operator I_{2h}^{h} for the full- and half-sweep cases (Othman & Abdullah [19]) can be shown, respectively, as

                 | 1  2  1 |
I_{2h}^{h} = 1/4 | 2  4  2 |     (11)
                 | 1  2  1 |

and

                 |       1       |
                 |   2       2   |
I_{2h}^{h} = 1/4 | 1     4     1 |     (12)
                 |   2       2   |
                 |       1       |
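The full-sweep stencil (11) corresponds to standard bilinear interpolation from the coarse grid to the fine grid. The following NumPy sketch (an illustration, not the authors' code) applies it to a coarse-grid array of values.

```python
import numpy as np

def prolong_full_sweep(v2h):
    """Apply the nine-point prolongation stencil (11): v2h has shape (m+1, m+1);
    the result has shape (2m+1, 2m+1) on the fine grid."""
    m = v2h.shape[0] - 1
    vh = np.zeros((2 * m + 1, 2 * m + 1))
    vh[::2, ::2] = v2h                                        # coincident nodes: copy
    vh[1::2, ::2] = 0.5 * (v2h[:-1, :] + v2h[1:, :])          # midpoints of vertical edges
    vh[::2, 1::2] = 0.5 * (v2h[:, :-1] + v2h[:, 1:])          # midpoints of horizontal edges
    vh[1::2, 1::2] = 0.25 * (v2h[:-1, :-1] + v2h[:-1, 1:]
                             + v2h[1:, :-1] + v2h[1:, 1:])    # cell centres: average of 4
    return vh
```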
The restriction operator is denoted by I_{h}^{2h} and is primarily used to transfer the residual values r^{h} from the finer linear system to the residual values r^{2h} at the coarser linear system, which is expressed as

r^{2h} = I_{h}^{2h} r^{h}.     (13)
The stencil form of the restriction operator can be determined by using the relation between the operators I_{h}^{2h} and I_{2h}^{h}, given by

I_{h}^{2h} = (1/4) I_{2h}^{h}.     (14)
Therefore, the values of r^{2h}_{i,j} for the full- and half-sweep cases can be computed, respectively, as

r^{2h}_{i,j} = (1/16) [ r^{h}_{2i−1,2j−1} + r^{h}_{2i−1,2j+1} + r^{h}_{2i+1,2j−1} + r^{h}_{2i+1,2j+1}
              + 2 ( r^{h}_{2i−1,2j} + r^{h}_{2i+1,2j} + r^{h}_{2i,2j−1} + r^{h}_{2i,2j+1} ) + 4 r^{h}_{2i,2j} ]     (15)

and

r^{2h}_{i,j} = (1/16) [ r^{h}_{2i−2,2j} + r^{h}_{2i+2,2j} + r^{h}_{2i,2j−2} + r^{h}_{2i,2j+2}
              + 2 ( r^{h}_{2i−1,2j−1} + r^{h}_{2i−1,2j+1} + r^{h}_{2i+1,2j−1} + r^{h}_{2i+1,2j+1} ) + 4 r^{h}_{2i,2j} ]     (16)
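A NumPy sketch of the two restriction formulas as reconstructed above; the looping over interior coarse points only (boundaries handled elsewhere) and the array layout are assumptions made for illustration.

```python
import numpy as np

def restrict_full_sweep(rh):
    """Eq. (15): full-weighting restriction of a fine-grid residual rh
    (shape (2m+1, 2m+1)) to the coarse grid (shape (m+1, m+1))."""
    m = (rh.shape[0] - 1) // 2
    r2h = np.zeros((m + 1, m + 1))
    for i in range(1, m):
        for j in range(1, m):
            r2h[i, j] = (rh[2*i-1, 2*j-1] + rh[2*i-1, 2*j+1]
                         + rh[2*i+1, 2*j-1] + rh[2*i+1, 2*j+1]
                         + 2 * (rh[2*i-1, 2*j] + rh[2*i+1, 2*j]
                                + rh[2*i, 2*j-1] + rh[2*i, 2*j+1])
                         + 4 * rh[2*i, 2*j]) / 16.0
    return r2h

def restrict_half_sweep(rh):
    """Eq. (16): rotated (half-sweep) restriction using the more distant
    axial neighbours and the diagonal neighbours."""
    m = (rh.shape[0] - 1) // 2
    r2h = np.zeros((m + 1, m + 1))
    for i in range(1, m):
        for j in range(1, m):
            r2h[i, j] = (rh[2*i-2, 2*j] + rh[2*i+2, 2*j]
                         + rh[2*i, 2*j-2] + rh[2*i, 2*j+2]
                         + 2 * (rh[2*i-1, 2*j-1] + rh[2*i-1, 2*j+1]
                                + rh[2*i+1, 2*j-1] + rh[2*i+1, 2*j+1])
                         + 4 * rh[2*i, 2*j]) / 16.0
    return r2h
```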
There are three important operators involved in both AMG methods, namely the restriction, smoothing and prolongation operators. The combination of these operators establishes an AMG V-cycle process. To facilitate the further discussion of both AMG V-cycles, the symbols used to represent each process are listed in Fig. 3.
Figure 3. (a) and (b) show the AMG V(1,1)-cycle for the three-grid and four-grid cases, respectively (grid levels h, 2h, 4h and 8h). The legend distinguishes the error smoothing process at level Ω^{2^k h} for k ≠ q and for k = q, the restriction operator and the prolongation operator.
Using these symbols, the two AMG structures for the V(η1, η2)-cycle (Vandewalle & Piessens [24]) shown in Fig. 3 are the most popular cycles to be implemented. The parameters η1 and η2 represent the number of error smoothing steps applied to Eq. (9). The general FSAMG and HSAMG schemes based on Fig. 3 may be described as in Algorithm I, see Othman and Abdullah [19] and Stüben and Trottenberg [21].
Algorithm I: FSAMG and HSAMG schemes
i.  If (k = q), compute directly e^{2^q h} = (A^{2^q h})^{−1} r^{2^q h} on Ω^{2^q h}.
ii. If not, perform
    a. Solve A^{h} V^{h} = f^{h} on Ω^{h} to obtain V^{h}.
    b. Compute r^{2h} = I_{h}^{2h} (f^{h} − A^{h} V^{h}).
    c. Implement CGC(k + 1, A^{2^k h}, e^{2^k h}, r^{2^k h}) with e^{2^k h} = 0 on Ω^{2h} to obtain e^{2h}.
    d. Correction: V^{h} = V^{h} + I_{2h}^{h} e^{2h}.
    e. Solve A^{h} V^{h} = f^{h} on Ω^{h} to obtain V^{h}.
Moreover the Gauss-Seidel (GS) iterative method will be used as a smoother for both AMG schemes.
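A hedged Python sketch of the recursive V-cycle behind Algorithm I, with a Gauss-Seidel smoother. The restriction, prolongation and level matrices are passed in through `ops` and `A`, since their concrete form is what distinguishes the FSAMG and HSAMG variants; the interfaces are assumptions, not the authors' implementation.

```python
import numpy as np

def v_cycle(k, A, f, v, ops, q, n_pre=1, n_post=1):
    """One AMG V(n_pre, n_post)-cycle on level k for A[k] v = f.
    ops must provide smooth(A, f, v, sweeps), restrict(r) and prolong(e)."""
    if k == q:                                    # coarsest level: solve directly
        return np.linalg.solve(A[k], f)
    v = ops.smooth(A[k], f, v, n_pre)             # (a) pre-smoothing
    r = ops.restrict(f - A[k] @ v)                # (b) restrict the residual
    e = v_cycle(k + 1, A, r, np.zeros_like(r), ops, q, n_pre, n_post)   # (c) coarse-grid correction
    v = v + ops.prolong(e)                        # (d) prolongate and correct
    return ops.smooth(A[k], f, v, n_post)         # (e) post-smoothing

def gauss_seidel(A, f, v, sweeps):
    """Simple Gauss-Seidel smoother for a dense matrix A (illustrative only)."""
    n = len(f)
    for _ in range(sweeps):
        for i in range(n):
            v[i] = (f[i] - A[i, :i] @ v[:i] - A[i, i+1:] @ v[i+1:]) / A[i, i]
    return v
```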
4 Numerical Experiments

To verify the efficiency of the implementations of the FSAMG and HSAMG schemes, which are based on the approximation Eq. (7), some numerical experiments are conducted to solve the two-dimensional diffusion equation

∂U/∂t = α (∂²U/∂x² + ∂²U/∂y²),  (x, y) ∈ D = [0, 1] × [0, 1],  0 ≤ t ≤ 1.0.     (17)

The initial and boundary conditions and the exact solution of problem (17) are given by

U(x, y, t) = e^{−π² t / 2} sin(πx/2) sin(πy/2),  0 ≤ x, y ≤ 1,  0 ≤ t ≤ 1.0.     (18)
All results of the numerical experiments obtained from the implementations of the FSAMG and HSAMG methods are recorded in Table 1. In the implementations mentioned above, the convergence test used the tolerance error ε = 10^{−10}. Fig. 4 shows the execution time against the mesh size.
5 Conclusion

As shown in the second section, both the full- and half-sweep triangle element approximation equations based on the Galerkin scheme generate a system of linear equations of the form of Eq. (8). According to the numerical results recorded in Table 1, the numbers of iterations at all different
Figure 4. Execution time (seconds) versus mesh size of the FSAMG and HSAMG methods
mesh sizes for the FSAMG and HSAMG methods are the same, namely 9 and 10. This indicates that the number of iterations of both AMG methods is not influenced by the mesh size. This is because, the finer the mesh used, the more levels are involved in improving the approximate values of V^{h} at level Ω^{h} ([11, 12]). In terms of the execution time, as shown in Fig. 4, the HSAMG method is slightly faster than the FSAMG method. As mentioned in the first section, this is because the computational complexity of the HSAMG method is 50% less than that of the FSAMG method. In terms of accuracy, the approximate solutions of the HSAMG method are in good agreement with those of the FSAMG method. Overall, the results on the effectiveness of the HSAMG method are similar to the results obtained with the HSGMG method, see [19, 20].

Table 1. Number of iterations, execution time (in seconds) and maximum absolute errors of the FSAMG and HSAMG methods

        FSAMG method                        HSAMG method
h^{-1}  V(1,1)-cycles  Time   Max Error     V(1,1)-cycles  Time   Max Error
32      9              0.23   8.8949e-7     9              0.11   1.3641e-5
64      9              1.02   4.7617e-7     9              0.50   4.5053e-5
128     10             4.22   3.7818e-7     10             2.32   2.2331e-5
256     10             18.69  3.5490e-7     10             15.86  8.8928e-6
References 1. Abdullah, A. R.: The Four Point Explicit Decoupled Group (EDG) Method: A Fast Poisson Solver. International Journal Computer Mathematics, 38, 61–70 (1991) 2. Ali, N. H. M, Yunus, Y., Othman, M.: A New Nine Point Multigrid V-Cycle Algorithm. Sains Malaysiana, 31, 135–147 (2002) 3. Brandt, A.: Multi-Level Adaptive Technique (MLAT) for fast numerical solution to boundary value problems. Proceeding of the Third International Conference on Numerical Methods in Fluid Mechanics, 18, 82–89 (1973) 4. Brandt, A.: Guide to multigrid development. In. Hackbusch, W, Trottenberg, U. Multigrid Methods. Lecture Notes in Mathematics, 960, 312–320 (1982) 5. Brandt, A.: Algebraic multigrid theory: The symmetric case. Applied Mathematics and Computation, 19, 23–56 (1986) 6. Brandt, A., Greenwald, J.: Parabolic multigrid revisited. International Series of Numerical Mathematics, 98, 143–154 (1991) 7. Briggs, W. L.: A multigrid tutorial. Pennsylvania: Lancaster Press. (1987) 8. Chang, Q., Wong, Y. S., Fu, H.: On the Algebraic Multigrid method. Journal of Computational Physics, 125, 279-292 (1996) 9. Fletcher, C. A. J.: The Galerkin method: An introduction. In. Noye, J. (pnyt.). Numerical Simulation of Fluid Motion, North-Holland Publishing Company, Amsterdam 113–170 (1978) 10. Fletcher, C. A. J.: Computational Galerkin method. Springer Series in Computational Physics. Springer-Verlag, New York (1984) 11. Gupta, M. M., Kouatchou, J., Zhang, J.: A Compact Multigrid solver for convection-diffusion equations. Journal of Computational Physics, 132(1), 166– 172 (1997) 12. Gupta, M. M., Kouatchou, J., Zhang, J.: Comparison of second and fourth order discretizations for multigrid Poisson solvers. Journal of Computational Physics, 132(2), 226–232 (1997) 13. Hackbusch, W.: Introduction to Multi-grid methods for the numerical solution of boundary value problems. In. Essers, J.A. Computational Methods for Turbulent, Transonic and Viscous Flows: Springer-Verlag, Berlin, 45–92 (1983) 14. Hackbusch, W.: Parabolic Multigrid. Computing Methods in Applied Sciences and Engineering, VI. North-Holland, 189–197 (1984) 15. Hackbusch, W.: Multi-grid methods and applications. Springer Series in Computational Mathematics 4. Springer-Verlag, Berlin (1985) 16. Hackbusch, W.: Iterative solution of large sparse systems of equations. SpringerVerlag, New York (1995) 17. Ibrahim, A., Abdullah, A. R.: Solving the two-dimensional diffusion equation by the four point explicit decoupled group (EDG) iterative method. International Journal Computer Mathematics, 58, 253–256 (1995) 18. Lewis, P. E., Ward, J. P.: The Finite Element Method: Principles and Applications. Addison-Wesley Publishing Company, Wokingham (1991) 19. Othman, M., Abdullah, A. R.: The Halfsweeps Multigrid method as a fast Multigrid Poisson solver. International Journal Computer Mathematics, 69, 319–329 (1998) 20. Othman, M., Sulaiman, J., Abdullah, A. R.: A parallel Halfsweep Multigrid algorithm on the Shared Memory Multiprocessors. Malaysian Journal of Computer Science, 13(2), 1–6 (2000)
21. Stüben, K., Trottenberg, U.: Multigrid methods: Fundamental algorithms, model problem analysis and applications. In: Hackbusch, W., Trottenberg, U. (eds.) Multigrid Methods. Lecture Notes in Mathematics, 960, 1–176 (1982)
22. Sulaiman, J., Hasan, M. K., Othman, M.: The Half-Sweep Iterative Alternating Decomposition Explicit (HSIADE) method for diffusion equations. LNCS 3314, Springer-Verlag, Berlin, 57–63 (2004)
23. Sulaiman, J., Othman, M., Hasan, M. K.: Quarter-Sweep Iterative Alternating Decomposition Explicit algorithm applied to diffusion equations. International Journal of Computer Mathematics, 81, 1559–1565 (2004)
24. Vandewalle, S., Piessens, R.: Multigrid waveform relaxation for solving parabolic partial differential equations. International Series of Numerical Mathematics, 98, 377–388 (1991)
25. Vichnevetsky, R.: Computer Methods for Partial Differential Equations, Vol. I. Prentice-Hall, New Jersey (1981)
26. Wesseling, P.: An Introduction to Multigrid Methods. John Wiley and Sons, New York (1992)
27. Yousif, W. S., Evans, D. J.: Explicit De-coupled Group Iterative Methods and Their Parallel Implementations. Parallel Algorithms and Applications, 7, 53–71 (1995)
28. Zienkiewicz, O. C.: Why finite elements? In: Gallagher, R. H., Oden, J. T., Taylor, C., Zienkiewicz, O. C. (eds.) Finite Elements in Fluids, Vol. 1, John Wiley & Sons, London, 1–23 (1975)
Solving City Bus Scheduling Problems in Bangkok by Eligen-Algorithm

Chotiros Surapholchai1, Gerhard Reinelt2, and Hans Georg Bock3

1 Department of Mathematics, Chulalongkorn University, Thailand, [email protected], [email protected]
2 Institute of Computer Science, University of Heidelberg, Germany, [email protected]
3 Interdisciplinary Center for Scientific Computing (IWR), University of Heidelberg, Germany, [email protected]
Abstract The modeling of city bus scheduling problems is considered in order to optimize the number of buses and their scheduling in the city. The vehicle scheduling problem (VSP) can be solved by heuristic algorithms. The disadvantage of these algorithms is that the solution quality decreases as the number of depots increases. Therefore, in this paper we develop the Eligen-algorithm, which uses the techniques of column elimination and column generation, for solving multiple-depot vehicle scheduling problems (MDVSPs). The advantage of this algorithm is that the solution quality improves as the number of depots grows. Moreover, this algorithm is faster and gives better solutions than the nearest bus-stop heuristic algorithm (NB) and the joined nearest bus-stop heuristic algorithm (JNB) which we developed before. As an example problem instance, we use the model of the city bus scheduling problem in Bangkok, Thailand.
1 Introduction At present, metropolitan cities, defined as cities with populations of at least one million people are very crowded. A major problem, which menaces the economic situation and reduces health and sanitation of people in these big cities, is the traffic jam problem. One effect of this problem is that many people are unable to reach appointments on time. This is due to the lack of good transportation schedules to the destination and the lack of information concerning how to get there in the shortest time. Moreover, one more reason for the traffic jam problem is the increase in the number of private cars on the main streets. Some related governments try to convince people to use public bus transport instead of private cars in order to reduce the number of private
cars on the main streets. The optimization of the number of buses should help ease the traffic jam problem. Therefore, in this paper we consider the modeling of city bus scheduling problems. The aim of this research is to optimize the number of buses; hence, we try to solve this problem to obtain the optimal number of buses used. Two problem cases modified from [8] are considered: the Single-Depot Vehicle Scheduling Problem (SDVSP) and the Multiple-Depot Vehicle Scheduling Problem (MDVSP). The city bus problems, which are real-world problems, are high-dimensional integer programming problems, and in the multiple-depot case they cannot be solved in polynomial time (they are NP-hard). Therefore, problem-specific algorithms are considered for their solution. Bangkok, the capital city of Thailand, is one of these metropolitan cities; it is very crowded, with a population of approximately ten million people. Thus, for the example problem instance, we use the model of the city bus scheduling problem in Bangkok. One big government department that organizes the city buses in Bangkok is the Bangkok Mass Transit Authority (BMTA). Our methods can reduce the capital cost for this department by reducing the number of buses. It is expected that the results help reduce the traffic jam problem in Bangkok.
2 Literature Review The vehicle scheduling problems start with Bodin and Golden [3] who gave a “Classification in Vehicle Routing and Scheduling”. They defined the term vehicle route as an ordered sequence of pickup or delivery points traversed by a vehicle, starting and ending at a depot. They started with a simple version of a SDVSP having the task to minimize the fleet size (or capital costs) only. This problem was solved using a minimum-cost flow algorithm. Then Assad, Ball, Bodin, and Golden [1] gave a comprehensive overview of vehicle routing and scheduling approaches. They described heuristic algorithms like a concurrent scheduler, an interchange method, a cluster first schedule second method, and a schedule first - cluster second method, but no exact method to solve the MDVSP. For more methods to solve these problems, Carraresi and Gallo [4] gave a review on solving vehicle and crew scheduling problems in public mass transit. The authors proposed the two heuristics cluster first - schedule second and schedule first - cluster second and gave a note on a Lagrangian relaxation approach to solve the MDVSP. While Dell’Amico, Fischetti, and Toth [6] presented a heuristic algorithm on shortest path to solve a small MDVSP with only 1,000 serviced trips. Following years, Desrosiers, Dumas, Solomon, and Soumis [5] gave a survey on various vehicle scheduling and routing problems. Furthermore, Daduna and Paix˜ ao [7] formulated the MDVSP as a multicommodity flow problem,
but did not consider depot groups. Finally, Löbel [9] developed the schedule-cluster-reschedule heuristic algorithm (SCR) to solve the MDVSP. Although these methods can solve large problems optimally, it is not clear how much of the indicated savings in vehicles and operational costs can be obtained in practice. Hence, we consider these problems and developed new methods to solve them.
3 Mathematical Formulation

3.1 Mathematical Description as Digraph

A depot is a nonempty set of vehicles. Let D be the set of all depots. For each d ∈ D, let d^+ be a start depot and d^- an end depot. We will mention timetabled trips, serviced trips, passenger trips or t-trips; the terms are used interchangeably. Such trips can carry passengers and are denoted by T; every t ∈ T has a first bus-stop t^+ and a last bus-stop t^-. The set of timetabled trips, denoted by A_d^{t-trip}, is {(t^+, t^-) | t ∈ T_d}. We will also consider trips not carrying any passengers. Firstly, a pull-out trip is a trip connecting a start depot d^+ with a first bus-stop t^+; the set of pull-out trips, denoted by A_d^{pull-out}, is {(d^+, t^+) | t ∈ T_d}. Secondly, a pull-in trip is a trip connecting a last bus-stop t^- with an end depot d^-; the set of pull-in trips, denoted by A_d^{pull-in}, is {(t^-, d^-) | t ∈ T_d}. Thirdly, a dead-head trip or d-trip is a trip connecting a last bus-stop t_l^- with a first bus-stop t_f^+; the set of dead-head trips is A_d^{d-trip} := {(t_l^-, t_f^+) | t_f, t_l ∈ T_d}. Lastly, all pull-out trips, pull-in trips and dead-head trips are unloaded trips, denoted by A_d^{u-trip} := A_d^{pull-out} ∪ A_d^{pull-in} ∪ A_d^{d-trip}.
Let (d^-, d^+) denote a backward arc from the depot's end to its start point. The backward arc represents the vehicle return and is used to indicate depot capacities. Let
V_d := {d^+, d^-} ∪ T_d^+ ∪ T_d^-  and  A_d := A_d^{t-trip} ∪ A_d^{u-trip} ∪ {(d^-, d^+)}

denote the node and arc set, respectively, of the digraph D_d := (V_d, A_d). We define a large digraph D := (V, A) with node set V := D^+ ∪ D^- ∪ T^+ ∪ T^- and arc set A := ⋃_{d∈D} A_d. For the single-depot case, the digraphs D_d := (V_d, A_d), with V_d := {d^+, d^-} ∪ T_d^+ ∪ T_d^- and A_d := A_d^{t-trip} ∪ A_d^{u-trip} ∪ {(d^-, d^+)}, are given. For the multiple-depot case, the digraph D := (V, A), with V := D^+ ∪ D^- ∪ T^+ ∪ T^- and A := ⋃_{d∈D} A_d, is given as well.
3.2 Mathematical Model

The city bus scheduling problems can also be formulated through a graph model. Two problem cases modified from [8] are considered:
• the Single-Depot Vehicle Scheduling Problem (SDVSP),
• the Multiple-Depot Vehicle Scheduling Problem (MDVSP).
Single-Depot Vehicle Scheduling Problems (SDVSP)

The Single-Depot Vehicle Scheduling Problem (SDVSP) considers one depot. Consider an SDVSP assuming D := {d} and G(t) = D for all t ∈ T. The ILP model of the SDVSP is:

min  Σ_{a ∈ A^{u-trip}} c_a x_a

subject to
x_{(t^+, t^-)} = 1,                    ∀t ∈ T,
x(δ^+(t^-)) − x_{(t^+, t^-)} = 0,      ∀t ∈ T,
x_{(t^+, t^-)} − x(δ^-(t^+)) = 0,      ∀t ∈ T,
x(δ^+(d^+)) − x_{(d^-, d^+)} = 0,
x_{(d^-, d^+)} − x(δ^-(d^-)) = 0,
λ ≤ x_{(d^-, d^+)} ≤ κ,
0 ≤ x_a ≤ 1,                           ∀a ∈ A^{t-trip} ∪ A^{u-trip},
x integer.
Figure 1. Illustration of the SDVSP: trip nodes a^+, b^+, c^+ (set T^+) and a^-, b^-, c^- (set T^-), depot nodes d^+ and d^-, connected by t-trip, d-trip, pull-out and pull-in arcs and the backward arc (d^-, d^+)
The SDVSP graph model shown in Figure 1 represents the SDVSP, which can be solved in polynomial time as a network flow problem. Various polynomial and pseudo-polynomial time solution approaches for minimum-cost flow problems exist [2]. Hence, the SDVSP can be solved by the network simplex algorithm.
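To illustrate the remark that the SDVSP reduces to a minimum-cost flow problem, here is a hedged Python sketch using the NetworkX library (the paper's actual implementation uses CPLEX and is different). Serving a t-trip is encoded by requiring one unit of flow to enter t^+ and one unit to leave t^-; the per-vehicle cost on the backward arc and integral costs are assumptions of this particular encoding, and the lower bound λ is omitted for simplicity.

```python
import networkx as nx

def solve_sdvsp(trips, deadheads, pull_out_cost, pull_in_cost, vehicle_cost, kappa):
    """trips: iterable of trip ids; deadheads: dict {(t_l, t_f): cost} of feasible
    connections. Returns the min-cost flow dictionary of NetworkX."""
    G = nx.DiGraph()
    G.add_node("d+", demand=0)
    G.add_node("d-", demand=0)
    for t in trips:
        # one unit must arrive at t+ (demand +1) and one unit must leave t- (demand -1)
        G.add_node(f"{t}+", demand=1)
        G.add_node(f"{t}-", demand=-1)
        G.add_edge("d+", f"{t}+", weight=pull_out_cost[t], capacity=1)   # pull-out trip
        G.add_edge(f"{t}-", "d-", weight=pull_in_cost[t], capacity=1)    # pull-in trip
    for (tl, tf), c in deadheads.items():                                # dead-head trips
        G.add_edge(f"{tl}-", f"{tf}+", weight=c, capacity=1)
    # backward arc: its flow equals the number of buses used, bounded by kappa
    G.add_edge("d-", "d+", weight=vehicle_cost, capacity=kappa)
    return nx.min_cost_flow(G)
```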
Multiple-Depot Vehicle Scheduling Problems (MDVSP)

The second model, the Multiple-Depot Vehicle Scheduling Problem (MDVSP), considers multiple depots. The ILP model of the MDVSP is:

min  Σ_{d ∈ D} Σ_{a ∈ A_d^{u-trip}} c_a^d x_a^d

subject to
x(δ^+(t)) = 1,                         ∀t ∈ T,
x^d(δ^+(t)) − x^d(δ^-(t)) = 0,         ∀t ∈ T_d, ∀d ∈ D,
λ_d ≤ x^d(δ^+(d)) ≤ κ_d,               ∀d ∈ D,
x ≥ 0,  x ∈ {0, 1}^A.
Figure 2. Illustration of the MDVSP: trip nodes a^±, b^±, c^±, d^±, e^± (sets T^+ and T^-) and two depots with nodes r^+, r^-, g^+, g^-, connected by t-trip, d-trip, pull-out and pull-in arcs and the backward arcs
The MDVSP graph model shown in Figure 2 represents the MDVSP, which is NP-hard. The decision version of the MDVSP is NP-complete in the strong sense, which immediately follows from the fact that the given transformation generates only the values 0 and 1 for the variables x_a^d for all trips, except for the backward arcs.
4 Algorithms solving MDVSP This section presents the problem-specific heuristic algorithms, the nearest bus-stop heuristic algorithm (NB) and the joined nearest bus-stop heuristic
algorithm (JNB), and the specially proposed method, the Eligen-algorithm, to solve the city bus scheduling problems in metropolitan cities. Both heuristic algorithms are used to solve the SDVSP and MDVSP in a preprocessing step. After that, the CPLEX Callable Library version 8.1 is used to solve the linear programming problems. The Eligen-algorithm is especially helpful for MDVSP instances. We describe the algorithms in more detail in the following subsections.

4.1 The Nearest Bus-stop Heuristic Algorithm (NB)

NB solves the city bus scheduling problems by finding the nearest bus-stop of each trip. That means NB finds the nearest next bus-stop of each t-trip and creates d-trips accordingly. In general, after a bus finishes serving a t-trip, it returns to the end depot. But NB creates an arc between the last bus-stop node and the succeeding first bus-stop node. So the bus need not return to the end depot, but goes straight to the succeeding first bus-stop to start a new t-trip. The important step of NB is to find the succeeding first bus-stop node which has the least weight. This weight depends on the idle time of the bus and the driver, and on the distance between the last bus-stop and the succeeding first bus-stop.

4.2 The Joined Nearest Bus-stop Heuristic Algorithm (JNB)

JNB solves the city bus scheduling problems by joining the nearest bus-stops of each trip together. Thus, JNB is similar to NB in finding the nearest next bus-stops of each trip. But after that, JNB links the arc from the last bus-stop node and the arc to the succeeding first bus-stop node together. This makes JNB obtain better solutions faster than NB.

4.3 Eligen-Algorithm

The Eligen-algorithm (Eligen) helps to solve the multiple-depot case (MDVSP). It combines the combinatorial optimization techniques of column generation and column elimination. The advantage of this algorithm is that its solution quality increases more than that of the heuristic algorithms as the number of depots grows. The Eligen-algorithm is described as follows:
Step 1: Solve the MDVSP for every depot and then take the optimal solutions as the initial set of columns.
Step 2: Compute an initial lower bound for each set of columns by Lagrangian relaxation.
Step 3: Delete columns with positive reduced cost.
Step 4: Generate columns with negative reduced cost, which is the reduced cost of a shortest path.
Step 5: Compute the gap between the current solution value and the estimated lower bound over all subproblems. If this gap is small enough, go to Step 6; otherwise, return to Step 2.
Step 6: Solve the Lagrangian relaxation again with the set of columns generated in Step 4; the optimal solution of this subproblem gives the number of buses.
The Eligen-algorithm yields both better solutions and shorter processing times than NB and JNB. A simplified sketch of the column elimination and generation loop is given below.
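The following Python sketch is our own illustration, not the authors' implementation. It mimics the elimination and generation loop on a toy set-partitioning instance: the trip set, the duty pool with its costs, the fixed subgradient step, and the fixed iteration count used in place of the gap test of Step 5 are all assumptions, and the real pricing step would solve a shortest-path problem per depot.

TRIPS = ["t1", "t2", "t3", "t4"]

# Duty pool: each column is a feasible bus duty (set of trips) with a cost.
# A real implementation would generate these from the MDVSP network.
POOL = {
    frozenset({"t1"}): 10, frozenset({"t2"}): 10,
    frozenset({"t3"}): 10, frozenset({"t4"}): 10,
    frozenset({"t1", "t2"}): 14, frozenset({"t3", "t4"}): 15,
    frozenset({"t1", "t3"}): 18, frozenset({"t2", "t4"}): 17,
}

def reduced_cost(duty, cost, lam):
    return cost - sum(lam[t] for t in duty)

def lagrangian_bound(columns, lam):
    """Lower bound of the relaxation: take every column with negative reduced cost."""
    chosen = {d for d, c in columns.items() if reduced_cost(d, c, lam) < 0}
    bound = sum(reduced_cost(d, columns[d], lam) for d in chosen) + sum(lam.values())
    return bound, chosen

# Step 1: start from the single-depot solutions; here simply the single-trip duties.
columns = {d: c for d, c in POOL.items() if len(d) == 1}
lam = {t: 0.0 for t in TRIPS}

for _ in range(50):
    # Step 2: Lagrangian relaxation and subgradient update of the multipliers.
    bound, chosen = lagrangian_bound(columns, lam)
    for t in TRIPS:
        coverage = sum(1 for d in chosen if t in d)
        lam[t] += 2.0 * (1 - coverage)            # simple fixed step length

    # Step 3: eliminate columns with positive reduced cost (singletons are kept
    # so that a feasible cover always remains available).
    columns = {d: c for d, c in columns.items()
               if reduced_cost(d, c, lam) <= 0 or len(d) == 1}

    # Step 4: generate pool columns with negative reduced cost.
    for d, c in POOL.items():
        if d not in columns and reduced_cost(d, c, lam) < 0:
            columns[d] = c

print("lower bound:", round(bound, 2))
print("columns kept:", [sorted(d) for d in columns])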
5 Computational Results
In this section we present computational results for the city bus scheduling problem in Bangkok, Thailand, which serves as a problem instance of city bus scheduling in metropolitan cities. The timetabled schedules of the raw data are written in the form of the graph model and then transformed into an integer programming model (the SDVSP or MDVSP case). This problem is then solved by the methods of the previous section to obtain the optimal number of buses together with the corresponding bus schedules. Here only the results for the optimal number of buses are reported; the complete results will be presented elsewhere.

Table 1. Comparison of the number of buses and serviced trips for the multiple-depot case after running our algorithms

Depot zone   Buses (real)   NB     JNB    Eligen   Serviced trips   NB      JNB     Eligen
z9           4240           3933   3833   3733     21504            21504   11963   10752
Table 2. Comparison of the numbers of constraints and variables for the multiple-depot case after running our algorithms

Depot zone   Constraints (rows)   NB      JNB     Eligen   Variables (columns)   NB       JNB      Eligen
z9           43010                32258   23928   21506    981233                522873   490617   490617
Tables 1-3 compare the numbers of buses, serviced trips, constraints and variables of the data, as well as the average percentage reduction of buses. For depot zone 9, we combine the input data of the bus schedules of 100 bus lines with 21504 serviced trips, 43010 constraints, and 981233 variables. The headway between successive buses is 5 minutes during rush hours and 10 minutes during normal time. All calculations were performed on a 2.8 GHz Pentium 4 with 2 GB RAM. The Eligen-algorithm not only gives the best solution but is also the fastest algorithm. JNB comes second in both solution quality and speed, with NB coming last.
Table 3. Average percentage reduction of buses at depot zone z9 using each algorithm

Depot zone   NB     JNB    Eligen
z9           7.24   9.60   11.96
6 Concluding Remarks
In this paper we have presented a new method, the Eligen-algorithm, for solving the SDVSP and the MDVSP, in particular the city bus scheduling problem in Bangkok as an instance of city bus scheduling in metropolitan cities. The algorithm was also compared with the NB and JNB heuristics that we developed earlier. The computational results show that the reduced numbers of buses, with their corresponding bus schedules, can save capital cost for the BMTA, the large government agency that operates the public city buses in Bangkok. However, it is not yet clear whether this algorithm can reduce the number of vehicles for all MDVSP variants. A further topic of study is to find faster algorithms for the MDVSP.
References
1. A. Assad, M. Ball, L. Bodin, and B. Golden. Routing and scheduling of vehicles and crews. Computers & Operations Research, 10:63-211, 1983.
2. R. K. Ahuja, T. L. Magnanti, and J. B. Orlin. Network Flows: Theory, Algorithms and Applications. Prentice Hall, Englewood Cliffs, New Jersey, USA, 1993.
3. L. Bodin and B. Golden. Classification in vehicle routing and scheduling. Networks, 11:97-108, 1981.
4. P. Carraresi and G. Gallo. Network models for vehicle and crew scheduling. European Journal of Operational Research, 16:139-151, 1984.
5. J. Desrosiers, Y. Dumas, M. M. Solomon, and F. Soumis. Time constrained routing and scheduling. In M. Ball, T. Magnanti, C. L. Monma, and G. L. Nemhauser, editors, Handbook in Operations Research and Management Science, chapter 2, pages 35-39. Elsevier Science B.V., North Holland, Amsterdam, The Netherlands, 1995.
6. M. Dell'Amico, M. Fischetti, and P. Toth. Heuristic algorithms for the multiple depot vehicle scheduling problem. Management Science, 39(1):115-125, 1993.
7. J. R. Daduna and J. M. P. Paixão. Vehicle scheduling for public mass transit: an overview. In J. R. Daduna, I. Branco, and J. M. P. Paixão, editors, Computer-Aided Transit Scheduling, Lecture Notes in Economics and Mathematical Systems. Springer Verlag, 1995.
8. A. Löbel. Optimal Vehicle Scheduling in Public Transit. PhD thesis, Technische Universität Berlin, Germany, 1997.
9. A. Löbel. Solving large-scale multiple-depot vehicle scheduling problems. In N. H. M. Wilson, editor, Computer-Aided Transit Scheduling, pages 193-200. Springer Verlag, Germany, 1999.
Partitioning for High Performance of Predicting Dynamical Behavior of Color Diffusion in Water Using a 2-D Tightly Coupled Neural Cellular Network
A. Suratanee, K. Na Nakornphanom, K. Plaimas, and C. Lursinsap
The Advanced Virtual and Intelligent Computing Center (AVIC), Department of Mathematics, Faculty of Science, Chulalongkorn University, Bangkok, Thailand, 10330
[email protected], [email protected], [email protected], [email protected]
Abstract A 2-D tightly coupled neural cellular network is developed to simulate the diffusion characteristics of colored liquid dropped onto a water surface by generating its own diffusion image. To learn the diffusion characteristics, a large training set was divided into data subsets by extracting the significant feature patterns of the diffusion, and the neural cellular networks were trained individually and simultaneously. Using this technique to reduce the training time, increase the performance, and facilitate the recognition of large data sets, several sub-optimal neural networks were developed to replace a single network. The partitioning of the data achieved a speedup of 17.9 for 12 networks and 605,267 data patterns. The accuracy of the simulated behaviour is more than 90 percent compared with the actual event.
1 Introduction
The study of predicting the dynamical behavior of color diffusion in water was introduced in [3]. That study presented a new approach to investigating this behavior using neural networks. Based on the actual behavior of the natural phenomenon, the authors used a 2-D tightly coupled neural cellular network to model the diffusion characteristics and to animate the visualization afterwards. However, the performance of the prediction depends essentially on the training set. A large training set may cover all important features, but it may make training too time consuming or even render the neural network non-applicable. To reduce the training time, enhance the performance, and facilitate the recognition of large data sets, this study adopts the partitioning approach of [4] to separate a large data set into small subsets by extracting
the significant feature patterns of the diffusion and simultaneously training the individual cellular networks. The rest of the paper is organized as follows. Section 2 discusses the 2-D tightly coupled neural cellular network. Section 3 presents the parallel training approach and the network architecture. Section 4 gives the experimental results. Section 5 concludes the paper.
2 The 2-D Tightly Coupled Neural Cellular Network
As described in [3], the predicted diffusion of a surface disturbance is generated by pouring an amount of liquid onto the surface of water. The liquid is colored and slightly heavier than water. The spontaneous spreading of this liquid generates a two-dimensional characteristic contour of color on the water surface, which can be observed visually. The entire water surface can be captured on film at any instant of time, so at each time step the entire behavior is viewed as an image. The color intensity of each pixel corresponds to the intensity of the color of the poured liquid at the coordinates (i, j) of that pixel; the intensity is assigned a value ranging from 0 to 255. Let I(t)_{i,j} be the intensity of pixel (i, j) at time t. Based on the concept of a Markov process, the predicted value of I(t)_{i,j} is computed from the values of the neighboring pixels and of the current pixel at time t − 1. In this paper, the neighboring pixels are confined to a square of size 3 × 3. Our assumption is that the event occurring at pixel (i, j) is a consequence of the events at its neighboring pixels. For a square of size 3 × 3, the relation between I(t)_{i,j} and its neighboring pixels is
I(t)_{i,j} = f(I(t−1)_{i−1,j−1}, I(t−1)_{i−1,j}, I(t−1)_{i−1,j+1}, I(t−1)_{i,j−1}, I(t−1)_{i,j}, I(t−1)_{i,j+1}, I(t−1)_{i+1,j−1}, I(t−1)_{i+1,j}, I(t−1)_{i+1,j+1})
Generally, this assumption can be extended to the situation where the event at the current pixel is affected by its square neighborhood of size a × a; see [3] for more details. This prediction problem can be transformed into a functional approximation problem for a supervised neural network. Hence, the intensity of pixel (i, j) can be predicted by a neural network, and the entire image can be captured by a set of neural networks arranged in a 2-D cellular fashion. Each cell corresponds to one pixel in the image, and the value of cell (i, j) is computed by a neural network. Figure 1 shows our
proposed 2D tightly coupled neural network. Every cell has the same neural structure and synaptic weights. The input vector of each network consists of (I(t−1)_{i−1,j−1}, I(t−1)_{i−1,j}, I(t−1)_{i−1,j+1}, I(t−1)_{i,j−1}, I(t−1)_{i,j}, I(t−1)_{i,j+1}, I(t−1)_{i+1,j−1}, I(t−1)_{i+1,j}, I(t−1)_{i+1,j+1}) and the corresponding target is I(t)_{i,j}. Based on this concept, only one neural network needs to be trained, and it is used to predict the intensity of every pixel. However, collecting a pattern for every pixel produces too large a training set to be learned by a single neural network. Therefore, we use the partitioning approach to address this problem.
Figure 1. The structure of our proposed 2D tightly coupled cellular neural network
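The following short sketch (assumed frame sizes and random toy images, not the authors' code) shows how the training patterns of this cellular model can be assembled: each input is the 3 × 3 neighbourhood of a pixel at time t − 1 and the target is the intensity of the same pixel at time t.

import numpy as np

def neighborhood_patterns(frames):
    """frames: array of shape (T, H, W) with intensities in [0, 255]."""
    inputs, targets = [], []
    T, H, W = frames.shape
    for t in range(1, T):
        prev, curr = frames[t - 1], frames[t]
        for i in range(1, H - 1):
            for j in range(1, W - 1):
                inputs.append(prev[i - 1:i + 2, j - 1:j + 2].ravel())  # 9 values
                targets.append(curr[i, j])
    return np.array(inputs), np.array(targets)

rng = np.random.default_rng(0)
frames = rng.integers(0, 256, size=(3, 8, 8))      # toy stand-in for 100x100 images
X, y = neighborhood_patterns(frames)
print(X.shape, y.shape)                            # (72, 9) and (72,)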
3 Neural Network & Parallel Training Concept
Artificial neural networks with backpropagation [5] are widely used as classifiers and recognizers. They successfully perform a variety of input-output mapping tasks for recognition, generalization, and classification. To obtain fast training times [1, 2], the data set is partitioned into several smaller data sets, each trained with an individual network in parallel. Each input vector x can be viewed as a point in a multi-dimensional space; the value of feature x_d is the location in the d-th dimension. Consider the example shown in Figure 2, where all feature vectors lie in a 3-dimensional space with three basic axes, the X-axis, Y-axis, and Z-axis. Suppose that the Z-axis is the key dimension. All feature vectors are then partitioned into three groups based on their values in this dimension. Each partitioned group in Figure 2 is denoted by a block sliced along the Z-axis and is trained separately by an individual neural network.

3.1 Partitioning Training Feature Vectors
The partitioning process consists of three main steps as follows.
Figure 2. Suppose feature vectors have three dimensions and the third feature in Z-axis is the key feature. Therefore, all feature vectors are partitioned into three groups denoted by three blocks. Each group of feature vectors is trained by an individual neural network
• First, we consider all information in the training set and determine an important interval. The feature values of all features of all input vectors form the information used to teach the neural network, but this set may be too large. Therefore, the range of feature values is divided into equally wide intervals. The interval that contains the largest number of feature values, over all features, is called the important interval.
• After selecting the important interval, the next step is to select a key feature of the training set. The key feature is the marker used to partition the input vectors into groups. For each feature column we count the number of input vectors whose value in that column lies in the important interval, and the feature column with the minimum count is selected as the key feature. Under this choice, the values of the key feature are expected to be distributed more evenly over the intervals.
• Finally, all input vectors are partitioned into subgroups with respect to the key feature, one subgroup per sub-network, instead of training one non-applicable network. The average number of feature vectors in each group is computed as
avg = (total number of feature vectors) / (number of networks).
The value of avg is only a guideline for partitioning the feature vectors into blocks; the size of some groups may be smaller or larger than avg. The feature vectors are gradually partitioned into groups using the values of the key feature column and the value of avg. See the example in Figure 3 and the sketch after this list.
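A compact illustration of the three steps is given below. This is our own sketch; the equal-width binning, the greedy grouping by sorted key-feature values, and the toy data in the range of the Figure 3 example are assumptions.

import numpy as np

def partition(vectors, num_networks, num_intervals=100):
    vectors = np.asarray(vectors, dtype=float)
    lo, hi = vectors.min(), vectors.max()
    edges = np.linspace(lo, hi, num_intervals + 1)

    # Step 1: the important interval is the one holding the most feature values.
    counts, _ = np.histogram(vectors.ravel(), bins=edges)
    k = counts.argmax()
    in_interval = (vectors >= edges[k]) & (vectors < edges[k + 1])

    # Step 2: the key feature is the column with the fewest vectors whose
    # value falls inside the important interval.
    key = in_interval.sum(axis=0).argmin()

    # Step 3: sort by the key feature and cut into groups of about avg vectors.
    avg = int(np.ceil(len(vectors) / num_networks))
    order = vectors[:, key].argsort()
    return [order[i:i + avg] for i in range(0, len(order), avg)]

rng = np.random.default_rng(1)
data = rng.integers(103, 140, size=(20, 4))        # 20 feature vectors, 4 features
groups = partition(data, num_networks=4, num_intervals=6)
print([len(g) for g in groups])                    # e.g. [5, 5, 5, 5]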
The total number of input patterns in our experiment is 605,267. All of the feature values are divided into 100 intervals. Most of the values lie in
Figure 3. An example of how input vectors (or feature vectors) are partitioned. There are 20 feature vectors whose feature values lie in the range [103, 139]. These feature values are divided into six intervals, so the width of each interval is (139 − 103)/6 = 6. Step 1 shows the number of values, N_value(i), lying in the i-th interval. The important interval is the third interval because most of the values lie in it. Step 2 shows the number of feature vectors, N_pattern(4, k), whose feature values lie in the third (important) interval. The second feature is used as the key feature to divide all feature vectors. Step 3 shows three groups of feature vectors after partitioning. In this example the number of networks is set to 4, so each group must contain at most avg = 20/4 = 5 patterns
the interval [121.5, 123.75), which becomes our important interval. The third dimension of the feature vector is the key feature used to partition the data into small groups, whose number equals the number of sub-networks. The number of networks must be set prior to the partitioning process; it indicates how many neural networks are to be trained in parallel and how many blocks of partitioned data are supplied to these networks. The average group size must be set small enough that each network can actually be trained.

3.2 Architecture of the Recognition Network
The backpropagation neural network (BPNN) was used as the learning model in this study. The color intensities of pixel (i, j) and its neighboring pixels
at time t − 1 are used as the input vector to the network, whereas the color intensity of pixel (i, j) at time t is the target value. This is done for all pixels at every time step. First, we prune all redundant input patterns and pass the remaining input patterns to the network for training. The remaining input patterns, which form our training set, are then partitioned by the partitioning process as in Figure 4. For testing, each input vector of the testing set is fed to the appropriate network according to the key feature determined from the training set.
Figure 4. The training set is fed to the partitioning process to select a key feature and separate the data into small subsets for training individual neural networks simultaneously
4 Experimental Results
The data are collected by performing hands-on experiments. The data from each experiment are grouped into a training set and a testing set. The procedure is as follows. To monitor the diffusion process of a local surface disturbance caused by pouring colored liquid, we capture the planar spreading characteristic of the colored liquid on the water surface into video files. These files are then decomposed into image files at equal time intervals. In order to avoid a vast number of inputs to the network, we crop each image at its center to obtain 100 × 100 pixels. Finally, we train the proposed network using 100 images. The training patterns are collected at every pixel and every time step of the training images.
4.1 Image Dynamical Behavior of Color Diffusion in Water
Using 9 inputs for training the 3 × 3 BPNN, we obtained about 605,267 non-redundant patterns to train our recognition network. This number of patterns is too large to be trained by only one neural network in a short time; hence the training set must be distributed by parallel training as discussed in Section 3. The 3 × 3 BPNN uses nine input units, 20 hidden units, and one output unit for each subnetwork, and the learning rate of each subnetwork is 0.9. Figure 5 shows the actual target images obtained from the colored-liquid dropping experiment, and Figure 6 shows the output images predicted by the neural network. The value of each pixel in this network is computed from the values of its square neighborhood of size 3 × 3. The area within the rectangle is the trained area, and the area outside the rectangle is the untrained area. The reason for presenting an untrained area to the neural network is to show that the network is capable of predicting not only the trained area beyond the training time, but also the untrained one. The performance of this technique is evaluated by considering the actual image and the generated image as vectors and measuring the cosine between these two vectors. Figure 7 shows the relationship between target and output vectors by plotting their values of cos θ at different times.

4.2 Performance of the Parallel Technique
The performance of the proposed technique was compared in terms of CPU time and average CPU time per node of the neural network; the average time per node is the smallest unit of calculation in the neural network. All simulations of the sub-networks were carried out on an Intel Pentium 4 1.80 GHz PC with 1 GB RAM running Linux, using SNNS 4.2, the Stuttgart Neural Network Simulator, which is publicly available at http://www-ra.informatik.uni-tuebingen.de/SNNS/. The performance of this technique is shown in Table 1 and Figure 8. In parallel computing, a good speedup (S) should equal the number of processors, Num, and the efficiency (E) of the processors, speedup/Num, should equal 1. Our algorithm reduces the training time as the number of processors increases. If a single applicable network cannot be produced in an acceptable time, our method is a beneficial way to overcome the problem. Moreover, the speedup can exceed the number of processors, so the efficiency of each processor can be higher than 1. Overall, the partitioning of the data in this study achieved a speedup of 17.9 for 12 networks and 605,267 data patterns. The accuracy of the simulated behaviour is more than 90 percent compared with the actual event.
Figure 5. The actual snapshot images taken at different times from the experiments. The area inside the rectangle is the training area and the area outside is the testing area. (a) At time 10. (b) At time 20. (c) At time 30
Figure 6. The predicted 3 × 3 images compared with the actual images in Figure 5. (a) At time 10. (b) At time 20. (c) At time 30
Table 1. The performance results of this parallel technique

No.   I/Net    Time(sec)   Epoch   SpeedUp   Efficiency
1     605267   1856.00     184     1.0000    1.0000
2     302634   1473.73     777     1.2594    0.6297
4     151317   767.89      1435    2.4170    0.6043
6     100637   355.11      1049    5.2265    0.8711
8     75658    226.65      1755    8.1890    1.0236
10    60527    171.64      1794    10.8134   1.0813
12    50439    103.56      1794    17.9213   1.4934

No.: Number of networks; I/Net: Input patterns per network
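The speedup and efficiency columns of Table 1 can be reproduced directly from the measured training times, as in this short sketch (the times are copied from the table):

times = {1: 1856.00, 2: 1473.73, 4: 767.89, 6: 355.11,
         8: 226.65, 10: 171.64, 12: 103.56}           # seconds, from Table 1

t1 = times[1]
for n, t in times.items():
    speedup = t1 / t                                  # S = T(1) / T(n)
    efficiency = speedup / n                          # E = S / number of networks
    print(f"{n:2d} networks: speedup {speedup:7.4f}, efficiency {efficiency:6.4f}")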
Figure 7. The cosine values obtained from the neural network over the prediction period
Figure 8. (a) Relationship between the speedup and the number of networks. (b) Relationship between the efficiency and the number of networks
5 Conclusion
This study incorporates digital photography, neural networks, and parallel training distribution into a single conceptual framework. The 2-D tightly coupled neural cellular network is developed to simulate the diffusion characteristics of colored liquid dropped onto a water surface by generating its own diffusion image. The methodology may be applicable to the study of many commonly observed natural phenomena such as tumor growth in human organs, urban population expansion, and surface wave propagation. The results of our study indicate that our approach is feasible.
According to the cosine values in Figure 7, the number of sub-networks does not influence the precision of our prediction, while it does influence the training time, as shown in Table 1 and Figure 8. In addition, parallel training can increase the training speed by more than the number of processors. The obtained speedup was superlinear compared with training the whole data set, because when the data set is partitioned and each subset is trained by an individual network, convergence can be achieved in a short period. The partitioning technique operating on the data sets can be applied to other models of artificial neural networks to reduce computational time.
Acknowledgment This work is supported by National Science and Technology Development Agency (NSTDA) and National Electronics and Computer Technology Center (NECTEC), Thailand.
References
1. A. Roy, S. Govil, and R. Miranda. A Neural-Network Learning Theory and a Polynomial Time RBF Algorithm. IEEE Transactions on Neural Networks, 8(6):1301-1313, 1997.
2. D. Cornforth and D. Newth. The Kernel Addition Training Algorithm: Faster Training for CMAC Based Neural Networks. Proceedings of the Conference on Artificial Neural Networks and Expert Systems, Otago, 2001.
3. K. Na Nakornphanom, C. Lursinsap, J. Asavanant, and F. C. Lin. Prediction and Animation of Dynamical Behavior of Color Diffusion in Water Using 2-D Tightly Coupled Neural Cellular Network. IEEE International Conference on Systems, Man and Cybernetics, 2004.
4. K. Plaimas, C. Lursinsap, and A. Suratanee. High Performance of Artificial Neural Network of Resolving Ambiguous Nucleotide Problem. 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS 2005), April 2005.
5. D. E. Rumelhart and J. L. McClelland (eds). Parallel Distributed Processing: Explorations in the Microstructure of Cognition. MIT Press, 1987.
Automatic Information Extraction from the Web: An HMM-Based Approach
M. S. Tran-Le, T. T. Vo-Dang, Quan Ho-Van, and T. K. Dang
Faculty of CSE, University of Technology, Ho Chi Minh City, Vietnam
[email protected], [email protected], [email protected], [email protected]
Abstract With the continued growth of the Internet and the huge amount of available data, extracting meaningful information from the Web has attracted wide interest in both the research community and business organizations. Although a number of previous research works exist, to the best of our knowledge none of them is flexible enough to fulfill users' requirements in a variety of application domains. In this paper, we discuss and propose a general, extensible and dynamic approach based on the Hidden Markov Model (HMM) to facilitate efficient information extraction from HTML pages. Our proposed approach helps experts build an HMM from the necessary specifications, train the system's search engine, and extract meaningful information from HTML pages with high precision and at a reasonable cost. More importantly, the proposed approach can be employed to support building knowledge bases for the next generation of Web applications, i.e. the semantic Web. We developed and evaluated this model on a prototype, called PriceSearch, to extract price information of goods such as Nokia mobile phones, computer mice, and digital cameras. Experimental results confirm the efficiency of our theoretical analyses and approach.
1 Introduction
The World Wide Web, with its explosive growth and ever-broadening reach, makes available a tremendous amount of text that has been generated for human consumption. Unfortunately, this vast quantity of information is not easy for computers to manipulate or analyze. Information Extraction (IE) is the process of automatically recognizing and annotating parts of human-readable text with semantic tags. Typically, these tags correspond to attributes of entities that appear as fields of a database schema (or ontology). The task of Information Extraction can be defined as "filling slots in a database from sub-segments of text", in which text segmentation is the main sub-problem. The difficulty of text segmentation comes from the fact that a text string does not have a fixed structure; segments may change their positions or be missing from the string. Moreover, one word may have different meanings
and can be assigned to different attributes. For example, in paper references, the year of publication may appear after the author names or at the end, and the page numbers can be omitted. To deal with these issues, the study of IE has developed in two main directions, the rule-based method and the probabilistic model. The rule-based method builds rules to extract information. These rules are generalized from individual ones in the training set, and the approach is therefore called bottom-up. Because many syntactic and semantic constraints are required, a syntactic analyzer and a semantic tagger are needed before the rules can be obtained [11]. Works using this method include AutoSlog, Liep, Palka, Crystal, and Hasten, which derive rules from free text; other systems are Whisk, Crystal+Webfoot, Rapier, and SRV, which operate on semi-structured documents (like HTML) [10]. However, the rule-based method is neither portable nor extensible, because the rules need to be rebuilt when applied to a new domain [12]. The probabilistic method rests on a mathematical foundation and is therefore more extensible and more fault tolerant. Two typical models are Naive Bayes and the Hidden Markov Model. In [13], the authors use Naive Bayes, whereas with the Hidden Markov Model, typical works [17, 9] extract information from addresses or bibliographies, and another [14] extracts named entities appearing in documents. At the state of the art, a probabilistic model such as the HMM has been shown to be effective for general text [6, 14] as well as for specific-meaning phrases such as postal addresses or bibliography records [14, 17]. However, these HMMs did not consider synonymous words when counting their common occurrence probabilities, which affects the performance of text segmentation. Besides, the segmentation task becomes more difficult if a meaningful word can be composed of more than one syllable. For example, "MX 510" denotes the name of a Logitech computer mouse rather than "MX" or "510" individually. Altogether, there is still a barrier that hinders experts from building an engine that works well on different kinds of application domains. In this paper, we propose an HMM-based approach that overcomes those limitations. Firstly, words having the same meaning are grouped into one synonym set (called a synset), and the emission probability of each state is distributed over these synsets instead of over individual words. Secondly, the Viterbi algorithm is extended to group syllables into words. Our work is also a tool that helps experts input the necessary specifications, train the system, and later extract meaningful information in their target domains with reasonable reconfiguration effort. The rest of the paper is organized as follows. Section 2 reviews the basic notions of HMMs and their application to IE. Section 3 presents our proposed synset-based HMM and its extension. We then report the experimental results in Section 4. Section 5 concludes the paper with some remarks and suggestions for future work.
2 HMMs for Information Extraction
2.1 Hidden Markov Models
Figure 1. Example of HMM
An HMM is a finite state automaton with stochastic state transitions and symbol emissions [7], consisting of the following parameters {Q, O, π, A, B}:
• Q = {q_1, ..., q_n}: the set of states.
• O = {o_1, o_2, ..., o_k}: the vocabulary set of observations (symbols) for each state q, containing those words that can be emitted from q.
• π = {p_i}: a 1 × n matrix where p_i is the probability of q_i being the start state.
• A = {a_ij}: an n × n transition matrix where a_ij is the probability of making a transition from state q_i to state q_j.
• B = {b_ik}: the matrix of observation probabilities, where b_ik is the probability of q_i emitting word o_k in O.
These parameters are learned from data in the training phase. Then, in the extraction phase, given a text, the most probable path of states that is likely to have generated the document is computed. Each state emission corresponds to a meaningful word of the text, and IE uses this to "fill slots in a database from sub-segments of text".

2.2 Learning parameters
The HMM parameters are learnt from labeled training data. Each training instance is a sequence of state-word pairs. The learned set Q of states simply comprises all states appearing in the training data. The vocabulary set of each state can also be learnt easily as the set of all words paired with that state in the training data. Let N_ij be the number of transitions made from state q_i to state q_j, and N_i the total number of transitions made from state q_i, according to the training data. The transition probability from state q_i to state q_j is

a_ij = p(q_j | q_i) = N_ij / N_i
For the emission probability distribution of state q_i, suppose that the vocabulary set is O = {o_1, o_2, ..., o_K} and that the raw frequency (number of occurrences) of each o_k in state q_i in the training data is f_k. Then the probability that q_i emits o_k is computed as

b_ik = p(o_k | q_i) = f_k / Σ_{j=1..K} f_j

For the initial probabilities π, let ρ_{q_i=q_1} be the number of times state q_i starts a sequence, and ρ_{q_1} the total number of starting states. Then the probability of q_i being the start state is

π_i = p(q_1 = q_i) = ρ_{q_i=q_1} / ρ_{q_1}

The above formulae would assign a probability of zero to words that do not appear in the training data, causing the overall probability of a text string to be zero. It is therefore necessary to apply smoothing so that an unknown word is not assigned zero probability with respect to a state; the emission probability distribution of that state is adjusted accordingly. One such technique is Laplace smoothing:

p("unknown" | q_i) = 1 / (Σ_{j=1..K} f_j + K + 1)
p(o_k | q_i) = (f_k + 1) / (Σ_{j=1..K} f_j + K + 1)

One can see that Σ_{j=1..n} a_ij = 1 and Σ_{k=1..K} p(o_k | q_i) + p("unknown word" | q_i) = 1, satisfying the normalization conditions.

2.3 Text segmentation
The aim of this stage is to discover the hidden state sequence that was most likely to have produced a given observation sequence. Given an input string o = o_1 o_2 ... o_T and an HMM λ, the single best state sequence, from the start state to the end state, that generates o can be obtained by the Viterbi algorithm [4, 7]. Let δ_t(j) be the probability of the most probable path for o_1 o_2 ... o_t (t ≤ T) ending at state q_j (i.e., q_j emits o_t). Then δ_t(j) can be recursively defined as follows:
δ_t(j) = max_{1≤i≤N} δ_{t−1}(i) × a_ij × b_jt
where a_ij is the probability of making a transition from state q_i to state q_j and b_jt is the probability of q_j emitting word o_t in o. The Viterbi algorithm consists of the following steps:

• Initialization: δ_1(i) = π_i b_i(o_1), 1 ≤ i ≤ N.
• Induction: δ_t(j) = [max_{1≤i≤N} δ_{t−1}(i) a_ij] b_j(o_t), ψ_t(j) = arg max_{1≤i≤N} δ_{t−1}(i) a_ij, for 2 ≤ t ≤ T, 1 ≤ j ≤ N.
• Termination: probability p* = max_{1≤i≤N} δ_T(i); end state q*_T = arg max_{1≤i≤N} δ_T(i).
• Path read-out (previous states): q*_t = ψ_{t+1}(q*_{t+1}), t = T − 1, ..., 1.

Figure 2. The recursion and backtracking steps of the Viterbi algorithm
Figure 2 shows the two main steps of the Viterbi algorithm: the recursion step is illustrated in (a), and the backtracking step is shown in (b). Backtracking allows the best state sequence to be recovered from the back pointers stored during the recursion step. The algorithm can be computed using dynamic programming in O(KN^2) time.
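The following self-contained Python sketch (toy training data and our own handling of unknown words, consistent with the Laplace smoothing above, but not the authors' code) illustrates both phases: estimating π, A and B from labelled state-word sequences, and decoding a new string with the Viterbi recursion.

from collections import defaultdict

UNK = "<unknown>"

def train(sequences):
    """sequences: list of [(state, word), ...] training instances."""
    states = sorted({s for seq in sequences for s, _ in seq})
    start = defaultdict(int)
    trans = defaultdict(lambda: defaultdict(int))
    emit = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        start[seq[0][0]] += 1
        for (s, _), (s2, _w) in zip(seq, seq[1:]):
            trans[s][s2] += 1
        for s, w in seq:
            emit[s][w] += 1
    pi = {s: start[s] / len(sequences) for s in states}
    A = {s: {s2: trans[s][s2] / max(1, sum(trans[s].values())) for s2 in states}
         for s in states}
    # Laplace smoothing of the emission distribution, with an <unknown> slot.
    B = {}
    for s in states:
        total, K = sum(emit[s].values()), len(emit[s])
        B[s] = {w: (c + 1) / (total + K + 1) for w, c in emit[s].items()}
        B[s][UNK] = 1 / (total + K + 1)
    return states, pi, A, B

def viterbi(words, states, pi, A, B):
    def b(s, w):
        return B[s].get(w, B[s][UNK])
    delta = [{s: pi[s] * b(s, words[0]) for s in states}]   # initialization
    back = [{}]
    for w in words[1:]:                                     # induction
        prev = delta[-1]
        delta.append({})
        back.append({})
        for s in states:
            i, p = max(((i, prev[i] * A[i][s]) for i in states), key=lambda x: x[1])
            delta[-1][s] = p * b(s, w)
            back[-1][s] = i
    path = [max(delta[-1], key=delta[-1].get)]              # termination
    for t in range(len(words) - 1, 0, -1):                  # path read-out
        path.append(back[t][path[-1]])
    return list(reversed(path))

data = [[("Name", "Sony Cybershot"), ("PrefixPrice", "Sale price"), ("Price", "$299.99")],
        [("Name", "Canon Powershot"), ("PrefixPrice", "price"), ("Price", "$248.88")]]
states, pi, A, B = train(data)
print(viterbi(["Canon Powershot", "Sale price", "$248.88"], states, pi, A, B))
# -> ['Name', 'PrefixPrice', 'Price']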
3 Synset-Based HMMs
3.1 A Case Study: Goods Prices
To evaluate the performance of the proposed model, we have developed a prototype, called PriceSearch, to extract price information of goods. The learner is recommended to model both the typical content of a field and its context. We distinguish six types of states:
• Target: states required to model the content of the target phrases.
• Prefix: states designed in such a way that, if a state sequence (such as that returned by Viterbi) passes through any target state, it should first pass through a prefix.
• Suffix: quite similar to the prefix states; a state sequence should pass through a suffix when leaving the target states.
• Attribute: states that model any meaningful text that helps to clarify the target fields or other attribute slots in the database. For example, the information "Sony Cybershot digital camera" "DSC S90" "3x Zoom" "4.1 Megapixel" "290$" is clearer than "Sony Cybershot digital camera" "DSC S90" "290$".
• Background: a state that models any text not emitted by the other kinds of states. Ideally, a background state should have outgoing transitions only to itself and to the beginnings of all prefixes, and incoming transitions only from itself and from the ends of all suffixes.
• Context: states reserved to model any text that may decrease the accuracy. For example, we want to extract exactly the sale price of a particular product; however, text strings also contain the shipping price, the list price, the price 3 days ago, and even the prices of attached accessories. Context states are designed to detect these unwanted prices and therefore improve the accuracy of the extraction phase.
These six kinds of states reflect our intuition for successful extraction on a variety of domains. However, apart from the target and background states, the other kinds are not always needed.
Figure 3. An HMM for digital camera
Figure 3 shows the HMM learnt from our training data, where each state corresponds to one of the above types.
• Target: Name, such as "Sony Cybershot", "Cybershot", "Sony Cyber-shot", "Canon Powershot", ...; Price, such as "$248.88", "$237.99", ...
• Prefix: Prefix Price, such as "price", "sale price", ...
• Suffix: Suffix Name, such as "3x optical zoom", "JPEG file format", ...
• Attribute: Chip, such as "DSC-W1", "DSC-T7", ...; Resolution, such as "5.1 megapixel", ...
• Context: DummyPrefixPrice, such as "old price", "list price", ...; Other Product, such as "Canon powershot camera", "Nikon camera", ...
• Background: BackGround.
3.2 Synset-based HMMs
As mentioned above, the synset-based HMM is our proposed model in which words having the same meaning are grouped into a synset. Synonymous words are treated as identical when counting their occurrences and in semantic matching. For example, "Sony Cybershot", "Sony Cyber-shot", and "SonyCybershot" have the same meaning: they denote the name of the Sony Cybershot digital camera. The probability of a synset being emitted by a state is defined as the sum of the probabilities of all words in the synset being emitted by that state; probability fragmentation is thereby reduced. An example is shown in Table 1. The model then operates on a given text string as if each word in the text were replaced by its synset.

Table 1. Emission probabilities of synsets in the HMM

Word               P(word | q = name)   Synset            P(synset | q = name)
Sony Cybershot     0.15                 Sony Cybershot    0.35
Sony Cyber-shot    0.10
Cybershot          0.05
SonyCybershot      0.05
Olympus Camedia    0.20                 Olympus Camedia   0.30
Olympus Cameida    0.05
Olympuscamedia     0.05
Canon Powershot    0.15                 Canon Powershot   0.20
Canon Power-shot   0.05
Nikon Coolpix      0.10                 Nikon Coolpix     0.15
Nikoncoolpix       0.05
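A tiny sketch of the synset idea of Table 1 is given below (the probabilities are copied from the table; the grouping dictionaries are the only assumption): synonymous spellings are mapped to one synset, and the synset emission probability is the sum of the word probabilities.

SYNSETS = {
    "Sony Cybershot": ["Sony Cybershot", "Sony Cyber-shot", "Cybershot", "SonyCybershot"],
    "Olympus Camedia": ["Olympus Camedia", "Olympus Cameida", "Olympuscamedia"],
    "Canon Powershot": ["Canon Powershot", "Canon Power-shot"],
    "Nikon Coolpix": ["Nikon Coolpix", "Nikoncoolpix"],
}
P_WORD = {"Sony Cybershot": 0.15, "Sony Cyber-shot": 0.10, "Cybershot": 0.05,
          "SonyCybershot": 0.05, "Olympus Camedia": 0.20, "Olympus Cameida": 0.05,
          "Olympuscamedia": 0.05, "Canon Powershot": 0.15, "Canon Power-shot": 0.05,
          "Nikon Coolpix": 0.10, "Nikoncoolpix": 0.05}

# Map every spelling to its synset and sum the word probabilities per synset.
word_to_synset = {w: s for s, words in SYNSETS.items() for w in words}
p_synset = {s: sum(P_WORD[w] for w in words) for s, words in SYNSETS.items()}

print(word_to_synset["Sony Cyber-shot"])        # -> Sony Cybershot
print(round(p_synset["Sony Cybershot"], 2))     # -> 0.35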
Figure 4 illustrates the most probable state sequences, found using the Viterbi algorithm, for two different input strings. Using synsets helps to fully match synonymous words emitted from the same state, such as "Sony Cyber-shot" and "Sony Cybershot", "DSC-S90" and "DSC S90", or "Best price" and "Sale price" in this example.
Figure 4. HMM for the Sony Cybershot camera (state sequences for two input strings, with states such as Name, Chip, Resolution, Suffix Name, Prefix Price, Price and Background emitting the corresponding words)
3.3 Viterbi extension
The fact that some meaningful words may comprise more than one syllable makes the segmentation task more difficult. For example, "Cordless laser", an attribute of a computer mouse, consists of two syllables. In order to extract the right sub-segment, the standard Viterbi algorithm is modified as follows. Given an input string o = o_1 o_2 ... o_T and an HMM λ, assume that the maximal number of syllables that form a word o_t is L. Let δ_t(j) be the probability of the most probable path for o_1 o_2 ... o_t (t ≤ T) ending at state q_j. Then δ_t(j) can be recursively defined as

δ_t(j) = max_{1≤l≤L} max_{1≤i≤N} δ_{t−l}(i) × a_ij × p(o_{t−l+1} ... o_{t−1} o_t | q_j)

That is, l (1 ≤ l ≤ L) syllables may form a word ending at state q_j. The time complexity is O(lKN^2) for a syllable sequence of length l.
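Below is a sketch of this extended recursion (the toy states, probabilities and the small probability floor for unseen words are our own assumptions, not values from the paper); up to L syllables ending at position t are merged into one candidate word, as in the formula above.

def viterbi_multisyllable(syllables, states, pi, A, B, L=2):
    def b(s, w):
        return B[s].get(w, 1e-9)                      # tiny floor for unseen words
    T = len(syllables)
    delta = [{s: 0.0 for s in states} for _ in range(T)]
    back = [{s: None for s in states} for _ in range(T)]
    for t in range(T):
        for s in states:
            for l in range(1, min(L, t + 1) + 1):     # word made of l syllables
                word = " ".join(syllables[t - l + 1:t + 1])
                if t - l < 0:                          # word starts the sequence
                    prev, prev_state = pi[s], None
                else:
                    prev_state, prev = max(((i, delta[t - l][i] * A[i][s]) for i in states),
                                           key=lambda x: x[1])
                score = prev * b(s, word)
                if score > delta[t][s]:
                    delta[t][s] = score
                    back[t][s] = (t - l, prev_state)
    # Backtracking: follow the (previous position, previous state) pairs.
    s = max(delta[-1], key=delta[-1].get)
    t, path = T - 1, []
    while t >= 0:
        prev_t, prev_s = back[t][s]
        path.append((s, " ".join(syllables[prev_t + 1:t + 1])))
        t, s = prev_t, prev_s
    return list(reversed(path))

states = ["Name", "Attribute"]
pi = {"Name": 0.9, "Attribute": 0.1}
A = {"Name": {"Name": 0.1, "Attribute": 0.9}, "Attribute": {"Name": 0.5, "Attribute": 0.5}}
B = {"Name": {"MX 510": 0.6, "MX": 0.1}, "Attribute": {"Cordless laser": 0.7, "laser": 0.1}}
print(viterbi_multisyllable(["MX", "510", "Cordless", "laser"], states, pi, A, B))
# -> [('Name', 'MX 510'), ('Attribute', 'Cordless laser')]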
4 Experimental Results
Our first experiment extracts "Price" for six products from three categories: cell phones, computer mice, and digital cameras. Each HMM was trained from 100 HTML documents; the extraction results are given in Table 2. Another HMM was trained to extract both "Name" and "Price" for Nokia cell phones; the results are shown in Table 3. Finally, an HMM with five target states was trained for the Sony digital camera; the results are given in Table 4. Comparing the "Name" and "Price" fields in Tables 2, 3, and 4, we notice that the more target fields an HMM contains, the lower the precision it offers. We also decided to test whether our prototype (PriceSearch) can extract well in another domain, so we followed the specification steps and obtained an HMM for bibliographic references, comparing the results with Datamold and Rapier [14] in Table 5. The accuracy is evaluated by the following measures (a short numerical check is given after the list):
• Precision:
P = (number of correctly extracted segments) / (number of extracted segments)
• Recall:
R = (number of correctly extracted segments) / (number of true segments)
• F1:
F1 = 1 / (((1/P) + (1/R)) / 2)
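For example, the Nokia 7260 row of Table 2 can be checked as follows (the F1 expression above is the harmonic mean of P and R):

correct, extracted, true = 365, 408, 379        # counts from the Nokia 7260 row
P, R = correct / extracted, correct / true
F1 = 2 * P * R / (P + R)                        # same as 1 / (((1/P) + (1/R)) / 2)
print(f"P={P:.2%}  R={R:.2%}  F1={F1:.2%}")     # about 89.46%, 96.31%, 92.76%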
Table 2. "Price" extraction for six products

Products                Extracted prices   Correctly extracted   True prices   P(%)    R(%)    F1(%)
Nokia 7260              408                365                   379           89.46   96.30   92.75
Nokia 7270              219                207                   214           94.52   96.73   95.74
Nokia 7280              241                215                   223           89.21   96.41   92.67
Samsung D500            257                232                   239           90.27   97.07   93.54
Mouse Logitech          691                606                   667           87.68   90.85   89.24
Sony Cybershot Camera   747                640                   719           85.68   89.01   87.31
Table 3. "Name" and "Price" extraction for Nokia cell phones

Field   P(%)    R(%)    F1(%)
Name    98.75   99.02   98.88
Price   72.55   96.30   82.75
Table 4. Extraction for the Sony Cybershot camera

Field        P(%)    R(%)    F1(%)
Name         97.85   98.15   97.99
Chip         90.55   92.97   91.74
Resolution   75.34   78.35   76.82
Zoom         82.06   84.24   83.14
Price        70.68   89.01   78.79
Table 5. "Author" and "Title" extraction for references

         PRICESEARCH              DATAMOLD         RAPIER
Field    P(%)    R(%)    F1(%)    P(%)    R(%)     P(%)    R(%)
Author   90.43   91.23   90.83    88.07   86.20    0.00    0.00
Title    83.55   87.16   85.32    90.26   97.86    92.60   51.31
5 Conclusion
In this paper, we have presented the application of an HMM-based approach to Information Extraction. Experimental results show that the HMM is well suited to natural language domains. In addition, the newly proposed feature in which emission probabilities are defined over synsets rather than over individual words makes the HMM more flexible and extensible, so that it can be applied to different domains. More importantly, the proposed approach can be useful for building knowledge bases for the next generation of Web applications. As future work, we suggest further experiments, a solution for interfering data when the number of documents increases, and the construction of HMM sub-models. The combination of our approach with the semantic Web would then be very promising.
References
1. A. McCallum, K. Nigam, J. Rennie, K. Seymore. Building Domain-Specific Search Engines with Machine Learning Techniques. School of Computer Science, Carnegie Mellon University. AAAI-99 Spring Symposium.
2. A. McCallum, W. Cohen. Information Extraction from the World Wide Web. University of Massachusetts Amherst and Carnegie Mellon University.
3. B. Dorr, C. Monz. Hidden Markov Models. CMSC 723: Introduction to Computational Linguistics.
4. D. Kauchak, J. Smarr, C. Elkan. Sources of Success for Information Extraction Methods. Dept. of Computer Science, UC San Diego.
5. D. Freitag. Information Extraction from HTML: Application of a General Machine Learning Approach. Department of Computer Science, Carnegie Mellon University.
6. D. Freitag, A. K. McCallum. Information Extraction Using HMMs and Shrinkage. AAAI, 1999.
7. D. Freitag, A. K. McCallum. Information Extraction with HMM Structure Learned by Stochastic Optimization. Proceedings of the 18th Conference on Artificial Intelligence, 2001.
8. D. E. Appelt, D. J. Israel. Introduction to Information Extraction Technology. A Tutorial Prepared for IJCAI-99.
9. E. Agichtein, V. Ganti. Mining Reference Tables for Automatic Text Segmentation. Proceedings of the ACM Conference on Knowledge Discovery and Data Mining (SIGKDD), 2004.
10. I. Muslea. Extraction Patterns for Information Extraction Tasks: A Survey. AAAI, 1999.
11. L. R. Rabiner. A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition. Proceedings of the IEEE, 1989.
12. M. Neeve. Learning to Extract Information with Constant and Variable HMM Topologies. MSc thesis, 2002.
13. P. Frasconi, G. Soda, A. Vullo. Text Categorization for Multi-page Documents: A Hybrid Naive Bayes HMM Approach. JCDL'01, 2001.
14. P. Blunsom. Hidden Markov Models.
15. P. Joseph. HMM Based Classifiers. Paper for CSCI 765, under Dr. William Perrizo, North Dakota State University.
16. R. Gaizauskas. An Information Extraction Perspective on Text Mining: Tasks, Technologies and Prototype Applications. Natural Language Processing Group, Department of Computer Science, University of Sheffield.
17. V. Borkar, K. Deshmukh, S. Sarawagi. Automatic Segmentation of Text into Structured Records. ACM SIGMOD, 2001.
Advanced Wigner Method for Fault Detection and Diagnosis System
Do Van Tuan, Sang Jin Cho, and Ui Pil Chong
School of Computer Engineering and Information Technology, University of Ulsan, 680-749 San 29, Muger 2-Dong, Ulsan, Korea
[email protected], [email protected], [email protected]
Abstract An advanced Wigner method for time-frequency analysis, based on the Wigner distribution and the short-time Fourier transform (STFT), is used to examine the acoustic emission signals detected during the operation of pipelines in power plants. The acoustic emission signals, which depend on the behavior of materials deforming under stress, change when pipelines crack or leak. Based on the unusual characteristics of the signals in the frequency domain and some features in the time domain, cracking or leaking problems can be detected. Our proposed method, a combination of the advanced Wigner distribution and the Wavelet transform, is intended for a fault detection and diagnosis system in power plants. The results of our proposed method are compared with the advanced Wigner distribution and STFT methods.
1 Introduction
The diagnosis and detection of faults [1, 7] play an important role in industry. High-quality methods in a fault detection and diagnosis system are essential for the earliest possible detection of machines beginning to fail. This is particularly important for pipelines in industrial facilities such as nuclear power or composite material plants, whose failure may lead to critical situations. It is also critical in manufacturing plants, since malfunctioning machinery may produce defective products. In this paper, we are concerned with fault detection and signal analysis based on the acoustic emission (AE) signal [1] from pipelines in power plants. A power plant's environment is so noisy and its pipeline network so sophisticated that human ears cannot detect the changes in sound. Two kinds of faults can occur in pipelines and valves: leaking and cracking. The magnitude of the frequency components is small for small leaks and high for large leaks; however, the frequency components themselves are not altered. Usually, the frequency range of general steam leaking signals is from 1 kHz to 1 MHz, but available fault
detection systems consider 60 kHz to 750 kHz, because the frequency components from 1 kHz to 60 kHz contain mechanical noise and the high frequency components from 750 kHz to 1 MHz decay very fast [8]. In general, cracking or leaking causes variation in the magnitudes of the frequencies from 300 kHz to 600 kHz for cracking and from 300 kHz to 700 kHz for leaking. In these bands the frequencies are most sensitive to faults: a small change in leaking or cracking may cause a large variation in frequency magnitude [8]. We analyzed the characteristics of AE signals from the pipelines in power plants using LabVIEW and Matlab software for time-frequency analysis. Based on previous research, we found some disadvantages in the available analysis methods for fault detection systems. The STFT method [3, 9] gives detectable results, but it is difficult to apply due to the small changes in frequency and the sharp spectra. The Wigner method [3, 9] gives smooth and clearly visible spectra, but it requires complex computation and suffers from the appearance of cross terms. The cross term is a useless component that may lead to incorrect results [3, 5]. Therefore, a suitable method for a fault detection and diagnosis system is definitely needed. In this paper, an advanced Wigner method is used for analyzing the AE signals from the pipelines; it offers advantages over methods such as the Wigner distribution and the STFT. The advanced Wigner method inherits the good properties of both methods and therefore provides a good trade-off between the quality of the visible spectra and the computational complexity. Moreover, we propose a combination method (hereafter called the proposed method) of the advanced Wigner method and the Wavelet transform; a full picture of it is provided in the following sections. The paper is organized as follows: in Section 2, the fundamentals of acoustic emission, data acquisition, and analysis methods are given and a detailed description of the experimental setup is presented, while the proposed method is provided in Section 3. Experimental results using the proposed method are discussed in Section 4. In the last section, we provide concluding remarks.
2 Background and Experimental Setup
2.1 Fundamentals of Acoustic Emission
In the pipeline network, pipelines under high water pressure can be damaged by gradual cracking, which could cause serious trouble, so preventative measures are necessary. The AE waves generated from cracking pipelines depend on the behavior of the material deforming under stress, and AE testing is a powerful method for examining this. An AE wave may be defined as a transient elastic wave generated by the rapid release of energy within a material. With AE equipment (an AE sensor) one can 'listen' to the sounds of cracks growing, fibers breaking, and many other modes of active damage in the stressed material. Each kind of failure has its own AE signal features that are helpful to examine.
2.2 Fundamentals of Data Acquisition
Data acquisition is the process by which phenomena are sampled and translated into machine-readable signals. Sometimes abbreviated DAQ, data acquisition typically involves sensors, transmitters, and other instruments to collect signals and waveforms, which are then processed and analyzed on a computer. The components of a data acquisition system include appropriate sensors that convert the measured parameter to an electrical signal, which is acquired by data acquisition hardware. The acquired data are typically displayed, analyzed, and stored on the computer. This is achieved using vendor-supplied interactive control software, or custom displays and controls can be implemented using programming languages such as LabVIEW.

2.3 Primitive methods for diagnosis and detection systems
In a nuclear power plant, a diagnosis and detection system called an acoustic leak monitoring system (ALMS) is available [8]. In this system, several analysis methods are used. The signal analysis techniques can be categorized into four fundamental sub-sections [4]: (i) signal magnitude analysis; (ii) time domain analysis of individual signals; (iii) frequency domain analysis of individual signals; and (iv) dual signal analysis in either the time or frequency domain. Each of the four techniques has advantages and disadvantages. In time domain analysis, there are correlation, covariance, and impulse response methods. In frequency domain analysis, there are spectral density, frequency response, coherence, and cepstrum analysis methods. In principle, the frequency domain analysis of continuous signals requires a conversion of the time history of a signal into an auto-spectral density function via a Fourier transformation of the auto-correlation function. A diagnosis and detection system is required to detect when faults begin. In the time domain it is impossible to examine the frequency components, while in the frequency domain all frequency components are present but we do not know when they appeared; in other words, we cannot determine when the faults began. Therefore, there is a need to describe how the spectral content changes in time and to develop the physical and mathematical ideas to understand what a time-varying spectrum is. Methods in this field include the STFT and the Wigner distribution. As a rule of thumb, signal magnitude analysis and time domain analysis provide basic information about the signal and therefore only require inexpensive and unsophisticated analysis instrumentation, whereas frequency domain and dual signal analysis provide very detailed information about the signal and therefore require specialist expertise and reasonably complex analysis instrumentation. Hence, it is very important that the analyzer makes an appropriate value judgment about which technique best meets the requirements of the work.
2.4 Experimental Setup
Figure 1. Layout of the complete laboratory experimental setup (AE sensor H15, preamplifier, main amplifier, BNC-2110 and PCI-6111 DAQ devices, processing computer with LabVIEW and log file, plotting computer with Matlab, warning output)
With the aim of reproducing the AE signals captured from pipelines and valves during operation, we carried out experiments following the general model shown in Figure 1. The complete experiment consists of two parts: the recording part and the processing part. We obtained the AE signals from power plants and processed them in the laboratory. The analog detection system, with a sampling frequency of 1500 kHz, could record signals up to 750 kHz; this frequency range is also suitable for the H-15 sensor and the DAQ devices. Figure 2(a) depicts the first experiment, carried out at the condensate feed water circulation pipeline in a power plant. In this experiment, we placed an AE sensor on the surface of the pipeline and checked two cases. In the first, normal case the valve is closed, so no water flows through. In the other case, the valve is opened slightly, allowing a strong water flow through the pipeline. We examined the AE signals in the two cases to consider the effect of the water flow on the AE signal. When the valve was opened, the recorded frequencies were from 0 Hz to 80 Hz and the magnitude was high; when the valve was closed, the frequencies were from 0 Hz to 40 Hz and the magnitude was low. Figure 2(b) describes the second experiment, applied to the pipelines in the nuclear power plant. In this experiment, the AE signals were checked from the
valve in two cases: when there was water flow and when there was no water flow in the pipeline. The AE sensor was placed on the valves.
Figure 2. (a) The first experiment at the condensate feed water circulation pipeline in the power plant (water tank, closed control valve, water flow, measurement point). (b) The second experiment in the nuclear power plant, where the opened valve (I) and the closed valve (II) are checked
The AE signals in the frequency domain for the two cases are shown in Figure 3(a,b,c,d). In the second experiment, as shown in Figure 3(c,d), the signal from the closed valve has low magnitude, whereas the signal from the opened valve has high magnitude and its frequencies are around 500 kHz. These differences are the basis for our examination.
3 Proposed Method
Given the disadvantages of the STFT and Wigner distribution methods mentioned above, we suggest using another method that has advantages over both. The advanced Wigner method satisfies the target requirements: finding when the fault begins, having a simplified computation, producing a smooth spectrum, and focusing on a specific frequency bandwidth. The advanced Wigner method uses both the Wigner distribution and the STFT [3, 9], so it inherits the good characteristics of both. The STFT method, although it gives an exact description of the dependence of the frequencies on time, makes it difficult to see the frequency content within a bandwidth. Alternatively, the
Figure 3. (a) and (b) Spectra of signal from the pipeline using FFT method when the valve opened and closed in Figure 2(a). (c) and (d) Spectra of signal from the closed valve and opened valve using FFT method in Figure 2(b)
Wigner method provides a smooth spectrum, but the cross term usually appears and generates incorrect results [5]. Moreover, the computational complexity is a problem for a real-time fault detection and diagnosis system. The advanced Wigner method deals with the trade-off between the quality of the spectra and the computational complexity: the inheritance from the STFT allows the advanced Wigner method to generate correct spectra with a lower computational complexity, while the inheritance from the Wigner method provides smooth spectra without cross terms. The STFT [3] is

F_SP(t, w) = (1/√(2π)) ∫ e^{−jwτ} h(τ) s(τ + t) dτ

with the spectrogram

STFT(t, w) = |F_SP(t, w)|²

where F_SP is the STFT function, h(t) is the window, s(t) is the analog signal, and STFT(t, w) is the spectrogram of F_SP.
The Wigner distribution [3] can be expressed in the time domain or in the frequency domain as follows:

WD(t, w) = (1/(2π)) ∫ s*(t − τ/2) s(t + τ/2) e^{−jwτ} dτ
         = (1/(2π)) ∫ S*(w + θ/2) S(w − θ/2) e^{−jtθ} dθ

where s(t) is the signal in the time domain and S(w) is the signal in the frequency domain. We developed an advanced Wigner method based on the pseudo Wigner distribution [5], which is described in the time domain as

PWD(t, w) = (1/(2π)) ∫ h(τ/2) h*(−τ/2) s*(t − τ/2) s(t + τ/2) e^{−jwτ} dτ

and, using the STFT and the pseudo Wigner distribution combined with a narrow window P(θ) [5],

ADWD(t, w) = (1/(2π)) ∫ P(θ) F_SP(t, w + θ/2) F*_SP(t, w − θ/2) dθ

The discrete form of the spectrogram is
N/2
S(n, k) = |FSP (n, k)|2 = |
2π
h(i)s(n + i)e−j N ik |2
i=1−N/2
where h(i) and s(n) are the form of the discrete window and discrete signal respectively. Then calculation of the advanced Wigner method ADW D(n, k) =
L
∗ p(i)FSP (n, k + i)FSP (n, k − i)
i=−L
Here if p(i) is the delta function, then ADW D(n, k) becomes STFT, and if p(i) is 1 for all i then ADW D(n, k) becomes the Wigner distribution spectrum. P(i) 1
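As an illustration only (not the authors' implementation), the discrete ADWD above can be evaluated from the STFT frames of the previous sketch; reading Figure 4 as p(i) = 1 for |i| <= Q and 0 otherwise is our assumption.

import numpy as np

def adwd(frames, Q):
    # ADWD(n, k) = sum_{i=-Q}^{Q} F_SP(n, k+i) * conj(F_SP(n, k-i)),
    # i.e. a rectangular p(i) of half-width Q; frequency shifts wrap circularly.
    num_frames, N = frames.shape
    out = np.zeros((num_frames, N))
    for i in range(-Q, Q + 1):
        f_plus = np.roll(frames, -i, axis=1)    # F_SP(n, k+i)
        f_minus = np.roll(frames, i, axis=1)    # F_SP(n, k-i)
        out += np.real(f_plus * np.conj(f_minus))
    return out

With Q = 0 this collapses to the spectrogram |F_SP(n, k)|^2, and letting the window cover the whole frequency axis recovers a Wigner-type spectrum, matching the limiting cases stated above.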
[Figure 4 sketch: window p(i) of unit height plotted against i on −L ≤ i ≤ L, with Q marked on the axis.]
Figure 4. Window for the proposed method
Here, 2L + 1 is the length of the discrete window shown in Figure 4; the window may be rectangular, Hanning or Hamming. The window p(i) plays an important role. In the Wigner distribution, the cross term appears as the result of cross computations between two frequency components [5]; it is real-valued, carries no useful information and makes the analysis results hard to read. A small window adjustment can be used to obtain clear results, and in this paper we propose a rectangular window in the frequency domain, as shown in Figure 4. With a suitable window the proposed method inherits the good properties of both the Wigner distribution and the STFT: the exactness of the STFT analysis and the smoothness of the Wigner distribution. Moreover, by choosing the window appropriately, the cross term appearing in the Wigner distribution can be completely removed from the spectra, which gives good results for the detection and prevention of pipeline damage. Even though the calculation is more involved than the STFT, it is fast enough for analyzing the signal under real-time conditions.

Because the AE signals vary, the relevant frequency bands also vary. Another way to analyze these AE signals is therefore to combine the Wavelet transformation [2, 6, 9] with the advanced Wigner method. Consider the two kinds of signals from the two experiments. In the first experiment the frequencies stay in the low-frequency range, so the high-frequency components need not be considered; it suffices to focus on the low-frequency components, and using the Wavelet transformation to refine them is beneficial. In the second experiment, whose frequency components lie in the high-frequency band, it is the high-frequency rather than the low-frequency components that matter. We therefore propose two models for refining the low-frequency and the high-frequency components, as shown in Figure 5. Both models use two filters, an HPF (highpass filter) and an LPF (lowpass filter). In the first experiment the signal from the pipeline changes very fast when the valve is opened slightly, and the changes happen in the low-frequency band; to compare the signals only the low-frequency components are needed, so the first model is suitable for this case. Moreover, a method for a real-time diagnosis and detection system usually has to satisfy two strict requirements, time constraints (computational complexity) and correctness of the results. Some methods have low computational complexity but do not give satisfactory results, so a trade-off between the two requirements is needed. The calculation can nevertheless be simplified considerably by these models: if the original signal has 4000 samples, only 250 samples remain to be considered after applying the first model. All high-frequency components are removed because they do not carry any necessary information.
Figure 5. The model of Wavelet transformation to (a) refine low frequency components, (b) refine high frequency components. The signals ’sig’ are examined
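A minimal sketch of the low-frequency refinement model of Figure 5(a), assuming a simple Haar analysis pair (the paper does not specify the wavelet filters); four lowpass/decimate stages reduce 4000 samples to roughly 250.

import numpy as np

def lowpass_decimate(s):
    # One LPF + downsample-by-2 stage (Haar approximation coefficients).
    n = len(s) - len(s) % 2
    return (s[0:n:2] + s[1:n:2]) / np.sqrt(2.0)

def refine_low_band(s, levels=4):
    # Keep only the lowpass branch, as in the first model of Figure 5(a).
    for _ in range(levels):
        s = lowpass_decimate(s)
    return s

The highpass branch of Figure 5(b) would be obtained analogously from the differences (s[0::2] - s[1::2]) / sqrt(2); the refined signal can then be fed to the advanced Wigner routine sketched earlier.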
Hence, the same results are obtained. Similarly, in the second experiment the differences occur in the high-frequency band, from 400 kHz to 600 kHz, so refining the relevant frequency components is again worthwhile. Figure 5(a) shows the model whose output, after two LPF stages, is analyzed with the advanced Wigner method. We therefore propose a method that combines the advanced Wigner method and the Wavelet transform as described above: reducing the frequency band by the Wavelet transformation not only simplifies the computation but also yields clearly visible spectra after the advanced Wigner analysis. The results obtained with the advanced Wigner method, the STFT and the proposed method are discussed in the next section.
4 Results of Analysis

The AE signals from the two experiments are analyzed, and we compare the STFT, the advanced Wigner and the proposed methods. The sampling frequency of all signals is 1,500,000 samples/second. In the STFT method the window size is 256 samples and the overlap is 128 samples.
In the advanced Wigner method we use a window with L = 128 samples and Q = 4 samples (see Section 3). In the proposed method, the first and the second experiment are treated with the wavelet models of Figure 5(a) and Figure 5(b), respectively, and the signals obtained after the Wavelet transformation are processed with the advanced Wigner method. In the first experiment two cases are considered: the AE signals from the pipeline when the valve is closed and when it is opened. The AE signal frequencies are concentrated in the low-frequency bands, so the differences between the two cases occur in the low-frequency components. The STFT method shows the differences between the two cases, as seen in Figure 6 and Figure 7; however, the gap between the high and low magnitudes of the two cases is still difficult to distinguish, and the low-frequency components are not shown clearly. The advanced Wigner method shows clear differences between the two cases because the gap between the magnitudes is magnified, as shown in Figure 8 and Figure 9. Nevertheless, the individual low-frequency components are still not resolved carefully, and the high-frequency components are useless for assessing the differences. The proposed method not only displays the low-frequency components better but also reduces the computational complexity, as shown clearly in Figure 10 and Figure 11. In the second experiment there are also two cases: the AE signals from the closed valve and from the opened valve. In these AE signals the spectral differences between the two cases occur mostly in the high-frequency components; the low-frequency components are therefore useless and need not be computed or considered, and we examine the differences occurring in the high-frequency components only. The STFT method shows the differences in the high-frequency bands well, but the spectra are not really smooth and visible, as shown in Figures 12 and 13. Figures 14 and 15 show that the advanced Wigner method yields clearer and smoother spectra, though at a higher computational cost. Figures 16 and 17 apply the proposed method with the low-frequency components removed and show the differences between the two cases clearly.
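A hypothetical end-to-end run with the parameter values quoted above, reusing the sketches from Section 3 (signal loading, the shorter FFT length for the decimated signal and the random placeholder trace are assumptions, not part of the paper):

import numpy as np

fs = 1_500_000                          # sampling frequency, samples/second
signal = np.random.randn(4000)          # placeholder for a recorded AE trace

frames = stft_frames(signal, N=256, hop=128)
S = np.abs(frames) ** 2                 # STFT spectrogram (256-sample window)
W = adwd(frames, Q=4)                   # advanced Wigner spectrum, Q = 4

low = refine_low_band(signal, levels=4)              # proposed method, experiment 1
W_low = adwd(stft_frames(low, N=64, hop=32), Q=4)    # assumed shorter window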
5 Conclusions

This paper has presented an advanced signal-analysis method based on the Wigner distribution and the STFT that is suitable for a fault detection and diagnosis system in power plants. The advanced Wigner method is used for time-frequency analysis and reduces (or completely removes) the cross-term effects that appear in the Wigner distribution. We developed it to balance the trade-off between computational complexity and correctness of the results; by choosing a suitable window, as described above, the method retains the advantages of both the STFT and the Wigner distribution.
Figure 6. Experiment 1: Spectra of signal from the pipeline using STFT method when the valve opens
Figure 7. Experiment 1: Spectra of signal from the pipeline using STFT method when the valve closes
Figure 8. Experiment 1: Spectra of signal from the pipeline using Advanced Wigner method when the valve opens
Figure 9. Experiment 1: Spectra of signal from the pipeline using Advanced Wigner method when the valve closes
Figure 10. Experiment 1: Spectra of signal from the pipeline using the proposed method when the valve opens
Figure 11. Experiment 1: Spectra of signal from the pipeline using the proposed method when the valve closes
Figure 12. Experiment 2: Spectra of signal in the closed valve using the STFT method
Figure 13. Experiment 2: Spectra of signal in the opened valve using the STFT method
Figure 14. Experiment 2: Spectra of signal in the closed valve using the Advanced Wigner method
Figure 15. Experiment 2: Spectra of signal in the opened valve using Advanced Wigner method
Figure 16. Experiment 2: Spectra of signal in the closed valve using the Proposed method
Figure 17. Experiment 2: Spectra of signal in the opened valve using the Proposed method
We examined the advantage of the advanced Wigner method over previous methods such as the STFT and the Wigner distribution and found a good trade-off between computational complexity and correctness of the results. Moreover, the proposed method, a combination of the advanced Wigner method and the Wavelet transformation, lets us select the most significant frequency band. To put the proposed method in context we compared it with the STFT, the Wigner distribution, the Fourier transformation and the Wavelet transformation; the comparison shows that the proposed method has clear advantages over the other methods. The results of the analysis also provide a good understanding of AE signals from pipelines, and these findings have many practical applications for improving fault detection and diagnosis systems. The proposed approach can also be applied in other industrial fields, such as composite materials and gasworks.

Acknowledgment: This paper was supported by the research fund of the University of Ulsan.
References

1. P. M. Franch, T. Martin, D. L. Tunnucliffe and D. K. Das-Gupta, PTCa/PEKK piezo-composites for acoustic emission detection, Sensors and Actuators A, Vol. 99, pp. 235-243, 2002.
2. G. Strang and T. Nguyen, Wavelets and Filter Banks, 1996.
3. L. Cohen, Time-Frequency Analysis, 1995.
4. M. Norton and D. Karczub, Fundamentals of Noise and Vibration Analysis for Engineers, 2003.
5. L. Stankovic, A method for time-frequency analysis, IEEE Transactions on Signal Processing, Vol. 42, No. 1, 1994.
6. M. Vetterli and J. Kovacevic, Wavelets and Subband Coding, 1995.
7. Y. Tian, P. L. Lewin, A. E. Davies and Z. Richardson, Acoustic emission detection of partial discharges in polymeric insulation, Eleventh International Symposium on High Voltage Engineering (Conf. Publ. No. 467), 1999.
8. Samchang Enterprise Co. Ltd., User Manual for Acoustic Leak Monitoring System, July 1, 2002.
9. K. Groechenig, Foundations of Time-Frequency Analysis, 2001.
Appendix
Figure 1. From top to bottom, magnetization vectors corresponding to coordinate positions s1, s2, s3 and s4. (See Fig. 7 on page 37)
Figure 2. From top to bottom, magnetization vectors corresponding to coordinate positions s5 , s6 , s7 and s8 . (See Fig. 8 on page 38)
Figure 3. From top to bottom, magnetization vectors corresponding to coordinate positions s1 , s2 , s3 and s4 . (See Fig. 9 on page 39)
Figure 4. From top to bottom, magnetization vectors corresponding to coordinate positions s5 , s6 , s7 and s8 . (See Fig. 10 on page 40)
Figure 5. External magnetization component by (t) and gradient sequence G(t) for the 15 slice results, bx (t) is zero. (See Fig. 11 on page 41)
Figure 6. External magnetization component by(t) and gradient sequence G(t) for the 15 slice penalty results, bx(t) is zero. (See Fig. 12 on page 42)
Figure 7. Desired Mx and Mz distribution profiles for a 90° pulse. (See Fig. 13 on page 43)
Figure 8. Transverse magnetization components highlighting Mx and My magnitudes (left), Longitudinal Mz magnetization component magnitude (right). (See Fig. 14 on page 43)
Figure 9. The angular position of cerebrospinal fluid to be imaged by our MRI simulation. (See Fig. 15 on page 44)
Figure 10. The signal produced by the gVERSE pulse MRI simulation over the diagonal cerebrospinal fluid (A), and when a generic sinc RF pulse and gradient sequence is applied (B). (See Fig. 16 on page 45)
Figure 11. Earthquake tsunami source. (See Fig. 1 on page 61)
Figure 12. Maximum elevations in Bay of Bengal. (See Fig. 2 on page 64)
Figure 13. (a) Maximum elevations along Banda Aceh and (b) Maximum elevations along the westcoast of Thailand. (See Fig. 3 on page 64)
Figure 14. Comparison of tsunami measured with satellite altimetry by Jason1 and results of tsunami simulation. (See Fig. 4 on page 65)
Figure 15. Comparison of numerical tide gage data for (a) Hanimaadhoo, (b) Male, (c) Colombo, (d) Taphao Noi, and (e) Mercator yacht. (See Fig. 5 on page 65/66)
Figure 16. Candies or phthalate crystals? (See Fig. 1 on page 134)
Figure 17. Sea shell crystallization (from [50]). (See Fig. 3 on page 135)
Figure 18. A simulation of the growth of a tumor mass coupled with a random underlying field (from [3]). (See Fig. 4 on page 135)
Figure 19. Vascularization of an allantoid (from [27]). (See Fig. 5 on page 135)
Figure 20. Angiogenesis on a rat cornea (from [26]) (left). A simulation of an angiogenesis due to a localized tumor mass (black region on the right) (from [25]) (right). (See Fig. 6 on page 136)
Figure 21. Response of a vascular network to an antiangiogenic treatment (from [33]). (See Fig. 7 on page 136)
Figure 22. Simulated Experiment; a Johnson-Mehl tessellation. (See Fig. 12 on page 138)
Figure 23. A simulated crystallization process (TIME = 2 sec and TIME = 9 sec). (See Fig. 14 on page 141)
Figure 24. Capture of a point x during time ∆t. (See Fig. 16 on page 146)
Figure 25. n−facets for a tessellation in IR2 . (See Fig. 18 on page 152)
Figure 26. Temperature after 10 minutes. Because of the symmetry, only the upper part of the rectangle ((0, L) × (L, 2L)) is plotted. (See Fig. 21 on page 157)
Figure 27. Degree of crystallinity after 5 and 15 minutes. (See Fig. 22 on page 157)
Figure 28. Plot of log(γ), where γ is the mean density of crystals interfaces in R2 , after 5 minutes. (See Fig. 23 on page 158)
Figure 29. The transformation of L-systems, structure and its inverse problem. (See Fig. 1 on page 164)
Figure 30. Flow diagram of reconstruction process of branching structure. (See Fig. 2 on page 166)
Figure 31. Example of input images and volume sliced images of volume data, neuron and rice root images from [Str04], leaf network from [Ash99], actual soil volume data from P. Kolesik. (See Fig. 3 on page 167)
Figure 32. Example of nonlinear diffusion filtering for combination of disconnected lines. (See Fig. 4 on page 168)
Figure 33. Example of input image, the color image (left) converted to the grayscale image (right). (See Fig. 5 on page 169)
Figure 34. The gray-scale intensity bar from zero intensity (black) to 255 intensity (white) with the range [a, b] and current pixel p. (See Fig. 6 on page 169)
Figure 35. Setup of the input image and given initial pixel. (See Fig. 7 on page 170)
Figure 36. Growing process of each step. (See Fig. 8 on page 171)
Figure 37. Example of CT volume data of plant root. (See Fig. 9 on page 172)
Figure 38. Sequence of volume growing method of root data in every several steps. (See Fig. 10 on page 174)
Figure 39. A setup of Hilditch’s algorithm, (a) point labels from p1 to p9, (b) Fa (p1) = 1, Fb (p1) = 2, (c) Fa (p1) = 2, Fb (p1) = 2. (See Fig. 11 on page 174)
Figure 40. Function Fa (p1) and Fb (p1) from figure 11, (a) cycle of pattern from p2 to p9, (b) Fa (p1) = 1 (number of arrows represents (0,1) pattern), Fb (p1) = 2, (c) Fa (p1) = 2 (two arrows for two (0,1) patterns), Fb (p1) = 2. (See Fig. 12 on page 175)
Figure 41. The height field map for the thickness of branching structure, (a) given object, (b) edge detection, (c) height field map (black is short distance, white is long distance from boundary). (See Fig. 13 on page 176)
Figure 42. L-string construction: (a) input network from algorithm 8.1, (b), (c), (d) and (e) step by step of L-string construction of input network (a). (See Fig. 14 on page 180)
Figure 43. Region growing process and skeletonization in branching structure of clover plant. (See Fig. 15 on page 181)
Figure 44. Two resolution structures of clover plant with different δ value, (a) and (c) show the wire frame structure, (b) and (d) show 3D structure. (See Fig. 16 on page 182)
Figure 45. The reconstruction with de-noising process from an input image, denoising, smoothing, region growing, skeletonization, network construction, and 3D object. (See Fig. 17 on page 182)
Figure 46. Some Tree-like structures constructed in L-string codes: (a) I(132.0) [+(29.61)I(117.8)] [-(29.58)I(119.0)], (b) I(136) [-(43.1)I(111.0)] I(101), (c) I(102.0) [+(45.21)I(104.6)] I(145.0), (d) I(108)[+(46.05)I(115.2)] [-(45.6)I(117.3)] I(137), (e) I(94.00) [+(45.64)I(96.89)] [-(1.29)I(84.00)] [-(45.1)I(100.4)] I(84), the number labels in each network are the pixel node numbers. (See Fig. 18 on page 183)
Figure 47. Some results of (a), (b) reconstructed roots and (c) neuron structure from P. J. Broser in HOC file. (See Fig. 19 on page 183)
Figure 48. The vascular aneurysm and its reconstruction with different conditions and resolutions. (See Fig. 20 on page 184)
Figure 49. 2D satellite image on September 9, 2005 at 10:00GMT. (See Fig. 1 on page 189)
Figure 50. The conversion of 2D satellite image to 3D surface. (See Fig. 2 on page 189)
Figure 51. Satellite image on September 23, 2005 at 21:00GMT. (See Fig. 3 on page 190)
Figure 52. Segmented cloud and storm from Fig. 1, (a) and (b) by Cdv = 50, Ccv = 140, (c) and (d) by Cdv = 70, Ccv = 100. (See Fig. 4 on page 192)
Figure 53. Segmented cloud and storm (a) and (b) by Cdv = 106, Ccv = 155, (c) and (d) by Cdv = 93, Ccv = 134. (See Fig. 5 on page 193)
Figure 54. The transformation of gray-scale image to 2D histogram and its segmentation. (See Fig. 6 on page 194)
Figure 55. The comparison of cloud and storm segmentation of the same segmented region: input images (left), 2D histograms (middle) and output images (right), grayscale segmentation (upper row), color segmentation (lower row). (See Fig. 7 on page 194)
Figure 56. 2D surfaces for volume rendering of cloud and storm reconstruction. (See Fig. 8 on page 195)
Figure 57. 3D topography of world (Etopo10) and earth (Etopo2). (See Fig. 9 on page 197)
Figure 58. The typhoon Damrey from different perspectives. (See Fig. 10 on page 198)
Figure 59. The typhoon Kaitak from different perspectives. (See Fig. 11 on page 199)
Figure 60. The 3D storm reconstruction of typhoon Damrey every 6 hours starting from September 25, 2005 at 01:00GMT (left to right, top to bottom). (See Fig. 12 on page 200)
Figure 61. The numerical result of typhoon Damrey every 6 hours starting from September 25, 2005 at 01:00GMT (left to right, top to bottom). (See Fig. 13 on page 201)
Figure 62. The 3D storm reconstruction of typhoon Kaitak for every 6 hours starting from October 29, 2005 at 02:00GMT (left to right, top to bottom). (See Fig. 14 on page 202)
Figure 63. The numerical result of typhoon Kaitak for every 6 hours starting from October 29, 2005 at 02:00GMT (left to right, top to bottom). (See Fig. 15 on page 203)
Figure 64. The 3D reconstruction of hurricane Katrina: input image (from NASA) (a) in different perspectives. (See Fig. 16 on page 204)
Figure 65. The 3D storm reconstruction of hurricane Kyrill every 6 hours starting from January 18, 2007 at 01:00GMT (left to right, top to bottom). (See Fig. 17 on page 205)
Figure 66. Security issues in the ODBS model. (See Fig. 1 on page 210)
Figure 67. EncryptedTable with tree node contents’ signatures. (See Fig. 3 on page 216)
Figure 68. Settings for verifying completeness guarantees. (See Fig. 4 on page 217)
Figure 69. Settings for verifying freshness guarantees. (See Fig. 5 on page 218)
Figure 70. Condensed RSA signature scheme vs. naive RSA signature scheme. (See Fig. 6 on page 220)
Figure 71. A variety of dataset sizes. (See Fig. 7 on page 221)
Figure 72. IRIS Explorer PSE for skin. (See Fig. 1 on page 253)
Figure 73. MultiDisplay visualising many output streams. (See Fig. 2 on page 255)
Figure 74. Number of Function Evaluations. (See Fig. 1 on page 278)
Figure 75. CPU Time. (See Fig. 2 on page 278)
Figure 76. Number of Function Evaluations. (See Fig. 3 on page 290)
Figure 77. CPU Time. (See Fig. 4 on page 290)
Figure 78. (Left) Initial (Dashed line) and optimized structures (Solid line). (Right) Cost function evolution during the optimization process (history of convergence). (See Fig. 1 on page 310)
Figure 79. Free surface elevation ξ resulting from a reflection (a) on rectangular structures perpendicular to the coastline, (b) on optimized structures with no feasibility constraints. (See Fig. 2 on page 310)
Figure 80. (Left) Cost function value with respect to the position of the geotube. The admissible domain for the geotubes is 350 − 800m. (Right) Cost function evolution during the optimization. We see the importance of using global minimization. (See Fig. 3 on page 311)
Figure 81. (a) Switched Ethernet and (b) message transmission on a switch. (See Fig. 1 on page 315)
Figure 82. Message transmission model. (See Fig. 2 on page 316)
Figure 83. Polar plots with a, ζ, Ω changed. (See Fig. 4 on page 368)
[Figure 84 panels: mass-fraction profiles YCH4, YO2, YCO, YCO2, YH2 and YH2O over (r, z).]
Figure 84. Profiles of some major species. (See Fig. 1 on page 378)
Figure 85. 8 Projects on 8-node cluster. (See Fig. 2 on page 398)
Figure 86. 8 Projects on 16-node cluster. (See Fig. 3 on page 399)
Figure 87. Matrix X with 6×6 blocks is distributed over a 3×3 processor template from a matrix point-of-view. The 9 processors are labeled from 0 to 8. (See Fig. 1 on page 449)
Figure 88. Same as Figure 1, but from a processor point-of-view. (See Fig. 2 on page 449)
Figure 89. Matrix X with 12 × 12 blocks is distributed over a 3 × 3 processor template from a matrix point-of-view. The 9 processors are numbered from 0 to 8. (See Fig. 3 on page 450)
Figure 90. Same as Figure 3, but from a processor point-of-view. (See Fig. 4 on page 450)
Figure 91. (See on page 451)
Figure 92. (See on page 456)
Figure 93. Interaction diagram on client side. (See Fig. 2 on page 483)
Figure 94. Interaction diagram on server side. (See Fig. 3 on page 484)
Figure 95. Comparison of wave tank and ANUGA water stages at gauge 5. (See Fig. 2 on page 493)
Figure 96. Complex reflection patterns and run-up into Monai Valley simulated by ANUGA and visualised using our netcdf OSG viewer. (See Fig. 3 on page 494)
Figure 97. Elevation of the computational domain. The vertical scale has been exaggerated to accentuate the features on the continental shelf. (See Fig. 4 on page 495)
Figure 98. The evolution in time of a tsunami, initially 6m above the ocean at the Eastern boundary, over the Great Barrier Reef. Note the orientation of the plots is upside down. The southernmost point of the grid is at the top of the picture. (See Fig. 5 on page 497)
Figure 99. Plot of smoother for Test Problem 1. The boundary conditions for f were obtained by fitting a thin plate spline to the four data points. (See Fig. 1 on page 527)
Figure 100. Plot of smoother for Test Problem 2. All of the boundary conditions are zero. (See Fig. 2 on page 527)
Figure 101. Example with missing data. The grid shows the discrete thin-plate spline fit with boundary condition hf (x) = y(x). (See Fig. 3 on page 528)
Figure 102. Example with missing data. The grid shows the discrete thin-plate spline fit with the boundary conditions given in Boundary Condition Test Problem 1. (See Fig. 4 on page 528)
Figure 103. Isosurface representation of the original data set for the sphere example (left) and the finite element approximation on a grid with 2465 nodes (right). (See Fig. 5 on page 529)
Figure 104. Isosurface representation of the original data set for the semi-sphere example (left) and the finite element approximation on a grid with 2465 nodes (right). (See Fig. 6 on page 529)
Figure 105. The change of solution quality in solving P1, maximum iterations = 10.000. (See Fig. 5 on page 544)
$$I(t)_{i,j} = f\bigl(I(t-1)_{i-1,j-1},\, I(t-1)_{i-1,j},\, I(t-1)_{i-1,j+1},\, I(t-1)_{i,j-1},\, I(t-1)_{i,j},\, I(t-1)_{i,j+1},\, I(t-1)_{i+1,j-1},\, I(t-1)_{i+1,j},\, I(t-1)_{i+1,j+1}\bigr)$$
Figure 106. (See on page 566)
Figure 107. Suppose feature vectors have three dimensions and the third feature in Z-axis is the key feature. Therefore, all feature vectors are partitioned into three groups denoted by three blocks. Each group of feature vectors is trained by an individual neural network. (See Fig. 2 on page 568)
Figure 108. (a) Relationship between the speedup and the number of networks. (b) Relationship between the efficiency and the number of networks. (See Fig. 8 on page 573)
Figure 109. Example of HMM. (See Fig. 1 on page 577)
Figure 110. An HMM for digital camera. (See Fig. 3 on page 580)
[Figure 111 block diagram elements: AE sensor H15, preamplifier, main amplifier, BNC2110, DAQ device PCI-6111, computer with LabVIEW, acoustic emission wave, log file, plotting computer with Matlab, warning.]
Figure 111. Layout of the complete laboratory experimental setup. (See Fig. 1 on page 590)
Figure 112. (a) The first experiment in ‘Condensate Feed Water Circulation Pipeline’ in power plant. (b) The Second experiment in Nuclear power plants, the Opened valve and Closed valve are checked. (See Fig. 2 on page 591)
Figure 113. (a) and (b) Spectra of signal from the pipeline using FFT method when the valve opened and closed in Figure 2(a). (c) and (d) Spectra of signal from the closed valve and opened valve using FFT method in Figure 2(b). (See Fig. 3 on page 592)
Figure 114. The model of Wavelet transformation to (a) refine low frequency components, (b) refine high frequency components. The signals ’sig’ are examined. (See Fig. 5 on page 595)
Figure 115. Experiment 1: Spectra of signal from the pipeline using STFT method when the valve opens. (See Fig. 6 on page 597)
Figure 116. Experiment 1: Spectra of signal from the pipeline using STFT method when the valve closes. (See Fig. 7 on page 597)
Figure 117. Experiment 1: Spectra of signal from the pipeline using Advanced Wigner method when the valve opens. (See Fig. 8 on page 598)
Figure 118. Experiment 1: Spectra of signal from the pipeline using Advanced Wigner method when the valve closes. (See Fig. 9 on page 598)
Figure 119. Experiment 1: Spectra of signal from the pipeline using the proposed method when the valve opens. (See Fig. 10 on page 599)
Figure 120. Experiment 1: Spectra of signal from the pipeline using the proposed method when the valve closes. (See Fig. 11 on page 599)
Figure 121. Experiment 2: Spectra of signal in the closed valve using the STFT method. (See Fig. 12 on page 600)
Figure 122. Experiment 2: Spectra of signal in the opened valve using the STFT method. (See Fig. 13 on page 600)
Figure 123. Experiment 2: Spectra of signal in the closed valve using the advanced Wigner method. (See Fig. 14 on page 601)
Figure 124. Experiment 2: Spectra of signal in the opened valve using Advanced Wigner method. (See Fig. 15 on page 601)
Figure 125. Experiment 2: Spectra of signal in the closed valve using the Proposed method. (See Fig. 16 on page 602)
Figure 126. Experiment 2: Spectra of signal in the opened valve using the Proposed method. (See Fig. 17 on page 602)