DYNAMIC MODEL DEVELOPMENT Methods, Theory and Applications
COMPUTER-AIDED CHEMICAL ENGINEERING
Advisory Editor: R. Gani
Volume 1: Distillation Design in Practice (L.M. Rose)
Volume 2: The Art of Chemical Process Design (G.L. Wells and L.M. Rose)
Volume 3: Computer Programming Examples for Chemical Engineers (G. Ross)
Volume 4: Analysis and Synthesis of Chemical Process Systems (K. Hartmann and K. Kaplick)
Volume 5: Studies in Computer-Aided Modelling, Design and Operation. Part A: Unit Operations (I. Pallai and Z. Fonyo, Editors); Part B: Systems (I. Pallai and G.E. Veress, Editors)
Volume 6: Neural Networks for Chemical Engineers (A.B. Bulsari, Editor)
Volume 7: Material and Energy Balancing in the Process Industries - From Microscopic Balances to Large Plants (V.V. Veverka and F. Madron)
Volume 8: European Symposium on Computer Aided Process Engineering-10 (S. Pierucci, Editor)
Volume 9: European Symposium on Computer Aided Process Engineering-11 (R. Gani and S.B. Jorgensen, Editors)
Volume 10: European Symposium on Computer Aided Process Engineering-12 (J. Grievink and J. van Schijndel, Editors)
Volume 11: Software Architectures and Tools for Computer Aided Process Engineering (B. Braunschweig and R. Gani, Editors)
Volume 12: Computer Aided Molecular Design: Theory and Practice (L.E.K. Achenie, R. Gani and V. Venkatasubramanian, Editors)
Volume 13: Integrated Design and Simulation of Chemical Processes (A.C. Dimian)
Volume 14: European Symposium on Computer Aided Process Engineering-13 (A. Kraslawski and I. Turunen, Editors)
Volume 15: Process Systems Engineering 2003 (Bingzhen Chen and A.W. Westerberg, Editors)
Volume 16: Dynamic Model Development: Methods, Theory and Applications (S.P. Asprey and S. Macchietto, Editors)
COMPUTER-AIDED CHEMICAL ENGINEERING, 16
DYNAMIC MODEL DEVELOPMENT
Methods, Theory and Applications
Proceedings from a Workshop on The Life of a Process Model - From Conception to Action, October 25-26, 2000, Imperial College, London, UK
Edited by
S.P. Asprey and S. Macchietto
Centre for Process Systems Engineering, Department of Chemical Engineering, Imperial College of Science, Technology and Medicine, Prince Consort Road, London SW7 2BY, UK
2003 ELSEVIER Amsterdam - Boston - Heidelberg - London - New York - Oxford Paris - San Diego - San Francisco - Singapore - Sydney -Tokyo
ELSEVIER SCIENCE B.V. Sara Burgerhartstraat 25, P.O. Box 211, 1000 AE Amsterdam, The Netherlands © 2003 Elsevier Science B.V. All rights reserved. This work is protected under copyright by Elsevier Science, and the following terms and conditions apply to its use: Photocopying Single photocopies of single chapters may be made for personal use as allowed by national copyright laws. Permission of the Publisher and payment of a fee is required for all other photocopying, including multiple or systematic copying, copying for advertising or promotional purposes, resale, and all forms of document delivery. Special rates are available for educational institutions that wish to make photocopies for non-profit educational classroom use. Permissions may be sought directly from Elsevier Science & Technology Rights Department in Oxford, UK: phone: (+44) 1865 843830, fax: (+44) 1865 853333, e-mail:
[email protected]. You may also complete your request on-line via the Elsevier Science homepage (http://www.elsevier.com), by selecting 'Customer support' and then 'Obtaining Permissions'. In the USA, users may clear permissions and make payments through the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, USA; phone: (+1) (978) 7508400, fax: (+1) (978) 7504744, and in the UK through the Copyright Licensing Agency Rapid Clearance Service (CLARCS), 90 Tottenham Court Road, London W1P 0LP, UK; phone: (+44) 207 631 5555; fax: (+44) 207 631 5500. Other countries may have a local reprographic rights agency for payments. Derivative Works Tables of contents may be reproduced for internal circulation, but permission of Elsevier Science is required for external resale or distribution of such material. Permission of the Publisher is required for all other derivative works, including compilations and translations. Electronic Storage or Usage Permission of the Publisher is required to store or use electronically any material contained in this work, including any chapter or part of a chapter. Except as outlined above, no part of this work may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without prior written permission of the Publisher. Address permissions requests to: Elsevier Science Global Rights Department, at the fax and e-mail addresses noted above. Notice No responsibility is assumed by the Publisher for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions or ideas contained in the material herein. Because of rapid advances in the medical sciences, in particular, independent verification of diagnoses and drug dosages should be made.
First edition 2003
Library of Congress Cataloging in Publication Data: A catalog record from the Library of Congress has been applied for.
British Library Cataloguing in Publication Data: A catalogue record from the British Library has been applied for.
ISBN: 0-444-51465-1
ISSN: 1570-7946 (Series)
The paper used in this publication meets the requirements of ANSI/NISO Z39.48-1992 (Permanence of Paper).
Printed in Hungary.
Foreword
Detailed mathematical models are increasingly being used by companies to gain competitive advantage through such applications as model-based process design, control and optimisation. Thus, building high quality steady-state or dynamic, single or multiresponse, empirical or mechanistic models of processing systems has become a key activity in Process Engineering. This activity involves the use of several methods and techniques, including model solution techniques, nonlinear regression for parameter estimation, nonlinear systems identification, model verification and validation, and optimal design of experiments, to name just a few. In turn, several issues and open-ended problems arise within these methods, including, for instance, the use of higher-order information in establishing parameter estimates, establishing metrics for model credibility, closed-loop identification and parameter estimation issues, and extending experiment design to the dynamic situation. Papers included in the book cover such topics as mathematical model representation, implementation and solution; rapid model development (including advances in nonlinear regression, structural analysis, and automated mechanistic model building / nonlinear systems identification); model quality (including validation and verification techniques, advanced statistical metrics, as well as nonlinearity issues); model selection and improvement (including optimal design of experiments, as well as recursive, on-line techniques); computer-aided modelling tools; and, finally, industrial applications and challenges. The material covered in the book conveys information to a wide audience including researchers and practitioners within the process industries and academia, allowing easier development and full use of detailed and high fidelity models with reliable and quantified characteristics. Potential applications of these techniques in all engineering disciplines are abundant, including applications in chemical kinetics and reaction mechanism elucidation, polymer reaction engineering, physical properties estimation, biochemical and tissue engineering, and crystallisation. These models, in turn, can increasingly be used in and become the source of competitive advantage for such applications as model-based process design, control and optimisation. As a result, we can expect to see a substantial reduction in the cost and time of building mechanistic, first-principles models, as well as an increase in the precision of and confidence in their subsequent use. On the academic side, the book will serve to generate research ideas to further develop the underlying methods for mechanistic model building. The book will also serve as an excellent reference for postgraduate and research students, and has excellent potential to be used as supplementary reading in a graduate course on process modelling.
S. Asprey and S. Macchietto, 2002
Contents
Foreword  v
Methodological Aspects in the Modelling of Novel Unit Operations (H. Haario and I. Turunen)  1
Dynamic Modelling, Nonlinear Parameter Fitting and Sensitivity Analysis of a Living Free-Radical Polymerisation Reactor (A. Flores-Tlacuahuac, E. Saldivar-Guerra, and R. Guerrero-Santos)  21
An Investigation of Some Tools for Process Model Identification for Prediction (N. R. Kristensen, H. Madsen, and S. Bay Jørgensen)  41
Multivariate Weighted Least Squares as an Alternative to the Determinant Criterion for Multiresponse Parameter Estimation (P. W. Oxby, T. A. Duever, and P. M. Reilly)  63
Model Selection: An Overview of Practices in Chemical Engineering (P. J. T. Verheijen)  85
Statistical Dynamic Model Building: Applications of Semi-infinite Programming (S. P. Asprey)  105
Non-constant Variance and the Design of Experiments for Chemical Kinetic Models (A. C. Atkinson)  141
A Continuous-Time Hammerstein Approach Working with Statistical Experimental Design (D. K. Rollins)  159
Process Design Under Uncertainty: Robustness Criteria and Value of Information (F. P. Bernardo, P. M. Saraiva, and E. N. Pistikopoulos)  175
A Modelling Tool for Different Stages of the Process Life (M. Sales-Cruz and R. Gani)  209
Appendix  238
Other Papers Presented at the Workshop  251
Author Index  253
Subject Index  255
Dynamic Model Development: Methods, Theory and Applications S.P. Asprey and S. Macchietto (editors) © 2003 Elsevier Science B.V. All rights reserved
Methodological Aspects in the Modelling of Novel Unit Operations
H. Haario¹ and I. Turunen²
¹Yliopistokatu 5, FIN-00014, University of Helsinki, Helsinki, Finland
²Lappeenranta University of Technology, P.O. Box 20, FIN-53851 Lappeenranta, Finland

Modelling is discussed as a tool for the development of novel unit operations. General aspects, including model selection, validation and integration, are considered in the first part. In the second part of the paper novel methods for model building are discussed.

INTRODUCTION: R&D STRATEGIES

The main tools in the development of novel unit processes are modelling and experimentation. These two activities support each other; therefore any up-to-date methodology of process development should be based on the integrated utilization of models and experiments. The requirements and results of experimental work have to be taken into account when planning the modelling strategy, and vice versa. A good description of industrial R&D strategy, valid for the development of novel unit processes, has been given by Euzen, Trambouze and Wauquier [1]. An example of modelling in an industrial R&D project has been described by Piironen, Haario and Turunen [2]. The following two roles of models are especially important from the viewpoint of process development:
• The theory and the mechanisms of the process are presented by models; therefore models increase the understanding of the process.
• The models decrease R&D costs, because they make it possible to decrease the number of experiments and to increase the scale-up ratios.
The first item often calls for a deep scientific study of the process. In practical process development, this might contradict the second item, which emphasizes limited effort. More detailed models often require extensive theoretical and experimental work and thus increase R&D costs. Therefore a very challenging task in process development is to formulate a modelling strategy where different models are combined with different types of experiments to obtain reliable information for all relevant purposes with minimum costs within acceptable time limits. Optimal design of experiments can provide tools for this task. Another challenge is created by modern measurement instruments: how to calibrate them rapidly and how to effectively utilize the information content of the often massive amount of data they produce. We will discuss here the general aspects of modelling as well as some more specific computational tools.

1. THE ROLE OF MODELLING IN R&D

The different steps in the modelling of novel unit processes can be presented, e.g., in the following way:
• Identification of the main purpose of the model, i.e. a clear statement of the industrial objective.
• Identification of the different phenomena in the process.
• Identification of the most important phenomena and planning of their experimental research. In this connection it is often necessary to divide the process into several subsystems to avoid excessive complexity and too large a number of parameters. An adequate combination of laboratory, mock-up and pilot scale experiments has to be chosen from the extremely large number of possible ones.
• Selection of the theoretical basis from the possibly several competing theories.
• Formulation of the equations.
• Parameter estimation and model validation based on experimental data. Experimental design methodologies should be adopted.
• Presentation and interpretation of the results.
• Documentation of the model.
• Integration of the model in the total system, i.e. evaluation of the impact of the new technology on the whole industrial process.
• Further development of the model as the project proceeds.
This modeling sequence is iterative, as the developer usually returns to earlier steps after checking the results. From the preceding list, experimentation, model selection, validation and integration can be considered the cornerstones of industrial modeling methodology.

1.1 Model selection

Commercial simulation programs can be used only to a limited extent in the development of novel unit processes. Usually existing models are not detailed enough and are incapable of coping with the specific features of real development projects. Flexible tools which could reach the required specificity and level of detail have been developed in the universities. For one reason or another, these have stayed at the level of research results and are not much utilized in practice. Therefore companies develop their own specific and detailed models for their own purposes. This is especially true in the case of more complicated process units. One can recognize several important sub-problems in model selection. Often there are competing theories to describe the relevant phenomena. There is always a choice between a
mechanistic and an empirical approach. The goal should be a mechanistic model, but usually a compromise is needed in practice. All these decisions in model selection should be based on the main purpose of the model. Therefore it is very important to identify this purpose at the beginning of the modeling activities. The purpose of the model also determines the degree of detail in the model. Increasing the detail and the theory in the models, while usually increasing the accuracy and reliability, also often brings more parameters to be estimated and therefore more experimental activities. In practice the best way is often to use several levels of "granularity", so that detail can be provided in the critical areas while simplified or empirical approaches are used in non-critical areas.

1.2 Model validation

Validation of the models on the basis of experimental results is extremely important. First of all, the complexity of the model has to be compatible with the quality of the experimental data available. Crude data with a high noise level may only identify rather crude models. A proper interplay between experimental and modeling work is crucial. This is a topic still overlooked in many R&D projects. A good fit between the model and measurements is usually not sufficient; in addition, the proper values of the parameters have to be identified and possible mutual correlations between them revealed. This is especially important when the model is used for extrapolation, e.g. scale-up.

1.3 Integration

Integration of the models can be considered from different viewpoints. First, the new unit process, as well as its model, should be considered as a part of the whole process. The optimal design of a plant can be found only in that way. Models also have to be integrated in the whole process development project, i.e. the modelling activity should proceed concurrently with process development. Different models are now developed for different purposes and at different stages of the process life-cycle, e.g.:
• at development, i.e. to demonstrate or investigate an idea before applying it in the process
• at design, i.e. for equipment sizing and scale-up, selection of operating conditions, etc.
• at plant operation, i.e. for operation support, control, optimization or troubleshooting.
Model integration can also be implemented using a single model, with minor modifications, through all those stages. In order to do this, a model should be based on sound engineering concepts. It must be pointed out that both the model and the process are evolving entities: the model is modified once new information about the process becomes available and, in the same way, the process can be modified as soon as the model highlights how to improve its performance.
Finally, another important reason why companies should integrate models into the life cycle of a process, i.e. by developing models concurrently with technologies, is that nowadays customers require them. A model may be used, for instance, to optimize the plant operation, thus greatly improving its profitability. Models represent, in fact, a modern version of the operational handbook of the plant - and more.

2. NEW COMPUTATIONAL TOOLS

In the following, we discuss in more detail certain novel methods for two basic topics in process model building: estimation of kinetic parameters and optimal design of experiments.

2.1 Rapid estimation of kinetic parameters by implicit calibration

Reaction kinetics is often the bottleneck of reliable simulation studies, especially for complex or fast reaction mechanisms. Online measurements of various spectra - UV, IR, NIR, etc. - combined with chemometric calculation tools are increasingly used to identify the kinetics. We present here a recent approach which combines nonlinear parameter estimation with implicit, online calibration between measured spectra and calculated concentrations. In the standard parameter estimation approach the kinetic parameters are fitted against measured concentration values. The kinetic mechanism is assumed to be known except for some unknown parameter values, so the model would give the concentrations if all model parameters were known. The initial values for the unknowns are guessed and then optimized (estimated) iteratively by nonlinear regression in order to give the best possible fit between measured and calculated concentrations. In the approach that uses chemometrics, a computational model predicts the concentrations on the basis of the measured spectra. This requires off-line calibration: 'training sets' of known mixtures of concentrations and the spectra measured of them. Various principal component type multivariate methods - PCR, principal component regression, or PLS, partial least squares, being some of the most common - exist for creating the model that maps the spectra into concentration units. Once the calibration model has been created, measurement of the spectra can be done on-line during the reaction, often with rather dense sampling intervals, and the spectra are transformed by the calibration model into concentration units. The parameter estimation is then carried out exactly as in the standard approach, using the calculated concentrations as the measurements. The benefit is not limited to larger quantities of data; the quality of the concentration data obtained in this way may also clearly exceed that obtained by more traditional chromatographic methods, see [12] for an example. Difficulties may nevertheless arise in preparing the known mixtures. In cases with fast reactions or elusive intermediates it may be difficult or impossible to create the training set for the calibration model. Another pitfall may be the scope of the training set: reliable performance of the calibration model requires that the training set has been designed well
enough to cover the situations that will be met in the reaction batches. Considerable laboratory work may be required because of these aspects. Here we discuss a novel approach where minimal or no off-line calibration is done. The calibration between spectra and concentration units is done computationally, together with the parameter estimation procedure. No preliminary concentration data are available, with the natural exception that often the initial values of the reaction batches are known, together with the spectra measured at the same moments. The idea is to produce the concentration values from the model and test how well these values can be calibrated against the measured spectra. Hopefully, the correct values of the kinetic parameters can be calibrated better than the incorrect ones. Figure 1 gives a schematic picture of the various approaches: arrows 1 present the standard way of fitting parameters and concentrations. Arrows 2 represent the chemometrical calibration of spectra and concentration units. Arrows 3 exhibit the implicit calibration.
Figure 1. Schematic picture of the various approaches.

The idea of going the way indicated by arrow 3 is to avoid any (or much) explicit calibration work. So we have called it 'implicit calibration'. Calibrations are in fact done, but between computed values, as the algorithm iterates via the ways indicated by arrows 1 and 2. There are several ways to proceed; we have mostly studied methods where the calibration step is a sub-task of the kinetic fitting problem. Other approaches are discussed briefly below. We will see how the approach succeeds in finding correct values for the kinetic parameters, also in a case where more traditional off-line calibration failed. Let us clarify the implicit calibration, the way of Arrow 3, with an example: the esterification of methanol with formic acid. Consider the equilibrium reaction A + B <-> C + D, where A is methanol, B formic acid, C methyl formate and D water. Let us denote the forward and backward reaction rate constants as k1 and k2. From the implicit calibration point of view, this is
an interesting case because the reaction is fast; traditional analyses are difficult to carry out because the reaction starts immediately, already at room temperature. The spectroscopic IR data was produced at Lappeenranta University of Technology. It turned out that the standard chemometrical procedure, Arrow 2, was not successful: even the short time between preparing the mixtures of the components and getting them into the instrument that measured their IR spectra was too long. The reaction had advanced so that no 'known' concentrations existed anymore that would correspond to the spectra at the sampling time. Of course, the concentrations could be calculated from the known initial values at mixing time if the kinetic rate constants were known - but they are exactly the unknowns to be estimated. So this situation calls for a methodology that is able to combine the calibration and model parameter identification. We shall see below that a simultaneous calibration of the spectra and parameter estimation could indeed be successfully done. Six reaction batches were run, all at the same (room) temperature, with different initial concentrations for the components A, B, C, D. The spectra were measured at about 20 sampling times during each batch. In implicit calibration, the calibration step is performed separately inside the kinetic parameter estimation loop, i.e. the calibration takes place between the measured spectra and the concentrations calculated by the model using the current values of the kinetic parameters. The improvement is based on the idea that with more correct kinetic parameter values the fit of the calibration should also be better. Several alternatives for the calibration are possible; they are discussed below. Here we give the results for just one choice. Inside the parameter estimation loop the spectra were calibrated with the PLS method using 4 PLS dimensions. Figure 2 below shows the contour lines of the R2 values of the concentration fit, i.e. the concentrations calculated by the kinetic model versus the 'data', the concentrations calibrated from the spectra. The R2 values are given on a grid; we can see a clear optimum around the point k1 = 0.1, k2 = 0.2.
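The nested structure described above - an outer loop over the kinetic parameters with a PLS calibration as an inner sub-task - can be sketched compactly. The following is only an illustrative sketch, not the authors' implementation: it assumes SciPy's solve_ivp as the batch-model integrator and scikit-learn's PLSRegression for the calibration step, and the batch data structure (initial concentrations, sampling times, measured spectra) is a placeholder that would have to be filled with the measured batches.

```python
import numpy as np
from scipy.integrate import solve_ivp
from sklearn.cross_decomposition import PLSRegression
from sklearn.metrics import r2_score

def batch_concentrations(k1, k2, c0, t_meas):
    """Integrate the equilibrium reaction A + B <-> C + D for one batch."""
    def rhs(t, c):
        a, b, cc, d = c
        r = k1 * a * b - k2 * cc * d          # net forward rate
        return [-r, -r, r, r]
    sol = solve_ivp(rhs, (0.0, t_meas[-1]), c0, t_eval=t_meas, rtol=1e-8)
    return sol.y.T                             # (n_sampling_times, 4) concentration matrix

def implicit_calibration_score(k1, k2, batches):
    """Arrow 3: calibrate the measured spectra against the concentrations computed
    with the current kinetic parameters, and score the resulting fit."""
    C = np.vstack([batch_concentrations(k1, k2, b["c0"], b["t"]) for b in batches])
    S = np.vstack([b["spectra"] for b in batches])      # absorbance matrix, rows = sampling times
    pls = PLSRegression(n_components=4).fit(S, C)       # 4 PLS dimensions, as in the text
    return r2_score(C, pls.predict(S))                  # R2 of calibrated vs. model concentrations

# Scan a (k1, k2) grid as in Figure 2; 'batches' is a list of measured batches,
# each a dict {"c0": initial concentrations, "t": sampling times, "spectra": spectra}.
k1_grid = np.linspace(0.02, 0.30, 15)
k2_grid = np.linspace(0.02, 0.40, 15)
# R2 = [[implicit_calibration_score(k1, k2, batches) for k2 in k2_grid] for k1 in k1_grid]
```

In a full implementation the grid scan would normally be replaced by a nonlinear optimizer over (k1, k2), and the constraints and weightings discussed below would enter the calibration step.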
Figure 2. Contour lines of the R2 values of the concentration fit.

The same reaction was earlier studied at Kemira Agro Oyj Research Centre, where a reliable kinetic model was developed by traditional nonlinear parameter estimation, using a sophisticated on-line HPLC system especially designed for fast reactions. So we are able to
compare the results, obtained by quite different measurement approaches at different temperatures. The values obtained by implicit calibration turned out to practically coincide with those obtained by the traditional parameter estimation procedure, which, however, required much more difficult experimental work. In general, we have several alternatives to choose from. The fit can be done either in the concentration units or in the absorbance units. We should also employ any a priori knowledge available to 'fix' the empirical calibration step: positivity, smoothness, stoichiometry, linear constraints and so on. Furthermore, various objective functions can be employed: ordinary least squares, the overall R2 value of the fit, separate R2 values for each chemical component, etc. This gives us a multitude of calibration alternatives. The basic choices are:
• The 'direction' of calibration (fit in absorbance or concentration units)
• The calibration method (ordinary regression, principal component regression, PLS, ridge regression)
• The constraints (positivity, smoothness, stoichiometry, ...)
• The objective function
It is intuitively clear that a too flexible calibration method would fit too well to concentrations calculated even with wrong values for the kinetic parameters. The choice of a suitable 'rigidity' for the soft calibration step is crucial. In addition, our studies show that the most important factors in obtaining reliable results are a proper design of the batch experiments, proper weighting of the initial measurements of each batch (in cases where both spectra and concentration data are available for the initial values), and proper selection of the objective function. A method related to our approach is so-called curve resolution. There, one aims at factoring the measured spectral matrix into a product of pure component spectra and concentration matrices. It should be noted that there is a major difference between the curve resolution and implicit calibration approaches. The former is a valuable tool when the kinetic mechanism is not known, helping us in finding the possible reaction schemes. The latter, in turn, provides a tool for finding a quantitative kinetic model when the kinetic mechanism is known. The pure curve resolution procedure may also be extended to take into account the kinetics that produces the data. Indeed, this is another way to do the computational calibration [3] and has been further studied by other authors in, e.g., [14-16]. Above, we have exclusively discussed absorbance spectra and concentration units. The approach is, however, more general. Instead of absorbance, we could have any analytical signal, and instead of concentrations, we could have any state variable of the kinetic system: mole fraction, mass fraction, temperature, etc. The proposed implicit calibration method seems to be able to overcome many difficulties: no calibration based on known mixtures or pure components is needed. The measurements correspond to the true state of the reaction at each time step. All intermediates affect the
measurements, provided that the instrument is sensitive to them. Thus, reaction mechanisms that could not be determined by traditional methods can be handled, and even in cases where traditional methods work, the implicit calibration approach can significantly speed up the kinetics estimation procedure. Nevertheless, a multitude of details that affect the results have to be further studied. These include preprocessing of the spectra, the direction of the implicit calibration, the multivariate calibration method itself, the measure of goodness of fit, use of prior knowledge in constraining the solution, weighting of measurements, the dimension of the calibration model or the weights for the constraints and, finally, the design of the experiments. For more details and references see [3,4,14,15,16].

2.2 Global criteria for design of experiments

The traditional criteria - D, A, E, etc. - for optimal design of experiments are all based on linear theory. For nonlinear models they are applied via linearization. In some cases this approach has, in addition to the inevitable distortion caused by linearization, a certain drawback. With correlated model parameters the Jacobian matrix becomes singular, and no good experimental points are found by numerical optimization. Thus methods are required that take the parameter effects into account more globally. In [6] we employed a method we called 'Parameter Discrimination' as a global criterion in a case of enzyme kinetics. The traditional criteria did not produce reasonable results. So we used the idea of finding 'bad' parameter vector pairs from the confidence region: parameter combinations far away from each other that produce roughly equally good fits, i.e., that are not discriminated by the data available so far. The optimal design is then found by maximizing, in some norm, the difference of the responses at those discrimination points with respect to the experimental variables. If a good experimental point exists, it will be one that separates the responses by an amount greater than the experimental error level. The method is, in fact, the familiar model discrimination principle [5],[11], but applied within one model: the model predictions at two (or more) parameter values are regarded as given by different models. Basically the same idea was applied already by D. Isaacson [8] under the title 'distinguishability' in the slightly different context of electrical impedance tomography and, more recently, by Asprey in process research [7]. Here we want to emphasize a novel aspect of the methodology: the role of MCMC methods in finding 'optimal' discrimination parameters. We shall illustrate the methodology with two examples. The first one is a simple case where the classical criteria also work well. We show how essentially the same design is obtained by the parameter discrimination principle. As the second example we utilize the same enzyme kinetics case as in [6]. However, now the search for the discriminative parameters is done more systematically via the MCMC chain of the parameters to be estimated.
Markov Chain Monte Carlo (MCMC) methods

Let us recall the standard nonlinear model fitting situation,

y = f(x, θ) + ε,   ε ~ N(0, σ²)     (1)

where, for simplicity, we assume the measurement error ε to be independent Gaussian with variance σ². The least squares estimate θ̂ for the unknown parameter vector is obtained by minimizing the sum of residuals

l(θ) = Σi (yi - f(xi, θ))²     (2)

of the data yi and the model at the measurement points xi, i = 1, 2, ..., n. Classical formulae for the confidence intervals of the parameter θ are taken from the theory of linear models, and applied here after linearizing the above sum. The result is naturally approximative and may be rather misleading. The Bayesian approach is to consider the posterior distribution of the parameters in terms of probability densities. In the case of Gaussian measurement error, the likelihood function is obtained from the residual sum by the formula

P(θ|y) = C exp( -l(θ) / (2σ²) )     (3)
(we skip here more general cases, as well as the discussion about a possible prior distribution for the parameters θ). The constant C in the above should be determined so that a true probability distribution, with total mass one, is created. In principle, the recipe is simple: integrate the exponential term in the above formula over θ and take C as the inverse of the value obtained. The probability that θ belongs to a certain region - the 95% credibility region around the best fitting point θ̂, for instance - is then obtained by integrating P(θ|y) over that region. There are, however, two obstacles that have practically blocked the use of this recipe: the numerical integration over θ soon becomes prohibitive if the dimension of the parameter space gets higher, and there is no direct way to find limits for integration over given confidence regions. The emergence of the MCMC methods has recently removed these difficulties, to a large extent.
MCMC can be considered as Monte Carlo integration where the random points are drawn from a suitably constructed Markov chain. The chain produces successive points θi, i = 1, 2, ..., in such a way that the probability distribution of θ is approximated. There is a great variety of MCMC algorithms; let us describe a simple version, the Metropolis algorithm. Suppose the process is at the point θi at the i-th step. For a new sample, a candidate point θ* is first chosen from a suitable proposal distribution q(·|θi), which can depend on the previous state. The candidate point is accepted as a new state in the chain with probability

α(θi, θ*) = min{ 1, P(θ*|y) / P(θi|y) }     (4)

If the candidate point θ* is accepted, the next state in the chain becomes θi+1 = θ*; otherwise θi+1 = θi. It may seem that the calculation requires the properly scaled probability function P. However, only the ratio of the P values appears in the above formula, so the normalizing constant cancels out and everything boils down to successive computations of the residual sums. It can be proved that with a proper, fixed proposal distribution q - e.g., a Gaussian distribution centred at θi - the chain converges towards the proper distribution of θ [17]. The choice of q is the only decision the user has to make. This may also be a source of difficulties: if the size or shape of the proposal distribution is not in accordance with the distribution of θ (so that, e.g., many candidate points give practically a zero value for P and are thus not accepted), the convergence may be very slow and excessively many points θi are needed. In [9,10] an adaptive method to 'tune' the proposal q was introduced, which, according to our experience, practically removes this possible pitfall. As an example, consider a simple linear model y = x'θ in dimension 2, with data generated at certain points xi, yi, i = 1, 2, ..., 10 (we will use this example again below, with more details). Figure 3 gives the least squares fit point and the points generated by the MCMC calculation. The 95% credibility region can be constructed from the histogram of the MCMC points in a straightforward manner. Of course, in this linear case the same result is readily available by classical theory, too. The strength of the MCMC methods is that they apply equally well to nonlinear models, which are beyond the scope of standard statistics. The credibility region tells us how well the parameters are identified by the given data. The information may be utilized to design new optimal experiments for better identification. In the next section we show how the information provided by the MCMC chain may be employed to design experiments for nonlinear models.
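The Metropolis step just described is short enough to state as code. The sketch below is a minimal, non-adaptive random-walk version (the adaptive tuning of [9,10] is not included); the toy model, data and proposal covariance are invented purely for illustration.

```python
import numpy as np

def metropolis(residual_ss, theta0, cov_proposal, n_samples, sigma2, seed=0):
    """Random-walk Metropolis sampling of P(theta|y) ~ exp(-l(theta)/(2*sigma2)), Eq. (3).

    residual_ss(theta) must return the least squares sum l(theta) of Eq. (2).
    """
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)
    l_theta = residual_ss(theta)
    chol = np.linalg.cholesky(cov_proposal)            # Gaussian proposal centred at theta_i
    chain = np.empty((n_samples, theta.size))
    for i in range(n_samples):
        cand = theta + chol @ rng.standard_normal(theta.size)
        l_cand = residual_ss(cand)
        # acceptance probability of Eq. (4); the normalizing constant C cancels in the ratio
        if np.log(rng.uniform()) < -(l_cand - l_theta) / (2.0 * sigma2):
            theta, l_theta = cand, l_cand
        chain[i] = theta
    return chain

# toy nonlinear model y = theta1*(1 - exp(-theta2*x)) + noise, 10 'measurements'
x = np.linspace(0.0, 5.0, 10)
theta_true = np.array([2.0, 1.3])
y = theta_true[0] * (1.0 - np.exp(-theta_true[1] * x)) \
    + 0.05 * np.random.default_rng(1).standard_normal(x.size)
ss = lambda th: float(np.sum((y - th[0] * (1.0 - np.exp(-th[1] * x))) ** 2))
chain = metropolis(ss, theta0=[1.0, 1.0], cov_proposal=0.01 * np.eye(2),
                   n_samples=20000, sigma2=0.05 ** 2)
```

Histograms of the columns of `chain` (after discarding a burn-in portion) give the marginal posteriors, and the scatter of the points traces out a credibility region of the kind shown in Figure 3.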
Figure 3. Credibility region of a linear model. The theoretical 95% probability region (-) and the points of an MCMC run (.).

Local criteria for experimental design

The traditional criteria - D, A, E, etc. - for optimal design are based on linear theory. For nonlinear models they are applied via linearization. For the model y = f(x, θ) one computes the Jacobian matrix J consisting of the sensitivity derivatives

Jij = ∂f(xi, θ) / ∂θj     (5)
where xi, i = 1, ..., n denote the experimental points and θj, j = 1, ..., p the unknown parameters. The derivatives are evaluated at the point θ0, the best guess or last fitted value for the parameters. An approximate credibility region for the parameters is obtained as a p-dimensional ellipsoid, by replacing the design matrix X in the linear theory by the matrix J. The traditional criteria aim at minimizing the size or optimizing the shape of the ellipsoid. Minimizing the volume of the credibility region, for instance, leads to D-optimality, where one maximizes the expression

det(J'J)     (6)

with respect to the new experimental points x.
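For reference, this local criterion is easy to evaluate numerically. The sketch below is illustrative only: it assumes a model function f(x, theta) that is vectorized over the experimental points, and it approximates the sensitivities (5) by forward finite differences.

```python
import numpy as np

def jacobian(f, x, theta0, h=1e-6):
    """Sensitivity matrix J_ij = df(x_i, theta)/dtheta_j of Eq. (5), by finite differences."""
    theta0 = np.asarray(theta0, dtype=float)
    y0 = f(x, theta0)
    J = np.empty((y0.size, theta0.size))
    for j in range(theta0.size):
        dtheta = np.zeros_like(theta0)
        dtheta[j] = h
        J[:, j] = (f(x, theta0 + dtheta) - y0) / h
    return J

def d_optimality(f, x_new, x_old, theta0):
    """Objective of Eq. (6): det(J'J) over the old points plus the candidate new points."""
    x_all = np.concatenate([np.atleast_1d(x_old), np.atleast_1d(x_new)])
    J = jacobian(f, x_all, theta0)
    return np.linalg.det(J.T @ J)
```

A new experiment is then chosen by maximizing d_optimality over the admissible values of x_new, for instance with a grid scan or a standard optimizer.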
For moderately nonlinear, reasonably well identifiable models the traditional methods often work perfectly well. Problems may be encountered, however, for several reasons: the result is sensitive with respect to the assumed value for θ0, and if the model is badly identifiable, due to too many or correlated parameters, the Jacobian J becomes nearly singular. In the latter case the objective function for the optimization of the design may have no well-defined optimum, optimizers may have difficulties in convergence, or the results may be misleading; see, e.g., [13] for an example. One could interpret a common cause for the difficulties to be the local character of the criteria used: the derivatives in the Jacobian matrix are computed by varying the unknown parameter values in a small neighborhood of θ0. For weakly identifiable parameters the differences in the model predictions might be negligible, and the criteria only test a small 'flat' landscape around θ0.
Global criteria for experimental design

To see the effect of parameter variations on the behavior of the model, one should compare the model predictions at points that are at larger, 'global' distances in the parameter space. Bad parameter identification is due to distant parameter points θ at which the model responses nevertheless are close to each other at the points x where the data so far has been measured. The design of experiments should find new points x where the responses at those θ points are as different as possible. This is the idea of the global type designs we discuss next. The procedure, in principle, consists of the steps:
• Find 'extreme' parameter combinations θi not discriminated by the data so far.
• Maximize the difference of the responses with respect to the experimental variables,

d( f(x, θi), f(x, θj) ),   i ≠ j     (7)
Above, the distance d may be given in any norm (L¹, L², max). If the model predictions given by the different parameter vectors are interpreted as given by different models, this is indeed the familiar model discrimination procedure [5]. So we have called the approach 'Parameter Discrimination'. To arrive at a practical algorithm, one still has to specify how to select the discriminating 'extreme' parameter vectors θi, how to define the norm d to be maximized, and how to connect the procedure with the proper statistical context, taking into account the size of the measurement error.
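One possible realization of these two steps is sketched below; it assumes that an MCMC sample of the credibility region is already available (for instance from the Metropolis sketch above) and picks the 'extreme' pairs along the principal axes of that sample, which is only one of the many selection rules discussed in the text.

```python
import numpy as np

def extreme_pairs(chain, n_pairs=3):
    """Pick discriminating parameter pairs: chain points lying farthest apart along
    the main principal axes of the sampled credibility region."""
    centred = chain - chain.mean(axis=0)
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    pairs = []
    for axis in vt[:n_pairs]:                 # main semi-axes of the sample
        proj = centred @ axis
        pairs.append((chain[np.argmin(proj)], chain[np.argmax(proj)]))
    return pairs

def discrimination_objective(f, x_candidate, pairs):
    """Eq. (7): distance between the responses at the discriminating points, here in the L2 norm."""
    return sum(np.linalg.norm(f(x_candidate, t1) - f(x_candidate, t2)) for t1, t2 in pairs)
```

The next experiment is the candidate x that maximizes discrimination_objective over the admissible experimental region, e.g. by a grid scan or by minimizing its negative with a standard optimizer, and it is worthwhile only if the achieved separation exceeds the measurement error level.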
The credibility region gives the parameter values that, with certain data available, could statistically 'equally well' be the true values. So the extreme tips of the region show how well or badly the model parameters are determined by the data: they represent points whose model responses are still close, yet the distances between the points are largest. A good design for the next experiments would bring such tips of the credibility region as close to each other as possible. For linear models the selection of the discriminating points could be done by the known formulae that define the credibility region (we shall indicate below the connection to the classical criteria). For the general, nonlinear case the same idea can be realized by the aid of the MCMC chain. If properly run, the MCMC algorithm produces points that cover the credibility region. Various criteria may now be created, depending on how the discriminating θ points are selected from the MCMC chain matrix. The situation is analogous with the classical 'alphabetic' criteria, constructed from the semi-axes of the credibility ellipsoid in various ways. In principle, the search for the 'extremal' parameter points θ could be viewed as a constrained optimization problem: maximize ||θ - θ0|| under the constraint that ||f(x, θ0) - f(x, θ)|| remains small. But no optimization is needed if an MCMC algorithm provides the set of θ from which the extremal ones can be selected. In addition, for proper statistical analysis the credibility region should be created anyway. So we get the objective functions for global design criteria as byproducts of the parameter fitting followed by the construction of the MCMC chain. In fact, the use of an optimizer could be omitted even in least squares fitting, since the MCMC chain provides the means, first moments or the peak points of the one-dimensional marginal distributions of the individual parameters. In some ill-posed cases such estimates may be more robust than the least squares fit, see [9] for an example. Our standard approach, however, is to calculate first the least squares fit and then continue with the calculation of the MCMC chain by the adaptive methods, using the approximate covariance from the fit as the first guess for the proposal distribution.

Example 1: a linear model

Let us consider the situation using the L² distance norm, the usual least squares objective function. For a fixed θ0, denote
d(θ) = || f(x, θ0) - f(x, θ) ||     (8)

In the linear case, f(x, θ) = x'θ, we have

d(θ)² = (θ - θ0)' X'X (θ - θ0)     (9)
where X denotes the design matrix of the x vectors. If θi - θ0 = vi is an eigenvector of X'X with eigenvalue λi, we see that, for example,

∏i d(θi)² = λ1 ··· λp,   D-optimality,
Σi d(θi)² = λ1 + ··· + λp,   A-optimality,     (10)
So various combinations of d(θi) have a connection with the classical designs. Of course, for linear models the classical criteria only depend on the X matrix, not on any selected parameter points θ. However, the following example shows how the same designs may be obtained by classical and global methods, if the criteria in both cases are suitably chosen. Consider again the simple two-dimensional model y = x'θ, with θ = (20, 25), and the data given at 10 'experimental' points (especially badly designed points, for demonstration) as shown in Figure 5. Noise with σ = 0.1 was added to the y values; the resulting credibility region was already shown in Figure 3. We can now ask for one new optimal experimental point x, and do the optimization both with D-optimality and with the global criterion close to D-optimality as shown above. In Figures 4 and 5 we have plotted both design criteria as functions of x1, x2. The surfaces are naturally different, but qualitatively close to each other. The optimal result is in both cases the same: the point that lies in the corner x = (-1, 1).
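The identities in (10) are easy to check numerically. The short script below is only a sanity check with a random 10-point design; it confirms that the squared discrimination distances along the unit eigenvectors of X'X reproduce the eigenvalues, and hence the D- and A-optimality quantities.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((10, 2))              # design matrix of 10 two-dimensional points
lam, V = np.linalg.eigh(X.T @ X)              # eigenvalues and unit eigenvectors of X'X

d2 = np.array([v @ X.T @ X @ v for v in V.T]) # d(theta_i)^2 of Eq. (9) for theta_i - theta_0 = v_i
print(np.allclose(d2, lam))                             # each squared distance equals an eigenvalue
print(np.isclose(np.prod(d2), np.linalg.det(X.T @ X)))  # product of the d^2: D-optimality
print(np.isclose(np.sum(d2), np.trace(X.T @ X)))        # sum of the d^2: A-optimality
```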
Figure 4. The D-optimality surface
Figure 5. The Parameter Discrimination surface

The result is rather typical. For mildly nonlinear, relatively well-behaved systems the classical criteria give similar designs as the parameter discrimination, if the discriminating parameter values are chosen so that they correspond to the given classical criterion.

Example 2: a nonlinear model

Here we study an example where D-optimality seems to fail to produce a proper design, while parameter discrimination works. The example comes from enzyme kinetics [6], the mashing of beer. The complexity of the system is rather typical in chemical or biological kinetics, with respect to the number of ODE components, unknown parameters, control variables and observations. Below is a list of the states and the ODE system that models the situation:

dα/dt = -Ha M/Vg (αg - α)                                          enzyme,
dαg/dt = Ha M/V (α - αg) - ka αg                                   enzyme,
dβ/dt = -Hb M/Vg (βg - β)                                          enzyme,
dβg/dt = Hb M/V (β - βg) - kb βg                                   enzyme,
dx1/dt = -α (x1 - uT) (Agmlt + Agdex)                              starch,
dx2/dt = α (x1 - uT) Agdex - β x2 (Bgl + Bmal/(Km + x2) + Bldex)   dextrins,
dx3/dt = Bgl β x2                                                  glucose,
dx4/dt = Bmal β x2 / (Km + x2)                                     maltose,
dx5/dt = Agmlt α (x1 - uT)                                         maltotriose,
dx6/dt = Bldex β x2                                                limit-dextrins.
The experiments had been done on an intuitive basis, not following any specific design methodology. Seven batches were run at different temperatures, with different initial values for the components x2, x3, x4, x5. In two of the batches the temperature, a control variable, was changed during the batch. The parameters θ = (Agdex, Agmlt, Bgl, Bmal, Bldex, Km) depend on the temperature via the Arrhenius law. They were to be estimated from the measured values of the components x2, x3, x4 and x5 + x6. For simplicity, we assume here the activation energies, as well as the other remaining parameters, to be known and estimate only the parameters in θ at a reference temperature. We also skip here all further practical details; for our present purpose it is sufficient to deal with the example simply as an ODE system with certain parameters to be estimated.
Figure 6. Typical data and fits for the ODE system.

Several test runs were performed; for these runs we replaced the original data with simulated values, with some Gaussian noise added. Figure 6 shows some of the data together with the fits. The MCMC chains were run without difficulties after the fits. Figure 7 represents the one- and two-dimensional marginal posteriors of two of the parameters, with the 68%, 95% and 99% probability regions shown for the two-dimensional distribution.
Figure 7. The one and two dimensional posterior distributions of the parameters Bldex and Km.
In this simulated situation the least squares fit is already close to the 'true' parameter value θ0, but new experiments for smaller error bounds are called for. Two variables, T and M, the temperature and a concentration constant in the enzyme rates above, were selected as the experimental factors to be optimized for the next batch (again, we deal with a somewhat simplified situation for demonstration purposes). For the global criterion two discriminating points θ1, θ2 were selected, roughly at the tips of the main semi-axes of the 6-dimensional MCMC sample of the credibility region. Figures 8 and 9 present the D-optimal and global objective functions as contour surfaces as functions of T and M. Now we can see that the criteria offer rather different answers: the global criterion suggests the point T = 321, M = 0.07, while the point T = 330, M = 0.03 would be optimal according to D-optimality. Figures 10 and 11 exhibit the model responses as calculated at the discriminating points θ1, θ2 and the least squares fit point θ0. We see that the global criterion does find an experiment that will give separate trajectories, while the respective responses at the experiment suggested by D-optimality are closer than the selected measurement error level. Performing an experiment at this point would thus not bring us new information to shrink the credibility region.
Figure 8. The D-optimality surface
Figure 9. The Parameter Discrimination surface
Figure 10. The responses at the D-optimal point.
Figure 11. The responses at the Parameter Discrimination point.
REFERENCES
1. Euzen, J.P., Trambouze, P., Wauquier, J.P., Scale-up Methodology for Chemical Processes. Editions Technip, Paris, 1993.
2. Piironen, M., Haario, H., Turunen, I., Modelling of Katapak Reactor for Hydrogenation of Anthraquinones. Chem. Eng. Sci., 56 (2001), 859-864.
3. Haario, H., Taavitsainen, V., Combining soft and hard modelling in chemical kinetic models. Chemometrics and Intell. Lab. Syst., 44 (1998), 77-98.
4. Taavitsainen, V., Haario, H., Rapid estimation of chemical kinetics by implicit calibration. I. J. of Chemometrics, 15 (2001), 215-239.
5. Bard, Y., Nonlinear Parameter Estimation. Academic Press, New York, 1974.
6. Haario, H., Holmberg, A., Pokkinen, M., Oinas, R., Integrated information and quality management of biotechnical processes and its application to experimental design in process scale-up from R&D to production. Proc. of the 6th International Conf. on Computer Applications in Biotechnology, May 14-17, 1995, Garmisch-Partenkirchen.
7. Asprey, S.P., Macchietto, S., Statistical tools for optimal dynamic model building. Comput. Chem. Eng., 24 (2000), 1261-1267.
8. Isaacson, D., Distinguishability of conductivities by electric current computed tomography. IEEE Trans. Med. Imaging, 5 (1986), 91-95.
9. Haario, H., Saksman, E., Tamminen, J., Adaptive proposal distribution for random walk Metropolis algorithm. Comput. Statistics, 14 (1999), No. 3.
10. Haario, H., Saksman, E., Tamminen, J., An adaptive Metropolis algorithm. Bernoulli, 7 (2001), 223-242.
11. Atkinson, A.C., Fedorov, V.V., The design of experiments for discriminating between two rival models. Biometrika, 62 (1975), 57-70.
12. Helminen, J., Leppämäki, M., Paatero, E., Minkkinen, P., Monitoring the kinetics of the ion exchange resin catalysed esterification of acetic acid with ethanol using near infrared spectroscopy with PLS model. Chemometrics and Intell. Lab. Syst., 44 (1998), 345-356.
13. Oinas, P., Wild, A., Midoux, N., Haario, H., Identification of mass transfer parameters in cases of simultaneous gas absorption and chemical reaction. Chem. Eng. Process., 34 (1995), 503-513.
14. Bijlsma, S., Louwerse, D.J., Smilde, A.K., Rapid estimation of rate constants of batch processes using on-line SW-NIR. AIChE J., 44 (1998), 2713-2723.
15. De Juan, A., Maeder, M., Martinez, M., Tauler, R., Combining hard- and soft-modelling to solve kinetic problems. Chemometrics and Intell. Lab. Syst., 54 (2000), 123-141.
16. Bezemer, E., Rutan, S.C., Multivariate curve resolution with non-linear fitting of kinetic profiles. Chemometrics and Intell. Lab. Syst., 59 (2001), 19-31.
17. Gilks, W.R., Richardson, S., Spiegelhalter, D.J. (eds), Markov Chain Monte Carlo in Practice. Chapman & Hall, London, 1995.
Dynamic Model Development: Methods, Theory and Applications S.P. Asprey and S. Macchietto (editors) © 2003 Elsevier Science B.V. All rights reserved
Dynamic Modelling, Nonlinear Parameter Fitting and Sensitivity Analysis of a Living Free-Radical Polymerization Reactor
Antonio Flores-Tlacuahuac, Enrique Saldivar-Guerra and Ramiro Guerrero-Santos
Departamento de Ciencias, Universidad Iberoamericana, Prolongacion Paseo de la Reforma 880, Mexico DF, 01210, MEXICO.
In this work modelling and nonlinear analysis tools were applied to an experimental living free-radical polymerization reaction system. The objectives were to build a dynamic mathematical model able to reproduce observed data, to fit experimental information and to get a clear idea about the influence of reaction rate constants on reactor response using sensitivity analysis techniques.
1 INTRODUCTION
Polymer production is one of the most important areas of applied chemistry due to its significant economic and social impact. Polymers as materials are present in almost every field of human activity. They range from commodity materials, such as polyethylene or PVC (poly vinyl chloride), up to highly specialized and expensive materials for drug release or spacecraft-related applications. Polymers are long molecules or "macromolecules" produced from simple small chemical components or monomers. The chemical reaction by which monomers are transformed into polymers is called polymerization, and its control presents serious challenges for the chemical engineer, due to the fact that these reactions are usually highly exothermic and often proceed in very viscous media that render mass and heat transport difficult. Also, these reactions are famous for behaving in a non-linear fashion, and several instances of multiplicities and sustained oscillations have been reported in the literature even for industrial scale
reactors [1],[2],[3]. In order to aid in the design, operation, control and optimization of these kinds of reactors and reactions, mathematical modeling of the polymerization process is an invaluable tool. For many years free-radical polymerization has been used industrially for the large-scale production of different kinds of polymers. The kinetic mechanism of this polymerization method is well known and the process is relatively easy to carry out. However, this polymer manufacturing technique has some drawbacks: (1) well defined molecular weight distributions are not easy to obtain, (2) polymers with a desired molecular structure are difficult to manufacture. The importance of controlling these factors has been recognized due to the rising need for speciality polymers. In traditional free-radical polymerization the life of each polymer chain is only some fraction of a second. On the other hand, living polymerizations, in which the polymer chains are active for minutes or hours, allow the preparation of polymers such as macromonomers, macroinitiators, functional polymers, block and graft copolymers, and star polymers [4],[5]. Usually such polymers are produced using anionic/cationic or group transfer polymerization. However, because this type of polymerization process requires severe reaction conditions (i.e. a high level of purity) and the spectrum of suitable monomers is limited [6], it has had little industrial impact. Therefore, it would be desirable to combine the industrial advantages of free-radical polymerization (tolerance to impurities, and unselectivity to monomers) with the use of living polymerization techniques, as an efficient way to manufacture new polymers. It is also worth mentioning that, depending on the thermodynamic nature of the different components, these new block and graft molecular architectures can bring about micro-segregation of domains which leads to nanometric structures and to self-assembling materials. These materials are provoking a revolution in Materials Science since their unique properties can be used in a wide spectrum of applications [7]. These properties arise from the fact that physical laws manifest themselves in unique ways at the nanometric scale. Living, quasi-living or controlled radical polymerization (CRP) is an emergent technique to synthesize polymers with control of the molecular weight and low polydispersities. There are several versions of CRP [8]: (a) atom transfer radical polymerization (ATRP), (b) nitroxyl mediated radical polymerization (NMRP), (c) the use of initers and iniferters and (d) reversible addition fragmentation transfer (RAFT). In CRP a new compound called "regulator" or "controller" is used. Such species are capable of reversibly trapping the propagating radicals, thereby reducing the concentration of growing chains, which brings about the minimization of the irreversible termination step. Under such circumstances growing chains can polymerize only to a certain extent (before being trapped). The adduct formed by the controller and the growing chain regenerates - in the reverse reaction step - free radicals that start a new cycle of reactions (propagation, reversible termination and dissociation). In this way polymer chains grow at the same pace. Polymers with narrow molecular weight distributions can be achieved if the initiation step, i.e. the period of time employed to initiate all chains, is reduced. To date, the achievements of CRP are: (i) the control of molecular weight and
polydispersity for homopolymerizations and (ii) the production of block copolymers by free-radical reactions, without such stringent conditions as are needed for anionic polymerization. The goal of many scientists worldwide is the implementation of this new technology in the development of new materials or the improvement of existing ones. Mathematical modelling and simulation can help to better understand the underlying kinetic mechanism governing these processes. Also, with the aid of parametric sensitivity tools, key kinetic steps and associated parameters can be identified and their estimability can be assessed. Once these parameters are identified, the synthesis of better regulating agents can be attempted. In principle, for achieving these purposes, several living radical techniques might be used. In a first trial the initer bis(trimethylsylyloxy) tetraphenylethane was used. Initers are compounds that can generate two radicals; one acts as initiator while the other reversibly scavenges growing radicals, leading to temporarily "dormant" polymer chains. In this sense initer compounds act as both initiators and regulators. In this work a dynamic mathematical model for a styrene living free-radical polymerization reactor is developed. The model is cast in terms of the moments of the species so that molecular averages of the molecular weight distribution can be computed. Experimental information is used to fit the kinetic rate constant values. The fitting procedure uses traditional nonlinear optimization techniques. Open-loop dynamic simulations of a typical industrial living polymerization reactor are discussed. Sensitivity analysis of the above-mentioned mathematical model is used to assess the way kinetic rate information impacts typical reactor behaviour in the form of monomer conversion, molecular weight and polydispersity. The aim was to gain insight into the kinetic mechanism and to identify key kinetic steps and their associated parameters. Because very few works have been published on the modelling, parameter fitting and sensitivity analysis of initer living polymerization reactors, this work represents a contribution to this new and challenging engineering field. In section 2 the dynamic mathematical model of the living free-radical polymerization reaction is derived from the basic reaction mechanism. In section 3 experimental data is used for the nonlinear parameter fitting procedure. In section 4 some open-loop numerical dynamic runs are shown. In section 5 sensitivity coefficients are computed. Finally, section 6 contains the discussion and conclusions of the results.
2 DYNAMIC MATHEMATICAL MODEL

2.1 Kinetic mechanism
The living polymerization kinetic scheme involves the familiar thermal decomposition, initiation, propagation and termination steps. The thermal decomposition reaction produces free radicals through the following reversible equilibrium reaction (Table 1 lists the species participating in the polymerization reaction system):
R \;\overset{k_d}{\underset{k_r}{\rightleftharpoons}}\; 2S^\bullet \qquad (1)

where R = TPSE and S^\bullet = SDPM (silyloxy diphenyl methyl) radicals. Free radicals react with the monomer M in the initiation step, yielding P_1:

S^\bullet + M \xrightarrow{k_i} P_1 \qquad (2)

from which the polymer chain starts growing (propagation step):

P_n + M \xrightarrow{k_p} P_{n+1} \qquad (3)

However, living polymerization differs from the more traditional free-radical polymerization in the following reaction step:

P_n + S^\bullet \;\overset{k_{tp}}{\underset{k_{rd}}{\rightleftharpoons}}\; P_nS \qquad (4)

This reaction step temporarily stops the growth of the chain P_n, leading to a dormant polymer chain P_nS. However, under certain processing conditions the dormant chain can undergo "reactivation" and chain growth continues. This reaction step is the main reason why living polymerization produces narrow molecular weight distributions. Finally, the irreversible termination steps (disproportionation and combination, respectively) take place, leading to dead polymer chains:

P_n + P_m \xrightarrow{k_{td}} D_n + D_m \qquad (5)

P_n + P_m \xrightarrow{k_{tc}} D_{n+m} \qquad (6)

2.2 Model
Dynamic material balances were made for each of the species in an isothermal batch reactor. The kinetic scheme results in a system of ordinary differential equations of infinite dimension.

• Initer compound

\frac{dR}{dt} = -k_d R + k_r (S^\bullet)^2 \qquad (7)

• Radical

\frac{dS^\bullet}{dt} = 2 f k_d R - k_r (S^\bullet)^2 - k_i S^\bullet M - k_{tp} S^\bullet \sum_{n=1}^{\infty} P_n + k_{rd} \sum_{n=1}^{\infty} P_nS \qquad (8)
Table 1: Chemical species participating in the reaction mechanism.

Species             Symbol
Agent (initer)      R
Radical             S^\bullet
Monomer             M
Growing radical     P_1
Polymer chain       P_n
Dormant polymer     P_nS
Dead polymer        D_n

• Monomer

\frac{dM}{dt} = -k_i M S^\bullet - k_p M \sum_{n=1}^{\infty} P_n \qquad (9)
• Growing polymer

a) n = 1

\frac{dP_1}{dt} = k_i M S^\bullet - k_p M P_1 - k_{tp} P_1 S^\bullet + k_{rd} P_1S - k_{td} P_1 \sum_{n=1}^{\infty} P_n - k_{tc} P_1 \sum_{n=1}^{\infty} P_n \qquad (10)

b) n > 1

\frac{dP_n}{dt} = k_p M (P_{n-1} - P_n) - k_{tp} P_n S^\bullet + k_{rd} P_nS - k_{td} P_n \sum_{m=1}^{\infty} P_m - k_{tc} P_n \sum_{m=1}^{\infty} P_m \qquad (11)

• Dormant polymer

\frac{dP_nS}{dt} = k_{tp} P_n S^\bullet - k_{rd} P_nS \qquad (12)

• Dead polymer (n \geq 2)

\frac{dD_n}{dt} = k_{td} P_n \sum_{m=1}^{\infty} P_m + \frac{1}{2} k_{tc} \sum_{m=1}^{n-1} P_m P_{n-m} \qquad (13)

2.3 Moments
Next, the method of moments [9] was used to reduce the dimensionality of the problem. In the following, \lambda_j, \zeta_j and \mu_j (j = 0, 1, 2) denote the j-th moments of the growing, dormant and dead polymer chain length distributions, respectively.

1) Growing polymer

• Zeroth moment

\frac{d\lambda_0}{dt} = k_i M S^\bullet - k_{tp} S^\bullet \lambda_0 + k_{rd} \zeta_0 - k_{td} \lambda_0 \lambda_0 - k_{tc} \lambda_0 \lambda_0 \qquad (14)

• First moment

\frac{d\lambda_1}{dt} = k_i M S^\bullet + k_p M \lambda_0 - k_{tp} S^\bullet \lambda_1 + k_{rd} \zeta_1 - k_{td} \lambda_1 \lambda_0 - k_{tc} \lambda_1 \lambda_0 \qquad (15)

• Second moment

\frac{d\lambda_2}{dt} = k_i M S^\bullet + k_p M (\lambda_0 + 2\lambda_1) - k_{tp} S^\bullet \lambda_2 + k_{rd} \zeta_2 - k_{td} \lambda_2 \lambda_0 - k_{tc} \lambda_2 \lambda_0 \qquad (16)

2) Dormant polymer

• Zeroth moment

\frac{d\zeta_0}{dt} = k_{tp} S^\bullet \lambda_0 - k_{rd} \zeta_0 \qquad (17)

• First moment

\frac{d\zeta_1}{dt} = k_{tp} S^\bullet \lambda_1 - k_{rd} \zeta_1 \qquad (18)

• Second moment

\frac{d\zeta_2}{dt} = k_{tp} S^\bullet \lambda_2 - k_{rd} \zeta_2 \qquad (19)

3) Dead polymer

• Zeroth moment

\frac{d\mu_0}{dt} = k_{td} \lambda_0 \lambda_0 + \frac{1}{2} k_{tc} \lambda_0 \lambda_0 \qquad (20)

• First moment

\frac{d\mu_1}{dt} = k_{td} \lambda_1 \lambda_0 + k_{tc} \lambda_1 \lambda_0 \qquad (21)

• Second moment

\frac{d\mu_2}{dt} = k_{td} \lambda_2 \lambda_0 + k_{tc} (\lambda_2 \lambda_0 + \lambda_1^2) \qquad (22)
By numerically integrating the system of o.d.e.'s represented by Eqs. 14-22, the average molecular weights of the polymer are obtained:

M_n = M_m \, \frac{\zeta_1 + \mu_1}{\zeta_0 + \mu_0} \qquad (23)

M_w = M_m \, \frac{\zeta_2 + \mu_2}{\zeta_1 + \mu_1} \qquad (24)

where M_m stands for the monomer molecular weight. Monomer conversion x_M was computed as:

x_M = \frac{M_0 - M}{M_0} \qquad (25)

where M_0 is the initial monomer concentration.
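To make the use of the model concrete, the following sketch integrates the moment balances, Eqs. (7)-(22), for an isothermal batch and recovers conversion, M_n, M_w and polydispersity from Eqs. (23)-(25). It is only an illustration: the rate constants are taken (rounded) from Table 3, the initial monomer concentration is an assumed value for bulk styrene, and the variable names, units and solver settings are assumptions rather than details from the original work.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Rate constants of the order of Table 3 (units assumed consistent with time in s)
k = dict(kd=0.030482, kr=28408.6, ki=0.0136, kp=1095.8,
         ktp=2.27e11, krd=2.38, ktc=6.49e7, ktd=0.0)
f_eff = 1.0          # initer efficiency (assumption)
Mm = 104.15          # molecular weight of styrene, g/mol
R0, M0 = 0.016, 8.7  # initer and (assumed bulk) monomer concentration, mol/L

def rhs(t, y):
    # y = [R, S, M, lam0, lam1, lam2, zet0, zet1, zet2, mu0, mu1, mu2]
    R, S, M, l0, l1, l2, z0, z1, z2, m0, m1, m2 = y
    kd, kr, ki, kp, ktp, krd, ktc, ktd = (k[n] for n in
        ("kd", "kr", "ki", "kp", "ktp", "krd", "ktc", "ktd"))
    term = ktd + ktc
    return [
        -kd*R + kr*S**2,                                            # Eq. (7)
        2*f_eff*kd*R - kr*S**2 - ki*S*M - ktp*S*l0 + krd*z0,        # Eq. (8)
        -ki*M*S - kp*M*l0,                                          # Eq. (9)
        ki*M*S - ktp*S*l0 + krd*z0 - term*l0*l0,                    # Eq. (14)
        ki*M*S + kp*M*l0 - ktp*S*l1 + krd*z1 - term*l1*l0,          # Eq. (15)
        ki*M*S + kp*M*(l0 + 2*l1) - ktp*S*l2 + krd*z2 - term*l2*l0, # Eq. (16)
        ktp*S*l0 - krd*z0,                                          # Eq. (17)
        ktp*S*l1 - krd*z1,                                          # Eq. (18)
        ktp*S*l2 - krd*z2,                                          # Eq. (19)
        ktd*l0*l0 + 0.5*ktc*l0*l0,                                  # Eq. (20)
        (ktd + ktc)*l1*l0,                                          # Eq. (21)
        ktd*l2*l0 + ktc*(l2*l0 + l1**2),                            # Eq. (22)
    ]

t_end = 9000.0                                    # s, the horizon of Figures 1-3
sol = solve_ivp(rhs, (0.0, t_end), [R0, 0.0, M0] + [0.0]*9,
                method="BDF", dense_output=True, rtol=1e-8, atol=1e-14)
R, S, M, l0, l1, l2, z0, z1, z2, m0, m1, m2 = sol.sol(t_end)
xM = (M0 - M) / M0                                # Eq. (25)
Mn = Mm * (z1 + m1) / (z0 + m0)                   # Eq. (23)
Mw = Mm * (z2 + m2) / (z1 + m1)                   # Eq. (24)
print(f"conversion = {xM:.3f}, Mn = {Mn:.0f}, Mw = {Mw:.0f}, PDI = {Mw/Mn:.2f}")
```

A stiff integrator (BDF here) is advisable because the capping rate constant is many orders of magnitude larger than the other constants.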
3 PARAMETER FITTING

The kinetic parameters to be fitted correspond to the bulk styrene polymerization process. Table 2 shows the experimental information in the form of conversion and molecular weight. For the dynamic parameter fitting, the problem was cast as the following unconstrained nonlinear optimization program:

\min_{k} \sum_{i} \left( y_i(t) - \tilde{y}_i \right)^2 \qquad (26)

where k represents the set of kinetic rate parameters to be fitted, y_i(t) is obtained by numerically integrating the set of equations representing the mathematical model and \tilde{y}_i represents the set of experimental data. Fitted results for styrene at I = 0.016 M and 110 °C are shown in Figure 1; the fitted kinetic rate constants are given in Table 3. The same set of fitted kinetic rate constants was used for predicting experimental information recorded at different conditions. For instance, Figure 2 compares experimental and predicted behaviour at 0.045 M initiator concentration and 110 °C, while Figure 3 compares experimental and predicted behaviour for 0.016 M initiator concentration and 100 °C. In order to predict the behaviour shown in Figure 3, the k_p rate constant was corrected for temperature changes, using an activation energy of 30000 J/gmol. The k_p value used in Figure 3 was therefore 850.
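The following sketch illustrates the fitting procedure: trial rate constants are mapped to predicted conversions by integrating a reduced set of the balances above, and a sum-of-squares objective of the form of Eq. (26) is minimized with a standard nonlinear least-squares routine. The conversion data are those of Table 2 (0.016 M, 110 °C); the log-scaling of the parameters, the initial guess, the assumed time units and the Arrhenius helper used for the k_p temperature correction are illustrative assumptions and not the procedure actually used by the authors.

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import least_squares

R_GAS = 8.314  # J/(mol K)

def arrhenius(k_ref, T_ref, T, Ea=30000.0):
    """Temperature correction of a rate constant (Ea in J/gmol, as quoted above);
    e.g. arrhenius(1095.8, 383.15, 373.15) gives roughly 850 for kp."""
    return k_ref * np.exp(-Ea / R_GAS * (1.0 / T - 1.0 / T_ref))

# Table 2, styrene, Ci = 0.016 M, T = 110 C (time converted to s, an assumption)
t_exp = np.array([20.0, 50.0, 70.0, 100.0, 150.0]) * 60.0
x_exp = np.array([4.65, 12.42, 16.76, 27.0, 29.95]) / 100.0

def conversion(kvec, t_eval, R0=0.016, M0=8.7, f_eff=1.0):
    """Reduced balances (initer, radical, monomer, zeroth moments of growing
    and dormant chains); enough to predict conversion for the fit."""
    kd, kr, ki, kp, ktp, krd, ktc, ktd = kvec
    def rhs(t, y):
        R, S, M, l0, z0 = y
        return [-kd*R + kr*S**2,
                2*f_eff*kd*R - kr*S**2 - ki*S*M - ktp*S*l0 + krd*z0,
                -ki*M*S - kp*M*l0,
                ki*M*S - ktp*S*l0 + krd*z0 - (ktd + ktc)*l0*l0,
                ktp*S*l0 - krd*z0]
    sol = solve_ivp(rhs, (0.0, t_eval[-1]), [R0, 0.0, M0, 0.0, 0.0],
                    t_eval=t_eval, method="BDF", rtol=1e-8, atol=1e-14)
    return (M0 - sol.y[2]) / M0

def residuals(log10_k):
    # log10 parameterization copes with the very different magnitudes in Table 3
    return conversion(10.0 ** log10_k, t_exp) - x_exp

k_guess = np.log10([3e-2, 3e4, 1e-2, 1e3, 2e11, 2.0, 6e7, 1e-6])  # illustrative
fit = least_squares(residuals, k_guess, method="trf")
print("fitted rate constants:", 10.0 ** fit.x)
```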
4 OPEN-LOOP DYNAMIC SIMULATION

Figure 4 shows the evolution of conversion, M_n, M_w and polydispersity. There is a contrast with traditional free-radical polymerization, in which the molecular weight averages remain nearly constant throughout the polymerization.
Table 2: Experimental styrene conversion and molecular weight recorded at different initiator concentrations and temperatures.

Styrene, Ci = 0.016 M, T = 110 °C
Time (min)   Conversion (%)   M_n
20           4.65             4610
50           12.42            27290
70           16.76            41880
100          27               58530
150          29.95            63480

Styrene, Ci = 0.016 M, T = 100 °C
Time (min)   Conversion (%)   M_n
30           1.42             1826
50           4.86             26930
70           6.35             32080
90           11.46            48080
150          21.09            58360

Styrene, Ci = 0.045 M, T = 110 °C
Time (min)   Conversion (%)   M_n
20           0.7182           1730
40           10.7175          2180
60           18.8606          13990
100          25.4279          22880
120          29.8043          26030
Table 3: Fitted kinetic rate constants for styrene, Ci = 0.016 M and 110 °C.

k_d      0.030482
k_r      28408.6
k_i      0.0136
k_p      1095.8
k_tp     227241797344
k_rd     2.38
k_tc     64903458
k_td     0
Figure 1: (a) Conversion and (b) Molecular weight fitted results for I = 0.016 M and 110 °C.

In this case the molecular weight increases with conversion. This occurs because the polymeric chains remain living, adding monomer units during most of the reaction. However, at long reaction times some irreversible termination takes place even in this case, increasing the polydispersity. Figures 5, 6 and 7 depict the radical, dormant and dead polymer concentrations, respectively. In a first stage, TPSE decomposes readily (Reaction 1, k_d = 0.030482), producing a large amount of free radicals S^\bullet which participate in the formation of dormant polymers (Reactions 2, 3 and 4). When the concentration of free radicals stabilizes, a maximum level of dormant species has been formed and a second stage begins. Here the TPSE molecules have disappeared and the dormant species regenerate P_n and S^\bullet radicals, producing a polymerization with living character. Nevertheless, the deactivation between chains (Reactions 5 and 6) becomes evident, as indicated by the decreasing concentration of dormant species in Figure 6. Each irreversible termination event leaves radicals without partners. The temporary excess of free radicals S^\bullet may accentuate the importance of Reaction (6); however, their propensity to initiate chains cuts down this effect. The dead polymer concentration behaves in the opposite manner to the dormant species (Figure 7).
5 PARAMETRIC SENSITIVITY

In this section the sensitivities of the outputs (conversion, molecular weight averages and polydispersity) with respect to the kinetic rate constants are computed. The aim of this study is to obtain a clear idea of which kinetic rate parameters have the greater influence on the reactor response. The results of the sensitivity analysis help to establish which parameters should be evaluated carefully and also which ones are estimable.
Figure 2: (a) Conversion and (b) Molecular weight fitted results for I = 0.045 M and 110 °C.

Assume that a mathematical model of a given system is available:

\frac{dx}{dt} = f(x, p) \qquad (27)

where x \in \mathbb{R}^n stands for the system states and p \in \mathbb{R}^m represents the system parameters. The equations that describe the way the reactor behaviour depends on the parameters are given by [10]:

\frac{dS}{dt} = \left( \frac{\partial f}{\partial x} \right) S + \frac{\partial f}{\partial p} \qquad (28)

where

S = \frac{\partial x}{\partial p} = [s_{ij}], \qquad s_{ij} = \frac{\partial x_i}{\partial p_j} \qquad (29)

with initial conditions

S(t_0) = \frac{\partial x_0}{\partial p} \qquad (30)

One of the suggested numerical procedures to evaluate the parametric sensitivity coefficients consists in the simultaneous solution of the equations representing the system dynamic behaviour (27) and the set of equations representing the sensitivity coefficients (28). Accordingly, the numerical procedure to compute s_{ij} has the following steps.
Figure 3: (a) Conversion and (b) Molecular weight fitted results for I = 0.016 M and 100 °C.

• Integrate the mathematical model

\dot{x} = f(x, p), \qquad x(t_0) = x_0

• Evaluate the partial derivatives

\frac{\partial f(x, p)}{\partial x}, \qquad \frac{\partial f(x, p)}{\partial p}

• Integrate the sensitivity equations

\dot{S} = \frac{\partial f(x, p)}{\partial x} S + \frac{\partial f(x, p)}{\partial p}

In order to compare output sensitivities on the same basis, scaled sensitivities were computed as (Caracotsios):

\tilde{s}_{ij} = p_j \, \frac{\partial x_i}{\partial p_j} \qquad (31)

Because some of the initial states are zero, it is more convenient to use this semilogarithmic scaling instead of the complete logarithmic scaling procedure.
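As a sketch of how these steps can be carried out, the scaled sensitivities of Eq. (31) are approximated below by central finite differences around the nominal parameter values instead of integrating Eq. (28) simultaneously with the model; both routes give comparable information for a first screening. The decay model at the end is only a placeholder standing in for the moment model and the rate constants of Table 3.

```python
import numpy as np
from scipy.integrate import solve_ivp

def scaled_sensitivities(f, x0, p, t_eval, rel_step=1e-4):
    """Semilogarithmic sensitivities s[i, j, t] ~ p_j * dx_i/dp_j (Eq. 31),
    approximated by central differences; integrating Eq. (28) along with the
    model is the more accurate alternative."""
    def states(pars):
        sol = solve_ivp(lambda t, x: f(t, x, pars), (t_eval[0], t_eval[-1]),
                        x0, t_eval=t_eval, method="BDF")
        return sol.y                               # shape (n_states, n_times)
    base = states(p)
    s = np.zeros((base.shape[0], len(p), base.shape[1]))
    for j, pj in enumerate(p):
        dp = rel_step * max(abs(pj), 1e-30)
        up, down = np.array(p, float), np.array(p, float)
        up[j] += dp
        down[j] -= dp
        s[:, j, :] = pj * (states(up) - states(down)) / (2.0 * dp)
    return s

# placeholder model: first-order decay dx/dt = -k x, with parameter vector p = [k]
decay = lambda t, x, p: [-p[0] * x[0]]
s = scaled_sensitivities(decay, [1.0], [0.5], np.linspace(0.0, 5.0, 11))
print(s[0, 0])   # k * dx/dk along the trajectory
```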
6 DISCUSSION AND CONCLUSIONS
Figure 8 shows the sensitivities of the reactor outputs to the kinetic parameters for Ci = 0.016 M and 110 °C (the program Athena [11] was used to compute the numerical values of the sensitivity coefficients). The propagation rate constant turns out to be the most important factor, followed by the initiation and the termination constants (negative sensitivity). Conversion is moderately influenced by the capping/decapping reaction constants, while it is almost unaffected by the initer decomposition parameters. The number-average molecular weight exhibits a sensitivity pattern similar to that of conversion, except that it is more sensitive to the initer decomposition constants, perhaps because of the influence of these parameters on the number of active centres formed. For the weight-average molecular weight the increased importance of the capping/decapping reaction constants is noticeable. This is also reflected in the fact that these parameters are practically the most important ones for polydispersity. A rapid rate of exchange is considered to be the determining factor for getting polymer chains to grow at approximately the same rate.

Figure 9 shows the corresponding sensitivities for 0.016 M initiator and 100 °C; the pattern described above for Figure 8 is reproduced essentially unchanged. An additional conclusion that can be drawn by examining Fig. 9 is that the sensitivity curves for the parameters k_tp and k_rd are symmetrical, indicating that these parameters cannot be independently estimated from the outputs shown here. Their ratio (the equilibrium constant) can be estimated, but not the individual values; independent experiments are required for that. Figure 10 is virtually identical to Fig. 9, indicating that the sensitivities obtained at different reaction conditions are practically the same. This parametric sensitivity study confirms the importance of the dormant/living exchange parameters for the broadness (polydispersity) of the molecular weight distribution. A further interesting conclusion is that this variable (as well as the others) is very little affected by the initer decomposition parameters, as long as they have values of the order of the case studied. On the other hand, it will be difficult to accurately recover (estimate) the initer decomposition parameters from conversion or polydispersity data alone, although average molecular weights will be more useful for this task.
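The observation that the k_tp and k_rd curves are mirror images can be made quantitative by correlating their scaled sensitivity trajectories: a correlation near -1 for every output indicates that only a combination of the two parameters (here their ratio, the equilibrium constant) is estimable. The helper below assumes sensitivity arrays computed beforehand, for instance with a routine like the one sketched in Section 5; the numbers in the demonstration are synthetic.

```python
import numpy as np

def sensitivity_correlation(s_a, s_b):
    """Pearson correlation between two scaled-sensitivity trajectories of the same
    output; |correlation| close to 1 signals that the two parameters cannot be
    estimated independently from that output."""
    s_a = np.asarray(s_a, dtype=float).ravel()
    s_b = np.asarray(s_b, dtype=float).ravel()
    return np.corrcoef(s_a, s_b)[0, 1]

# synthetic demonstration: two nearly mirror-image sensitivity curves
t = np.linspace(0.0, 1.0, 50)
s_ktp = 1.0 - np.exp(-3.0 * t)
s_krd = -(1.0 - np.exp(-3.0 * t)) * 1.05 + 0.01 * t
print(round(sensitivity_correlation(s_ktp, s_krd), 3))   # close to -1
```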
REFERENCES

1. N.A. Dotson, R. Galvan, R. Laurence and M. Tirrell, Polymerization Reactor Modeling, John Wiley, 1996.
2. C.M. Villa and W.H. Ray, Chemical Engineering Science, 55, 2, 275-290 (1999).
3. J.C. Verazaluce-Garcia, A. Flores-Tlacuahuac and E. Saldivar-Guerra, Industrial and Engineering Chemistry Research, 39, 6, 1972-1978 (2000).
4. M.K. Georges, G.K. Hamer and N.A. Listigovers, Macromolecules, 31, 9087-9089 (1998).
5. H. Shinoda, P.J. Miller and K. Matyjaszewski, Macromolecules, 34(10), 3186-3194 (2001).
6. G. Odian, Principles of Polymerization, Wiley, 3rd Ed. (1991).
7. J. Texter and M. Tirrell, AIChE J., 47, 8, 1706-1710 (2001).
8. K. Matyjaszewski, Overview: Fundamentals of Controlled/Living Radical Polymerization, in Controlled Radical Polymerization, K. Matyjaszewski (Ed.), ACS Symp. Series 685 (1997).
9. W.H. Ray, J. Macromol. Sci. - Revs. Macromol. Chem., C8(1), 1-56 (1972).
10. H.K. Khalil, Nonlinear Systems, 2nd Ed., Prentice-Hall (1996).
11. http://www.athenavisual.com
Figure 4: Open-loop dynamic simulation of the batch styrene reactor.
Figure 5: Radical concentration.
Figure 6: Dormant polymer concentration.
Figure 7: Dead polymer concentration.
Figure 8: Sensitivity coefficients for Ci = 0.016 M and 110 °C.
Figure 9: Sensitivity coefficients for Ci = 0.016 M and 100 °C.
Figure 10: Sensitivity coefficients for Ci = 0.045 M and 110 °C.
An Investigation of some Tools for Process Model Identification for Prediction

Niels Rode Kristensen(a), Henrik Madsen(b) and Sten Bay Jørgensen(a)

(a) Computer Aided Process Engineering Center, Department of Chemical Engineering, Technical University of Denmark, DTU, DK-2800 Lyngby, Denmark
(b) Mathematical Statistics Section, Informatics and Mathematical Modelling, Technical University of Denmark, DTU, DK-2800 Lyngby, Denmark

Process identification is undergoing tremendous developments as computing capabilities increase. These developments also lead to an increased demand for tighter problem formulations for modelling for a given purpose, e.g. prediction. In this paper some modelling and model quality assessment tools are investigated for prediction models. The investigated tools combine modelling based on first engineering principles and data-based modelling into a modelling cycle that facilitates the use of statistical methods, with the aim of synthesizing more reliable models for prediction in a systematic way.

1. INTRODUCTION

The essential aspects of system identification were reviewed by Astrom and Eykhoff (1971) in the early 1970's. Practical identification tools based on prediction error and instrumental variable methods have since been addressed in the landmark books by Ljung (1987) and Soderstrom and Stoica (1989), and modelling purpose related aspects of process identification have been reviewed by Andersen et al. (1991) and Jørgensen and Lee (2001). The latter papers address key issues in the process model identification cycle and discuss the relation between the modelling purpose and the tools used in and the decisions made during the identification cycle. An example of a purpose-oriented aspect is the different requirements on the identification cycle when developing models for prediction versus models for robust control design. Requirements for the latter are addressed in Jørgensen and Lee (2001), whereas the purpose of this paper is to address the requirements for the former. In particular, attention is focused on tools and decisions related to analysis of model defects and synthesis of improved models on the basis of this analysis. More specifically, a continuous time stochastic modelling approach is selected, because this approach has several advantages in terms of developing models for prediction, as delineated below. A key point related to the purpose of this paper is that the assumption of stochastic disturbances allows application of statistically sound methods for investigating and revealing model defects. The paper is organized as follows. First, the steps in a general process model identification cycle are presented and briefly discussed. Then the specific modelling cycle
Figure 1. A general process model identification cycle (blocks: first engineering principles; model formulation; pre-analysis; experimental design; experiments; parameter estimation; statistical tests and residual analysis; model validation or invalidation and post-analysis).
investigated in this paper is presented. Most of the steps in this cycle are also essential in the more general identification cycle. Subsequently, the different steps within the specific cycle are detailed to reveal some theoretical and practical advantages displayed by the selected continuous time stochastic modelling approach. First, issues related to data collection and parameter estimation are treated. Then some tools are presented for investigating defects of model prediction performance and methods are discussed for modifying the original model based on the detected defects. A software implementation of the tools constituting the specific cycle is then briefly presented, and the overall methodology is finally illustrated on a simulated fed-batch bioreactor modelling example.

2. THE PROCESS MODEL IDENTIFICATION CYCLE

The process model identification cycle represents the tasks to be performed to develop a process model for a given purpose. This cycle can therefore be represented in a variety of ways, depending on the specific tasks needed for a given purpose. A general process model identification cycle includes the essential steps depicted in Figure 1. When applying the general identification cycle for a specific purpose it is desirable first to analyze the identification task at hand with the objective of determining which steps are relevant based on the available degrees of freedom. Such an analysis step may be labelled a pre-analysis. In this paper the key focus is on investigation of tools for validation or invalidation of models for prediction and their relation to model improvement, and the paper therefore primarily addresses tasks related to model formulation, parameter estimation, statistical tests and residual analysis, and model validation or invalidation and
Figure 2. The continuous time stochastic modelling cycle (blocks: first engineering principles; model formulation; data; parameter estimation; statistical tests and residual analysis; model validation or invalidation; resulting continuous-discrete stochastic state space model).
post-analysis. Hence, experimental design and issues related to performing experiments, although of tremendous importance in practice, will not be covered here. The specific simplified version of the process model identification cycle considered here is shown in Figure 2 in the form of a modelling cycle for performing continuous time stochastic modelling for the purpose of prediction. Prediction performance measures the ability of a dynamic model to predict the future evolution of a process over a given time horizon, e.g. in one-step-ahead prediction or in pure simulation. In order to have good prediction performance the model should ideally be able to capture the inherently nonlinear behaviour of many processes and to accommodate noise, i.e. process noise due to approximation errors or unmodelled inputs and measurement noise due to imperfect measurements. Continuous time stochastic modelling provides a way of doing this through the use of stochastic differential equations (SDE's). Models based on SDE's are appealing, because they combine the advantages of deterministic models, which are well-suited for describing nonlinear behaviour, and stochastic models, which are well-suited for describing systems influenced by noise. Furthermore, since SDE's are structurally similar to ODE's, conventional modelling based on first engineering principles can still be applied to set up such models, which in turn preserve their physical interpretability. Previous work in this area includes the work of Madsen and Melgaard (1991) and Bohlin and Graebe (1995) and references therein. In the following the individual elements of the continuous time stochastic modelling cycle are described in detail, emphasizing the important decisions to be made to ensure the desired prediction performance of the model.

2.1. Model formulation

Within the proposed modelling cycle, the first step deals with formulation of a basic model in the form of a continuous-discrete stochastic state space model, which consists of a set of SDE's describing the dynamics of the system in continuous time and a set of algebraic equations describing measurements at discrete time instants, i.e.

dx_t = f(x_t, u_t, t, \theta) \, dt + \sigma(u_t, t, \theta) \, d\omega_t \qquad (1)

y_k = h(x_k, u_k, t_k, \theta) + e_k \qquad (2)
where t \in \mathbb{R} is time, x_t \in X \subset \mathbb{R}^n is a vector of state variables, u_t \in U \subset \mathbb{R}^m is a vector of input variables, y_k \in Y \subset \mathbb{R}^l is a vector of measured output variables, \theta \in \Theta \subset \mathbb{R}^p is a vector of unknown parameters, f(\cdot) \in \mathbb{R}^n, \sigma(\cdot) \in \mathbb{R}^{n \times q} and h(\cdot) \in \mathbb{R}^l are nonlinear functions, \omega_t is a q-dimensional standard Wiener process and e_k \in N(0, S(u_k, t_k, \theta)) is an l-dimensional white noise process. SDE's may be interpreted both in the sense of Stratonovich and in the sense of Ito, but since the Stratonovich interpretation is unsuitable for parameter estimation (Jazwinski, 1970; Astrom, 1970), the Ito interpretation is used in the following.

2.2. Parameter estimation

The second step in the proposed modelling cycle deals with estimation of the unknown parameters in (1)-(2) using data sets from one or more experiments.

2.2.1. Estimation methods

The solution to (1) is a Markov process, which means that an estimation scheme based on statistical methods can be applied, e.g. maximum likelihood (ML) or maximum a posteriori (MAP).

Maximum likelihood estimation. Given the model structure in (1)-(2), ML estimates of the unknown parameters can be determined by finding the parameters \theta that maximize the likelihood function of a given sequence of measurements y_0, y_1, ..., y_k, ..., y_N. Introducing the notation

\mathcal{Y}_k = [y_k, y_{k-1}, \ldots, y_1, y_0] \qquad (3)

the likelihood function is the conditional probability density

L(\mathcal{Y}_N | \theta) = p(\mathcal{Y}_N | \theta) \qquad (4)

or equivalently

L(\mathcal{Y}_N | \theta) = \left( \prod_{k=1}^{N} p(y_k | \mathcal{Y}_{k-1}, \theta) \right) p(y_0 | \theta) \qquad (5)
where the Markov property of the solution to (1) has been applied to form a product of conditional probability densities. To find the true ML solution the initial probability density function must be known and all subsequent conditional probability densities must be determined via successively solving Kolmogorov's forward equation and applying Bayes' rule (Jazwinski, 1970). In the general case this is infeasible, so an alternative approach is needed. Nielsen et al. (2000) have recently reviewed the state of the art with respect to parameter estimation in discretely observed Ito SDE's and found that, in the general case of higher-order partially observed systems with measurement noise, only methods based on approximate nonlinear filters provide a feasible solution to this problem. However, since the diffusion term \sigma(\cdot) in (1) does not depend on the state variables x_t, a method based on the much simpler extended Kalman filter, which is a linear filter, can
be applied. By introducing

\hat{y}_{k|k-1} = E\{ y_k \,|\, \mathcal{Y}_{k-1}, \theta \} \qquad (6)

R_{k|k-1} = V\{ y_k \,|\, \mathcal{Y}_{k-1}, \theta \} \qquad (7)

\epsilon_k = y_k - \hat{y}_{k|k-1} \qquad (8)

and by assuming that the conditional probability densities are Gaussian, the likelihood function then becomes

L(\mathcal{Y}_N | \theta) = \left( \prod_{k=1}^{N} \frac{ \exp\!\left( -\tfrac{1}{2} \, \epsilon_k^T R_{k|k-1}^{-1} \epsilon_k \right) }{ \sqrt{ \det( R_{k|k-1} ) } \, \left( \sqrt{2\pi} \right)^{l} } \right) p(y_0 | \theta) \qquad (9)

and the parameter estimates can be determined by further conditioning on y_0 and solving the nonlinear optimisation problem

\hat{\theta} = \arg\min_{\theta \in \Theta} \left\{ -\ln\!\left( L(\mathcal{Y}_N | \theta, y_0) \right) \right\} \qquad (10)
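A small sketch of how the optimisation problem (10) can be posed numerically: given any routine that returns the innovations \epsilon_k and covariances R_{k|k-1} for a trial \theta (for instance the extended Kalman filter described next), the negative log-likelihood implied by Eq. (9) is accumulated and passed to a general-purpose optimizer. The function and variable names below are illustrative and do not reflect the CTSM implementation.

```python
import numpy as np
from scipy.optimize import minimize

def negative_log_likelihood(theta, data, innovation_filter):
    """-ln L(Y_N | theta, y_0) assembled from one-step-ahead innovations, Eq. (9).
    `innovation_filter` is any routine yielding (eps_k, R_k|k-1) for k = 1..N."""
    nll = 0.0
    for eps, R in innovation_filter(theta, data):
        eps = np.atleast_1d(eps)
        R = np.atleast_2d(R)
        _, logdet = np.linalg.slogdet(R)
        nll += 0.5 * (eps @ np.linalg.solve(R, eps)
                      + logdet + eps.size * np.log(2.0 * np.pi))
    return nll

def ml_estimate(theta0, data, innovation_filter):
    return minimize(negative_log_likelihood, theta0,
                    args=(data, innovation_filter), method="Nelder-Mead")

# toy check: i.i.d. Gaussian data, one parameter (the variance); estimate -> about 4.0
y = np.random.default_rng(1).normal(0.0, 2.0, 500)
trivial_filter = lambda th, d: (([v], [[max(th[0], 1e-12)]]) for v in d)
print(ml_estimate(np.array([1.0]), y, trivial_filter).x)
```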
For each set of parameters \theta in the optimisation, the innovations \epsilon_k and their covariances R_{k|k-1} can be computed recursively by means of a continuous-discrete extended Kalman filter, i.e. by the output prediction equations

\hat{y}_{k|k-1} = h(\hat{x}_{k|k-1}, u_k, t_k, \theta) \qquad (11)

R_{k|k-1} = C P_{k|k-1} C^T + S \qquad (12)

the innovation equation

\epsilon_k = y_k - \hat{y}_{k|k-1} \qquad (13)

the Kalman gain equation

K_k = P_{k|k-1} C^T R_{k|k-1}^{-1} \qquad (14)

the updating equations

\hat{x}_{k|k} = \hat{x}_{k|k-1} + K_k \epsilon_k \qquad (15)

P_{k|k} = P_{k|k-1} - K_k R_{k|k-1} K_k^T \qquad (16)

and the state prediction equations

\frac{d\hat{x}_{t|k}}{dt} = f(\hat{x}_{t|k}, u_t, t, \theta) \qquad (17)

\frac{dP_{t|k}}{dt} = A P_{t|k} + P_{t|k} A^T + \sigma \sigma^T \qquad (18)
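The following is a minimal sketch of one recursion of such a continuous-discrete extended Kalman filter, i.e. of Eqs. (11)-(18): the state and covariance are propagated by integrating the prediction equations over the sampling interval and are then updated with the new measurement. The finite-difference Jacobians, the choice of integrator and all names are simplifying assumptions made for the illustration; this is not the CTSM implementation, which, as described below, also subsamples the interval and uses analytical solutions of the linearized equations.

```python
import numpy as np
from scipy.integrate import solve_ivp

def jacobian_fd(fun, x, step=1e-7):
    """Finite-difference Jacobian of fun at x (a sketch; analytical or automatic
    derivatives would normally be preferred)."""
    f0 = np.asarray(fun(x), dtype=float)
    J = np.zeros((f0.size, x.size))
    for i in range(x.size):
        dx = np.zeros_like(x)
        dx[i] = step * max(1.0, abs(x[i]))
        J[:, i] = (np.asarray(fun(x + dx), dtype=float) - f0) / dx[i]
    return J

def ekf_step(x, P, y, u, t0, t1, f, h, sigma, S):
    """One continuous-discrete EKF recursion, Eqs. (11)-(18)."""
    n = x.size
    def prediction_rhs(t, z):        # joint state/covariance prediction, Eqs. (17)-(18)
        xt, Pt = z[:n], z[n:].reshape(n, n)
        A = jacobian_fd(lambda v: f(v, u, t), xt)
        dP = A @ Pt + Pt @ A.T + sigma(u, t) @ sigma(u, t).T
        return np.concatenate([np.asarray(f(xt, u, t), dtype=float), dP.ravel()])
    z_end = solve_ivp(prediction_rhs, (t0, t1),
                      np.concatenate([x, P.ravel()]), method="LSODA").y[:, -1]
    x_pred, P_pred = z_end[:n], z_end[n:].reshape(n, n)
    C = jacobian_fd(lambda v: h(v, u, t1), x_pred)
    y_pred = np.asarray(h(x_pred, u, t1), dtype=float)      # Eq. (11)
    R = C @ P_pred @ C.T + S                                # Eq. (12)
    eps = y - y_pred                                        # Eq. (13)
    K = P_pred @ C.T @ np.linalg.inv(R)                     # Eq. (14)
    x_new = x_pred + K @ eps                                # Eq. (15)
    P_new = P_pred - K @ R @ K.T                            # Eq. (16)
    return x_new, P_new, eps, R

# tiny demonstration: scalar random walk observed directly
f = lambda x, u, t: np.array([0.0])
h = lambda x, u, t: np.array([x[0]])
sigma = lambda u, t: np.array([[0.1]])
x1, P1, eps1, R1 = ekf_step(np.array([0.0]), np.array([[1.0]]), np.array([0.7]),
                            None, 0.0, 1.0, f, h, sigma, np.array([[0.01]]))
print(x1, P1, eps1, R1)
```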
The state prediction equations (17)-(18) are solved over the interval t \in [t_k, t_{k+1}[. In the above filter equations the notation

A = \frac{\partial f}{\partial x}\Big|_{\hat{x}_{k|k-1}, u_k, t_k}, \quad C = \frac{\partial h}{\partial x}\Big|_{\hat{x}_{k|k-1}, u_k, t_k}, \quad \sigma = \sigma(u_t, t, \theta), \quad S = S(u_k, t_k, \theta)
S{uk,tk,S)
46 has been used. Initial conditions for the extended Kalman filter are Xt^to = ^o and Pt\to = PQ, which can be prespecified or estimated as a part of the overall problem. Being a linear filter the extended Kalman filter is sensitive to nonlinear effects, and the approximate solution obtained by solving (17)-(18) may be too crude (Jazwinski, 1970). Furthermore, the assumption of Gaussian probability densities is only likely to hold for small sample times, so to provide a better approximation, the time interval [tjfc, tk-\-i[ is subsampled, i.e. [t^,..., t^,..., tfe+ifj and the equations are linearized at each subsampling instant. This way numerical solution of (17)-(18) can also be avoided by applying the analytical solutions to the corresponding linearized propagation equations between subsamples ^
= fo + Aixt-Xj)
+ B{ut-Uj)
^
= AP,y + P,yA^ + aa^
(19) (20)
over the interval t G [tj, tj^i [. In the above equations the notation •A = ^^\xjy_i,uj,tj
, B = •Q^\xj\j_-i_,uj,tj , fo = f{xj\j-i,Uj,tj,6j
, a =
has been used. The analytical solutions are Xj+iij = Xjij + A-^ (*, -I)fo+ Pj+ilj = ^sPjij^J
(A-^ ( $ , - I ) - In) A^^Ba
+ ^'e^o-o-^e^^'^rfs
(21) (22)
0
where r^
'• ^ j + 1 -tj
and $ 5 = e^^^ , and where
-Uj
^J+1 a = =^—^
(23)
^j+l -tj
has been introduced to allow both zero order hold (a = 0) and first order hold (a 7^ 0) on the inputs. The matrix exponential $5 = e^'^^ can be computed by means of a Fade approximation with repeated scaling and squaring (Moler and van Loan, 1978), but as it turns out, both $s and the integral in (22) can be computed simultaneously by instead computing the matrix exponential
and combining submatrices of the result (van Loan, 1978), i.e.
^s = Hl{rs) ^'' e'^'aa^e^^'ds
=
HI{TS)H2{TS)
(25) (26)
0
The solution (21) to (19) is undefined if A is singular, but by introducing a coordinate transformation based on the singular value decomposition of A (19) can also be solved for singular A.
47 Maximum a posteriori estimation. If prior information about the parameters is available in terms of a prior probability density function p(^) for the parameters, Bayes' rule can be applied to give an improved estimate of the parameters by forming the posterior probability density function
p{e\yN) = ^ ^ » M oc L{y,\e)p{e)
(27)
and subsequently finding the parameters that maximize this function, i.e. by performing MAP estimation. By assuming that the prior probability density of the parameters is Gaussian, and by introducing tJ^e = H^} Se = V{e}
(28) (29)
€e = 0 - / x e
(30)
the posterior probability density function becomes / ^
exp (-lelRl^_,eu\ \
exp(-|4S^
*fc=i• /det {Rk\k-i) {y/2^) '
^/'^^^ (^e) < v27r)
and the parameter estimates can now be determined by further conditioning on J/Q ^^^ solving the nonlinear optimisation problem d = arg min {- In (p(0|3^jv, Vo))}
(32)
crGv7
If no prior information is available (with p{6) uniform), this formulation reduces to the ML formulation in (10), and it can therefore be seen as a generahzation of the ML formulation, which increases the flexibility of the estimation scheme. In fact, this formulation also allows MAP estimation on only some of the parameters (with p{0) partly uniform), which increases the flexibility of the scheme even further. Using several independent data sets. If, instead of a single sequence of measurements, several consecutive, but yet separate, sequences of measurements, i.e. yj^^, yj^^, . . . , y^., . . . , y^^, possibly of varying length, are available, a similar estimation method can be applied by expanding the expression for the posterior probability density function to an even more general form
„„., „ (f, ^ft5!EHa^£^^ ^,^.„,^ ^ » i=i » fc=i - /det (Rl^k-i) (V2^) where
'
^
) ,33,
' ^/^^* (^«) (v27r)
48 and where the individual sequences of measurements are assumed to be stochastically independent. The parameter estimates can now be determined by further conditioning on Yo = [2/05 2/o5 • • •' 2/o5 • • • 5 2/o] ^^^ solving the nonlinear optimisation problem e = argmin{-ln(p(^|Y,yo))}
(35)
If only one sequence of measurements is available {S = 1), this formulation reduces to the MAP formulation in (32), and it can therefore be seen as a generalization of the MAP formulation, which further increases the flexibility of the estimation scheme. 2.2.2. Data issues Raw data sequences are often difficult to use for identification and parameter estimation, e.g. if irregular sampling has been applied, if there are occasional outliers or if some of the observations are missing. The present parameter estimation scheme also provides features to deal with these issues, making it very flexible with respect to the kinds of data that can be used for the estimation. Irregular sampling. The fact that the system equation (1) is continuous makes it very easy to deal with irregular sampling, because the corresponding state prediction equations (17)-(18) of the extended Kalman filter can simply be solved over time intervals of varying length. Occasional outliers The objective function of the general formulation (35) is quadratic in the innovations e^, and this means that the corresponding parameter estimates are much influenced by occasional outliers. To deal with this, the objective function can be modified by replacing the quadratic term
4 = (ei)^(ilt|,_i)-iei
(36)
with a function ^{i^D^ which simply returns the argument for small values of z^|., but is a linear function of e^ for large values of i^l. One such function is given by
where c > 0 is a constant. The derivative of this function with respect to z/|.
iMiJK
--JtS
(38)
is called the influence function and goes to zero for large values of ul. Missing observations. The algorithms within the present parameter estimation scheme make it easy to handle missing observations, i.e. to account for missing values in the output vector yl when calculating the term (39)
49 in (33) for some i and some k. The usual way to account for missing or non-informative values in an extended Kalman filter is to formally set the corresponding elements of the covariance matrix S in (12) to infinity, which in turn gives zeroes in the corresponding elements of {Rk\k-i)~^ and the Kalman gain matrix Kk, meaning t h a t no updating will take place in (15)-(16) based on the missing values. This approach cannot be used when calculating (39), however, because a solution is needed which modifies el and RW-i to reflect t h a t the effective dimension of yl is reduced due to the missing values. This is accomplished by replacing (2) with the alternative measurement equation yk = E{h{xk,Uk,tk,0)
+ ek)
(40)
where E is an appropriate permutation matrix, which can be constructed from a unit matrix by eliminating the rows corresponding to missing values in yj^. If, for example, y^ has three elements, and the one in the middle is missing, the appropriate permutation matrix is E=^'
(41) ^ ^
' ''^ 0 0 1
Equivalently, the equations of the extended Kalman filter are replaced with the alternative output prediction equations tk\k-i
= Eh{xk\k-uUk,
nk\k-i
= ECPk\k-iC^E^
tk, 0) + ESE^
(42) (43)
the alternative innovation equation (44)
'^k = y~k-^k\k-i the alternative Kalman gain equation Kk ^ Pk\k-iC
(45)
E Rk\k-i
and the alternative updating equations ^k\k = Xk\k~i + KkSk
(46)
Pk\k = Pk\k-i - 'KkRk\k-iKl
(47)
whereas the state prediction equations are the same. These replacements in turn provide the necessary modifications of (39) to .
expr_i(4f(fi;
)-iej;^ (48)
^det (a^uk-i^ ^k\k-
(V2^)'
except for the fact t h a t / must also be reduced with the number of missing values in y^.
50 2.3. Statistical tests and residual analysis The third step in the proposed modelhng cycle deals with investigation of the properties of the model once the unknown parameters have been estimated. This step involves applying statistical tests and performing residual analysis to assess the quality of the model. 2.3.1. Statistical tests An estimate of the uncertainty of the parameter estimates can be obtained by using the fact that the sampling distribution of the estimator in (35) is asymptotically Gaussian with mean 6 and covariance Sfl = H-'
(49)
where the matrix H is given by
and where an estimate of H can be obtained by equating the expectation with the observed value, i.e.
This equation can therefore be used to approximate the covariance matrix E^, which can then in turn be used for calculating the standard deviations of the estimates and their correlation matrix via Eg = a-gRcT^
(52)
where CTQ is a diagonal matrix of the standard deviations and R is the correlation matrix. The validity of the approximation depends on the amount of data used in the estimation, because of the Gaussianity assumption is only assymptotically correct. However, it is the experience of the authors that the amount of data needed to be sufficiently close to the limiting Gaussian distribution is often moderate, so this is not a serious limitation. The asymptotic Gaussianity of the estimator also allows t-tests to be performed to test the hypothesis, ifo, that any given parameter is marginally insignificant. The test quantity is the value of the parameter estimate, 6j^ divided by the standard deviation of the estimate, cr^., and under HQ this quantity is asymptotically ^-distributed with a number of degrees of freedom that equals the number of data points minus the number of estimated parameters, i.e. z{ej) = ^
e ^3
t
^Ni-p
(53)
1=1
To test the hypothesis, HQ, that some parameters are simultaneously insignificant, several tests can be applied, e.g. a likelihood ratio test, a Lagrange multiplier test or
51 a test based on Wald's W-statistic. Under iJo these test quantities all have the same asymptotic x^-distribution with a number of degrees of freedom that equals the number of parameters to be tested for insignificance (Hoist et a/., 1992), but in the context of the proposed modelling cycle the test based on Wald's W-statistic has the advantage that no re-estimation is required. The statistic is computed as follows Wid,)=f,j:7%
e
x'''dini(0,)'i
(54)
where ^* C ^ is the subset of the parameter estimates subjected to the test and S^^ is the corresponding covariance matrix, which can be computed from the full covariance matrix as follows E^^ = E^i:^E
(55)
where E is an appropriate permutation matrix, which can be constructed from a unit matrix by ehminating the rows corresponding to parameter estimates not subjected to the test. Tests for insignificance are important in terms of investigating if the structure of the model is correct. In principle, insignificant parameters are parameters that may be eliminated, and the presence of such parameters is therefore an indication that the model is incorrect or over parameterized. On the other hand, because of the particular nature of models of the type (l)-(2), where the diffusion term a{') in (1) is included to account for process noise due to approximation errors or unmodelled inputs, the presence of significant parameters in this term is an indication that the drift term /(•) in (1) is not correct. More details about tests for significance and other tests can be found in Hoist et al (1992). 2.3.2. Residual analysis Another important aspect in assessing the quality of the model is to investigate its predictive capabilities by performing cross-validation and examining the corresponding residuals. Depending on the intended application of the model this can be done in both a one-step-ahead prediction setting or in a pure simulation setting. In either case a number of different methods can be applied (Hoist et al, 1992). One of the most powerful of these methods is to compute and inspect the sample autocorrelation function (SACF) and the sample partial autocorrelation function (SPACF) of the residuals to detect if there are any significant lag dependencies, as this indicates that the predictive capabilities of the model are not perfect. Nielsen and Madsen (2001) recently presented extensions of these inherently linear tools to nonlinear systems in the form of the lag-dependence function (LDF) and the partial lag-dependence function (PLDF), which are based on the close relation between correlation coefficients and values of the coefficients of determination for regression models and extend to nonhnear systems by incorporating nonparametric regression. More specifically, to derive the LDF, which is a generalization of the SACF, the equivalence between the squared correlation coefficient between the stochastic variables Y and Xk, i.e. V{Y] - V{Y\Xk} ji Po{k) = V{V}~
^^^^
52 and the coefficient of determination of a linear regression of observations of Y on observations of Xfc, i.e. 2
_ SSQ - g5o(fc)
^«(^)-
5 ^ —
(^^)
is used. In (57) SSQ = Y^^iiVi — X^^ |f)^ and S'S'o(fc) is the sum of squares of the residuals from the regression, and the equivalence is due to i?Q/^N being the ML estimate of pQ/^^ when Gaussianity is assumed. For a time series of observations of a stationary stochastic process {Xt}, the squared SACF at lag k is equivalent to the squared correlation coefficient pg/^x between Xt and Xt-k^ and it can therefore be closely approximated by the corresponding value of R^rj^) obtained via a linear regression of observations of Xt on observations of Xt-k- Replacing the linear regression with a nonparametric fit of the conditional mean fk{x) = E{Xt\Xt-k = ^ } , e.g. by using a locally-weighted polynomial smoother, the LDP can be defined as a straightforward extension of the SACF, i.e. LDF(fc) = s i g n ( A ( 6 ) - A ( a ) ) ^ % J
(58)
where a and b are the minimum and maximum over the observations, and where Rhj^v is the corresponding value of the coefficient of determination. The sign is included to provide information about the average slope. To derive the PLDF, which is a generalization of the SPACF, the equivalence between the squared partial correlation coefficient between the stochastic variables (l^|-^i, •.., Xk-i) and(Xfc|Xi,...,XA;_i), i.e. 2 _V{Y\Xu--.,Xk-i}-V{Y\X,,...,Xk} P(o.)|(i,...,.-i) - — ]/{y|Xi,...,X,_i}
,_„. ^^^^
and the coefficient of determination
is used. In (60) 55O(I,...,A;-I) is the sum of squares of the residuals from a linear regression of observations of Y on observations of ( X i , . . . , Xk-i) and 550(1,...,A;) is the sum of squares of the residuals from a linear regression of observations of Y on observations of ( X i , . . . , X^), and the equivalence is due to Rfok)\(i,. .,k-i) being the ML estimate of P?o/e)|(i,...,A;-i)/^^^^ Gaussianity is assumed. For a time series of observations from a stationary stochastic process {X^}, the squared SPACF at lag k is equivalent to the squared partial correlation coefficient plok)\(i,...,k-i) between (Xt|Xt_i,... ,Xt_(/c-i)) and {Xt-k\Xt-i,...,Xt-(k-i)), and it can therefore be closely approximated by the corresponding value of ^?OA;)|(I,...,A;-I) obtained via linear regressions of observations of Xt on observations of (X^-i,..., Xt-(k-i)) and on observations of (Xt_i,..., Xt-k), i-e. via fits of the AR model Xt = (f)jQ + (l)jiXt-i +
h (j^jjXt-j -\-et
(61)
53 for j = k - l,k. Replacing the AR models with additive models (Hastie and Tibshirani, 1990), i.e. Xt = fjo + fn {Xt-i) + ••• + fn (Xt-j) + et
(62)
for j = A; — 1,fc,where each fji is fitted nonparametrically, e.g. by using a locally-weighted polynomial smoother, the PLDF can be defined as a straightforward extension of the SPACF, i.e. PLDF(fc) = sxgniMb)
- Ma)).
%^^^^^^
(63)
where a and b are again the minimum and maximum over the observations, and where Rfok)\(i k-i) ^^ ^^^ corresponding value of the coefficient of determination. Again, the sign is included to provide information about the average slope. Being an extension of the SACF, the LDF can be interpreted as being, for each A:, the part of the overall variation in the observations of X^, which can be explained by the observations of Xt-k- Likewise, being an extension of the SPACF, the PLDF can be interpreted as being, for each k, the relative decrease in one-step-ahead prediction variation when including Xt-k as an extra predictor. However, unlike the SACF and SPACF, the LDF and PLDF can also detect certain nonlinear dependencies and are therefore extremely useful for residual analysis. More details about the LDF and PLDF and other similar tools can be found in Nielsen and Madsen (2001). Details about locallyweighted polynomial smoothers and additive models can be found in Hastie and Tibshirani (1990) and Hastie et al (2001). 2.4. Model validation or invalidation The last step in the proposed modelling cycle deals with model validation or invalidation, or, more specifically, with whether, based on the information gathered in the previous step, the model is invalidated with respect to its intended application or not. If the model is invalidated, the modelling cycle is repeated by first changing the structure of the model in accordance with the information gathered in all steps of the previous cycle. 2.5. Software implementation The parameter estimation scheme described in Section 2.2 and some of the features for applying statistical tests and performing residual analysis described in Section 2.3 have been implemented in a software tool called CTSM (Kristensen et a/., 2001), which is an extension of the tool presented by Madsen and Melgaard (1991). 3. CASE STUDY: MODELLING A FED-BATCH BIOREACTOR To illustrate how the proposed modelling cycle can be used to develop or improve the quality of a first engineering principles model, a simple simulation example is given. The process considered is a fed-batch bioreactor described by a simple unstructured model of
54 biomass growth, i.e. (dX\ ( f^{S)X-^ \ ^ rf5 I = • - ^ + ( ^ £ ^ ,dt+
0
F
dV
hi\ y^
fan ,0
/X\ k
^
/eA
/e,\ k
^^
k
^^
0 (722 0
Ol 0 ,du;t ,tG
[0,3.8]
(64)
(733
/N(0,S„)\ k
^(O'^^aa)
where X is the biomass concentration, S is the substrate concentration, V is the volume of the fermenter, F is the feed flow rate, Sp (=10) is the feed concentration of substrate, Y (=0.5) is the yield coefficient of biomass and /i(S') is the growth rate, for which three different cases are considered, i.e. • A model structure with linear kinetics: M(S) = ^imaxS
(66)
• A model structure with Monod kinetics: S (67)
f^{S) = fimaxj^^ • A model structure with Monod kinetics and substrate inhibition: ^^^) = ^-K,S^
+ S + K,
^''^
In the following the model consisting of equations (64), (65) and (68) with K2 = 0.5, i.e. the model with Monod kinetics and substrate inhibition, is regarded as the true process to be modelled, and using the true parameter values in the top row of Table 3 the two data sets shown in Figure 3 are generated. Each data set consists of 101 equidistant samples of F , 2/1, 2/2 and ?/3 generated by stochastic simulation using a Milstein scheme with F being perturbed along an analytically determined optimal trajectory corresponding to maximum biomass productivity. The noise levels used are typical for real experiments of this type, if not slightly exaggerated. In the following, the data set on the left hand side of Figure 3 is used for estimation and the data set on the right hand side is used for vahdation. It is assumed that the intended purpose of the model to be developed is simulation or infinite-horizon prediction, e.g. for use in a model predictive controller (MFC). The essential performance criterion for the model to be developed is therefore that it has good prediction capabilities in pure simulation. 3.1. First cycle: Linear kinetics The model consisting of equations (64), (65) and (66), i.e. the model with linear kinetics, is regarded as an existing first engineering principles model to be investigated. In the context of the first step of the modelling cycle this is therefore the basic model.
55
.......O-il 1.5
2
ul>«i^ 2
2.5
2.5
(b) Validation data.
(a) Estimation data.
Figure 3. The data sets used for estimation y2, dash-dotted: ys).
and validation.
(Solid: F, dashed: yi, dotted:
Moving to the second step, the unknown parameters of the model are estimated with C T S M (ML on the estimation data set), and this gives the results shown in Table 1. Moving to the third step of the modelling cycle, Table 1 also includes the t-scores for performing tests for marginal insignificance. These tests show that ass and S22 are marginally insignificant, whereas all other parameters are marginally significant, including an and a22- Recalling t h a t the presence of significant parameters in the diffusion term o'(-) in (1) is an indication of approximation errors or unmodelled inputs in the drift term / ( • ) in (1), this in turn indicates that the drift terms of the equations for X and S in (64) are not correct in terms of describing the variations in the estimation data set. Otherwise an and (J22 would also have been insignificant. To investigate this further, residual analysis is performed. The left part of Figure 4 shows one-step-ahead prediction results on the validation data set and Figure 5 shows the SACF, SPACF, LDF and P L D F for the corresponding residuals. The one-step-ahead prediction results show discrepancies between the true and predicted values of yi and 2/2, which is confirmed by inspecting the corresponding residuals. Furthermore, the SACF,
Table 1 Estimation
results using linear
Parameter True value Estimate Std. Dev. t-score Significant
XQ 1 1.053 0.054 19.45 Yes
SQ 0.245 0.244 0.030 8.136 Yes
kinetics. VQ 1 1.000 0.011 93.63 Yes
fimax
(y\i
(^22
(^2,2,
0.804 0.052 15.33 Yes
0 0.442 0.061 7.245 Yes
0 0.451 0.035 13.06 Yes
0 0.000 0.000 0.000 No
-
Sii
0.01 0.004 0.001 3.067 Yes
0.001 0.000 0.000 0.000 No
0.01 0.011 0.001 7.221 Yes
56
(a) One-step-ahead prediction.
(b) Pure simulation.
Figure 4. Cross-validation (CV) results for the model structure with linear kinetics. (Solid: predicted values, dashed: true yi, dotted: true y2, dash-dotted: true ys).
SPACF, LDF and PLDF all reveal a significant lag dependence at lag 1 in the yi residuals. Especially the LDF and PLDF also reveal several significant lag dependencies in the y2 residuals, whereas in the ys residuals there are no significant lag dependencies. These results all provide additional evidence to suggest that the drift terms of the equations for X and S in (64) are not correct. A final piece of evidence that something is wrong is gathered from the pure simulation results in Figure 4. Moving to the last step of the modelling cycle, the information now available clearly invalidates the model, and the cycle must therefore be repeated by modifying the structure of the model. 3.2. Second cycle: Monod kinetics The information available suggests that it is the drift terms of the equations for X and S that need to be modified, i.e. precisely those parts of the model that depend on /x(5). Replacing (66) with (67) to yield a model with Monod kinetics and re-estimating the unknown parameters with CTSM, the results shown in Table 2 are obtained.
Table 2 Estimation results using Monod kinetics. Parameter True value Estimate Std. Dev. t-score Significant
XQ 1 1.042 0.014 72.93 Yes
SQ 0.245 0.250 0.010 24.94 Yes
VQ 1 0.993 0.001 689.3 Yes
fimax
-
0.737 0.008 96.02 Yes
Ki
-
0.003 0.001 2.396 Yes
(Jii
Cr22
<^33
0 0.104 0.018 5.867 Yes
0 0.182 0.010 18.26 Yes
0 0.000 0.000 1.632 No
Sii
S22
S33
0.01 0.008 0.001 6.453 Yes
0.001 0.000 0.000 3.467 Yes
0.01 0.011 0.003 3.801 Yes
57
Tl , I
rrrn 111
~r^
11
I I I
I M
I
xa
I M ,1 I I I I
.1
-r^r
Figure 5. One-step-ahead prediction CV residuals and SACF, SPACF, LDF and PLDF for the model structure with linear kinetics. (Top: yi, middle: 2/2; bottom: y^).
Results of ^-tests for marginal insignificance now show that the only insignificant parameter is 0-33, whereas GH and a22 are still significant. This in turn indicates that the drift terms of the equations for JC and S in (64) are still not correct in terms of describing the variations in the estimation data set, and to confirm this, residual analysis is performed. One-step-ahead prediction results on the validation data set are shown in Figure 6 and Figure 7 shows the SACF, SPAGF, LDF and PLDF for the corresponding residuals. The one-step-ahead prediction results and the corresponding residuals now show no immediate discrepancies, but inspecting the SACF, SPACF, LDF and PLDF for the 2/2 residuals reveals a significant lag dependence at lag 1, providing additional evidence to suggest that something is wrong in the equation for S in (64). A final piece of evidence is gathered from the pure simulation results in Figure 6. The information now available again invalidates the model, since its intended purpose is simulation, and the modelling cycle must therefore be repeated by modifying the structure of the model. On the other hand, if the intended purpose of the model was one-step-ahead prediction, it might now be suitable. 3.3. Third cycle: Monod kinetics with substrate inhibition Again, the information available suggests that the drift terms of the equations for X and S in (64) need to be modified. By replacing (67) with (68) to yield the correct model with Monod kinetics and substrate inhibition and re-estimating the unknown parameters
58
(a) One-step-ahead prediction.
(b) Pure simulation.
Figure 6. Cross-validation (CV) results for the model structure with Monod kinetics. (Solid: predicted values, dashed: true yi, dotted: true y2, dash-dotted: true y^).
witE-CTSM/ttie results show t-tests for marginal significance now indicate that CTH, (J22 and (J33 are all insignificant, indicating that there are no longer any significant approximation errors or unmodelled inputs to be accounted for in the drift terms of the equations in (64). To test the hypothesis of simultaneous insignificance of all three parameters, a test based on Wald's W-statistic is performed, showing that the hypothesis indeed cannot be rejected. Additional evidence that the modified model is correct is gathered by performing residual analysis. One-stepahead prediction results on the vahdation data set are shown in Figure 8, and the SACF, SPACF, LDF and PLDF for the corresponding residuals are shown in Figure 9. Again, the one-step-ahead prediction results and the corresponding residuals show no discrepancies, and no significant lag dependencies are now revealed. A final piece of evidence of the validity of the modified model is gathered from the pure simulation results in Figure 8. In summary, since the intended purpose of the initial model was simulation or infinitehorizon prediction, e.g. for use in an MFC controller, it has been now been invalidated and a more reliable model has been developed.
Tables Estimation results using Monod kinetics with substrate inhibition. Parameter True value Estimate Std. Dev. t-score Significant
XQ 1 1.004 0.010 101.0 Yes
SQ 0.245 0.262 0.008 32.75 Yes
VQ 1 1.003 0.007 143.3 Yes
fimax 1 0.999 0.009 109.4 Yes ,
Ki 0.03 0.030 0.007 4.240 Yes
an a22 0 0 0.000 0.000 0.000 0.000 0.003 0.005 No : No
0^33 0 0.000 0.000 0.003 No
0.01 0.009 0.001 7.142 Yes
S22 0.001 0.001 0.000 7.391 Yes
S33 0.01 0.011 0.001 7.193 Yes
59
I
iz
"^n~
'
I
•
'
I
•
I
'
I
J,.:
'
I.
1 ! "
Figure 7. One-step-ahead prediction CV residuals and SACF, iSPA'CF; LBF and PLDF for the model structure with Monod kinetics. (Top: yi, middle: y2, bottom: ys).
4. CONCLUSION Some relationships between the modelHng purpose and the tools used in and decisions made during the process model identification cycle have been investigated both theoretically and practically with regards to modelling for the purpose of prediction. In particular, attention has been focused on analysis of model defects and synthesis of improved models on the basis of this analysis, and it has been indicated that continuous time stochastic modelling, i.e. modelling based on stochastic differential equations, is particularly appealing in this sense for a number of reasons. More specifically, this approach allows conventional modelling principles to be applied to formulate the model, which means that prior physical knowledge about the system can be included and that the parameters of the model can easily be given a physical interpretation. At the same time, however, the inclusion of stochastic terms to account for approximation errors and unmodelled inputs not only improve the predictive capabilities of the model, but also allow application of statistically sound methods for investigating and revealing model defects. A simulation example has also been given of modelling a fed-batch bioreactor, and this example demonstrates how continuous time stochastic modelling and particularly the application of various statistical tests and some novel residual analysis tools based on nonparametric regression can be used to detect specific model defects and provide guidelines for the steps to take to synthesize a more rehable model.
60
(a) One-step-ahead prediction.
(b) Pure simulation.
Figure 8. Cross-validation (CV) results for the model structure with Monod kinetics and substrate inhibition. (Solid: predicted values, dashed: trueyi, dotted: truey2, dash-dotted: true ys).
References Andersen, H. W.; Rasmussen, K. H. and J0rgensen, S. B. (1991). Advances in Process Identification. In W. H. Ray and Y. Arkun, editors, Chemical Process Control - 4i pages 237-269, New York, USA. AIChE. Astrom, K. J. (1970). Introduction to Stochastic Control Theory. Academic Press, New York, USA. Astrom, K. J. and Eykhoff, P. (1971). System Identification - A Survey. 7(2), 123-162.
Automatica,
Bohlin, T. and Graebe, S. F. (1995). Issues in Nonlinear Stochastic Grey-Box Identification. International Journal of Adaptive Control and Signal Processing^ 9, 465-490. Hastie, T. and Tibshirani, R. J. (1990). Generalized Additive Models. Chapman k Hall, London, England. Hastie, T. J.; Tibshirani, R. J. and Friedman, J. (2001). The Elements of Statistical Learning - Data Mining, Inference and Prediction. Springer-Verlag, New York, USA. Hoist, J.; Hoist, U.; Madsen, H. and Melgaard, H. (1992). Validation of Grey Box Models. In L. Dugard; M. M'Saad and I. D. Landau, editors. Selected Papers from the 4th IFAC Symposium on Adaptive Systems in Control and Signal Processing, pages 407-414. Pergamon Press.
61
I I I
iJ I
I 'I
I.
. 1
I,,
~T^~r
Figure 9. One-step-ahead prediction CV residuals and SACF, SPACE, LDF and PLDF for the model structure with Monod kinetics and substrate inhibition. (Top: yi, middle: y2, bottom: ys).
Jazwinski, A. H. (1970). Stochastic Processes and Filtering Theory. Academic Press, New York, USA. j0rgensen, S. B. and Lee, J. H. (2001). Recent Advances and Challenges in Process Identification. In J. B. Rawling and B. A. Ogunnaike, editors, Chemical Process Control - 6, New York, USA. AIChE. Kristensen, N. R.; Melgaard, H. and Madsen, H. (2001). CTSM2.0Lyngby, Denmark.
User's Guide. DTU,
Ljung, L. (1987). System Identification: Theory for the User. Prentice-Hall, New York, USA. Madsen, H. and Melgaard, H. (1991). The Mathematical and Numerical Methods Used in CTLSM. Technical Report 7/1991, IMM, DTU, Lyngby, Denmark. Moler, C. and van Loan, C. F. (1978). Nineteen Dubious Ways to Compute the Exponential of a Matrix. SIAM Review, 20(4), 801-836. Nielsen, H. A. and Madsen, H. (2001). A Generalization of some Classical Time Series Tools. Computational Statistics and Data Analysis, 37(1), 13-31.
62 Nielsen, J. N.; Madsen, H. and Young, P. C. (2000). Parameter Estimation in Stochastic Differential Equations: An Overview. Annual Reviews in Control, 24, 83-94. Soderstrom, T. and Stoica, P. (1989). System Identification. Prentice-Hall, New York, USA. van Loan, C. F. (1978). Computing Integrals Involving the Matrix Exponential. IEEE Transactions on Automatic Control, 23(3), 395-404.
Dynamic Model Development: Methods, Theory and Applications S.P. Asprey and S. Macchietto (editors) © 2003 Elsevier Science B.V. All rights reserved
Multivariate Weighted Least Squares as an Alternative to the Determinant Criterion for Multiresponse Parameter Estimation
P.W. Oxby, T.A. Duever* and P.M. Reilly
Department of Chemical Engineering, University of Waterloo, Waterloo, Ontario, N2L 3G1, Canada
Box and Draper's [1] determinant criterion for multiresponse parameter estimation is commonly used in preference to ordinary least squares when the measurement error covariance matrix is unknown. Phillips [2] has shown that the determinant criterion is numerically equivalent to an iterated generalized least squares scheme. From this equivalence, it is shown that, of all such weighting schemes, the determinant criterion in a certain sense minimizes the estimated parameter variances. However, when the number of sets of measurements is not large relative to the number of responses, Monte-Carlo simulation, using a number of case studies, reveals that a multivariate weighted least squares (MWLS) scheme can give parameter variances that are smaller than those given by the determinant criterion. This has been demonstrated using three case studies, one of which, involving a binary copolymerization example, is presented here. The results suggest that the optimality property of the determinant criterion cited above is only asymptotically valid. Monte-Carlo simulation also reveals that, in contrast to multivariate weighted least squares, the determinant criterion can yield parameter estimates whose frequency distribution is very far from normal in the tails. Multivariate weighted least squares is therefore recommended as a robust alternative to the determinant criterion for multiresponse parameter estimation.
1. INTRODUCTION
Chemical engineers frequently deal with models of physico-chemical processes which contain parameters that must be estimated by statistical means. The models usually consist of systems of equations where several responses are considered as functions of one or more independent variables. In our work we are interested in first principles, mechanistic models of polymerization
reactors. The development of these models is supported by an extensive experimental program, which is used to generate a database of physico-chemical parameters. Updating and augmenting the database leads to problems in nonlinear experimental design, model discrimination and multiresponse parameter estimation. An overview of the work in this area is given in the paper by Duever and Penlidis [3]. Here we focus on the work reported first by Oxby [4], which deals specifically with the problem of parameter estimation for models of the type encountered in our research. The multiresponse parameter estimation problem considered here is complicated by the fact that different equations may share common parameters. This complication offers an opportunity, however, since in principle parameters can be better estimated from multiple responses than from single responses. In section 2 of this paper, the problem which motivated the research presented here will be described. In section 3, the determinant criterion for multiresponse parameter estimation will be developed and its optimality property will be considered. An alternative to the determinant criterion, namely multivariate weighted least squares, will also be developed. Section 4 contrasts the differences between the alternative parameter estimation objective functions, using another example from polymerization modeling. Finally, in section 5, we present our conclusions.
2. RESEARCH MOTIVATION
An anomalous result associated with the use of the determinant criterion developed by Box and Draper [1] motivated the research into the alternative multivariate weighted least squares objective function for multiresponse parameter estimation. We discuss here the anomaly associated with an estimation problem described in the PhD thesis of Burke [5]. A second anomaly, which Oxby found on re-examining the multiresponse estimation problem described by Box et al. [6], can be found in his thesis. In her thesis, Burke [5] considered the following reaction scheme in modeling binary copolymerization. The two reactants are labeled using subscripts 1 and 2, R_{n,i}* denotes a propagating copolymer chain of length n ending in free radical monomer unit i, where i is 1 or 2, and M_j denotes monomer j. The four polymer chain propagation reactions can be represented by:
$R_{n,i}^{*} + M_j \xrightarrow{\;k_{ij}\;} R_{n+1,j}^{*}, \qquad i, j = 1, 2$        (1)
Although there are four propagation rate constants, k_ij, the rate is usually expressed in terms of the two homopolymerization rate constants k11 and k22 and two monomer reactivity ratios defined by:
$r_1 = \frac{k_{11}}{k_{12}}, \qquad r_2 = \frac{k_{22}}{k_{21}}$        (2)
The latter are the parameters of interest. Reactivity ratios can be estimated from triad fractions, which are the fractions of sequences of three consecutive monomer units in the copolymer chain. These fractions are denoted by A_ijk, where i, j and k denote monomer units 1 and 2. This gives a total of eight triad fractions, but A211 and A112 are indistinguishable, as are A221 and A122. This leaves a total of six distinct triad fractions. The relationship between the triad fractions and the reactivity ratios is given by:
$A_{111} = \frac{r_1^2 f_1^2}{r_1^2 f_1^2 + 2 r_1 f_1 f_2 + f_2^2}$        (3a)

$A_{112+211} = \frac{2 r_1 f_1 f_2}{r_1^2 f_1^2 + 2 r_1 f_1 f_2 + f_2^2}$        (3b)

$A_{212} = \frac{f_2^2}{r_1^2 f_1^2 + 2 r_1 f_1 f_2 + f_2^2}$        (3c)
where f1 and f2 represent the mole fractions of monomers 1 and 2 in the feed mixture. Model predictions for monomer-2 centred triads are obtained by interchanging the subscripts for monomers 1 and 2. Note also that

$A_{111} + A_{112+211} + A_{212} = A_{222} + A_{122+221} + A_{121} = 1$        (4)
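A minimal sketch of equations (3a)-(3c) and the normalization check of equation (4) is given below; the function name and the example values of r1, r2 and f1 are illustrative assumptions, not values from the text.

```python
def triad_fractions(r, f):
    """Terminal-model triad fractions centred on monomer 1 (eqs 3a-3c).

    r : reactivity ratio r1 (use r2 with f1, f2 swapped for monomer-2 triads)
    f : (f1, f2) feed mole fractions
    """
    f1, f2 = f
    denom = r**2 * f1**2 + 2 * r * f1 * f2 + f2**2
    A111 = r**2 * f1**2 / denom
    A112_211 = 2 * r * f1 * f2 / denom
    A212 = f2**2 / denom
    return A111, A112_211, A212

# Illustrative values only
r1, r2, f1 = 0.64, 0.51, 0.4
m1_triads = triad_fractions(r1, (f1, 1 - f1))   # monomer-1 centred
m2_triads = triad_fractions(r2, (1 - f1, f1))   # monomer-2 centred
assert abs(sum(m1_triads) - 1.0) < 1e-12        # equation (4)
assert abs(sum(m2_triads) - 1.0) < 1e-12
```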
Experimentally, triad fractions cannot be measured directly but, for the copolymer styrene/methyl methacrylate, they are linearly related to C13-NMR spectral data (Aerdts [7]): the relative peak areas X, Y, Z, A, B+C and D in the C13-NMR spectrum are linear combinations of the six distinct triad fractions, with coefficients that are functions of the parameters σ11 = 0.44 and σ22 = 0.23 (equation (5)).
Burke measured experimental data from which the reactivity ratios can be estimated. The data, given on page 160 of her thesis, are reproduced below and plotted in Figure 1.
Table 1: C13-NMR Data from Burke's Thesis

 i    f1      X      Y      Z      A      B+C    D
 1   0.212  0.175  0.548  0.277  0.099  0.684  0.217
 2   0.225  0.203  0.538  0.260  0.083  0.714  0.203
 3   0.393  0.169  0.606  0.224  0.088  0.782  0.130
 4   0.397  0.156  0.533  0.311  0.116  0.750  0.134
 5   0.517  0.119  0.624  0.257  0.112  0.817  0.070
 6   0.517  0.120  0.610  0.270  0.113  0.779  0.108
 7   0.791  0.046  0.747  0.217  0.133  0.781  0.085
 8   0.792  0.058  0.765  0.177  0.160  0.741  0.099
Since the peak areas are normalized, we should have X+Y+Z = A+B+C+D = 1. Box et al. [6] and McLean et al. [8] discuss why this will cause difficulty in applying the determinant criterion. The difficulty is avoided by dropping two of the redundant observations. The peak denoted X was excluded because it is the smallest of the first three peaks (X, Y and Z), and the peak D was excluded because it should go to zero for comonomer feed rich in styrene monomer. The value of Z7 in Table 1 fails the normalization check; the discrepancy was subsequently attributed to a typographical error. The correct value should be 0.207 instead of 0.217. The difference of 0.01 provides the basis
Figure 1a, 1b: MWLS fit to Burke's C13-NMR data (relative peak areas X, Y, Z, A, B+C and D versus feed mole fraction styrene).
for a sensitivity check on parameter estimates derived from this dataset. Table 2 below gives the parameter estimates fitted to the data, for both values of Z7, using the determinant criterion and multivariate weighted least squares (MWLS). The data and the fit from MWLS are shown graphically in Figures 1a and 1b.

Table 2: Point Estimates of Parameters (ln r1, ln r2)

 Z7      Determinant Criterion    MWLS
 0.207   (-0.477, -0.373)         (-0.439, -0.682)
 0.217   (-0.440, -0.469)         (-0.447, -0.682)
The standard deviation in Z is estimated from the MWLS residuals to be 0.027. This is almost three times the perturbation in Z7 of 0.01, so one might expect that the perturbation would result in changes in the parameter estimates that are small relative to the parameter uncertainties. Estimates of the parameter uncertainties for the determinant criterion and MWLS are given by equations (13) and (15) respectively. These equations are based on a linearized model. Equation (3) is nonlinear in the parameters r1 and r2. It happens that the model is more nearly linear with respect to the logarithms of the parameters. Since reactivity ratios cannot be negative, this transformation gives parameter uncertainty estimates that are more realistic. The approximate 50% joint confidence regions for the parameter estimates are determined using equation (6):
$(\theta - \hat{\theta})^{T} \hat{\Sigma}_{\theta}^{-1} (\theta - \hat{\theta}) \leq p\, F(p, \nu, \alpha)$        (6)
where F(p, ν, α) is the upper α quantile of the F-distribution with p and ν degrees of freedom. The latter is the number of degrees of freedom associated with the parameter covariance matrix estimate. Oxby [4] provides some guidance on the selection of ν for multiresponse models. Figures 2a and 2b show the approximate 50% joint confidence regions for the determinant criterion and MWLS respectively. The contours for the two values of Z7 are given by the solid (Z7 = 0.207) and broken (Z7 = 0.217) ellipses. Note that the areas of the joint confidence regions from the determinant criterion are about half the areas of the regions from MWLS. Also, the shift between the solid and broken ellipses is much greater for the determinant criterion than for MWLS. (A 50% joint confidence region was chosen instead of 95% to highlight this difference.) These two observations present a contradiction in that the determinant criterion theoretically should give parameter estimates with a smaller variance than those given by MWLS. It is curious, then, that the parameter estimates from the determinant criterion are much more sensitive to a perturbation in the data. This finding led us to re-examine the optimality properties of the determinant criterion and ultimately propose MWLS as an alternative in certain cases.
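The sketch below shows one way equation (6) can be evaluated numerically to trace an approximate joint confidence ellipse for two parameters; the covariance matrix, degrees of freedom and confidence level are illustrative assumptions, not the values behind Figures 2a and 2b.

```python
import numpy as np
from scipy import stats

def confidence_ellipse(theta_hat, cov_theta, nu, level=0.50, n_pts=200):
    """Boundary of the approximate joint confidence region of eq. (6)."""
    theta_hat = np.asarray(theta_hat, dtype=float)
    p = len(theta_hat)
    radius2 = p * stats.f.ppf(level, p, nu)     # F quantile at the chosen level
    # Points satisfying (theta - theta_hat)' Sigma^{-1} (theta - theta_hat) = radius2
    L = np.linalg.cholesky(np.asarray(cov_theta, dtype=float))
    angles = np.linspace(0.0, 2.0 * np.pi, n_pts)
    circle = np.vstack([np.cos(angles), np.sin(angles)])
    return theta_hat[:, None] + np.sqrt(radius2) * (L @ circle)

# Illustrative values (not the estimates from Table 2)
theta_hat = np.array([-0.44, -0.68])            # (ln r1, ln r2)
cov_theta = np.array([[0.004, 0.001],
                      [0.001, 0.006]])
boundary = confidence_ellipse(theta_hat, cov_theta, nu=6)
print(boundary.shape)   # (2, 200): ln r1 and ln r2 coordinates of the contour
```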
Figure 2a: Approximate 50% Joint Confidence Region for the Determinant Criterion (ln r2 versus ln r1).

Figure 2b: Approximate 50% Joint Confidence Region for MWLS (ln r2 versus ln r1).
3. MULTIRESPONSE ESTIMATION AND THE DETERMINANT CRITERION
In a multiresponse estimation problem, m dependent variables or responses y1, ..., ym are associated with an independent variable, x. For n measured values of the independent variable, x1, ..., xn, there are mn measured values of the dependent variables, y11, ..., ymn. Between the independent variable x and the m dependent variables y there are m functional relationships, f1, ..., fm, parameterized by a vector θ of p parameters. Thus

$y_{ij} = f_i(x_j, \theta) + \varepsilon_{ij}, \qquad i = 1, \ldots, m \ \text{(responses)}, \quad j = 1, \ldots, n \ \text{(trials)}$        (7)
where the measurement errors, ε, are assumed to have zero mean, to be independent between measurement vectors, and to have constant covariance, given by the m by m matrix Σ_ε, within measurement vectors. The mn deviations, y_ij − f_i(x_j, θ), regarded as functions of θ, can be assembled into an m by n matrix Z(θ). The functional dependence of Z on θ will henceforth be implicitly assumed. If the measurement errors, ε, are assumed to be normally distributed, then Box and Draper [1] showed that the likelihood function for θ and Σ_ε is
$L(\theta, \Sigma_{\varepsilon}) \propto |\Sigma_{\varepsilon}|^{-n/2} \exp\left\{ -\tfrac{1}{2}\, \mathrm{tr}\!\left[ Z(\theta)^{T} \Sigma_{\varepsilon}^{-1} Z(\theta) \right] \right\}$        (8)
Bard [9] showed that the maximum likelihood estimate of Σ_ε, as an implicit function of θ, is just the deviation covariance:

$\hat{\Sigma}_{\varepsilon} = Z Z^{T} / n$        (9)
By substituting (9) back into (8) it follows that the maximum likelihood estimate of θ is that which minimizes |Z Z^T|. Box and Draper [1] first developed this determinant criterion using a Bayesian argument.
3.1 The Determinant Criterion as an Iterated GLS Scheme
The likelihood function (8), conditioned on a fixed estimate of the error covariance, Σ̂_ε, is

$L(\theta \mid \hat{\Sigma}_{\varepsilon}) \propto \exp\left\{ -\tfrac{1}{2}\, \mathrm{tr}\!\left[ Z(\theta)^{T} \hat{\Sigma}_{\varepsilon}^{-1} Z(\theta) \right] \right\}$        (10)
This can be written in terms of a more familiar quadratic form as

$L(\theta \mid \hat{\Sigma}_{\varepsilon}) \propto \exp\left\{ -\tfrac{1}{2}\, z^{T} \hat{\Sigma}_{\varepsilon n}^{-1} z \right\}$        (11)
where z = vec(Z) is a vector of length mn made up of the concatenated columns of Z and $\hat{\Sigma}_{\varepsilon n} = I_n \otimes \hat{\Sigma}_{\varepsilon}$ is a block diagonal matrix whose n blocks are $\hat{\Sigma}_{\varepsilon}$. Equation (11) has the form of a likelihood function for generalized least squares (GLS), which justifies the following Gauss-Newton equation for determining the value of θ that maximizes this (conditional) likelihood:

$\Delta\theta^{(k)} = \left( X^{T} \hat{\Sigma}_{\varepsilon n}^{-1} X \right)^{-1} X^{T} \hat{\Sigma}_{\varepsilon n}^{-1} z(\theta^{(k)})$        (12)
where X_ij = ∂z_i/∂θ_j for i = 1, ..., mn and j = 1, ..., p, and k denotes the kth Gauss-Newton iteration. With the maximum likelihood estimate of θ conditioned on Σ̂_ε from (12), the maximum likelihood estimate of Σ_ε conditioned on θ can be determined from (9). When iterated to convergence, these two steps yield estimates of θ and Σ_ε that maximize the likelihood function (8). This iterated GLS scheme is, then, equivalent to the determinant criterion. This result is originally due to Phillips [2]. For the practical purpose of computation, the estimation of Σ_ε from the residuals z can be embedded within the Gauss-Newton steps (12). Since equation (12) can be regarded as giving an approximately linear relationship between the residuals, z, and the parameter estimates θ, an estimate of the p by p parameter covariance matrix, Σ_θ, is given by
= (ZX'^)''
(13)
As pointed out by Kang and Bates [10], who developed (13) using a similar argument, the validity of (13) is based on two assumptions. The first is that the linearization implicit in (12) is a good approximation to the nonlinear model functions, and the second is that the residual covariance, Σ̂_ε(θ̂), is a good estimate of the true error covariance.
3.2 An Optimality Property of the Determinant Criterion
Equation (12) can be generalized by replacing the inverse of the estimate of the error covariance matrix, $\hat{\Sigma}_{\varepsilon n}$, with an arbitrary symmetric positive definite weight matrix, W:
$\Delta\theta^{(k)} = \left( X^{T} W X \right)^{-1} X^{T} W z(\theta^{(k)})$        (14)
The estimate of the parameter covariance is:

$\hat{\Sigma}_{\theta} = \left( X^{T} W X \right)^{-1} X^{T} W \Sigma_{\varepsilon n} W X \left( X^{T} W X \right)^{-1}$        (15)
If the determinant of the estimate of the parameter covariance matrix, |Σ̂_θ|, is minimized with respect to the elements of W, the solution is independent of X and is simply

$W = \hat{\Sigma}_{\varepsilon n}^{-1}$        (16)
Substitution of (16) back into (14) gives (12) which, as has been shown, is equivalent to the determinant criterion. Therefore, of all weighting schemes in the form of (14), it is the one equivalent to the determinant criterion that minimizes |Σ̂_θ|, the determinant of the estimate of the parameter covariance matrix, when the error covariance is unknown. This appears to be a rather strong result in support of the determinant criterion. It might also be noted that, unlike derivations based on likelihood, this result does not depend on the measurement errors being normally distributed. But it will be shown that the practical usefulness of the result does depend on the assumption that the residual covariance, Z(θ)Z(θ)^T/n, is a good estimate of the true error covariance. Ideally one would want W to be the inverse of the error covariance matrix. The determinant criterion in effect makes a compromise and substitutes an estimate of it. The potential problem here is that if the data set is not large, the residual covariance matrix may be a poor estimate of the error covariance matrix. A poor estimate of the error covariance matrix will lead to a poor estimate of the parameter covariance matrix. Therefore, although the determinant criterion gives the minimum determinant of the estimate of the parameter covariance matrix, if this estimate is poor, then the optimality property may be of little significance. This suggests that the optimality of the determinant criterion may be more relevant for large data sets than small ones. The simulation studies presented in Section 4 will confirm this to be true.
3.3 Multivariate Weighted Least Squares (MWLS)
As an alternative to the determinant criterion, a two-step iterated weighting scheme will be considered. In the first step, the model parameters are estimated by minimizing a weighted sum of squares of deviations with respect to the model parameters:

$\hat{\theta} = \arg\min_{\theta} \ \mathrm{tr}\!\left[ Z(\theta)^{T} W Z(\theta) \right]$        (17)
Here W is a diagonal weight matrix which can be initialized to the identity matrix. In the second step, the diagonal elements of the weight matrix are set to the inverse of the diagonal elements of the residual covariance matrix:

$W = \left\{ \mathrm{diag}\!\left[ Z(\hat{\theta}) Z(\hat{\theta})^{T} / n \right] \right\}^{-1}$        (18)
The m diagonal elements of W are the inverses of the estimated error variances of the m responses. Because θ̂ depends on W and W depends on θ̂, equations (17) and (18) are iterated to convergence. Therefore this is a multivariate weighted least squares scheme where the weights are determined iteratively from the variance of the residuals. Carroll and Ruppert [11] discuss a conceptually similar iteratively weighted least squares scheme in the context of uniresponse models where the measurement errors are assumed to be independent but heteroscedastic. The general Gauss-Newton equation (14) is applicable to MWLS. The estimate of the parameter covariance matrix for MWLS is given by equation (15) and, unlike the case for the determinant criterion, no further simplification is possible in this case. Before concluding this section, a brief comment on the handling of redundant response variables will be made. The reason redundant response variables can and should be dropped when applying the determinant criterion is that the information in the redundant variables is implicit in the deviation covariance matrix. In applying MWLS, only the diagonal elements of this matrix are used as weights. Consequently, dropping redundant response variables can result in a loss of information for MWLS. Furthermore, retaining redundant response variables can do no harm because the weights will adjust themselves accordingly.
4. SIMULATION STUDIES
Before discussing the copolymerization case study used to compare MWLS and the determinant criterion, it is important to discuss how multiresponse parameter estimation methods can be compared. For single response problems, relative goodness of fit is usually established by applying an F-test to a ratio of residual sums of squares. The statistical validity of the test is based on the assumption that the measurement errors are independent and of constant variance. But in the multiresponse problem the residuals for different responses are assumed to be correlated and to have different variances. Therefore a simple F-test is inappropriate for statistical inference in the multiresponse case. Another means of comparison will therefore have to be considered. It is usually desirable for an estimation method to generate good parameter estimates, where 'good' means small biases and small norms of the covariance matrix of the parameter estimates. It is also desirable for the estimation method to yield a good estimate of the covariance matrix of the parameter estimates. Therefore, the quality of the parameter estimates, as determined in a frequentist sense using Monte-Carlo simulation, will be used here as the basis for assessing alternative multiresponse parameter estimation methods.
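Before turning to the case study, the sketch below illustrates the two estimation schemes compared in the simulations: the determinant criterion of equation (9) (minimizing |Z Z^T|) and the iterated MWLS scheme of equations (17)-(18). It is a minimal sketch assuming a generic residual function `residual_matrix(theta, data)` that returns the m by n deviation matrix Z(θ); the optimizer, starting values and convergence tolerance are illustrative choices, not those used by the authors.

```python
import numpy as np
from scipy.optimize import minimize

def fit_determinant(residual_matrix, theta0, data):
    """Determinant criterion: minimize |Z(theta) Z(theta)^T| (eq. 9)."""
    def objective(theta):
        Z = residual_matrix(theta, data)               # m x n deviations
        sign, logdet = np.linalg.slogdet(Z @ Z.T)
        return logdet                                  # minimize log det for stability
    return minimize(objective, theta0, method="Nelder-Mead").x

def fit_mwls(residual_matrix, theta0, data, n_outer=20, tol=1e-8):
    """MWLS: iterate eqs. (17) and (18) to convergence."""
    theta = np.asarray(theta0, dtype=float)
    Z0 = residual_matrix(theta, data)
    w = np.ones(Z0.shape[0])                           # W initialized to identity
    for _ in range(n_outer):
        def objective(th):
            Z = residual_matrix(th, data)
            return float(np.sum(w[:, None] * Z**2))    # tr[Z^T W Z] with diagonal W
        theta_new = minimize(objective, theta, method="Nelder-Mead").x
        Z = residual_matrix(theta_new, data)
        w_new = 1.0 / np.diag(Z @ Z.T / Z.shape[1])    # eq. (18)
        converged = np.allclose(theta_new, theta, atol=tol) and np.allclose(w_new, w, rtol=1e-6)
        theta, w = theta_new, w_new
        if converged:
            break
    return theta, w
```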
4.1 Case Study
The case study described here to compare the MWLS and determinant methods for multiresponse parameter estimation is an extension of the copolymerization problem presented in section 2. There, a model relating triad fractions in copolymers measured by NMR to reactivity ratios was based on terminal kinetics, because the reaction rate for the mechanism shown in equation (1) is assumed to be influenced only by the terminal monomer unit on the propagating polymer chain. The data collected by Burke [5] and shown in Table 1 were collected as part of a model discrimination study comparing the terminal model to the penultimate model. In the penultimate model, the next to last, or penultimate, unit is also assumed to influence the reaction rate, leading to the following mechanism:
$R_{n,ij}^{*} + M_k \xrightarrow{\;k_{ijk}\;} R_{n+1,jk}^{*}, \qquad i, j, k = 1, 2$        (19)
This reaction mechanism leads to four reactivity ratios defined by:

$r_{11} = \frac{k_{111}}{k_{112}}, \qquad r_{21} = \frac{k_{211}}{k_{212}}, \qquad r_{22} = \frac{k_{222}}{k_{221}}, \qquad r_{12} = \frac{k_{122}}{k_{121}}$        (20)
The triad fraction equations for the penultimate model are slight modifications of equations (3a) to (3c):

$A_{111} = \frac{r_{21} r_{11} f_1^2}{D_1}, \quad A_{112+211} = \frac{2 r_{21} f_1 f_2}{D_1}, \quad A_{212} = \frac{f_2^2}{D_1}, \qquad D_1 = r_{21} r_{11} f_1^2 + 2 r_{21} f_1 f_2 + f_2^2$        (21)
Equation (21) represents half the model. The equations for monomer-2 centred triad fractions are obtained by interchanging subscripts 1 and 2. For the penultimate model, the NMR peak assignments and equation (5) remain unchanged. The penultimate model therefore has four parameters to be estimated. As part of a simulation study, Burke performed a ninth experiment, shown in Table 3, in addition to the eight experiments shown in Table 1.
Table 3: Additional C13-NMR Data from Burke's Thesis

 i    f1      X      Y      Z      A      B+C    D
 9   0.560  0.120  0.561  0.320  0.173  0.778  0.049
Fitting the penultimate model to the nine sets of data using MWLS resulted in the following point estimates:

ln r11 = -0.567,   ln r21 = -0.447,   ln r22 = -0.877,   ln r12 = 0.041        (22)
Analysis of the residuals leads to the following estimates of the error standard deviations and correlation matrix:

σ_ε = (2.41, 3.02, 3.03, 1.99, 4.04, 4.30) × 10^-2

ρ_ε =
 1   -0.398  -0.399  -0.066  -0.086  -0.654
      1       0.682  -0.657  -0.427   0.644
              1       0.709  -0.362   0.416
                      1      -0.888   0.106
                              1      -0.107
                                      1
        (23)
The parameter estimates in (22) and the error standard deviations and correlation matrix in (23) will be used as true values in the simulations described below.
4.2 Simulation Study 1: Parameter Estimate Distributions
The objective of this first simulation study was to compare the parameter estimate distributions obtained by both MWLS and the determinant method. This was accomplished by simulating a population of parameter estimates using each method. One million data sets were synthesized using the parameters estimated with MWLS from the Burke data as true values and with errors multinormally distributed with zero mean and covariance matrix equal to the estimate calculated in the previous analysis (equation (23)). Each dataset contained nine sets of triad fraction measurements with the nine values of the independent variable f1 used in the original study and
76 shown in tables 1 and 3. Parameters were estimated from each dataset using both MWLS and the determinant criterion with two redundant response variables dropped for the latter. Redundant response variables should be dropped when applying the determinant criterion, since the information in the redundant response variables is implicit in the deviation covariance matrix. Since in MWLS only the diagonal elements of this matrix are used, dropping redundant response variables can result in a loss of information. Furthermore retaining redundant response variables can do no harm because the weights will adjust themselves accordingly. The simulation results from one million datasets are shown in Table 4: Table 4: Simulation results from one million datasets comparing MWLS to the Determinant Criterion Parameter Inrii Inr2i Inr22 Inri2
True Value MWLS
-0.567 -0.447 -0.877 0.041
-0.568 -0.488 -0.874 0.052
Means
DET
MWLS
-0.565 -0.450 -0.877 0.071
0.156 0.133 0.179 0.396
Std. Dev. DET 0.201 0.167 0.242 0.544
While the mean values of the sample distributions of the parameter estimates shown in Table 4 do not suggest that bias is an issue here, the MWLS standard deviations are all smaller than those for estimates calculated using the determinant criterion, which is suspect: theoretically, the determinant of the parameter covariance matrix should be smaller for the determinant criterion. While the estimate of the parameter covariance matrix from a single set of data is given by equation (15), from the results of a simulation study, q sets of simulated data may be used to estimate the parameter covariance matrix Σ_θ from the q sets of estimated parameters:

$\hat{\Sigma}_{\theta} = \frac{1}{q} \sum_{i=1}^{q} \left( \hat{\theta}_i - \bar{\theta} \right) \left( \hat{\theta}_i - \bar{\theta} \right)^{T}, \qquad \bar{\theta} = \frac{1}{q} \sum_{i=1}^{q} \hat{\theta}_i$        (24)
Since determinants, which represent hypervolumes, greatly exaggerate small differences in spaces of even a moderate number of dimensions, p, in this work we shall compare on the basis of the generalized parameter variance and generalized parameter standard deviation defined by:

$|\Sigma_{\theta}|^{1/p} \qquad \text{and} \qquad |\Sigma_{\theta}|^{1/(2p)}$        (25)
The generalized parameter standard deviation is 0.168 for MWLS and 0.219 for the determinant criterion. Clearly the simulation result is not consistent with the theoretical result that the determinant of the parameter covariance matrix estimate should be smaller for the determinant method. Figures 3(a) and (b) show the sample distributions for two of the parameters, r11 and r21. The plots for the other two parameters are similar. The frequency scale is logarithmic here to highlight the tails of the distributions. On this scale, a normal distribution takes on a parabolic shape. Clearly the MWLS distributions (solid curves) are very nearly normal. However, the tails of the distributions from the determinant criterion (broken curves) are much heavier than those from a normal distribution. For a linear model, normally distributed errors lead to normally distributed parameter estimates if the weight matrix W is constant. The lack of normality in the sample distributions for the parameter estimates from the determinant criterion cannot be attributed to model nonlinearity, however, since this would affect the MWLS distribution results as well. The difference in the distributions must therefore be associated with the weight matrix, which is determined from equation (16) for the determinant method and equation (18) for MWLS. To characterize the variation in the residual covariance as a distribution, it is convenient to consider a scalar function of the residual covariance matrix, the ratio of the condition numbers of the residual covariance matrix and the sample error covariance matrix. The latter is the covariance matrix of the synthetic errors for a given sample:
$\text{condition ratio} = \frac{\text{condition number}\left[ Z(\hat{\theta}) Z(\hat{\theta})^{T} / n \right]}{\text{condition number}\left[ Z(\theta^{*}) Z(\theta^{*})^{T} / n \right]}$        (26)
where θ* is the true value of the parameters. The sample error covariance matrix can only be known in a simulation and is different from sample to sample.
Figure 3: Sample Distributions with 95% Confidence Intervals.
Taking the ratio of the condition numbers compensates for the effect of random variation in the condition numbers of the sample error covariance matrices. The condition ratio is a measure of how much the residual covariance matrix is biased as an estimate of the sample error covariance matrix. By using the condition number, the bias is expressed in terms of ill-conditioning. The higher the condition ratio, the more the residual covariance is biased towards ill-conditioning. Figure 4 shows a distribution of the log condition ratio which is nearly symmetrical about zero for the MWLS results. The distribution for the determinant criterion is skewed far to the right, in the direction of ill-conditioning. The determinant method finds the point in parameter space that minimizes the determinant of the residual covariance matrix. The determinant is the product of the eigenvalues. The condition number is the ratio of the largest to the smallest eigenvalue. In minimizing the determinant, the determinant method tends to preferentially bias the smallest eigenvalue towards zero. This in turn biases the residual covariance matrix towards ill-conditioning. The lack of normality in the parameter distributions from the determinant criterion can be attributed to the simple fact that the generalized weight matrix is the inverse of a residual covariance matrix that is biased towards ill-conditioning. MWLS shows no evidence of ill-conditioning. This explains why MWLS can give a smaller value of the determinant of the true parameter covariance matrix than the determinant criterion.
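A minimal sketch of the condition ratio of equation (26) is given below; the residual-matrix function and the parameter values are assumed placeholders.

```python
import numpy as np

def condition_ratio(residual_matrix, theta_hat, theta_true, data):
    """Condition-number ratio of eq. (26): residual covariance at the fitted
    parameters relative to the sample error covariance at the true parameters."""
    Z_hat = residual_matrix(theta_hat, data)       # m x n deviations at the fit
    Z_true = residual_matrix(theta_true, data)     # m x n synthetic errors
    n = Z_hat.shape[1]
    cov_hat = Z_hat @ Z_hat.T / n
    cov_true = Z_true @ Z_true.T / n
    return np.linalg.cond(cov_hat) / np.linalg.cond(cov_true)
```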
Figure 4: Sample Distribution of Condition Ratio (log condition ratio on the abscissa).
4.3 Simulation Study 2: Parameter Uncertainty
For a given simulation run, the parameter covariance matrix estimate is given by equation (15) in general, which simplifies to equation (13) for the determinant method. In this simulation study, we try to assess how good the uncertainty estimates are. In this case, ten thousand simulated datasets were generated using the same methodology described in section 4.2. For each dataset, point estimates of the parameters were calculated, and the parameter covariance matrix estimate was calculated using equation (15) for both MWLS and the determinant criterion. Since this is a simulation study, the error covariance matrix, Σ_ε, is known. In this case, the optimal estimation method is generalized least squares (GLS), in which the generalized weight matrix is set to the inverse of the error covariance matrix. Hence, for the purpose of simulation, GLS provides a reference standard to which MWLS and the determinant method can be compared. If the number of simulation data sets is large, then equation (24) gives a much better estimate of the parameter covariance matrix than does equation (15). For ten thousand data sets, equation (24) may be regarded as giving the "true" value of Σ_θ. The results of the simulations are shown in Figures 5(a)-(c). Note that the distribution of |Σ̂_θ| is very skewed, so for comparison purposes the quantity ln|Σ̂_θ|/p was used to generate more symmetric distributions. Also note that the "true" value of |Σ_θ| is indicated by a vertical broken line. For GLS, the distribution is very symmetric; however, there is some evidence of bias in the estimates of ln|Σ̂_θ|/p. This is evident because the distribution is centered to the left of the "true" value. For MWLS, both the "true" value and the distribution shift to the right, indicating more uncertainty in the parameter estimates. For the determinant method, the distribution shifts to the left and widens, while the "true" value shifts even more to the right. The reason the distribution shifts to the left is that the determinant criterion, by virtue of its optimality property, gives the smallest estimate of |Σ_θ|. However, minimizing the estimate of |Σ_θ| does not guarantee that |Σ_θ| itself will be minimized. The main conclusion here is that the distribution of the estimates of the parameter uncertainty, as determined using equation (15), is much closer to the reference standard (GLS) for MWLS than for the determinant criterion. This can be attributed to the fact that the determinant criterion biases the residual covariance matrix as an estimate of the error covariance matrix.
4.4 Simulation Study 3: The Effect of Replication
In this next study, the effect of replication in the data on the parameter variance was examined. The vector of independent variables was set to:

f1 = (0.219, 0.395, 0.517, 0.792)^T        (27)
As in the previous study, ten thousand synthetic data sets were generated. Model parameters were estimated from each data set using the three methods: MWLS, the determinant criterion, and GLS with the weight matrix set to the inverse of the true error covariance matrix. Again, the latter provides a reference, which is idealized, since the true error covariance matrix is only available in simulation.
Figure 5: Sample Distributions of Parameter Uncertainty Estimates. (a) Generalized Least Squares, (b) Multivariate Weighted Least Squares, (c) Determinant Criterion.
The determinant of the parameter covariance matrix, |Σ_θ|, for each of the three methods was estimated from the 10000 sets of fitted model parameters using equation (24). To study the effect of the number of sets of measurements, n, on the results, the simulation study was repeated for six full levels of replication: n = 4, 8, 12, 16, 24 and 32. The value of n indicates the total number of data in the data set, so that n = 8 would imply two full replicates, i.e. two data at each value of f1. Figure 6 shows the results in the form of the generalized parameter variance |Σ_θ|^(1/p), plotted as a function of n on a log-log scale.
Figure 6: The Effect of Sample Size on Parameter Variance.
For GLS (solid line), the relationship between the generalized parameter variance and n is nearly linear with a slope of -1 (on a log-log scale), reflecting the inverse relationship between the variance and sample size for simple estimation. As n increases, the results from the determinant criterion (dotted line) converge to the ideal reference line (GLS) because the residual covariance matrix converges to the true error covariance matrix with increasing n. MWLS (dashed line) shows an offset from GLS since the inverse of the diagonal weight matrix does not converge to the true error covariance matrix with increasing n. The graph provides strong empirical support for the argument that the determinant method is only asymptotically optimal.
Divergence of the determinant method leads to a limiting case of n = m-1, where m is the number of response variables. Here the residual covariance matrix is singular and the parameter covariance for the determinant criterion is infinite. A singular covariance matrix does not cause a problem for MWLS unless one of the diagonal elements is zero. Thus the crossing of the two curves for the determinant criterion and MWLS with decreasing n is not a peculiar feature here. In his thesis, Oxby [4] uses two additional examples to compare MWLS and the determinant criterion. The first is the "alpha-pinene" problem, which analyses the data of Fuguitt and Hawkins [12] and is discussed by Box et al. [6] and Bates and Watts [13], for example. The second is the chemical reaction kinetics example discussed by Box and Draper [1]. Simulation studies similar to those described in sections 4.2-4.3 lead to the same conclusions.
5. CONCLUDING REMARKS
The simulation studies discussed here show that the optimality of the Box and Draper determinant criterion, giving the smallest determinant of the estimate of the parameter covariance matrix, may be of limited practical significance. The simulation study shows that even when the measurement errors satisfy the standard validating assumptions, MWLS can give a smaller determinant of the parameter covariance matrix than the determinant criterion. The frequency distributions of the parameter estimates are nearly normal for MWLS, while the distributions from the determinant criterion show relatively heavy tails. The determinant criterion biases the residual covariance matrix, as an estimate of the error covariance matrix, in the direction of ill-conditioning. The latter can be severe enough to render the optimality property meaningless. MWLS is a more robust estimation method than the determinant criterion, especially when the data set is not large. Robustness comes with a price when there is a large amount of data whose measurement error structure conforms to that which validates the determinant criterion: then MWLS will give parameter estimates with a larger variance than the determinant criterion.
REFERENCES
1. G.E.P. Box and N.R. Draper, Biometrika, 52 (1965), 33.
2. P.C.B. Phillips, Econometrica, 44 (1976), 449.
3. T.A. Duever and A. Penlidis, Appl. Math. and Comp. Sci., 8(4) (1998), 815.
4. P.W. Oxby, PhD Thesis, University of Waterloo, (1997).
5. A.L. Burke, PhD Thesis, University of Waterloo, (1994).
6. G.E.P. Box, W.G. Hunter, J.F. MacGregor and J. Erjavec, Technometrics, 15 (1973), 33.
7. A.M. Aerdts, PhD Thesis, Technische Universiteit Eindhoven, Holland, (1993).
8. D.D. McLean, D.J. Pritchard, D.W. Bacon and J. Downie, Technometrics, 21 (1979), 291.
9. Y. Bard, Nonlinear Parameter Estimation, Academic Press, (1974).
10. G. Kang and D.M. Bates, Biometrika, 77 (1990), 321.
11. R.J. Carroll and D. Ruppert, Transformations and Weighting in Regression, Chapman and Hall, (1988).
12. R.E. Fuguitt and J.E. Hawkins, J. Am. Chem. Soc., 69 (1947), 319.
13. D.M. Bates and D.G. Watts, Nonlinear Regression Analysis and Its Applications, Wiley, (1988).
Dynamic Model Development: Methods, Theory and Applications S.P. Asprey and S. Macchietto (editors) © 2003 Elsevier Science B.V. All rights reserved
Model Selection: An Overview of Practices in Chemical Engineering
Peter J.T. Verheijen
Dept. of Chemical Engineering, Technological University Delft, Julianalaan 136, 2628 BL Delft, The Netherlands.
The problem of choosing out of a large set of models is considered here. Five criteria are mentioned: the level of rigor with which a model describes phenomena; the accuracy of the model with respect to data; the adequacy of the model for the purpose of simulation, optimization or design; the flexibility of a model; and the computational complexity. The focus is on the purely statistical criteria. A systematic procedure is presented to sequentially eliminate hierarchical models and equivalent models, based on the F-test and Bartlett's χ²-test. The use of Bayesian statistics as a tool is reviewed, and recent developments in optimization-based approaches are listed. An overview of available measures for the distance between models is given. It is shown that one should not accept a single model, as is engineering practice, but rather allow for a set of models when the selection does not give sufficient reason to discriminate between models. Examples from the area of reaction kinetics and adsorption are given.
1 Introduction
After engineering insight in a process, the most important step is to be able to make a quantitative model of that process in order to test the understanding gained, or to make use of the model in engineering decisions for design or operation. In the area of kinetics, the typical problem is the level of detail required in the reaction network. Normally, different networks are proposed, that make sense chemically, and a choice must be made on the basis of the available data. The thermodynamic model of a multi-phase, multi-component system is primarily determined by the attributes, e.g. polar or non-polar, of the molecules involved, but also different models are often considered acceptable by the experts. Individual process units can be governed by very simple shortcut models or very complex detailed rigorous models. An example is the heat
exchanger described either by one simple area and one heat coefficient, or by a detailed CFD description accounting for the flow details. Another example is the multi-compartment model of a tank-based process unit, such as a crystallizer. The number of compartments is a "design" variable of the model. A second dimension is the use of the model. The science view is that a model should be sufficiently complex, to be sure that all phenomena are described. Scientific work is namely aimed at understanding fundamental phenomena. The engineering view is that a model should be sufficiently simple to describe only the really necessary phenomena. This sets different requirements on the level of complexity. Especially in the latter case, the model user can put limits on the operation ranges — temperature and pressure — in order to enhance insight for design or for computability. In practice it is often difficult to generate many models, and the theory often limits the number of possibilities. So this reduces a potentially complex problem to a simple choice between two or three competing models, and the engineer or scientist only wants to make a choice. A generic approach to model selection is presented here to encompass the whole selection process. The problem statement is: assume that data, a set of models, and a purpose for the model are given. Here we present methodologies for the selection of a sub-set of models or a single model that fulfils the model purpose and that is consistent with the data. At the root of this process lies the need to fit the model to the data. So, it is also assumed that some parameters, β, are to be estimated from the data, Y. The model is the functional relation, f(x; β), expressing one or more observables as a function of set point, x, and a set of parameters, β, to be estimated from an experiment. Next to the model and the data, also the "error model", i.e. the width of the error distribution, σ_i, should be known at each measurement, x_i. The estimate of the parameters comes from the usual minimization (e.g. Bates and Watts [1]) of
$SS_{res}(\beta) = \sum_{i} \left( \frac{Y_i - f(\mathbf{x}_i; \beta)}{\sigma_i} \right)^{2}$        (1)
with respect to the parameters, β. The parameter estimate, β̂, its covariance matrix, the value of the minimum sum of squares, SS_res(β̂), and the calculated responses, f(x_i; β̂), are the direct quantitative results. Essentially, we will follow three approaches. Firstly, an inference approach reduces the whole process to a sequence of decisions. The end result is a clear accept or reject for the candidate models. Secondly, the selection problem can be restated as an optimization problem. The selection process reduces to the obtainment of a single candidate model that is found to reach an optimum for some chosen objective. Thirdly, a more subtle evaluation of each alternative is given, by calculating the probability of each of the candidate models and choosing the model with the largest probability given the data.
2 Criteria
In the whole process from model to quantitative support for analysis and design, the following criteria can be distinguished:
• Physico-chemical criteria are essential. The phenomenon should be described by the right choice of parameters, variables and equations.
• Flexibility of a model is interesting with respect to its re-use in various situations. The re-use not only makes the model more useful, but also implicitly tests the underlying ideas.
• Computational criteria describe the process of computing a model within a reasonable time and within limited memory resources.
• Statistical criteria evaluate the distance between the model and data. This is governed by the uncertainty of the measurements. In more complex situations the actual statistical distribution of the data also has to be taken into account.
• Engineering criteria refer to the usefulness of the model. The model should describe the process well enough within the range specified.
These criteria are contradictory. In engineering practice, the expedient and direct usefulness is considered more important than any of the other criteria. In a later stage, the modeler then often finds that the whole process has to restart from the very beginning. The investment in models whose rigor, flexibility and computability have been optimized is much larger, but it has a long-term benefit. Once the degree of rigor and the range of complexity are set, and the models can be computed, only the statistical criteria guide the remaining questions of adequacy and selection. Linhart and Zucchini [2] give a summary of statistically relevant measures to express the distance between model and data. Uncomplicated measures, which are especially found in the systems identification literature [3], are based on
the sum of squares in its various forms:

Error variance                       $s_{res}^2 = SS_{res}/(n - p)$
Akaike's Final Prediction Error      $FPE = s_{res}^2 \, \dfrac{n + p + 1}{n - p - 1}$
Akaike's Information Criterion       $AIC = n \ln(s_{res}^2) + 2p$        (2)
Shortest Data Descriptor             $SDD = \ln(s_{res}^2) + (p + 1) \ln(n)$
The simplest and most powerful remains the error variance. The final prediction error and the shortest data descriptor are both designed to favor models with fewer parameters. Their usefulness lies in the fact that they emphasize the robustness of the predictions done with the models. Akaike's information criterion more closely follows the scientific paradigm to retain information. This gives models with a higher number of parameters an advantage. A second set of criteria is based on a general norm: L1, L2, ..., L∞, where of course the L2 norm is related to the sum-of-squares criteria above. These norms are useful in parameter estimation as they represent different distance measures, which can be adjusted to suit circumstances. In model selection they are less useful as an interpretation of the criteria is lacking. Also, the higher-order norms emphasize the larger residuals, and therefore compare different models on just a few locations in the response space rather than over a specified domain. A third set of criteria is based on statistical measures. The likelihood or the log-likelihood function is directly related to the maximum likelihood estimator, and it is also related to Akaike's information criterion [3]. Lastly, there is the Bayesian criterion of posterior model probability, which is the subject of a separate section below. The pragmatic approach is to take one of the measures in eq. (2).
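A small sketch of the measures in eq. (2) is given below; it assumes the residual sum of squares, the number of data n and the number of parameters p are already available, and the exact form of the FPE correction factor, which varies between texts, mirrors eq. (2).

```python
import math

def selection_criteria(ss_res, n, p):
    """Error variance, FPE, AIC and SDD as in eq. (2)."""
    s2 = ss_res / (n - p)                          # error variance
    fpe = s2 * (n + p + 1) / (n - p - 1)           # final prediction error
    aic = n * math.log(s2) + 2 * p                 # Akaike's information criterion
    sdd = math.log(s2) + (p + 1) * math.log(n)     # shortest data descriptor
    return {"s2": s2, "FPE": fpe, "AIC": aic, "SDD": sdd}

# Example: compare two hypothetical fits of the same 20 data points
print(selection_criteria(ss_res=4.2, n=20, p=3))
print(selection_criteria(ss_res=3.9, n=20, p=6))
```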
3 Inference approach
Statistical inference starts from a hypothesis and relies on a framework to come to a clear decision. The hypothesis is rejected, or is not rejected. In order to apply this to the model selection issue, the whole process is to be divided into stages. Here, we will define three such stages. In the first stage, each model is compared to the data and tested on adequacy. The second stage identifies the common situation that there exist sets of models related to each other but only differing in complexity. Here, tests of model reduction to choose an appropriate level of complexity can be performed. The third stage puts the
remaining models together and seeks a test that further reduces this set. The top three levels of Figure 1 illustrate the three stages.

Fig. 1. Decision tree for selecting models by inference: the levels are model adequacy (accept/reject each model), nested models (set complexity), independent models (limit possibilities), and ranking of the models. The four levels are treated in sections 3.1, 3.2, 3.3 and 3.4, starting from the top.
3.1 Model adequacy
We are considering here the process of rejecting or accepting a single given model with the given data. Physico-chemical insight should be the first criterion to be used. Even a simple polynomial approximation assumes that the response variable fulfils continuity properties. However, this leads to the general question of model validation, which also depends on the specific discipline, e.g. Murray-Smith [4]. Simple actions are:
• Visual inspection of particular responses before and after fitting the data, the opinion of experts, or the response of knowledge management systems.
• Degeneracy tests, i.e. defining limiting cases such as extremely high and low temperatures and simulating the model in those regions.
• Comparison tests, where model simulations are contrasted to e.g. short-cut models.
If we further assume that the model at least fulfils its engineering purpose, there remain statistical measures to consider. Known measurement errors allow the well known χ²-test. Repeated measurements are the basis for the lack-of-fit test. Residuals can be tested on whether they reject the assumed distribution, often
the normal distribution. Measurements in time-series can be tested on independence between residuals, such as applied in system identification [3]. In the last decade, it has become accepted practice to reconcile data [5] with the constraint that mass balances should always be fulfilled. This leads to better estimates at the measurement positions, but this process also allows for intermediate lack-of-fit testing, as the mass balance hypothesis can be tested as well. Ideally speaking, a model should be rejected when it fails one of the statistical tests. However, a model is often accepted in practice, especially when the alternatives are not much better. In that case, it should be realized that the statistic defined in eq. (1) has no statistical meaning, and is nothing more than a distance measure on an equal footing with any other distance measure, especially the L2-norm.
3.2 Nested models
In this case, there exists a hierarchy between the models such that one model is a special case of an extended model. The obvious example is the class of polynomials, where the appropriate degree should be found. The Pade approximations are the natural extension of the polynomial class, but they are often overlooked. Within each class a smaller model can be seen as a simplification of a more complex model: it is a single power term that can be added or deleted. The virial expansion in thermodynamics has a similar property, and in kinetics it is the gradual increase of the reaction scheme that makes it fortuitous to pose these as nested models as well. Here we consider one class of nested models, and the method is to come to one single candidate model by continuous elimination by inference. In the engineering approach — aiming for simple models — the fits of two nested models can be compared. We start from the simplest model and add terms until the statistic
$\frac{\left( SS_{res}(\hat{\beta}(p)) - SS_{res}(\hat{\beta}(p+p_1)) \right)/p_1}{SS_{res}(\hat{\beta}(p+p_1))/(n-p-p_1)} \;\geq\; F_{\alpha}(p_1,\, n-p-p_1)$        (3)
is inferred not to reject the last outcome at the given significance level α. Here p is the number of parameters of the lower order model, p1 is the added number of parameters, and n is the number of data. Note that the class of polynomials is a special case of the Pade approximations.
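A minimal sketch of the nested-model F-test of eq. (3) is shown below; it assumes the residual sums of squares of the smaller and larger models are already available, and the significance level is an illustrative choice.

```python
from scipy import stats

def extra_parameters_justified(ss_small, ss_large, p, p1, n, alpha=0.05):
    """F-test of eq. (3): do the p1 added parameters significantly improve the fit?"""
    f_stat = ((ss_small - ss_large) / p1) / (ss_large / (n - p - p1))
    f_crit = stats.f.ppf(1.0 - alpha, p1, n - p - p1)
    return f_stat, f_crit, f_stat >= f_crit

# Example: a 3-parameter model versus a 5-parameter extension on 25 data points
print(extra_parameters_justified(ss_small=10.4, ss_large=6.1, p=3, p1=2, n=25))
```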
In the science approach — aiming for the complex models — we start from the most complex model that is commensurate with the given number of data. The statistic
$\frac{\left( SS_{res}(\hat{\beta}(p)) - SS_{res}(\hat{\beta}(p+p_1)) \right)/p_1}{SS_{res}(\hat{\beta}(p+p_1))/(n-p-p_1)}$        (4)
is evaluated. The result of each of these processes is a single model for each set of nested models.
3.3 Independent models
Each of the classes of nested models, or each stand-alone model that cannot be related to any other model, is considered as independent or non-nested. The result of the fit process, SS_res, for each of the models should have a known distribution, the χ²-distribution after scaling with the yet unknown error variance, σ². The assumption of complete independence of this set of models is a rather abstract and not a pragmatic proposition. However, with this assumption Bartlett's χ²-test of homogeneity of variances can be applied. Namely,
$\frac{\sum_{m=1}^{M} (n - p_m) \ln\!\left( s_{pool}^2 / s_m^2 \right)}{1 + \dfrac{1}{3(M-1)} \left( \sum_{m=1}^{M} \dfrac{1}{n - p_m} - \dfrac{1}{\sum_{m=1}^{M} (n - p_m)} \right)} \;\sim\; \chi^2_{M-1}$        (5)
where M is the number of models, p_m is the number of parameters of model m, s_m² is the error variance of model m, and s_pool² is the pooled error variance based on all error variances together. Eq. (5) can be evaluated and, if it leads to rejection, the model associated with the largest error variance is deleted from the set. This process is repeated until a set is obtained which does not fail the inference test. It can be reasoned that this elimination process is not exhaustive but, more importantly, it will not eliminate the most appropriate model. The result is a small set of models that cannot be further distinguished.
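The sketch below applies the Bartlett-based elimination loop described above; it assumes each model is summarized by its residual sum of squares and parameter count, and the pooled variance is taken as the degrees-of-freedom weighted average of the individual error variances, which is an assumption for illustration rather than the author's implementation.

```python
import numpy as np
from scipy import stats

def bartlett_eliminate(models, n, alpha=0.05):
    """Repeatedly apply Bartlett's test (eq. 5) and drop the worst model on rejection.

    models : dict name -> (ss_res, p_m); n : number of data points.
    """
    remaining = dict(models)
    while len(remaining) > 1:
        dof = np.array([n - p for _, p in remaining.values()], dtype=float)
        s2 = np.array([ss / d for (ss, _), d in zip(remaining.values(), dof)])
        s2_pool = np.sum(dof * s2) / np.sum(dof)        # assumed pooling
        M = len(remaining)
        stat = np.sum(dof * np.log(s2_pool / s2))
        correction = 1.0 + (np.sum(1.0 / dof) - 1.0 / np.sum(dof)) / (3.0 * (M - 1))
        if stat / correction < stats.chi2.ppf(1.0 - alpha, M - 1):
            break                                        # homogeneity not rejected
        worst = max(remaining, key=lambda k: remaining[k][0] / (n - remaining[k][1]))
        del remaining[worst]
    return list(remaining)

# Hypothetical models: name -> (residual sum of squares, number of parameters)
print(bartlett_eliminate({"A": (4.0, 2), "B": (4.3, 3), "C": (9.5, 2)}, n=20))
```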
3.4 Decision process
There are then three clearly defined stages in the inference approach. The model adequacy stage allows considering physico-chemical criteria, and leads
to a set of models and classes of models. The model reduction phase with nested models allows the introduction of engineering criteria. Choosing one model within each class further reduces the model set. At the end, the remaining independent models can be further restricted by mutual comparison. If some candidate models still remain, there is no further possibility to come to a single model. It remains to rank the models with respect to some measure. The measures mentioned in eq. (2) are the preferred candidates, as they are easily calculated. Figure 2 illustrates this schematically. It is clear that the shortest data descriptor favors a model with fewer parameters, while the error variance and AIC tend towards the more complex models. The structure of the three decision stages with the final ranking is shown in Figure 1.

Fig. 2. A hypothetical case of 11 data points fitted with models containing from 1 to 10 parameters. The simulation data were based on a model containing 4 parameters.
Optimization approach
A more forthright approach is to define a criterion such as given in section 2, and simply choose the model that has an optimum for the given criterion.
93 The ranking given in the previous section is in fact an optimization apphed after the set of models is reduced by inference. The extension is to evaluate the criterion to all candidate models and simply choose the best. Physico-chemical criteria can be taken along, by applying model adequacy test as described in section 3.1. Optimization, by postulating all possible models and finding the global optimum among those, has the advantage that it connects to models based on superstructures. An example is the description of reaction networks. All possible reactions can be suggested or automatically generated. These are incorporated in an overall model that is to fit the available data. The variables are typically the reaction constants expressed as pre-exponential factors and reaction energies. Added to this are binary variables that determine the absence or presence of certain reactions. Physico-chemical criteria can be taken along by adding constraints or weight functions. Such a problem is therefore restated as an MINLP optimization problem. For example, a model is given with p parameters and pi < p parameters are multiplied by a binary variables Zk^ k = 1.. .pi, to account for the presence, Zk = I, 01 absence, Zk = 0, of phenomenon k characterized by parameter pkWe specify e.g. the error variance of eq. (2) as objective. The optimization problem is then
y rain— /3,z
(Yi-fiKi-Az)Y ^ ^. n-Ek^k
(6)
Petzold and Zhu [6] reported such an approach for chemical kinetics problems. Their objective function was the simple Z/2-norm. They reformulated the problem with extra non-hnear constraints, which can be interpreted as penalty functions to constrain the sequential optimization process. Also, Edwards et al. [7] heuristically defined additional equality and inequality constraints. Both groups focus on methods to compute the MINLP solution, which is not trivial. Edwards et al. [7] noted that sofar it is proven that the optimization approach is viable, and that it can only improve with the introduction of evolving MINLP algorithms. In the choice of objective function the purpose of the model can be somewhat expressed. As mentioned before, an FPE and AIC (eq. (2)) emphasize either simplicity or complexity. Model selection in the optimization approach leads to a single candidate model.
94 5
Bayesian approach
In statistics each event has a probabihty associated. Such an event is the assignment of a model to a phenomenon, i.e. model selection. Assume a finite set of models with index m = 1 . . . M and initial information, / . A probability, p ( m | / ) , is assigned to each of the events that model m is the valid model, such that M
V p ( m | / ) = 1.
(7)
Any prior physico-chemical information can thus be quantified in this set of probabilities where the engineer or scientist can incorporate existing knowledge in what is effectively a model probability. In the absence of any prior information, it is assumed that all model probabilities are equal and so p{m\I) = 1/M. Syvia [8, chapters 4 and 5] gives a readable description of the considerations. Any new experiment or new data can then be seen as added information that changes the model probability. It is straight forward to calculate the probability or likelihood, p(yyi+i|m,/), of the realization, y^4-1, assuming model m is right. Bayes' rule then allows the evaluation of the model probability p{m\n + 1,/) after n + 1 experiments based on the prior model probability p(m|n,/). p(yn+i|m,/) x p ( m | n , J ) p{m\n + 1, / ) = ^ M ^/,, 1^ n .. ^ / ^ i ^ n^ Em=i P{yn+i \m, I) X p(m|n, / ) '
(^)
where p(m|0, / ) = p{m\I). The summation is necessary as often the likelihood only can be calculated. So starting with the initial probabilities after all A'' experiments are finished a posterior probability, p(m|A/',/), is evaluated p(m|0, / ) =^ p(m|l, 1)=^ .,.=^ p[m\N, I).
If one of the models, m, has a probability, p{m\N,I) exceeding a predefined level, this is then equivalent to the method of hypothesis testing. The advantage of Bayes' approach is that in the case of models that cannot easily be discriminated, a quantitative measure is given, probability, which is easily interpretable and comparable with other requirements of the model.
95 6
Experimentation for model selection
Experiments can be designed beforehand in order to improve the model selection process. Firstly, the set of possible experiments should be known. Secondly, a set of models that need discrimination are to be defined, either as the result of an engineering process or as the result of a model selection procedure applied to existing data. The question is to evaluate each of the experimental designs to best discriminate between the given models. Imperative in the experimental design process for non-linear models is the existence of an estimate of the given parameters, either from prior knowledge and expectation of the parameter values or on the basis of model fits from, for example, previous experiments. Atkinson and Donev [9, chap. 20] deals with this by postulating that one model with known estimates for the parameters is true, and then the parameters of all other models are determined by least squares minimization. This is a vahd procedure to obtain parameter estimates for all models as a basis for the estimated responses. The lack-of-fit sum of squares also allows hypothesis testing as in section 3. The standard approach is to define a discrimination criterion, D, which describes the distance between the estimated responses, /(x^+i; P) of all models, and to maximize this with respect to the design variables, x^+i. maxi?(xn+i),
(9)
where $n+1$ indicates the next set point. The search space is limited to the space of possible experiments. The theory can be extended with the same ideas to find several set points simultaneously, but then the specific experimental circumstances should also be accounted for. These circumstances determine whether a single set point or a set of set points is measured with a single experimental effort. A relatively simple discrimination criterion ignores the uncertainty in the estimates: the $L_a$-norm,

$$D_a(x_{n+1}) = \sum_{m=1}^{M-1} \sum_{k=m+1}^{M} \left| f_m(x_{n+1}; \hat\beta_m) - f_k(x_{n+1}; \hat\beta_k) \right|^a, \qquad (10)$$
allows one to find those set points where the models diverge most. It is easily interpretable and can be used without much difficulty. When $a = 2$, this corresponds to the T-optimal design in the procedure of Atkinson and Donev [9], who give a clear treatment of this subject.
The obvious extension of the criterion above is to use the sum of squares, $a = 2$, and to weigh the sum with the available variance:

$$D_2(x_{n+1}) = \sum_{m=1}^{M-1} \sum_{k=m+1}^{M} \frac{\left( f_m(x_{n+1}; \hat\beta_m) - f_k(x_{n+1}; \hat\beta_k) \right)^2}{s^2_{m,k,n+1}}, \qquad (11)$$

where the denominator accounts for the uncertainty in the responses. It comprises the two uncertainties due to the estimates of the two models concerned, as well as the measurement variance, $\sigma^2_{n+1}$:

$$s^2_{m,k,n+1} = \mathrm{var}\!\left(f_m(x_{n+1}; \hat\beta_m)\right) + \mathrm{var}\!\left(f_k(x_{n+1}; \hat\beta_k)\right) + \sigma^2_{n+1}. \qquad (12)$$
The measurement variance sometimes depends on the response, $f(x_{n+1}; \hat\beta_m)$, or on the set point. In the absence of detailed information a constant value can be substituted, or it can be neglected. An extension to the class of divergence criteria above is to account for the model probabilities introduced in section 5:

$$D_B(x_{n+1}) = \sum_{m=1}^{M} p(m|n, I) \sum_{k=1,\, k \neq m}^{M} D_{m,k}(x_{n+1}), \qquad (13)$$
where $D_{m,k}(x_{n+1})$ is one of the divergence measures above, evaluated specifically between the responses of models $m$ and $k$ at set point $x_{n+1}$. Finally, this summary ends with the entropy measure, which goes back to the work of Reilly [10]. Entropy is defined as $-\sum_{m=1}^{M} p(m|n, I) \ln\!\big(p(m|n, I)\big)$ and should be reduced with each subsequent experiment. This leads to a divergence criterion of the form

$$D_E(x_{n+1}) = \sum_{m=1}^{M} p(m|n, I) \int \wp_m(y) \ln \frac{\wp_m(y)}{\sum_{k=1}^{M} p(k|n, I)\, \wp_k(y)}\, dy, \qquad (14)$$
where $y$ is the response at the next set point and $\wp_m(y)$ its probability density function given model $m$. Burke et al. [11] compared a few divergence criteria on copolymerization experiments and concluded that a measure similar to eq. (11) proved effective, closely followed by the entropy measure of eq. (14). It is estimated that the number of experiments could be reduced by a factor of two. Recently, Singh [12] developed a more general divergence measure for multi-response data, in order to take into account correlation between responses.
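As an illustration of eqs. (9), (11) and (12), the sketch below (not part of the original chapter) evaluates the variance-weighted divergence $D_2$ on a grid of candidate set points and picks the maximiser; the two rival model functions, their prediction variances and the measurement variance are invented placeholders.

```python
import numpy as np

def d2_criterion(x, models, pred_var, meas_var):
    """Variance-weighted divergence of eq. (11) at set point x.

    models   : list of callables f_m(x) returning the estimated response
    pred_var : list of callables var(f_m(x)) for the prediction variance
    meas_var : measurement variance sigma^2 (taken constant here)
    """
    total = 0.0
    M = len(models)
    for m in range(M - 1):
        for k in range(m + 1, M):
            s2 = pred_var[m](x) + pred_var[k](x) + meas_var   # eq. (12)
            total += (models[m](x) - models[k](x)) ** 2 / s2
    return total

# Two toy rival models and crude prediction variances (placeholders)
models   = [lambda x: 1.0 + 0.5 * x, lambda x: 1.2 * np.exp(0.3 * x)]
pred_var = [lambda x: 0.01 * (1 + x), lambda x: 0.02 * (1 + x)]

grid = np.linspace(0.0, 3.0, 61)                      # allowable experiments
scores = [d2_criterion(x, models, pred_var, 0.05**2) for x in grid]
x_next = grid[int(np.argmax(scores))]                 # eq. (9): maximise D
print(f"next set point: x = {x_next:.2f}")
```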
7 Examples and practical considerations

7.1 Ammonia
An extensive example of the application scheme in Figure 1 is given through a study of isotherm models for the adsorption of ammonia by Helminen et al. [13]. They considered 16 different isotherm models and several different sorbents. Here we focus on the results for zeolite 13X [13, Tables 4 and 5], where the Henry model was inadequate. The basic model gives the amount adsorbed, $q$, as a function of pressure, $p$, and temperature, $T$, in the form

$$q = q_s \frac{(b\,p)^a}{1 + (b\,p)^a}. \qquad (15)$$
This actually represents both the Langmuir-Freundlich (LF) form and the Langmuir (L) form; in the latter case the power, $a$, equals unity, $a = 1$. Secondly, both forms have five different temperature dependencies of the saturation adsorption, $q_s$, which equals

$$q_{s0}, \quad a_0 + \frac{a_1}{T}, \quad a_1 \exp\!\left(-\frac{a_2}{T}\right), \quad q_{s0}(1 - \alpha T) \quad \text{and} \quad q_{s0}\exp(-\alpha T), \qquad (16)$$

respectively. These models are named LF1, ..., LF5. Next to these 10 models, there are two based on the Margules theories (VS-M1 and VS-M2), and one each on the Wilson theory (VS-W), the Flory-Huggins expression (VS-FH), and the Dubinin-Astakhov isotherm (DA). No data were given to verify model adequacy in each individual case; for example, an estimate of the experimental accuracy was missing. The second step is model reduction with the F-test. In each case, the more complex Langmuir-Freundlich model could not be rejected in favor of the simpler Langmuir model. Also, the more complicated Margules model could not be rejected with the approach taken in section 3.2. The results are summarized in Table 1. The Bartlett test (eq. (5)) shows that all models except the four Langmuir-Freundlich models can be rejected. Strictly following the argument of section 3, the procedure ends here with four indistinguishable models, where even a further ranking has no influence. It is then necessary to reconsider the four remaining models. In fact, they differ only because the temperature dependence is weak, and each represents the dependence in a slightly different way: by a straight line, an exponential decrease, etc.
Table 1. Summary of fit results after model reduction with nested models and 32 data points. Basic data are from Helminen et al. [13].

Model    No. of parameters   s^2      FPE      AIC      SDD
LF2      5                   0.0361   0.0528   -96.29   -85.49
LF3      5                   0.0361   0.0528   -96.29   -85.49
LF5      5                   0.0361   0.0528   -96.29   -85.49
LF4      5                   0.0392   0.0573   -93.65   -82.85
DA       3                   0.0924   0.1188   -70.21   -62.34
VS-W     7                   0.2016   0.3450   -36.40   -22.67
VS-FH    5                   0.2070   0.2946   -41.25   -30.45
VS-M2    5                   0.3612   0.5279   -22.59   -11.79
A physico-chemical argument could be that the describing function for $q_s$ should always have a finite positive value. This leaves only LF2, where $q_s = a_0 + a_1/T$ with $a_0, a_1 > 0$, as the desired model. (Helminen et al. [13] themselves preferred model LF5, based mostly on the coefficient of regression.)
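The F-test used in the model-reduction step above compares the drop in the residual sum of squares of a nested (simpler) model against the residual variance of the richer model. A minimal sketch is given below; the sums of squares and parameter counts are invented placeholders, not the actual values behind Table 1.

```python
from scipy import stats

def nested_f_test(ss_simple, p_simple, ss_complex, p_complex, n, alpha=0.05):
    """F-test for a simpler model nested in a more complex one.

    ss_* : residual sums of squares, p_* : numbers of parameters, n : data points.
    Returns (keep_simple, F, p_value); keep_simple is True if the simpler
    model cannot be rejected at level alpha.
    """
    df1 = p_complex - p_simple
    df2 = n - p_complex
    F = ((ss_simple - ss_complex) / df1) / (ss_complex / df2)
    p_value = stats.f.sf(F, df1, df2)
    return p_value > alpha, F, p_value

# Langmuir (a = 1, 4 parameters) against Langmuir-Freundlich (5 parameters),
# with an assumed Langmuir sum of squares and the 32 data points of Table 1
keep_simple, F, p = nested_f_test(ss_simple=0.060, p_simple=4,
                                  ss_complex=0.0361, p_complex=5, n=32)
print(keep_simple, round(F, 2), round(p, 4))
```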
7.2 Reaction lumping scheme
Helmsing [14] reported the results of experiments on CHO-reactions carried out in a reactor. The study concerned decalin cracking, which could be described with a three-reaction model or a four-reaction model. The data were the actual GLC spectra. The problem was that the different reaction schemes proposed were based on different lumping schemes. Depending on the lumping scheme, different species were taken together in one pseudo-component. This meant that, depending on the model, different areas in the GLC spectra were taken together. This is a rather unusual situation. As the areas were input to the fitting process, each of the different models had different data, but these were intrinsically obtained from the same experimental measurements. The straightforward application of the decision tree (Figure 1) could therefore not be used. A naive analysis assuming constant errors was first attempted, which led to ambiguity. The introduction of a sound error analysis of the areas in the GLC spectra effectively solved the model selection problem. Besides a scale factor, the same absolute and relative errors could be applied to all 36 or 45 data. The three-reaction model was then heavily favored, with $SS_{res}/(n_3 - p_3) = 14$ against $SS_{res}/(n_4 - p_4) = 113$ for the four-reaction model. From a purely statistical point of view, the weighted sum-of-squares indicates that both models should be rejected in the model adequacy stage. Nevertheless, the more pragmatic practitioner accepts these misfits as reasonable, especially as the lumping approach might lead to large errors.
7.3 Error model and model error
A recurring problem in chemical engineering is that, traditionally, the experimental errors are not well determined from the measurement systems, so the errors intrinsically assumed in eqs. (1) and (12) are unknown. Secondly, nearly all models are approximations of reality, and the size of this approximation is also unknown. The resulting sum-of-squares, $SS_{res}$, is a superposition of a pure error component, $SS_{pe}$, and a model error component, $SS_{me}$.
There is a limited possibility to investigate this. If all measurements are replicated $n$ times, the pure error component should decrease as $1/n$. In some circumstances measurements are indeed repeated. As an illustration, this is done here with a repeated simulation of the case given in Table 1 for the three models LF3, LF4, and LF5, with model LF2 as the base model for the Monte Carlo simulation. The result (Figure 3) shows that the limiting sum-of-squares, $1/n \to 0$, leads for the three models to a model variance of 7%, 75% and 50% of the total variance, respectively. Strictly speaking, Figure 3 shows that the model error is significant for models LF4 and LF5, while for models LF2 and LF3 the measurement variance dominates. Ideally, in the first case discrimination measures should be based on the simple $L_a$-norm, as in eq. (10); in the latter case, the full eqs. (11) and (12) can be applied. Of course, in practice one of the two discrimination criteria will be chosen for the whole selection problem. In that case, it is advisable to use at least a rough estimate of the model error, as this slightly improves the selection procedure.

Fig. 3. Illustration of variance reduction with replicated measurements; n is the number of replications.
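The extrapolation to $1/n \to 0$ described above amounts to a straight-line fit of the total residual variance against $1/n$: the intercept estimates the model-error contribution and the slope the pure-error contribution. A minimal sketch with simulated numbers (not the values behind Figure 3):

```python
import numpy as np

# Simulated mean residual sums of squares for n = 1, 2, 4, 8 replications
n_repl   = np.array([1, 2, 4, 8])
ss_total = np.array([0.052, 0.034, 0.025, 0.0205])   # placeholder values

# SS_total(1/n) ~ SS_model + SS_pure_error * (1/n); fit intercept and slope
A = np.vstack([np.ones_like(n_repl, dtype=float), 1.0 / n_repl]).T
(ss_model, ss_pe), *_ = np.linalg.lstsq(A, ss_total, rcond=None)

print(f"model-error component : {ss_model:.4f}")
print(f"pure-error component  : {ss_pe:.4f}")
print(f"model-error fraction  : {ss_model / (ss_model + ss_pe):.0%}")
```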
7.4 Time series and multi-variable data
Model selection is a major activity in time series analysis as a support for model-predictive control [3]. The problem here is exactly as stated in the introduction: given the data of a time series, determine the regression model that describes the data and that can be used for control purposes. Inputs are, typically, the autocorrelation functions of the signals. Outputs are black-box models, which are normally a severe but adequate simplification of reality. The criteria of eq. (2) are guidelines. A further step is to consider cross-correlations between various time series. The associated cross-correlation matrices contain the information for principal components analysis, PCA [15]. Standard matrix decomposition techniques are the basis for a dimension reduction, which helps to focus on the important aspects of a model for the whole system. This separation between components describing the core of the model and insignificant components can also be formulated as an inference problem for model reduction. An extension is the consideration of the sensitivity matrices of process systems, to which the same decomposition technique can be applied. Rankin and McCormick [16] give an example of an application in the field of reaction kinetics.
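A minimal sketch of the dimension-reduction step described above: the correlation matrix of several measured signals is decomposed and only the leading components are retained. The signals here are synthetic placeholders, not process data from the chapter.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0, 10, 500)
latent = np.sin(t)                                    # one true underlying driver
# five measured signals: noisy linear mixtures of the single latent component
X = np.outer(latent, [1.0, 0.8, -0.5, 0.3, 1.2]) + 0.05 * rng.standard_normal((500, 5))

Xc = (X - X.mean(axis=0)) / X.std(axis=0)             # standardise each signal
corr = (Xc.T @ Xc) / (len(t) - 1)                     # correlation matrix

U, s, Vt = np.linalg.svd(corr)                        # standard matrix decomposition
explained = s / s.sum()
n_keep = int(np.searchsorted(np.cumsum(explained), 0.95) + 1)
scores = Xc @ U[:, :n_keep]                           # reduced representation

print("explained fraction per component:", np.round(explained, 3))
print("components retained:", n_keep)
```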
7.5 Ammonia: experimental design
The ammonia data discussed in the first part of this section provide an opportunity to illustrate the design of experiments for model selection. The parameters of model LF2 [13, Table 3] were used as the "correct" model.
Fig. 4. Discrimination criterion as a function of measurement temperature. The numbers near the curves refer to the experiment number in Figure 5.

The assumed experimental error was 0.1. An experiment consists of measuring at one temperature, between 298 K and 398 K, and at one fixed series of eight pressures taken directly from the original paper. The starting point was the simulation of four measurements at four different temperatures. The purpose was to determine the sequence of temperatures for experimentation in order to optimally discriminate between the four independent candidate models, LF2, LF3, LF4 and LF5, of eq. (16) and Table 1. Figure 4 shows the discrimination criterion of eq. (11) as a function of temperature after 1, 4, 8 and 19 extra experiments. Obviously, according to eq. (9), one should at first measure in the mid-range and at the end at the highest possible temperature. Following this recipe, the sequence depicted in Figure 5 was obtained. The Bartlett test statistic (eq. (5)) is used here, as it also gives a clear decision about when to remove a model from the allowable set. Here, LF4 is eliminated after 3 extra experiments, LF5 after 7 experiments, and finally LF3 was removed after 19 extra experiments. The sequence and number of experiments needed to reduce the choice to a single model was not fixed beforehand, but invariably the simulation model was detected as the remaining model. Contrary to the original paper, four instead of five parameters were allowed to be estimated; the fifth one, $\Delta H$, would make the system badly ill-conditioned. This can be explained by considering the competing temperature dependencies in the Langmuir-Freundlich formula used.
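The sequential recipe above can be outlined as a loop: design the next temperature with eq. (11), simulate the measurement, refit the surviving models, and eliminate any model rejected by the homogeneity-of-variance test. The fragment below is only an outline of the elimination logic; scipy.stats.bartlett is used as a stand-in for the statistic of eq. (5), and the residual vectors are synthetic placeholders rather than real refits of LF2-LF5.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Synthetic residual vectors for four rival isotherm fits after some experiments;
# in the real study these come from refitting LF2-LF5 to the accumulated data.
residuals = {
    "LF2": rng.normal(0.0, 0.10, 40),
    "LF3": rng.normal(0.0, 0.11, 40),
    "LF4": rng.normal(0.0, 0.25, 40),   # clearly worse fit
    "LF5": rng.normal(0.0, 0.18, 40),
}

surviving = dict(residuals)
while len(surviving) > 1:
    # stand-in for eq. (5): test homogeneity of the residual variances
    stat, p = stats.bartlett(*surviving.values())
    if p > 0.05:                        # variances indistinguishable: stop
        break
    worst = max(surviving, key=lambda m: np.var(surviving[m]))
    print(f"eliminating {worst} (Bartlett statistic {stat:.1f})")
    del surviving[worst]

print("remaining candidates:", sorted(surviving))
```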
Fig. 5. Test statistic in Bartlett's test, eq. (5), as evaluated after each "experiment".

8 Discussion and conclusion
Model selection is based on the premise that models exist and that they can be fitted to data, or at least be compared with data. An often overlooked aspect is that the error model should also be given. This is essential for the inference-based and Bayesian approaches to model selection, as these are largely based on knowledge of the statistics involved. Actual practice, however, is still that models are rough approximations of reality, although the use of more rigorous models in chemical engineering is becoming more widespread. A careful weighing of model error against experimental error determines the criteria used, especially in the optimization-based selection procedures. The decision scheme (Figure 1) in the inference-based procedure is a good engineering tool, as it translates model selection into a clear-cut procedure. The optimization-based selection procedure shares this advantage. The former has more possibilities to deal with physico-chemical model requirements, while the latter lends itself more to superstructure-based models [17, chap. 20]. The Bayesian scheme of model selection is best at quantifying the relative
benefits of each of the models. In the end, the user can make a final selection decision based on knowledge of the probability associated with each model and his expert knowledge of all criteria involved. This makes it more suitable for a scientific environment. The use of the optimization approach still appears limited to the few references discussed in section 4; its application might be hampered by the numerical and algorithmic challenges. There are no cases reported here of model selection using Bayes' approach, and reports about its use are dated, as the citations by Buzzi-Ferraris [18] and Burke et al. [11] show. In the statistical literature, interest in this approach is continuing. Chickering and Heckerman [19] suggest some new approaches in which a distinction is also made between engineering and scientific criteria. In normal practice, it is expected that a single model is chosen. As must be clear from the above, the inference-based approach may lead to a set of statistically valid models. In the Bayesian approach no choice is actually made, although a user might define a probability level as a lower bound for a valid model. Similarly, the optimization-based selection could use a deviation from the global optimum in order to define a finite set of models. That has its difficulties, as each of the objectives used has a specific interpretation, which does not lend itself to defining well-argued deviations.
References

[1] D.M. Bates and D.G. Watts. Nonlinear regression and its applications. John Wiley, 1988.
[2] H. Linhart and W. Zucchini. Model selection. John Wiley, 1986.
[3] L. Ljung. System identification: theory for the user. Prentice Hall, 1987.
[4] D.J. Murray-Smith. Methods for the external validation of continuous system simulation models: a review. Mathematical and Computer Modelling of Dynamical Systems, 4:5-31, 1998.
[5] J.A. Romagnoli and M.C. Sanchez. Data processing and reconciliation for chemical process operations. Academic Press, 2000.
[6] L. Petzold and W.J. Zhu. Model reduction for chemical kinetics: an optimization approach. AIChE Journal, 45:869-886, 1999.
[7] K. Edwards, T.F. Edgar, and V.I. Manousiouthakis. Reaction mechanism simplification using mixed-integer non-linear programming. Comput. Chem. Eng., 24:67-79, 2000.
[8] D.S. Sivia. Data analysis: a Bayesian tutorial. Oxford University Press, 1996.
[9] A.C. Atkinson and A.N. Donev. Optimum experimental designs. Clarendon Press, 1992.
[10] P.M. Reilly. Statistical methods in model discrimination. Can. J. Chem. Eng., 48:168-173, 1970.
[11] A.L. Burke, T.A. Duever, and A. Penlidis. Discriminating between the terminal and penultimate models using designed experiments: an overview. Ind. Eng. Chem. Res., 36:1016-1035, 1997.
[12] S. Singh. Model selection for a multi-response system. Chem. Eng. Res. Des., Trans IChemE part A, 77:138-150, 1999.
[13] J. Helminen, J. Helenius, E. Paatero, and I. Turunen. Comparison of sorbents and isotherm models for NH3-gas separation by adsorption. AIChE Journal, 46:1541-1555, 2000.
[14] M.P. Helmsing. FCC catalyst testing in a novel laboratory riser reactor. PhD thesis, Dept. of Chemical Engineering, TU Delft, The Netherlands, 1996.
[15] W.F. Ku, R.H. Storer, and C. Georgakis. Disturbance detection and isolation by dynamic principal component analysis. Chemom. Intell. Lab. Syst., 30:179-196, 1995.
[16] S.E. Rankin and A.V. McCormick. Hydrolysis pseudoequilibrium: challenges and opportunities to sol-gel silicate kinetics. Chem. Eng. Sci., 55:1955-1967, 2000.
[17] L.T. Biegler, I.E. Grossmann, and A.W. Westerberg. Systematic methods of chemical process design. Prentice Hall, 1997.
[18] G. Buzzi-Ferraris. Planning of experiments and kinetic analysis. Catalysis Today, 52:125-132, 1999.
[19] D.M. Chickering and D. Heckerman. A comparison of scientific and engineering criteria for Bayesian model selection. Stat. Comp., 10:55-62, 2000.
Statistical Dynamic Model Building: Applications of Semi-infinite Programming

Steven P. Asprey
Centre for Process Systems Engineering, Imperial College of Science, Technology and Medicine, London SW7 2BY, United Kingdom
We consider the semi-infmite programming (SIP) problem and its applications in dynamic nonlinear model building. Li particular, applications of SIP include parametric identifiability testing, parametrically robust model discrimination (distinguishability), as well as parametrically robust design of experiments for parameter estimation. The question of identifiability of a model is to decide, based on the mathematical structure of the model itself - before supporting experimental data have been gathered and analysed - whether all of the parameters within the model can in principle be uniquely identified from data. Model distinguishabihty arises when more than one model is proposed to describe the same experimental system, and we wish to determine the best model from the candidate list. Design of experiments for parameter estimation and precision improvement determines optimally informative experiments to be performed through parametric sensitivity optimisation. Due to model nonlinearity, iterative methods are used to design experiments; however, the quality of these designed experiments can be adversely affected by poor starting values of the parameters. Thus, there is a need for a mechanism to ensure designs that are insensitive ("robusf) to these starting values. In this chapter, we pose the above problems in a semi-infinite programming framework for nonlinear dynamic process models. We provide a thorough description of the optimisation framework, detailing a solution algorithm for solving the semi-infinite-dimensional problem. For cases in which time-varying inputs to the process are allowed, all problems above are cast as optimal control problems. Within this framework, one can calculate optimal fixed and variable external controls with input constraints, and initial conditions of a dynamic experiment. To mathematically represent time-varying external controls to the process, we use the control vector parameterisation (CVP) technique, with either piecewise constant, piecewise linear, or piecewise quadratic functions with zeroth or first-order continuity.
106 1. INTRODUCTION Detailed mathematical models are increasingly used by engineers in academia and industry to gain competitive advantage through such applications as model-based process design, operations, and control. Thus, building and verifying high quality steady-state or dynamic, single or multiresponse mechanistic models of processing systems are key activities in Process Engineering. These activities often involve an iterative process consisting of several steps such as performing initial experiments, sensitivity analysis, parameter estimation, model adequacy tests, and experimental design for further experiments. More often than not, performing just one experimental run or simulation to produce data can be costly, both in terms of time and money. Thus there is a distinct need for systematic methods of planning experiments so as to minimise any incurred loss and to maximise the information content of the experiment. Despite the importance of such an activity, there has been very little work in the past on the application of experimental design techniques to design dynamic experiments for nonlinear situations, of which most was in the late 1980's and early 1990's (Espie, 1986; Espie and Macchietto, 1989; ZuUo, 1991) for models consisting of differential-algebraic equations (DAE's). More recent work has begun on this topic (Korkel et al, 1999) that is finally receiving overdue attention in both industry and academia. In this work, we present applications of semi-infmite programming within a general, systematic procedure to support the development and statistical verification of nonlinear dynamic models (Asprey and Macchietto, 2000). Within this procedure there are a number of statistical tools used to address several key problems, such as structural identifiability and distinguishability testing, and optimal design of dynamic experiments for model discrimination and improving parameter precision. The question of identifiability is to decide, based on the mathematical structure of the model itself and a proposed set of variables to be measured - before supporting experimental data have been gathered and analysed - whether all of the parameters within the model can in principle be uniquely identified. The question of distinguishability is to decide whether, given two or more model structures, each can be structurally distinguished from one another. Following a brief discussion of currently available methods, we present a new optimisation-based identifiabihty test (Asprey and Mantalaris, 2001). For problems involving time-varying inputs {i.e., input trajectory optimisation), the problem is cast as an optimal control problem. Within this framework, one can calculate optimal sampling points, final time, fixed and variable external controls with input constraints, and initial conditions of a dynamic experiment. To mathematically represent time-varying external controls to the process, we use the control vector parameterisation (CVP) technique (Vassiliadis et al, 199A), with either piecewise constant, piecewise linear, or piecewise quadratic functions with zeroth or firstorder continuity. Potential applications of these techniques in all engineering disciplines are abundant, including apphcations in biochemical engineering, tissue engineering, polymer reaction engineering, chemical kinetics and reaction mechanism elucidation, and physical properties estimation. 
We provide example apphcations of SIP; in particular, the identifiability of an unstructured animal cell culture is presented, along with the design of robust experiments for
107 a fermenter model, demonstrating the power of the proposed techniques in their ability to reduce the quantity of experimental work required, while increasing the quality of the results. The remainder of this chapter is organised as follows. In section 2, we present an overall dynamic model building framework, followed by a general mathematical statement of the semi-infmite programming problem in section 3. Section 4 details appUcations of SIP within the statistical model-building framework, including solution methodology and algorithms for global parametric identifiability testing using an optimisation-based approach, as well as robust experiment design for parameter precision. Example applications of the methods to real process systems and chemical engineering problems are given in section 5, detailing the global parametric identifiability of a dynamic model of hybridoma cell culture, and the robust design of dynamic experiments for parameter precision in the fed-batch fermentation of baker's yeast. Concluding remarks are given in section 6.
2. THE MODEL BUILDING PROCESS When building models whose use is to explain an observed phenomenon, one uses a priori knowledge such as physical, chemical or biological "laws" to propose (conceivably several) possible models. In each case, these laws dictate the model structure, and we may wish to know whether one or more such structures are adequate for the problem at hand. These models contain parameters that may have physical meaning, and we may wish to know if it is at all possible to determine their values, and, if so, to do so with maximum precision. In what follows, we therefore consider general deterministic models in the form of a set of (possibly mixed) differential and algebraic equations:
where x{t) is a w^-dimensional vector of time dependent state variables, nit) is a ficdimensional vector of time-varying controls or inputs to the process, w is a wrdimensional vector of constant controls, 0 is a P-dimensional vector of model parameters to be determined within a continuous, reahsable set 0 , and y{i) is an M-dimensional vector of measured response variables that are a function of the state variables, x{i). In most cases, g(x(0) will simply be a "selector" matrix, selecting those state variables that are in fact measured. In this general model formulation, we can envision a model taking the form of strictly algebraic equations (AEs), strictly differential equations (DDEs), or a mixed set of differential/algebraic equations (DAEs). Extension of this framework to partial differential algebraic (PDAE) systems is straightforward. Asprey and Macchietto (2000) presented an overall model building strategy, depicted here in Figure 1 for completeness. The first step to this strategy involves the user specifying one or more models to describe the process at hand. In Stage I, the strategy then uses these models to perform preliminary structural identifiability and distinguishability tests before any data are collected. Reasons for determining these conditions may include:
(i) the model parameters may have physical meaning, and we simply wish to know if it is at all possible to determine their values (Ljung and Glad, 1994);
(ii) during parameter estimation, degeneracy of the Hessian matrix (the second-order derivatives of the likelihood criterion with respect to the parameters) may cause problems in numerical search procedures when the parameters are not unique (Ljung and Glad, 1994);
(iii) unnecessary (and costly) experiments can be avoided;
(iv) mathematical tools may help guide selection of appropriate experiment measurement variables for the purposes of model and parameter identification.
In Stage II, the next step in the strategy involves designing experiments for model discrimination, followed by parameter estimation and model adequacy checking to determine whether or not any of the proposed models can be rejected, with the final purpose of selecting the "best" model from the given set. Once we are left with the "best" model, we then go on to Stage III to design experiments for improving the precision of the parameters within that model, eventually arriving at a final, statistically verified model formulation and its most precise parameter values. Preliminary Analysis
Fig. 1. The overall model-building scheme.
3. SEMI-INFINITE PROGRAMMING
The semi-infinite programming problem can be written in its most general form as:

$$\min_{x \in X} f(x) \quad \text{s.t.} \quad G(x, y) \le 0 \;\; \forall y \in Y, \qquad X = \{x \in \mathbb{R}^n \mid g_i(x) \le 0,\; i = 1, \dots, k\}, \qquad (2)$$
where X and Y are nonempty compact subsets of 9^" and 9^^ and/and G are continuously differentiable on X and XxY respectively. The term semi-infinite programming derives from the property that x denotes finitely many variables, while Y is an infinite set. In any case, finitely many variables appear in infinitely many constraints. The SIP problem was first studied by John (1948), who gave necessary and sufficient conditions to its solution. Since then, it has been extensively studied in literature, including efforts by Blankenship and Falk (1976), Coope and Watson (1985), Gustafson (1981), and Polak and Mayne (1976). For a review of the literature, the interested reader is referred to Hettich and Kortanek (1993) and the references therein. Typical appUcations of SIP include path-planning problems in robotics, design of engineering systems under uncertainty, as well as a class of vibrating membrane problems. Reformulation of the general continuous minmax optimisation problem into the SIP fi-amework is currently gaining popularity (Rustem and Zakovic, 2000). Problems such as (2) are solved such that the infinitely many constraints G(x,y) < 0, yeY are replaced by constraints G(x,y\x)) < 0, with y^x) denoting local solutions of the parametric problem in x, maXy{G(x,y)|yGY} (see Hettich and Kortanek, 1993). Using results fi-om parametric programming, assumptions can be given ensuring that (locally) the problem is equivalent to a common finite nonlinear problem with smooth constraints. As such, conventional codes (i.e., the NAG algorithm E04UFF) for solving nonlinear programming problems (NLP) can be used iteratively to solve the SIP problem. 4. DYNAMIC MODEL-BUILDING APPLICATIONS OF SIP 4.1 Global Parametric Identifiability In general terms, thQ parametric identifiability problem can be described as follows: a model giving output trajectory J(0J«(O?®?O V^e[0,r/] within a time horizon of interest, is globally parametrically identifiable if, for any two parameter sets, 0 G 0 and 0*G0*, and for all time varying system inputs u(t)eU, and all time-invariant system inputs © G Q and the same initial conditions y(/=0):
$$\mathbf{y}(\theta, \mathbf{u}(t), t) = \mathbf{y}(\theta^*, \mathbf{u}(t), t) \quad \forall t \in [0, t_f] \iff \theta = \theta^*, \qquad (3)$$
and thus the solution trajectory is unique for any given value of the parameter vector, $\theta$. In our illustration of Figure 2, we wish to ensure that when the same inputs [$\mathbf{u}(t), \boldsymbol{\omega}$] are applied to the same model, but with different parameter vectors $\theta$ and $\theta^*$ (such that $\theta \neq \theta^*$), the two predicted model output trajectories do not overlay one another.
Fig. 2. Identifiability illustration. There is a substantial literature on structural global (SGI) and local (SLI) identifiability. Identifiability methods for both linear-in-input (LI) and non-LI models are covered in the books by Walter (1982), Godfrey (1983) and Walter (1987), while more recent surveys are given by Walter and Pronzato (1990) and Chapell et al. (1990). Further extensions and algorithms for non-LI models can be found in Ljung and Glad (1994). The majority of past approaches for testing structural identifiability in non-LI dynamic models find their roots in differential geometry, where, in particular, the model is approximated locally by its inputoutput mapping using functional expansions (Lecourtier et al., 1987). Coefficients of these functional expansions are then used to form systems of nonlinear algebraic equations, whose unique solution 0=0 ensures identifiability. As can be expected, this approach has its advantages and disadvantages; its advantages being (i) it is a definitive test that gives a yes/no result through the existence of a unique solution to a system of nonlinear algebraic equations; (ii) the methods are not affected by scaling problems; and (iii) some indication of the mathematical relationship between parameters is obtained. Disadvantages of the methods are (i) they are limited to models of special forms {e.g., linear state-space models, nonlinear control-affine models); (ii) the accuracy of functional expansions can be adversely affected by the degree of nonlinearity of the model; and (iii) they are generally limited to models with dimensions (M+P)<10, small for modelling reahstic process systems {cf Ljung and Glad, 1994). For these reasons, in this paper, we propose a new dynamic optimisation-based identifiability test for the determination of global parametric identifiabihty of nonlinear dynamic process models, generally described by not only state-space models, but also models comprising partial differential and algebraic equations (PDAEs).
There is also a substantial literature on local identifiability analysis. These analyses are either based on assessing the rank of an estimability matrix (Shaw, 1999) or of the Fisher information matrix (Munack and Posten, 1989; Espie, 1986), collectively termed "quantitative" or "practical" identifiability (Vajda et al., 1989). The term "local" stems from the fact that the analysis is performed given a nominal set of initial state values, operating conditions, and parameter values. This definition envelops problems other than identifiability alone, such as ill-conditioning of the estimation problem. As pointed out by Asprey and Mantalaris (2001), using the concept of local sensitivity analysis, based on the $M \times P$ matrix of sensitivity coefficients,

$$\left[ \frac{\partial y_m}{\partial \theta_p} \right], \quad m = 1, \dots, M,\; p = 1, \dots, P, \qquad (4)$$

the dynamic parameter estimability matrix can be formed:

$$P_E = \begin{bmatrix}
\left.\dfrac{\partial y_1}{\partial \theta_1}\right|_{t_{sp_1}} & \cdots & \left.\dfrac{\partial y_1}{\partial \theta_P}\right|_{t_{sp_1}} \\
\vdots & & \vdots \\
\left.\dfrac{\partial y_M}{\partial \theta_1}\right|_{t_{sp_1}} & \cdots & \left.\dfrac{\partial y_M}{\partial \theta_P}\right|_{t_{sp_1}} \\
\vdots & & \vdots \\
\left.\dfrac{\partial y_1}{\partial \theta_1}\right|_{t_{sp_{n_{sp}}}} & \cdots & \left.\dfrac{\partial y_1}{\partial \theta_P}\right|_{t_{sp_{n_{sp}}}} \\
\vdots & & \vdots \\
\left.\dfrac{\partial y_M}{\partial \theta_1}\right|_{t_{sp_{n_{sp}}}} & \cdots & \left.\dfrac{\partial y_M}{\partial \theta_P}\right|_{t_{sp_{n_{sp}}}}
\end{bmatrix}, \qquad (5)$$
obtained by evaluating the matrix of sensitivity coefficients of the $M$ model response variables, $\mathbf{y}$, at each of the $n_{sp}$ sampling points (the times at which it is proposed that the experimental responses are measured). For the problem to be parametrically estimable, the column rank of $P_E$ must be equal to $P$, the number of parameters in the model. This criterion for estimability also implies that if $\mathrm{rank}(P_E) < P$, not all parameters are estimable, a situation in which the sensitivity coefficients are not linearly independent (see Shaw, 1999). A further complication arises from the "degree of linear independence": the rank of $P_E$ may be equal to $P$, yet near linear dependencies will render $P_E$ ill-conditioned, indicated by a large condition number ($\lambda$), defined as the ratio of the largest to smallest nonzero singular values of $P_E$. The local identifiability tests are more amenable to application to large-scale models (Shaw, 1999; Munack and Posten, 1983; Jacquez and Greif, 1985).

Solution Methodology

Definition: Identifiability (Asprey and Mantalaris, 2001). A model yielding the output trajectory $\mathbf{y}(\theta, \mathbf{u}(t), t)$ over a time horizon of interest $t \in [0, t_f]$ is globally parametrically identifiable if, for any two parameter sets, $\theta \in \Theta$ and $\theta^* \in \Theta^*$, for all system inputs [$\mathbf{u}(t) \in U$, $\boldsymbol{\omega} \in \Omega$], and the same initial conditions $\mathbf{y}(t=0)$, the global maximum:
$$\Phi_I = \max_{\theta \in \Theta,\; \theta^* \in \Theta^*} (\theta - \theta^*)^T \mathbf{W}_\theta (\theta - \theta^*) \qquad (6a)$$

subject to:

$$\int_0^{t_f} \Delta\mathbf{y}(t)^T \mathbf{W}_y\, \Delta\mathbf{y}(t)\, dt \le \varepsilon_y \quad \forall\, \mathbf{u}(t) \in U,\; \boldsymbol{\omega} \in \Omega, \quad \text{where } \Delta\mathbf{y}(t) = \mathbf{y}(\mathbf{u}(t), \boldsymbol{\omega}, \theta, t) - \mathbf{y}(\mathbf{u}(t), \boldsymbol{\omega}, \theta^*, t), \qquad (6b)$$

$$\mathbf{f}(\dot{\mathbf{y}}, \mathbf{y}, \dot{\mathbf{x}}, \mathbf{x}, \mathbf{u}(t), \boldsymbol{\omega}, \theta, t) = \mathbf{0} \quad \forall\, t \in (0, t_f) \quad \text{(the model equations)},$$

gives $\Phi_I \le \varepsilon_\theta$.
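For fixed experimental inputs, the test of eq. (6) reduces to maximising the weighted parameter distance subject to the output-matching constraint. The sketch below illustrates that inner problem with a simple two-parameter ODE and a local SQP solver; it is an illustration of the idea only, not the global, gPROMS-based implementation used in this chapter, and all model details and tolerances are placeholders.

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import minimize

t_grid = np.linspace(0.0, 5.0, 26)
eps_y, eps_theta = 1e-3, 1e-4

def response(theta):
    """Toy model: dx/dt = -theta1*x + theta2, x(0) = 1; measured output x(t)."""
    sol = solve_ivp(lambda t, x: -theta[0] * x + theta[1], (0, 5), [1.0],
                    t_eval=t_grid, rtol=1e-8)
    return sol.y[0]

def objective(z):                 # maximise (theta - theta*)^T W (theta - theta*), W = I
    theta, theta_star = z[:2], z[2:]
    return -np.sum((theta - theta_star) ** 2)

def output_gap(z):                # eq. (6b), integral approximated by the trapezoid rule
    theta, theta_star = z[:2], z[2:]
    gap = (response(theta) - response(theta_star)) ** 2
    integral = np.sum(0.5 * (gap[1:] + gap[:-1]) * np.diff(t_grid))
    return eps_y - integral       # must stay non-negative

z0 = np.array([1.0, 0.5, 1.1, 0.55])
res = minimize(objective, z0, method="SLSQP", bounds=[(0.1, 2.0)] * 4,
               constraints=[{"type": "ineq", "fun": output_gap}])
phi_I = -res.fun
print("Phi_I =", phi_I,
      "-> identifiable" if phi_I <= eps_theta else "-> possibly unidentifiable")
```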
Table 1. The semi-infinite programme solution algorithm

Given: a nominal vector of parameter values, $\theta^{(1)} \in \Theta$, $\theta^{*(1)} \in \Theta^*$, and inputs $\mathbf{u}^{(1)} \in U$, $\boldsymbol{\omega}^{(1)} \in \Omega$.
Step 0: Set $K := 1$.
Step 1: Set OK := true. Globally solve $\Phi_I^K = \max_{\theta, \theta^*} (\theta - \theta^*)^T \mathbf{W}_\theta (\theta - \theta^*)$, subject to the constraint of eq. (6b) enforced at [$\mathbf{u}^{(k)}, \boldsymbol{\omega}^{(k)}$], $k = 1, \dots, K$, and the model equations, to obtain $\theta^{(K)}, \theta^{*(K)}$. If $\Phi_I^K > \varepsilon_\theta$ then go to Step 4.
Step 2: Globally solve $\Phi_y^K = \max_{\mathbf{u}, \boldsymbol{\omega}} \int_0^{t_f} \Delta\mathbf{y}(t)^T \mathbf{W}_y\, \Delta\mathbf{y}(t)\, dt$, with $\Delta\mathbf{y}(t)$ evaluated at $\theta^{(K)}, \theta^{*(K)}$, to obtain $\mathbf{u}^{(K+1)}, \boldsymbol{\omega}^{(K+1)}$. If $\Phi_y^K > \varepsilon_y$ then set OK := false.
Step 3: If NOT OK then set $K := K + 1$ and repeat from Step 1.
Step 4: Stop: solution found.

As pointed out in Asprey and Mantalaris (2001), when testing model identifiability in this way the use of global optimisation is recommended. For instance, in Step 1 of the solution algorithm we wish to ensure that we have not converged to a local maximum; a local maximum may suggest identifiability when in fact the model may not be identifiable, as illustrated in Figure 3 with a simplified one-dimensional case. One should note at this stage that, if $\Phi_I > \varepsilon_\theta$ in Step 1, the model can be deemed unidentifiable and the solution can be stopped. In Step 2 of the solution algorithm we wish to ensure that we have appropriately (globally) searched the entire domains, $U$ and $\Omega$, for any constraint violations. For global optimisation, we chose to use a stochastic clustering algorithm (Rinnooy Kan and Timmer, 1987) together with the NAG SQP algorithm (E04UFF).
Fig. 3. Global optimisation illustration: a local maximum may suggest an identifiable solution, whereas the global maximum reveals the model to be unidentifiable. (A priori problem regularisation via control vector parameterisation is used to obtain an NLP problem to be solved globally.)
114 For the purposes of solving the model equations, we made use of the newly developed "equation set object" (ESO) available for use with gPROMS (PanteHdes, 1996). The ESO is able to use as input a gPROMS input file containing the model equations, and allows use of gPROMS as a "server", exposing the model equations as well as any first-order derivative information required for solution algorithms. Furthermore, the ESO, through gPROMS, automatically handles discretisation of PDAEs to DAE form using various techniques such as forward/centraL1)ackward finite differences or orthogonal collocation on finite elements. Integration was performed using the DAE solver DASOLV (Jarvis, 1994), an algorithm capable of handling large, sparse systems of DAEs with impUcit or explicit discontinuities. 4.2 Robust Experiment Design An additional application of semi-infinite programming is in designing robust optimal experiment designs, when little or no information is known about the parameter values a priori (Asprey and Macchietto, 2001). The aim of experiment design for improved parameter precision is to decrease the size of the inference regions of each of the parameters in any given model. This is equivalent to making the elements of the parameter variancecovariance "small". Thus, one may start with an estimate of the highest posterior density (HPD) region, and, in particular, the marginal posterior density covariance as given by (Bard, 1974):
$$\mathbf{V}(\hat\theta, \phi) = \left[ \sum_{r=1}^{M} \sum_{s=1}^{M} \tilde\sigma^{rs}\, \mathbf{S}_r^T \mathbf{S}_s + \boldsymbol{\Sigma}_\theta^{-1}(\hat\theta) \right]^{-1} \qquad (7)$$

In Equation (7), $\phi$ is the vector of experiment design variables (defined in detail below), $\mathbf{S}_r$ is the matrix of partial derivatives of the $r$-th equation in the model with respect to the parameters $\theta$, calculated at the $n+1$ experimental points,

$$\mathbf{S}_r = \frac{\partial \mathbf{y}_r(\hat\theta)}{\partial \theta}, \qquad (8)$$

$\tilde\sigma^{rs}$ is the $rs$-th element of the inverse of the estimate $\hat{\boldsymbol{\Sigma}}$ of the variance-covariance matrix of the residuals, $\boldsymbol{\Sigma} = \mathrm{cov}(\mathbf{y}_r, \mathbf{y}_s)$, estimated by

$$\hat\sigma_{rs} = \frac{1}{N-1} \sum_{i=1}^{N} \left( y_{ri} - \hat{y}_{ri} \right)\left( y_{si} - \hat{y}_{si} \right), \qquad (9)$$

and $\boldsymbol{\Sigma}_\theta(\hat\theta)$ is an approximate variance-covariance matrix of the parameters. As suggested by Box and Lucas (1959), prior information on $\theta$ is ignored by dropping the dependency of (7) on $\boldsymbol{\Sigma}_\theta(\hat\theta)$. Thus the design of experiments for improving parameter precision reduces to minimising some metric of

$$\mathbf{V}(\hat\theta, \phi) = \left[ \sum_{r=1}^{M} \sum_{s=1}^{M} \tilde\sigma^{rs}\, \mathbf{S}_r^T \mathbf{S}_s \right]^{-1}. \qquad (10)$$
In order to compare the magnitude of different matrices, various real-valued functions have been suggested as metrics. Three common criteria are:
1. D-optimality: an experimental design is D-optimal if it minimises the determinant of the covariance matrix (10), and thus minimises the volume of the joint confidence region.
2. E-optimality: an experimental design is E-optimal if it minimises the largest eigenvalue of the covariance matrix (10), and thus minimises the size of the major axis of the joint confidence region.
3. A-optimality: an experimental design is A-optimal if it minimises the trace of the covariance matrix (10), and thus minimises the dimensions of the enclosing box around the joint confidence region.
Figure 4 illustrates the experiment design situation geometrically. The designs described above make use of a quadratic approximation to the underlying likelihood surface, depicted by the dashed ellipse. The point estimate of the parameters is indicated by a dot, located at the centre of the figure.
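Given an (approximate) parameter covariance matrix of the form (10), the three criteria can be computed directly; the sketch below uses an arbitrary 2x2 covariance purely as an illustration.

```python
import numpy as np

V = np.array([[0.040, 0.012],        # placeholder parameter covariance, eq. (10)
              [0.012, 0.010]])

d_metric = np.linalg.det(V)          # D-optimality: volume of the joint confidence region
e_metric = np.linalg.eigvalsh(V)[-1] # E-optimality: largest eigenvalue (major axis)
a_metric = np.trace(V)               # A-optimality: trace (enclosing box)

print(f"D: {d_metric:.3e}   E: {e_metric:.3e}   A: {a_metric:.3e}")
```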
Fig. 4. A geometric interpretation of the experiment design situation.

Various approaches have been proposed in the past to address this problem, two of which are described briefly here for completeness:
116 (i) The first approach takes into account the a priori uncertainty in the model parameters. Thus, 9 is assumed to belong to a population 0 , whose distribution or density/>(0) is known. This knowledge is usually expressed in terms of basic assumptions on the admissible parameter space (i.e.^ a uniform or Normal distribution over a specified range). Walter and Pronzato (1987) formulate this problem as: <^^^=argmax^£{|v-H^,<^)| j
(11)
where E{-} denotes expected value, |-| denotes determinant, V(l?,0) is the posterior covariance of the parameters, 0, with domain 0 and <^ is a vector of experiment decision variables with domain O. In this fashion, an ED-optimal experiment is one for which the choice of experiment decision variables maximises the expected value over the population of possible parameter values of a scalar measure of the information to be gained from the experiment. As Walter and Pronzato (1987) point out, experiments designed using this approach are good on average, but can be poor for some values of the parameters that are associated with very low values of/7(9). (ii) The second approach aims to determine experiment designs, <j>j^c^ that optimise the worst possible performance for any value of (? G 0 (Federov, 1980): 0rc =argmaxmin||V"'(S,0)| }
(12)
The prior information on 9 is limited to the knowledge of the admissible domain 0 - no information on the distribution />(9) is necessary. In this way, the design tries to ensure acceptable performance for all possible values of the parameter vector (however unlikely they may be). As pointed out by Walter and Pronzato (1987), this approach has not been widely used due to the computational burden introduced by min-max optimisation, a hurdle now overcome by recent advances in optimisation (see, for example, Zakovic and Rustem, 2000). Unlike the above approaches, the work of Asprey and Macchietto (2001) utilises a criterion based on the information matrix for dynamic nonlinear multiresponse systems, first derived by Zullo (1991). The design problem is cast as an optimal control problem, as suggested by Espie and Macchietto (1989) and later by Zullo (1991). The experiment design decisions comprise optimal (time) sampling points, time-varying and time-invariant external controls and initial conditions. Zullo (1991) defined an information matrix for experiment design for the improvement of parameter precision in non-linear, multi-response, dynamic situations. The design criterion requires maximising a metric of the information matrix, defined in the following way for dynamic models: /
$$\mathbf{M}_f(\theta, \phi) = \sum_{r=1}^{M} \sum_{s=1}^{M} \tilde\sigma^{rs}\, \mathbf{Q}_r^T \mathbf{Q}_s, \qquad (13)$$

where $\theta$ is the vector of the best available estimates of the model parameters and, again, $\phi$ is the vector of experiment design variables (defined in detail below). The matrix $\mathbf{Q}_r$ is the matrix of sensitivity coefficients of the $r$-th response variable in the model, computed at each of the $n_{sp}$ sampling points, and is defined as

$$\mathbf{Q}_r = \begin{bmatrix}
\left.\dfrac{\partial y_r}{\partial \theta_1}\right|_{t_{sp_1}} & \left.\dfrac{\partial y_r}{\partial \theta_2}\right|_{t_{sp_1}} & \cdots & \left.\dfrac{\partial y_r}{\partial \theta_P}\right|_{t_{sp_1}} \\
\left.\dfrac{\partial y_r}{\partial \theta_1}\right|_{t_{sp_2}} & \left.\dfrac{\partial y_r}{\partial \theta_2}\right|_{t_{sp_2}} & \cdots & \left.\dfrac{\partial y_r}{\partial \theta_P}\right|_{t_{sp_2}} \\
\vdots & \vdots & & \vdots \\
\left.\dfrac{\partial y_r}{\partial \theta_1}\right|_{t_{sp_{n_{sp}}}} & \left.\dfrac{\partial y_r}{\partial \theta_2}\right|_{t_{sp_{n_{sp}}}} & \cdots & \left.\dfrac{\partial y_r}{\partial \theta_P}\right|_{t_{sp_{n_{sp}}}}
\end{bmatrix}. \qquad (14)$$
This work embeds the D-optimal criterion within a robust optimisation framework, using both the expected value approach and the worst-case approach. Note that the notation used in Eqs. (13-14) is different than that used for Eqs. (7-10) to highlight the fact that Eqs. (1314) deal with J^n^m/c information, while Eqs. (7-10) deal with "traditional" steady-state information. 4.2.1 The Expected Value Approach Following Asprey and Macchietto (2001), we assume that prior information is available about the model parameters, 9, in the form of multivariate Normal probability distributions (i.e., 0. ~ N[Oi,al, j , or ^ ~ A^(^, £ J ) . Thus, prior information for the f^ parameter, 9^ can be quantified using the probability density function (pdf):
p{d,)={i^al p exp[- ^a,f (^, - ^)]
(15)
From this, we can now derive the criterion for expected-value experiment design, first for the more general case where we do not ignore correlation between parameters:
msixE{\Mr(eM for which:
(p ^ee ^ '
]
(16)
^' ^
£{|M,(,0)| y{lnY^%f
J |M,((?,^)|
e
(17)
Thus, to determine the expected value experiment design criterion, we must evaluate a multiple integral over a /7-dimensional hyper-rectangular region. To do this, Asprey and Macchietto (2001) use a three-point Gaussian-type multi-dimensional quadrature rule for approximating integrals of the form:
118
$$I[\mathbf{g}] = \int\!\!\int \cdots \int \mathbf{g}(\mathbf{z})\, d\mathbf{z} \;\approx\; R[\mathbf{g}] = \sum_{j=1}^{L} w_j\, \mathbf{g}(\mathbf{z}_j), \qquad (18)$$
where z is an w-vector, and g is an 5'-vector of integrands (in our case s = l and n =p). The ^ are the evaluation (or quadrature) points and w, are the corresponding weights of the quadrature rule, with7=l,..,Z. Given a model of the form of Equation (1), with its associated set of measured responses, a nominal set of parameter values of the parameters to be estimated, and the probability distribution functions associated with each of the parameters, an expected value dynamic experiment design can be computed using the scheme presented in Asprey and Macchietto (2001). Such experiment designs are termed "ED-optimaT'. 4.2.2
The Worst-Case Approach
The worst-case approach aims to determine experiment designs that optimise the worst possible performance for any value of $\theta \in \Theta$. Thus, the prior information on $\theta$ is limited only to the knowledge of the admissible domain $\Theta$; no information on the distribution $p(\theta)$ is necessary. In this way, the design tries to ensure acceptable performance for all possible values of the parameter vector (however unlikely they may be). The resulting robust design in the dynamic case, termed R-optimal, is the solution of the optimisation problem:

$$\phi_R = \arg\max_{\phi} \min_{\theta \in \Theta} \left| \mathbf{M}_f(\theta, \phi) \right|. \qquad (19)$$

Thus, like Federov (1980), one seeks to determine an experiment design, $\phi$, which would yield the maximum amount of information (as measured by the determinant of the matrix $\mathbf{M}_f$) for the worst possible values of the parameters $\theta$. The above problem may be rewritten as:

$$\max_{\phi} \Psi \quad \text{subject to} \quad \Psi \le \left| \mathbf{M}_f(\theta, \phi) \right| \quad \forall\, \theta \in \Theta. \qquad (20)$$
Here, again, we are faced with a semi-infinite dimensional problem, as the constraint must be satisfied for all values of $\theta$ within the infinite continuous compact set $\Theta$. Such problems can be solved using the general algorithm described in Gustafson (1981) for constrained nonlinear optimisation under uncertainty (see Table 2). At each iteration of the algorithm, the constraint is enforced at a finite set of points $\tilde\Theta = \{\theta^{(k)}, k = 1, \dots, K\}$. Once an "optimal" experiment design, $\phi^*$, is obtained with respect to this set, the algorithm checks whether there is a different combination of parameter values for which the design $\phi^*$ does not perform as well as the previous optimisation indicates. If such a point $\theta'$ exists, we add it to the set $\tilde\Theta$ (increasing the number, $K$, of its elements by 1) and repeat the calculation. The algorithm for solving the semi-infinite programming problem in Equation (20) is given in Table 2.

Table 2. The max-min optimisation algorithm for robust design of experiments.

Given: a nominal vector of parameter values, $\theta^{(1)} \in \Theta$.
Step 0: Set $K := 1$.
Step 1: Solve $\psi^{(K)} = \max_{\phi} \Psi$ subject to $\Psi \le \det\!\big(\mathbf{M}_f(\theta^{(k)}, \phi)\big)$, $k = 1, \dots, K$, to obtain $\phi^{(K)}$.
Step 2: Solve $\check\psi^{(K)} = \min_{\theta \in \Theta} \det\!\big(\mathbf{M}_f(\theta, \phi^{(K)})\big)$ to obtain $\theta^{(K+1)}$.
Step 3: If $\check\psi^{(K)} < \psi^{(K)}$ then set $K := K + 1$ and repeat from Step 1.
Step 4: Stop: the R-optimal experiment design is $\phi^{(K)}$.
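The algorithm of Table 2 alternates between a design step, with the constraint enforced on a finite parameter set, and a check step that searches the admissible domain for a parameter value at which the current design does worse. A compact sketch over discrete candidate grids is given below; it is purely illustrative, and the scalar "information" function is a toy stand-in for det(M_f).

```python
import numpy as np

def info(theta, phi):
    """Toy stand-in for det(M_f(theta, phi)); larger means more informative."""
    return np.exp(-(phi - theta) ** 2) + 0.1 * phi

phi_grid   = np.linspace(0.0, 2.0, 201)     # candidate experiment designs
theta_grid = np.linspace(0.5, 1.5, 101)     # admissible parameter domain Theta

theta_set = [1.0]                            # Step 0: start from a nominal theta
while True:
    # Step 1: design that maximises the worst value over the current theta set
    worst_on_set = lambda phi: min(info(th, phi) for th in theta_set)
    phi_star = phi_grid[int(np.argmax([worst_on_set(p) for p in phi_grid]))]
    psi_k = worst_on_set(phi_star)
    # Step 2: parameter value at which this design performs worst over all of Theta
    theta_new = theta_grid[int(np.argmin([info(th, phi_star) for th in theta_grid]))]
    psi_check = info(theta_new, phi_star)
    # Steps 3-4: stop when no parameter value does worse than the design step predicted
    if psi_check >= psi_k - 1e-9:
        break
    theta_set.append(theta_new)

print("R-optimal design phi* =", round(float(phi_star), 3),
      "with", len(theta_set), "active parameter points")
```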
4.3 The Dynamic Optimisation Framework When testing parameter identifiability or designing robust experiments for dynamic models with time-varying system inputs, we are faced with solving a nonlinear dynamic optimisation problem. The dynamic optimisation problem is solved using a control vector parameterisation approach (Vassihadis et al., 1994). Following ZuUo (1991) and Asprey and Macchietto (2001), within this optimisation framework, the time-varying inputs to the process, u(t), are assumed to be piecewise constant, piecewise linear, or piecewise quadratic functions of time defined over a number of control intervals. Zeroth or higher order continuity may be enforced at the interval boundaries. For the piecewise constant case, each control variable, Ui(t), is defined in terms of its value (wij) over each interval j , as well as the "switching" times delineating these intervals (t^^, ): "/(0=>^/j
i=l,..,nuj=l,..,n,^^
W . - ^ - ^^^u
^" 1 v.,«^/;7 = 1,.., «.w,
(21) (22)
where «„ is the number of time-varying controls, and n^^, is the number of switching points associated with the f^ control (defined a priori). The above formulation allows for a different number of intervals and control levels of each time-varying control variable, which may be switched at different times. All Wij and t^^, are optimisation variables, as well as the initial conditions of the experiment, yo, any time-invariant controls co, and sampling points or
120 times at which measurements are taken, tsp. variables, (j), is defined as:
Overall then, the vector of optimisation
<^ = L,,^ = lv-,«.p,(^.>v,..>^/j.^' = l--.««.7=
(23)
A variety of constraints can be imposed on the optimisation variables. There are simple bounds and optional nonlinear equality/inequality constraints on the initial conditions, and simple bounds on the constant inputs (co): yo^^yo^^yo^ h{y,)>0
i = K^M
(24) (25)
0)f
/=l,..,/2co
(26)
ACN^...-V.^^C
/=lv,«.p
(27)
Of particular notice is Equation (27), in which a minimum span between two consecutive sampling points may have to be imposed to ensure the availability of measurement equipment for subsequent analysis. To avoid mathematical singularities caused by the collapse of one or more control intervals, the following linear constraints on the time-varying control switching points are introduced: A^.^^. <^..,^. -^..,^._.
i=h..,nu', 7 = !,..,«..,.
(28)
Simple bounds are also imposed on the levels of the piecewise constant controls within each control interval: wj^j<Wij<w^j
/=!,..,««; 7 =!,..,«...,
(29)
Finally, the constraint: ^^w,_^V
2 = !,..,««
(30)
stipulates that there is no point in effecting any changes to the control variables past the time horizon of interest. As pointed out by Asprey and Macchietto (2001), the above formulation defines, in a very comprehensive and versatile manner, a large range of dynamic experimental conditions. Although this is an attractive feature, very large optimisation problems may arise, even for small dynamic systems. For instance, the optimisation problem for a model with only two response variables, two variable initial conditions, two time-varying controls, five switching times for each time-varying control, and the corresponding piecewise levels, involves 20 optimisation variables.
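The piecewise-constant parameterisation of eqs. (21)-(22) maps a small set of optimisation variables (interval levels and switching times) onto a full input trajectory $u_i(t)$. A minimal sketch of that mapping is given below; the control values, switching times and horizon are illustrative placeholders, not taken from the chapter.

```python
import numpy as np

def piecewise_constant_control(t, levels, switch_times, t_final):
    """u(t) for one control: level w_j held between consecutive switching times.

    levels       : w_1..w_nsw, the control value on each interval
    switch_times : interior interval boundaries, strictly increasing
    t_final      : experiment horizon (eq. (30): no switches beyond t_final)
    """
    edges = np.concatenate(([0.0], np.asarray(switch_times), [t_final]))
    j = np.clip(np.searchsorted(edges, t, side="right") - 1, 0, len(levels) - 1)
    return np.asarray(levels)[j]

# Example: one feed-rate control with 5 intervals over a 125 h experiment
levels = [0.0, 0.003, 0.001, 0.004, 0.002]          # placeholder control levels
switches = [15.0, 45.0, 70.0, 100.0]                # h, respecting a minimum span
t = np.linspace(0.0, 125.0, 6)
print(piecewise_constant_control(t, levels, switches, t_final=125.0))
```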
5. ENGINEERING EXAMPLES
5.1 Global Parametric Identifiability of an Unstructured Dynamic Model of Hybridoma Cell Culture
To illustrate the application of the aforementioned techniques, in what follows, we present an example of an unstructured kinetics model for the culture of hybridoma cells, following the model developed by Jang and Barford (2000), and as previously presented in Asprey and Mantalaris (2001). The model is comprised of a set of mixed differential and algebraic equations (DAEs). The model is a typical unstructured (i.e., the internal composition and structure of the cells are unaccounted for), unsegregated (i.e., all cells are considered identical) model in which only two key nutrients are assumed to be the limiting factor for growth, while inhibitory products include ammonia and lactate. A diagram depicting the culture reactor is shown in Figure 5, including the inlet stream, reactor body, and measurement variables. ^in^J^i^' ^1
x,s
Fig. 5. Illustration of the semi-batch cell culture reactor. The mass balance for the viable cells within the culture reactor is as follows: d(VXj_ '- = y.VX,(\-f^,)-Yi,VX, dt
-F,X,
(31)
Here, [iXy(l-fGo) represents the rate of cell growth of cells not arrested (/bo = (^6-0)/b is the fraction of cells arrested in the GO state), and jiidZv the rate of cell death. The coefficients a and b used for this study were taken from the original paper by Linardos et al. (1992). The term FQXV represents the rate of viable cells washed out by sampling or harvesting. The material balance for the non-viable cells within the reactor is given by: d{VX,)_ li,VX,-k,^,VX,-F,X, dt
(32)
122 The total cell concentration within the reactor is thus given by Xt = Xy + X^. The intracellular product antibody balance is given by:
^HAb])_
(
[GLN]
k-'P'o[Ab]
[GLNJ+^GLNJ
(33)
where g^b =^C(I-/GO)+&O/GO- The overall material balance for the reactor can be written as: dV_ = F.-F dt
(34)
To describe the specific growth rate, |LI, and death rate, |Lid, we use the following relations: (35)
^^ = ^^max^^lim^^inh
[GLN]
[GLC]
>lim =
1
^GLC+[GLC] V^QLN+[GLN],
Ki„ l^inh
, K.
(^d=l^dn
+[AMM]
K,
+[LAC]
(37)
l + (^dAMM/[AMM])"
Here, we account for product and substrate inhibition to cell growth. concentration within the reactor can be described by the following equation:
at
(36)
The DNA
(38)
[\ ^y^t i /
-FJDNA] RNA molecules are synthesised only by translation of DNA, the rate of which may be limited by the availability of an intracellular pool of ribonucleotides and DNA. As such, the RNA concentration can be expressed as: rf(F[RNA]) ^ ^^
r/rr^xTAl ^x,
= emRNA^llim^linh^lDNAJ
^
\^t J (^deg./« + ^d.g2 (1 - / . )y[RNA]- F„ [RNA]
The concentration of protein can be described by:
(39)
123
'x.^ V^ty
(40)
'X.^ X,
(41)
^dPRT^[PRT]-FjPRT] Lipids concentration is expressed as: T
ymLPDl^liml^mh'^ l ^ ^ ^ J
dt
K^^v[i.m\-F,\Lm] Polysaccharides (glycogen) are biosynthesised by some proteins (enzymes), and their concentration can be described by:
(42) ^dLPD^[PSD]-FjPSD]
The consumption rates of glucose and glutamine may be defined as: ^FfGLC]) ' = -GGLC^^V - ^o [GLC]+i^. [GLC],
(43)
dt
with 2GLC ~ "
- + '^GLC+2. GLC ^ ^exGLC
[GLC]
1
[ G L C ] + ^ ,eycGLC
/
'-^^-Q...vx.-K^..v[ou^\
(44)
(45)
-F„[GLC]+i=;.[GLC], where 2GLN = KI/^X,GLN- hi Eqs. (43) and (45), we have added terms to handle the fed-batch situation. Lactate production is described as a function of glucose consumption as follows: J(F[LAC])
dt
= eLAc^^v-^o[LAC]
(46)
where QLAC = ^LAC,GLC-2GLC- While ammonia may be produced mostly from glutamine metabohsm and spontaneous degradation of glutamine: '-^^^~^
= Q^VX^
.r,,,,F[GLN]-FjAMM]
(47)
124 where QAMM = componentj.
^AMM,GLN-QGLN-,
with Yij denoting the yield of component i with respect to
We extend the model to handle dissolved oxygen concentration, described by the following equation (Tatiraju et al, 2000): ^ ^ ^ ^ ^ = Qao2VX^ -m,o2VX^ + k^av([dO, J - [dO,])
(48)
where gd02 = -Ii/Yx,d02The important experimental conditions that characterise a particular experiment are: 1. 2. 3.
the initial viable cell concentration, J^y, with range 20.0x10'^ to 30.0x10^ cells/L: the inlet flowrate, F,, range 0.0 to 0.005 L/h; the nutrient concentration in the feed, [GLU]/ and [GLN]/, both with ranges 2.0 to 8.0 mg/L.
We assume that the following process variables can be measured during the course of an experiment: y = [Xy, X^, [GLC], [GLN], [LAC], [AMM], [Ab], [DNA], [RNA], [PRT], [LPD], [PSD], [d02]]^. Furthermore, we allow time-varying inputs to the process, namely the inlet flowrate, F/, and the feed nutrient concentrations, [GLN]/ and [GLU]/ (cf. Figure 5). For problem regularisation, we use control vector parameterisation with five time intervals (the same for all inputs, with a minimum span of 15.0 h), with piecewise constant levels. Time-invariant inputs to the process include the initial values of all differential variables X^y, X^d", etc. The parameters tested for identifiability are shown in Table 4. Given the measurement vector, the input vector, the parameter vector, and the model equations (Eqs. 31 - 48), we form the identifiability test for the cell culture model with dgiWo) set to the midpoint of the respective ranges given by the lower and upper bounds presented in Table 4 and dg(Wy) set to the relative magnitudes of the quantities presented in Table 3. We assume an experimental error of 4.0% for each of the measured variables, together with a time horizon of interest of 125 h, and thus use £y=5. Table 3 shows the iterations of the solution algorithm (cf. Table 1), giving a final value of O^ =3.32x10"^ (>s^^ ==10"^); indicating that the hybridoma cell culture model is not parametrically identifiable. Here, we were not required to search for worst-case inputs [u(0,G)], as under simple nominal (batch) conditions, the model proved unidentifiable. The parameter values obtained from performing the test are given in Table 4. From these values, when adjusted with the weights, t/g(We),we can discern that parameters Xdegi, ^deg2,^RNA, Qc, QGO ^ill cause problems, while parameters ATILAC, ^IAMM? may cause problems when attempting to estimate them from collected experimental data, even under relatively noisefree conditions.
Table 3. Iterations of the identifiability test of the cell culture model.

Iteration | Optimisation-based identifiability result
I | Max Φ = 3.32×10^ at [u(t), ω] = [0.0, 0.0, 0.0; 25.0×10^, 5.0×10^, 0.8, 4.0, 0.7, 30.0, 0.3, 20.0, 6.0, 1.0, 5.5, 15.0, 30.0, 45.0, 60.0, 6.0, 5.0]
  | STOP

Table 4. Parameter values determined by the identifiability test for the hybridoma cell culture model.

Parameter | Lower | Upper | θ | θ*
μ_max | 0.005 | 0.5 | 0.1449 | 0.145
μ_d,max | 0.005 | 0.1 | 0.0649 | 0.065
K_GLC | 0.1 | 1.0 | 0.75005 | 0.75
K_GLN | 0.01 | 0.1 | 0.075008 | 0.075
K_ILAC | 50.0 | 150.0 | 90.01 | 90
K_IAMM | 5.0 | 50.0 | 15.02 | 15
K_dAMM | 1.0 | 10.0 | 4.5002 | 4.5
S_C | 0.1×10^ | 5.0×10^ | 3.5×10^ | 0.7×10^
S_G0 | 0.1×10^ | 5.0×10^ | 5.0×10^ | 1.0×10^
Y_X,GLC | 1.0×10^ | 5.0×10^ | 2.369×10^ | 2.37×10^
m_GLC | 0.5×10^ | 5.0×10^ | 2.0001×10^ | 2.00×10^
Y_X,GLN | 1.0×10^ | 5.0×10^ | 7.9×10^ | 8.00×10^
Q_exGLC | 0.5×10^-10 | 5.0×10^-10 | 1.9×10^-10 | 2.00×10^-10
K_exGLC | 5.0 | 15.0 | 10.0008 | 10
Y_LAC,GLC | 1.0 | 5.0 | 1.9997 | 2
Y_AMM,GLN | 0.1 | 1.5 | 0.70005 | 0.7
K_DNA | 0.5×10^ | 0.1 | 0.90002 | 0.9
Q_mRNA | 0.1 | 1.5 | 0.80001 | 0.8
Q_mPRT | 0.1 | 1.5 | 0.500005 | 0.5
Q_mLPD | 0.005 | 0.05 | 0.0120002 | 0.012
Q_mPSD |  | 1.0 | 0.40001 | 0.4
λ_dDNA | 0.5×10^-11 | 5.0×10^-11 | 1.00013×10^-11 | 1.00×10^-11
λ_deg1 | 0.01 | 0.05 | 0.035 | 0.03
λ_deg2 | 0.00005 | 0.0005 | 0.00015 | 0.0001
λ_dPRT | 0.01 | 0.05 | 0.0200003 | 0.02
K_dLPD | 0.002 | 0.01 | 0.00500008 | 0.005
λ_dPSD | 0.05 | 0.5 | 0.150002 | 0.15
Y_X,DO | 0.5 | 5.0 | 0.99998 | 1
m_DO | 0.5×10^ | 5.0×10^ | 1.5958×10^ | 1.60×10^
k_L a | 0.5 | 1.5 | 0.94073 | 0.94073
a | 0.05 | 0.5 | 0.094002 | 0.094
b | 0.1 | 1.0 | 0.60997 | 0.61
n | 1.0 | 5.0 | 1.99972 | 2
f_mRNA | 0.1 | 1.0 | 0.70008 | 0.45
f_lys | 0.5×10^ | 0.1 | 5.003×10^ | 5.00×10^
K_dGLN | 0.05 | 0.5 | 0.200002 | 0.2
Local Quantitative Identifiability
Asprey and Mantalaris (2001) provide further evidence to support the findings of the optimisation-based identifiability test through use of a local identifiability test carried out (at the experimental conditions used for the test above) by calculating the parameter estimability matrix presented in Eq. (4). Equidistant sampling times of 5 h, 10 h, 15 h, ..., 140 h were used to form P_E, thus giving a [364×36] matrix of first-order sensitivity coefficients. The column rank of P_E in this case was found to be 31 (< P), indicating local unidentifiability under these conditions; the null parameter columns correspond to λ_deg1, λ_deg2, Q_mRNA, S_C and S_G0. Furthermore, the condition number (κ), defined as the ratio of the largest to the smallest non-zero singular value, of the remaining 31 columns of P_E was found to be 5.83×10^; indicating very nearly linear dependence between sensitivity coefficients and a nearly singular parameter estimation problem, regardless of whether the parameters are locally identifiable.
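A minimal numerical sketch of such a local test, assuming the (scaled) sensitivity coefficients are already available as an array; the tolerance used to declare a singular value "zero" is an assumption, not a value taken from the chapter.

```python
import numpy as np

def local_estimability(PE, tol=1e-8):
    """Rank and condition number of an estimability matrix PE
    (rows: measured variables x sampling times, columns: parameters)."""
    s = np.linalg.svd(PE, compute_uv=False)      # singular values, descending
    rank = int(np.sum(s > tol * s[0]))           # numerical column rank
    cond = s[0] / s[rank - 1]                    # largest / smallest non-zero value
    return rank, cond

# Example with random sensitivities of the size quoted in the text (364 x 36)
rng = np.random.default_rng(0)
PE = rng.normal(size=(364, 36))
print(local_estimability(PE))
```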
5.2 Fermentation of Baker's Yeast: Robust Experiment Design for Parametric Precision
We consider the fermentation stage of an intracellular enzyme process (Uesbeck et al., 1998), as presented in Asprey and Macchietto (2001). We use a typical unstructured (i.e., the internal composition and structure of the cells are unaccounted for), unsegregated (i.e., all cells are considered identical) model in which only one key substrate is assumed to be the limiting factor for growth and product formation. The product is an intracellular enzyme; non-viable cells are also formed. We assume isothermal operation of the fermenter, and that the feed is free from product. A figure depicting the fermenter is shown in Figure 5, including the inlet stream, reactor body, and measurement variables. Assuming Monod-type kinetics for biomass growth and substrate consumption, the system can be mathematically modelled by the following set of DAEs (Nihtila and Virkunnen, 1977; Espie and Macchietto, 1989):

dx1/dt = (r − u1 − θ4)·x1
dx2/dt = −r·x1/θ3 + u1·(u2 − x2)    (49)
r = θ1·x2/(θ2 + x2)
where x1 is the biomass concentration [g/L]; x2 is the substrate concentration [g/L]; u1 is the dilution factor [h^-1]; and u2 is the substrate concentration in the feed [g/L]. Key experimental conditions that characterise a particular experiment are: 1. the initial biomass concentration (or inoculation), x1^0, with range 1 to 10 g/L; 2. the dilution factor, u1, with range 0.05 to 0.20 h^-1; 3. the substrate concentration in the feed, u2, with range 5 to 35 g/L. For our experiment design purposes, we keep the initial substrate concentration, x2^0, at 0.1 g/L, and do not consider this variable for further experiment design. Both x1 and x2 can be measured during the experiment. The objective is to design an experiment to yield the best possible information for the estimation of the four parameters θ_i, i = 1,..,4. The only a priori available information on the latter is that they lie in the ranges [0.05-0.98; 0.05-0.98; 0.05-0.98; 0.01-0.98], respectively. A nominal set of mid-range values, θ_i^(0) = 0.5, i = 1,..,4, is therefore used to start the various design algorithms.
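For concreteness, the following sketch integrates this model for constant inputs of the kind used in the designs below; the input levels, initial condition and function name are illustrative assumptions only.

```python
import numpy as np
from scipy.integrate import solve_ivp

theta = [0.5, 0.5, 0.5, 0.5]          # nominal parameter guesses used in the text

def yeast_rhs(t, x, u1, u2, th):
    """Baker's-yeast model of Eq. (49): x[0] biomass, x[1] substrate [g/L]."""
    r = th[0] * x[1] / (th[1] + x[1])          # Monod growth rate
    dx1 = (r - u1 - th[3]) * x[0]
    dx2 = -r * x[0] / th[2] + u1 * (u2 - x[1])
    return [dx1, dx2]

sol = solve_ivp(yeast_rhs, (0.0, 40.0), [5.5, 0.1],
                args=(0.12, 15.0, theta), dense_output=True)
print(sol.y[:, -1])    # biomass and substrate at the final time
```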
We assume that budget constraints limit the number of measurements per experiment to 10 sampling points. Based on manipulated variable behaviour, we use a piecewise constant variation over 5 switching intervals for each of the two controls (dilution factor, u1(t), and feed substrate concentration, u2(t)). The elapsed time between any two successive sampling points is bounded between 1 h and 20 h (cf. Equation 27), and the duration of each control interval between 5 h and 20 h (cf. Equation 28). An initial (arbitrary) guess for the experiment design vector is shown in Table 5. As it turns out, this initial design (coupled with the nominal (guessed) parameter values) leads to a modest amount of predicted information, with det(M_f) having a value of 2.410×10^. The associated model predictions in this case are shown in Figure 6, with the input trajectories for u1(t) and u2(t) shown in Figure 7.

Table 5. Initial design vector.
Design Variable | Symbol | Value(s)
Biomass initial condition | x1^0 | 5.5
Measurement times | t_sp,i, i = 1,..,n_sp | 2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0, 18.0, 20.0
Dilution factor control switching times | t_sw1,i, i = 1,..,n_sw1 | 4.0, 8.0, 12.0, 16.0, 20.0
Feed substrate control switching times | t_sw2,i, i = 1,..,n_sw2 | 4.0, 8.0, 12.0, 16.0, 20.0
Dilution factor control levels | z1,i, i = 1,..,n_sw1 | 0.12, 0.12, 0.12, 0.12, 0.12
Feed substrate control levels | z2,i, i = 1,..,n_sw2 | 15.0, 15.0, 15.0, 15.0, 15.0
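A rough sketch of how the predicted information det(M) for a candidate set of sampling times could be evaluated is given below. It uses a compact copy of the simulation helper from the previous sketch, forward-difference parameter sensitivities and equal, unit measurement weights; all of these are simplifying assumptions, not the chapter's actual dynamic information-matrix machinery.

```python
import numpy as np
from scipy.integrate import solve_ivp

def simulate(theta, t_eval, u1=0.12, u2=15.0, x0=(5.5, 0.1)):
    """Responses x1, x2 at the sampling times for constant inputs (illustrative)."""
    def rhs(t, x):
        r = theta[0] * x[1] / (theta[1] + x[1])
        return [(r - u1 - theta[3]) * x[0],
                -r * x[0] / theta[2] + u1 * (u2 - x[1])]
    return solve_ivp(rhs, (0.0, t_eval[-1]), list(x0), t_eval=t_eval).y.T

def det_information(theta, t_eval, h=1e-5):
    """det(M), with M the sum over samples and responses of f f^T,
    built from forward-difference parameter sensitivities f = dy/dtheta."""
    base = simulate(theta, t_eval)
    sens = []
    for j in range(len(theta)):
        tp = np.array(theta, float); tp[j] += h
        sens.append((simulate(tp, t_eval) - base) / h)
    F = np.stack(sens, axis=-1)                        # (n_sp, 2, n_theta)
    M = np.zeros((len(theta), len(theta)))
    for i in range(F.shape[0]):
        for k in range(F.shape[1]):
            f = F[i, k, :]
            M += np.outer(f, f)
    return np.linalg.det(M)

t_samples = np.array([2., 4., 6., 8., 10., 12., 14., 16., 18., 20.])
print(det_information([0.5, 0.5, 0.5, 0.5], t_samples))
```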
Fig. 6. Model predictions for the starting-point dynamic experiment design.
Fig. 7. The input profiles from the starting-point design.

The Standard D-Optimal Experiment Design
As pointed out in Asprey and Macchietto (2001), step 1 of the first iteration of the solution algorithm in Table 2 corresponds to a conventional D-optimal experiment design carried out at the nominal values of the point estimates of the parameters. The experiment design obtained for this case is shown in Table 6, and corresponds to an information prediction of det(M_f) = 1.278×10^, which represents a very large improvement over the initial experiment design guess in Table 5. The model predictions for the D-optimal design are shown in Figure 8, where the sampling points of the two response variables (in this case, both measured at the same sampling points or times) are indicated by vertical arrows along the respective prediction trajectories. Figure 9 shows the input profiles for u1(t) and u2(t), respectively, for the D-optimal design.

Table 6. D-optimal design vector.
Design Variable | Symbol | Value(s)
Biomass initial condition | x1^0 | 8.53
Measurement times | t_sp,i, i = 1,..,n_sp | 21.2, 22.2, 23.2, 24.2, 25.2, 26.2, 27.2, 28.2, 29.2, 30.2
Dilution factor control switching times | t_sw1,i, i = 1,..,n_sw1 | 5.4, 11.3, 19.3, 24.4, 30.2
Feed substrate control switching times | t_sw2,i, i = 1,..,n_sw2 | 2.1, 7.1, 20.0, 25.2, 30.2
Dilution factor control levels | z1,i, i = 1,..,n_sw1 | 0.2, 0.05, 0.05, 0.05, 0.05
Feed substrate control levels | z2,i, i = 1,..,n_sw2 | 35.0, 35.0, 35.0, 22.8, 15.0
Fig. 8. Model predictions from the D-optimal design.
Fig. 9. The input profiles from the D-optimal design.

The Worst-case (R-Optimal) Experiment Design
The D-optimally designed experiment would indeed yield a very large amount of information if the values of the parameters θ_i, i = 1,..,4 in reality are all equal to the nominal value of 0.5. As this is highly unlikely to be the case (recall that these are guessed values), step 2 of the solution algorithm in Table 2 seeks to establish the worst-case values of θ, i.e., those for which the experiment of Table 6 would be as unsuccessful as possible. This yields the values θ^(1) = [0.05, 0.98, 0.98, 0.98]. The corresponding value of det(M_f) is only ψ^(1) = 4.187×10^, which indicates that the quality of the D-optimal experiment is highly dependent on the initial guess of the parameters and could well be much worse than originally assessed. Consequently, as ψ^(1) falls below the information originally predicted, K is increased by 1, and the algorithm returns to step 1, seeking to establish a new experiment design φ^(2) that will perform well taking into account both θ^(0) and θ^(1). The corresponding value of the objective function is ψ^(2) = 9.700×10^, which is better than ψ^(1) = 4.187×10^, since the need for the experiment to yield a higher information content is taken into account explicitly during its design. Although the experiment φ^(2)
performs better than φ^(1) at both θ = θ^(0) and θ = θ^(1), there may still be a different combination of parameter values θ^(2) for which its performance is not so good. This is determined by a second execution of step 2 of the solution algorithm, which yields θ^(2) = [0.98, 0.05, 0.05, 0.01] and ψ^(2) = 7.456×10^, showing that the algorithm must continue. Following four iterations of the algorithm, convergence was achieved; the final design guarantees a minimum objective function value of 4.846×10^ for any combination of the values of the parameters θ within the range under consideration. The final R-optimal design vector is given in Table 7.

Table 7. The R-optimal design vector.
Design Variable | Symbol | Value(s)
Biomass initial condition | x1^0 | 9.74
Measurement times | t_sp,i, i = 1,..,n_sp | 0.2, 3.2, 6.5, 9.5, 10.5, 11.5, 12.5, 13.5, 14.5, 40.0
Dilution factor control switching times | t_sw1,i, i = 1,..,n_sw1 | 0.4, 8.5, 18.6, 35.0, 40.0
Feed substrate control switching times | t_sw2,i, i = 1,..,n_sw2 | 0.1, 5.1, 10.1, 15.1, 40.0
Dilution factor control levels | z1,i, i = 1,..,n_sw1 | 0.2, 0.05, 0.2, 0.05, 0.13
Feed substrate control levels | z2,i, i = 1,..,n_sw2 | 35.0, 35.0, 35.0, 35.0, 15.0
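A schematic sketch of the worst-case (max-min) alternation described above is given below. Everything in it is a simplifying assumption: a placeholder information function stands in for the dynamic Fisher information matrix, the design vector is reduced to a few bounded quantities, and scipy's general-purpose optimisers stand in for the SQP/semi-infinite solvers used in this chapter.

```python
import numpy as np
from scipy.optimize import differential_evolution, minimize

# Hypothetical stand-ins for the real design and parameter spaces.
design_bounds = [(0.0, 40.0)] * 4          # e.g. a few switching times (illustrative)
theta_bounds  = [(0.05, 0.98)] * 3 + [(0.01, 0.98)]

def log_det_information(design, theta):
    """Placeholder information criterion; a real implementation would build
    det M from model sensitivities as in the earlier sketch."""
    d, t = np.asarray(design), np.asarray(theta)
    return float(np.log(1e-6 + np.sum(np.outer(d, t) ** 2)))

def worst_theta(design):
    """Step 2: parameter values minimising the information of a given design."""
    res = differential_evolution(lambda th: log_det_information(design, th),
                                 theta_bounds, seed=0, tol=1e-6)
    return res.x, res.fun

def r_optimal(max_iter=5):
    """Alternate step 1 (design) and step 2 (worst case) over a growing
    set of parameter scenarios, as in the text's solution algorithm."""
    scenarios = [np.full(4, 0.5)]                     # nominal guess theta^(0)
    design = np.full(4, 10.0)
    for _ in range(max_iter):
        # Step 1: design maximising the minimum information over the scenarios
        obj = lambda d: -min(log_det_information(d, th) for th in scenarios)
        design = minimize(obj, design, bounds=design_bounds).x
        # Step 2: find the worst-case parameters for that design
        th_new, worst = worst_theta(design)
        if min(log_det_information(design, th) for th in scenarios) <= worst + 1e-6:
            break                                     # no genuinely new worst case
        scenarios.append(th_new)
    return design

print(r_optimal())
```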
The model predictions and corresponding input profiles for the R-optimal design are shown in Figures 10, 11(a) and 11(b), respectively.
Fig. 10. Model predictions from the R-optimal design.
Fig. 11. The input profiles from the R-optimal design.

Expected Value (ED-Optimal) Experiment Design
Following Asprey and Macchietto (2001), for the expected value design, we assume an informative prior Normal distribution for each of the parameters, all with variance 0.40 and mean 0.5, and the same parameter ranges as in the R-optimal case. Solving the optimisation problem expressed by Equations (16-17) gives the design shown in Table 8, which corresponds to an expected value of the determinant of the information matrix of 5.346×10^. The model predictions for the ED-optimal design vector using the nominal set of parameters are shown in Figure 12, where the sampling points of the two response variables (in this case, both measured at the same sampling points) are indicated by vertical arrows along the prediction trajectories. Figure 13 shows the input profiles of u1(t) and u2(t), respectively, for the ED-optimal design.
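A minimal sketch of the expected-value criterion, approximating the expectation over the prior by Monte Carlo sampling. The information function is passed in as an argument (as in the earlier det_information sketch), and the truncation of the Normal prior to the parameter box is an assumption made here for illustration.

```python
import numpy as np

def ed_criterion(design, det_information, n_samples=200, seed=0):
    """Monte Carlo estimate of E_theta[ det M(design, theta) ] under an
    independent Normal prior (mean 0.5, variance 0.40) truncated to [0.01, 0.98]."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(n_samples):
        theta = np.clip(rng.normal(0.5, np.sqrt(0.40), size=4), 0.01, 0.98)
        total += det_information(design, theta)
    return total / n_samples

# Demonstration with a placeholder information function
demo = lambda d, th: float(np.prod(th))   # stands in for det M(design, theta)
print(ed_criterion(design=None, det_information=demo))
```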
Table 8. The ED-optimal design vector.
Design Variable | Symbol | Value(s)
Biomass initial condition | x1^0 | 9.61
Measurement times | t_sp,i, i = 1,..,n_sp | 0.1, 1.1, 2.1, 3.1, 4.1, 5.1, 37.0, 38.0, 39.0, 40.0
Dilution factor control switching times | t_sw1,i, i = 1,..,n_sw1 | 10.3, 15.4, 27.6, 33.3, 40.0
Feed substrate control switching times | t_sw2,i, i = 1,..,n_sw2 | 4.9, 9.9, 26.9, 31.9, 40.0
Dilution factor control levels | z1,i, i = 1,..,n_sw1 | 0.05, 0.2, 0.2, 0.2, 0.2
Feed substrate control levels | z2,i, i = 1,..,n_sw2 | 5.0, 5.0, 5.0, 35.0, 15.0
Fig. 12. Model predictions from the ED-optimal design.
Fig. 13. The input profiles from the ED-optimal design.
Assessment of the Dynamic Experiment Designs
In order to provide a practical assessment of the quality of the designs in Tables 6, 7, and 8, pseudo-experiments were performed through simulations with the "true" parameter vector θ* = [0.31, 0.18, 0.55, 0.05]^T. Multivariate Normally-distributed noise with zero mean and variance 0.04 was added to both responses, x1 and x2 (naturally, the results obtained here will be affected by the level of noise present in the experimental data). The "experimental" results collected at the appropriate sampling times were then used to perform parameter estimation using a maximum likelihood functional, thus giving MLE estimates. The parameter estimates obtained are shown in Tables 9, 10, and 11 for the D-optimal, R-optimal, and ED-optimal designs, respectively. In each case, linear-approximation 95% parameter confidence intervals and the correlation matrix are included.
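The following sketch reproduces this kind of pseudo-experiment for the D-optimal sampling times, using ordinary least squares as a simple stand-in for the maximum-likelihood functional; the constant inputs, starting values and helper names are illustrative assumptions.

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import least_squares

def simulate(theta, t_eval, u1=0.12, u2=15.0, x0=(8.53, 0.1)):
    """Model responses x1, x2 at the sampling times (constant inputs, illustrative)."""
    def rhs(t, x):
        r = theta[0] * x[1] / (theta[1] + x[1])
        return [(r - u1 - theta[3]) * x[0],
                -r * x[0] / theta[2] + u1 * (u2 - x[1])]
    return solve_ivp(rhs, (0.0, t_eval[-1]), list(x0), t_eval=t_eval).y.T

theta_true = [0.31, 0.18, 0.55, 0.05]
t_samples = np.array([21.2, 22.2, 23.2, 24.2, 25.2, 26.2, 27.2, 28.2, 29.2, 30.2])

# Pseudo-experiment: responses at the "true" parameters plus noise of variance 0.04
rng = np.random.default_rng(1)
y_obs = simulate(theta_true, t_samples) + rng.normal(0.0, 0.2, size=(len(t_samples), 2))

# Least-squares refit from the nominal guesses (a stand-in for the MLE fit)
residuals = lambda th: (simulate(th, t_samples) - y_obs).ravel()
fit = least_squares(residuals, x0=[0.5, 0.5, 0.5, 0.5],
                    bounds=([0.05, 0.05, 0.05, 0.01], [0.98, 0.98, 0.98, 0.98]))
print(fit.x)
```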
Table 9. Parameter estimates from the experiment designed using the D-optimal criterion.
Parameter | Optimal Estimate | 95% Confidence Interval | Approximate Correlation Matrix (θ1, θ2, θ3, θ4)
θ1 | 0.193 | ±0.169 | 1.00
θ2 | 0.092 | ±0.092 | 0.99  1.00
θ3 | 0.498 | ±0.436 | −0.99  −0.97  1.00
θ4 | 0.044 | ±0.034 | −0.98  −0.98  0.99  1.00
Table 10. Parameter estimates from the experiment designed using the R-optimal criterion.
Parameter | Estimate | 95% Confidence Interval | Approximate Correlation Matrix (θ1, θ2, θ3, θ4)
θ1 | 0.304 | ±0.081 | 1.00
θ2 | 0.200 | ±0.073 | −0.85  1.00
θ3 | 0.536 | ±0.027 | 0.77  −0.71  1.00
θ4 | 0.045 | ±0.008 | 0.76  −0.74  0.96  1.00
Table 11. Parameter estimates from the experiment designed using the ED-optimal criterion.
Parameter | Estimate | 95% Confidence Interval | Approximate Correlation Matrix (θ1, θ2, θ3, θ4)
θ1 | 0.279 | ±0.228 | 1.00
θ2 | 0.218 | ±0.181 | 0.99  1.00
θ3 | 0.546 | ±0.018 | 0.33  −0.17  1.00
θ4 | 0.049 | ±0.004 | 0.10  −0.45  0.96  1.00
As can be seen from these results, in all cases, after a single experiment, the parameter estimates have been greatly improved from their initial guesses of θ_i = 0.5, i = 1,..,4. However, parameter estimates obtained from the experiments designed with the R-optimal and ED-optimal criteria are much closer to the true values (i.e., more accurate) used to generate the "experimental" data. Moreover, the R-optimal estimates are much more precise than the estimates from the ED-optimally designed experiment, which are, in turn, more precise than those obtained from the D-optimally designed experiment. This can clearly be seen in Figure 14, which shows plots of the approximate joint 90%-confidence regions for parameters θ2 and θ3.
Fig. 14. Joint 90%-confidence regions for θ2 and θ3.
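A sketch of how such a joint confidence region can be traced from a fitted two-parameter covariance matrix is given below; the covariance values used are placeholders, and the chi-squared quantile corresponds to the usual linear-approximation 90% region for two parameters.

```python
import numpy as np

def confidence_ellipse(cov_2x2, center, level_chi2=4.605, n_pts=200):
    """Boundary points of the approximate joint confidence ellipse
    (p - center)^T cov^{-1} (p - center) = chi2_{2, 0.90} = 4.605."""
    vals, vecs = np.linalg.eigh(cov_2x2)
    angles = np.linspace(0.0, 2.0 * np.pi, n_pts)
    circle = np.stack([np.cos(angles), np.sin(angles)])
    return center[:, None] + vecs @ (np.sqrt(level_chi2 * vals)[:, None] * circle)

# Placeholder covariance for (theta2, theta3) around an R-optimal-like estimate
cov = np.array([[1.4e-3, -3.0e-4], [-3.0e-4, 1.9e-4]])
pts = confidence_ellipse(cov, center=np.array([0.200, 0.536]))
print(pts[:, :3])   # first few boundary points
```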
In this simulation-based "experiment", an additional interesting comparison can be made using the values of the design criterion evaluated at the true parameter values θ*, giving an indication of the amount of information that would be collected had an actual experiment been conducted, calculated as:
det(M_f(θ*, φ_R)) = 5.147×10^
det(M_f(θ*, φ_ED)) = 5.305×10^    (50)
det(M_f(θ*, φ_D)) = 7.036×10^
This indicates that, in "reality", the R-optimal design is providing more information content than the ED-optimal design, which, in turn, is providing more information content than the D-optimal design for parameter estimation purposes. In practice, of course, this comparison cannot be made since the "true" values of the parameters are unknown, nor is the "true" model. Asprey and Macchietto (2001) point out that although the R-optimal design outperforms the ED-optimal design, which, in turn, outperforms the D-optimal design, the improved performance comes with a computational price, with the R-optimal method requiring approximately twice as much computational time as the ED-optimal method, and over ten times as much computational time as the conventional D-optimal method.

6. CONCLUDING REMARKS
We have presented the semi-infinite programming (SIP) problem and its applications in dynamic nonlinear model building. In particular, the applications of SIP included parametric identifiability testing, parametrically robust model discrimination (distinguishability), as well as parametrically robust design of experiments for parameter estimation. It is well known that previous methods for testing model identifiability are highly limited by model size and form; methods based on symbolic algebra cannot cope with models that are not easily transformable to control-affine form, and are limited to models involving no more than ca. 15 differential variables and model parameters. In this chapter, we have presented an SIP-based optimisation method to test for identifiability. Though the methods are not structural in nature, they will detect unidentifiable parameters in any model that, no matter the form or size, can be solved numerically using robust numerical integration code. Due to model nonlinearity, iterative methods are used to design experiments; however, the quality of these designed experiments can be adversely affected by poor starting values of the parameters. Two methods that take into account a priori information about the parameter space (as ranges and distributions, respectively) have been presented for the determination of robust designs for dynamic experiments. One method, the worst-case method, combines a max-min criterion suggested in previous literature with the information matrix developed for dynamic systems. The other takes an expected-value approach in an attempt to give dynamic designs that are good on average. It has been shown that the proposed R-optimal and ED-optimal designs are much less sensitive to poor starting estimates of the model parameters than the conventional D-optimal designs. The SIP-based (R-optimal) methods are shown to out-perform both expected-value and conventional approaches based on parameter point estimates for designing dynamic experiments.
ACKNOWLEDGEMENTS
S.P.A. is grateful for the financial support of Mitsubishi Chemicals Corporation through a Mitsubishi Fellowship, as well as the EPSRC research grant GR/N38688 for the development of the optimisation-based identifiability methods.
REFERENCES
Asprey, S. P. and A. Mantalaris, Global Parametric Identifiability of a Dynamic Unstructured Model of Hybridoma Cell Culture, in Proc. of 8th Int. Conf. Comp. App. in Biotech., IFAC, Quebec City, Canada, 461-467, 2001.
Asprey, S. P. and S. Macchietto, Designing Robust Optimal Experiments, J. Proc. Control, ADCHEM-2000 Special Edition, 2001.
Asprey, S. P. and S. Macchietto, Statistical Tools for Optimal Dynamic Model Building, Comp. Chem. Eng., 24, 1261-1267, 2000.
Asprey, S. P. and Y. Naka, Mathematical Problems in Fitting Kinetic Models - Some New Perspectives, J. Chem. Eng. Japan, 32, 328-337, 1999.
Bard, Y., Nonlinear Parameter Estimation, Academic Press, London, 1974.
Beck, J. V. and K. J. Arnold, Parameter Estimation in Engineering and Science, John Wiley & Sons, New York, USA, 1977.
Bellman, R. and K. J. Astrom, On Structural Identifiability, Math. Biosci., 7, 329-339, 1970.
Blankenship, J. W. and J. E. Falk, Infinitely Constrained Optimisation Problems, JOTA, 19, 261-281, 1976.
Box, G. E. P. and H. L. Lucas, Design of Experiments in Non-linear Situations, Biometrika, 46, 77-90, 1959.
Chapell, M., K. Godfrey and S. Vajda, Global Identifiability of Non-linear Systems with Specified Inputs: A Comparison of Methods, Math. Biosci., 102, 41-73, 1990.
Chiu, W. Y., G. M. Carratt and D. S. Soong, A Computer Model for the Gel Effect in Free-Radical Polymerization, Macromolecules, 16, 348-357, 1983.
Coope, I. D. and G. A. Watson, A Projected Lagrangian Algorithm for Semi-infinite Programming, Math. Programming, 32, 337-356, 1985.
Espie, D. M. and S. Macchietto, The Optimal Design of Dynamic Experiments, AIChE J., 35, 223-229, 1989.
Espie, D. M., "The Use of Nonlinear Parameter Estimation for Dynamic Chemical Reactor Modelling", Ph.D. Thesis, University of London, 1986.
Fedorov, V. V., Convex Design Theory, Math. Operationsforsch. Statist., Ser. Statistics, 11, 403-413, 1980.
Godfrey, K. R., Compartmental Models and their Application, Academic Press, London, 1983.
Gustafson, S. A., A Three-Phase Algorithm for Semi-infinite Problems, in A. V. Fiacco and O. Kortanek (Eds.), Semi-infinite Programming and Applications (pp. 138-157), 1981.
Hettich, R. and K. O. Kortanek, Semi-infinite Programming: Theory, Methods and Applications, SIAM Review, 35, 280-429, 1993.
Jacquez, J. A. and P. Greif, Numerical Parameter Identifiability and Estimability: Integrating Identifiability, Estimability, and Optimal Sampling Design, Math. Biosci., 77, 201-227, 1985.
Jang, J. D. and J. P. Barford, An Unstructured Kinetic Model of Macromolecular Metabolism in Batch and Fed-batch Cultures of Hybridoma Cells Producing Monoclonal Antibody, Biochem. Eng. J., 4, 153-168, 2000.
Jarvis, R. B., "Robust Dynamic Simulation of Chemical Engineering Processes", Ph.D. Thesis, University of London, 1994.
John, F., Extremum Problems with Inequalities as Subsidiary Conditions, Studies and Essays, Courant Anniversary Volume, John Wiley and Sons, New York, 1948.
Korkel, S., I. Bauer, H. G. Bock and J. P. Schloder, A Sequential Approach for Nonlinear Optimum Experimental Design in DAE Systems, Proc. of Int. Workshop on Scientific Computing in Chemical Engineering, May 26-28, 1999, Vol. II, TU Hamburg-Harburg, Germany, 1999.
Lecourtier, Y., F. Lamnabhi-Lagarrigue and E. Walter, Volterra and Generating Power Series Approaches to Identifiability Testing, in Identification of State Space Models, Walter, E. (Ed.), Academic Press, London, 1987.
Linardos, T. I., N. Kalogerakis and L. A. Behie, Cell Cycle Model for Growth Rate and Death Rate in Continuous Suspension Hybridoma Culture, Biotechnol. Bioeng., 40, 359-368, 1992.
Ljung, L. and T. Glad, On Global Identifiability for Arbitrary Model Parameterizations, Automatica, 30, 265-276, 1994.
Munack, A. and G. Posten, Design of Optimal Dynamical Experiments for Parameter Estimation, in Proc. of the ACC, FA4-11:45, 1989.
Nihtila, M. and J. Virkunnen, Practical Identifiability of Growth and Substrate Consumption Models, Biotechnol. Bioeng., 19, 1831, 1977.
Oh, M. and C. C. Pantelides, A Modelling and Simulation Language for Combined Lumped and Distributed Parameter Systems, Comp. Chem. Eng., 20, 611-633, 1996.
Pantelides, C. C. (1996). gPROMS - An Advanced Tool for Process Modelling, Simulation and Optimisation, Proc. Chemputers '96, McGraw-Hill.
Polak, E. and D. Q. Mayne, An Algorithm for Optimisation Problems with Functional Inequality Constraints, IEEE Trans. Automatic Control, AC-21, 184-193, 1976.
Rinnooy Kan, A. and G. T. Timmer, Stochastic Global Optimization Methods. Part II: Multilevel Methods, Math. Programming, 78, 39-57, 1987.
Shaw, B. M., Statistical Issues in Kinetic Modelling of Gas-Phase Ethylene Copolymerisation, Ph.D. Thesis, Queen's University, Canada, 1999.
Tatiraju, S., M. Soroush and R. Mutharasan, Multi-rate Nonlinear State and Parameter Estimation in a Bioreactor, Biotech. & Bioeng., 63, 22-32, 2000.
Uesbeck, F., N. J. Samsatli, L. G. Papageorgiou and N. Shah, Robust Optimal Fermentation Operating Policies, Comp. Chem. Eng., 22S, S167-S174, 1998.
Vajda, S., H. Rabitz, E. Walter and Y. Lecourtier, Qualitative and Quantitative Identifiability Analysis of Nonlinear Chemical Kinetic Models, Chem. Eng. Commun., 83, 191-219, 1989.
Vassiliadis, V. S., R. W. H. Sargent and C. C. Pantelides, Solution of a Class of Multistage Dynamic Optimisation Problems 1: Problems Without Path Constraints, Ind. Eng. Chem. Res., 33, 2111-2122, 1994.
Walter, E. (Ed.), Identifiability of Parametric Models, Pergamon Press, Oxford, 1987.
Walter, E. and L. Pronzato, Qualitative and Quantitative Experiment Design for Phenomenological Models - A Survey, Automatica, 26, 195-213, 1990.
Walter, E. and Y. Lecourtier, Global Approaches to Identifiability Testing for Linear and Nonlinear State Space Models, Math. Comp. in Simulation, 24, 472-482, 1982.
Walter, E., Identification of State Space Models, Springer, Berlin, 1982.
Zakovic, S. and B. Rustem, Semi-infinite Programming and Applications to Minimax Problems, presented at APMOD 2000, Applied Mathematical Programming and Modelling, Brunel University, Uxbridge, 17-19 April, 2000.
Zullo, L., "Computer Aided Design of Experiments. An Engineering Approach", Ph.D. Thesis, University of London, 1991.
Non-constant Variance and the Design of Experiments for Chemical Kinetic Models

Anthony C. Atkinson
Department of Statistics, The London School of Economics, London WC2A 2AE, UK

The paper develops methods for the design of experiments for mechanistic process models when the response has to be transformed to achieve symmetry and constant variance. Because of the nature of the relationship between response and the mechanistic model, it is necessary to transform both sides of the model. Expressions are given for the parameter sensitivities in the transformed model and examples given of optimum designs, not only for single response models, but for experiments in which multivariate responses are measured. Three simple examples are considered. It is shown that the designs may be highly sensitive to the transformation used.
1. INTRODUCTION This paper is concerned with the design of experiments for process models when the response has to be transformed. The transformation is necessary to provide a response with constant variance, so that least squares is efficient for fitting models and estimating parameters. Since larger observations tend to have larger variances, transformations like the square root or the logarithmic are often used to stabilize the variance. The results given here show that such transformations can have a large effect on good designs for estimating the parameters of nonlinear mechanistic models. Many process models are polynomials in the process variables. These models are linear in the parameters. For such linear regression models, transformation of the response does not affect the design, unless, as in Atkinson and Cook (1996), it is required to estimate the transformation. This is not the case here, where we assume that the desired transformation is known from previous analyses of similar data. Whether or not the response is to be transformed, the models can be built using the customary designs from response surface methodology (Box and Draper 1987) and optimum design theory and practice (Atkinson and Donev 1992). However, transformation of the response does have a strong influence on design for models derived from chemical kinetics in which the measured responses are nonlinear functions of the parameters, such as rate constants and reaction orders. The different effect of transformations on designs for kinetic models and those using response surfaces arises because in kinetics we are concerned
with models in which the two sides of the model are functionally related. Not one, but both sides have to be transformed to find a model which maintains the relationship between response and chemical kinetic laws, when a transformation is used to obtain constant variance. The seminal statistical paper for parameter estimation in the mechanistic models which typically arise in pharmacokinetics and chemical kinetics is Box and Lucas (1959), which found locally D-optimum designs for the parameters of nonlinear models when the errors are not only independent but have constant variance. The statistical model is formed by adding these errors to the expression for the observed response given by the kinetic model. The addition of such errors has continued to be the standard statistical approach. Recent examples in the process modelling literature include Bauer, Bock, Korkel, and Schloder (2000) and Asprey and Macchietto (2000). The paper starts with the simple example of designs for estimating the single parameter in exponential decay and shows how the design changes if, because of the error structure, it is appropriate to work with the logarithms of the observations. It is shown that the log transformation, which is sometimes used in pharmacokinetics, for example Mentre, Mallet, and Baccar (1997), gives a silly design in this example. Optimum design for both single and multivariate responses is reviewed in §3. The D-optimum designs found by Box and Lucas and other authors depend on the parameter sensitivities, that is the partial derivatives of responses with respect to the parameters. §4 develops the theory of designs for transformations. In particular, a simple form is found for the sensitivities after transformation of both sides of the model. The simplest example, exponential decay, is investigated more fully in Section 5 and designs found for a range of transformations. In Section 6 the model is that for two consecutive reactions. Designs are found for the transformed response when the concentrations of either one or both chemical species are measured. The example in Section 7 extends this model to have a reversible second reaction and to the measurement of up to three responses. Comments in Section 8 conclude.

2. TRANSFORMATIONS AND EXPONENTIAL DECAY
A simple example of the effect of transformation of the response on experimental design and parameter estimation comes from the nonlinear response model resulting from first-order decay
A → B,

in which the concentration of chemical A at time t is given by the nonlinear function

[A] = η_A(t, θ) = e^{−θt}    (θ, t > 0),    (1)
if it is assumed that the initial concentration of A is 1. If the ith experiment consists of measuring the concentration of A at time t_i, a simple statistical model of the observations is

y_i = η_A(t_i, θ) + ε_i,    (2)
where the errors ε_i are independently distributed with zero mean and constant variance. Unweighted least squares is the appropriate method of estimation. The variance of the least squares estimator θ̂ then depends on the parameter sensitivity
f(t_i, θ) = ∂η_A(t_i, θ)/∂θ = −t_i exp(−θ t_i).    (3)
Both Box and Lucas (1959) and Atkinson and Donev (1992) find the locally D-optimum designs minimising the variance of θ̂, which consist of taking all measurements where |f(t, θ)| is a maximum, that is at a time t* = 1/θ. Now suppose that the model needs to be transformed to give constant variance. If the log transformation is appropriate and [A] is measured, taking logarithms of both sides of (1), combined with additive errors, yields the statistical model

log y_i = log{η_A(t_i, θ)} + ε_i = −θ t_i + ε_i.    (4)
The log transformation thus results in a linear statistical model with response log y, for which the parameter sensitivity is just the time t. The optimum design puts all observations at the maximum possible time, when the concentration is as small as possible, a clearly absurd answer. As is shown in §5, less severe transformations give less extreme designs. Our interest is in finding the dependence of designs on the transformation needed to give errors of constant variance. But in passing we comment on the use of the linearised model (4) for parameter estimation. Whatever the error distribution, the rate constant ^ in (1) can be estimated by taking logs on both sides and applying least squares to the linearised model. But, unless the errors of observation in (1) are multiplicative, those after taking logarithms will not be additive. As a result, the conditions for least squares to be efficient will not apply, and biased estimators of unnecessarily high variance may result. A second implication of the linear model (4) is that observations at very long times will have a large effect on the estimation of ^. Again, unless the error conditions for the statistical model hold, estimates of high variance will result. This estimation procedure is very sensitive to small variations in observations taken at long times, when the response is virtually zero. Estimation using (4) is not recommended unless the errors in the untransformed observations are multiplicative, preferably with a lognormal distribution. 3. OPTIMUM DESIGN 3.1 One Response In all our examples, the experiments consist of measuring the concentration of one or more chemicals after the reaction has been running for a time t. In this section the theory is given when the concentration of only one chemical is measured. The extension to multiple responses is presented in §3.2. One experimental run yields one observation yi and the experimental design is a list of the n times, ti^i = 1 , . . . , n, not necessarily distinct, at which measurements are to be made. For mathematical convenience only continuous designs are discussed, in which the design (^ is a continuous measure specifying both a set of k distinct points in a design region T and the proportions, Wi, of observations taken at these points
The times U are the points of support of the design ^ and wi the design weights. In practice, when n observations must be taken, the design has the number of trials at ti the integer closest to nwi.
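As a small numerical illustration of such a one-parameter design, the sketch below evaluates the information of a candidate design for the exponential-decay example of §2 and confirms that the single-point D-optimum design sits at t* = 1/θ; the prior guess θ0 is an illustrative value.

```python
import numpy as np

theta0 = 0.2    # prior guess of the rate constant (illustrative)

def sensitivity(t, theta):
    """Parameter sensitivity of eta_A = exp(-theta t), Eq. (3)."""
    return -t * np.exp(-theta * t)

def information(times, weights, theta):
    """Scalar information sum_i w_i f(t_i)^2 for a one-parameter design."""
    f = sensitivity(np.asarray(times), theta)
    return float(np.sum(np.asarray(weights) * f ** 2))

# The locally D-optimum design puts all weight at t = 1/theta
t_grid = np.linspace(0.01, 30.0, 3000)
best_t = t_grid[np.argmax(sensitivity(t_grid, theta0) ** 2)]
print(best_t, 1.0 / theta0)                      # both close to 5.0
print(information([best_t], [1.0], theta0))
```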
The nonlinear regression model is

y = η(t, ψ) + ε    (5)
where the random errors ε are additive and independently, identically, normally distributed with zero mean and constant variance σ². In examples of chemical kinetic models we break the general parameter vector ψ into two components: θ, the rate constants and, in §6, the orders of reaction ν. The distinction, while physically important, does not affect the general theoretical development of this section. The information matrix of a design ξ for the p parameters ψ is

M(ξ, ψ) = F^T W F,

where F is a k × p matrix and the i-th row vector f^T(t_i, ψ) has j-th element

f_j(t_i, ψ) = ∂η(t_i, ψ)/∂ψ_j,    for j = 1,...,p,

called the sensitivity for parameter j, and W = diag{w_1,...,w_k}. The information matrix thus depends on the unknown parameters ψ. Here only locally optimum designs will be considered in which a best guess ψ^0 is taken for the parameters. In the examples the calculations are of D-optimum designs maximizing the logarithm of the determinant of the information matrix, log |M(ξ, ψ)|. The use of other design criteria, as well as Bayesian designs to reflect parameter uncertainty, in experiments for chemical kinetics is described by Atkinson and Bogacka (1997). The well-known Equivalence Theorem of Kiefer and Wolfowitz (1960) relates maximization of the determinant of the information matrix to minimization of the maximum variance of the predicted response over T. With the standardized variance of the prediction at t defined by
d(t, ξ, ψ) = f^T(t, ψ) M^{−1}(ξ, ψ) f(t, ψ),    (6)
the Equivalence Theorem states that, for the optimum design, ^*, the maximum value of d{t, ^*, ip) over the design region, T, is p, the number of parameters in the model, and further that this maximum value is attained at the support points t* of ^*. The theorem provides a basis for the construction and checking of i^-optimum designs. All the locally jD-optimum designs for nonlinear models found in this paper have exactly p points of support when one response is measured. For univariate responses the weights at the support points are then equal to 1/p. This is not the case for the multivariate designs of the next section. 3.2 Multivariate Response Now suppose that the concentration of more than one chemical is measured. There will then be a model for each expected response giving a matrix Fi of parameter sensitivities for the ith response, i = 1 , . . . ,m,
{F_i}_{uj} = ∂η_i(t_u, ψ)/∂ψ_j,

where u = 1,...,k denote the design points and j = 1,...,p denote the parameters. The generalization of the single-response case is that now the observations follow the model

y_{ui} = η_i(t_u, ψ) + ε_{ui},

with

E(ε_{ui}) = 0,    E(ε_{ui} ε_{vl}) = 0 if u ≠ v,    E(ε_{ui} ε_{ul}) = σ_{il},

when the variance-covariance matrix of the responses is Σ = {σ_{il}}, i,l = 1,...,m. Draper and Hunter (1966), following arguments similar to those of Box and Lucas (1959) for the single-response case, show that for normally distributed errors the information matrix is given by

M(ξ, ψ) = Σ_{i=1}^{m} Σ_{l=1}^{m} σ^{il} F_i^T W F_l,    (7)
where Σ^{−1} = {σ^{il}}, i,l = 1,...,m. The results of Fedorov (1972, p. 212) show that a form of the usual equivalence theorem applies for D-optimality. If the standardized variance of prediction in (6) is extended to

d_{il}(t, ξ, ψ) = f_i^T(t, ψ) M^{−1}(ξ, ψ) f_l(t, ψ),    (8)

with M(ξ, ψ) given by (7), the Equivalence Theorem of §3.1 applies to

d(t, ξ, ψ) = Σ_{i=1}^{m} Σ_{l=1}^{m} σ^{il} d_{il}(t, ξ, ψ).
4. PARAMETER SENSITIVITIES AND TRANSFORMING BOTH SIDES
Power transformation of the response is helpful if the variance of Y increases with the expected value E(Y) of Y. If

var Y ∝ {E(Y)}^{2(1−λ)},    (9)

Taylor series expansion shows that the variance is approximately stabilized by using as the response

y^λ    (λ ≠ 0)
log y    (λ = 0).

So, for λ = 1, the variance is independent of the mean and no transformation is necessary. When λ = 0.5, the variance is proportional to the mean and the square root transformation is indicated, whereas, when λ = 0, the standard deviation is proportional to the mean and the logarithmic transformation provides approximately constant variance. These three are the most frequently encountered values of λ. This simple power transformation was extended by Box and Cox (1964) to provide continuity at λ = 0. For transformation of just the response y in a regression model, they analyze the normalized power transformation

z(λ) = (y^λ − 1)/(λ ẏ^{λ−1})    (λ ≠ 0)
z(λ) = ẏ log y    (λ = 0),    (10)

where the geometric mean of the observations is written as ẏ = exp(Σ log y_i / n). When λ = 1, there is no transformation. The model to be fitted is (5) with response z(λ). The parameter sensitivities for this model with λ = 1 will be written

f_j(t, ψ) = ∂η(t, ψ)/∂ψ_j.    (11)
If the model r]{t, ^) is an empirically determined polynomial, design for known A does not depend on the need to transform the data. The data are transformed before fitting the model to give homogeneity of variance, so that the optimum design does not depend on the particular transformation employed. However if, for example, 7/(t, V^) is a mechanistic model based on chemical kinetics, the relationship between the response and the concentrations of the other reactants needs to be preserved after transformation. This is achieved by transformation of both sides of the model, as described in Chapter 4 of Carroll and Ruppert (1988). We transform both sides of the nonlinear model (5) in the absence of error
with the normalized Box-Cox transformation (10), to obtain

z(λ) = (y^λ − 1)/(λ ẏ^{λ−1}) = {η^λ(t, ψ) − 1}/(λ ẏ^{λ−1})    (λ ≠ 0)
z(λ) = ẏ log y = ẏ log η(t, ψ)    (λ = 0),    (12)

where, as before, the geometric mean of the observations is written as ẏ. For fixed λ ≠ 0, estimation of the parameters ψ in (12) does not depend on whether the response is z(λ) or the nonnormalized y^λ. Multiplication of both sides of (12) by λ ẏ^{λ−1}, simplification and the introduction of observational error on this transformed scale leads to the statistical model

y^λ = {η(t, ψ)}^λ + ε.    (13)

The parameter sensitivities in this transformed model are

∂{η(t, ψ)}^λ/∂ψ_j = λ{η(t, ψ)}^{λ−1} f_j(t, ψ).    (14)

For fixed λ, multiplication by λ in (14) does not change the optimum design, so the sensitivities have the easily calculated form

f̄_j(t, ψ) = {η(t, ψ)}^{λ−1} f_j(t, ψ).    (15)
If A < 1, the variance of the observations increases with the value of rj{t,ip). Thus transformation of both sides for such values of A will increase the relative value of the sensitivities for times where the response is small. We can expect that designs for A < 1 will include observations at lower concentrations than those when no transformation is needed. The results in the next sections show this to be the case. We apply these results for nonlinear models when A is known. One case of this is Horwitz' Rule in analytical chemistry which establishes an empirical relationship between concentration and variance which shows that a power transformation of the response is needed to stabilise variance.
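A small numerical illustration of Eq. (15), using the exponential-decay example of the next section; the rate constant and transformation parameter are the illustrative values used there.

```python
import numpy as np

theta, lam = 0.2, 0.5          # rate constant and transformation parameter (illustrative)
t = np.linspace(0.01, 30.0, 3000)

eta_A = np.exp(-theta * t)                     # response for exponential decay
f_A = -t * np.exp(-theta * t)                  # untransformed sensitivity, Eq. (3)
f_A_tbs = eta_A ** (lam - 1.0) * f_A           # transformed-both-sides sensitivity, Eq. (15)

# The squared sensitivity peaks at 1/theta without transformation
# and at 1/(lam*theta) after transformation (Section 5).
print(t[np.argmax(f_A ** 2)], 1.0 / theta)
print(t[np.argmax(f_A_tbs ** 2)], 1.0 / (lam * theta))
```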
147 Horwitz' rule is a relationship between the variability of chemical measurements and the concentration of the analyte. Lischer (1999) states that it is supported by the results of studies involving almost 10,000 individual data sets with analytes varying in concentration in mass/mass units from 1 (pure substances) to ultratrace contaminants, with concentrations of one part in 10~^^. The rule is often expressed by a linear relationship between log standard deviation and log concentration reflecting the increase in standard deviation at higher concentrations. Taking logarithms in (9) shows that the slope of such plots is 1 — A. If the slope of the relationship were one, the log transformation would be appropriate. The increase is found to be slightly less than this, depending on the laboratory, averaging around 0.86, so that the transformation is to the power 0.14. As we shall see, this is a sufficiently strong transformation to give designs very different from those when no transformation is required. Although all the models we use are nonlinear, the derivation of sensitivities given above would also hold if the model r/(t, x/j) were linear but expressed a physical relationship between y and other variables which needed to be preserved after transformation. Ohm's law would be an example. If the relationship has no physical basis, so that rj{t^ t/;) is a polynomial graduating ftmction which can be thought of as arising from Taylor expansion of the unknown underlying physical relationship, similar Taylor expansion of the transformed model (13) again yields a polynomial model. Since the sensitivities for such models are the variables in the polynomial, independently of the coefficients, the transformation does not affect the design. 5. EXPONENTIAL DECAY The model for exponential decay introduced in §2 comes from solution of the differential equation
The concentration of A at time t was given in (1) as [A]=r]A{t,e) = e-''
{0,t>0).
The concentration of B is [B]=r]B{t,0) = l-e~''
{0,t>0).
If [A] is measured the sensitivity is f\{t,9)
= -texp{-0t),
(16)
whereas if [B] is measured fB{t,9)=texp{-et), both of which have their extreme value at the time t* = 1/9. Therefore, in the absence of a transformation, all readings should be taken at this one value of time. Now suppose that the model needs to be transformed to give constant variance. From (14) fAit.O) = {r]A{t,9)y-'f\{t,9)
= -texp{-X9t),
(17)
148
Lambda
Figure 1. Exponential decay: left-hand panel, time of optimum reading; right-hand panel, efficiency relative to measuring both [^4] and [B]. Dashed line [B], continuous line [A], dotted and dashed line, both [A] and [B] measured.
The optimum design when [A] is measured is therefore at a time of 1/(A^). As A decreases, the time for the optimum design increases reaching, as we saw in §2, infinity when A = 0, the log transformation. The analysis when [B] is measured is similar, but does not yield an explicit value for the optimum time. The sensitivity is now
/Bit, 0) = {vBit, e)y-'fsit,
e) =teM-Om
- exp(-^t)}A-1
(18)
which is maximized by the optimum time. As A -> 0, the optimum time does likewise. When A = 0, t - 0. As well as designs in which only [A\ or [B] is measured, we can also find designs when both concentrations are measured. This calculation uses the sensitivities (17) and (18) in the formulae for multiple responses (7). In the calculations for this paper we take the responses to be independent with the same variance. The optimum design when both [A\ and [B] are measured also requires measurements at just one time. The left-hand panel of Figure 1 is a plot of the optimum time at which the readings of the concentration of ^4 or 5 or both should be taken as a function of A. In the calculations 0 = 0.2, so that the optimum time, in the absence of a transformation, is 5, whichever of the three sets of responses is measured. The figure shows the strong dependence of the design on the need for a
149 transformation. When A = 0.5, the optimum time of measurement has dropped, for [B], to 3.22 whereas, for [A] it has risen to 10. Times when both responses are measured begin, when A is near one, close to 5 but increase with decreasing A. For small values of A the design for [^4] and [B] is dominated by the relatively very small variance of measurements of [A] at high times, so that the design for measuring both responses is indistinguishable from that for measuring just [^4]. The right-hand panel of the figure shows the efficiencies of designs when [B] or [A] is measured, compared to measuring both responses, the efficiencies being the ratio of the variances of the parameter estimate. When A = 1 both designs have an efficiency of 50%. But, as A decreases, the efficiency of measuring [B] goes to zero whereas that for measuring [A] goes to one. 6. TWO CONSECUTIVE FIRST-ORDER REACTIONS The results on the exponential decay model show that, as A decreases, the times at which readings are taken on a single response move towards regions where the response is smaller. However, the design for measuring more than one response may not be so easily categorised. We now extend this analysis to a two parameter model, that for two consecutive reactions A%B^C.
(19)
The kinetic differential equations for [A]^ [B] and [C], the concentrations of the chemical compounds A, B and C as functions of time t are
dt dt
-
9i[AY'- e2[BY'
= e2[BY\
(20)
where Oi and O2 are the rates of reaction and Ui and U2 are the orders. Atkinson and Bogacka (1997) discuss the consequences for experimental design of this distinction between the two parts of-0. Here we take both reactions as first order, that is z/i = z/2 = 1? which . Given the initial concentrations ofA^B and C, an explicit algebraic solution can be found for the concentrations as a fiinction of time. If the initial concentration of A is one and that of B and C are zero, rjA{t, 0) follows the exponential decay (1) with 0 = 6i. The other concentrations are given by
rjc{t,e)
= l-rjA{t,0)-r]B{t,e).
(21)
These concentrations are plotted in the left-hand panel of Figure 2, which establishes the style of lines used in other figures for experiments in which [B] or [C] are measured. The plot shows the concentration of B rising from zero to a maximum and then decreasing almost to zero, whereas the concentration of C increases steadily. The parameter sensitivities f^^ {t, 0) and /^^. (t, 0) = -f\. (t, 6>) - f^^ {t, 0) are readily found by differentiation of the concentrations in (21). The parameter sensitivities for the transformed
150
0 O
c o O
o O
15
20
Figure 2. Concentrations as functions of time: left-hand panel, two consecutive first-order reactions; right-hand panel, reversible reaction. The individual components are plotted with the same line in each plot. model are then found from (14). Since the rate of exponential decay of [A] depends only on 6i, information on O2 cannot be obtained from measurements only on [A]. Measurements on [B] provide information on both parameters. By comparison, measurements of [C] provide rather imprecise information about the value of ^1. As there are now two parameters, we need to choose a design criterion which is a fiinction of the information matrix. We use D-optimality in which the determinant is maximised. For measurements on a single response the optimum designs consist of readings at just two time points, each with weight one half With 61 ^ 0.7 and 62 = 0.2, the times for the optimum design for measuring [B] are 1.23 and 6.86, which is the case considered by Box and Lucas (1959). If [C] is measured they are higher at 3.37 and 9.98. Since [B] becomes small as t increases, we can expect that the experiment will include measurements at a high value of time as A decreases. The design region is therefore taken to have a maximum value of 20 for t. At this time the concentration of C is 0.974. Figure 3 shows how the designs change with A. As A decreases the time points at which just [C] is measured decrease to regions of lower concentration. The time points of the design for measuring [B] likewise move towards regions of lower concentration, one time becoming smaller, the other larger, reaching the upper limit of 20 when A = 0.27. Also given in Figure 3 are the optimum designs when the two responses are both measured. Again they are assumed to have the same variance and zero covariance. As the figure shows, the design for measuring both
151
o
T3 C
TO
O
Lambda
Figure 3. Two consecutive first-order reactions. Optimum two-point designs when one or two components are measured: dashed Une [B]; dotted and dashed line [C]; dashed line with three dots [B] and [C].
responses is closer to that when only [B] is measured than that for measuring [C]. However, with multivariate responses, the design weights are no longer constrained to be equal. The lefthand panel of Figure 4 shows that, although the weights are not equal, in this case they are close to 0.5. The maximum weight of 0.555 is on the lower time point when A = 0.43. The right-hand panel of Figure 4 shows the efficiencies of measuring just [^] or [C] compared with measuring both responses, as given by the square root of the ratio of determinants of the information matrices for the two designs. When A = 1, measuring [B] is 68.1% efficient, rising to a maximum of 88.3% as A decreases. Measurement of only [C] is much less efficient, with a maximum value of 15.3%. A discussion of the effect of including measurements of [A] as well as those of [B] and [C] is given by Box and Draper (1965) when A == 1. Their interest is solely in analysis, not in design. However, they do comment that designs that go near to completion are rarely performed in practice.
152
Figure 4. Two consecutive first-order reactions: left-hand panel, weights of optimum design of Figure 3 when both [B] and [C] are measured - dashed line W2; right-hand panel, efficiencies of measuring [B] or [C] relative to measuring both - dashed line [B] only. 7. REVERSIBLE REACTION The behaviour of the designs in the previous sections for experiments in which only one component was measured changed in a predictable way: as A decreased, the design points moved towards regions of lower concentration. The multiresponse experiments also showed this general behaviour. We now consider a reversible experiment in which the behaviour is not so easily characterized. The model for two consecutive first-order reactions of §6 is now extended so that the second reaction is reversible Oi 62 A -^ B ^ C, (22) with 6s the rate of the reverse reaction. The kinetic differential equations for [A], [B] and [C] are now
f ^ -.M dt dt
=
ei[A]
e,[B] + es[c] • 03[O].
(23)
153 [C]
[B]
Figure 5. Reversible reaction: times for three-point optimum design. Left-hand panel, only [B] is measured; right-hand panel, measurements of only [C]. If, again, the initial concentration of A is one and those of B and C are zero, rj{t, 9) again follows exponential decay with 9 = 6i. The other concentrations are given by VB{t,9)
=
r]c{t,e)
=
9i-
-(02+e3)t\
J 92+ 9^ {• l-r]A{t,e)-r]B{t,e),
9s
r
-9it
, g-(«2+«3)t|
9^
(24)
which reduce to (21) as 9^ -^ 0. The important difference from the earlier model is that now [B] ^6'3/(6>2 + i 9 3 ) a s t ^ o o . The right-hand panel of Figure 2 shows the responses as a function of time when, as before, 9i = 0.7 and ^2 = 0-2. The value of ^3 has been taken as 0.15, so that the asymptotic value of [B] is 3/7. We can now expect, since there are three parameters, that single-response designs will have at least three points of support, the third being at the maximum value of t. We can also expect, since the value of [B] no longer decreases to zero with time, that there will be only a slight effect of A on any design points in the middle of the region when only [B] is measured. The designs when either [B] or [C] are measured are given in Figure 5. Both are three-point designs with weights one third at each time and with one support point at the maximum time, here taken to be 20. The two upper design points of the design for [B] in the left-hand panel are indeed virtually unaffected by A, ^2 increasing very slightlyft-om4.96 to 5.30. The lowest time point however decreases from 1.17 to zero as A decreases, in line with the behaviour of this design point for the design for the reversible model, shown in Figure 3. The design when just
154 [B] & [C]
[A], [B] & [C] o .
1/
(D
1m-
—
, ^'^
o 0.0
0.2
0.4
0.6
0.8
1.0
Lambda
Figure 6. Reversible reaction: times for optimum designs when more than one response is measured. Left-hand panel, \B\ and [C] are measured; right-hand panel, measurement of all three responses. [C] is measured is similar to that for the consecutive reaction, with a third observation at t = 20. As A decreases, both design points move towards regions of lower concentration. The designs however change in a less predictable way when more than one response is measured. The left-hand panel of Figure 6 shows the design points when both \B\ and [C] are measured. For A = 1, there is a three-point design, similar to that when either response is measured separately. The weights for these designs are in Figure 7, the left-hand panel of which shows the weights at each time point are very close to one third when A = 1. However, by the time A has decreased to 0.61, a two-point design is optimal, with weight 2/3 on the lower of the two design points. In these two figures, the same type of line is used for each time and its associated weight. The effect of A on the design is more extreme if [A] is also measured. As the right-hand panel of Figure 7 shows, the initial three-point design has effectively become a two-point design when A = 0.89. Thereafter, until A = 0.25, there is a two-point design, but now with weight 2/3 on the lower design point. The design changes at slightly lower values of A, briefly again having three support points. Around here the upper support point of the design is at times less than 20. For the smallest values of A we again obtain a two-point design, with the upper time point at 20. But now the maximum weight is on the upper design point. The designs plotted in the figures are found by numerical optimisation for a series of values of A. The visible fluctuations in the plots are caused by small differences in convergence of the numerical procedure for different values of A, particulafly when two design points are close
155 [B] & [C]
[A],[B]&[C]
'"^ \
Y
/^ /
V
\
A
"CD
W ^
y \ i
1
1
/
!l A 0.0
0.2
^
f
/ ' 0.4
0.6
0.8
1.0
Lambda
Figure 7. Reversible reaction: weights for the optimum designs of Figure 6 when more than one response, is measured. Left-hand panel, [B] and [C] are measured; right-hand panel, measurement of all three responses. The lines in the two figures are of the same type for the same design points. together, or one weight is small. Under these conditions the optimum can be flat, so that the designs differ more than the value of the criterion they provide. However, the main effect of measuring [A] is clear. Because [A] is close to zero for much of the experimental region, very precise information is obtained on the value of ^i when A has values away from one. Including measurements on [A] increases the emphasis in the design on measurements at high times. The final plot. Figure 8, shows the efficiencies of designs for [B], for [C] and for [B] and [C] relative to the design when all three responses are measured. This is calculated as the ratio of the determinants of the information matrices, raised, since there are now three parameters, to the one third power. For A = 1, the efficiency of measuring [B] and [C] instead of all three responses is 73.4%. As A decreases, all experiments become steadily less efficient relative to that including [A]. The smoothness of the plots of efficiencies in Figure 8 shows that the fluctuations in the optimum designs found, particularly when all three responses are measured, do indeed not affect the value of the design criterion. 8. DISCUSSION In all three examples it has been possible to find analytical solutions to the kinetic differential equations and so to find analytical expressions for the sensitivities. There is in principle no difficulty in using numerically calculated sensitivities, for examplefiromthe direct method
Figure 8. Reversible reaction: efficiencies of designs relative to that when all three responses are measured; reading upwards: [C], [B] and [B] and [C].

(Valko and Vajda 1984; Ucinski 1999). However, the dependence of the sensitivities in the transformation model on values of the response, and the attraction of the designs for small λ to regions of low response, does mean that the designs can be very sensitive to small inaccuracies in the numerical calculation of responses and sensitivities. Transformations were introduced in this paper in order to provide statistical models in which the observations had constant variance. An alternative (Bogacka and Wright 2001) is to use weighted least squares with weights proportional to E(Y)^(2(λ-1)). The resulting parameter sensitivities, and so the designs, are identical to those of §4. The weights in this form of weighted least squares include the parameters of the linear model. Information on these parameters is therefore obtained from the change of the variance with experimental conditions as well as from the change of mean. Atkinson and Cook (1995) find D-optimum designs for heteroscedastic linear models, including the special case when the linear model for the structure in the variance is the same as that for the mean. The resulting information matrix is the sum of two matrices, one of which is that for weighted least squares. An example of a design for a nonlinear model when the variance is of this special form is given by Downing, Fedorov, and Leonov (2001). An important difference between such methods and the transformation studied here is in the model implied for fitting the data. In the transformation model, the original observations
will have skewed distributions, which become symmetrical, with constant error variance, after the appropriate transformation. However, weighting and models with structured variances lead to symmetrical distributions of error of non-constant variance in the original scale of the observations. The identity of the designs from the two approaches, when weighting is used while ignoring the parameterised structure of the variance, arises because the design criterion is based only on second-moment properties of the observations. A combination of the two kinds of model is used by Lindsey (2001, Chapter 7) in an analysis of pharmacokinetic data using compartmental models. He finds that skewed distributions such as the gamma and the log normal, when combined with parameterised variance functions, yield the best fitting models. The conclusion is that the error distributions on the original scale are skew, although he does not examine the transformation family used here for designing experiments. In this paper the effect of transformation has been found on designs in which the time points of measurement have to be determined. It has been shown that the designs can be very sensitive to the particular transformation used. For building and maintaining process models, designs in which measurements are taken at fixed times, but in which flow rates, concentrations and temperature are the experimental variables, are of great importance. It would therefore be interesting to study the dependence of such designs on the need for a transformation. Finally, the designs, in some cases, have been shown to be sensitive to the particular value of λ employed. One piece of evidence that transformations of the response are important comes from analytical chemistry, where concentrations can vary over several cycles. Under such conditions, transformation of the response is likely to be important. But in modelling, for example, a steady-state process, the concentrations being measured will not vary much from run to run, although they will need to vary to some extent if experimental information is to be useful in model building. Transformations of the response may be less important under such conditions. What kind of transformation, and so what kind of design, is required in a particular case can best be determined by the analysis of data from similar experiments.

ACKNOWLEDGEMENTS

I am grateful to Dr Barbara Bogacka of Queen Mary, University of London for comments which helped improve the clarity of presentation.

REFERENCES

Asprey, S. P. and S. Macchietto (2000). Statistical tools for optimal dynamic model building. Computers and Chemical Engineering 24, 1261-1267.
Atkinson, A. C. and B. Bogacka (1997). Compound, D- and Ds-optimum designs for determining the order of a chemical reaction. Technometrics 39, 347-356.
Atkinson, A. C. and R. D. Cook (1995). D-optimum designs for heteroscedastic linear models. Journal of the American Statistical Association 90, 204-212.
Atkinson, A. C. and R. D. Cook (1996). Designing for a response transformation parameter. Journal of the Royal Statistical Society B 59, 111-124.
Atkinson, A. C. and A. N. Donev (1992). Optimum Experimental Designs. Oxford: Oxford University Press.
Bauer, I., H. G. Bock, S. Korkel, and J. P. Schloder (2000). Numerical methods for optimum experimental design in DAE systems. Journal of Computational and Applied Mathematics 120, 1-25.
Bogacka, B. and F. Wright (2001). A non-linear design problem in a chemical kinetic model with non-constant error variance. (Submitted).
Box, G. E. P. and D. R. Cox (1964). An analysis of transformations (with discussion). Journal of the Royal Statistical Society, Series B 26, 211-252.
Box, G. E. P. and N. R. Draper (1965). The Bayesian estimation of common parameters from several responses. Biometrika 52, 355-365.
Box, G. E. P. and N. R. Draper (1987). Empirical Model-Building and Response Surfaces. New York: Wiley.
Box, G. E. P. and H. L. Lucas (1959). Design of experiments in nonlinear situations. Biometrika 46, 77-90.
Carroll, R. J. and D. Ruppert (1988). Transformation and Weighting in Regression. London: Chapman and Hall.
Downing, D., V. V. Fedorov, and S. Leonov (2001). Extracting information from the variance function: optimal design. In A. C. Atkinson, P. Hackl, and W. G. Muller (Eds.), MODA 6 - Advances in Model-Oriented Design and Analysis, pp. 45-52. Heidelberg: Physica-Verlag.
Draper, N. R. and W. G. Hunter (1966). Design of experiments for parameter estimation in multiresponse situations. Biometrika 53, 525-533.
Fedorov, V. V. (1972). Theory of Optimal Experiments. New York: Academic Press.
Kiefer, J. and J. Wolfowitz (1960). The equivalence of two extremum problems. Canadian Journal of Mathematics 12, 363-366.
Lischer, P. (1999). Good statistical practice in analytical chemistry. In B. Grigelionis (Ed.), Probability Theory and Mathematical Statistics, pp. 1-12. Dordrecht: VSP.
Mentre, F., A. Mallet, and D. Baccar (1997). Optimal design in random-effects regression models. Biometrika 84, 429-442.
Ucinski, D. (1999). Measurement Optimization for Parameter Estimation in Distributed Systems. Zielona Gora: Technical University Press.
Valko, P. and S. Vajda (1984). An extended ODE solver for sensitivity calculations. Computers and Chemistry 8, 255-271.
A Continuous-Time Hammerstein Approach Working with Statistical Experimental Design

Derrick K. Rollins

Department of Chemical Engineering & Statistics, Iowa State University, Ames, Iowa 50011
Hammerstein modelling is a basic approach to dynamic predictive modelling of non-linear behaviour. The development of the continuous-time Hammerstein approach by Rollins, et al. (2002) provides a natural compatibility with statistical design of experiments (SDOE) and thus allows for optimal experimentation and the efficient treatment of all non-linear and interactive effects. This approach is demonstrated on a theoretical Hammerstein model from the literature and a real household dryer with four inputs and five outputs.

1. INTRODUCTION

When modeling dynamic process behavior, engineers will commonly choose a pseudo-random binary sequence (PRBS) or a pseudo-random sequence (PRS) to "excite" the process. A PRBS is a series of input changes with random times for changes from one level to another level and then back to the same level. A PRS will have multiple levels which may also be randomly set. If the ultimate change behavior is truly additive (i.e., all interactive effects are zero) and linear, a PRBS can provide information that is useful in estimating model parameters, but it is not likely to be optimal in terms of minimal changes to the process. (Interactive effects are cross product or multi-linear terms.) In this situation, a low-resolution statistical experimental design that does not "confound" main effects with other main effects is likely to require fewer runs. (Two effects are confounded when they are perfectly correlated. Thus, when two effects are confounded, their cause and effect relationships on the response cannot be distinguished [i.e., separated]. "Partial confounding" is when two effects are correlated but not perfectly correlated.) However, two critical reasons that an engineer may not choose a statistical experimental design approach are the lack of recognition of this advantage or the lack of ability to determine such a design. To overcome the latter reason, engineering curricula have made dramatic changes in recent years to include the teaching of statistical design of experiments (SDOE). It is one of the objectives of this paper to help to overcome the former reason. If nonlinear effects are present, a PRBS is incapable of providing information to estimate this behavior, as pointed out by Pearson and Ogunnaike (1997). In these cases, the
most popular approach taken by engineers to obtain models of process behavior is the use of a PRS design (see Su and McAvoy, 1993). Multiple input change levels in this design will allow for the estimation of non-linear ultimate response behavior. However, the "non-intelligent" (since it is random) setting of design points (i.e., treatment combinations) of a PRS design will likely confound (at least partially) significant effects. Thus, in addition to inefficiency, a PRS design will also not allow the collection of information to estimate significant interactions. On the other hand, SDOE does not suffer from these limitations because of its superior "intelligent" approach to data collection. It would not make sense to use an experimental design to collect information to estimate non-linear and interactive ultimate response behavior if the modeling method (i.e., the form of the dynamic model) was incapable of addressing this behavior, as in the case of linear transfer function modeling. The only approach we have found in the literature meeting these criteria is the class of Hammerstein models (see S. A. Billings, 1980), which can be a subset of Volterra models (see Seinfeld and Lapidus, 1974), as pointed out by Pearson and Ogunnaike (1997). The Hammerstein approach combines linear dynamics with non-linear steady state gains. A description is shown in Figure 1 for a continuous multiple-input, multiple-output (MIMO) system. Thus, its capability to address non-linear and interactive effects is made possible by the unrestrictive mapping of the steady state gains. There are two basic classes of Hammerstein models, continuous-time and discrete-time. However, discrete-time Hammerstein modeling, which can also be classified as NARMAX (Nonlinear AutoRegressive Moving Average models with exogenous inputs) modeling, appears to be the most popular one in the chemical engineering literature. It is interesting to note that, even though discrete-time Hammerstein modeling has the ability to address interaction terms, this is not commonly done in practice because of the enormous parameter identification burden this would cause, as pointed out by Eskinat, et al. (1991). Thus, in practice, a PRS is made reasonable (but not necessarily practical) via the a priori assumption that all interaction effects (cross product terms) are zero.
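As a small numerical illustration of (partial) confounding, not taken from the paper, the following Python sketch compares a randomly chosen two-level plan for two inputs with a replicated 2x2 full factorial. In the factorial, the u1*u2 interaction column is orthogonal to the main-effect columns (zero inner product), so the interaction is estimable separately; in the random plan it generally is not. All variable names and the run size are assumptions made only for this illustration.

import numpy as np

rng = np.random.default_rng(0)

# Randomly chosen two-level settings for two inputs (a PRBS-like plan):
# the u1*u2 interaction column is typically correlated with a main effect.
u1 = rng.choice([-1.0, 1.0], size=8)
u2 = rng.choice([-1.0, 1.0], size=8)
print(float(np.dot(u1 * u2, u1)))   # typically non-zero -> partial confounding

# A replicated 2^2 full factorial: interaction column orthogonal to both main effects.
f1 = np.tile([-1.0, -1.0, 1.0, 1.0], 2)
f2 = np.tile([-1.0, 1.0, -1.0, 1.0], 2)
print(float(np.dot(f1 * f2, f1)))   # exactly 0 -> interaction estimable independently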
[Figure 1 appears here: X -> (Static Nonlinear Map) -> f(X) -> (Linear Dynamics, G(s)) -> y]

Figure 1. A description of the general "MIMO Hammerstein model" structure as it appears in Pearson and Ogunnaike (1997). The input vector X passes through a static map and produces the gain vector f(X), which can be non-linear, and then passes through the linear dynamic map and produces the output vector y.
In the statistics literature for experimental design, with no a priori assumption for the model form, we have not found the treatment of dynamic systems (e.g., see Montgomery, 1984). This treatment of experimental design has been restricted to the paradigm of steady state systems. In real dynamic applications the common practice by statisticians is to make a process change as specified by the design and allow the process to reach steady state before collecting the response data associated with this change. We feel that there are two basic reasons that statisticians have not extended SDOE to include the modeling of dynamic systems. The first one is the lack of training in continuous-time dynamic systems and the second one is the lack of a methodology for this extension. So on the one hand, you have engineers that treat dynamic systems but not interactive effects and thus are able to "get by" without the need for SDOE. On the other hand, you have statisticians that treat interactive effects but ignore the modeling of process dynamics. In this paper we present a non-linear dynamic methodology that treats interactive effects as easily as steady state modeling and fully utilizes SDOE without adding any new requirements for knowledge. This is done through the presentation of a continuous-time Hammerstein modeling approach that was originally developed by Rollins, et al. (1998) for single input, single output (SISO) processes but has recently been extended to multiple input, multiple output (MIMO) processes by Rollins, et al. (2002). Thus, this methodology seeks to connect efficient and sufficient experimental design with comprehensive and complete expressions for dynamic model development. This paper is written with the engineering community in mind. Rollins and Bascunana (2001) have written an article directed towards statisticians that contains considerable detail on concepts of continuous-time dynamic model forms. In the next section, we describe the proposed method in detail. Following this section, we present a theoretical Hammerstein model from the literature and demonstrate the ability of the proposed method to fit this model. Next, this paper demonstrates the application of the proposed method on a real process.

2. THE APPROACH

This section describes the attributes of the Rollins, et al. (1998) (also, Rollins, et al., 2002) continuous-time Hammerstein approach (CTHA) in detail. Figure 2 is a block diagram representing the structure of CTHA for a two input, two output system. In comparing Figure 1 with Figure 2 one sees that Figure 2 omits the block for the static map and shows the non-linear gain functions entering directly into the linear dynamic block. Another difference is that the linear dynamic function is shown in the "s" (Laplace) domain in Figure 1 and in the "t" (time) domain in Figure 2. This was done to illustrate a critical achievement of CTHA. In the context of continuous-time Hammerstein modeling, we have not seen an explicit time-domain expression for the outputs. That is, we have not seen a time domain expression for G(s) in Figure 1. However, CTHA is only expressed in the time domain. As illustrated by Figure 2, as in all Hammerstein approaches, the expressions for the outputs separate the ultimate response functions from the dynamic response functions. An example of a CTHA expression is given below for a two input system with first order dynamics, undergoing a step change only at time 0, which is at steady state before this change.
Figure 2. The block diagram representation of CTHA for a two input, two output system. Inputs to the blocks are functions of process variables, f1(Δx1, Δx2) and f2(Δx1, Δx2), and the models in the ultimate responses (i.e., gains) are able to address interaction and other terms. Note, in contrast to Figure 1, the transfer functions are in the time domain.

y1(t) = y1(0) + f(Δx1, Δx2)·g(t; τ)·S(t)
      = y1(0) + (β0 + β1Δx1 + β2Δx2 + β3Δx1Δx2 + β4Δx2²)·[1 − e^(−t/τ)]·S(t)          (1)
where S(t) is the unit step function. Note that at t = 0, Eq. 1 gives y1(t) = y1(0) and at t = ∞ it gives y1(∞) = y1(0) + f(Δx1, Δx2). Thus, Eq. 1 has the correct initial behavior and the correct ultimate behavior if f(Δx1, Δx2) is an accurate expression for the ultimate value of y for these input changes. In CTHA this function is determined for each output by fitting the ultimate response values against the input changes over the input space specified by the statistical experimental design using multiple linear regression. Hence, for accurate fits, one would need accurate ultimate response performance. Implementation of Eq. 1 for prediction from several input changes is not simply a matter of inverting the transfer function given by g(t; τ) and using an input sequence for f(Δx1, Δx2) in the s domain. This is not possible because f(Δx1, Δx2) cannot be written in the s domain. In a moment, we present a novel algorithm for implementing models of the form given by Eq. 1. However, before this is presented, we describe our procedure for estimating f(Δx; β) and g(t; τ). The steps for obtaining the fitted model for CTHA are as follows:
i. Determine the statistical experimental design.
ii. Run the experimental design as a series of step tests, allowing steady state to occur after each change while collecting the data dynamically over time.
iii. Use the steady state data to determine the ultimate response function, f(Δx; β), for each output.
iv. Use the dynamic data to determine the dynamic response function, g(t; τ), for each output.
After obtaining the fitted equations for f(Δx; β) and g(t; τ), they are incorporated into an algorithm to predict output response for changes in inputs. The CTHA algorithm is a procedure that predicts output response from the fits for f(Δx; β) and g(t; τ) in a scheme that depends only on the most recent change for each input. For an input change occurring at time ti, a generic representation of the CTHA prediction algorithm is given by Eq. 2 below. For t > ti:

ŷ(t) = ŷ(ti) + [f(Δx(t); β) − ŷ(ti) + y(0)]·g(t − ti; τ)·S(t − ti)          (2)
where ŷ(t) is the estimated output response at time t; y(0) is the measured value of the output at the initial time, 0; Δx(t) is a vector that contains the deviation values of the process variables from their initial values at time t; β is a vector that contains the estimates of the steady state response parameters determined from the current input conditions; f(Δx(t); β) is the function that computes the change in the ultimate response for the change Δx(t); τ is a vector that contains the estimates of the dynamic parameters, which could depend on Δx(t); g(t − ti; τ) is the semi-empirical non-linear function that computes the dynamic portion of the response such that as t → ∞ the function → 1; and S(t − ti) is the shifted unit step function. Note that at ti, ŷ(t) = ŷ(ti), and as t → ∞, ŷ(t) → y(0) + f(Δx(t); β). Next, we demonstrate the ability of CTHA to accurately fit theoretical Hammerstein models.

3. AN APPLICATION OF CTHA TO A THEORETICAL HAMMERSTEIN PROCESS

In their survey paper of structure identification methods for non-linear dynamic systems, Haber and Unbehauen (1990) presented the following Hammerstein model:

v(t) = 2.0 + u(t) + 0.5u(t)²
10·dy(t)/dt + y(t) = v(t)          (3)

with u(0) = 0. In this section we will demonstrate the ability of our proposed method (i.e., CTHA) to identify Eq. 3 in integrated form and to predict the response y(t) continuously over time quite effectively using Eq. 2. Note that when placed in the context of Eq. 3, the Eq. 2 terms are

f(Δx; β) = f(u; β) = u(t) + 0.5u(t)²          (4)
and

g(t; τ) = 1 − e^(−t/10)          (5)

with y(0) = 2. One measure of the soundness of this approach will be its ability to accurately obtain Eqs. 4 and 5. The other critical measure will be to incorporate them into Eq. 2 and obtain from Eq. 2 accurate predictions over time for changes in u(t). The first step in applying CTHA is to determine the experimental design. Since this is a single input, single output process, and the ultimate response is quadratic, the following four input changes were chosen as the experimental design: u = −4, −1, 1, and 4. These input changes were made and the fits shown in Figure 3 were obtained using first order models.
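As an illustration of steps ii-iv (not taken from the paper), the following Python sketch simulates one step test of the true process in Eq. 3 and fits the first-order CTHA form to it by non-linear regression. Repeating this for each design point gives the ultimate-response values behind Eq. 4 and the τ estimates that are averaged for Eq. 5; the time grid and initial guesses below are assumptions made only for this sketch.

import numpy as np
from scipy.integrate import odeint
from scipy.optimize import curve_fit

# Simulate one step test of the true process (Eq. 3) for a chosen input change.
u = 4.0                                   # step in u applied at t = 0
v = 2.0 + u + 0.5 * u**2                  # static non-linear map of Eq. 3
t = np.linspace(0.0, 60.0, 121)
y = odeint(lambda y, t: (v - y) / 10.0, 2.0, t).ravel()   # 10*dy/dt + y = v, y(0) = 2

# Fit the first-order CTHA step form y(t) = y(0) + K*(1 - exp(-t/tau)).
def step_model(t, K, tau):
    return 2.0 + K * (1.0 - np.exp(-t / tau))

(K_hat, tau_hat), _ = curve_fit(step_model, t, y, p0=[1.0, 1.0])
print(K_hat, tau_hat)   # K ~ u + 0.5*u^2 = 12, tau ~ 10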
[Figure 3 appears here: true and fitted responses of y over 0-60 time units for the four step changes u = -4, -1, 1 and 4.]
Figure 3. The response of y and the fits of the first order models for the four changes in u to obtain estimates of Eqs. 4 and 5.

As shown, the fit to each step test is quite accurate. From each trial (i.e., change in u), the ultimate response value was obtained and used to estimate Eq. 4. Figure 4 shows a plot of the ultimate response data versus u. Five points are shown, which include the four design points and the center point (u = 0). This figure demonstrates the highly nonlinear behavior of the ultimate response over this input space. Fitting a quadratic model to the data in Figure 4 using linear regression produced the following, excellent, approximation of Eq. 4:

f(u; β) = 0.998u(t) + 0.499u(t)²          (6)

The four values of τ estimated from each of the first order fits in Figure 3 were averaged to obtain the estimate for Eq. 5. This value was 9.93, giving a close estimate of Eq. 5 as shown below:

g(t; τ) = 1 − e^(−t/9.93)          (7)
Testing CTHA for this process consisted of incorporating Eqs. 6 and 7 into Eq. 2 and examining its predictive behavior when making arbitrary changes in u over the fitted input space. The input change sequence for this test is shown in Figure 5. The process response and the fitted response of CTHA for this input sequence change are presented in Figure 6. As shown, CTHA fits almost perfectly to the real process. Thus, the proposed approach appears to be very capable of accurately predicting non-linear dynamic process behavior having a Hammerstein nature.
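A minimal sketch of the prediction algorithm in Eq. 2 using the fitted Eqs. 6 and 7 is given below. The piecewise-constant input sequence is arbitrary (it is not the sequence of Figure 5), and the comparison against the true process uses a numerical integration of Eq. 3; both are assumptions made only for this illustration.

import numpy as np
from scipy.integrate import odeint

f_hat = lambda u: 0.998 * u + 0.499 * u**2          # Eq. 6
g_hat = lambda t: 1.0 - np.exp(-t / 9.93)           # Eq. 7

# Arbitrary piecewise-constant input: (time of change, new value of u).
changes = [(0.0, 3.0), (25.0, -2.0), (60.0, 4.0)]
t_grid = np.linspace(0.0, 100.0, 501)
y0 = 2.0

# CTHA prediction (Eq. 2): restart from the current prediction at each new input change.
y_pred = np.empty_like(t_grid)
y_ti = y0
for i, (ti, u) in enumerate(changes):
    t_end = changes[i + 1][0] if i + 1 < len(changes) else t_grid[-1] + 1.0
    mask = (t_grid >= ti) & (t_grid < t_end)
    y_pred[mask] = y_ti + (f_hat(u) + y0 - y_ti) * g_hat(t_grid[mask] - ti)
    y_ti = y_ti + (f_hat(u) + y0 - y_ti) * g_hat(t_end - ti)   # value carried to the next change

# True process response (Eq. 3) for comparison.
def u_of_t(t):
    return [u for ti, u in changes if ti <= t][-1]

y_true = odeint(lambda y, t: (2.0 + u_of_t(t) + 0.5 * u_of_t(t)**2 - y) / 10.0, y0, t_grid).ravel()
print(np.max(np.abs(y_pred - y_true)))   # small residual, consistent with the close fit in Figure 6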
Figure 4. A plot of the ultimate change in y against u with the fitted line (Eq. 6) for the changes in u.
[Figure 5 appears here: the input sequence u(t), between -5 and 5, over 0-100 time units.]

Figure 5. The input sequence for testing CTHA for the first order process.
Figure 6. The fit of CTHA for the input testing sequence of Figure 5.

To further demonstrate the ability of CTHA to fit Hammerstein processes, a more complex second order process is used. This process is described below:

v(t) = a1·u1(t) + a2·u2(t) + a3·u1(t)·u2(t) + a4·u1(t)² + a5·u2(t)²
τ1τ2·d²y(t)/dt² + (τ1 + τ2)·dy(t)/dt + y(t) = τa·dv(t)/dt + v(t)          (8)

with all initial conditions and derivatives equal to zero, where a1 = 2.0, a2 = 1.0, a3 = 1.5, a4 = 0.5, a5 = 1.0, τa = 15.0, τ1 = 2.0, and τ2 = 5.0. The experimental design for this case was a 3² full factorial design to allow estimation of all main effects, quadratic effects and the two factor interaction. The specific design points are shown in Table 1 below. As an example of the fitting, the fit for Trial 1 is shown in Figure 7 below. As one can see, the fit to this more complicated process is quite good. Although only this case is shown for space considerations, it is typical of the performance for the other cases. The fitted functions obtained for Eq. 2 are given below:
f(Δx; β) = f(u; β) = 2.0u1(t) + 1.0u2(t) + 1.5u1(t)u2(t) + 0.5u1(t)² + 1.0u2(t)²          (9)

g(t; τ) = 1 + [(τa − τ1)/(τ1 − τ2)]·e^(−t/τ1) + [(τa − τ2)/(τ2 − τ1)]·e^(−t/τ2)          (10)

with τa = 14.98, τ1 = 2.01 and τ2 = 4.98.
Table 1. The experimental design used by CTHA for the process given by Eq. 8.

Trial    u1      u2
  1      5.0    -5.0
  2     -5.0     5.0
  3      0.0     5.0
  4      0.0    -5.0
  5      5.0     0.0
  6     -5.0     0.0
  7     -5.0    -5.0
  8      5.0     5.0
  9      0.0     0.0
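The nine treatment combinations of a 3² full factorial at levels -5, 0 and 5 can be enumerated directly, as in the short sketch below (an illustration only; the run order shown here differs from the randomized order of Table 1).

from itertools import product

# 3^2 full factorial: every combination of three levels for the two inputs.
levels = [-5.0, 0.0, 5.0]
design = list(product(levels, repeat=2))   # nine (u1, u2) treatment combinations
for trial, (u1, u2) in enumerate(design, start=1):
    print(trial, u1, u2)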
Figure 7. Typical fit of the response of y for the second order models for the design in Table 1 to obtain Eqs. 9 and 10.
Testing CTHA for the second order process consisted of incorporating Eqs. 9 and 10 into Eq. 2 and examining its predictive behavior when making arbitrary changes in u1 and u2 over the fitted input space. The input change sequence for this test is shown in Figure 8. The process response and the fitted response of CTHA for this sequence change are presented in Figure 9. As in the first order case, this case also fits the real process almost perfectly. Thus, this example further illustrates the ability of the proposed approach to accurately model non-linear dynamic processes meeting the form of Hammerstein processes. Next we illustrate CTHA on a real process with five outputs and four inputs. The ability of CTHA to exploit SDOE will be more appreciated in this example.
[Figure 8 appears here: the input sequences u1 and u2 over 0-1000 time units.]

Figure 8. The input sequence for testing CTHA for the second order process.
[Figure 9 appears here: true process response and CTHA prediction over 0-1000 time units.]
Figure 9. The fit of CTHA for the input testing sequence of Figure 8.

4. AN APPLICATION OF CTHA TO A REAL PROCESS

CTHA will be illustrated using a common household-style dryer that has been retro-fitted with several sensors and recording instruments. For details of this process see Rollins, et al. (2002). The input variables chosen for this study were the power supplied to the heater (P), the inlet fan speed (N), the dry weight of the clothes in the dryer (w), and the initial moisture content of the clothes (m). Although this study considered five outputs (see Rollins, et al., 2002), we will present the results for only one here for space considerations: the temperature of the air exiting the heater (Ta). The dynamic modeling goals of this study consisted of accurate output prediction over the complete input space, while maintaining the ability to fit non-linear behavior as well as significant two factor interactions (i.e., bilinear effects) in as few runs (i.e., step tests or trials) as possible. As stated previously, a PRS design would not meet these criteria in terms of estimability or efficiency. However, the CTHA structure allows for the full use of SDOE, which meets the data collection, experimental design and modeling criteria. The experimental design that was selected, meeting the specified criteria, was a central composite design with replicated center points (see Montgomery, 1984). For specific details on the levels of the inputs and the design of this study see Rollins, et al. (2002). Each experiment consisted of starting the dryer with the input variables set to values for that run and then recording the outputs dynamically. Due to the batch nature of the dryer, the process does not reach steady state, but the output variables tend to level off. The g(t; τ) model forms were selected by a visual inspection of the dynamic response of the output. The drying
process can be divided into two distinct phenomena: the constant rate drying and the falling rate drying. This study considered only the constant-rate drying period. The dynamic response of Ta for Run 10 is shown in Figure 10. The other runs gave similar fitted performance.
[Figure 10 appears here: measured and fitted response versus time (min).]
Figure 10. The dynamic response of the air exiting the heater for Run 10. The fitted curve is obtained by using a second order model with a lead term, and the parameters (τa, τ1, and τ2) are estimated by non-linear regression.

Using the steady state values for the 27 runs, the following function was obtained for the ultimate change using multiple linear regression:

f(ΔX(t); β) = 0.04239ΔP − 0.04203ΔN + 4.393Δw − 0.2374Δm          (11)
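The fit behind Eq. 11 is an ordinary least-squares problem; a minimal sketch is given below. The design matrix and response values here are placeholders (a random full-rank set of input changes, noise-free responses built from the Eq. 11 coefficients), not the study's central-composite data.

import numpy as np

# Placeholder design: 27 runs of input changes (dP, dN, dw, dm), for illustration only.
rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, size=(27, 4)) * np.array([300.0, 150.0, 2.0, 0.2])

beta = np.array([0.04239, -0.04203, 4.393, -0.2374])   # coefficients of Eq. 11
y = X @ beta                                           # noise-free ultimate changes in Ta

# Ordinary least squares with no intercept (the model is for the ultimate *change*).
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)   # recovers the Eq. 11 coefficients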
Thus, non-linear or interacting effects, although statistically tested, did not indicate significance at the 0.1 test level. For each of the 27 runs, non-linear regression was used to fit the dynamic response behavior. By inspection, a second order continuous-time model form with a lead term was determined to fit all 27 cases the best. The three dynamic parameters (τ1, τ2, and τa) for the final form were determined by averaging the 27 values for each one. The final equation for g(t; τ) is given by Eq. 12 below.
g(t; τ) = 1 + [(τa − τ1)/(τ1 − τ2)]·e^(−t/τ1) + [(τa − τ2)/(τ2 − τ1)]·e^(−t/τ2)          (12)
where τ1 = 0.389, τ2 = 3.62 and τa = 3.04. After obtaining Eqs. 11 and 12, they were incorporated into the prediction algorithm (i.e., Eq. 2) and CTHA was evaluated using the test sequence shown in Figure 11.
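A small sketch of the dynamic response function in Eq. 12, evaluated at the parameter values reported above, is given below; it checks the boundary behavior required by Eq. 2 (g = 0 at t = 0 and g -> 1 as t -> infinity). The test times are arbitrary.

import numpy as np

tau1, tau2, taua = 0.389, 3.62, 3.04   # averaged dynamic parameters reported for Eq. 12

def g(t):
    # Second-order-plus-lead dynamic response function of Eq. 12.
    return (1.0
            + (taua - tau1) / (tau1 - tau2) * np.exp(-t / tau1)
            + (taua - tau2) / (tau2 - tau1) * np.exp(-t / tau2))

print(g(0.0))                          # ~0 (to rounding): correct initial behavior in Eq. 2
print(g(np.array([1.0, 5.0, 50.0])))   # rises towards 1.0 as the ultimate response is realised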
5. CLOSING REMARKS

This work presented a continuous-time Hammerstein modeling approach for dynamic non-linear MIMO predictive modeling that complements the use of the powerful field of statistical design of experiments (SDOE). It appears that Hammerstein modeling has seen much activity in discrete-time situations but very limited application in continuous-time modeling. The most common experimental design used for these modeling situations has been a pseudo-random sequence (PRS). Discrete-time or NARMAX modeling typically omits fitting interaction (i.e., cross product) terms because this causes huge parametrization, regardless of whether a PRS design would provide the needed information for estimation of the parameters, which is not likely when cross product terms are significant. A critical reason that the proposed method is successful is its ability to formulate output prediction in explicit form with separate terms for the steady-state and dynamic portions. Another key to its success is the ability of its prediction algorithm to achieve high accuracy with serial input changes over time. Parametrization for this approach is usually quite small in comparison to discrete-time modeling. Our main motivation for selecting the dryer process for this study was to demonstrate the ability of the proposed approach to model a real system. However, this process, to our surprise and disappointment, did not exhibit strong non-linear ultimate response behavior. We demonstrated the ability of the proposed approach to model dynamic non-linear behavior in the theoretical Hammerstein process studies.

6. ACKNOWLEDGMENTS

The author wishes to acknowledge the partial support for the research by the National Science Foundation under grant number CTS-9453534 and by the Frigidaire Corp., IEC (Iowa Energy Center), and CATD (Center for Advanced Technology Development). We are also grateful to the Program for Women in Science and Engineering at Iowa State University and Trisha Greiner for her assistance, as well as the organizers of The Life of a Process Model: From Conception to Action Workshop for providing support to attend.
7. LITERATURE CITED

Billings, S. A., 1980, "Identification of Nonlinear Systems - a Survey," IEE Proc., 127, pp. 272-285.
Eskinat, E., Johnson, S. H. and Luyben, W. L., 1991, "Use of Hammerstein Models in Identification of Nonlinear Systems," AIChE J., 37(2), pp. 255-268.
Haber, R. and H. Unbehauen, 1990, "Structure Identification of Nonlinear Dynamic Systems - A Survey on Input/Output Approaches," Automatica, 26(4), pp. 651-677.
Montgomery, D. C., 1984, Design and Analysis of Experiments, John Wiley and Sons, New York.
Pearson, R. K., and Ogunnaike, B. A., 1997, "Nonlinear Process Identification," Nonlinear Process Control, Prentice-Hall PTR, Upper Saddle River, NJ, pp. 11-110.
Rollins, D. K., P. Smith, and J. M. Liang, 1998, "Accurate Simplistic Predictive Modeling of Non-linear Dynamic Processes," ISA Transactions, Vol. 36, pp. 293-303.
Rollins, D. K., N. Bhandari, A. M. Bassily, and G. M. Colver, 2002, "A Continuous-Time Hammerstein Approach With Application to a Real Dynamic Batch Process," submitted to Ind. Eng. Chem. Res.
Su, H. T. and McAvoy, T. J., 1993, "Integration of Multilayer Perceptron Networks and Linear Dynamic Models: A Hammerstein Modeling Approach," Ind. Eng. Chem. Res., 32, pp. 1927-1936.
Process Design Under Uncertainty: Robustness Criteria and Value of Information

F. P. Bernardo^a, P. M. Saraiva^a and E. N. Pistikopoulos^b

^a Department of Chemical Engineering, University of Coimbra, Polo II, Pinhal de Marrocos, 3030 Coimbra, Portugal
^b Centre for Process Systems Engineering, Department of Chemical Engineering, Imperial College, London SW7 2BY, United Kingdom
In the last three decades, process design under uncertainty has evolved as an optimization-based tool to systematically handle uncertainty at an early design stage, thus avoiding overdesign, underdesign or other suboptimal decisions. In this chapter, we address this issue presenting a generic framework to guide the decision-maker on the design problem definition that systematises several considerations and assumptions, namely regarding constraints violation, uncertainty classification and modelling, operating policy and decision-making criteria under uncertainty. This generic framework is then explored for handling process robustness and value of information features. A case study of a reactor and heat exchanger system illustrates the several presented formulations, namely optimal process design accounting for product quality and optimal R&D decisions in order to selectively reduce uncertainties.

1. INTRODUCTION

Decision-making in the presence of uncertainty is a key issue in process systems design, since at this early stage decisions have to be made with limited knowledge, whether concerning the assumed process model (kinetic constants, transfer coefficients, etc.), external environment (product demand, raw material availability, etc.), among other sources of uncertainty. Before the development of systematic tools to handle such uncertainties, the traditional approach used to rely on a deterministic paradigm, where the optimal design values were determined based on nominal values of the uncertain parameters, thus considered to be perfectly known and to assume fixed values over the plant lifetime. The solution thus obtained was then corrected applying, for instance, empirical overdesign factors to the sizes of equipment units. Ranges for these factors can be found in the literature for different kinds of equipment (Rudd and Watson, 1968), but their general application is quite arguable, since they are only based
on accumulated experience, and extrapolation to a new specific situation may lead to severe over or underdesign. In practice, corrections were made, and are still being made, mainly based on the decision-maker's experience and intuition. In the last three decades, however, several systematic approaches have been proposed, based on optimization formulations and application of decision-making theories under uncertainty to process design problems (see Grossmann et al., 1983 and Pistikopoulos, 1995 for a review). The exponential growth of computational resources and the development of efficient numerical tools have obviously stimulated research in this area (Grossmann and Halemane, 1982; Diwekar and Kalagnanam, 1997; Acevedo and Pistikopoulos, 1998; Bernardo, Pistikopoulos and Saraiva, 1999a, amongst others), and made it able to handle problems of practical interest. In this chapter, we present a generic optimization framework for process design under uncertainty, mainly focussing on process robustness issues and the value of information regarding uncertain parameters. The proposed formulations are able to identify optimal solutions and corresponding accurate overdesign factors relative to a fully deterministic approach, and furthermore take into account process variability and R&D investments in order to selectively reduce uncertainties. It is thus an attempt to formally integrate our previous published work, namely robustness criteria in process design under uncertainty (Bernardo, Pistikopoulos and Saraiva, 1999b, 2001) and the value of information in process design decisions (Bernardo, Saraiva and Pistikopoulos, 2000), giving a general perspective of design problem formulations and a package of alternative or complementary design criteria that may be considered by the decision-maker. Therefore, the remaining parts of this chapter are organised as follows: in section 2 we highlight some concepts and criteria for process design decision-making under uncertainty; then, section 3 provides a quick tour through process design under uncertainty developments observed in the last three decades, using for that purpose an illustrative case study that comprises a chemical reactor and a heat exchanger; next, section 4 introduces a generic framework for process design problem formulations under uncertainty, and explains how it can be explored for handling robustness and value of information issues; finally, section 5 establishes some concluding remarks and points towards some possible lines for future work.
2. DECISION-MAKING CRITERIA UNDER UNCERTAINTY

In order to introduce some basic concepts and criteria for decision-making under uncertainty, let us start by briefly recalling a homely example taken from Rudd and Watson (1968), who took these issues into consideration for process design problems as early as in the sixties: An entrepreneur has contracted to paint a set of buildings and offered a one-year guarantee that the paint will not fade. If the paint fades, the entrepreneur must repaint the building at his own expense. The entrepreneur receives $500 for the contract and has available paints A, B and C at a cost of $200, $100 and $5, respectively. Paint A will never fade, paint B will fade after 250 days of sunshine, and paint C will fade after 50 days of direct sun. The cost of
labour is $200 regardless of the paint used. The contract did not say when the repainting job must be done, so the entrepreneur can use paint C to repaint any time within 50 days of the expiration date of the contract and fulfil the guarantee according to the letter of the law, if not the spirit. Which paint should the entrepreneur choose? The problem here is basically to decide on the paint in face of the uncertainty regarding next year's weather, and therefore a decision-making criterion is needed. Let us first consider the max-min profit criterion, which leads us to choose the paint maximizing the minimum profit that can possibly happen in face of the different weather scenarios. This is a pessimistic criterion, protecting the decision-maker against the worst case that can possibly happen. Three different scenarios should be discriminated: if θ stands for the number of days of sunshine next year, we'll have scenario θ(1) for 0 < θ < 50, θ(2) for 50 < θ < 250 and θ(3) for 250 < θ < 365. Table 1 shows the profit matrix, constructed computing the net profit that corresponds to each paint and for each θ scenario. According to the max-min criterion, paint A is the best, resulting in a minimum profit of $100.

Table 1. Profit and regret matrices ($).

                 Profit matrix              Regret matrix
            θ(1)    θ(2)    θ(3)        θ(1)    θ(2)    θ(3)
Paint A      100     100     100         195     100       0
Paint B      200     200      -5          95       0     105
Paint C      295      90      90           0     110      10
If the entrepreneur focuses attention on the loss of opportunity associated with a decision, he might prefer to adopt the min-max regret criterion, with the regret being computed as the difference between the profit that might have been made in the absence of uncertainty and the profit made in the given uncertain environment. In this case, we construct a regret matrix (Table 1), where for instance the use of paint C under scenario θ(2) has an associated regret value of $110, since only a $90 profit is achieved, while if paint B were to be chosen a $200 profit could have been obtained. Looking at this regret matrix, we conclude that paint B is the one that should be used. The above two criteria are both based on extreme uncertainty scenarios. If one has additional information about the probability associated with each one of the scenarios, i.e., in other words, if an uncertainty model can be constructed, then perhaps a decision based on average or expected profit is preferable. Suppose that the past weather records indicate probabilities of 0.52, 0.28 and 0.20, for scenarios θ(1), θ(2) and θ(3), respectively. Then, the maximum expected profit occurs for paint C: E(P) = 0.52×295 + 0.28×90 + 0.20×90 = $198. Note that this expected profit corresponds to the average profit to be realised over a large number of trials.
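The three criteria can be reproduced directly from the profit matrix; the short sketch below (an illustration, not from the chapter) does so for the painting example using the numbers of Table 1.

import numpy as np

# Profit matrix ($): rows = paints A, B, C; columns = scenarios theta(1), theta(2), theta(3).
profit = np.array([[100, 100, 100],
                   [200, 200,  -5],
                   [295,  90,  90]])
p = np.array([0.52, 0.28, 0.20])            # scenario probabilities
regret = profit.max(axis=0) - profit        # loss of opportunity in each scenario

paints = ["A", "B", "C"]
print(paints[np.argmax(profit.min(axis=1))])   # max-min profit      -> paint A
print(paints[np.argmin(regret.max(axis=1))])   # min-max regret      -> paint B
print(paints[np.argmax(profit @ p)])           # max expected profit -> paint C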
Therefore, this very simple example provides enough evidence to support that different best solutions can be found depending on the particular way uncertainties are handled. Ignoring that they exist, as is commonly done in many process design problems, is thus very likely to result in wrong or at least suboptimal solutions regarding namely equipment dimensions, operating conditions, forecasted economic performance, etc. Let us now analyse in more detail the concept of regret mentioned above. The regret may also be interpreted as the value of perfect information (VPI) about future events, i.e., in this case, next year's weather. This VPI is computed as the difference between two extreme behavioural models of action under future uncertainty: the wait-and-see model, where the decision is made after uncertainty realization, and the here-and-now model, where the decision is taken prior to uncertainty resolution (Ierapetritou et al., 1996). For instance, the here-and-now decision on paint C, supposing that scenario θ(2) will take place, has an associated VPI of $110, which is the difference between the profit of the best wait-and-see decision under the θ(2) scenario (paint B, $200) and the profit of the here-and-now decision on paint C ($90). Fig. 1 shows the VPI associated with paint C across the several θ scenarios.
[Fig. 1 appears here: profit ($) versus the three ranges of sunshine days (0-50, 50-250, 250-365), showing the wait-and-see decisions (each point corresponds to a different paint), the here-and-now decision on paint C, and the VPI.]
Fig. 1. Here-and-now versus wait-and-see decisions (painting example).

The expectation operator can also be applied to VPI: the decision on paint C, for instance, has an associated expected value of perfect information of EVPI = 0.52×0 + 0.28×110 + 0.20×10 = $32.8. A minimum EVPI criterion can then be constructed, leading to a decision with a minimum expected regret, which is precisely paint C in this case. The above concepts and criteria can also be used in the context of process design problems under uncertainty, which may be represented mathematically as follows:
optimize (over d, z, x)   Φ[f(d, z, x, θ)]          (1)
s.t.  h(d, z, x, θ) = 0
      g(d, z, x, θ) ≤ 0
      d ∈ D, z ∈ Z, x ∈ X, θ ∈ Θ,
where d, z and x are the vectors of design, control and state variables, respectively, θ represents the vector of uncertain parameters over the domain Θ, and h and g are vectors of process model equality and inequality constraints. The decision-making criterion is here to optimize Φ, where Φ is a function of the scalar f that defines a process performance metric (generally an economic indicator). For instance, if f is a profit function and an expected value criterion is adopted, Φ is the average profit obtained over the uncertainty space Θ, or the feasible part of Θ (the topic of feasibility in face of Θ will be addressed in section 3.3). In the case of θ uncertainty described by a joint probability density function (PDF) j(θ), the expectancy operator applied to a general scalar function f is provided by the following n-dimensional integral, where n is the number of uncertain parameters:
Eθ{f(θ)} = ∫Θ f(θ) j(θ) dθ          (2)
Although we will not cover here the associated numerical challenges, for complex problems it is well known that the computation of reliable estimates for a multidimensional integral such as the above can be rather time consuming, although different specific techniques are available for that purpose (for more on this topic, see Bernardo, Pistikopoulos and Saraiva, 1999a). Problem (1) is commonly formulated under the optimistic assumption that during process operation control variables z are optimally adjusted to uncertain parameters realisations, according to the observed state of the system (Watanabe et al., 1973; Grossmann and Sargent, 1978; Halemane and Grossmann, 1983; Pistikopoulos and lerapetritou, 1995). Such an operating policy will be here designated as perfect control operation. Given this assumption, problem (1) is formulated in two stages: the first (design) where design variables are selected (here-and-now decisions) and the second (operating) where a perfect control strategy is assumed (control variables as wait-and-see decisions): Design stage. optimizeO[/\d,6)],
deD^Oee
d
Operating stage: fXd,e) = m^ixf(d,z,x,e) z,x
s.t. h(d,z,x,6) = 0 g(d,z,x,e)<0 zeZ,xeX.
(3)
180 In the context of decision theory, the problem of process design under uncertainty can be interpreted as a two-person game between nature and the designer, where the uncertain parameters 6 are the nature's strategy, while the decisions d and z form the strategy of the designer (Watanabe et al., 1973). According to this interpretation, perfect control operation supposes that when the process is brought into the stage of operation, the nature's strategy 0 becomes clear and, consequently, controls z are adjusted so as to maximize/in the presence of each possible ^realisation. According to formulation (3), Table 2 summarises the possible decision criteria mentioned above in reference to the painting example, where the min-max regret criterion needs a formal definition ofVFl(d,0). In the case of problem (3), when perfect control operation is assumed, the value of perfect information will be
(4)
y?i(d,e) = f\0)-fxd,0i
where / " ( ^ ) is the process performance assuming design variables as wait-and-see decisions: / "((9) = m a x / ( J , z,x,6>) d,z,x
/^\
s.t. h(d,z,x,O) = 0
g(d,z,x,O)<0 d eD,zeZ,xeX and f\d,0)
as defined in (3) (design variables as here-and-now decisions).
Table 2. Decision criteria in process design under uncertainty problems. max-min performance max min / \d,0) d
e
min-max VPI
minmaxfVPI(J,(9)l
max expected performance
maxEQ {/\d,0)}
min expected VPI
min^0{VPI(J,<9)}
d
e ^
d
d
The choice of the most suitable metrics depends on the problem in hands and the decisionmaker own judgement. From the strictly mathematical and decision theory point of view, however, it is usual to assume that good criteria should verify a number of common sense properties, such as transitivity and strong domination. As pointed out by Rudd and Watson (1968), and among the first three criteria in Table 2, only the expected performance criterion meets all these tests. In the literature of process design under uncertainty, this is precisely the most widely used criterion, although exceptions may be cited: Nishida et al. (1974) proposed
181 a min-max cost criterion for the optimal synthesis of process systems; Watanabe et al. (1973) suggested an intermediate strategy between the min-max cost (pessimistic) and the minimum expected cost (optimistic) criteria; lerapetritou and Pistikopoulos (1994) and lerapetritou et al, (1996) presented formulations of operational planning under uncertainty where a restriction of maximum allowed regret is incorporated. In Watanabe et al. (1973) a study of different decision strategies is presented from the viewpoint of decision theory: the first three criteria in Table 1 and also other more sophisticated criteria are applied to a simple case study, illustrating the different corresponding solutions thus obtained. 3. PROCESS DESIGN UNDER UNCERTANTY: A QUICK TOUR Taking into account the previous considerations and process design decision-making criteria identified, we will now enumerate several possible approaches for problem formulation and solving, through a simple case study that will be described next. 3.1 An illustrative case study Fig. 2 presents a flowsheet consisting of a reactor and a heat exchanger (RHE), where a first order exothermic reaction A -> B takes place (Halemane and Grossmann, 1983; ChaconMondragon and Himmelblau, 1996). Table 3 describes the system mathematical model, including mass and heat balances, process constraints and quality specifications, while parameter nominal values are shown in Table 4. Process performance is quantified through the total plant annual cost, including investment and operating costs. The model variables can be classified as follows: design variables d= {V^}, control variables z = {FuF^^} and state variables x= {XA^TI, Ti, T^i}.
R CA\
F,r,
CAO
T\
To L ^
^1 ^1
J
^T^
A )HE "--^
'A
t>v2
Fig. 2. Reactor (R) and heat exchanger (HE) system.
182 Table 3. Reactor and heat exchanger system mathematical modef. Reactor material balance ^ ^Co(l-^.)F = 0, x ^ = % - ^ ^ 0 ^ ^ - • kj^ exp RT, •^AQ Reactor heat balance F,Cp{T, -T,)-F,Cp{T, -T2) + (-AHj,)FoX^ =0 Heat exchanger design equation (T,-T^2)"(T2-T^,) F,cJT,-T2) = AUAT,^, ATi^=In ^1 ~ ^w2 ^2 ~ ^wl
Heat exchanger energy balance Temperature bounds (K) Heat exchanger operation constraints
F\Cp(Ji -T2) = F^Cp^(T^2 -^wi) 3110,
T,2-TM^0
T,-T^2^nA, x^ > 0.9
r^-r^i>ii.i
Quality constraint Cost function ($/year) C - 691.2F^-^ + 873.6^^-^ +1J6F^ + l.Q56F^ Profit function ($/year) P = 15000x.-C ^ Variables: F, reactor volume (m^); A, heat transfer area for the heat exchanger (m^); Fi, reactant flowrate in the heat exchanger (kmol/h); F^, cooling water flowrate (kg/s); XA, conversion of A in the reactor; Tx, reactor temperature (K); 7^2, reactant temperature after cooling (K); T^2, cooling water outlet temperature (K).
The design problem, considering the model parameters to be exactly known and equal to the nominal values shown in Table 4, is to determine the best values for V and A and also the optimal nominal operating point {FuF^} (and corresponding state), so as to minimize the annual plant cost C. Attending to the process model, this design problem is then a non-linear non-convex optimization problem whose solution, using GAMS/MIN0S5, results in an overall cost of 12 230 $/year, that corresponds to the optimal deterministic design {V,A} = {4.429 m^,5.345 m^}. As expected, the optimizer moves the reactor temperature Ti to its upper bound (389 K), in order to increase reaction rates. This fact, by itself, reveals some limitations of such a deterministic paradigm, since for the system operating around the solution thus obtained with an active constraint the reactor temperature upper limit is likely to be violated: for instance, if the operational heat transfer coefficient becomes slightly smaller than the assumed value or the feed temperature increases momentarily. The traditional procedure adopted to prevent situations like this one is to apply empirical overdesign factors, that may however be inaccurate, therefore leading to unfeasible or too conservative design solutions.
183 Table 4. Parameter values (RHE system). Qo Concentration of A in the feed stream Fo Feed flowrate (pure A) To Feed temperature Twi Cooling water inlet temperature kR Arrhenius rate constant U Overall heat transfer coefficient E/R Ratio of activation energy to the perfect gas constant AHR Molar heat of reaction Cp Reactant heat capacity Cpvt; Cooling water heat capacity
32.04 kmol/m^ 45.36 kmol/h 333 K 293 K 12 h~^ 1635 kJ/(m^.h.K) 555.6 K -23 260 kJ/kmol 167.4 kJ/(kmol.K) 4.184 kJ/(kg.K)
For this reason, in the last three decades several developments try to address in an explicit and systematic way, based upon optimization formulations, the topic of uncertainty in process design. In this section we highlight some of these efforts, without being extensive and giving special emphasis to the assumptions and objectives underlying each one of the formulations, and their application to our RHE case study, rather than to the respective mathematical details. All the different optimization problem formulations that we will describe here and in the following sections have been solved using GAMS together with the solvers MIN0S5 or CONOPT (Brooke e^ a/., 1992). 3.2 Basic assumptions in the formulation of design problems under uncertainty The formulation of a design problem under uncertainty like (1) needs to address the following items: (i) hard/soft constraints, (ii) uncertainty classification and modelling, (iii) assumed operating policy in face of uncertainty and (iv) decision criteria. Decision criteria in face of uncertainty were already discussed in section 2, but it should be noticed that the generic function 0 ( / ) may cover other design objectives, besides strict process economics, such as flexibility, robustness, quality, safety, environmental concerns and value of information issues. We will consider some them in forthcoming sections, and for now focus our attention around points (i), (ii) and (iii) above. 3.2.1 Hard/soft constraints One of the key concepts in process design under uncertainty, as we'll see in section 3.3, is process flexibility, which is basically the probability of feasible operation in face of uncertainty. Complete feasibility over the entire 0 space is a usual assumption (for instance, Halemane and Grossmann, 1983) that can be, however, too conservative, especially concerning those constraints whose verification is not a strict demand and are only violated for a barely probable ^scenario. Furthermore, at a design stage there may be some uncertainty associated with the true upper/lower limits for a number of constraints. Finally, the 0 ( / ) gains
184 deriving from a solution with a reduced violation of a constraint under unlikely ^realisations may lead the decision-maker to prefer such a solution, rather than forcing strict compliance with all of the inequality constraints. Thus, one needs to make a clear distinction between hard constraints, i.e., those which must always be satisfied, and soft constraints, that may be violated for some realisations of the uncertain parameters. may be considered as a hard constraint if one knows for sure that above this value a reactant decomposition or phase transition occurs, safety issues arise, etc. However, especially at an early decision stage, this limit may be by itself uncertain and if a considerable benefit may result from its violation for given 0 realisations, then perhaps a soft constraint provides a better problem representation. Quality constraints, such as XA > 0.9, should usually be treated as soft, with the performance function / penalised with a quality loss term (Bernardo, Pistikopoulos and Saraiva, 2001), eventhough it is still common practice to adopt strict product specification limits as hard constraints to be verified. 3.2.2 Uncertainty classification and modelling Based on the sources of uncertainty, Pistikopoulos (1995) proposes an uncertainty classification with four categories (Table 5a), where for each category an example referring to our case study is also provided, together with information sources that may be used for reducing the corresponding present levels of uncertainty. A second possible classification is based on the uncertainty nature and the models adopted to describe it (Table 5b). The so-called "deterministic" uncertain parameters, here designated by categorical parameters, are well described by a set of TV^ discrete scenarios 6^ (or periods, in the case of uncertainty along time), with a given probability of occurrence (for instance, Grossmann and Sargent, 1978; Halemane and Grossmann, 1983). The seasonal variation of product B demand, treatment of different raw materials A or operation at different levels of capacity Fo are specific instances for this kind of uncertainties. The so-called stochastic uncertainties, on the other hand, have a continuous random variability, described by a joint probability density function (PDF) (Pistikopoulos and lerapetritou, 1995; Bernardo and Saraiva, 1998). This model seems to be more adequate to describe, for instance, modelinherent uncertainties or the variability of an operating variable in a steady-state process. The choice of an adequate PDF obviously depends on the available information, with an uniform distribution representing a maximum degree of ignorance and any other distribution function assuming greater knowledge about the uncertain parameters (Rudd and Watson, 1968). From the modelling point of view, the distinction between categorical and stochastic uncertain paranieters should not be taken in a severe way, since, for instance, intrinsically continuous stochastic parameters may be approximated by a scenario-based model and a given discrete PDF may also be chosen to describe a seasonal periodic fluctuation. A third approach for handling uncertainty, and probably the most ambitious one, is simply not to model it, but rather to solve problem (1) parametrically, in the space of uncertain parameters (Acevedo and Pistikopoulos, 1996, 1997; Pertsinidis et al., 1998). The resulting
solution is then itself a function of the uncertain parameter realisations, providing a full map of optimal decisions over Θ. Another relevant distinction is between hard and soft uncertainties (Table 5c). Like hard and soft constraints, this classification refers to process flexibility: hard uncertain parameters are those for which feasible operation must be ensured over the entire domain Θ, while for soft parameters a design decision that guarantees feasibility only over a part of the Θ domain is allowed. This classification is closely related to two distinct design objectives in face of process flexibility (see section 3.4). Hard parameters are usually also categorical, while a continuous stochastic model commonly describes soft parameters.

Table 5. Uncertainty classification.

a. Based on the source of uncertainty
- Model-inherent uncertainty. Example (RHE case study): kinetic constants, heat transfer coefficients (kR, U). Source of information: experimental and pilot plant data.
- Process-inherent uncertainty. Example: flowrate and temperature variations (Fi, Ti). Source of information: (on-line) measurements, equipment specifications.
- External uncertainty. Example: raw-material (A) availability, equipment cost coefficients. Source of information: historical data, market indicators.
- Discrete uncertainty. Example: equipment availability, seasonal product (B) demand. Source of information: supplier's specifications, operational and marketing data.

b. Based on uncertainty nature/uncertainty model
- Categorical parameters. Example: seasonal variation of product (B) demand. Uncertainty model: variability described by a set of scenarios, Θ = {θ: θ^L ≤ θ^i ≤ θ^U, i = 1, ..., N}.
- Continuous stochastic parameters. Example: Fi fluctuations about a steady-state nominal point. Uncertainty model: continuous random variability, Θ = {θ: θ ∈ j(θ)}.

c. Based on feasibility in face of Θ
- Hard parameters (usually also categorical). Example: seasonal variation of product B demand. Description: complete feasibility over Θ is required.
- Soft parameters (usually also continuous). Example: model-inherent uncertainties (kR, U). Description: feasibility is only required over R ⊆ Θ.

d. Based on eventual uncertainty reduction
- "Reducible" parameters. Example (RHE system): model-inherent uncertainties. Description: the present uncertainty level can be further reduced.
- "Non-reducible" parameters. Example: product B demand. Description: additional uncertainty reduction is beyond our control.
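The two uncertainty models in Table 5b can be made concrete with a short numerical sketch. The snippet below is an illustration only (Python with NumPy/SciPy assumed): the scenario values and probabilities for product B demand are invented for illustration, while the truncated-normal model for kR uses the case-study figures quoted later in this section.

```python
import numpy as np
from scipy import stats

# Categorical (scenario-based) model: seasonal demand of product B, one
# scenario per period with an assumed probability of occurrence.
scenarios = {"low": (30.0, 0.25), "nominal": (45.0, 0.50), "high": (60.0, 0.25)}
expected_demand = sum(value * prob for value, prob in scenarios.values())

# Continuous stochastic model: kR described by a normal PDF truncated at
# 3.09 sigma, i.e. kR = 12(1 +/- 0.2) h^-1 as in the RHE case study.
mu, rel_err = 12.0, 0.20
sigma = rel_err * mu / 3.09
kR_model = stats.truncnorm(-3.09, 3.09, loc=mu, scale=sigma)
kR_samples = kR_model.rvs(size=1000, random_state=0)

print(f"E[demand] = {expected_demand:.1f} (scenario model)")
print(f"kR samples span [{kR_samples.min():.2f}, {kR_samples.max():.2f}] h^-1")
```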
Regarding information about uncertain parameters, a fourth uncertainty classification is proposed by Bernardo, Saraiva and Pistikopoulos (2000), who distinguish between parameters whose present knowledge can be improved through further experimentation and parameters whose uncertainty reduction at the present time is believed to be beyond our control (Table 5d). Uncertain process model parameters, such as the kinetic parameter kR or the heat transfer coefficient U, fall in the first category, since further information about them can be obtained, although at a certain cost associated with laboratory or pilot plant experiments. On the other hand, parameters such as product B demand may belong to the second category, assuming that their variability is due to market fluctuations that cannot be further reduced or forecast with more accuracy than at present.

3.2.3 Operating policy in face of uncertainty

The selection of an operating policy in the presence of uncertainty is another issue to be considered when addressing problem (1). Perfect control operation, already formulated in section 2, corresponds to the most optimistic assumption. On the other hand, the most conservative policy is to assume fixed setpoints for the control variables, regardless of the operating information that will become available. Under this perspective, here designated as rigid control operation, both design and control variables are treated as here-and-now decisions that remain constant during process operation. The work by Diwekar and Rubin (1994) and Bernardo and Saraiva (1998) can be included in this category. While the rigid control policy is too conservative, the perfect control assumption is rather optimistic. A more realistic policy should fall somewhere between these two extreme approaches, selecting an operating policy that makes use of plant data through available supervisory control systems. The work by Bhatia and Biegler (1997) points in this direction, by considering that available information about uncertainty is subject to an assumed feedback control law relating state and control variables.

3.3 Flexibility Analysis

One of the first questions that arises when trying to solve the general formulation (1) is process feasibility in face of θ. Process flexibility is thus the ability that a process has to operate under feasible conditions in face of the considered uncertainty, for a fixed value of the design variables d. Two distinct problems may then be formulated in a flexibility analysis: (i) flexibility test, where one determines whether the process is feasible or not in face of Θ; (ii) flexibility index, where the goal is to determine the extent of process flexibility, according to a given measure. In both cases, perfect control operation is usually assumed. Referring to our case study, let us consider the design {V, A} = {5.3 m³, 5.5 m²}, with parameters kR and U taken to be uncertain and described by a range of possible values: kR = 12(1 ± 0.2) h⁻¹ (20% variation around the nominal value) and U = 1635(1 ± 0.3) kJ/(m²·h·K) (30% variation around the
nominal value). A feasibility test for a given design d and parameter realisation θ can be formulated as follows:

ψ(d, θ) = min_{u,z,x} u,  s.t.  h(d, z, x, θ) = 0 ∧ g(d, z, x, θ) ≤ u    (6)
with the projection of our feasible region onto the θ space being represented by the condition ψ(d, θ) ≤ 0 (see the qualitative illustration in Fig. 3). Assuming that this region is one-dimensional convex (let us call this convexity condition C), the flexibility test reduces to feasibility tests conducted over all of the Θ vertices (Swaney and Grossmann, 1985a). Solving (6) for the four vertices of our Θ rectangle, one verifies that the process is infeasible in face of the uncertainty level considered, since the maximum value of ψ is positive, occurring for both vertices (14.4, 1144.5) and (9.6, 1144.5), where the larger constraint violations occur. It should be noticed, however, that it is quite difficult, if not impossible, to verify whether condition C holds, since that would require making the x variables explicit in the h equations.
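A minimal numerical sketch of the vertex test (6) is given below. The constraint functions are deliberately simple placeholders, not the actual RHE model equations (the real h and g come from the reactor and heat-exchanger balances); the point is only to show the structure of the test: ψ(d, θ) is the smallest upper bound u on all inequality constraints, minimised over the control variables, and the design passes the flexibility test only if the maximum of ψ over the Θ vertices is non-positive.

```python
import numpy as np
from scipy.optimize import minimize

# Placeholder inequality constraints g_j(d, z, theta) <= 0 (a toy surrogate,
# NOT the actual RHE equations); z is a single control variable.
def g(d, z, theta):
    kR, U = theta
    V, A = d
    g1 = 0.9 - (1.0 - np.exp(-kR * V * z / 60.0))   # conversion specification
    g2 = 1000.0 * z - U * A / 8.0                   # heat-removal limitation
    return np.array([g1, g2])

def psi(d, theta):
    """Flexibility test (6): minimise u over (z, u) subject to g_j <= u."""
    cons = [{"type": "ineq", "fun": lambda v, j=j: v[1] - g(d, v[0], theta)[j]}
            for j in range(2)]
    res = minimize(lambda v: v[1], x0=[1.0, 0.0], constraints=cons,
                   bounds=[(0.1, 5.0), (None, None)], method="SLSQP")
    return res.fun

d = (5.3, 5.5)                                       # fixed design {V, A}
vertices = [(kR, U) for kR in (9.6, 14.4) for U in (1144.5, 2125.5)]
chi = max(psi(d, th) for th in vertices)             # flexibility test over vertices
print("flexible over Theta" if chi <= 0 else "infeasible for some vertex")
```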
Fig. 3. Illustration of flexibility index F and stochastic flexibility (SF) in the (kR, U) plane.

Several flexibility indices have been proposed, two of them being also illustrated in Fig. 3: the index F of Swaney and Grossmann (1985a) and stochastic flexibility (SF) (Pistikopoulos and Mazzuchi, 1990; Straub and Grossmann, 1990). Index F corresponds to inscribing within our feasible region the largest possible rectangle (hyperrectangle in the case of n uncertain parameters), while SF is a less conservative measure that corresponds qualitatively to the fraction of the Θ space that lies within the feasible region (the striped area). Under the convexity condition C, index F can be calculated as the minimum of the allowed deviations δ along each one of the directions defined between the nominal point θ^N and the Θ vertices. For each direction, δ is computed by a problem of the type:

δ = max_{u,z,x} u,  s.t.  h(d, z, x, θ) = 0 ∧ g(d, z, x, θ) ≤ 0,  θ = θ^N + u Δθ,  u ≥ 0    (7)
where Δθ is the expected deviation along that direction. Considering our assumed deviations (positive and negative) for parameters kR and U, the minimum of δ occurs along the direction of the vertex (14.4, 1144.5) and equals F = 0.5356. That is, the process is feasible within the rectangle defined by kR = 12(1 ± 0.2F) = 12(1 ± 0.11) h⁻¹ and U = 1635(1 ± 0.3F) = 1635(1 ± 0.16) kJ/(m²·h·K), qualitatively represented in Fig. 3 by the shaded area. When the number n of uncertain parameters increases, the above formulations, based on vertex enumeration, become computationally intensive. Swaney and Grossmann (1985b) propose two algorithms that avoid this explicit enumeration (a heuristic for direct vertex search and an implicit enumeration algorithm), while maintaining the limiting hypothesis of condition C. More sophisticated formulations, able to identify non-vertex solutions, have also been proposed (Grossmann and Floudas, 1987; Ostrovsky et al., 1994, 1999). Stochastic flexibility should only be defined in the case where a stochastic uncertainty model is available, that is, when the uncertain parameters are described by a joint PDF j(θ). SF is then the n-dimensional integral of j(θ) over the region R(d) (striped area in Fig. 3), which represents the portion of Θ lying within the feasible region, i.e., the probability for a given design d of having feasible operation:

SF(d) = ∫_{R(d)} j(θ) dθ    (8)

R(d) = {θ ∈ Θ | ∃(z, x): h(d, z, x, θ) = 0 ∧ g(d, z, x, θ) ≤ 0}    (9)
SF is thus dependent upon the values taken by the variables d, which correspond to a particular process design solution, and the major difficulty in evaluating SF, besides the numerical problems that arise for high values of n, is that the integration region R(d) is only implicitly known. Straub and Grossmann (1990) propose an integration technique (here designated as the collocation technique) that overcomes this difficulty; it is based on a product Gauss formula obtained by applying a Gaussian quadrature to each dimension of Θ, with the points placed within R(d).
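The collocation idea can be sketched in a few lines. In the fragment below the limits of R(d) are supplied as simple placeholder functions (in practice they come from optimization problems such as (10) below); the truncated-normal PDFs and the 5-point product Gauss rule follow the case-study description, so only the feasible-region boundaries are assumptions.

```python
import numpy as np
from numpy.polynomial.legendre import leggauss
from scipy import stats

def trunc_norm(mu, rel_err):
    """Independent normal PDF truncated at 3.09 sigma (relative error rel_err)."""
    return stats.truncnorm(-3.09, 3.09, loc=mu, scale=rel_err * mu / 3.09)

pdf_kR, pdf_U = trunc_norm(12.0, 0.20), trunc_norm(1635.0, 0.30)

# Limits of R(d) along each direction; in the text these come from problems
# like (10), here they are placeholder values/functions for illustration.
kR_lo, kR_hi = 10.03, 14.4                     # feasible kR interval (from the text)
def U_limits(kR):                              # assumed U-range for a given kR
    return 1144.5 + 40.0 * (14.4 - kR), 2125.5

def stochastic_flexibility(n=5):
    x, w = leggauss(n)                         # Gauss-Legendre nodes/weights on [-1, 1]
    sf = 0.0
    for xi, wi in zip(x, w):
        kR = 0.5 * (kR_hi + kR_lo) + 0.5 * (kR_hi - kR_lo) * xi
        Ulo, Uhi = U_limits(kR)
        for xj, wj in zip(x, w):
            U = 0.5 * (Uhi + Ulo) + 0.5 * (Uhi - Ulo) * xj
            jac = 0.25 * (kR_hi - kR_lo) * (Uhi - Ulo)
            sf += wi * wj * jac * pdf_kR.pdf(kR) * pdf_U.pdf(U)
    return sf

print(f"SF ~ {stochastic_flexibility():.4f}")
```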
Let us now consider kR and U described by independent normal PDFs, truncated to 3.09 sigma between the limits shown in Fig. 3. This means that, taking for instance kR, the probability of 9.6 ≤ kR ≤ 14.4 under the corresponding (untruncated) normal distribution is about 0.998. The limits of R(d) along the kR direction are first evaluated by solving the two problems:

kR^L = min_{kR,U,z,x} kR,  s.t.  h(d, z, x, θ) = 0 ∧ g(d, z, x, θ) ≤ 0
kR^U = max_{kR,U,z,x} kR,  s.t.  h(d, z, x, θ) = 0 ∧ g(d, z, x, θ) ≤ 0    (10)

resulting in the feasible interval [10.03, 14.4]. One then collocates 5 quadrature points within this interval and, for each one of them, solves problems similar to (10), but relative to parameter U, evaluating the limits of R(d) along the U direction. Using also 5 quadrature points along this direction, a grid with a total of 5×5 = 25 points is obtained, and the integration formula based on this grid gives the estimate SF = 0.9610, which means that the probability of feasible operation under the uncertainty of kR and U is about 0.96.

3.4 Optimal Design Formulations

Although flexibility analysis does not strictly establish an optimal design approach, since it is performed for a fixed value of the design variables d, it is, as we will see, an important tool to formulate optimal process design problems under uncertainty. The concept of flexibility gives rise to two distinct design objectives (Grossmann et al., 1983; Pistikopoulos, 1995): (i) design for a fixed degree of absolute flexibility, ensuring feasible operation for any possible realisation of the uncertain parameters; (ii) design for an optimal level of flexibility, exploring the trade-offs between flexibility and economics. Regarding Fig. 3, in objective (i) the triangle (feasible region) must enclose the rectangle Θ, while in objective (ii) part of the rectangle may be outside the triangle, especially if the corresponding solution is interesting enough from other perspectives (such as expected profit). Objective (i) is associated with hard uncertain parameters, for which complete feasibility must be ensured over Θ. In this case, the uncertainty space Θ is usually approximated by a set of discrete scenarios with a given probability, and, as a result, the original problem (1) is transformed into a multiperiod optimization problem (Grossmann and Sargent, 1978; Halemane and Grossmann, 1983). The main difficulty is then to select a finite number of scenarios θ^i so as to ensure feasible operation over the entire continuous space Θ. This guarantee exists if Θ is a hyperrectangle, the convexity condition C holds and all the vertices of Θ are included in the set of points considered. Let us now revisit our case study with kR and U described by normal distributions truncated to the rectangle Θ of Fig. 3. Although this corresponds to a continuous stochastic model, it
can be incorporated within objective (i), where feasible operation is ensured over the entire Θ rectangle. Using the Gaussian formula mentioned above, now with the 25 points having a fixed location in Θ, adding to this set the 4 vertices of Θ so as to ensure complete feasibility, and considering an expected profit criterion, one obtains the following optimization problem:

max_{d,z^i,x^i} Σ_{i=1}^{29} w^i P(d, z^i, x^i, θ^i),  s.t.  h(d, z^i, x^i, θ^i) = 0 ∧ g(d, z^i, x^i, θ^i) ≤ 0,  i = 1, ..., 29    (11)
Note that the assumption of perfect control operation is also implicit here, since the z and x variables are indexed over i. The weights w^i corresponding to the integration points are calculated according to the quadrature and the respective j(θ) values, while for the 4 vertices w^i = 0. This problem formulation results in the final optimal solution that corresponds to column A in Table 7 (section 4.1). Objective (ii) is associated with the so-called soft uncertain parameters, for which complete feasibility over Θ is not a strict demand and which are usually described by continuous stochastic models. Two distinct formulations may be followed: maximize stochastic flexibility subject to a cost upper limit (Straub and Grossmann, 1993), or maximize a profit integral over the feasible region R(d) ⊆ Θ (Pistikopoulos and Ierapetritou, 1995). The basic idea behind this second formulation is to explore the trade-off between flexibility and profitability, considering design solutions whose feasible region does not cover the entire Θ space but which are associated with large average profit scores. Mathematically, this can be formulated as maximizing the integral of the profit function over the region R(d) ⊆ Θ, which is equivalent to maximizing the product between the profit expected value over R(d) and the corresponding stochastic flexibility:

max_d ∫_{R(d)} P*(d, θ) j(θ) dθ = max_d { E_{θ∈R(d)}[ P*(d, θ) ] · SF(d) }    (12)
The integrand in (12), according to the assumption of perfect control, is itself an optimization problem that corresponds to the operating stage (just like in problem (3)):

P*(d, θ) = max_{z,x} P(d, z, x, θ),  s.t.  h(d, z, x, θ) = 0 ∧ g(d, z, x, θ) ≤ 0    (13)
Since R(d) is only implicitly known, the two-stage design problem (design stage (12) plus operating stage (13)) has to be solved by a decomposition strategy. Making use of Generalised Benders Decomposition (GBD) and the collocation technique mentioned above, the original problem is converted into a sequence of smaller problems. For a fixed value of d, feasibility subproblems like (10) are solved and integration points are collocated within R(d). For each of these points, problem (13) is then solved and the respective profit integral estimated, which provides a lower bound for the solution of problem (12). An upper bound and a new estimate of d can be obtained by solving a suitable master problem constructed upon GBD principles. Repeating this procedure, the upper bound converges to the greatest lower bound, and the optimal design solution of (12) is then obtained.
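For a fixed design d, the design-stage objective in (12) can be evaluated by quadrature once the operating-stage optima (13) are available. The one-dimensional sketch below uses an assumed analytic surrogate for P*(d, θ) (each evaluation would in reality require solving (13)) and the kR feasible interval quoted above; it simply illustrates the relation between the profit integral over R(d), the conditional expected profit and SF(d).

```python
import numpy as np
from numpy.polynomial.legendre import leggauss
from scipy import stats

# One-dimensional illustration of (12)-(13) for a fixed design d: theta = kR only.
pdf = stats.truncnorm(-3.09, 3.09, loc=12.0, scale=12.0 * 0.20 / 3.09)
kR_lo, kR_hi = 10.03, 14.4                 # feasible interval from problems like (10)

def P_star(kR):                            # assumed surrogate for the operating stage (13)
    return 900.0 + 60.0 * (kR - 12.0)

x, w = leggauss(7)                         # 7-point Gauss-Legendre rule on [-1, 1]
kR = 0.5 * (kR_hi + kR_lo) + 0.5 * (kR_hi - kR_lo) * x
jac = 0.5 * (kR_hi - kR_lo)
profit_integral = np.sum(w * jac * P_star(kR) * pdf.pdf(kR))   # left-hand side of (12)
SF = np.sum(w * jac * pdf.pdf(kR))
cond_mean = profit_integral / SF           # E[P*(d, theta) | theta in R(d)]
print(f"profit integral = {profit_integral:.1f} $/yr")
print(f"E[P*|R(d)] = {cond_mean:.1f} $/yr, SF = {SF:.4f}, product = {cond_mean * SF:.1f}")
```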
The application of this strategy to our case study, considering the profit function and the stochastic model for kR and U mentioned above, results in solution B of Table 7 (section 4.1).
4. A GENERIC FRAMEWORK FOR PROCESS DESIGN UNDER UNCERTAINTY

Given the set of decision criteria and problem formulations presented in the previous sections, we will now revisit the generic formulation (1) for a process design problem under uncertainty:

optimize_{d,z,x} Φ_Θ[ f(d, z, x, θ) ]
s.t.  h(d, z, x, θ) = 0    (1)
      g(d, z, x, θ) ≤ 0
      d ∈ D, z ∈ Z, x ∈ X, θ ∈ Θ

The complete problem definition and solution comprises, given an available process model, the following steps:
Step 1. Distinguish hard from soft constraints.
Step 2. Identify and classify the significant uncertainties that are present.
Step 3. Define an assumed operating policy in face of these uncertainties.
Step 4. Establish adequate decision criteria.
Step 5. Formulate the specific optimization problem according to the previous steps.
Step 6. Characterise the obtained formulation in terms of optimization and numerical details.

This generic framework for process design problem formulation will now be illustrated for the situations already introduced in section 3.4, while in the following sections we will see how it can cover situations where robustness (4.1) and value of information (4.2) issues are also to be taken into account.

Formulation A: Complete feasibility (Grossmann and Sargent, 1978; Halemane and Grossmann, 1983)
Step 1. All constraints are hard.
Step 2. Model-inherent, continuous stochastic, hard and non-reducible uncertainties (kR and U). The assumption of hard parameters guarantees an optimal design with SF = 1.
Step 3. Perfect control operation.
Step 4. Maximize expected performance over the Θ space.
Step 5. Problem formulation
max_d E_Θ{ f*(d, θ) }
f*(d, θ) = max_{z,x} f(d, z, x, θ),  s.t.  h(d, z, x, θ) = 0 ∧ g(d, z, x, θ) ≤ 0,  ∀θ ∈ Θ    (14)
Step 6. Problem (14) is a two-stage optimization problem, but using an integration formula with N_I points along Θ it can be formulated as a single-level multiperiod problem like (11):

max_{d,z^i,x^i} Σ_{i=1}^{N} w^i f(d, z^i, x^i, θ^i),  s.t.  h(d, z^i, x^i, θ^i) = 0 ∧ g(d, z^i, x^i, θ^i) ≤ 0,  i = 1, ..., N    (15)
Complete feasibility is ensured (subject to the convexity condition C) by adding the Θ vertices to the set of integration points, leading to a total number N equal to N_I + 2^n, where in the objective function a zero weight w is assigned to the Θ vertices. In the case of our RHE example, no numerical difficulties were encountered since two uncertain parameters are considered, thus resulting in only 25 quadrature points. For higher values of n (the number of uncertain parameters) more efficient integration tools should be used, such as sampling techniques (Diwekar and Kalagnanam, 1997) or specialised cubatures (Bernardo, Pistikopoulos and Saraiva, 1999a). Another difficulty that arises when n increases is the dimension of the associated optimization problem. That being the case, the use of a decomposition strategy may then be considered (Grossmann and Halemane, 1982). Recent advances in directly solving large multiperiod problems like (11) have been reported by van den Heever and Grossmann (1999).

Formulation B: Explicit feasible region evaluation (Pistikopoulos and Ierapetritou, 1995)
Step 1. All constraints are hard.
Step 2. Model-inherent, continuous stochastic, soft and non-reducible uncertainties (kR and U).
Step 3. Perfect control operation.
Step 4. Maximize the integral of the performance function f over the feasible region R(d) ⊆ Θ (the trade-off between profitability and flexibility is explored).
Step 5.

max_d ∫_{R(d)} f*(d, θ) j(θ) dθ
f*(d, θ) = max_{z,x} f(d, z, x, θ),  s.t.  h(d, z, x, θ) = 0 ∧ g(d, z, x, θ) ≤ 0    (16)
R(d) = {θ ∈ Θ | ∃(z, x): h(d, z, x, θ) = 0 ∧ g(d, z, x, θ) ≤ 0}
Step 6. Problem (16) is a two-stage optimization problem, with R(d) only implicitly known, decomposable into a sequence of smaller problems using Generalised Benders Decomposition together with a collocation technique. No global solution is guaranteed, since the limits of R(d) are a function of d with unknown convexity properties. In our case study, no significant numerical difficulties were encountered since only two uncertain parameters are considered, thus resulting in only 25 quadrature points. For higher values of n, integration using a more efficient technique, beyond product formulae, has to be performed over Θ (with points outside R(d) being rejected), since there is no method (as far as we know) to collocate integration points within R(d).

The above formulation B deserves some remarks, namely regarding the decision criterion adopted. The profit integral over R(d), as stated before, is the product between the profit expected value over R(d) and the corresponding stochastic flexibility. In this way, formulation B does explore the trade-off between profitability and flexibility, although a simple product may not be the most suitable way of doing so. Furthermore, for the design solution thus obtained, there is no guarantee about the flexibility level reached, nor control over which constraints are being violated, and to what extent, over the portion of Θ that lies outside R(d). A possible strategy to cover these issues is perhaps to replace a soft uncertainty approach with a soft constraint approach. Indeed, formulation B can be seen as a relaxation of formulation A, assuming soft uncertainty, and thus may lead to design solutions that do not ensure complete feasibility over Θ, that is, with SF < 1. If we instead relax formulation A assuming that some of the constraints are soft, we are then able to supervise their violation through a penalty term in the objective function, or by explicitly limiting the probability and/or expected extent of violation, while simultaneously optimizing the expected process performance over Θ. A formulation of this type (formulation C) is presented in the following section, where quality constraints are considered soft and a continuous loss is associated with their violation.

4.1 Process robustness criteria

The formulations presented so far do not explicitly take into account process performance variability. For instance, perfect control operation, assumed in (16), may lead to excessive dispersion in relevant quality variables. Some criteria that make decisions sensitive to process robustness may thus be quite helpful. Here, we define robustness as the ability that a process has to operate under changing conditions with a relatively constant performance, that is, process insensitivity to uncertainty realisations. Let us designate by y a set of quality-related process variables (usually a simple function of state and control variables), with desired values y*. Regarding our case study, the conversion of A in the reactor can be considered to be a quality variable with xA* = 0.9. One way to directly control process variability would be to include in the problem formulation a restriction of maximum allowed variance for some or all of the y variables. Although this simple criterion will also be considered, there are more meaningful forms of penalising variability, namely according to Taguchi's perspective of continuous quality loss (Phadke, 1989). The y deviation from y* is thus penalised through an economic quality loss, L, which may also be designated as a quality cost C_Q, usually given by a quadratic function of the type:
C_Q = k (y − y*)²    (17)

where k is a penalty constant, also known as the quality loss coefficient. It can be easily demonstrated (Phadke, 1989) that the expected value of the loss function (17) is:

E_Θ(C_Q) = k [ σ² + (μ − y*)² ]    (18)
where μ and σ are the mean and standard deviation of the quality variable y. Equation (18) clearly shows the two components taken into account here: the loss associated with variability (kσ²) and the loss resulting from the deviation of the mean value from our desired target [k(μ − y*)²]. Fig. 4 illustrates two different perspectives in view of the constraint y ≥ y*.

Fig. 4. Quality cost models according to a) Taguchi's perspective and b) hard constraint perspective.
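A small numerical check of (17)-(18) can be useful. In the sketch below the loss coefficient k and the (μ, σ) pairs are illustrative values only, not the case-study ones; the one-sided variant used later in the chapter is estimated by Monte Carlo under a normality assumption.

```python
import numpy as np

def expected_quadratic_loss(k, mu, sigma, y_target):
    """Expected Taguchi loss (18): k*(sigma^2 + (mu - y_target)^2)."""
    return k * (sigma**2 + (mu - y_target) ** 2)

# Illustrative numbers for a conversion-type quality variable with target 0.9.
k, y_t = 1.0e4, 0.90          # assumed loss coefficient, not the case-study value
for mu, sigma in [(0.903, 0.0046), (0.901, 0.0055)]:
    print(mu, sigma, expected_quadratic_loss(k, mu, sigma, y_t))

# One-sided (asymmetric, k2 = 0) loss: quick Monte Carlo estimate under a
# normal assumption for y.
rng = np.random.default_rng(0)
y = rng.normal(0.903, 0.0046, size=200_000)
one_sided = np.mean(np.where(y < y_t, k * (y - y_t) ** 2, 0.0))
print("one-sided expected loss ~", one_sided)
```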
Taguchi's perspective of continuous quality loss can be formulated by simply considering quality constraints as soft and thus relaxing them. Process robustness may be guaranteed by incorporating in the problem formulation two different types of elements: (i) a penalty term in the objective function, such as a Taguchi loss function like (17); (ii) an explicit restriction over process robustness metrics, such as an upper bound hard constraint over the variance of a quality-related variable (Bernardo, Pistikopoulos and Saraiva, 2001). For the first case, Table 6 provides four kinds of loss functions based on the quadratic form (17), together with relevant application examples. In the second case, a general robustness metric can be defined as a function r of the statistical moments m_y of the quality variable y, with the following constraint being added to the problem formulation: r(m_y) ≤ γ. The statistical moments m_y are easily obtained using the expectancy operator, with the first three (mean μ_y, variance σ_y² and skewness ξ_y) given by:

μ_y = E_Θ(y)
σ_y² = E_Θ[ (y − μ_y)² ]    (19)
ξ_y = E_Θ[ ((y − μ_y)/σ_y)³ ]
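In practice the moments in (19) are estimated from quadrature points or samples of the quality variable. A minimal sketch, using an assumed sample of xA values rather than the actual process model output, is shown below, together with the check of a robustness constraint of the form σ(y)/μ(y) ≤ γ used later in this chapter.

```python
import numpy as np

def moments(y, weights=None):
    """Mean, variance and skewness of a quality variable y, as in (19),
    with the expectancy operator approximated by (weighted) samples."""
    w = np.full(len(y), 1.0 / len(y)) if weights is None else np.asarray(weights)
    mu = np.sum(w * y)
    var = np.sum(w * (y - mu) ** 2)
    skew = np.sum(w * ((y - mu) / np.sqrt(var)) ** 3)
    return mu, var, skew

rng = np.random.default_rng(1)
xA = rng.normal(0.9027, 0.0055, size=100_000)       # illustrative x_A distribution
mu, var, skew = moments(xA)
sigma = np.sqrt(var)
print(mu, sigma, skew)
print("robustness constraint sigma/mu <= 0.01 satisfied:", sigma / mu <= 0.01)
```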
The generic robustness criterion r(m_y) ≤ γ can represent several situations, namely hard quality constraints of the form μ(y) ≥ γ, σ(y) ≤ γ, σ(y)/μ(y) ≤ γ, or constraints for six-sigma performance (see section 4.1.1). It can also describe more sophisticated criteria, such as one-sided robustness criteria, where for instance only variability above or below a specification is penalised (Ahmed and Sahinidis, 1998), or where an upper limit is imposed on the probability of a soft inequality constraint violation and/or its expected extent of violation (Samsatli et al., 1998).

Table 6. Taguchi loss functions based on the quadratic form C_Q = k(y − y*)².
(L1) Nominal-the-best [symmetric]. k values: same k for all y. Example of a quality variable: product stream with a target composition y*.
(L2) Nominal-the-best [asymmetric]. k values: k = k1 if y < y*, k = k2 if y > y*. Example: product stream with a minimum purity requirement y* (k2 = 0).
(L3) Larger-the-better. k values: k = k1 if y < y*, k = 0 if y > y*. Example: product stream purity with maximum possible value y*.
(L4) Smaller-the-better. k values: k = 0 if y < y*, k = k2 if y > y*. Example: concentration of a pollutant in a waste stream where the minimum concentration that can be achieved is y*.
Using our previous generic problem formulation framework, and adding to it the above robustness considerations, one obtains the following problem statement, which can be seen as a relaxation of formulation A in which some of the constraints are assumed to be soft.

Formulation C: Robust formulation with complete feasibility (Bernardo, Pistikopoulos and Saraiva, 2001)
Step 1. Quality constraints are soft (xA ≥ 0.9 is soft); all the other constraints are hard.
Step 2. Model-inherent, continuous stochastic and non-reducible uncertainties (kR and U). Parameters are considered to be soft with respect to quality constraints and hard with respect to hard constraints.
Step 3. Perfect control operation.
Step 4.
4.1 Robustness criteria: quality constraints are relaxed, with the process performance f being penalised through a Taguchi loss function; if desired, an additional hard quality constraint of the form r(m_y) ≤ γ can be added, where r is a function of the statistical moments m_y of the quality variable y. In the case of our example, we relax xA ≥ 0.9 (it simply vanishes) and penalise profit with a one-sided loss function (asymmetric nominal-the-best loss function with k2 = 0):

C_Q = k1 (xA − 0.9)²  if xA < 0.9
C_Q = 0  if xA ≥ 0.9,  with k1 = 6.4×10^

4.2 Decision criterion: maximize expected performance over the Θ space (the trade-off between profitability and robustness is explored).
Step 5. Problem formulation similar to (14) but incorporating the robustness criteria mentioned in 4.1. In our case study, the profit function is penalised with C_Q and the following constraints are added to the operating stage, in order to switch between the penalty above and below xA* = 0.9:
C_Q ≥ k1 (ΔxA)²,  ΔxA ≥ xA* − xA    (C1)
ΔxA ≥ 0    (C2)
Since C_Q is being minimized, when xA < xA*, (C1) is active; otherwise, (C2) is active and the quality cost is therefore equal to zero.
Step 6. Same as in formulation A.

Formulation C can be further relaxed by not including the Θ vertices in the multiperiod optimization problem. The stochastic flexibility of the design solution thus obtained should then be verified a posteriori. This provides a quick way to estimate the loss of opportunity
associated with the assumptions of hard uncertain parameters and hard constraints. We designate this variation as the C-soft problem formulation. As stated before, formulation B can be seen as a relaxation of formulation A assuming soft uncertainty, while C results from A by relaxing the quality constraints. Considering the same robustness issues as in C and the same decision criterion as in B, we now present a mixed formulation where both types of relaxation are present, that is, soft quality constraints and soft uncertainties.

Formulation D: Robust formulation with explicit feasible region evaluation
Step 1. Quality constraints are soft (xA ≥ 0.9 is soft); all the other constraints are hard.
Step 2. Model-inherent, continuous stochastic, soft and non-reducible uncertainties (kR and U).
Step 3. Perfect control operation.
Step 4.
4.1 Robustness criteria: same as in formulation C.
4.2 Decision criterion: maximize the integral of the performance function f over the feasible region R(d) ⊆ Θ (the trade-off between profitability, flexibility and robustness is explored simultaneously).
Step 5. Problem formulation similar to (16), incorporating the robustness criteria described in formulation C.
Step 6. Same as in formulation B.

Table 7 shows the solutions obtained according to the five alternative formulations mentioned so far: A, B, C, C-soft and D. Overdesign factors (odf) for both the reactor volume V and the heat exchanger area A are also shown; they were computed as the ratio between the solutions obtained and the one that corresponds to a fully deterministic approach (whose solution is {V, A} = {4.429 m³, 5.345 m²}). The respective SF, μ and σ values were estimated by numerical integration, using the product Gauss formula mentioned earlier with 25 points along Θ (solutions A, C and C-soft) or R(d) (solutions B and D), and are also included in Table 7.

Table 7. Optimal design solutions considering kR and U uncertainties (RHE system).
Formulations: A, B, C, C-soft, D.
E(P) ($/year): 1022*, 1021, 708, 961*, 999
V (m³): 4.583, 4.513, 5.545, 4.898, 4.513
odf(V): 1.02, 1.02, 1.03, 1.25, 1.11
A (m²): 6.224, 6.175, 6.739, 6.199, 6.495
odf(A): 1.22, 1.26, 1.15, 1.16, 1.16
SF: 1, 0.9920†, 0.9919†, 0.9335, 1, 0.9182
μ(xA): 0.9092, 0.9014, 0.9014, 0.9027
σ(xA): 0.004676, 0.004568, 0.005533, 0.005465, 0.005533
* Ratio between the profit integral over R(d) and SF.
† Estimated using 36 quadrature points; the infeasible region is a narrow band for low values of U and kR close to 12 h⁻¹.
Solution A is the most conservative one, presenting the largest overdesign factors. The other formulations are obviously less conservative due to the soft uncertainty and/or soft constraint relaxations. The comparison between different solutions requires a careful analysis of the assumptions underlying the respective formulations. For instance, when solutions A and B are compared, one should bear in mind that their respective objective functions are equivalent only if in formulation A a zero profit is assigned to θ points outside R(d). If negative profits are observed within R(d), which is the case for this example, the direct comparison of expected profit values is not fair. The same remark applies when solutions C and D are compared. Formulations with similar assumptions, on the other hand, can be directly confronted. For instance, looking at solutions A and C, a 41% rise in expected profit can be associated with the relaxation of the xA ≥ 0.9 constraint. This increase indicates that the quality constraint, when assumed to be a hard constraint, is a significant source of infeasibility and consequent overdesign. The expected profit increase is not so significant when solutions B and D are compared, indicating that the problem relaxation corresponding to the assumption of soft uncertainty makes the quality constraint less critical. Comparing solutions C and C-soft, one can see that the hard uncertainty and hard constraint assumptions are not a significant source of overdesign. Regarding process variability, looking at solutions A and C we realise that the quality constraint relaxation results in a greater xA standard deviation, together with a smaller average value, very close to the constraint lower limit. Larger quality loss coefficients will result in optimal solutions with reduced xA standard deviations, and a parametric study may be conducted in this regard.

4.1.1 Design formulation for six-sigma quality

Six-sigma quality is a statistical standard metric for process performance and a philosophy of customer satisfaction. As a statistical standard, six-sigma refers to keeping critical-to-quality characteristics (y variables) within plus or minus six standard deviations of their mean values, with these means having a maximum deviation from the corresponding target values of plus or minus 1.5 standard deviations (Craig, 1993). Based on the assumption that the random y variables are normally distributed, six-sigma quality guarantees a defect rate of no more than 3.4 parts per million (Deshpande, 1998). As a philosophy, six-sigma is the commitment to continuous improvement by reducing variation and increasing the robustness of processes and products (Craig, 1993; Harry and Schroeder, 2000). The six-sigma quality metric can be easily incorporated into both of the formulations C and D presented above, in the form of a hard quality constraint r(m_y) ≤ γ. The process capability indices Cp and Cpk (Deshpande, 1998) are usually used to quantify six-sigma performance. For a one-sided lower specification y^L and considering a 3-sigma variation below the target, Cp is defined as:
Cp = (y* − y^L) / (3 σ_y)    (20)
The Cp index can be interpreted as the inverse of a normalised standard deviation and is thus inversely proportional to the variability of y. The Cpk index, on the other hand, is a measure of how close the mean of the y distribution is to the target value y*, and in the same case of a one-sided lower specification y^L it can be defined as:
Cpk = Cp [ 1 − |y* − μ_y| / (y* − y^L) ]    (21)
If the mean of the distribution equals the target value y*, then Cpk = Cp; otherwise Cpk increases as the mean of the distribution gets closer to it. Similar definitions can be established in the case of a one-sided upper or a two-sided specification. According to the above equations, six-sigma performance can be guaranteed by adding to our design formulation the following hard quality constraints:

Cp ≥ 2,  Cpk ≥ 1.5    (22)
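For a one-sided lower specification, equations (20)-(22) translate directly into a short capability check. The numbers below are chosen close to the case-study solution quoted next, purely to illustrate the computation.

```python
def capability_one_sided_lower(mu, sigma, y_target, y_lower):
    """Cp and Cpk for a one-sided lower specification, as in (20)-(21)."""
    Cp = (y_target - y_lower) / (3.0 * sigma)
    Cpk = Cp * (1.0 - abs(y_target - mu) / (y_target - y_lower))
    return Cp, Cpk

Cp, Cpk = capability_one_sided_lower(mu=0.9010, sigma=0.005,
                                     y_target=0.90, y_lower=0.87)
six_sigma = (Cp >= 2.0) and (Cpk >= 1.5)        # hard constraints (22)
print(f"Cp = {Cp:.3f}, Cpk = {Cpk:.3f}, six-sigma performance: {six_sigma}")
```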
Referring to our case study, when a lower specification xA^L = 0.87 is considered, a C-type formulation incorporating constraints (22) results in a slightly more severe solution, mainly due to an increase in the operational costs: E_Θ(P) = 920 $/year, V = 4.510 m³, A = 6.494 m², μ(xA) = 0.9010, σ(xA) = 0.005000, Cp = 2.000 and Cpk = 1.936.

4.2 Value of information regarding uncertain parameters

In the case of perfect control operation, the value of perfect information for a given design decision d was already defined in section 2 as being given by:

VPI(d, θ) = f*(θ) − f*(d, θ)    (4)
where f*(θ) stands for the process performance assuming both design and control variables as wait-and-see decisions (wait-and-see performance) and f*(d, θ) stands for the same process performance but with the design variables decided prior to the uncertainty realisation (here-and-now performance). In section 3.2.2 we introduced the distinction between reducible uncertain parameters, whose present knowledge can be improved at a certain R&D cost, and non-reducible parameters, whose uncertainty reduction is believed to be beyond our control. In both cases, the above definition of VPI (4) applies, if perfect control operation is assumed. Ierapetritou et al. (1996) have addressed this question focusing on non-reducible parameters and their expected VPI in the context of operational planning problems, while Bernardo, Saraiva and Pistikopoulos (2000) have considered the case of reducible uncertain parameters at an early process design stage, where an optimal investment in R&D activities should be decided, in order to reduce uncertainty in the most valuable and selective way. In this last case, the expected VPI receives the name of expected value of eliminating uncertainty (VEU), since it refers to parameters for which perfect information is usually impossible to achieve, even in future process operation. A possible decision criterion is based on expected values of VPI or VEU. Namely, one can establish process design decisions with a maximum allowed expected VPI relative to a given parameter uncertainty, or R&D decisions supported by expected VEU values (Bernardo, Saraiva and Pistikopoulos, 2000). Therefore, we will now establish a generic framework to incorporate the evaluation of expected VPI/VEU values within process design formulations of the types presented in the previous section. For that purpose, we must first subdivide the vector θ into two disjoint subsets θ1 ∈ Θ1 and θ2 ∈ Θ2, where θ2 are the uncertain parameters whose value of information is to be computed. Considering decision criteria of expected performance, equation (4) then takes the following form:

E_{θ2}[ VPI(d, θ2) ] = E_{θ2}{ E_{θ1}[ f*(θ1, θ2) ] } − E_{θ2}{ E_{θ1}[ f*(d, θ1, θ2) ] }    (23)
Please notice that in the case of formulations with explicit feasible region evaluation, the expected values should be taken over R1(d) and R2(d), instead of Θ1 and Θ2, respectively. The second term in (23) is simply the expected value over Θ of the here-and-now performance, while the first term is the expectation over Θ2 of the expected wait-and-see performance over Θ1, itself a function of θ2. In other words, the expected VPI relative to θ2 is the expected value over Θ2 of a wait-and-see solution (itself a function of θ2, designated by S_ws(θ2)) minus a here-and-now solution (S_hn):

EVPI(d) = E_{θ2}[ S_ws(θ2) ] − S_hn    (24)
The first term in (24) can be estimated by solving wait-and-see problems for each one of the integration points in Θ2. Thus, a generic procedure to evaluate the EVPI can be stated as:
Step 1. Identify the subset θ2 of parameters whose EVPI is to be evaluated.
Step 2. Solve the optimization problem of the type (14) or (16), deciding the design variables d for the present level of uncertainty, thus obtaining S_hn.
Step 3. For each integration point in the Θ2 space, solve the same problem, obtaining a set of solutions S_ws(θ2). Estimate the expected value of these solutions using an integration technique over Θ2.
Step 4. Compute EVPI(d) as in (24).

We also applied this procedure to our case study, considering a wider uncertainty model than before (Table 8). All of the 6 uncertainties are described by independent normal PDFs truncated to 3.09 sigma, such that ε = 3.09 σ/μ. The last three parameters clearly belong to the category of reducible parameters, and the goal here is to find, for the here-and-now optimal design, the expected VEU values associated with each one of them, in order to then select those parameters whose uncertainty reduction is most value adding, and around which R&D efforts should be allocated.

Table 8. Uncertainty model (RHE system).
F0: feed flowrate; mean μ = 45.36 kmol/h; error ε = 0.20
T0: feed temperature; mean μ = 333 K; error ε = 0.04
Tw1: cooling water inlet temperature; mean μ = 293 K; error ε = 0.04
kR: Arrhenius rate constant; mean μ = 12 h⁻¹; error ε = 0.30
U: overall heat transfer coefficient; mean μ = 1635 kJ/(m²·h·K); error ε = 0.30
E/R: activation energy over the perfect gas constant; mean μ = 555.6 K; error ε = 0.30
Within the scope of our generic problem formulation framework, introduced in section 4, and adopting a C-soft type of approach, our problem in this context can be defined as follows:
Step 1. The quality constraint xA ≥ 0.9 is soft; all the other constraints are hard.
Step 2. Process-inherent, continuous stochastic, non-reducible uncertainties (F0, T0, Tw1); model-inherent, continuous stochastic and reducible uncertainties (kR, U, E/R). Parameters are considered to be soft with respect to the quality constraint and hard with respect to the hard constraints.
Step 3. Perfect control operation.
Step 4.
4.1 Robustness criteria: xA ≥ 0.9 is relaxed, with the cost being penalised through a one-sided loss function (xA* = 0.9, k1 = 6.4×10^ and k2 = 0); a hard quality constraint σ(xA)/μ(xA) ≤ 0.01 is also considered.
4.2 Decision criterion: minimize expected cost over the Θ space.
Step 5. The design problem is formulated as a one-stage problem of the form (15), not including the Θ vertices, adding the additional constraints that switch between the penalty values above and below xA*, and also the hard quality constraint.

The here-and-now solution of the above formulation results in an overall expected cost of 14 596 $/year (column A of Table 9). Expected values over Θ are estimated using a specialised cubature formula with 2^n + 2n = 76 points. In Fig. 5 we plot wait-and-see solutions referring
to the elimination of the E/R uncertainty, whose expected value, according to the E/R normal PDF and using a specialised quadrature with only 3 points, is 13 317 $/year. The expected VEU for this parameter, according to (24), is then 14 596 - 13 317 = 1279 $/year. Doing the same calculations for the other reducible parameters, one obtains expected VEU values of 379 and 0 $/year, respectively, for parameters kR and U. This reveals that the activation energy is indeed the parameter whose uncertainty reduction is most relevant for achieving overall plant cost savings and that, on average, there is no benefit associated with increasing our knowledge about the heat transfer coefficient.
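The VEU estimate above can be reproduced schematically. In the sketch below S_hn is the here-and-now expected cost and S_ws is a placeholder function standing in for the wait-and-see re-optimizations (each evaluation would in reality require re-solving the design problem with E/R fixed); only the 3-point quadrature idea and the E/R normal model of Table 8 are taken from the text.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss   # probabilists' Hermite rule

# Sketch of the EVPI/VEU estimate in (24) for one parameter, theta2 = E/R.
S_hn = 14596.0                                    # here-and-now expected cost ($/yr)

def S_ws(EoverR):                                 # placeholder wait-and-see cost
    return 12500.0 + 2.0 * (EoverR - 450.0)       # NOT the actual optimization results

mu, sigma = 555.6, 0.30 * 555.6 / 3.09            # E/R normal model (Table 8)
x, w = hermegauss(3)                              # 3-point Gauss-Hermite rule
w = w / np.sqrt(2.0 * np.pi)                      # normalise weights to sum to 1
expected_ws = np.sum(w * S_ws(mu + sigma * x))    # E_{theta2}[S_ws(theta2)]
print("expected VEU ~", S_hn - expected_ws, "$/yr")
```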
Fig. 5. Wait-and-see solutions (solid line) and E/R PDF (dotted line) as a function of the E/R true but presently unknown value.

Suppose now that, after selecting the most value adding parameters, say the subset θ3 ⊆ θ2, we want to decide the optimal investment that should be allocated to R&D around this subset, together with the corresponding optimal uncertainty levels that we will end up with. This can be done by exploring the trade-off between an assumed information cost and the associated benefits due to uncertainty reduction. The following annual information cost function is then considered for each parameter θj ∈ θ3:
C_Ij = b_j C_Ifj + a_j ( 1/ε_j − 1/ε_j⁰ )    (25)
where C_Ifj represents a fixed cost (associated, for instance, with the investment in laboratory or pilot scale equipment needed to run experiments), while the second term corresponds to variable costs (e.g. reactant and operation costs). If no experiment takes place, the binary variable b_j affecting the fixed cost equals zero and the relative error associated with parameter θj remains at its current nominal level (ε_j = ε_j⁰). Otherwise, b_j = 1 and the information cost grows as the relative error decreases, since more precise knowledge about θj becomes available; in the limit, an infinite R&D investment would lead to perfect knowledge about this parameter. The total cost of information, C_I, is simply the sum of C_Ij over all θ3 parameters. When a certain amount C_I is spent on R&D, our knowledge about the θ3 parameters increases, with the vector μ3 assuming a more accurate value in the domain Θ3 = {θ3: θ3 ∈ j(θ3)} and the errors ε3 being reduced. In order to find out the optimal errors ε3, the corresponding information cost, and the best design d, one has to solve parametrically, for different possible realisations of μ3, a design problem where (25) is incorporated in the objective function, with b3 and ε3 as additional decision variables. Adopting a C-soft type of formulation, and omitting the operating stage and eventual hard quality constraints, the problem is:
S(μ3) = max_{d,b3,ε3} E_Θ{ f*(d, θ) } − C_I(b3, ε3)
s.t.  0 < ε_j ≤ ε_j⁰,  ε_j⁰ − ε_j ≤ ε_j⁰ b_j,  θj ∈ θ3    (26)
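The trade-off embodied in (25)-(26) can be illustrated for a single parameter. In the sketch below the information cost follows (25) with the fixed cost and coefficient values used later in the case study, while the dependence of the expected plant cost on the residual error ε is an assumed placeholder curve (in the actual formulation it comes from re-solving the design problem for each ε).

```python
import numpy as np

C_If, a, eps0 = 100.0, 90.0, 0.30                 # case-study cost parameters

def information_cost(eps, b):                     # equation (25)
    return b * C_If + a * (1.0 / eps - 1.0 / eps0)

def expected_plant_cost(eps):                     # assumed benefit curve ($/yr)
    return 13300.0 + 4300.0 * eps

eps_grid = np.linspace(0.02, eps0, 200)
total = expected_plant_cost(eps_grid) + information_cost(eps_grid, b=1)
no_rnd = expected_plant_cost(eps0)                # b = 0: keep eps at eps0
best = eps_grid[np.argmin(total)]
if total.min() < no_rnd:
    print(f"invest in R&D down to eps = {best:.3f}, total = {total.min():.0f} $/yr")
else:
    print("no R&D: information cost outweighs the benefit")
```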
Given the set of solutions S(μ3), a possible decision criterion may be to select the worst-case scenario over all possible μ3 values (e.g., the point where the largest optimal information costs are obtained). However, this decision may be too conservative and overestimate the R&D resource allocation. Thus, one may instead want to adopt an average criterion, identifying the errors ε3, the associated information costs, and the design d that optimize the expected value of the objective function over Θ3. Adopting a C-soft type of formulation, the design problem according to the above average criterion can be formulated as follows (the operating stage and eventual hard quality constraints are once more omitted):
max_{d,b3,ε3} E_{Θ3}{ E_Θ[ f*(d, θ) ] − C_I(b3, ε3) }
s.t.  0 < ε_j ≤ ε_j⁰,  ε_j⁰ − ε_j ≤ ε_j⁰ b_j,  θj ∈ θ3    (27)
Although the above formulations are defined considering a subset θ3 of the most valuable reducible parameters, optimal levels of uncertainty could in principle be decided for the entire set of reducible parameters. The problem complexity, however, when a realistic number of parameters is considered, motivates the above two-step approach:
Step 1. Select the subset θ3 of the most value adding reducible parameters, based on their expected VEU.
Step 2. For the reduced subset thus found, decide the optimal R&D investments and the corresponding levels of uncertainty based on an information cost function.

Going back to our case study, and focusing only on the most valuable parameter, the activation energy, Fig. 6 shows the minimum expected cost and the corresponding optimal level of uncertainty ε(E/R), when a problem similar to (26) is solved for different E/R mean values (the same 64-point cubature is used). The initial nominal error considered is ε⁰ = 0.30, together with an assumed fixed research cost C_If = 100 $/year and a = 90. As the activation energy mean increases, the optimal information costs become larger and, therefore, the corresponding parameter relative error ε(E/R) becomes smaller. This indicates that a bigger R&D spending should be afforded if the parameter's true mean value happens to correspond to the less favourable scenarios of high E/R values. The worst-case scenario (solution B, Table 9) corresponds to an optimal error of only 0.099 around the barely probable mean value of 722.3 K.
Fig. 6. Expected cost (solid line) and optimal error (dotted line) as a function of the E/R mean value.

A less conservative decision may be obtained using an average criterion weighted by the E/R normal PDF. A problem similar to (27) is then solved, using a 3-point specialised quadrature over Θ3 and a 42-point specialised cubature over Θ, resulting in a total of 228 integration points. The optimal solution obtained (column C, Table 9) indicates that, on average, it is profitable to launch an R&D program with an annual depreciation value of 475 $/yr, in order to increase the currently available knowledge about the activation energy, up to the point where an error of 0.133 is achieved.
Table 9. Optimal design solutions considering the present level of kR, U and E/R uncertainty (solution A) and optimizing the E/R uncertainty level (solutions B(a) and C).
E(C) ($/year): A = 14 596; B = 15 169; C = 14 252(b)
V (m³): A = 6.193; B = 8.667; C = 8.254
A (m²): A = 7.962; B = 7.442; C = 7.804
ε(kR): A = 0.300; B = 0.300; C = 0.300
ε(U): A = 0.300; B = 0.300; C = 0.300
ε(E/R): A = 0.300; B = 0.099; C = 0.133
μ(xA): A = 0.9250(b); B = 0.9184; C = 0.9421
σ(xA)/μ(xA): A = 0.01; B = 0.01; C = 0.01(b)
(a) Worst-case scenario: μ(E/R) = 722.3 K.
(b) Expected values in face of μ(E/R) uncertainty.
5. CONCLUSIONS AND FUTURE WORK

Process design problems under uncertainty are inherently underdefined problems whose complete definition requires a set of considerations and assumptions, namely regarding constraint violation, uncertainty classification and modelling, operating policy and decision-making criteria. In this chapter we have analysed the above issues and integrated them under a generic framework for optimal process design, including a simple procedure to guide the decision-maker in the problem definition and the associated optimization formulations. This generic framework was then used to systematise some design formulations presented in the literature (mainly focusing on process flexibility), including also some of the authors' previously published work on robustness criteria and the value of information about uncertain parameters. Starting with a complete feasibility formulation, where all the constraints are forced to be satisfied for every possible realisation of the uncertain parameters, we have derived three other design problem formulations relaxing constraints and/or uncertainties (uncertainty relaxation corresponds to allowing feasibility only over a subset R(d) of the uncertainty space Θ). Although both types of relaxation can be formulated, we believe that constraint relaxation approaches (C-type formulations) may provide a more effective way to define design problems, since they allow us to selectively supervise constraint violations, through penalty terms in the objective function or by explicitly limiting the probability and/or expected extent of violation, while simultaneously optimizing the expected process performance and ensuring complete feasibility for hard constraints. On the other hand, formulations with a soft uncertainty assumption (B or D types), even with an explicit lower limit for stochastic flexibility, do not allow us to control which constraints are being violated, and to what extent, over the portion of Θ outside R(d). Regarding the value of information about uncertainty, we have proposed a two-step procedure to selectively allocate, at an early design stage, optimal investments in R&D to increase the present knowledge about uncertain parameters. In the first step the most value
adding parameters are identified, based on their expected value of perfect information, while in the second step the best design and the optimal R&D investments are decided by exploring the trade-offs between the economic added value deriving from uncertainty reduction and the associated information costs. A case study comprising a reactor and a heat exchanger has illustrated the usefulness of the presented formulations, namely regarding the following features: accurate computation of overdesign factors, optimal design accounting for product quality, trade-offs between profit, flexibility and robustness, and identification of the activation energy as the most valuable model parameter, for which it is profitable to conduct R&D experiments in order to increase our present knowledge. In the future we intend to further investigate and materialise some of the issues that were handled here, such as: inclusion in a C-type formulation of constraint violation limits, definition of a design objective that covers safety and environmental issues, use of decision-making criteria based on the value of perfect information, and definition of a more effective operating policy somewhere between perfect and rigid control operation. Process design has always been one of the most challenging and noble activities of chemical engineering, where a combination of art and science results in the conception of new or revamped plants. Some of its key concepts were defined a number of decades ago. However, due to the lack of adequate computational tools, some of these issues have since then remained to a large extent unexplored, including the ways used (or not) to include and handle several kinds of relevant uncertainties. The past three decades, and the nineties in particular, have shown that new capabilities, tools and frameworks are now available to address in a consistent way, and to consider explicitly, uncertainty sources as key issues in achieving design solutions that will result in competitive plants under environments of increasing randomness and volatility. Therefore, uncertainty handling becomes more and more critical, and we now have a number of tools available for making process design more of a science and less of an art. Nevertheless, new and unexplored paths remain to be carefully studied in the future, making the next 30 years in this regard at least as promising and rich as the past 30 years that we have tried to portray in this chapter.

REFERENCES
1. Acevedo, J. and Pistikopoulos, E. N., A Parametric MINLP Algorithm for Process Synthesis Problems under Uncertainty, Ind. Eng. Chem. Res., 35 (1996), 147.
2. Acevedo, J. and Pistikopoulos, E. N., A Multiparametric Programming Approach for Linear Process Engineering Problems under Uncertainty, Ind. Eng. Chem. Res., 36 (1997), 717.
3. Ahmed, S. and Sahinidis, N. V., Robust Process Planning under Uncertainty, Ind. Eng. Chem. Res., 37 (1998), 1883.
4. Bernardo, F. P., Pistikopoulos, E. N. and Saraiva, P. M., Integration and Computational Issues in Stochastic Design and Planning Optimization Problems, Ind. Eng. Chem. Res., 38 (1999a), 3056.
5. Bernardo, F. P., Pistikopoulos, E. N. and Saraiva, P. M., Robustness Criteria in Process Design Optimization under Uncertainty, Comp. Chem. Eng., 23, Suppl. (1999b), S459.
6. Bernardo, F. P., Pistikopoulos, E. N. and Saraiva, P. M., Quality Costs and Robustness Criteria in Chemical Process Design Optimization, Comp. Chem. Eng., 25 (2001), 27.
7. Bernardo, F. P. and Saraiva, P. M., A Robust Optimization Framework for Process Parameter and Tolerance Design, AIChE J., 44 (1998), 2007.
8. Bernardo, F. P., Saraiva, P. M. and Pistikopoulos, E. N., Inclusion of Information Costs in Process Design Optimization under Uncertainty, Comp. Chem. Eng., 24 (2000), 1695.
9. Bhatia, T. K. and Biegler, L. T., Dynamic Optimization for Batch Design and Scheduling with Process Model Uncertainty, Ind. Eng. Chem. Res., 36 (1997), 3708.
10. Brooke, A., Kendrick, D. and Meeraus, A., GAMS: A User's Guide, Release 2.25, The Scientific Press Series, 1992.
11. Chacon-Mondragon, O. L. and Himmelblau, D. M., Integration of Flexibility and Control in Process Design, Comp. Chem. Eng., 20 (1996), 447.
12. Craig, R. J., Six Sigma Quality, the Key to Customer Satisfaction, 47th Annual Quality Congress, May 1993, Boston, 206 (1993).
13. Diwekar, U. M. and Kalagnanam, J. R., Efficient Sampling Technique for Optimization under Uncertainty, AIChE J., 43 (1997), 440.
14. Diwekar, U. M. and Rubin, E. S., Parameter Design Methodology for Chemical Processes Using a Simulator, Ind. Eng. Chem. Res., 33 (1994), 292.
15. Deshpande, P. B., Emerging Technologies and Six Sigma, Hydrocarbon Processing, No. 77 (1998), 55.
16. Grossmann, I. E. and Floudas, C. A., Active Constraint Strategy for Flexibility Analysis in Chemical Processes, Comp. Chem. Eng., 11 (1987), 675.
17. Grossmann, I. E. and Halemane, K. P., Decomposition Strategy for Designing Flexible Chemical Plants, AIChE J., 28 (1982), 686.
18. Grossmann, I. E., Halemane, K. P. and Swaney, R. E., Optimization Strategies for Flexible Chemical Processes, Comp. Chem. Eng., 7 (1983), 439.
19. Grossmann, I. E. and Sargent, R. W. H., Optimum Design of Chemical Plants with Uncertain Parameters, AIChE J., 37 (1978), 517.
20. Halemane, K. P. and Grossmann, I. E., Optimal Process Design under Uncertainty, AIChE J., 29 (1983), 425.
21. Harry, M. and Schroeder, R., Six Sigma: The Breakthrough Management Strategy, Currency, New York, 2000.
22. van den Heever, S. A. and Grossmann, I. E., Disjunctive Multiperiod Optimization Methods for Design and Planning of Chemical Process Systems, Comp. Chem. Eng., 23 (1999), 1075.
23. Ierapetritou, M. G. and Pistikopoulos, E. N., Simultaneous Incorporation of Flexibility and Economic Risk in Operational Planning under Uncertainty, Comp. Chem. Eng., 18 (1994), 163.
24. Ierapetritou, M. G., Pistikopoulos, E. N. and Floudas, C. A., Operational Planning under Uncertainty, Comp. Chem. Eng., 20 (1996), 1499.
25. Nishida, N., Ichikawa, A. and Tazaki, E., Synthesis of Optimal Process Systems with Uncertainty, Ind. Eng. Chem. Proc. Des. Dev., 13 (1974), 209.
26. Ostrovsky, G. M., Volin, Y. M., Barit, E. I. and Senyavin, M. M., Flexibility Analysis and Optimization of Chemical Plants with Uncertain Parameters, Comp. Chem. Eng., 18 (1994), 775.
27. Ostrovsky, G. M., Achenie, L. E. K. and Gomelsky, V., A New Approach to Flexibility Analysis, PRES'99 Proceedings, Budapest, Hungary (1999).
28. Pertsinidis, A., Grossmann, I. E. and McRae, G. J., Parametric Optimization of MILP Programs and a Framework for the Parametric Optimization of MINLPs, Comp. Chem. Eng., 22, Suppl. (1998), S205.
29. Phadke, M. S., Quality Engineering using Robust Design, Prentice Hall, New Jersey, 1989.
30. Pistikopoulos, E. N., Uncertainty in Process Design and Operations, Comp. Chem. Eng., 19, Suppl. (1995), S553.
31. Pistikopoulos, E. N. and Ierapetritou, M. G., Novel Approach for Optimal Process Design under Uncertainty, Comp. Chem. Eng., 19 (1995), 1089.
32. Pistikopoulos, E. N. and Mazzuchi, T. A., A Novel Flexibility Approach for Processes with Stochastic Parameters, Comp. Chem. Eng., 14 (1990), 991.
33. Rudd, D. F. and Watson, C. C., Strategy of Process Engineering, John Wiley & Sons, New York, 1968.
34. Samsatli, J. N., Papageorgiou, L. G. and Shah, N., Robustness Metrics for Dynamic Optimization Models under Parameter Uncertainty, AIChE J., 44 (1998), 1993.
35. Straub, D. A. and Grossmann, I. E., Integrated Stochastic Metric of Flexibility for Systems with Discrete State and Continuous Parameter Uncertainties, Comp. Chem. Eng., 14 (1990), 967.
36. Straub, D. A. and Grossmann, I. E., Design Optimization of Stochastic Flexibility, Comp. Chem. Eng., 17 (1993), 339.
37. Swaney, R. E. and Grossmann, I. E., An Index for Operational Flexibility in Chemical Process Design: Formulation and Theory, AIChE J., 31 (1985a), 621.
38. Swaney, R. E. and Grossmann, I. E., An Index for Operational Flexibility in Chemical Process Design: Computational Algorithms, AIChE J., 31 (1985b), 631.
39. Watanabe, N., Nishimura, Y. and Matsubara, M., Optimal Design of Chemical Processes Involving Parameter Uncertainty, Chem. Eng. Sci., 28 (1973), 905.
Dynamic Model Development: Methods, Theory and Applications S.P. Asprey and S. Macchietto (editors) © 2003 Elsevier Science B.V. All rights reserved
209
A Modelling Tool for Different Stages of the Process Life

Mauricio Sales-Cruz and Rafiqul Gani
CAPEC, Department of Chemical Engineering, Technical University of Denmark, DK-2800 Lyngby, Denmark

A computer-aided modelling tool, MoT, that assists the model developer in terms of model import, model translation, model analysis, model solution and model transfer, without the user having to write any programming code, is presented. The main features of MoT are presented within the context of the work process related to various modelling activities during the life of a process. External models written in text-format and/or XML-format can be imported to MoT, which then translates and expands the model according to a Reverse Polish Notation algorithm. The translated model can be solved, after satisfying mathematical consistency requirements, equation by equation in the debug-mode or simultaneously in the solution-mode. The solvable model can also be exported, through a model transfer feature, to other simulation engines and/or external software. The use of MoT is highlighted through a number of interesting and illustrative modelling examples.

1. INTRODUCTION

The life of a process, starting with the "birth" of an idea to manufacture a product and ending with the dismantling of the process built to manufacture it, can be divided into different stages, such as business planning, research and development, process conceptual design, process detailed design, engineering design and operation & maintenance, with their corresponding activity models (Okada and Shirao, 2002). This means that for process calculations in any stage of the life of a process, models at different levels and with different views may be needed. That is, models would be needed to generate information (knowledge) in the form of data through which various types of problems (modelling context) can be solved for the system under consideration. Figure 1 illustrates this idea, where the system could be a process/product, an operation, a production system, a business, etc., while the problem could be related to design, production, planning, analysis and many more. The model provides all or part of the information needed to solve the specific problem when the equations representing the model are solved. For the same problem and system, models of different complexity (levels) and with different perspectives of interest
(views) may be needed. For example, in the various stages of the life of a process, simulations at various levels are needed for the same system of compounds. In the conceptual (process) design stage, process simulation models, including sub-models for constitutive variables (such as physical properties), are needed. These process simulation models may be used from within process synthesis/design methods to generate the optimal design and/or used for verification of the synthesis/design results. In either case, the process simulation models may have different perspectives of interest (views) and different levels of complexity.

Figure 1: Why do we need models?

Clearly, as highlighted by Stephanopoulos et al. (1990), multifaceted modelling is necessary for modelling activities at different stages of the life of a process. On the other hand, the life of a model could be viewed as cyclic (see Figure 2), where a number of modelling steps are repeated until a desired model is obtained. By changing the model objectives in any cycle, a different model for the same process is obtained, leading to the concept (Marquardt et al., 2000) of life cycle process modelling, where different facets of process models are considered together with their relationships and the evolution of modelling artefacts in the sense of a work process. According to this concept, the evolution of a model from its initial creation, through a number of application and modification cycles, is monitored, and different versions and views of a model are related to each other. This is certainly true when the models describe the same physical equipment, operation or chemical system. Once the common features at various levels or views have been identified, starting from a reference model frame, various versions of a model may be generated. Figure 3 illustrates this concept for the generation of various process models where the chemical system is fixed (which is usually the case for any process manufacturing a specific product). This figure also highlights the fact that a considerable amount of model development work is needed for the various activities corresponding to the life of a process. Therefore, the use of a computer-aided tool that can assist in the model development work is highly recommended.
Figure 2: Cyclic modelling work process
Figure 3: Multifaceted modelling needs

Until recently, most of the model development work has been done without much involvement of computers; computers have been used only at the solution stage. This has changed since the development of a number of computer-aided modelling tools, such as ModDev (Jensen and Gani, 1999), Model.La (Bieszczad, 2000), ICAS-MoT (Russel and Gani, 2000) and ModKit (Bogusch et al., 2001). The increasing use of these computer-aided modelling tools means a bigger role for computers in the cycle of modelling activities (see Figure 2).
ModDev and Model.La are knowledge-based modelling systems that are able to generate process models through a set of model building blocks and a reference model frame, using user-provided descriptions of the process, boundary, phenomena and so on. In the case of ModDev, the generated model is then translated into a solvable form and linked to a solver or a simulation engine through ICAS-MoT. ModKit is an interactive modelling tool based on the process modelling language VeDa (Marquardt et al., 1993) that specifies the relevant domain language. Several interactive tools assist the model developer in setting up a model through ModKit. The reader is referred to Hangos and Cameron (2001) for more information on modelling concepts and modelling in general; to von Wedel et al. (2002) for modelling frameworks; and to Eggersmann et al. (2002) for applications of modelling tools.

The objectives of this chapter are to present ICAS-MoT as a modelling tool-box and to highlight its use through interesting modelling exercises. Most of the models used in the modelling exercises have been collected from the literature. The examples cover problems from various stages of the life of a process. ICAS-MoT is an integrated environment to build, analyse, manipulate, solve and visualise mathematical models. The aim of ICAS-MoT is to provide an intuitive, flexible and efficient way of integrating different aspects of the modelling needs. With reference to Fig. 2, MoT assists the model developer in the steps involving model construction, model solution, model verification and model calibration & validation. These are also the steps that the computer can do more easily than the user (human). Note that model construction involves model generation, model import, model translation and model analysis.

2. MATHEMATICAL MODELS

Mathematical models for a process may be derived by applying the principle of conservation of mass, energy and/or momentum on a defined boundary (representing the process) and its connections to the surroundings. A process may be divided into a number of sections, where each section is defined by a boundary and connections with other sections and the surroundings. In this way, models for different sections of a process may be aggregated together into a total model for the process. In general, the model equations may be divided into three main classes of equations:
• Balance Equations (mass, energy and/or momentum equations)
• Constitutive Equations (equations relating intensive variables such as temperature, pressure and/or composition to constitutive variables such as enthalpies, reaction rates, heat transfer rates, etc.)
• Connection and Conditional Equations (equations relating surroundings-system connections, summation of mole fractions, etc.)
The appropriate model equations for each type of model may be derived based on the specific model needs. The model needs are translated to a set of model assumptions which, together, help to describe the boundary and its connections. Therefore, based on this description, different versions of a model for the same process may be derived. For example, a simple process model may include only the mass balance equations and the connection/conditional equations, because the energy and momentum balance effects are assumed to be negligible and the constitutive variables are assumed to be invariant with respect to composition. A more rigorous model may include the mass and energy balance equations, the connection/conditional equations as well as the constitutive equations. There can be two modes of these models (steady state or dynamic). In the steady state mode, the rate of change of accumulation is assumed to be zero (or negligible), while in the dynamic mode the accumulation terms vary with respect to time (the independent variable). An even more rigorous model may add the distribution of the intensive variables as a function of space (in one or more dimensions). In general, the balance equations in a process model are based on the laws of conservation and take one or more of the following forms:
0 =f{y,z,t) dy/dt=f{y,z,t) | . | l = /(3.,«.z.O
(1) (2) (3)
ot ou In the above equations, Eq.l represents a set of AEs (Algebraic Equations) and models the steady state behaviour; Eq. 2 represents a set of ODEs (Ordinary Differential Equations) and models the dynamic behaviour when the independent variable is time; while Eq.3 represents a set of PDEs (Partial Differential Equations) and may be used to model both steady state and dynamic behaviours, depending on the dimension of the problem and the type of the corresponding independent variables, y represents a vector of state variables, z a vector of specified variables in Eqs. 1-2 but is an independent variable in Eq. 3, t is an independent variable, whereas u represent another independent variable (in a 2-dimensional PDE system). The constitutive equations are usually algebraic (but could also be of the ODE and/or PDE type). In the algebraic form, they may be written as, 0 = q-g{T,P,xj)
(4)
Where ^ is a constitutive variable, which may be a function of temperature (J), pressure (P), composition (x) and parameters (d). Usually, T, P and/or x are represented by ;; in the balance equations. These are usually expHcit in nature, that is, knowing T, P, x and/or d, it is possible to compute q.
The connection and/or conditional equations (henceforth called connection equations) are also usually algebraic and may be represented as,

0 = r - h(y, z, t)    (5)

As Eq. 5 implies, some connection equations may be explicit (if they are not functions of r) while others may be implicit (if they are functions of r).

2.1 Forms of Models and Solution Modes

Models represented by AE sets (1, 4-5) usually represent the total steady state model; models represented by DAE (ODE + AE) sets (2, 4-5) usually represent the total dynamic model; while models represented by PDAE (PDE + ODE + AE) sets (2-5) usually represent models in a continuous domain. Solution of the model equations depends on the form of the model. Model forms of the AE-type require a linear or nonlinear equation solver, depending on whether the model is linear or nonlinear with respect to the unknown variables. Usually, the AE set can be ordered and decomposed into subsets of implicit and explicit algebraic equations. The explicit equations can be solved analytically, and this means that some models of the AE-type may be explicit and solved analytically. Models of the DAE-type may or may not include AE sub-sets. Most process models, however, include AE subsets which, when inserted into the ODEs, yield a system with ODEs only. Models of the DAE-type may be solved in the dynamic mode and/or the steady state mode (where the condition under which the accumulation term becomes zero is sought). If the AE subsets are explicit, models of the DAE-type are usually solved in the ODE-dynamic mode, while the DAE-dynamic mode is employed when a part of the AE subset is implicit. Models of the PDAE-type may or may not include AE subsets and ODE subsets. The PDE set is usually discretised with respect to the independent variables to yield a set of ODEs. Thus, solution of PDAEs involves a discretisation step before solution of the resulting DAEs or AEs. Note that introducing or removing model assumptions also generates different versions of a process model.
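As a minimal illustration of the ODE-dynamic mode (a sketch written for this chapter, not part of MoT), consider a single holdup balance whose only algebraic equation is explicit; the AE can be substituted into the ODE and the result integrated directly. The flow law L = k*n and the parameter values are illustrative assumptions.

# Sketch: DAE with an explicit AE subset solved in the ODE-dynamic mode.
# Balance (ODE):      dn/dt = F - L
# Constitutive (AE):  L = k*n   (explicit, so it is substituted into the ODE)
from scipy.integrate import solve_ivp

F, k = 10.0, 0.5            # illustrative feed rate and valve constant

def rhs(t, y):
    n = y[0]
    L = k * n               # explicit AE evaluated first
    return [F - L]          # balance equation

sol = solve_ivp(rhs, (0.0, 20.0), [0.0], method="BDF")
print(sol.y[0, -1])         # approaches the steady state n = F/k = 20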
2.2 Model Generation

Model generation here implies the generation of various versions of an available (reference) model according to the modelling needs. A process separating a stream into two phases through a separator (see Figure 4) will now be used as an example to highlight the different forms and versions of models. The starting point is a simple mass balance model.

Figure 4: A two-phase separation process. F, V, L: flowrates of the feed, vapour and liquid streams; z, y, x: compositions (mole fractions) of the feed, vapour and liquid streams; HF, HV, HL: enthalpies of the feed, vapour and liquid streams; P: pressure; T: temperature; Q: heat duty.

M1: Simple mass balance model (steady state); Si is the split factor for component i; NC is the number of components; MB is a mass balance, C1 a constitutive equation and C2 a connection equation.
  0 = F zi - V yi - L xi    (i = 1, NC)    MB
  0 = Si - (V yi)/(F zi)    (i = 1, NC)    C1
  0 = 1 - Σ Si                             C2

M2: Simple mass balance model (steady state); replace Si with the equilibrium constant Ki for component i; the selected C1 model for Ki assumes an ideal system; pi is the vapour pressure of component i.
  0 = F zi - V yi - L xi    (i = 1, NC)    MB
  0 = Ki - pi(T)/P          (i = 1, NC)    C1
  0 = yi - xi Ki            (i = 1, NC)    C2
  0 = 1 - Σ yi                             C2

M3: Mass balance model (steady state); select another model for Ki that does not assume an ideal system; γi is the activity coefficient of component i in the liquid phase; φi is the fugacity coefficient of component i in the vapour phase.
  0 = F zi - V yi - L xi              (i = 1, NC)    MB
  0 = Ki - (pi(T) γi)/(P φi)          (i = 1, NC)    C1
  0 = γi - f(T, x)                    (i = 1, NC)    C1
  0 = φi - f(T, P, y)                 (i = 1, NC)    C1
  0 = yi - xi Ki                      (i = 1, NC)    C2
  0 = 1 - Σ yi                                       C2

M4: Rigorous steady state model with mass & energy balance (EB); Hk is the enthalpy of stream k; different constitutive (enthalpy) models can be selected.
  M3 plus
  0 = Q - (F HF - V HV - L HL)                       EB
  0 = Hk - f(T, P, composition)                      C1

M5: Rigorous dynamic model with mass & energy balance; same model as M4 except new MB & EB (only the extra equations are shown); assume negligible vapour hold-up; ni is the molar liquid holdup of compound i; E is the energy holdup.
  dni/dt = F zi - V yi - L xi    (i = 1, NC)    MB
  dE/dt = Q - (F HF - V HV - L HL)              EB
  0 = xi - ni/(Σ ni)                            C2
  0 = L - f(n, A, T, P)                         C2
  0 = V - f(n, T, P)                            C2

M6: Rigorous two-phase reactor model; add reaction terms to the MB & EB and add a kinetic (constitutive) model; d are kinetic parameters; ΔHR is the heat of reaction; k is the reference reactant.
  Reaction term (mass)   = (L xk) Ratei
  Reaction term (energy) = ΔHR (L xk) Ratek
  0 = Ratei - f(T, P, n, d)                     C1
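To make the progression from M1 to M2 concrete, the sketch below solves the steady state ideal flash of model M2; the MB and C2 equations are combined into the equivalent Rachford-Rice form, the Antoine constants and conditions are taken from the flash example later in this chapter (Appendix A1), and the feed composition z is an illustrative assumption.

# Sketch: steady state flash with model M2 (ideal K-values from Antoine vapour pressures).
import numpy as np
from scipy.optimize import brentq

Aant = np.array([8.07131, 8.08097])
Bant = np.array([1730.63, 1582.27])
Cant = np.array([233.426, 239.726])
z = np.array([0.5, 0.5])                     # feed mole fractions (assumed)
Tout, Pout, F = 355.0, 1.0, 1.0              # K, atm, mol/s

Psat = 10.0 ** (Aant - Bant / (Tout - 273.15 + Cant)) / 760.0
K = Psat / Pout                              # C1 equations of model M2

def mb_c2(phi):                              # MB and C2 combined (Rachford-Rice form)
    return np.sum(z * (K - 1.0) / (1.0 + phi * (K - 1.0)))

phi = brentq(mb_c2, 1e-6, 1.0 - 1e-6)        # vapour fraction V/F
x = z / (1.0 + phi * (K - 1.0))
y = K * x
V, L = phi * F, (1.0 - phi) * F
print(K, phi, x, y)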
From the above models, it can be noted that in the life of a process, since the components in the process are unlikely to be changed, the C1 (constitutive) models would not change, but the MB, EB and C2 models may change as one moves from one stage to another in the life of the process. On the other hand, in the life of a process model, the EB and MB models are unlikely to be changed, but the C1 and C2 models may be changed. For example, Model M5 may be further simplified by assuming the rate of change of energy holdup to be negligible (this will convert the EB to an algebraic equation), or the vapour holdup may not be assumed to be negligible (this will make the model more complex, as the holdup ni will now be the sum of liquid and vapour holdups). Alternatively, Models M3, M4 and M5 may be simplified by selecting simpler C1 models for the liquid and vapour phase fugacities. Note that the nonlinearity of the EB and MB equations becomes clear by inserting the corresponding C1 equations into them. Note also that models M1 and M2 are linear because the corresponding constitutive variables are independent of composition. Models consisting of C1 and C2 are commonly used for equilibrium saturation point calculations (these do not need MB and EB) that enable the generation of phase diagrams. In a similar fashion, kinetic models represented by C1 and their corresponding C2 equations generate reaction yield diagrams (attainable regions). The next section describes how the generated models can be very quickly imported into a computer-aided system and, after analysis, solved with the appropriate solver.

3. ICAS-MoT: COMPUTER-AIDED MODELLING TOOL

ICAS-MoT takes a model represented by a set of equations as input and generates either a solution to the model equations or exports the model as a COM-object for use in a simulation engine or external software. The model equations can be imported from a model generation tool (for example, ModDev), a text-file containing the equations, a model written in XML, or a model directly added by the user through an editor option in MoT. The work process in MoT is divided into two main activities - Model Definition and Model Solution (see Figures 5a-5b). In addition to the above, MoT also allows Model Transfer in terms of COM-objects for linking to the ICAS simulation-engine and/or use in external software.

3.1 Model Definition

As shown in Figure 5a, model definition includes model creation (that is, writing a derived set of model equations according to the syntax rules of MoT), model translation (translates and expands the created text-based model), import (reads models introduced through external txt-files and/or XML-files) and modify (allows the modification of the text-based model). The important step here is translation, which dissects the text-based equations using a Reverse Polish Notation (RPN) algorithm and classifies equations and variables using a multi-layered classification system. First, MoT identifies the equations and variables according to a set of simple syntax rules, for example:
• Equations must contain a "="
• The derivative term must be placed on the left hand side of the ODEs or PDEs
• The derivative term must start with a "d" or "p" to indicate an ODE or a PDE, respectively
• If more than one operator is placed between two variables, only the first one will be considered
• Variable names must not contain any operator (+, -, *, /)
• Only one variable or a derivative operator is allowed on the Left Hand Side (LHS) of any equation
• An equation with zero "0" on the LHS is counted as an implicit AE
Figure 5a: Problem definition options in MoT
Figure 5b: Problem solution options in MoT
Based on the above syntax rules, MoT identifies the equations and classifies them (in the first-translation layer) as:

• Algebraic Equations (AEs)
  - Implicit (having more than one unknown variable per equation)
  - Explicit (having only one unknown variable per equation)
• Ordinary Differential Equations (ODEs)
• Partial Differential Equations (PDEs)

The variables are classified (in this first-translation layer) in terms of those appearing on the LHSs and Right Hand Sides (RHSs) of the equations. All variables appearing on the RHSs of equations are classified (by default) as parameters. The translation algorithm, illustrated in Figure 6a, scans the original text-based equations and checks for mathematical consistency (the number of equations and variables before translation must be the same after translation). The syntax rules listed above are used to interpret the text-based model equations. If the model passes this validation test, each equation and variable is expanded through the RPN algorithm. This means that a large number of expanded equations and variables need not be specified in the initial model, making the import of external models more flexible and easy. Once the equations have been expanded, the translation step is completed.
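A minimal sketch (in Python, and deliberately much simpler than MoT's own RPN-based implementation) of this first-translation-layer classification, applying the syntax rules listed above to text-based equations:

# Sketch: classify text-based model equations according to the stated syntax rules.
def classify(equations):
    """Map each equation string to AE-implicit, AE-explicit, ODE or PDE."""
    result = {}
    for eq in equations:
        lhs, _, _ = eq.partition("=")      # every equation must contain "="
        lhs = lhs.strip()
        if lhs.startswith("d"):            # derivative term starting with "d" -> ODE
            result[eq] = "ODE"
        elif lhs.startswith("p"):          # derivative term starting with "p" -> PDE
            result[eq] = "PDE"
        elif lhs == "0":                   # zero on the LHS -> implicit AE
            result[eq] = "AE-implicit"
        else:                              # single variable on the LHS -> explicit AE
            result[eq] = "AE-explicit"
    return result

model = ["dndt = F*z - L*x - V*y",
         "K = Psat/Pout",
         "0 = SX - SY"]
print(classify(model))   # ODE, AE-explicit and AE-implicit, respectively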
Figure 6a: Main steps of the model translation algorithm (original equations → pre-screen for validity → variable and equation expansion → LHS and RHS classification → model ready to solve)
3.2 Model Analysis

The analysis step invokes the second-analysis layer for the classification of variables. In this layer, all variables identified previously as appearing on the RHSs of equations and classified as parameters are now re-classified as:

• Parameter (variables with known values)
• Explicit (variables that are functions only of parameters and/or dependent-prime variables)
• Implicit-Unknown (variables related to algebraic equations where there is more than one unknown variable per equation)
• Dependent (variables appearing with the differential operators on the LHSs of ODEs and/or PDEs)
• Dependent-prime (the derivative operator related to the dependent variable)

Note that, according to this classification, the LHS variables can only be Unknown, Dependent and/or Dependent-prime. Once the classification of the variables according to the rules of the analysis layer has been made, MoT offers a number of analysis options:

• Generation/analysis of the incidence matrix (with equations as row index and variables as column index)
• Check for singularity of the matrix (identifies equations with no unknown variables)
• Equation trace-back (examines the model equations)
• Equation-by-equation solution mode (solving the residuals of each translated equation - this is an equation debug option to check whether the values passed to the equation are correct and whether the derived function is correct)
• Decomposition, partitioning and ordering of the model equations (identifies the sub-sets of equations that need to be solved simultaneously)

Furthermore, a degree of freedom analysis is automatically performed to ensure that the problem is not ill-posed (the number of equations must match the number of unknown variables) before going to the solution step.
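The incidence matrix and the degree-of-freedom check can be illustrated with a small sketch (an illustrative three-equation fragment, not MoT's internal code); equations are rows, variables are columns, a row with no unknown variable is flagged as singular, and a mismatch between the numbers of equations and unknowns signals an ill-posed problem.

# Sketch: incidence matrix and degree-of-freedom analysis for a tiny model.
import numpy as np

variables = ["V", "L", "x", "y"]
equations = {"0 = F - V - L":       {"V", "L"},
             "x = z*F/(K*V + L)":   {"x", "V", "L"},
             "y = K*x":             {"y", "x"}}
known = {"F", "z", "K"}                      # parameters / known variables

M = np.array([[1 if v in deps else 0 for v in variables]
              for deps in equations.values()])
print(M)                                     # rows = equations, columns = variables

unknowns = [v for v in variables if v not in known]
assert all(M.sum(axis=1) > 0), "singular equation: no unknown variable"
print("degrees of freedom:", len(unknowns) - len(equations))   # 1, so one more
# specification (or equation) is needed before the model can be solved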
3.3 Model Solution

The solution step in MoT involves the following:

• Admin: administration of the solution procedure (drives the numerical solution procedure)
• Solver-link: connection to the solver specified by Admin
• Resid: computes the function values for the translated equations
The Admin is divided into a global administrator and one or more local administrator(s), as needed by the solution strategy. Each local administrator instantiates one or more solvers from the solver library. The possible local administrators are:

• Algebraic administrator
• Integration administrator
• Optimisation administrator
The global administrator combines all the local administrators needed to solve a given problem and handles the overall solution sequence. The local administrators are only connected to the model via the global administrator. From the model classification information, MoT detects and lists the needed local administrators (see Figure 6b, where the structure of the local administrators is highlighted). If more than one local administrator is needed, the variable classification rules for the third-solution layer are invoked. Here, variables are classified as:

• Design/Manipulated (parameters whose values may be changed in the outer-loop)
• Constraints (unknown, dependent and/or dependent-prime variables whose values must be within some specified bounds)
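Before continuing, the interplay of the administrators can be pictured with a small sketch (illustrative only, with an assumed one-equation model): an outer optimisation administrator adjusts a design variable, and for every trial value an inner integration administrator simulates the dynamic model and returns the quantity used in the objective, much as in Figure 6b.

# Sketch: outer optimisation loop over a design variable, inner dynamic simulation.
from scipy.integrate import solve_ivp
from scipy.optimize import minimize_scalar

def integrate(k):                        # "integration administrator"
    rhs = lambda t, y: [1.0 - k * y[0]]  # assumed model: dy/dt = 1 - k*y
    sol = solve_ivp(rhs, (0.0, 10.0), [0.0], method="BDF")
    return sol.y[0]

def objective(k):                        # "optimisation administrator" objective
    y_end = integrate(k)[-1]
    return (y_end - 0.5) ** 2            # drive the final state towards 0.5

res = minimize_scalar(objective, bounds=(0.1, 10.0), method="bounded")
print(res.x)                             # design value close to 2 (steady state 1/k = 0.5)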
If the problem solution also involves parameter estimation based on supplied data, the data variables need to be classified as Real, Integer or Binary. Once the variables have been identified, the appropriate solver linked to MoT is invoked through the global administrator. Note that in problems related to optimisation (parameter estimation, process optimisation, etc.), multiple solvers may be used if the process model equations are solved separately from the objective function and constraints.

3.4 Model Transfer

After the model equations have been successfully solved, the user has the option to generate a COM-object of the model for transfer to the ICAS model library (for use through the ICAS simulation engine), or to generate a COM-object for use in external software (see Figure 6c). For repeated use of the model, a model transfer is recommended. Note, however, that if the model equations are changed, the COM-object would need to be generated again. The same COM-object can, however, be used for different sets of parameters, for example, different sets of compound properties, reaction kinetics and equipment sizing data. In Fig. 6c, DynSim represents the dynamic simulation engine of ICAS.
Figure 6b: Structure of local administrators instantiated for a dynamic optimisation problem
Figure 6c: COM-Object transfer
4. MODELLING THROUGH ICAS-MoT: APPLICATION EXAMPLES

In this section, each of the main modelling steps within ICAS-MoT (to be called MoT) is highlighted through several illustrative examples. For the first example (a model for a simple 2-phase vapour-liquid flash separator), all the main steps with respect to the use of MoT are highlighted. For the other examples, the model equations are presented and analysed, and their solutions through MoT are discussed. For each model, all the model equations in text format (ready for import to MoT), together with their solution values, are given in the Appendix (A1-A6).

4.1 Multiple Model Generation and Solution with MoT

The process description for a two-phase flash separation operation is illustrated through Figure 7, where the shell (boundary) and its connections (streams) are shown together with the types of equations.

Figure 7: Process description in terms of shell and connections (a stream connection object with its quantity and equilibrium connection models; a shell object named flash with calculated vapour-liquid phase condition, equilibrium model, no accumulation and mass & energy balances; and a shell connection object named heater with its energy connection to the flash)

From the above model description, a generic model for the process, which is valid for any number of components present in a mixture whose vapour-liquid equilibrium behaviour can be assumed to be ideal, has been developed. Figure 8a shows an example of such a model.
In principle, modelling tools such as ModDev can also generate this model. Note, however, that the generated model equations would need to be rearranged to match the equations shown in Fig. 8a.

Model Import

In this model the variables Tout, Pout, Tref, Tfeed, F and z are specified, and the model calculates the heat duty (q), the exit stream flowrates (V & L) and their corresponding equilibrium compositions (x & y).

#inherit relations/assumptions
Tl = Tout
Pl = Pout
Tv = Tl
Pv = Pl

#phase distribution
Psat[i] = (10^(Aant[i] - Bant[i]/(Tout-273.15+Cant[i])))/760
K[i] = Psat[i]/Pout

hVap[i] = (Avap[i]*(1-(Tout/Tc[i]))^(Bvap[i]+Cvap[i]*(Tout/Tc[i])+Dvap[i]*(Tout/Tc[i])^2))/1000
hV[i] = (((((E[i]*0.2*Tout+D[i]*0.25)*Tout+C[i]/3.0)*Tout+B[i]*0.5)*Tout+A[i])*Tout)/1000
hVr[i] = (((((E[i]*0.2*Tref+D[i]*0.25)*Tref+C[i]/3.0)*Tref+B[i]*0.5)*Tref+A[i])*Tref)/1000
hL[i] = (((((E[i]*0.2*Tout+D[i]*0.25)*Tout+C[i]/3.0)*Tout+B[i]*0.5)*Tout+A[i])*Tout)/1000
hLr[i] = (((((E[i]*0.2*Tref+D[i]*0.25)*Tref+C[i]/3.0)*Tref+B[i]*0.5)*Tref+A[i])*Tref)/1000
hVFeed[i] = (((((E[i]*0.2*Tfeed+D[i]*0.25)*Tfeed+C[i]/3.0)*Tfeed+B[i]*0.5)*Tfeed+A[i])*Tfeed)/1000

#conservation
L = F - V
x[i] = z[i]*F/( K[i]*V + L )
y[i] = K[i]*x[i]
SY = sum_i( abs(y[i]) )
SX = sum_i( abs(x[i]) )
#0 = z[i]*F - y[i]*V + L*x[i]
0 = SX - SY

Hv = sum_i( (hV[i]-hVr[i]+hVap[i])*y[i] )
Hl = sum_i( (hL[i]-hLr[i])*x[i] )
Hf = sum_i( (hVFeed[i]-hVr[i])*z[i] )
0 = (Hf*F - Hl*L - Hv*V + q)/1000

Figure 8a: Model equations for the simple 2-phase flash (VLE)
Model Translation

The above model can be imported directly as a txt-file into MoT. During translation, MoT asks for the compounds present in the mixture to be identified. If the model equations are to be solved using data for the model parameters transferred directly from the component database in ICAS, then the compounds must be selected at this stage. MoT then retrieves the necessary constitutive model parameters from the ICAS component-model database. Otherwise, values for all the model parameters must be supplied before the model equations can be solved (using the "set variables" option in MoT). The translated model is shown in Figure 8b. Note that the translated model does not show the actual expanded equations used by MoT to interpret the equations.
Figure 8b: The ordered translated model equations. The next step is to analyse the translated model. First the variables are classified (actually, reclassified from the default classification in the translation step) and the incidence matrix will be generated. This model has 32 equations and 65 RHS variables.
Model Analysis

The classification of variables is done through the "Variable Classification" option in MoT. This allows the classification of all non-explicit variables into 4 categories: Parameter, Unknown, Known and Dependent. Any changes in the variable classification will immediately be reflected in the incidence matrix. MoT also lists the detected explicit variables and shows their last known value. For new models, all explicit variables are set to 0 by default. This model has 32 equations (2 implicit AEs and 30 AEs) and 65 variables (26 parameters, 7 known, 2 unknown (implicit) and 30 explicit (unknown)). Ordering the model equations actually shows that the equation system can be arranged as a system of explicit AEs. That is, there are no equations involving more than one unknown variable. Therefore, this model can be solved through the debug-mode. The debug option (or equation-by-equation solution mode) is shown in Figure 9. By simply stepping through each equation, the calculated value (shown on the right side of the equation in Fig. 9) can be noted and, in this way, any mis-match of equations and variables can be identified. Note that the user does not need to write any programming code to reach this stage. The next step is the automatic model solution-mode. For this model, however, this is not necessary, since stepping through each equation has already provided the solution. Therefore, this steady state model is modified to its corresponding dynamic version, which is described and solved in the sections below.
Figure 9: Debug solution mode - equation by equation
Model Solution

The steady state model developed above is converted to a dynamic model by adding mass and energy accumulation terms to the mass and energy balance equations for the shell (see Figure 8a). These equations are listed below in Figure 10 (note that additional equations related to the equipment size variables, such as the level of liquid in the tank, that are also needed, are not shown in Figure 10). They are given in the complete listing of the model equations in the Appendix (A3).

dndt[i] = F*z[i] - L*x[i] - V*y[i]
dHtankdt = Hf*F - Hl*L - Hv*V + q
Figure 10: Dynamic model of the 2-phase vapour-liquid flash separator

Note that in the equations listed in Figure 10, n[i] and Htank are the dependent variables, and dndt[i] and dHtankdt are their corresponding dependent-prime variables, respectively; t is the independent variable. This model can now be solved by setting the initial values for the dependent variables at the initial value of t and integrating to a final value of t, or vice versa (the numerical solver for integration allows forward as well as backward integration). The set of DAEs is solved in the ODE-mode, since all the AEs in this model are explicit. The BDF-integration method available in ICAS is used to solve this initial value forward integration problem. Figure 11 shows a screen-shot from MoT as the solution of the model equations progresses.
Figure 11: Model solution completed and solution statistics
Model Modification

The above two-phase dynamic flash model is now converted to a two-phase dynamic reactive flash model, where reactions may occur in the liquid phase, by adding a reaction term to the mass and energy balance equations together with the corresponding reaction rate model (C1-model) and its model parameters. For purposes of illustration, only the mass balance version of the model solution is illustrated for the following reactive system (see Figure 12):

#reaction1 a => b + c
EACT1 = E1*1000/8.31441/T
RRATE = A1*exp(-EACT1)/( -STC_0 )
RRATE = RRATE*( conc_0^ord_0 )

#reaction2 a + b => d
EACT2 = E2*1000/8.31441/T
LL_RRATE = A2*exp(-EACT2)/( -LL_STC_0 )
LL_RRATE = LL_RRATE*( LL_conc_0^LL_ord_0 )*( LL_conc_1^LL_ord_1 )
Figure 12: Reaction rate equations for the dynamic reactor model

For the above reactive system, the concentration of B (n_1) as a function of time and a plot of the concentration of B as a function of A (n_0) are shown in Figures 13a and 13b, respectively. Note that Figure 13b is actually the attainable region diagram (Horn 1964, Glasser et al. 1987) for B as a function of A, which can be obtained by simply integrating the following differential equations:

dn_A/dt = -Rate_1 - Rate_2    (6)
dn_B/dt = Rate_1 - Rate_2    (7)

Figure 13a: Concentration of B as a function of time
Figure 13b: Attainable region diagram
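A small sketch of this calculation (written directly in Python rather than MoT; the kinetic parameters, temperature and initial amounts are illustrative assumptions, and the rate forms follow the reactions of Figure 12) integrates Eqs. (6)-(7) for the batch system and traces B against A:

# Sketch: attainable region (B vs A) for reaction 1: A -> B + C and reaction 2: A + B -> D.
import numpy as np
from scipy.integrate import solve_ivp

A1, E1 = 1.0e6, 50.0       # assumed pre-exponential factors and activation energies [kJ/mol]
A2, E2 = 1.0e4, 40.0
T, R = 350.0, 8.31441e-3

def rates(nA, nB, V=1.0):
    cA, cB = nA / V, nB / V
    R1 = A1 * np.exp(-E1 / (R * T)) * cA          # reaction 1, first order in A
    R2 = A2 * np.exp(-E2 / (R * T)) * cA * cB     # reaction 2, first order in A and B
    return R1, R2

def rhs(t, n):
    R1, R2 = rates(n[0], n[1])
    return [-R1 - R2,       # Eq. (6): dn_A/dt
             R1 - R2]       # Eq. (7): dn_B/dt

sol = solve_ivp(rhs, (0.0, 50.0), [1.0, 0.0], method="BDF")
nA, nB = sol.y              # plotting nB against nA gives the curve of Figure 13b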
If experimental data for the actual reactor operation are available, then this model can be used to regress the reaction parameters (for example, A1, E1, A2, E2 and the reaction orders ord_i) in the reaction equations listed in Figure 12. This model has 63 equations (59 independent, out of which 8 are ODEs and 51 are explicit AEs) and 91 variables (17 Parameters, 15 Known, 8 Dependent and 51 Explicit).

4.2 On-line Parameter Estimation

This example highlights the determination of optimal kinetic parameters for the biodegradation of glucose (from wastewater) to methane. The complex reaction scheme is shown in Figure 14 (where the consumption rates r_i follow simple Monod kinetics). The kinetic model used in this example has been taken from Skiadas et al. (2000). Previous kinetic studies have shown that the anaerobic decomposition of glucose (Gl) to methane and carbon dioxide passes through the production of lactic (Lac), acetic (Ac) and propionic (Pr) acids. In the present mechanism, concentrations of other volatile fatty acids such as butyrates are neglected, as they are small in comparison to those of acetate and propionate.
Figure 14: Kinetic mechanism for anaerobic bioconversion of glucose to biogas. According to this reaction (kinetic) mechanism, the behaviour of the batch digester is described by a set of nonlinear first-order ordinary differential equations (mass balances) taking the form:
dC_i/dt = f_i(r_1, ..., r_5)    (8)

r_i = m_max,i C_i X / (k_s,i + C_i)    (9)

where m_max,i is the maximum specific consumption rate of component i (in mg of C_i per g of biomass per hour), k_s,i is the saturation constant of component i (in mg/l), C_i is the concentration of component i (in mg/l for i = 1, 2, 4, 5 and in mg carbon/l for i = 3), X is the total biomass concentration (in g/l), r_i is the consumption rate of component i for product formation and cell synthesis, and i is the component index for Glucose (1), Lactate (2), IP (an intermediate product, 3), Propionate (4) and Acetate (5), respectively. On-line estimation of parameters using measurements of concentration, C_i,exp, and employing an observer (Alvarez and Lopez, 1999) has been performed. This required the derivation of additional ordinary differential equations to be solved simultaneously with the process model equations (Eq. 8). For the k-th parameter P_k, the additional ODE is given by,

dP_k/dt = f_k(C_i, r_i, m_max,i, k_s,i, X, K_estimator, C_i,exp)    (10)
In general, the estimator objective is to minimize the error between the calculated concentration C_i and the experimental measurement C_i,exp as a function of time, such that at sufficiently long times the ODEs give dP_k/dt ≈ 0. To verify this on-line parameter estimation approach, experimental data for the acetate concentration (taken originally in discrete form) were regressed as a time-dependent function. Then Eqs. (8)-(10) and the regressed function for the measured acetate concentration, representing a new model, were imported into MoT. The variable m_max,3 was identified as the parameter to estimate. The translated model is shown in Figure 15.
Figure 15: Translated model for parameter estimation problem
The model has 29 equations (6 ODEs and 23 explicit AEs) and 47 variables (18 Parameters, 6 Dependent and 23 Explicit). This model is solved by setting the initial values for the dependent variables at the initial value of t and integrating to a final value t_end. The initial conditions for this model are defined by the dependent variables as shown in Figure 16.

Figure 16: Initial conditions for the estimator model (Gl = 1427.423, Lac = 0, m_max,3 = 10, IP = 1e-008, Pr = 102, Ac = 18)

The generated solution is shown in Figure 17 (the concentration of acetate is plotted as a function of time), obtained by using the BDF-integration method in ICAS. As shown in Figure 17, after nearly 32 hours, a steady state value for the parameter is obtained (m_max,3 = 4.82).
Figure 17: Transient response of Acetate concentration in the reactor.
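The estimator idea can also be sketched outside MoT (an illustrative construction, not the observer of Alvarez and Lopez, 1999, and not the five-component model above): the parameter is appended to the state vector and its rate is driven by the mismatch between the computed and measured concentration, so that dP_k/dt tends to zero once the parameter has settled. All numerical values below are assumptions.

# Sketch: on-line estimation of a Monod rate constant by state augmentation.
import numpy as np
from scipy.integrate import solve_ivp

ks, X = 50.0, 2.0                     # assumed saturation constant and biomass concentration
mu_ref = 4.82                         # value used only to generate a pseudo-measurement curve
C_exp = lambda t: 200.0 * np.exp(-mu_ref * X * t / (ks + 200.0))   # assumed measurement
K_est = 0.5                           # estimator gain (tuning parameter)

def rhs(t, s):
    C, mu = s                         # concentration and the parameter being estimated
    r = mu * C * X / (ks + C)         # Monod consumption rate, as in Eq. (9)
    dC = -r                           # single-component balance, in the spirit of Eq. (8)
    dmu = K_est * (C - C_exp(t))      # in the spirit of Eq. (10): driven by the output error
    return [dC, dmu]

sol = solve_ivp(rhs, (0.0, 40.0), [200.0, 10.0], method="BDF")
print(sol.y[1, -1])                   # the estimated parameter approaches a steady value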
4.3 Emulsion Copolymerisation Reactor

Here, the copolymerisation of MMA-Styrene is considered. The dynamic model (Dimitratos et al., 1989) considers mass balances around a volume and includes thermodynamic equilibrium in a semi-continuous reactor:

d[M_i^P]/dt = g_i{[M_1^P], [M_2^P], V_R, F_1, F_2, [I], a_1, a_2},   i = 1, 2    (11)
d[I]/dt = g_3{[M_1^P], [M_2^P], V_R, F_1, F_2, [I]}    (12)
dV_R/dt = g_4{[M_1^P], [M_2^P], F_1, F_2, [I]}    (13)
(ΔG/RT)_i^P{[M_1^P], [M_2^P], a_1, a_2} = (ΔG/RT)_i^aq{[M_1^aq], [M_2^aq], a_1, a_2},   i = 1, 2    (14)

where [M_i^P] is the concentration of monomer i in the polymer phase P, [I] is the initiator concentration, V_R is the reactor volume, ΔG denotes the partial molar free energies, a_i are the monomer partition coefficients, F_1 and F_2 are the monomer feeds, and g_i are non-linear functionalities (described in Appendix A4, where all the model equations are provided). Equations (11)-(14) form a set of stiff DAEs. The model as listed in the Appendix has 114 equations (74 independent, out of which 4 are ODEs, 2 are implicit AEs and 68 are explicit AEs). There are 103 variables (28 Parameters, 2 Unknown, 1 Known, 4 Dependent and 68 Explicit). The initial values of the dependent and unknown (implicit) variables are listed in Figure 18.
where, Mf is the monomer concentrations in the polymer phase P and monomer i, / is the initiator concentration, VR is the reactor volume, AG is the partial molar free energies, at are the monomer partition coefficients, and Ft are non-linear functionalities (described in Appendix A4 where all the model equations are provided) Equations (11)-(14) form a set of stiff DAEs. The model as listed in Appendix has 114 equations (74 independent, out of which, 4 are ODEs, 2 are impHcit AEs and 68 are exphcit AEs). There are 103 variables (28 Parameters, 2 Unknown, 1 Known, 4 Dependent and 68 Explicit). The initial values of the dependent and unknown (implicit) variables are hsted in Figure 18. Oepemlent Y1 Y2 Y4 Y3 dYl «^2 dY3 dY4 YS Y6
Dependent Prime
|
1e-015 1e-015 0.55 0.00272
UiritRown
i
0
b
;
i
i0.1
i
0 0
Ki"
;
Figure 18: Initial condition for the copolymerisation model The DAE set has been solved with the BDF-integration method in ICAS. Figures 19a-19c show some of the simulation results. The distribution of the monomers between the different phases change throughout the course of the polymerization and after the end of the monomer addition period, a drop in the level of monomers in the polymer particles is observed, followed by a drop in the aqueous-phase monomer concentrations. The time and concentration are in seconds and mole/htre, respectively.
232 1111111'lillitill IfII1 ilM ' MIiMW^^^^^^^^^^^^^^^^^^^ JYI
Y5
1 y2
1 V4
1 Y3
i' "'•' ^M^k^-. -..,
j
Unknown
- - _ ,— ,—: -'•—
20 •
r--. . ^
-i—^H-
••--•—
n .;
1
/ / / i
1
—
i
1 i
-
i i
4000
8000
12000
16000
1
—, _..
,
! 0
Close
p Save Points SavePoin>» Now | when done
Unkno«nA« 25 •
2Si
-
1 MIR 1 M2R j Mia \ M2a 1
20000
24000
28000
32000
t—1 36000
Unknown Aa!
(a) jn
j_Y5
jjjT
j_Y4
j_Y3
|fM1R~ M2R | Mia j M2a |
Close Save Pwntt Now
/
/ /
y 0
...U-__
/
1
•
--
' '- H~i
"
/ >
4000
8000
-
i
12000
- !
16000
*- -1 - ^
20000
24000
28000
L^
1
• 1
1
;
- -"-1
\
i
_-.- i -
1
! i
:
T
)
1
1 1
32000
36000
(b) |; ;'C3ose"^-:
NA
-•
\ \ ; ~;"; \ .,
0.018 0.016 j
°' //
-
.:
/: / i
1
•^^
1
!
i
^,-.,^
• • • j " "
i 1
!
1 i 4000
1 6000
i i
!
• ^ - - . .
-,.-1
1
°™ 1
0
1
v^ i
12000
i
! 1 '
16000 20000
24000
28000 32000
36000
•
(c) Figure 19: Styrene concentration vs. time, (a) in the particles (b) in the reactor and (c) in the water
4.4 Beer Fermentation Process

The modelling of fermentation processes is a basic part of any research in fermentation process control. Since all the optimisation work to be done rests on the reliability of the model equations, they are important for a correct design, and they are generally highly non-linear. In fermentation, an accurate mathematical model is indispensable for the control, optimisation and simulation of a process. Models used for on-line control and those used for simulation will not generally be the same (even if they pertain to the same process) because they are used for different purposes; no model can be said to be the best. The model is not expected to be a reconstruction of the process; rather, it is intended to serve as a set of operators on the identified set of inputs, producing similar output as expected from the process. The model is described by the following set of ODEs (Andres-Toro et al., 1996):

dX_lag(t)/dt = -μ_lag·X_lag(t)    (15)
dX_act(t)/dt = μ_x·X_act(t) - k_m·X_act(t) + μ_lag·X_lag(t)    (16)
dX_dead(t)/dt = k_m·X_act(t) - μ_SD·X_dead(t)    (17)
dS(t)/dt = -μ_s·X_act(t)    (18)
dE(t)/dt = f·μ_a·X_act(t)    (19)
dA(t)/dt = μ_eas·μ_s·X_act(t)    (20)
dD(t)/dt = k_dc·S(t)·X_act(t) - k_dm·D(t)·E(t)    (21)

where X_act, X_lag and X_dead are the active, suspended and dead biomass concentrations, and S, E, A and D are the substrate, ethanol, ethyl acetate and diacetyl concentrations, respectively. The MoT version of the model, as supplied by Woinaroschy (2002), is given in Appendix A6 and its translated version in MoT is shown in Figure 20a. The model has 17 equations (7 ODEs and 10 explicit AEs) and 21 variables (2 Parameters, 2 Known, 7 Dependent and 10 Explicit). It is solved with the BDF-integration method in ICAS with the initial values of the dependent variables listed in Figure 20b.
(21)
where, Xacu X/ag, Xdead, are active, suspended and dead biomass concentration, S, E, A andD are, substrate, ethanol, ethyl acetate and diacetyl concentrations, respectively. The MoT version of the model as supphed by Woinaroschy (2002) is given in Appendix A6 and its translated version in MoT is shown in Figure 20a. The model has 17 equations (7 ODEs and 10 explicit AEs) and 21 variables (2 Parameters, 2 Known, 7 Dependent and 10 Exphcit). It is solved with the BDFintegration method in ICAS with the initial values of the dependent variables Hsted in Figure 20b.
234
D
F<e E *
View Window Help
•li^lai 'I'-ir*! #|?|
d ^ ^ l Ml ^1 :ilJl,.,! .d>!i
l~
[^
Model Definition
*.
U
View Onginal Model
Q
View Translated Model
Bl ''"P°'' Model D Cieate Model H bj D
Modify Model
Model/Variable Analysis Q
Classify Vaiiables
H
Relatedr/dttoY
HI Equation Oidenng
1
H
Set Variable Value
F" ~~\ Advanced Options j
H) Design Variables H
1
1 f-'
^
Obj. Value and Paring Define Relationships
ll
EQiJs'io" Trace Back
|
120
ye
0 0
y7
I
Testbed Enviionment III '-'B'''^^'^o''
1
111 Define Stream Compos \ E
Dependent y5 y1 y2 y4 y3
Constraint Variables
J) D
1
j
+J temp_K = T+273.15 +. f = l-y5/(0.5*si) +' mKO = eMp(i08.3I-31934.09/teinp_K) .+, meas = eKp(89.92-26589/temp K) 15 msO =eKp(-41.92+11654.64/temp_i:) tl mlag =exp(30.72-9501.54Aeinp_K) ,+1 km=eKp(130.16-38313/temp_K} a mDO = eKp(33.82-10033.28/temp_IC) +1 maO = eHp(3.27-1267.24/temp K) a ks=eHp(-I19.53+34203.95/temp_IC) i l dKl =-yl*mlag +1 dK2 = y2*y4*mi«0/(0.5*si+yS)-y2*km+yl*mlag +1 dK3 = y2*km-(0.5*si*mD0/(0.5*si+y5))*y3 .+1 dK4 = -(y4*ms0/(ks+y4))*y2 3 dx5 = (y4*ma0/(ks+y4))*f*y2 '+i dH6 =(y4*ms0/(ks+y4))*y2*meas+y6*0 31 dx7 = kdc*y4*y2-kdm*y7*y5
13 Data Sets Jj
%IM»imim
J
Meassurements
j*$Solil»oft J
Figure 20a: Translated model of beer fermentation model
Figure 20b: Initial conditions for the beer fermentation model
A sample of the dynamic simulation results is shown in Figure 20c, which shows the transient behaviour of ethanol concentration in the fermenter.
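For readers without ICAS, the translated model of Figure 20a can be reproduced approximately in a few lines; the Arrhenius expressions for the rate constants follow Figure 20a, while the temperature, the constant si, the parameters kdc and kdm and the initial amounts are illustrative assumptions (they are model inputs, not reported in the text).

# Sketch: beer fermentation ODEs (Eqs. 15-21), rate constants as in Figure 20a.
import numpy as np
from scipy.integrate import solve_ivp

T = 13.0                                  # fermentation temperature [C] (assumed)
si = 130.0                                # initial substrate-related constant (assumed)
kdc, kdm = 0.000127, 0.00113              # diacetyl kinetic parameters (assumed)

def rhs(t, y):
    Xlag, Xact, Xdead, S, E, A, D = y
    TK = T + 273.15
    f    = 1.0 - E / (0.5 * si)
    mx0  = np.exp(108.31 - 31934.09 / TK)
    meas = np.exp(89.92 - 26589.0 / TK)
    ms0  = np.exp(-41.92 + 11654.64 / TK)
    mlag = np.exp(30.72 - 9501.54 / TK)
    km   = np.exp(130.16 - 38313.0 / TK)
    mD0  = np.exp(33.82 - 10033.28 / TK)
    ma0  = np.exp(3.27 - 1267.24 / TK)
    ks   = np.exp(-119.53 + 34203.95 / TK)
    dXlag  = -Xlag * mlag
    dXact  = Xact * S * mx0 / (0.5 * si + E) - Xact * km + Xlag * mlag
    dXdead = Xact * km - (0.5 * si * mD0 / (0.5 * si + E)) * Xdead
    dS     = -(S * ms0 / (ks + S)) * Xact
    dE     = (S * ma0 / (ks + S)) * f * Xact
    dA     = (S * ms0 / (ks + S)) * Xact * meas
    dD     = kdc * S * Xact - kdm * D * E
    return [dXlag, dXact, dXdead, dS, dE, dA, dD]

y0 = [2.0, 0.1, 0.0, 130.0, 0.0, 0.0, 0.0]   # assumed initial concentrations
sol = solve_ivp(rhs, (0.0, 100.0), y0, method="BDF")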
Figure 20c: Beer fermentation model solution completed.
4.6 Using MoT via an Excel Macro File

Microsoft Excel is an extremely powerful tool, which is used by millions of people every day. Functions tailored to a specific task can be programmed into Excel to extend its capabilities with customised analysis tools. A simple method of customising Excel is to create a macro. We have developed an Excel macro file that, through COM* technology, is able to use MoT. The Excel macro is customised such that it is able to execute a sequence of modelling related activities: reading of input data, execution of the MoT model, and writing of output (results) data, as shown in Figure 21. The building process that follows involves code generation, compiling, linking, and evaluation of the MoT model.
RunAnocki
Figure 21: Excel macro interface In this way, the process of evaluation and use of a MoT export model is completely automatic. The main advantage is that users can prepare data for a MoT export model, get output results from the MoT export model, and store results from the same model using different parameters, directly through an Excel worksheet environment. This makes the model use easier for those not famihar with the modelling tool-box environment. For example, the model given by equations (22)-(24) was written as MoT project and then was exported to EXCEL as shown in Figure 22.
* COM is an acronym for Component Object Model, and it is the widely accepted standard for integration of external functionality into Microsoft Office applications, such as Excel
x1 + exp(x1 - 1) + (x2 + x3)^2 = 27    (22)
exp(x2 - 2)/x1 + x3^2 = 10    (23)
x3 + sin(x2 - 2) + x2^2 = 7    (24)
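As reconstructed above, the three-equation system can be checked with any general nonlinear solver; the small sketch below (independent of the MoT COM-object) converges to values close to (1, 2, 3), consistent with the variable values shown in Figure 21.

# Sketch: solve the nonlinear system (22)-(24) with a general-purpose solver.
import numpy as np
from scipy.optimize import fsolve

def residuals(v):
    x1, x2, x3 = v
    return [x1 + np.exp(x1 - 1.0) + (x2 + x3) ** 2 - 27.0,   # Eq. (22)
            np.exp(x2 - 2.0) / x1 + x3 ** 2 - 10.0,          # Eq. (23)
            x3 + np.sin(x2 - 2.0) + x2 ** 2 - 7.0]           # Eq. (24)

print(fsolve(residuals, [1.5, 1.5, 1.5]))    # approximately [1, 2, 3]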
5. CONCLUSIONS

A computer-aided modelling tool that can assist the model developer and engineer by carrying out the time consuming and expensive steps in the modelling work process has been presented, together with detailed application examples. Models can be imported into MoT as they are written, in their text form, and can be exported to external software without the user having to write any program code. MoT has been used for steady state and dynamic simulations of processes, for process optimisation studies, for model parameter estimation, and to develop and test very quickly new models reported in journals and books. MoT is also being used to generate, very quickly, process models that are currently not available in process simulation packages. Finally, the model export feature in MoT makes it easy to develop customised simulators for any process. Current work involves adding more logical equations (operators), allowing the combination of different MoT-generated model objects, and developing an automatic discretisation technique for models with PDAE systems.
6. REFERENCES

Andres-Toro, B., Giron-Sierra, J.M., Lopez-Orozco, J.A. and Fernandez-Conde, C., Optimisation of a Batch Fermentation Process by Genetic Algorithms, 183-188, 1996.
Alvarez, J., Lopez, T., Robust Dynamic State Estimation of Nonlinear Plants. AIChE J., 45, 107-123, 1999.
Bieszczad, J.: A Framework for the Language and Logic of Computer-Aided Phenomena-Based Process Modelling. PhD Thesis, Massachusetts Institute of Technology, 2000.
Bogusch, R., B. Lohmann, W. Marquardt: Computer-Aided Process Modelling with ModKit, Computers and Chemical Engineering, 21, 1105-1115, 1997.
Dimitratos, J., Georgakis, C., El-Aasser, S., Klein, A., Dynamic Model and State Estimation for an Emulsion Copolymerization Reactor. Comp. Chem. Eng., 13, 21-33, 1989.
Eggersmann, M., J. Hackenberg, W. Marquardt, I. T. Cameron: Applications of Modelling: A Case Study from Process Design. In B. Braunschweig, R. Gani (Eds.): Software Architectures and Tools for Computer Aided Process Engineering, Elsevier CACE Series, 11, 335-372, 2002.
Gani, R., G. Hytoft, C. Jaksland, A. K. Jensen: An Integrated Computer-Aided System for Integrated Design of Chemical Processes, Computers and Chemical Engineering, 21, 1135-1146, 1997.
Glasser, B., Hildebrandt, D., and Glasser, D., Optimal Mixing for Exothermic Reversible Reactions. Ind. Eng. Chem. Res., 31 (6), 1541, 1992.
Hangos, K., I. T. Cameron: Process Modelling and Process Analysis. Academic Press, 2001.
Horn, F., Attainable Regions in Chemical Reaction Technique. In The Third European Symposium on Chemical Reaction Engineering. London: Pergamon, 1964.
Marquardt, W., A. Gerstlauer, E. D. Gilles: Modelling and Representation of Complex Objects: A Chemical Engineering Perspective. Proc. 6th Int. Conf. on Industrial and Engineering Applications of Artificial Intelligence and Expert Systems, Edinburgh, Scotland, 219-228, July, 1993.
Marquardt, W., L. von Wedel, B. Bayer: Perspectives on Lifecycle Process Modelling. In M. F. Malone, J. A. Trainham, B. Carnahan (Eds.): Foundations of Computer Aided Process Design, AIChE Symposium Series 323, 96, 192-214, 2000.
Okada, H., T. Shirao: Life Cycle Needs. In B. Braunschweig, R. Gani (Eds.): Software Architectures and Tools for Computer Aided Process Engineering, Elsevier CACE Series, 11, 65-86, 2002.
Russel, B. M. R., R. Gani: MoT - A Modelling Test-Bed. In: ICAS Manual (CAPEC Report), Technical University of Denmark, 2000.
Skiadas, I.V., Gavala, H.N., Lyberatos, G., Modeling of the Periodic Anaerobic Baffled Reactor Based on the Retaining Factor Concept. Water Res., 34, 3725-3736, 2000.
Stephanopoulos, G., G. Henning, H. Leone: MODEL.LA - A Modelling Framework for Process Engineering - II. Multifaceted Modelling of Processing Systems. Computers and Chemical Engineering, 14, 813-846, 1990.
von Wedel, L., Marquardt, W., Gani, R.: Modelling Frameworks. In B. Braunschweig, R. Gani (Eds.): Software Architectures and Tools for Computer Aided Process Engineering, Elsevier CACE Series, 11, 89-126, 2002.
Woinaroschy, A., Department of Chemical Engineering, University "Politehnica" of Bucharest, 1-5 Polizu Str., 78126 Bucharest, Romania (Personal Communication), 2002.
APPENDIX

A1. Model Equations for Steady State Flash

#inherit relations/assumptions
T1 = Tout
P1 = Pout
Tv = T1
Pv = P1
#phase distribution
Psat[i] = (10^(Aant[i] - Bant[i]/(Tout-273.15+Cant[i])))/760
K[i] = Psat[i]/Pout
hVap[i] = (Avap[i]*(1-(Tout/Tc[i]))^(Bvap[i]+Cvap[i]*(Tout/Tc[i])+Dvap[i]*(Tout/Tc[i])^2))/1000
hV[i] = (((((E[i]*0.2*Tout+D[i]*0.25)*Tout+C[i]/3.0)*Tout+B[i]*0.5)*Tout+A[i])*Tout)/1000
hVr[i] = (((((E[i]*0.2*Tref+D[i]*0.25)*Tref+C[i]/3.0)*Tref+B[i]*0.5)*Tref+A[i])*Tref)/1000
hL[i] = (((((E[i]*0.2*Tout+D[i]*0.25)*Tout+C[i]/3.0)*Tout+B[i]*0.5)*Tout+A[i])*Tout)/1000
hLr[i] = (((((E[i]*0.2*Tref+D[i]*0.25)*Tref+C[i]/3.0)*Tref+B[i]*0.5)*Tref+A[i])*Tref)/1000
hVFeed[i] = (((((E[i]*0.2*Tfeed+D[i]*0.25)*Tfeed+C[i]/3.0)*Tfeed+B[i]*0.5)*Tfeed+A[i])*Tfeed)/1000
#conservation
L = F - V
x[i] = z[i]*F/(K[i]*V+L)
y[i] = K[i]*x[i]
SY = sum_i(abs(y[i]))
SX = sum_i(abs(x[i]))
#0 = z[i]*F - y[i]*V + L*x[i]
0 = SX - SY
Hv = sum_i((hV[i]-hVr[i]+hVap[i])*y[i])
Hl = sum_i((hL[i]-hLr[i])*x[i])
Hf = sum_i((hVFeed[i]-hVr[i])*z[i])
0 = (Hf*F - Hl*L - Hv*V + q)/1000
(Screen capture: MoT window "Result before and after the solution" showing the variable values for the steady state flash model, solved at Tout = 355 and Pout = 1.)
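For readers who want to reproduce the behaviour of this listing outside MoT, the following minimal Python sketch solves an isothermal version of the same phase-distribution relations (Antoine-based K-values together with the Rachford-Rice form of the component balance) for the vapour fraction. The Antoine coefficients, feed composition and flash conditions are illustrative values chosen here, not data from the chapter, and the script is not the MoT implementation.

# Minimal isothermal flash sketch: Antoine K-values + Rachford-Rice split.
# All property data and conditions are illustrative, not taken from the chapter.
from scipy.optimize import brentq

Aant = [6.90565, 6.95464]      # hypothetical Antoine A (log10, mmHg, deg C)
Bant = [1211.033, 1344.800]    # hypothetical Antoine B
Cant = [220.790, 219.482]      # hypothetical Antoine C
z    = [0.5, 0.5]              # feed mole fractions
Tout = 370.0                   # flash temperature, K
Pout = 1.0                     # flash pressure, atm

# Psat[i] = 10^(A - B/(T - 273.15 + C))/760 and K[i] = Psat[i]/Pout, as in A1
Psat = [10 ** (Aant[i] - Bant[i] / (Tout - 273.15 + Cant[i])) / 760 for i in range(2)]
K = [p / Pout for p in Psat]

# Rachford-Rice: sum_i z[i]*(1 - K[i])/(1 + phi*(K[i] - 1)) = 0
def rachford_rice(phi):
    return sum(z[i] * (1 - K[i]) / (1 + phi * (K[i] - 1)) for i in range(2))

phi = brentq(rachford_rice, 1e-8, 1 - 1e-8)            # vapour fraction
x = [z[i] / (1 + phi * (K[i] - 1)) for i in range(2)]  # liquid composition
y = [K[i] * x[i] for i in range(2)]                    # vapour composition
print(phi, x, y)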
A2. Model Equations for Dynamic Flash

T1 = Tout
P1 = Pout
Ttank = Tout
Tv = T1
Pv = P1
ntot = sum_i(n[i])
zTank[i] = n[i]/ntot
#find k values
Psat[i] = (10^(Aant[i] - Bant[i]/(Tout-273.15+Cant[i])))/760
K[i] = Psat[i]/Pout
x[i] = zTank[i]/(1+phi*(K[i]-1))
y[i] = x[i]*K[i]
#get the enthalpies
hVap[i] = (Avap[i]*(1-(Tout/Tc[i]))^(Bvap[i]+Cvap[i]*(Tout/Tc[i])+Dvap[i]*(Tout/Tc[i])^2))/1000
hV[i] = (((((E[i]*0.2*Tout+D[i]*0.25)*Tout+C[i]/3.0)*Tout+B[i]*0.5)*Tout+A[i])*Tout)/1000
hVr[i] = (((((E[i]*0.2*Tref+D[i]*0.25)*Tref+C[i]/3.0)*Tref+B[i]*0.5)*Tref+A[i])*Tref)/1000
hL[i] = (((((E[i]*0.2*Tout+D[i]*0.25)*Tout+C[i]/3.0)*Tout+B[i]*0.5)*Tout+A[i])*Tout)/1000
hLr[i] = (((((E[i]*0.2*Tref+D[i]*0.25)*Tref+C[i]/3.0)*Tref+B[i]*0.5)*Tref+A[i])*Tref)/1000
#get the densities
dL[i] = A105[i]/B105[i]^(1+(1-Ttank/C105[i])^D105[i])
DenL = 1/sum_i(x[i]/dL[i])
DenV = Pout/(0.08314*Tout)
#rachford rice
0 = ntot*(1-phi)/DenL + ntot*phi/DenV - Vol
0 = sum_i(zTank[i]*(1-K[i])/(1+phi*(K[i]-1)))
Level = ntot*(1-phi)/(Area*DenL)
L = ValveL*Level
V = ValveV*(Pout-Pmin)
Hv = sum_i((hV[i]-hVr[i]+hVap[i])*y[i])
Hl = sum_i((hL[i]-hLr[i])*x[i])
0 = (Hl*ntot*(1-phi) + Hv*ntot*phi - Htank)/1000
#update holdups
dndt[i] = F*z[i] - L*x[i] - V*y[i]
dHtankdt = Hf*F - Hl*L - Hv*V + q
(Screen capture: MoT window "Result before and after the solution" showing the variable values for the dynamic flash model.)
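The dynamic model couples the flash relations to component and energy holdup balances (dndt[i], dHtankdt). The sketch below integrates only the component holdup balances dn[i]/dt = F*z[i] - L*x[i] - V*y[i], assuming constant K-values and holdup-proportional draw rates in place of the full pressure and enthalpy coupling; it is not the MoT model, and all numerical values are hypothetical.

# Sketch of the dynamic-flash holdup balances dn[i]/dt = F*z[i] - L*x[i] - V*y[i].
# Constant K-values and simple draw laws replace the full VLE/energy coupling.
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import brentq

K = np.array([2.0, 0.5])        # assumed constant equilibrium ratios
z = np.array([0.6, 0.4])        # feed composition
F = 10.0                        # feed rate, mol/s
ValveL, ValveV = 0.8, 0.6       # hypothetical draw coefficients

def rhs(t, n):
    ntot = n.sum()
    zt = n / ntot                                   # tank composition
    def rr(phi):                                    # Rachford-Rice residual
        return np.sum(zt * (1 - K) / (1 + phi * (K - 1)))
    if rr(0.0) * rr(1.0) < 0:
        phi = brentq(rr, 1e-8, 1 - 1e-8)            # two-phase vapour fraction
    else:
        phi = 0.0 if rr(0.0) > 0 else 1.0           # all liquid / all vapour
    x = zt / (1 + phi * (K - 1))
    y = K * x
    L = ValveL * ntot * (1 - phi)                   # liquid draw ~ liquid holdup
    V = ValveV * ntot * phi                         # vapour draw ~ vapour holdup
    return F * z - L * x - V * y

sol = solve_ivp(rhs, (0.0, 50.0), [30.0, 20.0], max_step=0.5)
print(sol.y[:, -1])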
A3. Model Equations for Two Phase Reactive Flash

ft1 = sum_i(f1[i])
ft2 = sum_i(f2[i])
valid[i] = post(nf[i])*nf[i]
LLvalid[i] = post(LLHld[i])*LLHld[i]
nf_tot = sum_i(valid[i])
level = nf_tot/density/area
LLHld_tot = sum_i(LLvalid[i])
LL_level = LLHld_tot/LL_density/area
V = level*area
conc[i] = valid[i]/V
LL_V = LL_level*area
LL_conc[i] = LLvalid[i]/LL_V
#reaction1 a => b + c
EACT1 = E1*1000/8.31441/T
RRATE = A1*exp(-EACT1)/(-STC_0)
#reaction2 a + b => d
EACT2 = E2*1000/8.31441/T
LL_RRATE = A2*exp(-EACT2)/(-LL_STC_0)
RRATE = RRATE*(conc_0^ord_0)
RRATE = RRATE*post(nf_0)
RV = RRATE*V
RV1[i] = RV*STC[i]
LL_RRATE = LL_RRATE*(LL_conc_0^LL_ord_0)*(LL_conc_1^LL_ord_1)
LL_RRATE = LL_RRATE*post(LLHld_0)*post(LLHld_1)
LL_RV = LL_RRATE*LL_V
LL_RV1[i] = LL_RV*LL_STC[i]
ft3 = valve*level
x[i] = valid[i]/nf_tot
f3[i] = ft3*x[i]
LL_x[i] = LLvalid[i]/LLHld_tot
diff[i] = f1[i] + f2[i] - f3[i] + RV1[i] - toLL[i]*valid[i]
LL_diff[i] = toLL[i]*valid[i] + LL_RV1[i]
(Screen capture: MoT window "Result before and after the solution" showing the variable values for the two phase reactive flash model.)
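In this listing post(·) appears to act as a positivity switch (1 when its argument is positive, 0 otherwise), so that reaction and inter-phase transfer terms are switched off once a holdup is exhausted. The small sketch below, with made-up rate and feed values, shows the effect of such a switch on a single first-order consumption term; it is an illustration of the idea, not the chapter's model.

# Sketch of a post()-style positivity switch on a first-order consumption term.
# post(n) is taken here as 1 for n > 0 and 0 otherwise; all numbers are made up.
from scipy.integrate import solve_ivp

def post(n):
    return 1.0 if n > 0.0 else 0.0

k = 0.5          # hypothetical first-order rate constant, 1/s
feed = 0.05      # hypothetical feed of component a, mol/s

def rhs(t, y):
    na, nb = y
    r = k * post(na) * max(na, 0.0)   # reaction a -> b, switched off when na <= 0
    return [feed - r, r]

sol = solve_ivp(rhs, (0.0, 20.0), [1.0, 0.0], max_step=0.1)
print(sol.y[:, -1])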
A4. Model Equations for On Line Parameter Estimation

#On Line Parameter Estimation
#Batch Anaerobic Biogas Reactor
#Experimental measured concentration fixed
Ac_exp = 6.00077 + 85.29891*T - 7.41401*T^2 + 0.21787*T^3 - 0.00215*T^4
#The consumption rates follow simple Monod kinetics:
r1 = mmax1*Gl*X/(ks1 + Gl)
r2 = mmax2*Lac*X/(ks2 + Lac)
r3 = mmax3*IP*X/(ks3 + IP)
r4 = mmax4*Pr*X/(ks4 + Pr)
r5 = mmax5*Ac*X/(ks5 + Ac)
aux1 = ks5 + Ac
aux2 = aux1^2
aux3 = mmax5*X
aux4 = BE3*X
aux5 = Ac_exp - Ac
#reactor residence time
Thr = 10.0
So = 60.0
Omega = So/Thr
#damping factor
Zeta = 0.71
#linear gain
k11 = 2.0*Zeta*Omega
k12 = Omega^2
#The mass balances for the anaerobic biodegradation of glucose to biogas in a batch reactor
F1 = -r1
F2 = CE1*r1 - r2
F3 = DE1*r1 - r3
F4 = AL2*r2 + AL3*r3 - r4
F5 = BE2*r2 + BE3*r3 + Y4AcPr*r4 - r5
F6 = ((aux3*Ac/aux2 - aux3/aux1)*k11 + k12)*(ks3 + IP)/(aux4*IP)*aux5
dGl = F1
dLac = F2
dIP = F3
dPr = F4
dAc = F5 + k11*aux5
#Parameter to be estimated
dmmax3 = F6
(Screen capture: MoT window "Result before and after the solution" showing the variable values for the on-line parameter estimation model.)
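The last equation turns parameter estimation into an extra state: dmmax3 = F6 corrects the Monod coefficient mmax3 on line, driven by the mismatch aux5 = Ac_exp - Ac between measured and predicted acetate, with the gains k11 and k12 shaping the error dynamics. The sketch below illustrates the same idea on a reduced single-substrate Monod model; it uses a simple gradient-type correction rather than the F6 expression above, synthesises its own "measurements" from an assumed true parameter, and all numerical values are hypothetical.

# Sketch of on-line adjustment of a Monod rate parameter from a measured
# concentration, in the spirit of dmmax3 = F6. Reduced to one substrate;
# the true parameter, gains and initial state are illustrative assumptions.
from scipy.integrate import solve_ivp

ks, X = 2.0, 1.5                 # half-saturation constant and (constant) biomass
mmax_true = 0.8                  # "plant" value, used only to synthesise measurements
zeta, omega = 0.71, 6.0          # damping factor and bandwidth of the error dynamics
k11, k12 = 2.0 * zeta * omega, omega ** 2

def s_meas(t):                   # synthetic measurement from the "true" plant
    if t <= 0.0:
        return 10.0
    sol = solve_ivp(lambda tau, s: [-mmax_true * s[0] * X / (ks + s[0])],
                    (0.0, t), [10.0], rtol=1e-8)
    return sol.y[0, -1]

def rhs(t, y):
    s_hat, mmax_hat = y
    err = s_meas(t) - s_hat                          # measured minus predicted
    phi = s_hat * X / (ks + s_hat)                   # Monod consumption term
    ds = -mmax_hat * phi + k11 * err                 # observer with output injection
    dmmax = -k12 * err * phi                         # gradient-like parameter correction
    return [ds, dmmax]

sol = solve_ivp(rhs, (0.0, 10.0), [10.0, 0.4], max_step=0.05)
print("estimated mmax:", sol.y[1, -1])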
A5. Model Equations for Emulsion Copolymerisation Reactor

#DYNAMIC MODEL OF EMULSION COPOLYMERISATION REACTOR
#test system: Styrene (S) / Methyl Methacrylate (MMA)
#some parameters for styrene
kp11 = 1.259e07*exp(Ep11/R/Te)
kt11 = 1.700e09*exp(Et11/R/Te)
ro1 = d1/MW1
m1p = MW1/DOP/MW1
m1w = MW1/MWW*10
#some parameters for methyl methacrylate
kp22 = 4.768e07*exp(Ep22/Te)
kt22 = 6.5808e10*exp(Et22/Te)
ro2 = d2/MW2
m2p = MW2/DOP/MW2
m2w = MW2/MWW*10
#parameters for styrene/methyl methacrylate
m12 = MW1/MW2
m21 = 1.0/m12
#parameters for ammonium and potassium persulfate
kd = 2.288e16*exp(Ed/R/Te)
#other parameters
dp = (dp1 + dp2)/2.0
kp12 = kp11/r1
kp21 = kp22/r2
x21 = x12
#total volume of polymer phase
aux1 = Y1/Y5/ro1
aux2 = Y2/Y6/ro2
aux3 = 1.0 - aux1 - aux2
Vp = Y4 - Vwa/aux3
#initiator decomposition rate
RI = 2.0*f*kd*Y3
#average radical concentrations
aux1 = kp21*Y1
aux2 = kp21*Y1 + kp12*Y2
P1 = aux1/aux2
kt12 = sqrt(kt11*kt22)
kt_ = kt11*P1^2.0 + P1*(1.0-P1)*kt12 - kt22*(1.0-P1)^2.0
AvRaCon = sqrt(RI*Vp/(2.0*Y4*kt_))
#styrene (monomer) consumption rate
aux1 = AvRaCon*kp11*kp22
aux2 = r1*Y1*Y1 + Y1*Y2
aux3 = kp22*r1*Y1 + kp11*r2*Y2
Rp1 = aux1*aux2/aux3
RXN1 = Rp1*Y4
#MMA (monomer) consumption rate
aux1 = AvRaCon*kp11*kp22
aux2 = r2*Y2*Y2 + Y1*Y2
aux3 = kp22*r1*Y1 + kp11*r2*Y2
Rp2 = aux1*aux2/aux3
RXN2 = Rp2*Y4
#functions
f1 = Vp/Y4*(1.0 - 1.0/Y5) + 1.0/Y5
f2 = Vp/Y4*(1.0 - 1.0/Y6) + 1.0/Y6
#partial derivatives
aux1 = 1.0 - f1
aux2 = 1.0 - f2
aux3 = ro1*(Y5 - 1.0)*Vwa
aux4 = ro2*(Y6 - 1.0)*Vwa
pf1Y1 = -aux1*aux1*Y4/aux3
pf2Y1 = -aux1*aux2*Y4/aux3
pf1Y2 = -aux1*aux2*Y4/aux4
pf2Y2 = -aux2*aux2*Y4/aux4
pf1Y4 = aux1/Y4
pf2Y4 = aux2/Y4
#functions 2
A1 = f1*Y4 + Y1*Y4*pf1Y1
A2 = f2*Y4 + Y2*Y4*pf2Y2
B1 = Y1*Y4*pf1Y2
B2 = Y2*Y4*pf2Y1
K1 = FM1 - RXN1 - (f1 - Y4*pf1Y4)*Y1*g4
K2 = FM2 - RXN2 - (f2 - Y4*pf2Y4)*Y2*g4
#main functions
aux1 = FM1*MW1/d1
aux2 = FM2*MW2/d2
aux3 = RXN1*MW1*(1.0/d1 - 1.0/dp)
aux4 = RXN2*MW2*(1.0/d2 - 1.0/dp)
g4 = aux1 + aux2 - aux3 - aux4
g3 = -2.0*f*kd*Y3 - Y3/Y4*g4
g2 = (A1*K2 - B2*K1)/(A1*A2 - B1*B2)
g1 = (A2*K1 - B1*K2)/(A1*A2 - B1*B2)
#thermodynamic equilibrium: the partial molar free energies
fi1p = Y1/ro1
fi2p = Y2/ro2
fi1a = Y1/Y5/ro1
fi2a = Y2/Y6/ro2
fipp = 1.0 - fi1p - fi2p
fiwa = 1.0 - fi1a - fi2a
aux1 = ln(fi1p)
aux2 = (1.0 - m12)*fi2p
aux3 = (1.0 - m1p)*fipp
aux4 = x12*fi2p*fi2p
aux5 = x1p*fipp*fipp
aux6 = fi2p*fipp*(x12 + x1p - x2p*m12)
aux7 = 2.0*Gamma/(ro1*rp*Rg*Te)*fipp
DGRT1P = aux1 + aux2 + aux3 + aux4 + aux5 + aux6 + aux7
aux1 = ln(fi1a)
aux2 = (1.0 - m12)*fi2a
aux3 = (1.0 - m1w)*fiwa
aux4 = x12*fi2a*fi2a
aux5 = x1w*fiwa*fiwa
aux6 = fiwa*fi2a*(x12 + x1w - x2w*m12)
DGRT1A = aux1 + aux2 + aux3 + aux4 + aux5 + aux6
aux1 = ln(fi2p)
aux2 = (1.0 - m21)*fi1p
aux3 = (1.0 - m2p)*fipp
aux4 = x21*fi1p*fi1p
aux5 = x2p*fipp*fipp
aux6 = fi1p*fipp*(x21 + x2p - x1p*m21)
aux7 = 2.0*Gamma/(ro2*rp*Rg*Te)*fipp
DGRT2P = aux1 + aux2 + aux3 + aux4 + aux5 + aux6 + aux7
aux1 = ln(fi2a)
aux2 = (1.0 - m21)*fi1a
aux3 = (1.0 - m2w)*fiwa
aux4 = x21*fi1a*fi1a
aux5 = x2w*fiwa*fiwa
aux6 = fi1a*fiwa*(x21 + x2w - x1w*m21)
DGRT2A = aux1 + aux2 + aux3 + aux4 + aux5 + aux6
#differential algebraic system
dY1 = g1
dY2 = g2
dY3 = g3
dY4 = g4
0 = DGRT1P - DGRT1A
0 = DGRT2P - DGRT2A
#concentrations
aux1 = 1.0/Y5
aux2 = 1.0/Y6
aux3 = 1.0 - aux1
aux4 = 1.0 - aux2
M1R = (Vp/Y4*aux3 + aux1)*Y1
M2R = (Vp/Y4*aux4 + aux2)*Y2
M1a = Y1/Y5
M2a = Y2/Y6
(Screen capture: MoT window "Result before and after the solution" showing the variable values for the emulsion copolymerisation reactor model.)
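At the core of this model are the initiator decomposition rate, the average radical concentration and the two monomer consumption rates Rp1 and Rp2. The short sketch below evaluates just those expressions numerically; the kinetic constants, charges and the radical-averaged termination constant kt_bar are invented for illustration and do not come from the chapter.

# Numerical sketch of the monomer consumption rates Rp1, Rp2 from the
# copolymerisation listing. All kinetic constants and charges are hypothetical,
# and the radical-averaged termination constant kt_bar is simply assumed.
import math

kp11, kp22 = 2.4e2, 5.1e2              # propagation constants, L/(mol s)
r1, r2 = 0.52, 0.46                    # reactivity ratios
f, kd = 0.6, 1.0e-6                    # initiator efficiency, decomposition constant (1/s)
kt_bar = 1.5e7                         # assumed average termination constant, L/(mol s)
Y1, Y2, Y3, Y4 = 2.0, 1.5, 0.01, 5.0   # monomer 1, monomer 2, initiator (mol), volume (L)
Vp = 3.0                               # polymer-phase volume, L

RI = 2.0 * f * kd * Y3                                 # initiator decomposition rate
AvRaCon = math.sqrt(RI * Vp / (2.0 * Y4 * kt_bar))     # average radical concentration

den = kp22 * r1 * Y1 + kp11 * r2 * Y2
Rp1 = AvRaCon * kp11 * kp22 * (r1 * Y1 * Y1 + Y1 * Y2) / den   # styrene-like monomer
Rp2 = AvRaCon * kp11 * kp22 * (r2 * Y2 * Y2 + Y1 * Y2) / den   # MMA-like monomer
RXN1, RXN2 = Rp1 * Y4, Rp2 * Y4                                # total consumption rates
print(AvRaCon, RXN1, RXN2)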
A6. Model Equations for Beer Fermentation Process

temp_K = T + 273.15
f = 1 - y5/(0.5*si)
mx0 = exp(108.31 - 31934.09/temp_K)
meas = exp(89.92 - 26589/temp_K)
ms0 = exp(-41.92 + 11654.64/temp_K)
mlag = exp(30.72 - 9501.54/temp_K)
km = exp(130.16 - 38313/temp_K)
mDO = exp(33.82 - 10033.28/temp_K)
ma0 = exp(3.27 - 1267.24/temp_K)
ks = exp(-119.63 + 34203.95/temp_K)
dx1 = -y1*mlag
dx2 = y2*y4*mx0/(0.5*si+y5) - y2*km + y1*mlag
dx3 = y2*km - (0.5*si*mDO/(0.5*si+y5))*y3
dx4 = -(y4*ms0/(ks+y4))*y2
dx5 = (y4*ma0/(ks+y4))*f*y2
dx6 = (y4*ms0/(ks+y4))*y2*meas + y6*0
dx7 = kdc*y4*y2 - kdm*y7*y5

(Screen capture: MoT window "Result before and after the solution" showing the variable values for the beer fermentation model, solved at T = 12 with si = 120.)
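Once the temperature-dependent coefficients have been evaluated, the seven balances form an ordinary differential equation system that any standard integrator can handle. The sketch below integrates them at constant temperature with SciPy; the constants kdc and kdm and the initial state are not listed with the equations, so the values used here are illustrative assumptions.

# Integration sketch for the beer-fermentation balances at constant temperature.
# The rate coefficients follow the Arrhenius-type expressions in the listing;
# kdc, kdm and the initial conditions are illustrative assumptions.
import numpy as np
from scipy.integrate import solve_ivp

T = 12.0                      # fermentation temperature, deg C
si = 120.0                    # constant appearing in the listing
kdc, kdm = 1.0e-4, 1.0e-5     # assumed values (not given with the equations)

tK = T + 273.15
mx0  = np.exp(108.31 - 31934.09 / tK)
meas = np.exp(89.92 - 26589.0 / tK)
ms0  = np.exp(-41.92 + 11654.64 / tK)
mlag = np.exp(30.72 - 9501.54 / tK)
km   = np.exp(130.16 - 38313.0 / tK)
mDO  = np.exp(33.82 - 10033.28 / tK)
ma0  = np.exp(3.27 - 1267.24 / tK)
ks   = np.exp(-119.63 + 34203.95 / tK)

def rhs(t, y):
    y1, y2, y3, y4, y5, y6, y7 = y
    f = 1.0 - y5 / (0.5 * si)
    return [
        -y1 * mlag,
        y2 * y4 * mx0 / (0.5 * si + y5) - y2 * km + y1 * mlag,
        y2 * km - (0.5 * si * mDO / (0.5 * si + y5)) * y3,
        -(y4 * ms0 / (ks + y4)) * y2,
        (y4 * ma0 / (ks + y4)) * f * y2,
        (y4 * ms0 / (ks + y4)) * y2 * meas,
        kdc * y4 * y2 - kdm * y7 * y5,
    ]

y0 = [2.0, 0.1, 0.0, 120.0, 0.0, 0.0, 0.0]   # illustrative initial state
sol = solve_ivp(rhs, (0.0, 150.0), y0, max_step=0.5)
print(sol.y[:, -1])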
Other Papers Presented at the Workshop
Statistical Tools for Dynamic Model Building - An Overview
Steven Asprey (CPSE, Imperial College)

Non-linear Projection to Latent Structures (PLS) Modelling
Elaine Martin and Julian Morris (University of Newcastle)

An Incremental Approach to Model Identification
Wolfgang Marquardt (RWTH Aachen)

Assessment of Software for Kinetic Reaction Model Development, Model Discrimination, Parameter Estimation, and DOE
Rob Berger (TU Delft)

Real Time Optimisation and NMPC of Large Scale DAEs in Chemical Engineering
Hans Georg Bock (IWR, Universität Heidelberg)

Empirical and Mechanistic Modelling of a Precipitation Process
Nina Thornhill (University College London)

Nonlinear Optimum Experimental Design in DAEs and Application in Reaction Kinetics
Johannes Schloeder (IWR, Universität Heidelberg)

The Direct Method in the Design of Experiments for Solutions of Differential Equations Arising in Chemical Kinetics
Anthony Atkinson (London School of Economics)

Multiple Shooting Methods for Parameter Estimation in Differential Equations
Ekaterina Kostina (IWR, Universität Heidelberg)

Inverse Problems in Reaction-Diffusion Systems
André Bardow and Wolfgang Marquardt (RWTH Aachen)
Author Index

Asprey, S. P. 105
Atkinson, A. C. 141
Bernardo, F. P. 175
Duever, T. A. 63
Flores-Tlacuahuac, A. 21
Gani, R. 209
Guerrero-Santos, R. 21
Haario, H. 1
Jørgensen, S. B. 41
Kristensen, N. R. 41
Madsen, H. 41
Oxby, P. W. 63
Pistikopoulos, E. N. 175
Reilly, P. M. 63
Rollins, D. K. 159
Saldivar-Guerra, E. 21
Sales-Cruz, M. 209
Saraiva, P. M. 175
Turunen, I. 1
Verheijen, P. 85
Subject Index
Algebraic equations (AEs) 107, 213
Balance equations 212
Bayesian approach 94
Beer fermentation model 233
COM-object 220
Complete Feasibility - see general framework 190
Computational criteria 87
Computer-aided modelling tool 211
Condition ratio 77
Connection and conditional equations 212
CONOPT 182
Constitutive equations 212
Continuous Time Hammerstein Approach 161
Continuous Time Stochastic Modelling (CTSM) 41
Control vector Parameterisation (CVP) 105
Convexity condition 187
Copolymerisation - modelling 64
Copolymerisation - simulation studies 74
Criteria - Akaike's Final Prediction Error (FPE) 88
Criteria - Akaike's Information Criterion (AIC) 88
Criteria - error variance 88
Criteria - Shortest Data Descriptor (SDD) 88
DASOLV 114
Data Issues - Irregular sampling 48
Data Issues - Missing observations 48
Data Issues - Occasional outliers 48
Decision process 91
Decision-making criteria 176
Design of experiments - global criteria 8
Design of experiments - local criteria 11, 12
Determinant criterion 70
Differential/algebraic equations (DAEs) 107, 213
Dynamic model 23
Dynamic model building 109
Dynamic optimisation 221
Dynamic optimisation framework 119
Dynamic simulation 27
Effect of replication - simulation study 80
Emulsion copolymerisation reactor model 231
Engineering criteria 87
Equation set object (ESO) 114
Equivalence Theorem 144
Estimation - kinetic parameters 4
Expected value approach 117
Expected value of perfect information (EPVI) 178
Experiment design - ammonia 100
Experiment design - A-optimality 14, 115
Experiment design - D-optimality 14, 115, 145
Experiment design - dynamic 131
Experiment design - E-optimality 124
Experiment design - fermentation of Baker's yeast 114
Experiment design - Robust 115
Explicit feasible region evaluation - see general framework 191
Exponential decay 146
Flexibility 87
Flexibility analysis 185
GAMS 182
General Framework - Process design under uncertainty 190
General Least Squares (GLS) 71
Generalised Benders Decomposition (GBD) 189
gPROMS 114
Hammerstein Models 160
Hybridoma cell culture - model 117
ICAS 211
Identifiability - local 111
Independent models 91
Inference approach - model selection 88
Kinetic mechanism 23
Markov Chain Monte Carlo method (MCMC-method) 9
Mathematical model - reactor & heat exchanger system 181
Mathematical Models 212
Maximum likelihood estimation 44
MINOS5 182
ModDev 211
Model (dynamic) 24
Model adequacy 89
Model Analysis 219
Model building process 107
Model Definition 216
Model formulation 43
Model Generation 214
Model identification 41
Model identification cycle 42
Model selection 2
Model selection - experimentation 95
Model Solution 219
Model Transfer 220
Model Translation 218
Model validation 3, 53
Model.La 211
Modelling 1, 209
Modelling of a fed-batch bioreactor - case study 53
Modelling of a fed-batch bioreactor - Linear kinetics 54
Modelling of a fed-batch bioreactor - Monod kinetics 56
Modelling of a fed-batch bioreactor - Monod kinetics with substrate inhibition 57
ModKit 211
MoT 212
Multivariate Weighted Least Squares (MWLS) 63, 72
NAG SQP algorithm 114
NARMAX (Nonlinear AutoRegressive Moving Average models with exogenous inputs) 160
Nested models 90
On-line parameter estimation 228
Operating policy in face of uncertainty 185
Optimal design formulations 188
Optimal design solutions 201
Optimisation approach 92
Optimisation problem - MINLP 93
Optimum design - multivariate response 144
Optimum design - One response 143
Ordinary Differential Equations (ODEs) 15, 107, 213
Parameter estimability - dynamic 111
Parameter estimate distributions 75
Parameter estimation - methods 44
Parameter fitting 27
Parameter sensitivity 145
Parameter uncertainty - simulation study 79
Parametric discrimination 10
Parametric identifiability - global 109
Parametric identifiability - solving SIP problems 112
Parametric sensitivity 29
Partial differential algebraic equations (PDAE) 107
Phenomena models 211
Physico-chemical criteria 87
Point estimates of parameters 68
Polymerisation reactor 21
Power transformation 145
Probability density function (PDF) 179
Process design under uncertainty 180
Process life 209
Process models 211
Reaction lumping scheme 98
Reactions - Two consecutive first order 149
Residual analysis 51
Robust formulation with complete feasibility - see general framework 194
Robust formulation with explicit feasible region evaluation - see general framework 195
Robust process criteria 192
Semi-infinite programming (SIP) 105, 109
Sensitivity coefficients 30
Shell description 222
Simulation 210
SIP - solution algorithm 114
Six-sigma quality - design formulation 197
Soft uncertain parameters 189
Statistical criteria 87
Statistical Design of Experiments (SDOE) 159
Statistical tests 50
Stochastic differential equations 43
Stochastic flexibility 186
Stream connection 222
Theoretical Hammerstein Process 163
Time series & multivariable data 99
Two-phase reactor model 227
Two-phase separation model 222
Uncertain parameters 198
Uncertainty 176
Uncertainty - formulation of design problems 182
Uncertainty classification and modelling 183
Uncertainty model 199
Value of perfect information (VPI) 177
Volterra Models 160
Worst case approach 118