DATA HANDLING IN SCIENCE AND TECHNOLOGY — VOLUME 26
Practical Data Analysis in Chemistry
DATA HANDLING IN SCIENCE AND TECHNOLOGY Advisory Editors: S. Rutan and B. Walczak
Other volumes in this series:

Volume 1  Microprocessor Programming and Applications for Scientists and Engineers, by R.R. Smardzewski
Volume 2  Chemometrics: A Textbook, by D.L. Massart, B.G.M. Vandeginste, S.N. Deming, Y. Michotte and L. Kaufman
Volume 3  Experimental Design: A Chemometric Approach, by S.N. Deming and S.L. Morgan
Volume 4  Advanced Scientific Computing in BASIC with Applications in Chemistry, Biology and Pharmacology, by P. Valkó and S. Vajda
Volume 5  PCs for Chemists, edited by J. Zupan
Volume 6  Scientific Computing and Automation (Europe) 1990, Proceedings of the Scientific Computing and Automation (Europe) Conference, 12–15 June, 1990, Maastricht, The Netherlands, edited by E.J. Karjalainen
Volume 7  Receptor Modeling for Air Quality Management, edited by P.K. Hopke
Volume 8  Design and Optimization in Organic Synthesis, by R. Carlson
Volume 9  Multivariate Pattern Recognition in Chemometrics, illustrated by case studies, edited by R.G. Brereton
Volume 10  Sampling of Heterogeneous and Dynamic Material Systems: Theories of Heterogeneity, Sampling and Homogenizing, by P.M. Gy
Volume 11  Experimental Design: A Chemometric Approach (Second, Revised and Expanded Edition), by S.N. Deming and S.L. Morgan
Volume 12  Methods for Experimental Design: Principles and Applications for Physicists and Chemists, by J.L. Goupy
Volume 13  Intelligent Software for Chemical Analysis, edited by L.M.C. Buydens and P.J. Schoenmakers
Volume 14  The Data Analysis Handbook, by I.E. Frank and R. Todeschini
Volume 15  Adaption of Simulated Annealing to Chemical Optimization Problems, edited by J. Kalivas
Volume 16  Multivariate Analysis of Data in Sensory Science, edited by T. Næs and E. Risvik
Volume 17  Data Analysis for Hyphenated Techniques, by E.J. Karjalainen and U.P. Karjalainen
Volume 18  Signal Treatment and Signal Analysis in NMR, edited by D.N. Rutledge
Volume 19  Robustness of Analytical Chemical Methods and Pharmaceutical Technological Products, edited by M.W.B. Hendriks, J.H. de Boer, and A.K. Smilde
Volume 20A  Handbook of Chemometrics and Qualimetrics: Part A, by D.L. Massart, B.G.M. Vandeginste, L.M.C. Buydens, S. de Jong, P.J. Lewi, and J. Smeyers-Verbeke
Volume 20B  Handbook of Chemometrics and Qualimetrics: Part B, by B.G.M. Vandeginste, D.L. Massart, L.M.C. Buydens, S. de Jong, P.J. Lewi, and J. Smeyers-Verbeke
Volume 21  Data Analysis and Signal Processing in Chromatography, by A. Felinger
Volume 22  Wavelets in Chemistry, edited by B. Walczak
Volume 23  Nature-inspired Methods in Chemometrics: Genetic Algorithms and Artificial Neural Networks, edited by R. Leardi
Volume 24  Handbook of Chemometrics and Qualimetrics, by D.L. Massart, B.M.G. Vandeginste, L.M.C. Buydens, S. de Jong, P.J. Lewi, and J. Smeyers-Verbeke
Volume 25  Statistical Design — Chemometrics, by R.E. Bruns, I.S. Scarminio and B. de Barros Neto
DATA HANDLING IN SCIENCE AND TECHNOLOGY — VOLUME 26 Advisory Editors: S. Rutan and B. Walczak
Practical Data Analysis in Chemistry MARCEL MAEDER School of Environmental and Life Sciences The University of Newcastle Callaghan, NSW 2308, Australia
YORCK-MICHAEL NEUHOLD School of Environmental and Life Sciences The University of Newcastle Callaghan, Australia
Amsterdam – Boston – Heidelberg – London – New York – Oxford
Paris – San Diego – San Francisco – Singapore – Sydney – Tokyo
Elsevier Radarweg 29, PO Box 211, 1000 AE Amsterdam, The Netherlands Linacre House, Jordan Hill, Oxford OX2 8DP, UK
First edition 2007 Copyright © 2007 Elsevier B.V. All rights reserved No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means electronic, mechanical, photocopying, recording or otherwise without the prior written permission of the publisher Permissions may be sought directly from Elsevier’s Science & Technology Rights Department in Oxford, UK: phone (+44) (0) 1865 843830; fax (+44) (0) 1865 853333; email:
[email protected]. Alternatively you can submit your request online by visiting the Elsevier web site at http://elsevier.com/locate/permissions, and selecting Obtaining permission to use Elsevier material Notice No responsibility is assumed by the publisher for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions or ideas contained in the material herein. Because of rapid advances in the medical sciences, in particular, independent verification of diagnoses and drug dosages should be made Library of Congress Cataloging-in-Publication Data A catalog record for this book is available from the Library of Congress British Library Cataloguing in Publication Data A catalogue record for this book is available from the British Library ISBN: 978-0-444-53054-7 ISSN: 0922-3487
For information on all Elsevier publications visit our website at books.elsevier.com
Printed and bound in The Netherlands 07 08 09 10 11 10 9 8 7 6 5 4 3 2 1
Contents

PREFACE

SYMBOLS

1 INTRODUCTION

2 MATRIX ALGEBRA
  2.1 Matrices, Vectors, Scalars
    2.1.1 Elementary Matrix Operations
      Transposition
      Addition and Subtraction
      Multiplication
    2.1.2 Special Matrices
      Square Matrix
      Symmetric Matrix
      Diagonal Matrix
      Identity Matrix
      Inverse Matrix
      Orthogonal and Orthonormal Matrices
  2.2 Solving Systems of Linear Equations

3 PHYSICAL/CHEMICAL MODELS
  3.1 Beer-Lambert's Law
  3.2 Chromatography / Gaussian Curves
  3.3 Titrations, Equilibria, the Law of Mass Action
    3.3.1 A Simple Case: Fe3+ + SCN-
    3.3.2 The General Case, Definitions
      A Chemical Example, Cu2+, Ethylenediamine, Protons
    3.3.3 Solving Complex Equilibria
      The Newton-Raphson Algorithm
      Example: General 3-Component Titration
      Example: pH Titration of Acetic Acid
      Equilibria in Excel
      Complex Equilibria Including Activity Coefficients
      Special Case: Explicit Calculation for Polyprotic Acids
    3.3.4 Solving Non-Linear Equations
      One Equation, One Parameter
      Systems of Non-Linear Equations
  3.4 Kinetics, Mechanisms, Rate Laws
    3.4.1 The Rate Law
    3.4.2 Rate Laws with Explicit Solutions
    3.4.3 Complex Mechanisms that Require Numerical Integration
      The Euler Method
      Fourth Order Runge-Kutta Method in Excel
    3.4.4 Interesting Kinetic Examples
      Autocatalysis
      0th Order Reaction
      The Steady-State Approximation
      Lotka-Volterra / Predator-Prey Systems
      The Belousov-Zhabotinsky (BZ) Reaction
      Chaos, the Lorenz Attractor

4 MODEL-BASED ANALYSES
  4.1 Background to Least-Squares Methods
    4.1.1 The Residuals and the Sum of Squares
      Linear Example: Straight Line
      Non-Linear Example: Exponential Decay
  4.2 Linear Regression
    4.2.1 Straight Line Fit - Classical Derivation
    4.2.2 Matrix Notation
    4.2.3 Generalised Matrix Notation
    4.2.4 The Normal Equations
      The Pseudo-Inverse
      Linear Dependence, Rank of a Matrix
      Numerical Difficulties
    4.2.5 Errors in the Fitted Parameters
    4.2.6 Excel Linest
    4.2.7 Applications of Linear Least-Squares Fitting
      Linearisation of Non-Linear Problems
      Polynomials, the Savitzky-Golay Digital Filter
      Smoothing of Noisy Data
      Calculation of the Derivative of a Curve
      Polynomial Interpolation
    4.2.8 Linear Regression with Multivariate Data
      Applications
      Computation of Component Spectra, Known Concentrations
      Computation of Component Concentrations, Known Spectra
      The Pseudo-Inverse in Excel
  4.3 Non-Linear Regression
    4.3.1 The Newton-Gauss-Levenberg/Marquardt Algorithm
      A First, Minimal Algorithm
      Termination Criterion, Numerical Derivatives
      The Levenberg/Marquardt Extension
      Standard Errors of the Parameters
      Multivariate Data, Separation of the Linear and Non-Linear Parameters
      Constraint: Positive Component Spectra
      Structures, Fixing Parameters
      Known Spectra, Uncoloured Species
      Reduced Eigenvector Space
      Global Analysis
    4.3.2 Non-White Noise, χ2-Fitting
      Linear χ2-Fitting
      Non-Linear χ2-Fitting
    4.3.3 Finding the Correct Model
  4.4 General Optimisation
    4.4.1 The Newton-Gauss Algorithm
    4.4.2 The Simplex Algorithm
    4.4.3 Optimisation in Excel, the Solver
      χ2-Fitting in Excel

5 MODEL-FREE ANALYSES
  5.1 Factor Analysis, FA
    5.1.1 The Singular Value Decomposition, SVD
    5.1.2 The Rank of a Matrix
      Magnitude of the Singular Values
      The Structure of the Eigenvectors
      The Structure of the Residuals
      The Standard Deviation of the Residuals
    5.1.3 Geometrical Interpretations
      Two Components
      Reduction in the Number of Dimensions
      Lawton-Sylvestre
      Three and More Components
      Mean Centring, Closure
      HELP Plots
      Noise Reduction
  5.2 Target Factor Analyses, TFA
    5.2.1 Projection Matrices
    5.2.2 Iterative Target Transform Factor Analysis, ITTFA
    5.2.3 Target Transform Search/Fit
      Parameter Fitting via Target Testing
  5.3 Evolving Factor Analyses, EFA
    5.3.1 Evolving Factor Analysis, Classical EFA
    5.3.2 Fixed-Size Window EFA, FSW-EFA
    5.3.3 Secondary Analyses Based on Window Information
      Iterative Refinement of the Concentration Profiles
      Explicit Computation of the Concentration Profiles
  5.4 Alternating Least-Squares, ALS
    5.4.1 Initial Guesses for Concentrations or Spectra
    5.4.2 Alternating Least-Squares and Constraints
    5.4.3 Rotational Ambiguity
  5.5 Resolving Factor Analysis, RFA
  5.6 Principal Component Regression and Partial Least Squares, PCR and PLS
    5.6.1 Principal Component Regression, PCR
      Mean-Centring, Normalisation
      PCR Calibration
      PCR Prediction
      Cross Validation
    5.6.2 Partial Least Squares, PLS
      PLS Calibration
      PLS Prediction / Cross Validation
    5.6.3 Comparing PCR and PLS

FURTHER READING

LIST OF MATLAB FILES

LIST OF EXCEL SHEETS

INDEX
Preface

The word 'practical' in the title describes a characteristic feature of this book. However, it could easily be misunderstood. It is not a book that is meant to be taken into the laboratory to remain on the lab bench next to instruments and test tubes. The book is practical insofar as every bit of theory applicable to data analysis is exemplified in a short program in Matlab or in an Excel spreadsheet. The philosophy of the book is that the reader can study the programs, play with them and observe what happens. They are short and concise and thus invite and encourage meddling and improving by the reader. Suitable data are generated for each example in short routines. This ensures the reader has a clear understanding of the structure of the data and thus will have a better chance of comprehending the analysis. In fact, the programs, rather than complex equations, are often used to elucidate the principles of the analysis. The programs are written in a modular way. The reader can replace the artificial data, generated by a function, with real data from the lab. There is extensive use of graphical output. While the plots are minimal, they efficiently illustrate the results of the analyses. In order to keep the programs concise, no effort was made to build comfortable user-interfaces. In Chapter 2, we give a brief introduction to matrix algebra and its implementation in Matlab and Excel. The next three chapters form the core of the book. We distinguish two types of data analysis: model-based and model-free analyses. For both, appropriate data have to be generated for subsequent analysis. In Chapter 3, we supply the theory required for the modelling of chemical processes. Many of the example data sets used for both kinds of analyses are taken from kinetics and equilibrium processes. This reflects the background of both authors. In fact, this part of the book serves as a solid introduction to the simulation of equilibrium processes such as titrations and the simulation of complex kinetic processes. The example routines are easily adapted to the processes investigated by the reader. They are very general and there is essentially no limit to the complexity of the processes that can be simulated. Chapter 4 is an introduction to linear and non-linear least-squares fitting. The theory is developed and exemplified in several stages, each demonstrated with typical applications. The chapter culminates with the development of a very general Newton-Gauss-Levenberg/Marquardt algorithm. Chapter 5 comprises a collection of several methods for model-free data analyses. It starts with classical Factor Analysis, employing many
geometrical visualisations, covers popular methods, such as EFA and ALS, and concludes with a brief introduction into the calibration-based methods PCR and PLS. A fair amount of effort has been put into writing short and concise but still readable code. A few highlights:

lolipop.m, a very short function of 7 lines of code that performs polynomial interpolation of any degree and complexity. It can be used for interpolation and, of course with caution, for extrapolation.

NewtonRaphson.m, a 30-line function that solves chemical equilibrium problems of any degree of complexity.
nglm3.m, a function with 40 lines of code for general non-linear least-squares fitting based on the Newton-Gauss algorithm. It incorporates a very efficient handling of parameters.

A package of compact PCR and PLS programs that includes cross validation.

While many aspects of data analysis are introduced, starting from very basic facts, the book is not primarily written for the beginner. Its main audience is expected to come from post-graduate students, research and industrial chemists with sufficient interest in data analysis to warrant the development of their own software rather than relying on other people's packages that all too often are rather black boxes. Statistics plays a crucial role in any data analysis, and accordingly, the statistical aspects are mentioned and appropriate equations/code are supplied. For example, examples are given for the least-squares analysis of data with white noise as well as χ2-analyses for data with non-uniformly distributed noise. However, the statistical background for the appropriate choice of the two methods and, more importantly, the effects of wrong assumptions about the noise structure are not included. Many of our students, colleagues and friends deserve to be acknowledged. Most important are the students. They have repeatedly forced us to think and re-think the concepts of data analysis. The principle is straightforward: it is not possible to explain anything properly without having understood it in depth. Most important were those students who have been involved in chemometrics projects over the years: Andrew Whitson, Arnaldo Cumbana, Caroline Mason, Eric Wilkes, Graeme Puxty, Jeff Harmer, Kirsten Molloy, Maryam Vosough, Monica Rossignoli, Nichola McCann, Pascal Bugnon, Peter Lye, Porn Jandanklang, Raylene Dyson, Rod Williams, and Sarah Norman. Important are also all the colleagues who helped educate us and have been part of some or many aspects of data analysis; they include: Alan Williams, André Merbach, Andreas Zuberbühler, Anna de Juan, Arne Zilian, Bernhard Jung, Bill Tolman, Charlie Meyer, Christoph Borchers, Dom Swinkels, Ed Constable, Elmars Krausz, Geoff Lawrance, Hans Brintzinger, Harald Gampp, Helmut Mäcke, Ira Brinn, Jean Clerc, Jim Ferguson, Ken Karlin, Konrad Hungerbühler, Liselotte Siegfried, Manuel Martínez, Martin
Schumacher, Paul Gemperline, Peter King, Peter Comba, Robert Binstead, Romá Tauler, Sigrid Mönkeberg, Silvio Fallab, Susan Kaderli, Thomas Kaden, Tom Callcott. Special thanks to our colleagues at the Department of Chemistry, the University of Newcastle, for taking over MM's teaching and more importantly, his share of administration while on sabbatical leave, sweating over this book. Thanks also to Jenny Helman and Gudrun Ludescher for the incredible effort of proof reading a text without understanding the content. They deserve a warm thank-you from us and also from every reader.
Marcel Maeder Yorck-Michael (Bobby) Neuhold Newcastle, Australia September 2006
"Alles für die Wissenschaft", Gerda Maeder
Symbols

y, Y      vector (m×1, ns×1) or matrix (m×n, ns×nl) of data (e.g. single or multi wavelength absorbance data)
C         matrix (m×nc, ns×nc) of component concentrations
a, A      vector (np×1, nc×1) or matrix (nc×nl) of linear parameters (e.g. molar absorptivities of pure species spectra)
r, R      vector (m×1, ns×1) or matrix (m×n or ns×nl) of residuals
F         design matrix for linear regression (m×np or ns×np)
J         Jacobian matrix of derivatives with respect to parameters
U         column matrix of all eigenvectors of YY^t
S         diagonal matrix of all corresponding singular values in decreasing order of their magnitude
V         row matrix of all eigenvectors of Y^tY
Ū         column matrix of significant eigenvectors (ns×ne)
S̄         diagonal matrix of significant singular values (ne×ne)
V̄         row matrix of significant eigenvectors (ne×nl)
Ȳ         factor analytically reproduced data matrix (ns×nl)
Yred      data matrix of absorbances in reduced eigenspace (ns×ne)
Ared      matrix of molar absorptivities in reduced eigenspace (nc×ne)
Yglob     vertically concatenated data matrices Y1…Ynm
Cglob     vertically concatenated concentration matrices C1…Cnm
p         vector of parameters (np×1)
T         transformation matrix (nc×nc), score matrix in PLS (ns×ne)
P         loading matrix in PLS (ne×nl)
W         matrix of loading weights in PLS (ne×nl)
vprog     prognostic vector in PCR or PLS (nl×1)
m, ns     # of spectra (e.g. # of rows in Y and C)
n, nl     # of lambdas (e.g. # of columns in Y or A)
nc        # of components or species (e.g. # of columns in C or rows in A)
ne        # of factors (e.g. # of columns in Ū and V̄^t or diagonal elements in S̄; # of factors used for PCR or PLS prediction)
nm        # of measurements (e.g. # of submatrices in Yglob and Cglob)
nd        polynomial degree
df        degrees of freedom
np        # of parameters
nu        # of unknown spectra
mp        Marquardt parameter
A, B, C, …  names of chemical species
[A]       concentration of species A
[A]tot    total concentration of component A
[A]0      initial concentration of species A
K         equilibrium constant
βxyz      formation constant of species XxYyZz
k         rate constant
λj        j-th wavelength or j-th eigenvalue
ssq       sum of squared residuals
χ2        sum of squared weighted residuals
σr        standard deviation of the residuals
σy        standard deviation of the noise in y or Y
σpi       standard deviation of the parameter pi
1 Introduction

As the title Practical Data Analysis in Chemistry indicates, there are different facets to the book: the book is about data analysis, the data are taken from chemistry, and the emphasis is on practical considerations. Data Analysis in Chemistry is an ambitious title; of course we cannot cover all aspects of data analysis in chemistry. A substantial fraction of the examples investigated in the different chapters is based on data from absorption spectroscopy. Absorption spectroscopy is a very powerful and very readily available technique; there are very few laboratories that do not have a spectrophotometer. Further, Beer-Lambert's law establishes a very neat and simple relationship between signal and concentration of species in solution. While most of the examples discussed in this book deal with spectra measured in the visible wavelength region, this is not important. Beer-Lambert's law is valid at any wavelength and covers UV, as well as NIR and IR spectroscopy. Identical laws govern CD spectroscopy and thus the methods can be adapted immediately. The only difference is that CD signals can be negative, and thus those methods that rely on positive molar absorptivities need to be modified. Also, light emission spectroscopy often obeys laws that are very similar to Beer-Lambert's law. Other examples of data types used in the book include potentiometric data (pH) and data from monovariate chromatography detectors, such as flame ionisation or refractive index. The crucial feature is that there is a clearly defined relationship between signal and concentration. All the numerical methods are developed to a level that allows the analysis of complete absorption spectra, e.g. complete absorption spectra measured as a function of time in a kinetic investigation. More traditional single-wavelength measurements are just a special case of spectra measured at one wavelength only. This multivariate ability is also a significant aspect of the book. There are few commercial and publicly available programs that include the analysis of multivariate data. The collection of examples is extensive and includes relatively simple data analysis tasks such as polynomial fits; they are used to develop the principles of data analysis. Some chemical processes will be discussed extensively; they include kinetics, equilibrium investigations and chromatography. Kinetics and equilibrium investigations are often reasonably complex processes, delivering complicated data sets, and thus require fairly complex modelling and fitting algorithms. These processes serve as examples for the advanced analysis methods.
There are many types of data in chemistry that are not specifically covered in this book. For example, we do not discuss NMR data. NMR spectra of solutions that do not include fast equilibria (fast on the NMR time scale) can be treated essentially in the same way as absorption spectra. If fast equilibria are involved, e.g. protonation equilibria, other methods need to be applied. We do not discuss the highly specialised data analysis problems arising from single crystal X-ray diffraction measurements. Further, we do not investigate any kind of molecular modelling or molecular dynamics methods. While these methods use a lot of computing time and power, they are more concerned with data generation than with data analysis. Also, we do not cover several typical chemometrics types of analyses, such as cluster analysis, experimental design, pattern recognition, classification, neural networks, wavelet transforms, qualimetrics etc. This explains our decision not to include the word 'chemometrics' in the title. What is practical about this book? The book is not meant to be taken into the lab. It is practical in a different way: all methods and equations that are developed are translated immediately into a short computer program that performs the particular analysis under investigation. We decided not to supply a collection of data files from real experiments that could be read by the analysis programs for further processing. Instead, we provide short files that generate 'measurements'. The main advantage of this practice is that the reader will be able to analyse and understand the structure of the data. The results of the analyses can be compared with the input, e.g. resulting rate constants in a kinetic fit can be compared with the rate constants used to generate the data. The practice also invites the reader to 'play' with the data, investigating the influence of noise level and noise structure. The reader can observe the effects of changing the parameters used to generate the data, such as rate or equilibrium constants, absorption spectra for the reacting species, general conditions such as initial concentrations, etc. The data generation or modelling functions are a powerful educational tool. The extensive collection of example programs is a unique feature of this book. The programs are meant as an invitation to the reader: to be used, to be incorporated into the reader's own packages, and also to be fiddled with and improved. We have put considerable effort into writing good code, but no doubt, there is room for improvement. Matlab is a matrix-oriented language that is just about perfect for most data analysis tasks. Those readers who already know Matlab will agree with that statement. Those who have not used Matlab so far will be amazed by the ease with which rather sophisticated programs can be developed. This strength of Matlab is a weak point in Excel. While Excel does include matrix operations, they are clumsy and probably for this reason, not well known and used. An additional shortcoming of Excel is the lack of functions for Factor Analysis or the Singular Value Decomposition. Nevertheless, Excel is very powerful and allows the analysis of fairly complex data. The book is structured in four main chapters.
Chapter 2, Matrix Algebra, gives a very brief introduction into matrix algebra. Most tasks in numerical data analysis are advantageously formulated in elegant and efficient matrix notation. Matlab is a matrix based language and thus ideally suited for the development of programs dealing with numerical analyses. This point cannot be over-stressed. The few short programs presented in this chapter may also serve as a very rudimentary introduction into Matlab. Readers not familiar with Matlab but otherwise proficient in an alternative language will be surprised at the almost complete lack of for … end loops. We also introduce matrix operations in Excel, assuming that the other, more common aspects of Excel are known to the reader. While there is a reasonable collection of matrix operations available in Excel, their usage is rather cumbersome. We believe that many readers will appreciate the short introduction into this aspect of Excel. Of course parts of, or the whole chapter can be skipped by those readers who are already proficient in the basics of matrix algebra and the implementation in Matlab and Excel. Chapter 3, Physical/Chemical Models, starts with a review of Beer-Lambert's law and very importantly demonstrates its compatibility with matrix notation. After a short discussion on chromatographic concentration profiles (these are used heavily in Chapter 5, Model-Free Analyses), we start the development of a toolbox for the computational analysis of equilibrium problems. All these computations are based on the law of mass action. While a few simple equilibrium systems can be solved by analytical expressions, all other systems require modelling by iterative procedures. We explore the Newton-Raphson method and develop it into an incredibly powerful algorithm that resolves equilibrium systems of any complexity. We subsequently incorporate the algorithm into programs that model potentiometric pH-titrations and spectrophotometric titrations. At this stage, the collection of routines can serve as an educational tool; later, in the subsequent Chapter 4, Model-Based Analyses, it will be incorporated into a general non-linear least-squares fitting program for the analysis of equilibrium processes and the determination of equilibrium constants. Those equilibrium processes that can be resolved explicitly are straightforwardly modelled in Excel. While it is possible to solve equilibrium problems of essentially any complexity in Excel, it is virtually impossible to develop a reasonable spreadsheet for the modelling of a complex titration. Iterative methods are generally difficult to implement in Excel. The Newton-Raphson algorithm is further developed into a fairly generally applicable tool for the solving of sets of non-linear equations. The equivalent to the law of mass action in equilibria are the sets of differential equations in kinetics. They are defined by the chemical reaction scheme. Again, there are explicit solutions for very simple models but most other models lead to sets of differential equations that need to be integrated numerically. Matlab supplies an extensive collection of functions for
numerical integration dealing with just about any conceivable case, in particular the so-called 'stiff problems'. It would go well beyond the limits and scope of this book to develop such algorithms. We do, however, explain the principles of numerical integration and also develop an Excel spreadsheet with a 4th order Runge-Kutta algorithm. We demonstrate the use of Matlab's numerical integration routines (ODE solvers) and apply them to a representative collection of interesting mechanisms of increasing complexity, such as an autocatalytic reaction, predator-prey kinetics, oscillating reactions and chaotic systems. This section demonstrates the educational usefulness of data modelling. The collection of kinetic modelling programs will be adapted in the subsequent chapter for the non-linear least-squares analysis of kinetic data and the determination of rate constants. Chapter 4, Model-Based Analyses, is essentially an introduction into least-squares fitting. It is crucial to clearly distinguish between linear and non-linear least-squares fitting: linear problems have explicit solutions while non-linear problems need to be solved iteratively. Linear regression forms the base for just about 'everything' and thus requires particular consideration. For non-linear regression there are several iterative methods available. The simplex algorithm is a popular method. Its concept is simple but convergence and execution times are slow. In this chapter we present the Newton-Gauss algorithm enhanced by the Levenberg/Marquardt method. The method is developed using a representative collection of worked examples that illustrate the different aspects of data fitting. We end up with a collection of programs that can fit any reaction mechanism in kinetics and any system in equilibrium studies. The most advanced features include multivariate data, inclusion of known spectra, efficient handling of variable and non-variable parameters and global analysis of series of multivariate measurements. Chapter 5, Model-Free Analyses. Model-based data fitting analyses rely crucially on the choice of the correct model; model-free analyses allow insight into the data without prior chemical knowledge about the process. Model-free analysis is based on restrictions imposed on the results of the analysis; these restrictions are demanded by the physics of the measurement rather than by the scientist. Typical restrictions of this kind are that concentrations and molar absorptivities have to be positive. Only multivariate (e.g. multi-wavelength) data are amenable to model-free analyses. While this is a restriction, it is not a serious one. The goal of the analysis is to decompose the matrix of data into a product of two physically meaningful matrices, usually into a matrix containing the concentration profiles of the components taking part in the chemical process, and a matrix that contains their absorption spectra (Beer-Lambert's law). If there are no model-based equations that quantitatively describe the data, model-free analyses are the only method of analysis. Otherwise, the results of model-
free analyses can guide the researcher in the choice of the correct model for a subsequent model-based analysis. An important group of methods relies on the inherent order of the data, typically time in kinetics or chromatography. These methods are often based on Evolving Factor Analysis and its derivatives. Another well-known family of model-free methods is based on the Alternating Least-Squares algorithm that solely relies on restrictions such as positive spectra and concentrations. There is a rich collection of publications describing novel methods for Model-Free Analyses. The selection presented here does not cover the complete range; it attempts to select the more useful and interesting methods. Such a selection is always influenced by personal preferences and thus can be biased. Excel does not provide functions for the factor analysis of matrices. Further, Excel does not support iterative processes. Consequently, there are no Excel examples in Chapter 5, Model-Free Analyses. There are vast numbers of free add-ins available on the internet, e.g. for the Singular Value Decomposition. Alternatively, it is possible to write Visual Basic programs for the task and link them to Excel. We strongly believe that such algorithms are much better written in Matlab and decided not to include such options in our Excel collection. This chapter ends with a short description of the important methods, Principal Component Regression (PCR) and Partial Least-Squares (PLS). Attention is drawn to the similarity of the two methods. Both methods aim at predicting properties of samples based on spectroscopic information. The required information is extracted from a calibration set of samples with known spectrum and property. In any book, there are relevant issues that are not covered. The most obvious in this book is probably a lack of in-depth statistical analysis of the results of model-based and model-free analyses. Data fitting does produce standard deviations for the fitted parameters, but translation into confidence limits is much more difficult for reasonably complex models. Also, the effects of the separation of linear and non-linear parameters are, to our knowledge, not well investigated. Very little is known about errors and confidence limits in the area of model-free analysis. We tried to keep the programs short; they perform only the essential numerical operations, followed by minimal data output. Usually there are one or two graphs of output and occasionally a few lines of numerical output. The graphs are designed to be simple but instructive. They do not make use of the richness of the Matlab graphics routines. We did not include any graphical user-interfaces (GUIs) in order to avoid difficulties with different versions of Matlab. Matlab versions starting from 5.3 should be compatible with all programs in this book.
Iterative refining processes invariably depend on the quality of initial guesses. If these are too far from the optimum, the process can diverge and collapse, sometimes seriously. The code provided for all iterative processes is minimal and works for most reasonably well behaved problems; however, the routines are not fool-proof. In the case of divergence and collapse we recommend the user investigates the appropriateness of the initial guesses supplied to the function. A few words regarding programming style are appropriate. Any computer program is a compromise between readability, length of code and speed of execution. Matlab, in particular, offers a substantial range of powerful commands that allows the composition of extremely compact code. The reader will find a few instances where commands perform very complex operations at the expense of being almost incomprehensible. In many instances explanations will be given in the form of comments or in adjoining text but there may be several occasions when the novice will struggle to understand the line of code.

The process of preparing programs for a digital computer is especially attractive, not only because it can be economically and scientifically rewarding, but also because it can be an aesthetic experience much like composing poetry or music.
Donald Knuth, The Art of Computer Programming
2 Matrix Algebra In this chapter, we present the basic matrix mathematics that is required for understanding the methods introduced later in the book. In line with the philosophy that all concepts are immediately implemented in Matlab and/or Excel, this will be done here as well. This way, Chapter 2 not only revises the basic mathematics, it also serves as a very short introduction to the Matlab and Excel languages. It is not meant to be a manual on Matlab or Excel; the reader will need to refer to more specialised texts and proper manuals. Several more advanced features of both languages are not covered at this introductory stage but will be explained as they emerge in later chapters. Generally, most chemists possess some knowledge in one or more classical programming languages such as Basic or Fortran. While this is certainly helpful, it also produces some 'bad habits'. Matlab is a matrix-based language; it incorporates an extensive function library for matrices, vectors and data arrays in general and so is particularly well designed for the numerical analysis of multivariate data. Even though classical loop-based programming is possible for matrix operations, in Matlab there is usually a much shorter and faster way of performing the same task. Matlab programs are very readable since matrix equations are almost written as in 'real life'. Very important properties of Matlab are the direct availability of a vast number of functions that directly allow high level graphical output and the fact that Matlab works on interpreter basis, i.e. compilation is done in the background and all variables and data can be accessed at the prompt. This makes Matlab one of the most used development tools in engineering and the current standard in chemometrics. Most programs provided in this book have been developed and tested in the standard version of Matlab 6.1 and do not require any additional toolboxes (e.g. the optimisation toolbox). Our philosophy is to keep the algorithms as simple as feasible. Additionally, we avoid Matlab's capabilities in programming a graphical user interface (GUI). That way, backwards compatibility of our programs to Matlab 5.3 as well as upwards to Matlab 7.x is very likely, yet not always guaranteed. As an integral component of Microsoft Office, the spreadsheet program Excel is installed on many personal computers. Thus, a widespread basic expertise can be assumed. Although initially designed for business calculations and graphics, Excel is also extremely useful for scientific purposes. Its matrix capabilities, as well as the optimisation add-in 'solver', are not widely known but can often be applied in order to quickly resolve quite complex multivariate problems. We have used Excel 2002 but any other version will do equally well. As mentioned before, this chapter has two goals, (a) to refresh some basic matrix mathematics and (b) to familiarise the reader with the essentials of both Matlab and Excel, particularly with respect to multivariate data
analysis of chemical problems. In order to minimise abstractness we provide many examples. It is helpful to distinguish matrices, vectors, scalars and indices by typographic conventions. Matrices are denoted in boldface capital characters (A), vectors in boldface lowercase (a) and scalars in lowercase italic characters (s). For indices, lower case characters are used (i). The symbol 't' indicates matrix and vector transposition (At, at). All chemical applications discussed later in this book will deal exclusively with real numbers. Thus, we introduce matrix algebra for real numbers only and do not include matrices formed by complex numbers.
2.1 Matrices, Vectors, Scalars

A matrix is a rectangular array of numbers, e.g.

    A = \begin{bmatrix} a_{1,1} & \cdots & a_{1,j} & \cdots & a_{1,n} \\ \vdots & \ddots & & & \vdots \\ a_{i,1} & \cdots & a_{i,j} & \cdots & a_{i,n} \\ \vdots & & & \ddots & \vdots \\ a_{m,1} & \cdots & a_{m,j} & \cdots & a_{m,n} \end{bmatrix}    (2.1)
The size of a matrix is defined by its number of rows, m, and number of columns, n. We refer to the dimensions by m×n. In Matlab, the appropriate notation is [m,n]=size(A). For any matrix A, ai,j is the element in row i (i=1…m) and column j (j=1…n). Vectors and scalars can be seen as special cases of matrices where one or both dimensions have collapsed to 1. Thus, a row vector represents a 1×n matrix, a column vector an m×1 matrix and a scalar a 1×1 matrix. Matlab is based on the philosophy that Everything is a Matrix. In order to visualise matrices, column and row vectors, it is convenient to use rectangles, vertical and horizontal lines, as outlined in Figure 2-1.
Figure 2-1. A matrix, a column vector and a row vector

Sometimes it is helpful to specifically distinguish between row and column vectors. In such instances, we borrow Matlab's colon (:) notation. A vector x
is represented by x1,: if it is a row vector (the row dimension is 1) and by x:,1 if it is a column vector (the column dimension is 1). Furthermore, every row of a matrix A can be seen as a row vector or sub-matrix of A with the dimensions 1×n, while every column of A represents a column vector or sub-matrix of the dimensions m×1. Thus, the second row of matrix A can be referred to as the row vector a2,:, the third column of A as the column vector a:,3, etc. With this notation it is generally possible to denote any sub-matrix of A. For example, A2:4,3:6 is a matrix of dimensions 3×4 comprised of the elements of A that are within the rectangle defined by rows 2 to 4 and columns 3 to 6. Let's see how this is done in Matlab:

A=[1 2 3 4 5 6; ...    % 1st row of A
   2 3 4 5 6 7; ...    % 2nd
   3 4 5 6 7 8; ...    % 3rd
   4 5 6 7 8 9]        % 4th

a_r2c3=A(2,3)          % extract element of A in row 2, column 3
a_r2=A(2,:)            % extract 2nd row of A
a_c3=A(:,3)            % extract 3rd column of A
A_sub=A(2:4,3:6)       % extract 2nd to 4th row and 3rd to 6th column

A =
     1     2     3     4     5     6
     2     3     4     5     6     7
     3     4     5     6     7     8
     4     5     6     7     8     9
a_r2c3 =
     4
a_r2 =
     2     3     4     5     6     7
a_c3 =
     3
     4
     5
     6
A_sub =
     4     5     6     7
     5     6     7     8
     6     7     8     9
A few remarks are in order:
• all matrix entries are enclosed by two square brackets, the elements within each row are separated by blanks (or equivalently by commas), the end of a row is indicated by a semicolon
• three dots at the end of a line tell Matlab there is continuing input in the next line
• the percent (%) character introduces a comment
• for references to individual elements, rows, columns or sub-matrices, the corresponding row and column indices (or colon operators) are separated by a comma and put between parentheses
• there is an equivalent command for the last line that creates the identical sub-matrix (A2:4,3:6) but refers to all rows and columns individually: A_sub=A([2 3 4],[3 4 5 6]). The sub-matrix is created according to the two vectors comprising the row and column indices. This can be handy if rows and/or columns that are not in a sequence are to be combined, as sketched below.
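Such non-contiguous index vectors are not pursued further in this example; a minimal sketch (our own addition, not part of the book's example set) shows how freely they can be combined:

A=[1 2 3 4 5 6; ...
   2 3 4 5 6 7; ...
   3 4 5 6 7 8; ...
   4 5 6 7 8 9];
A_pick=A([1 4],[2 5 6])   % rows 1 and 4 combined with columns 2, 5 and 6

A_pick =
     2     5     6
     5     8     9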
2.1.1 Elementary Matrix Operations
Transposition

The transposed matrix At is defined as the interchange of rows and columns of A. This can also be seen as the reflection of all elements of A at its main diagonal (along ai,j, i=j), according to

    a_{i,j} \xrightarrow{\text{transposition}} a_{j,i}    (2.2)
Thus, the row dimension of A becomes the column dimension of At, and the column dimension of A becomes the row dimension of At.

Figure 2-2. A matrix A and its transposed At
A few lines in Matlab illustrate this on an example.

A=[1 2 3; ...
   4 5 6]            % definition of matrix A

At=A'                % matrix transposition

dim_A=size(A)        % retrieving the dimensions of A
dim_At=size(At)      % retrieving the dimensions of At

A =
     1     2     3
     4     5     6
At =
     1     4
     2     5
     3     6
dim_A =
     2     3
dim_At =
     3     2
Note that Matlab uses the quote (') as the transposition operator. In an Excel spreadsheet, the TRANSPOSE function can be applied to an array of data. For this, we need to become familiar with the two most important rules to perform matrix operations in Excel:
• The result has to be pre-selected, i.e. the dimensions of the result have to be known.
• The SHIFT+CTRL keys must be held while pressing the ENTER key to confirm the operation.
Figure 2-3 shows an example spreadsheet. Cells A3:C4 contain the elements of the 2×3 matrix A. In order to perform the matrix transposition cells E3:F5, which will contain the result, At, have to be pre-selected. Next =TRANSPOSE(A3:C4) is typed on the Excel command line followed by the SHIFT+CTRL+ENTER key combination.
Figure 2-3. Matrix transposition in Excel

Note the curly braces that have appeared in the command line and indicate a matrix operation applied to a block of cells. The curly braces must not be typed in explicitly. As for most Excel functions, there is always an alternative way via the main menu's Insert-Function feature and an upcoming graphical user interface that leads you through the process of selecting the correct function and the source cells corresponding to the matrix A. This takes a bit longer but has the advantage that you do not have to recall the exact syntax. However, the target cells, i.e. the elements corresponding to the transposed matrix At still
have to be pre-selected. When the TRANSPOSE function is selected from the Insert-Function menu, a window similar to the one shown in Figure 2-4 appears.
Figure 2-4. Matrix transposition via Excel's interactive graphical user interface

With the correct source cells in the input line and pushing the OK button while holding the SHIFT+CTRL keys, the equivalent result as in Figure 2-3 is obtained. Importantly, once an array of cells has been declared a matrix by applying the SHIFT+CTRL+ENTER or the SHIFT+CTRL+OK combination, it is no longer possible to alter or delete its individual cells. Always, the whole array has to be pre-selected and thus only the complete array can be modified. As you have probably already noticed, Excel uses alphabetical letters for the column index and numbers for the row index in order to specify a cell on the spreadsheet. The column index appears before the row index, e.g. cell A3 refers to column A row 3. This is contrary to the general convention as introduced earlier and, if desired, you can change Excel's default settings (Tools-Options-General) to accommodate a row-column notation. The notation A3 will then be altered into R3C1 (row 3, column 1).

Addition and Subtraction

Addition or subtraction of matrices is done element-wise and thus is straightforward. Obviously, the dimensions of the matrices to be added have to match.
    A \pm B = \begin{bmatrix} a_{1,1} \pm b_{1,1} & \cdots & a_{1,j} \pm b_{1,j} & \cdots & a_{1,n} \pm b_{1,n} \\ \vdots & \ddots & & & \vdots \\ a_{i,1} \pm b_{i,1} & \cdots & a_{i,j} \pm b_{i,j} & \cdots & a_{i,n} \pm b_{i,n} \\ \vdots & & & \ddots & \vdots \\ a_{m,1} \pm b_{m,1} & \cdots & a_{m,j} \pm b_{m,j} & \cdots & a_{m,n} \pm b_{m,n} \end{bmatrix}    (2.3)
Matrix addition and subtraction are commutative

    A \pm B = \pm B + A    (2.4)

and associative.

    (A \pm B) \pm C = A \pm (B \pm C)    (2.5)
In Matlab the standard mathematical operators for addition (+) and subtraction (-) can be used directly with matrices. As with transposition, Matlab automatically calls the appropriate functions to perform the operations.

A=[1 2 3; ...
   4 5 6]            % definition of A

B=[0.1 0.2 0.3; ...
   0.4 0.5 0.6]      % definition of B

Y=A+B                % matrix addition

A =
     1     2     3
     4     5     6
B =
    0.1000    0.2000    0.3000
    0.4000    0.5000    0.6000
Y =
    1.1000    2.2000    3.3000
    4.4000    5.5000    6.6000
Note that Matlab directly adds/subtracts a single scalar element-wise to/from matrices. Suppose you want to subtract element b1,2 (a scalar) from all elements of matrix A. A valid Matlab command would be

A=[1 2 3; 4 5 6];
B=[0.1 0.2 0.3; 0.4 0.5 0.6];
Y=A-B(1,2)           % subtracting element b1,2 from all elements of A

Y =
    0.8000    1.8000    2.8000
    3.8000    4.8000    5.8000
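The addition rules (2.4) and (2.5) are easily confirmed numerically. The following check is our own addition, not part of the book's example set; integer matrices are used because with floating-point data the associativity test can fail in the last binary digit:

A=[1 2 3; 4 5 6];
B=[7 8 9; 10 11 12];
C=[2 0 1; 3 5 4];
isequal(A+B,B+A)             % commutativity (2.4), returns 1 (true)
isequal((A+B)+C,A+(B+C))     % associativity (2.5), returns 1 (true)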
In Excel, mathematical operations of one or more cells can be dragged to other cells. Since a cell represents one element of an array or matrix, the effect will be an element-wise matrix calculation. Thus, addition and subtraction of matrices are straightforward. An example:
Figure 2-5. Matrix addition in Excel

Cells A3:C4 and E3:G4 comprise the elements of array A and B respectively. First, in cell I3, the addition is done for one pair of elements, A3+E3 (a1,1+b1,1). Then, the calculation is repeated by dragging cell I3 to the remaining cells of the rectangle I3:K4. It is worthwhile mentioning that matrix addition can alternatively be performed by pre-selection of all cells of the prospective Y, assigning the array addition according to A3:C4+E3:G4 and applying the SHIFT+CTRL+ENTER key combination. Usually, this has no particular advantage.
Figure 2-6. Matrix addition by array command

Note the difference in the command line between Figure 2-5 and Figure 2-6. The capability of dragging results from one cell to others is a very useful property of Excel and becomes even more powerful in combination with the dollar operator ($) correctly applied within the cell reference. Referring to the previous Matlab example, if the scalar element b1,2 (cell F3) is to be subtracted from matrix A (A3:C4) in Excel, putting the dollar operator ($) in front of the column and row reference of the source cell containing the scalar b1,2 ($F$3), prevents "dragging-over" of the source cell F3 in both column and row direction.
Figure 2-7. Subtracting element b1,2 from all elements of A
Similarly, it is possible to add/subtract one column, say b:,j (or one row bi,:) of B to/from all columns (or rows) of A. Then, the $ symbol is put before the column (row) index only. In the example below, the third column of B (b:,3), containing the elements of cells G3:G4, is subtracted from all columns of A and a matrix Y of the same dimensions as A is formed.
Figure 2-8. Subtracting the third column of B from all columns of A

In the same way the '$' symbol can be used for other element-wise operations in Excel. The power and importance of the $ operator cannot be overrated and we apply it in several additional examples later. In Matlab the plus (+) and minus (-) operators cannot be directly applied to equations that involve vectors or matrices of different dimensions. In order to perform the same operation as in the former Excel example, column vector b:,3 must be replicated three times to match the dimensions of A. For this, the Matlab command repmat can be used.

A=[1 2 3; 4 5 6];
B=[0.1 0.2 0.3; 0.4 0.5 0.6];
Y=A-repmat(B(:,3),1,3)   % replicating the third column of B 3 times
                         % and subtracting the result from A

Y =
    0.7000    1.7000    2.7000
    3.4000    4.4000    5.4000
Matlab employs B(:,3) as the notation for the third column of B, b:,3. By using repmat(B(:,3),1,3) a matrix is created consisting of a 1-by-3 (horizontal) tiling of copies of B(:,3). Naturally, this function can also be used to create a vertical tiling of copies of row vectors, e.g. if row vector b2,: is to be added/subtracted to/from all rows of A. An appropriate function call would then be repmat(B(2,:),2,1). We refer to the Matlab manuals for further details on this function. The repmat command in the above application can be replaced by a more conventional loop:

A=[1 2 3; 4 5 6];
B=[0.1 0.2 0.3; 0.4 0.5 0.6];
for i=1:3
    Y(:,i)=A(:,i)-B(:,3);
end
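As a side note that goes beyond the Matlab version used in this book: releases from R2016b onward expand singleton dimensions implicitly, so neither repmat nor the loop is needed there. A minimal sketch, assuming such a newer release:

Y=A-B(:,3)               % implicit expansion (Matlab R2016b or newer)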
It is a matter of opinion whether the loop or the repmat option is preferable. One is shorter and the other is easier to comprehend; one is Matlab specific and the other is more general. In this book, we tend to use the shorter, Matlab-style version, at least as long as the readability is not severely compromised.

Multiplication
A matrix product Y=CA is defined in the following way

    y_{i,j} = c_{i,:} \times a_{:,j} = \sum_{k=1}^{nc} c_{i,k}\, a_{k,j}    (2.6)
It is only defined if the number of columns of matrix C matches the number of rows of matrix A. The column dimension nc of C (i.e. the length of ci,:) has to be the same as the row dimension of A (i.e. the length of a:,j).

Figure 2-9. Matrix multiplication

Let ci,: be the i-th row vector of C and a:,j be the j-th column vector of A, then each element yi,j of Y is calculated as the scalar product ci,:×a:,j. In other words, the element in the i-th row and j-th column of Y is the sum over the element-wise products of the i-th row of matrix C and the j-th column of matrix A. Thus, if the dimensions of C are m×nc and the dimensions of A are nc×n, Y has the dimensions m×n. Figure 2-10 illustrates the multiplication Y=CA on a simple example. The factor matrices C (4×2) and A (2×3) can be arranged in such a way that the rows of C align with the rows of Y and the columns of A align with the columns of Y.
    C = \begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \\ 7 & 8 \end{bmatrix}, \quad A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix}, \quad Y = \begin{bmatrix} 9 & 12 & 15 \\ 19 & 26 & 33 \\ 29 & 40 & 51 \\ 39 & 54 & 69 \end{bmatrix}

Figure 2-10. Matrix multiplication Y = C A

That way, the dimensions of Y (4×3) become immediately obvious. One particular element, e.g. y2,3, calculates as the scalar product of the second row of C with the third column of A according to c2,:×a:,3 = 3×3+4×6 = 33. In Matlab the asterisk operator (*) is used for the matrix product. If the corresponding dimensions match, all individual scalar products, ci,:×a:,j, are evaluated to form Y.

C=[1 2; ...
   3 4; ...
   5 6; ...
   7 8]

A=[1 2 3; ...
   4 5 6]

Y=C*A                % the matrix product

C =
     1     2
     3     4
     5     6
     7     8
A =
     1     2     3
     4     5     6
Y =
     9    12    15
    19    26    33
    29    40    51
    39    54    69
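If the inner dimensions do not match, Matlab refuses to form the product. The following one-liner, our own addition to the example, provokes the failure by swapping the factors:

Y=A*C                % (2×3)*(4×2): inner dimensions 3 and 4 disagree;
                     % Matlab aborts with an error stating that the inner
                     % matrix dimensions must agree (the exact wording
                     % depends on the Matlab version)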
Matlab automatically determines the correct dimensions of Y. In Excel, the cells comprising the prospective result Y have to be pre-selected as we have already seen for matrix transposition. For this, we need to predict the dimensions of Y from the row dimension of C and column dimension of A. Also, there is no direct operator for matrix multiplication in Excel. The function MMULT in conjunction with the SHIFT+CTRL+ENTER key
combination needs to be applied in order to perform the operation. MMULT is called with two arguments, the cell range B3:C6, containing the elements of C, and the cell range E8:G9, containing the elements of A. The correct syntax can be taken from the Excel command line in Figure 2-11.
Figure 2-11. Matrix multiplication in Excel

Vectors have earlier been introduced as matrices with one dimension being reduced to one, i.e. they are comprised of one column or one row only. When the rule for matrix multiplication given by equation (2.6) is formally applied to vectors and the appropriate dimensions match, there are two immediate consequences: (1) The product of a row vector and a column vector of same length results in a scalar (the scalar product), e.g.

    \begin{bmatrix} 1 & 2 & 3 \end{bmatrix} \begin{bmatrix} 4 \\ 5 \\ 6 \end{bmatrix} = 1 \times 4 + 2 \times 5 + 3 \times 6 = 33
(2) The product of a column vector with m rows and a row vector with n columns results in a matrix with m rows and n columns. This is the so-called outer product, e.g.

    \begin{bmatrix} 1 \\ 2 \\ 3 \\ 4 \end{bmatrix} \begin{bmatrix} 5 & 6 & 7 \end{bmatrix} = \begin{bmatrix} 5 & 6 & 7 \\ 10 & 12 & 14 \\ 15 & 18 & 21 \\ 20 & 24 & 28 \end{bmatrix}
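Both cases translate directly into Matlab's * operator; the following two lines are our own sketch rather than part of the book's example set:

s=[1 2 3]*[4;5;6]        % scalar (inner) product, s = 33
M=[1;2;3;4]*[5 6 7]      % outer product, M is the 4×3 matrix shown above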
There are a few useful rules for dealing with matrix products. The list is not complete and for more details, we refer to a textbook on linear algebra.
If a matrix is to be multiplied by a scalar x, the multiplication is performed on every element of the matrix. Obviously, this operation is commutative.

    x A = \begin{bmatrix} x\,a_{1,1} & \cdots & x\,a_{1,n} \\ \vdots & \ddots & \vdots \\ x\,a_{m,1} & \cdots & x\,a_{m,n} \end{bmatrix} = A x    (2.7)

The multiplication of matrices, however, is generally not commutative; i.e. the order of the factors must not be changed.

    C A \neq A C    (2.8)

The multiplication of a matrix with more than one scalar (e.g. x, y) is associative and commutative

    (x A) y = x (A y) = x y A    (2.9)

and the product with a sum of scalars is distributive.

    A (x + y) = A x + A y    (2.10)

When three or more matrices are to be multiplied, the operation is also associative

    (C A) D = C (A D)    (2.11)

and with the sum of two matrices involved in the product, it is also distributive.

    (A + B) D = A D + B D    (2.12)

The transpose of the product of matrices is the product of the transposed individual matrices in reversed order.

    (C A D)^t = D^t A^t C^t    (2.13)
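Rules (2.11) and (2.13) can again be verified numerically. The snippet below is our own sketch with arbitrarily chosen conformable integer matrices, for which the checks are exact:

C=[1 2; 3 4; 5 6; 7 8];      % 4×2
A=[1 2 3; 4 5 6];            % 2×3
D=[1 0; 0 1; 1 1];           % 3×2
isequal((C*A)*D,C*(A*D))     % associativity (2.11), returns 1 (true)
isequal((C*A*D)',D'*A'*C')   % transposition rule (2.13), returns 1 (true)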
As stated earlier, Matlab's philosophy is to read everything as a matrix. Consequently, the basic operators for multiplication, right division, left division, power (*, /, \, ^) automatically perform corresponding matrix operations (^ will be introduced shortly in the context of square matrices, / and \ will be discussed later, in the context of linear regression and the calculation of a pseudo inverse, see The Pseudo-Inverse, p.117). Element-wise operations can, however, be enforced. If this is desired, the dot (.) needs to be placed before each operator (.*, ./, .\, .^).
Figure 2-12. Element-wise matrix operations (A combined with B element by element, via +, .*, ./ etc., giving C)
Note that the operators for addition (+) and subtraction (-) need not be preceded by a dot. These two operations are always done element-wise. Some examples:

A=[1 2 3; 4 5 6];
B=[0.1 0.2 0.3; 0.4 0.5 0.6];
X=A.*B               % element-wise multiplication
Y1=A./B              % element-wise right division
Y2=A.\B              % element-wise left division
Z=A.^B               % element-wise raising to the power

X =
    0.1000    0.4000    0.9000
    1.6000    2.5000    3.6000
Y1 =
    10    10    10
    10    10    10
Y2 =
    0.1000    0.1000    0.1000
    0.1000    0.1000    0.1000
Z =
    1.0000    1.1487    1.3904
    1.7411    2.2361    2.9302
Element-wise right division (./) leads to the inverse result of element-wise left division (.\) and the operator '.^' raises all elements to the corresponding power. For element-wise operations, the dimensions of the matrices always have to match. In contrast to Matlab, where the defaults are the matrix operators, in Excel the default is the element-wise operation. In fact, all basic operations (e.g. +, -, *, /, ^) and functions (e.g. EXP, LN, LOG) work element-wise in Excel. All matrix functions such as TRANSPOSE, MMULT, and MINVERSE require a
Matrix Algebra
21
pre-selection of a target cell block and the SHIFT+CTRL+ENTER key combination to perform the calculation.
2.1.2 Special Matrices There are several matrices that have special properties and thus require closer attention. The following list is not complete but it is sufficient for many applications. Square Matrix
Matrices that have the same number of rows and columns are called square matrices. From what we have learnt so far there are two immediate consequences with respect to matrix multiplication. (1) The product of any matrix A with its transpose At or vice versa results in square matrices that are symmetric (see below); but recall that AAt z AtA. (2) Any square matrix can be multiplied with itself repeatedly and the resulting matrix is also square. This is identical to raising the power of the matrix. A few examples in Matlab: A=[1 2 3; 4 5 6]; Y1=A*A' Y2=A'*A Z1=Y2*Y2 Z1=Y2^2 Y1 = 14 32 Y2 = 17 22 27 Z1 =
% % % %
multiplication if A with its transpose or vice versa multiplication of Y2 with itself or, equivalently, raising the power of Y2
32 77 22 29 36
27 36 45
1502 1984 2466
1984 2621 3258
2466 3258 4050
1502 1984 2466
1984 2621 3258
2466 3258 4050
Z1 =
Symmetric Matrix
Symmetric matrices are square matrices that are identical to their transpose. They are invariant to reflection at their main diagonal, i.e. invariant to the interchange of row and column indices. In the previous Matlab example both the 2×2 matrix Y1 and the 3×3 matrix Y2 are symmetric.

Diagonal Matrix
A square matrix comprised of zeros except for the main diagonal elements is called a diagonal matrix. Naturally, a diagonal matrix is symmetric. The Matlab command diag(x) forms a diagonal matrix D with the entries of vector x as its diagonal elements. Conversely, the command diag(D) extracts the diagonal elements of D into a vector.

x=[1 2 3];
D=diag(x)    % forming a diagonal matrix
x=diag(D)    % extracting the diagonal elements into a vector

D =
     1     0     0
     0     2     0
     0     0     3
x =
     1
     2
     3
Diagonal matrices are handy when individual rows or columns of a matrix are to be multiplied by different scalar factors s_1…s_n. One typical example is the normalisation of B so that the square root of the sum of all squared elements in, for example, each row of B becomes one, i.e. unit length of each row vector.

B=[0.1 0.2 0.3; 0.4 0.5 0.6];
s=1./sqrt(sum(B.^2,2))    % vector of row-wise normalisation coeff.
B_n=diag(s)*B             % row-wise normalisation

Note that the Matlab command sum(B.^2,2) performs a row-wise addition of all squared elements of B. For a column-wise summation sum(B.^2,1) or just sum(B.^2) could be used.

s =
    2.6726
    1.1396
B_n =
    0.2673    0.5345    0.8018
    0.4558    0.5698    0.6838
Another common task is to normalise in such a way that the maximum value in, for example, each column of B becomes one.

B=[0.1 0.2 0.3; 0.4 0.5 0.6];
s=1./max(B)      % vector of column-wise normalisation coeff.
B_n=B*diag(s)    % column-wise normalisation

Note that the command max(B,[],1) or simply max(B) finds the column-wise maxima, max(B,[],2) the row-wise maxima of B.

s =
    2.5000    2.0000    1.6667
B_n =
    0.2500    0.4000    0.5000
    1.0000    1.0000    1.0000
Note that the order of the multiplication with the square matrix diag(s) is different in the two examples. In the first example the rows are normalised, in the second, the columns.

Identity Matrix
A diagonal matrix with only ones as diagonal elements is called an identity matrix, commonly abbreviated as I. It is the neutral element with respect to matrix multiplication; i.e. left or right multiplication of a matrix A with an identity matrix I of appropriate dimensions results in A itself. In Matlab the command eye(n) can be used to build an identity matrix of dimensions n×n.

A=[1 2 3; 4 5 6];
I1=eye(2)    % 2×2 identity matrix
Y1=I1*A      % left multiplication
I2=eye(3)    % 3×3 identity matrix
Y2=A*I2      % right multiplication

I1 =
     1     0
     0     1
Y1 =
     1     2     3
     4     5     6
I2 =
     1     0     0
     0     1     0
     0     0     1
Y2 =
     1     2     3
     4     5     6
Inverse Matrix
The inverse X^{-1} of a matrix X is defined in such a way that

X X^{-1} = X^{-1} X = I    (2.14)
The left and right product of a square matrix with its inverse results in an identity matrix. In Matlab the command inv(X), or equivalently X^(-1), is used for matrix inversion. Only square matrices can be inverted.

X=[1 2; 3 4];
X_inv=inv(X)    % matrix inversion
I=X*X_inv

X_inv =
   -2.0000    1.0000
    1.5000   -0.5000
I =
    1.0000         0
    0.0000    1.0000
Singular matrices cannot be inverted. They have linearly dependent rows or columns. In Matlab, or any other computer language, singularity can simply be an issue of numerical precision. Consider the following example:

X=[1 2; 1+1e-16 2];
rank_X=rank(X)    % rank of X
X_inv=inv(X)      % matrix inversion

rank_X =
     1
Warning: Matrix is singular to working precision.
X_inv =
   Inf   Inf
   Inf   Inf
Within Matlab's numerical precision X is singular: the two rows (and columns) are identical, and this represents the simplest form of linear dependence. In this context, it is convenient to introduce the rank of a matrix as the number of linearly independent rows (and columns). If the rank of a square matrix is less than its dimension, the matrix is called rank-deficient and singular. In the latter example, rank(X)=1, which is less than the dimension of X; thus, matrix inversion is impossible due to singularity. In the former example, matrix X must have had full rank. Matlab provides the function rank to test the rank of a matrix. For more information on this topic see Chapter 2.2, Solving Systems of Linear Equations, the Matlab manuals or any textbook on linear algebra.

In Excel, matrix inversion can be performed similarly to matrix transposition (see earlier). Figure 2-13 gives an example. Cells D3:E4, defining the target matrix, have to be pre-selected; the MINVERSE function is then applied to the source cells A3:B4. Finally, the SHIFT+CTRL+ENTER key combination is used to confirm the matrix operation.
Figure 2-13. Matrix inversion in Excel
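Returning to rank deficiency: it is not restricted to nearly identical rows; any linear dependence reduces the rank. A small illustrative example of our own (not from the book):

X=[1 2 3; 2 4 6; 1 0 1];   % second row is twice the first row
rank(X)                    % returns 2, less than the dimension 3
% inv(X) would again trigger the singularity warning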
Orthogonal and Orthonormal Matrices
Matrices with exclusively orthogonal column (or row) vectors are called orthogonal matrices. For any two columns x_{:,i} and x_{:,j} of a matrix X to be orthogonal, the necessary condition states that their scalar product is zero:

x_{:,i}^t x_{:,j} = 0  for all i ≠ j    (2.15)
In our three-dimensional world we can perceive three vectors to be orthogonal. In a higher dimensional space, however, the set of equations defined by (2.15) must suffice. If the columns (or rows) of X are normalised to unit length, i.e. divided by the square root of the sum of their squared elements, the matrix is called orthonormal. Recall that earlier this kind of normalisation was solved most elegantly by right (left) multiplication with a diagonal matrix comprising the appropriate normalisation coefficients; see the section introducing diagonal matrices for more details. Alternatively, Matlab's built-in function norm can be used to determine normalisation coefficients and perform the same task. An example for column-wise normalisation of a matrix X with orthogonal columns is given below. It is worthwhile to compare X with equation (2.15); the subspace command can be used to determine the angle between the vectors (in rad) and reconfirm orthogonality.

X=[1 4; ...
   2 3; ...
   5 -2]                          % matrix with orthogonal columns

angle=subspace(X(:,1),X(:,2))     % angle (in rad, pi/2=90°)
angle=rad2deg(angle)              % angle (in degrees)

Xn=[];
for i=1:2
    Xn(:,i)=X(:,i)./norm(X(:,i)); % column-wise normalisation
end
Xn
X =
     1     4
     2     3
     5    -2
angle =
  1.5708e+000
angle =
    90
Xn =
   1.8257e-001   7.4278e-001
   3.6515e-001   5.5709e-001
   9.1287e-001  -3.7139e-001
Note that this kind of normalisation via the norm function can only be performed column- (or row-) wise in a loop, as seen in the Matlab box above. Calling norm with one matrix argument determines a different kind of normalisation coefficient; we refer to the Matlab help and function references for more detail. Orthonormal matrices have very special properties. If a matrix X is comprised of orthonormal rows then

X X^t = I    (2.16)

If matrix X is comprised of orthonormal columns then

X^t X = I    (2.17)

with the appropriate dimensions of the identity matrices. If matrix X is square and has orthonormal rows, its columns are also orthonormal. The inverse is then equal to its transpose,

X^{-1} = X^t    (2.18)

and consequently

X X^t = X^t X = I    (2.19)
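These properties are easily confirmed numerically. For the matrix Xn with orthonormal columns computed above, a one-line check of our own (assuming Xn is still in the workspace):

Xn'*Xn    % equation (2.17): approximately the 2×2 identity matrix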
2.2 Solving Systems of Linear Equations

Matrix multiplication and inversion provide very useful means of representing and solving systems of linear equations. Consider the following matrix equation:

y = c A    (2.20)
where

y = [y_1\;y_2\;y_3],  c = [c_1\;c_2\;c_3],  A = \begin{pmatrix} a_{1,1} & a_{1,2} & a_{1,3} \\ a_{2,1} & a_{2,2} & a_{2,3} \\ a_{3,1} & a_{3,2} & a_{3,3} \end{pmatrix}
Figure 2-14. System of linear equations in their matrix form.

According to the rule for matrix multiplication introduced earlier, each element of y is calculated as the scalar product between c and the corresponding column of A. These linear operations are represented exactly by the following system of inhomogeneous linear equations:

y_1 = c_1 a_{1,1} + c_2 a_{2,1} + c_3 a_{3,1}
y_2 = c_1 a_{1,2} + c_2 a_{2,2} + c_3 a_{3,2}
y_3 = c_1 a_{1,3} + c_2 a_{2,3} + c_3 a_{3,3}    (2.21)
Let us assume the elements c_1, c_2 and c_3 of vector c are the unknowns. The system is then comprised of three equations with three unknowns. Such systems of n equations with n unknowns have exactly one solution if none of the individual equations can be expressed as a linear combination of the remaining ones, i.e. if they are linearly independent. Then, the coefficient matrix A is of full rank and non-singular, and its inverse A^{-1} exists, such that right multiplication of equation (2.20) with A^{-1} allows the determination of the unknowns:

c = y A^{-1}    (2.22)

Figure 2-15. Solving systems of linear equations.
Figure 2-15. Solving Systems of linear equations A typical example arises from Beer-Lambert's law. In spectrophotometry, it describes the linear relationship between the concentration of a chemical species and the measured absorbance at a particular wavelength. The corresponding coefficients are called molar absorptivities. They are specific for each species and wavelength. We refer to Chapter 3.1, Beer-Lambert's Law, for a more detailed introduction of Beer-Lambert's law. Consider a mixture of three species of unknown concentrations c1, c2 and c3 for which the absorbances y1, y2 and y3 have been measured at three
different wavelengths. Suppose that the molar absorptivity coefficients of the three individual species have been determined independently at all three wavelengths beforehand and are known. These absorptivities are collected in matrix A such that each individual row contains the values for one specific species at the three wavelengths. This case is exactly covered by equations (2.20) and (2.21). If the wavelengths have been chosen reasonably, A will be invertible and the individual concentrations c1, c2 and c3 can be determined from equation (2.22). A small Matlab routine could be as follows:

y=[0.8 0.6 0.9];      % absorbances
A=[90 30 80; ...      % molar absorptivities
   20 70 50; ...
   10 50 40];
c=y*inv(A)            % concentrations

c =
    0.0079    0.0033    0.0026
It is important to stress that for this to work, the independently known matrix A of absorptivity coefficients needs to be square, i.e. it has previously been determined at as many wavelengths as there are chemical species. Often complete spectra are available with information at many more wavelengths. It would, of course, not be reasonable to simply ignore this additional information. However, if the number of wavelengths exceeds the number of chemical species, the corresponding system of equations will be overdetermined, i.e. there are more equations than unknowns. Consequently, A will no longer be a square matrix and equation (2.22) does not apply, since the inverse is only defined for square matrices. In Chapter 4.2, we introduce a technique called linear regression that copes exactly with these cases in order to find the best possible solution.
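As a preview (an illustrative sketch of our own, with invented absorptivities at a fourth wavelength; linear regression proper is developed in Chapter 4.2), Matlab's / operator already delivers the least-squares solution of such an overdetermined system:

y=[0.8 0.6 0.9 0.7];    % absorbances at 4 wavelengths (invented)
A=[90 30 80 60; ...     % absorptivities of 3 species at 4 wavelengths
   20 70 50 40; ...
   10 50 40 30];
c=y/A                   % least-squares estimate of the concentrations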
3 Physical/Chemical Models

Any textbook on Physical Chemistry is full of mathematical equations that quantitatively describe chemical and physical processes. Often these equations are explicit, often they are not. Some are simple equations and some are complex. Explicit equations are relatively straightforward to deal with: all that is required is to translate the equation into computer code, be it Matlab or Excel or any other language. As an example, let us consider the ideal gas law

p V = n R T    (3.1)

where p is the pressure, V the volume, n the number of moles, R the gas constant and T the temperature in Kelvin. The equation can be rearranged to allow the calculation of the pressure as a function of volume and temperature of a gas sample of a given number of moles. The results can be plotted in a mesh plot of pressure vs. volume and temperature. We create a range of temperatures and volumes. The command logspace(-1.2,0,50) creates a vector of 50 logarithmically spaced values between 10^-1.2 and 10^0=1; the command meshgrid produces the matrices V and T that contain all the volumes and temperatures required for the grid of values needed for the plot.

MatlabFile 3-1. Gas_Laws.m
%Gas_Laws
n=1;           % number of moles
R=8.206e-2;    % [L atm mol-1 K-1]

volume=logspace(-1.2,0,50)';
temp=200:10:300;
[V,T]=meshgrid(volume,temp);
% Ideal gas
pressure=n*R*T./V;
mesh(log10(volume),temp,pressure);
view(20,30)
xlabel('log(volume)');ylabel('temp');zlabel('pressure');
Figure 3-1. The pressure of an ideal gas as a function of log(volume) and temperature.

Ideal gases do not exist and for real gases, several approximate equations have been developed to describe the pressure as a function of volume and temperature. The first useful equation is due to van der Waals:

p = \frac{nRT}{V - nb} - a\left(\frac{n}{V}\right)^2    (3.2)

The van der Waals coefficients a and b are determined experimentally for each gas. For CO2 they are a=3.610 atm L^2 mol^-2 and b=0.0429 L mol^-1. The approximation is not perfect and negative pressures are computed in certain ranges of volume and temperature; these values are replaced by zero.

MatlabFile 3-2. Gas_Laws.m …continued
%Gas_Laws ... continued
%Van der Waals corrections for CO2
a=3.61;     % [L^2 atm mol-2]
b=.0429;    % [L mol-1]

pressure_vdW=n*R*T./(V-n*b)-a*n^2./(V.*V);
pressure_vdW(pressure_vdW<0)=0;
mesh(log10(volume),temp,pressure_vdW);
view(20,30)
xlabel('log(volume)');ylabel('temp');zlabel('pressure');
Figure 3-2. The pressure of a real gas as a function of temperature and volume; approximation based on the van der Waals equation.

However, many important chemical problems cannot be described by explicit equations. They require iterative solutions. As an example, consider an apparently simple problem. The solubility product of calcium sulphate, gypsum, is defined as

K_{SP} = [Ca^{2+}][SO_4^{2-}] = 2.4×10^{-5} M^2    (3.3)
What is the solubility of gypsum in g/l? The calculation is trivial if the activity coefficients are ignored. This, of course, is not correct but as the solubility is not very high, maybe they can be ignored. The activity coefficients for ions are a function of the ionic strength of the solution. The ionic strength is defined by the concentration of the dissolved salt, which in turn affects the activity coefficients, which influences the solubility … and so on. Things are deeply coupled and there is no easy, explicit way of resolving the problem. Of course there is a correct answer; it is the one established in the beaker of equilibrated saturated solution. Later, in Complex Equilibria Including Activity Coefficients (p. 62), we demonstrate how to tackle this problem in Excel. In this chapter, we concentrate on the fundamental physical-chemical law of mass action. It forms the basis of many chemical investigations including kinetics and equilibrium studies. Importantly, the solutions are usually not explicit and thus require iterative approaches. The equations that quantitatively represent the law of mass action are very simple and most chemists will remember them. As an example, and reminder, consider a reaction
X + Y \underset{k_-}{\overset{k_+}{\rightleftharpoons}} Z    (3.4)
If the equation is seen as a kinetic equation, a set of differential equations can be developed:

-\frac{d[X]}{dt} = -\frac{d[Y]}{dt} = \frac{d[Z]}{dt} = k_+[X][Y] - k_-[Z]    (3.5)
The above reaction (3.4) can also be regarded as an equilibrium, which is reached after sufficient time. Then the law of mass action states that

\frac{[Z]}{[X][Y]} = \frac{k_+}{k_-} = K    (3.6)
This equation defines quantitatively the relationship between the equilibrium concentrations of the components. There are many analytical chemistry textbooks that deal with the chemical equilibrium in fairly extensive ways and demonstrate how to resolve the above system explicitly. However, more complex equilibrium systems do not have explicit solutions; they need to be resolved iteratively. In kinetics, there are only a few reaction mechanisms that result in systems of differential equations with explicit solutions; they tend to be listed in physical chemistry textbooks. All other rate laws require numerical integration.

Initially, we develop Matlab code and Excel spreadsheets for relatively simple systems that have explicit analytical solutions. The main thrust of this chapter is the development of a toolbox of methods for modelling equilibrium and kinetic systems of any complexity. The computations are all iterative processes where, starting from initial guesses, the algorithms converge toward the correct solutions. Computations of this nature are beyond the limits of straightforward Excel calculations. Matlab, on the other hand, is ideally suited for these tasks, as most of them can be formulated as matrix operations. Many readers will be surprised at the simplicity and compactness of well-written Matlab functions that resolve equilibrium systems of any complexity. The algorithms developed in this chapter can model any situation, e.g. they can serve to demonstrate the effects of initial concentrations and rate constants in kinetics, and of total concentrations and equilibrium constants in equilibrium situations. Very importantly, these algorithms also form the core of the non-linear least-squares fitting programs for the determination of rate or equilibrium constants, introduced and developed in Chapter 4, Model-Based Analyses.

We start the chapter with a few simpler applications: Beer-Lambert's law and Gaussian curves. Light absorption measurements of solutions are most commonly used for the investigation of many chemical processes. A good understanding of Beer-Lambert's law, and in particular the application of the very elegant matrix notation, is useful for the methods developed later in the
chapter. Several processes result in Gaussian curves: concentration profiles in chromatography are approximately Gaussian, and random noise usually follows a Gaussian distribution. We also model absorption spectra as linear combinations of Gaussians. The examples given allow the reader to apply the knowledge acquired in the second chapter of this book.
3.1 Beer-Lambert's Law

Spectrophotometry is probably the most commonly used quantitative technique in chemistry. A substantial amount of the data analysed by the methods presented in this book is based on spectrophotometric measurements. For this reason, we introduce Beer-Lambert's law, concentrating on the compatibility of its inherent structure with matrix notation. Beer-Lambert's law states that the total absorption (also called absorbance or extinction), y_λ, of a solution at one particular wavelength, λ, is the sum over all contributions of dissolved absorbing components A, B, …, Z with concentrations [A], [B], ..., [Z] and molar absorptivities ε_{A,λ}, ε_{B,λ}, …, ε_{Z,λ}, multiplied by the path length l:

y_λ = ([A] ε_{A,λ} + [B] ε_{B,λ} + ... + [Z] ε_{Z,λ}) × l    (3.7)

For the individual components, the coefficients ε_{A,λ}, ε_{B,λ}, …, ε_{Z,λ} can be regarded as absorbances normalised to unit concentration (1 M) and unit path length (1 cm). They have units of M^-1 cm^-1; absorbances y have no units. For simplicity, we have set the path length l=1 cm. Equation (3.7) can be written in matrix form: the concentrations can be written as a row vector c and the molar absorptivities as a column vector a of the same length nc (nc is the number of coloured, i.e. absorbing, species in the system). The absorbance y is then the scalar product of these two vectors.
y = c a + r    (3.8)

(schematically: the 1×nc row vector c times the nc×1 column vector a gives the scalar y, plus the scalar residual r)
The scalar r represents the difference between the real measurement y, which is never perfect, and its ideal representation as the product c×a. If ns absorbance measurements y_1...y_ns are taken for ns different mixtures of the nc components, then ns equations of the kind (3.8) can be written. Again, it is possible and more convenient to use vector/matrix notation: the vector y is the product of a matrix C, which contains as rows the concentrations of the components in each solution, multiplied by the same column vector a of molar absorptivities.
y = C a + r    (3.9)

(C is an ns×nc matrix; a is an nc×1 column vector; y and r are ns×1 column vectors)
The i-th row of matrix C contains the individual component concentrations of the i-th mixture, for which the absorbance is available in the i-th element of y. Again, due to imperfections in any real measurement, the product C×a does not exactly result in y. The difference is now a vector r of residuals. Note that C×a and r have the same dimensions as y. If complete spectra at nl wavelengths are measured for the above ns solutions, there are ns×nl equations of the kind (3.8). Again, matrix notation is much more convenient and easier to understand. The vectors y, a and r are augmented by the wavelength dimension and become matrices Y, A and R.

Y = C A + R    (3.10)

(Y and R are ns×nl matrices; C is ns×nc; A is nc×nl)

It is most important to recognise that the structure of a system of linear equations, as in (3.10), is compatible with Beer-Lambert's law and hence allows the application of the very elegant and powerful matrix notation. As an additional advantage, the matrix-based Matlab language can be used to its full potential to manipulate such multivariate data. Let us repeat: Y is a matrix that consists of all the individual measurements. The absorption spectra, measured at nl wavelengths, form nl-dimensional vectors that are arranged as rows of Y. Thus, if ns spectra are measured for ns mixtures, Y contains ns rows of nl elements; it is an ns×nl matrix. As the structures of Beer-Lambert's law and the mathematical law for matrix multiplication are essentially identical, this matrix Y can be written as a product of two matrices C and A, where C contains, as columns, the concentration profiles of the absorbing species. If there are nc absorbing species, C has nc columns; each row contains nc elements, the concentrations of the species in one of the ns mixtures. Similarly, the matrix A contains, in nc rows, the molar absorptivities of the absorbing species measured at nl wavelengths; these are the ε_{X,λ} values of equation (3.7) comprising the pure component spectra. For the time being, let us assume that we know all the individual concentrations of four mixtures of three chemical components, forming matrix C. Let us also suppose that we know the molar absorptivities of all three components at six wavelengths, matrix A. From those two matrices one can construct a multivariate measurement, matrix Y. In this or a similar way, most "experimental" data matrices used in later chapters will be simulated. A simple Matlab example:

MatlabFile 3-3. Beer_Lambert.m
% Beer_Lambert
C=[1e-2 2e-2 3e-2; ...
   4e-2 1e-2 2e-2; ...
   3e-2 4e-2 1e-2; ...
   2e-2 3e-2 4e-2]          % 4 mixtures, 3 components

A=[10 8 6 4 2 1; ...
   12 16 20 15 11 9; ...
   5 10 15 20 25 30]        % 3 components, 6 wavelengths

Y=C*A
subplot(2,1,1);plot([400:20:500],A);
ylabel('absorptivity');
legend('A(1,:)','A(2,:)','A(3,:)');
subplot(2,1,2);plot([400:20:500],Y);
xlabel('wavelength');
ylabel('absorbance');
legend('Y(1,:)','Y(2,:)','Y(3,:)','Y(4,:)');

C =
    0.0100    0.0200    0.0300
    0.0400    0.0100    0.0200
    0.0300    0.0400    0.0100
    0.0200    0.0300    0.0400
A =
    10     8     6     4     2     1
    12    16    20    15    11     9
     5    10    15    20    25    30
Y =
    0.4900    0.7000    0.9100    0.9400    0.9900    1.0900
    0.6200    0.6800    0.7400    0.7100    0.6900    0.7300
    0.8300    0.9800    1.1300    0.9200    0.7500    0.6900
    0.7600    1.0400    1.3200    1.3300    1.3700    1.4900
Figure 3-3. Spectra A in the upper panel and the data Y in the lower panel.
In many applications, such as chromatography, equilibrium titrations or kinetics, where series of absorption spectra are recorded, the individual rows in Y, C and R correspond to a solution at a particular elution time, added volume or reaction time. Due to the evolutionary character of these experiments, the rows are ordered, and this particular property will be exploited by important model-free analysis methods described in Chapter 5, Model-Free Analyses. Generally, multivariate data analysis attempts to find the best matrices C and A for a given measured Y. We discuss a wide range of methods for this task, in depth, in Chapters 4 and 5, Model-Based Analyses and Model-Free Analyses.
3.2 Chromatography / Gaussian Curves

During the elution of a compound in chromatography, many simultaneous processes influence its movement along the stationary phase. Under ideal conditions, the resulting concentration profiles are Gaussian curves. In reality, there are many distortions from the ideal, and the observed profiles are hardly ever accurately described by Gaussian curves. There are numerous influences of very different natures and, as a result, there is no general function that describes observed concentration profiles in every situation. Nevertheless, Gaussians and distorted Gaussians can often be used to closely represent elution profiles. Gaussian profiles are also utilised to approximate peak shapes observed in different types of spectroscopy. Again, we need to stress that the actual molecular processes behind a spectroscopically observed transition are very complex and do not strictly follow Gaussian curves. However, here too, Gaussian curves can serve as useful approximations.

Many of the model-free analysis methods discussed in Chapter 5 were originally developed for the deconvolution of overlapping chromatographic peaks. The reason is, as just explained, that the concentration profiles in chromatography cannot be described by any generally applicable analytical function; thus, model-based data fitting is out of the question and only the model-free methods can computationally resolve overlapping peak clusters. Of course, it would be possible to painstakingly try to improve the chromatographic conditions such as temperature, pressure, solvent, column length and material, etc. This, however, is not the task of the chemometrician.

It is a guiding principle of this book to generate all the data that are subsequently analysed by the methods developed for the task. Thus, we need functions for the generation of spectra and chromatographic concentration profiles. Generally, we use Gaussians or linear combinations of Gaussians for this purpose. This is a matter of convenience rather than a necessity.
A Gaussian curve is characterised by peak position, peak height and peak width. Commonly, the half width is used, i.e. the width at half peak height; we adopt this convention. In statistics, the Gaussian distribution is usually normalised to unit integral; however, this is not useful in the present context.

f = e^{-\ln(2)\left(\frac{x - x_{max}}{width/2}\right)^2}    (3.11)
The following function gauss.m creates a Gaussian curve with a given width and centre and a peak maximum of one, which is more convenient for our purposes.

MatlabFile 3-4. gauss.m
function f = gauss(x, x_max, width)
f = exp(-log(2)*((x-x_max)/(width/2)).^2);
Multiplication with a scalar height then defines the amplitude of the peak. This is shown in the example Gauss_Curve.m below:

MatlabFile 3-5. Gauss_Curve.m
% Gauss_Curve
t=1:100;
t_max=60;
width=20;
height=3;
y=height*gauss(t, t_max, width);
plot(t,y);
xlabel('time');
Figure 3-4. A Gaussian curve.
We use the function gauss.m in several examples, not only for the generation of chromatographic concentration profiles, but also for the generation of absorption spectra, since these can often be approximated by a combination of Gaussian profiles. For example:

MatlabFile 3-6. Gauss_Curve2.m
% Gauss_Curve2
lam=500:5:800;
a=10000*gauss(lam,200,300)+1000*gauss(lam,650,80);
plot(lam,a);
xlabel('wavelength');
Figure 3-5. A spectrum, generated as the sum of two Gaussian functions.

There have been several attempts to modify the Gaussian curve in order to better model distorted chromatographic concentration profiles. This might be useful if model-based fitting is applied to asymmetrical chromatograms. The following skewed Gaussian function is often suitable:

f = e^{-\ln(2)\left(\frac{\ln\left(1 + 2\,tail\,\frac{x - x_{max}}{width}\right)}{tail}\right)^2}    (3.12)
There is an increased complexity compared with the simple Gauss function of equation (3.11). An additional parameter tail appears, relating to the distortion (tailing) of the Gaussian. Note that with this function width is only an approximation for the half width; the true half width tends to become slightly larger with increasing peak distortion. The corresponding Matlab function is gauss_sk.m:
MatlabFile 3-7. gauss_sk.m
function f = gauss_sk(x, x_max, width, tail)
f=zeros(size(x));
ft0=(1+2*tail*(x-x_max)/width) > 0;
f(ft0)=exp(-log(2)*((log(1+2*tail*(x(ft0)-x_max)/width)/tail).^2));

MatlabFile 3-8. Gauss_Skewed.m
% Gauss_Skewed
t=1:100;
t_max=60;
width=20;
height=3;
tail=0.3;
y=height*gauss_sk(t, t_max, width, tail);
plot(t,y);
xlabel('time');
Figure 3-6. A skewed Gaussian.

A few remarks for the Matlab function gauss_sk.m:
1. If the parameter tail is positive, the peak will be steeper on its left side; if tail is negative, the right side is steeper.
2. Typically useful values for tail are -1 ≤ tail ≤ 1.
3. If abs(tail) is small (but not zero) the result will be very close to a normal Gaussian as in function gauss.
4. If the inner logarithm is not defined for a particular x_i, the corresponding function value is set to zero.
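Remark 3 is easily verified; an illustrative check of our own, using the two functions defined above:

t=1:100;
y1=gauss(t,60,20);            % plain Gaussian
y2=gauss_sk(t,60,20,0.01);    % skewed Gaussian with a very small tail
max(abs(y1-y2))               % very small, cf. remark 3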
3.3 Titrations, Equilibria, the Law of Mass Action

Titrations are very powerful techniques that contain two very different kinds of information and thus serve two different purposes: (a) titrations are used for quantitative analytical applications, e.g. the determination of the concentration of an acid by an acid-base titration or the determination of a metal ion by a complexometric titration; (b) titrations also serve as a method for the determination of equilibrium constants, e.g. the determination of the strength of the interaction between a metal ion and a ligand. Naturally, both objectives can be combined and the analysis of one titration can deliver both types of information.

Under ideal conditions, the determination of the endpoint of a titration is simple. It can be accomplished by using an appropriate indicator or by straightforward analysis of a pH titration curve, e.g. through the detection of the inflection point of the pH vs. addition curve. Often the requirement of ideal conditions is not met, and application of the above methods will then only result in approximations. Proper numerical analysis of titration curves is possible and will result in significantly improved outcomes.

Titrations consist of the observation of one or several measured quantities as a function of the addition of an appropriate reagent. Reagents are typically acids or bases, or ligands in metal determinations. Measurements are typically pH and/or absorption spectra. We concentrate on the data analysis of these two types; it should be straightforward for the reader to adapt the algorithms to other observations. Currently, most titrations are done under computer control, either by commercial auto-titrators or by assemblies of burettes, sensors and vessels in the research laboratory. This is not crucial; the analysis of such a titration is essentially identical with the analysis of a manual titration.

Whatever the aim of a particular titration, the computation of the position of a chemical equilibrium for a set of initial conditions (e.g. total concentrations) and equilibrium constants is the crucial part. The complexity ranges from simple 1:1 interactions to solution equilibria between several components (usually Lewis acids and bases) forming any number of species (complexes). A titration is nothing but the preparation of a series of solutions with different total concentrations. This chapter covers all the requirements for the modelling of titrations of any complexity. Model-based analysis of titration curves is discussed in the next chapter; the equilibrium computations introduced here are the innermost functions required by the fitting algorithms.
3.3.1 A Simple Case: Fe3+ + SCN-

We start with a simple equilibrium where only two components react with each other to form one new species. A typical example is the interaction between Fe3+ and SCN- to form the dark red-brown complex Fe(SCN)2+. The
reaction can be used as a test for either Fe3+ or SCN-. However, this is not the issue here: we want to calculate all concentrations in a solution of known volume with known amounts of salts of the above ions, or specifically, we want to compute [Fe3+], [SCN-] and [Fe(SCN)2+] in a solution with known total concentrations [Fe3+]tot and [SCN-]tot. The equilibrium interaction between the two components Fe3+ and SCN- to form the new species Fe(SCN)2+ is represented by the chemical equation

Fe^{3+} + SCN^- \overset{K}{\rightleftharpoons} Fe(SCN)^{2+}    (3.13)
The relationship between the concentrations [Fe3+], [SCN-] and [Fe(SCN)2+] is defined by the law of mass action:

K = \frac{[Fe(SCN)^{2+}]}{[Fe^{3+}][SCN^-]}    (3.14)
Of course, for the computation of [Fe3+], [SCN-] and [Fe(SCN)2+] we need to know the equilibrium constant K. For such a simple equilibrium system, there are explicit equations that are relatively easy to derive. There are two equations for the total concentrations:

[Fe^{3+}]_{tot} = [Fe^{3+}] + [Fe(SCN)^{2+}]
[SCN^-]_{tot} = [SCN^-] + [Fe(SCN)^{2+}]    (3.15)
Presently, we are dealing with 3 unknowns, [Fe3+], [SCN-] and [Fe(SCN)2+], and 3 equations, (3.14) and (3.15). Rearranging (3.15) in terms of the concentrations [Fe3+] and [SCN-] and incorporating the expressions into the law of mass action (3.14) yields

[Fe(SCN)^{2+}] = K\,([Fe^{3+}]_{tot} - [Fe(SCN)^{2+}])\,([SCN^-]_{tot} - [Fe(SCN)^{2+}])    (3.16)
Rearrangement yields

K[Fe(SCN)^{2+}]^2 - (K([Fe^{3+}]_{tot} + [SCN^-]_{tot}) + 1)[Fe(SCN)^{2+}] + K[Fe^{3+}]_{tot}[SCN^-]_{tot} = 0    (3.17)
This is a quadratic equation in [Fe(SCN)2+]. It has two mathematical solutions but only one is physically possible:

[Fe(SCN)^{2+}] = \frac{(K([Fe^{3+}]_{tot} + [SCN^-]_{tot}) + 1) - \sqrt{(K([Fe^{3+}]_{tot} + [SCN^-]_{tot}) + 1)^2 - 4K^2[Fe^{3+}]_{tot}[SCN^-]_{tot}}}{2K}    (3.18)
The free equilibrium concentrations [Fe3+] and [SCN-] are computed subsequently as

[Fe^{3+}] = [Fe^{3+}]_{tot} - [Fe(SCN)^{2+}]
[SCN^-] = [SCN^-]_{tot} - [Fe(SCN)^{2+}]    (3.19)
It is possible to follow different paths of rearrangements and substitutions to arrive at a solution of the system of equations, but all must result in the same correct concentrations. The spreadsheet below calculates the concentration profiles of Fe3+, SCN- and Fe(SCN)2+ for a titration of 10 ml of a 0.1 M solution of Fe3+ with 9×10^-2 M SCN-. The equilibrium constant is K=200.

ExcelSheet 3-1. Chapter2.xls-FeSCN
Cell formulas: =$B$2*$B$3/B7, =(($B$1*(C7+D7)+1)-SQRT(($B$1*(C7+D7)+1)^2-4*$B$1^2*C7*D7))/(2*$B$1), =C7-E7
Figure 3-7. Partial view of an Excel spreadsheet for the calculation of the concentration profiles of a simple titration.
Figure 3-8. Concentration profiles for the titration of Fe3+ with SCN-.
An important aspect is that there is continuous dilution during the titration: the total concentrations change continuously. These calculations are done in columns C and D by the following equations:

[Fe^{3+}]_{tot} = \frac{V_0\,[Fe^{3+}]_0}{V_0 + V_{added}},  [SCN^-]_{tot} = \frac{V_{added}\,[SCN^-]_{added}}{V_0 + V_{added}}    (3.20)
Note the poorly defined endpoint at the theoretical value of 0.0111 L. The equilibrium constant for this interaction is too low for a useful titration between the two compounds.
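The same profiles can be computed in a few lines of Matlab. The following is an illustrative translation of the spreadsheet (a sketch of our own, using equations (3.18) to (3.20)):

K=200; V0=0.01; Fe0=0.1; SCN_added=0.09;
V_add=(0:0.001:0.03)';                    % added volume [L]
Fe_tot=V0*Fe0./(V0+V_add);                % equation (3.20)
SCN_tot=V_add*SCN_added./(V0+V_add);
p=K*(Fe_tot+SCN_tot)+1;
FeSCN=(p-sqrt(p.^2-4*K^2*Fe_tot.*SCN_tot))/(2*K);   % equation (3.18)
Fe=Fe_tot-FeSCN; SCN=SCN_tot-FeSCN;       % equation (3.19)
plot(V_add,[Fe SCN FeSCN]); xlabel('vol added');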
3.3.2 The General Case, Definitions

It is straightforward to generalise equations (3.13) and (3.14) to allow for the formation of species of any complexity from any number of components. Discussion here is limited to species formed by only three components X, Y and Z; the generalisation to 4, 5 or any number of components is self-evident.

xX + yY + zZ \overset{\beta_{xyz}}{\rightleftharpoons} X_xY_yZ_z    (3.21)
The definition of equilibrium constants for such general cases is most conveniently done via the so-called formation constants β_xyz, which are defined in the following way:

\beta_{xyz} = \frac{[X_xY_yZ_z]}{[X]^x[Y]^y[Z]^z}    (3.22)

and

[X_xY_yZ_z] = \beta_{xyz}\,[X]^x[Y]^y[Z]^z    (3.23)
Using the Fe3+/SCN- example, the above equations translate as

\beta_{11} = \frac{[Fe(SCN)^{2+}]}{[Fe^{3+}][SCN^-]}  and  [Fe(SCN)^{2+}] = \beta_{11}[Fe^{3+}][SCN^-]    (3.24)
At this stage, we need to introduce the nomenclature used in the equilibrium literature. While there is no official agreement, the following expressions are well established and more importantly, they allow a reasonably consistent and systematic description of equilibria of any complexity.
Components are the most basic units (molecules, ions or atoms) that interact with each other. In the above example X, Y and Z are components. All the resulting products of the interactions (molecules, ions or complexes) are called species; in the example, one of potentially many species is X_xY_yZ_z. To be consistent and to allow elegant and efficient notation and computer coding, the components themselves are also species; their equilibrium constant is one. The equilibrium constants β_xyz as defined in (3.22) are called formation constants. The composition of a particular species is defined by a set of three stoichiometric coefficients written as the indices x, y and z. If a species is composed of only two components, the appropriate index is zero.

Now we need to recall a bit of basic physical chemistry. The law of mass action, as defined in equation (3.22), is not entirely correct: instead of concentrations we ought to use the activities of all species. Formally, this is not a problem, as there is a simple relationship between activity and concentration:

{X} = γ_X [X]    (3.25)
where the curly brackets represent the activity and the square brackets the concentration; γ_X is the activity coefficient of the species X. Unfortunately, activity coefficients are very difficult to determine and thus equation (3.25) seems to be of little use. The traditional way out is to undertake the investigation in solutions of constant ionic strength. Activity coefficients are mainly influenced by the ionic strength of the solution and, keeping it constant by having an excess of some non-interacting salt (e.g. NaClO4), does not remove the activity coefficients but keeps them constant. Published equilibrium constants are apparent values where all activity coefficients are taken into the constant; thus, such equilibrium constants are only valid at the one ionic strength at which they were determined. A thermodynamically correct method is to investigate the equilibria under a range of different ionic strengths and then extrapolate to zero ionic strength, where activity coefficients are one by definition. It is also possible to estimate activity coefficients, usually as a function of ionic strength and the size and charge of the ions. We give one example of such calculations in Equilibria in Excel (p.60).

A few additional remarks are needed. Using the formation constants as introduced above leads to well defined and consistent descriptions. Sometimes it is easier to apprehend an equilibrium if the species are not formed from the components but as a combination of other species and/or components; e.g.

XY + Y \overset{K^{XY}_{XY_2}}{\rightleftharpoons} XY_2,  K^{XY}_{XY_2} = \frac{[XY_2]}{[XY][Y]}    (3.26)
might be easier to grasp than

X + 2Y \overset{\beta_{12}}{\rightleftharpoons} XY_2,  \beta_{12} = \frac{[XY_2]}{[X][Y]^2}    (3.27)

Both notations are correct and equilibrium constants such as K^{XY}_{XY_2} can always be expressed as functions of the formation constants; e.g.

K^{XY}_{XY_2} = \frac{[XY_2]}{[XY][Y]} = \frac{[XY_2]}{[X][Y]^2} \times \frac{[X][Y]}{[XY]} = \frac{\beta_{12}}{\beta_{11}}    (3.28)
An additional advantage of using formation constants is that they are uniquely defined. Any alternative equilibrium constant has to be defined somehow, and there is no generally accepted formulation available. Consider K^{XY}_{XY_2} above; the notation needs to be defined, probably best by referring to its definition in equation (3.26). K^{XY}_{XY_2} on its own, without explanation, is an ambiguous symbol. Usually there is no ambiguity about the choice of components to form the species; often they are metal ions, ligands, solvent molecules, protons, etc.

A Chemical Example: Cu2+, Ethylenediamine, Protons
We illustrate the nomenclature introduced above in an example taken from coordination chemistry. In fact, equilibrium species of interesting complexity are commonly encountered in coordination chemistry and to a large extent coordination chemists have developed the principles of equilibrium studies. Consider the interaction of a metal ion M (e.g. Cu2+) with a bidentate ligand L (e.g. ethylenediamine, en) in aqueous solution. For work in aqueous solution the pH also plays an important role and thus, the proton concentration [H] (=[H+]), as well as several differently protonated species, need to be taken into account. Using the nomenclature commonly employed in coordination chemistry, there are three components, M, L, and H. In aqueous solution they interact to form the following species, HL, H2L, ML, ML2, ML3, MLH, MLH-1 and OH. (In fact, more species are formed, e.g. ML2H-1, but the above selection will suffice now.) The water molecules are usually not defined as additional components. The concentration of water is constant and its value is taken into the equilibrium constants. Note that in the following discussion we omit all charges. Each of these species is formed by the appropriate number of components and its concentration is defined by its formation constant. This is best represented in a table:
Table 1. Notation for equilibrium modelling.

Species (notation)         m   l   h    Formation constant
M (Cu2+)                   1   0   0    β100 = 1
L (en)                     0   1   0    β010 = 1
H (H+)                     0   0   1    β001 = 1
LH (enH+)                  0   1   1    β011 = [LH]/([L][H])
LH2 (enH2 2+)              0   1   2    β012 = [LH2]/([L][H]^2)
ML (Cu(en)2+)              1   1   0    β110 = [ML]/([M][L])
ML2 (Cu(en)2 2+)           1   2   0    β120 = [ML2]/([M][L]^2)
ML3 (Cu(en)3 2+)           1   3   0    β130 = [ML3]/([M][L]^3)
MLH (Cu(enH)3+)            1   1   1    β111 = [MLH]/([M][L][H])
MLH-1 (Cu(en)(OH)+)        1   1  -1    β11-1 = [MLH-1]/([M][L][H]^-1)
H-1 (OH-)                  0   0  -1    β00-1 = [OH][H] = K_W
o MLH 1 . MLH-1 represents the equilibrium M L H m Different, chemically more intuitive, ways of defining this o ML(OH ) H or ML H m o ML(OH ) . In species are ML m chemical terms it can represent the deprotonation of a
100 ml 1 g CuCl2 0.6 g en 10ml 1M HCl
47
Physical/Chemical Models
coordinated water molecule or any other site of the complex: H2O
H2O
N
OH2
N
OH2
N
OH2
N
OH
H2O
+ H+
H 2O
Figure 3-9. The structure of the deprotonated complex formed between one Cu2+ and one ethylenediamine.

Let us review the present task of resolving solution equilibria. Imagine a 100 ml volumetric flask containing 1 g of CuCl2, 0.6 g of ethylenediamine and 10 ml of 1 M HCl solution, topped up to the mark with water. What we know are the total concentrations of M, L and H, i.e. [Cu2+]tot, [en]tot, [H+]tot. What we need to calculate are the equilibrium concentrations of all species: HL, H2L, ML, ML2, ML3, MLH, MLH-1, H and OH. The total concentrations of the components are equal to the sums of all the relevant species concentrations, multiplied by the appropriate stoichiometric coefficients. With the exception of the -[OH] term in the third equation, the equations below are self-explanatory. It is most convenient to allow [H+]tot to become negative if [OH-]>[H+], i.e. if pH>7; otherwise [H+]tot would need to include all water protons. In aqueous solution, the addition of x moles of OH- is equivalent to the removal of x moles of H+.

[M]_{tot} = [M] + [ML] + [ML_2] + [ML_3] + [MLH] + [MLH_{-1}]
[L]_{tot} = [L] + [LH] + [LH_2] + [ML] + 2[ML_2] + 3[ML_3] + [MLH] + [MLH_{-1}]
[H]_{tot} = [H] + [LH] + 2[LH_2] + [MLH] - [MLH_{-1}] - [OH]    (3.29)
As mentioned before, it is convenient to include the components in the list of species and to define their equilibrium constants as in Table 1, i.e. with a β-value of 1. This allows a very concise notation, easy to code in Matlab (as we will see later):

[M]_{tot} = \sum_{i=1}^{nspec} m_i\,\beta_i\,[M]^{m_i}[L]^{l_i}[H]^{h_i}
[L]_{tot} = \sum_{i=1}^{nspec} l_i\,\beta_i\,[M]^{m_i}[L]^{l_i}[H]^{h_i}
[H]_{tot} = \sum_{i=1}^{nspec} h_i\,\beta_i\,[M]^{m_i}[L]^{l_i}[H]^{h_i}    (3.30)
The total concentration of a component M, L or H is the sum of all i=1…nspec species concentrations, [M_{m_i}L_{l_i}H_{h_i}] = β_i[M]^{m_i}[L]^{l_i}[H]^{h_i}, each multiplied by the corresponding component's stoichiometric coefficient m_i, l_i or h_i of the i-th
species. In accordance with the i-th row of Table 1, β_i holds the formation constant β_{m_i l_i h_i} of the i-th species M_{m_i}L_{l_i}H_{h_i}. Henceforward, we will leave out the indexation of the sums with respect to the individual species. Then, equation (3.30) can be written rather sloppily:

[M]_{tot} = \sum m\,\beta_{mlh}\,[M]^m[L]^l[H]^h
[L]_{tot} = \sum l\,\beta_{mlh}\,[M]^m[L]^l[H]^h
[H]_{tot} = \sum h\,\beta_{mlh}\,[M]^m[L]^l[H]^h    (3.31)
Although mathematically not quite correct, this enhances readability. Just always keep in mind that it represents the sum over all species using their corresponding stoichiometric factors.
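In Matlab, sums of this kind collapse into a single vectorised expression. A small illustrative sketch of our own (a hypothetical 2-component system forming the species X, Y and XY2), anticipating the implementation developed below:

Model=[1 0 1; ...             % stoichiometric coefficients of X
       0 1 2];                %   and of Y, for the species X, Y, XY2
c_spec=[1e-3 2e-3 5e-4];      % free species concentrations (invented)
c_tot=sum(Model.*repmat(c_spec,2,1),2)'   % equation (3.31) for both components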
3.3.3 Solving Complex Equilibria

After all that introduction, it is time to start the development of the algorithms required to compute the correct solution. The task is to calculate all equilibrium species concentrations in a solution, knowing the total concentrations of the components and the complete set of equilibrium constants. Assume we are dealing with a system of ncomp components forming nspec species (remember, the components themselves are species as well). There are nspec unknowns, i.e. all species concentrations; therefore nspec equations are required. There are nspec-ncomp equations of the kind (3.23), one for each species formed from the components, plus ncomp equations of the kind (3.30), one for each component. The important point is, we have enough equations. The daunting task is to resolve this system of nspec equations with nspec unknowns. In particular, we want to develop a general solution which deals with any number of components and species formed. It is reassuring to know that there is always a solution to the mathematical problem: it is the one established in the real solution in the beaker. For all but the simplest cases, there is no explicit formula and so the calculations need to be performed numerically, i.e. in an iterative process starting from initial guesses.
The Newton-Raphson Algorithm

In general, non-linear problems cannot be resolved explicitly, i.e. there is no equation that allows the computation of the result in a direct way. Usually such systems can be resolved numerically in an iterative process. In most instances, this is done via a truncated Taylor series expansion. This downgrades the problem to a linear one that can be resolved with a 'stroke of the brush', or the Matlab / and \ commands; see The Pseudo-Inverse (p.117).
The result of each step of the iterative process is only an approximation which, hopefully, is better than the previous one. Naturally, the process is continued iteratively until some appropriate termination criterion is met. As a reminder, we write the Taylor series expansion for a function f(x) of one single variable x:

f(x + \delta x) = f(x) + \frac{1}{1!}f'(x)\,\delta x + \frac{1}{2!}f''(x)\,\delta x^2 + \ldots + \frac{1}{n!}f^{(n)}(x)\,\delta x^n + \ldots    (3.32)
This equation looks essentially the same if we deal with a vector x of variables and a function f that has several components:

f(x + \delta x) = f(x) + \frac{1}{1!}f'(x)\,\delta x + \frac{1}{2!}f''(x)\,\delta x^2 + \ldots + \frac{1}{n!}f^{(n)}(x)\,\delta x^n + \ldots    (3.33)
We truncate the series after the first derivative and what remains is

f(x + \delta x) \approx f(x) + f'(x)\,\delta x    (3.34)
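The following one-dimensional sketch of our own (not one of the book's MatlabFiles) illustrates how equation (3.34), set to zero, generates the Newton-Raphson iteration; we return to the multivariate equilibrium problem below. As a test function we take f(x)=x^2-2:

f=@(x) x^2-2; df=@(x) 2*x;
x=1;                  % initial guess
for it=1:6
    x=x-f(x)/df(x);   % zero of the truncated expansion (3.34)
end
x                     % converges to sqrt(2)=1.4142...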
The function at x+δx is approximated by the value at x plus the derivative f'(x) times the difference δx. Later, in Figure 3-22, we give a graphical representation of the principles of the truncated Taylor series expansion for a one-parameter function. How do we apply equation (3.34) to resolve the equilibrium problem? Firstly, we need a good notation and a reorganisation of the system of equations (3.23) and (3.31). There are several paths that can be taken in order to arrive at a more manageable set of equations. The one chosen here is 'natural' in that it follows the structure of the problem and allows the writing of compact and generally applicable computer code. While the core of the code, developed later, is general, we derive it using our system of 3 components X, Y and Z. First, initial guesses for the free component concentrations [X], [Y] and [Z] are made. Naturally, the better the initial guesses, the faster and more robustly the correct solution is computed. As it turns out, in equilibrium calculations as described here, the quality of the initial guesses is often not crucial; convergence is fast even for guesses that are several orders of magnitude away from the correct values. Based on these estimated concentrations and the known formation constants, all species concentrations are computed by applying equations of the kind (3.23). Next, the total concentrations of the components are calculated by applying equation (3.30). These calculated total concentrations are compared with the known total concentrations. If they are identical, we have found the solution; if they are
different, the component concentrations have to be improved in another iterative step. We have to translate our present problem into the language used in (3.34). The equivalent of the function f is the set of differences between the known true total concentrations of the components and the calculated ones. Instead of f we collect these differences in the row vector d:

d = d(c) = \left( [X]_{tot} - \sum x\beta_{xyz}[X]^x[Y]^y[Z]^z,\; [Y]_{tot} - \sum y\beta_{xyz}[X]^x[Y]^y[Z]^z,\; [Z]_{tot} - \sum z\beta_{xyz}[X]^x[Y]^y[Z]^z \right)    (3.35)
The equivalent of x in equation (3.34) are the component concentrations; they are collected in the vector c:

c = [\,[X]\;\;[Y]\;\;[Z]\,]    (3.36)
Equation (3.34) now reads as

d(c + \delta c) = d(c) + \delta c \times \frac{\partial d(c)}{\partial c}    (3.37)
It is useful to represent this equation graphically: all vectors are 3-element row vectors, while the matrix of derivatives is a 3×3 matrix.
Figure 3-10. Graphical representation of equation (3.37).

The task is to determine the shift vector δc for which the new vector of differences d(c+δc) is minimal or, within the Taylor approximation, for which this difference is zero. In Chapter 2.2, Solving Systems of Linear Equations, we have given the solution:

\delta c = -d(c) \times \left[\frac{\partial d(c)}{\partial c}\right]^{-1}    (3.38)

What is left is the matrix ∂d(c)/∂c of derivatives. It is called the Jacobian, J.

J = \frac{\partial d}{\partial c} = \begin{pmatrix} \frac{\partial d_1}{\partial c_1} & \frac{\partial d_2}{\partial c_1} & \frac{\partial d_3}{\partial c_1} \\ \frac{\partial d_1}{\partial c_2} & \frac{\partial d_2}{\partial c_2} & \frac{\partial d_3}{\partial c_2} \\ \frac{\partial d_1}{\partial c_3} & \frac{\partial d_2}{\partial c_3} & \frac{\partial d_3}{\partial c_3} \end{pmatrix}    (3.39)
The determination of the first element of J, as an example, is detailed below:

\frac{\partial d_X}{\partial [X]} = \frac{\partial\left([X]_{tot} - \sum x\beta_{xyz}[X]^x[Y]^y[Z]^z\right)}{\partial [X]} = -\sum x^2\beta_{xyz}[X]^{x-1}[Y]^y[Z]^z = -\sum \frac{x^2\,[X_xY_yZ_z]}{[X]}    (3.40)
where d_X denotes the difference between the true and calculated total concentration of component X. It is relatively easy to continue along these lines and determine the total Jacobian as

J = \frac{\partial d(c)}{\partial c} = -\begin{pmatrix} \sum\frac{x^2[X_xY_yZ_z]}{[X]} & \sum\frac{xy[X_xY_yZ_z]}{[X]} & \sum\frac{xz[X_xY_yZ_z]}{[X]} \\ \sum\frac{xy[X_xY_yZ_z]}{[Y]} & \sum\frac{y^2[X_xY_yZ_z]}{[Y]} & \sum\frac{yz[X_xY_yZ_z]}{[Y]} \\ \sum\frac{xz[X_xY_yZ_z]}{[Z]} & \sum\frac{yz[X_xY_yZ_z]}{[Z]} & \sum\frac{z^2[X_xY_yZ_z]}{[Z]} \end{pmatrix}    (3.41)
The Jacobian is not symmetric. However, we can rewrite this equation in a way that takes advantage of a 'hidden' symmetry. This allows faster computation, as only the upper triangular part of the symmetric matrix J* needs to be computed:
J = -\,\mathrm{diag}(c)^{-1} \times J^{*} = -\begin{pmatrix} 1/[X] & 0 & 0 \\ 0 & 1/[Y] & 0 \\ 0 & 0 & 1/[Z] \end{pmatrix} \times \begin{pmatrix} \sum x^2[X_xY_yZ_z] & \sum xy[X_xY_yZ_z] & \sum xz[X_xY_yZ_z] \\ \sum xy[X_xY_yZ_z] & \sum y^2[X_xY_yZ_z] & \sum yz[X_xY_yZ_z] \\ \sum xz[X_xY_yZ_z] & \sum yz[X_xY_yZ_z] & \sum z^2[X_xY_yZ_z] \end{pmatrix}    (3.42)
The Jacobian can thus be written as the product of a diagonal matrix, containing the inverse component concentrations, and the symmetric matrix J*, so that J^{-1} = -J^{*-1}\,\mathrm{diag}(c). The shifts are then calculated as

\delta c = -d(c)\,J^{-1} = d(c)\,J^{*-1}\,\mathrm{diag}(c)    (3.43)
We are now in a position to write a function NewtonRaphson.m that calculates the species concentrations in a solution for which the total component concentrations, the chemical model and its formation constants are known. The flow diagram in Figure 3-11 illustrates the procedure:

1. Guess initial values for the free component concentrations [X], [Y], [Z].
2. Calculate the concentrations of all species, e.g. [X_xY_yZ_z] = β_xyz[X]^x[Y]^y[Z]^z.
3. Calculate the total concentrations of the components, e.g. [X]_tot_calc = Σ xβ_xyz[X]^x[Y]^y[Z]^z.
4. Compare the actual total concentrations with the computed ones, e.g. d_X = [X]_tot - [X]_tot_calc.
5. If all differences are essentially zero, exit; otherwise calculate the Jacobian, calculate the shift vector, add it to the component concentrations and return to step 2.

Figure 3-11. A flow diagram for the Newton-Raphson algorithm.
MatlabFile 3-9. NewtonRaphson.m
function c_spec=NewtonRaphson(Model, beta, c_tot, c, i)
ncomp=length(c_tot);        % number of components
nspec=length(beta);         % number of species
c_tot(c_tot==0)=1e-15;      % numerical difficulties if c_tot=0

it=0;
while it<=99
    it=it+1;
    c_spec=beta.*prod(repmat(c',1,nspec).^Model,1);     % species conc
    c_tot_calc=sum(Model.*repmat(c_spec,ncomp,1),2)';   % comp ctot calc
    d=c_tot-c_tot_calc;     % diff actual and calc total conc
    if all(abs(d)<1e-15)
        return              % return if all diff small
    end
    for j=1:ncomp           % Jacobian (J*)
        for k=j:ncomp
            J_s(j,k)=sum(Model(j,:).*Model(k,:).*c_spec);
            J_s(k,j)=J_s(j,k);      % J_s is symmetric
        end
    end
    delta_c=(d/J_s)*diag(c);        % equation (3.43)
    c=c+delta_c;
    while any(c <= 0)               % take shift back if conc neg.
        delta_c=0.5*delta_c;
        c=c-delta_c;
        if all(abs(delta_c)<1e-15)
            break
        end
    end
end
if it>99; fprintf(1,'no conv. at C_spec(%i,:)\n',i); end
The function NewtonRaphson.m is written in a very compact manner, taking full advantage of Matlab's vectorised commands. Sometimes this makes the lines difficult to read, so a few additional remarks are appropriate. First, we take a closer look at the arguments that are passed into the Newton-Raphson function. We do that for the example of a 3-component system with the species XY, XY2, XYZ and YZ formed by the components X, Y and Z. Input argument Model is a matrix defining the stoichiometry of the species formed:

           X   Y   Z   XY  XY2 XYZ  YZ
Model = X (1   0   0   1   1   1    0)
        Y (0   1   0   1   2   1    1)
        Z (0   0   1   0   0   1    1)    (3.44)
Chapter 3
Each column of the matrix Model contains the 3 indices, xyz, representing the stoichiometry for the species shown in the corresponding column header. There are 3 rows for the 3 components. The species and component names in the column and row headers are not part of the matrix Model, they are added for clarity only. The Input argument beta collects the formation constants for the species in a row vector in the same order as in the model. Note that the first three values have to be 1, representing formation constants for the components themselves:
beta
[1 1 1 E110
E120
E111 E011 ]
(3.45)
Input argument c_tot is a row vector containing the 'true' known total concentrations of the components. Input argument c is a row vector of initial guesses for the free component concentrations. The input argument i is only used in the error handling at the very end; its meaning will be explained later. Output argument c_spec is a row vector containing all free species concentrations in the same order as defined in the model. Now a few commands need to be explained. The command line c_spec = beta.*prod(repmat(c',1,nspec).^Model,1);
%species conc
calculating all species concentrations can be taken apart and assembled step-wise: repmat(c',1,nspec).^ Model: ª[ X ] [ X ] [ X ] [ X ] [ X ] [ X ] [ X ]º ª1 0 0 1 1 1 0 º « [Y ] [Y ] [Y ] [Y ] [Y ] [Y ] [Y ] » . «0 1 0 1 2 1 1 » « » « » «¬ [ Z ] [Z ] [Z ] [Z ] [Z ] [Z ] [Z ] »¼ «¬0 0 1 0 0 1 1»¼ 1 [X ] [X ] [X ] 1 º ª[ X ] 1 « » 2 « 1 [Y ] 1 [Y ] [Y ] [Y ] [Y ]» «1 1 [Z ] 1 1 [ Z ] [Z ]»¼ ¬
(3.46)
builds a matrix, each column containing the concentrations of the components that form one of the species; these concentrations are taken to the power of the corresponding stoichiometric coefficients. Then,
prod(repmat(c',1,nspec).^ Model,1): ª[ X ] [Y ] [Z ] [ X ][Y ] [ X ][Y ]2 [ X ][Y ][Z ] [Y ][Z ]º ¬ ¼
(3.47)
calculates the products along the columns. And,
beta.* prod(repmat(c',1,nspec).^ Model,1): ª[X ] [Y ] [Z ] E110 [ X ][Y ] E120 [X ][Y ]2 E111[ X ][Y ][Z ] E011[Y ][Z ]º ¬ ¼ multiplies them with the corresponding E-values.
(3.48)
55
Physical/Chemical Models
The command produces the vector c_spec with all species concentrations. Next, the line c_tot_calc =sum(Model.*repmat(c_spec,ncomp,1),2)'; %comp ctot calc
recalculating the total component concentrations from the free species concentrations also needs explanation. Again, we take apart and reassemble some steps for the calculation of the row vector c_tot_calc: Model.*repmat(c_spec,ncomp,1): ª1 0 0 1 1 1 0 º ª[ X ] [Y ] [Z ] [ XY ] [ XY2 ] [ XYZ ] [YZ ]º «0 1 0 1 2 1 1 » . «[ X ] [Y ] [Z ] [ XY ] [ XY ] [ XYZ ] [YZ ]» 2 » « » « «¬0 0 1 0 0 1 1 »¼ ¬«[ X ] [Y ] [Z ] [ XY ] [ XY2 ] [ XYZ ] [YZ ]¼»
(3.49)
0 [ XY ] [ XY2 ] [ XYZ ] 0 º ª[ X ] 0 « 0 [Y ] 0 [ XY ] 2[ XY ] [ XYZ ] [YZ ]» 2 « » «¬ 0 0 [Z ] 0 0 [ XYZ ] [YZ ]»¼ builds a matrix, each row containing the free concentrations of the species that include a particular component multiplied by the corresponding stoichiometric coefficient of this component. Then,
sum(Model.* repmat(c_spec,ncomp,1),2)': [X]+[ XY ] [XY2 ] [XYZ ] ª º «[Y]+[ XY ] 2[ XY ] [XYZ ] [YZ ]» 2 « » «¬ »¼ [Z]+[ XYZ ] [YZ ]
t
(3.50)
calculates the sums along the rows to get the total calculated component concentrations. The transposition is needed to define c_tot_calc as a row vector. Finally, the calculation of the elements of the Jacobian J (actually J*) according to equation (3.42): J_s(j,k)=sum(Model(j,:).*Model(k,:).*c_spec); (J*)
¦ xy[X xYy Z z ] is
As an example the element J * (1,2)
computed as shown
below: Model(1,:).* Model(2,:).* c_spec: t
t
t
ª1 º ª0 º ª [X ] º «0 » «1 » « [Y ] » « » « » « » «0 » «0 » « [Z ] » « » « » « » «1 » . «1 » . « [XY ] »
«1 » «2» « [XY2 ] »
« » « » « »
«1 » «1 » «[XYZ ]» «0 » «1 » « [YZ ] » ¬ ¼ ¬ ¼ ¬ ¼
(3.51)
56
Chapter 3
J_s(1,2)= sum(Model(j,:).* Model(k,:).* c_spec): [XY ] 2[ XY2 ] [XYZ ]
(3.52)
All the other lines of code should be easy to understand. The termination criterion states that all absolute differences between calculated and actual total concentrations need to be smaller than 10-15M. This provides sufficient numerical accuracy for many chemical problems at typical total concentrations around 10-3M. It is quite obvious that the computations become less accurate if the total concentrations get closer to the break criterion. We use an absolute termination criterion to allow for zero total component concentration. The only technique implemented to deal with potential divergence takes back half the shift in those instances where at least one of the resulting component concentrations is negative. This is done as many times as required. It is important to stress that this function is minimal, convergence is not guaranteed and the user has to make sure that the initial guesses for the free component concentrations are reasonable. In this context reasonable means good enough to result in convergence. It is worth noting that this very compact function NewtonRaphson.m allows the computational resolution of any equilibrium situation of any complexity. There are no hard-coded limitations regarding the number of components or the number or species formed. There are, however, 'natural' limits to both these numbers due to the limited numerical accuracy of the computations (eg matrix inversion). Next we illustrate how to use the function by generating a complete titration, or rather, by computing the concentration profiles of all species as a function of the titration. Example: General 3-Component Titration The following program Eq1.m generates a titration. The components X, Y, Z and the model are the same as used previously. It is a titration of 10ml of a solution which is 10-3 M in X and 2u10-3M in Y with 15ml of a solution of 10 3M Z in steps of 1ml. This results in a set of 16 solutions of different composition. The first few lines should be self-explanatory. In the first loop a matrix Ctot of total concentrations of the components X, Y and Z is generated column-wise. These represent the 'true' total concentrations. Using Matlab's vectorised element-wise division (./) the total concentrations of the j-th component are computed for all the i=1…nvol solutions. Equation (3.53) shows this in a more classical way: In the i-th solution, the total concentration of the j-th component is the product of initial volume and its initial concentration, plus the added volume multiplied by its added concentration, divided by the total volume of the i-th solution; hence taking into account the dilution.
57
Physical/Chemical Models
c toti , j
c 0 j v 0 cadded j vaddedi
(3.53)
v 0 vaddedi
In the second loop, the matrix of species concentrations C is computed rowwise by the Newton-Raphson function. Each solution is analysed individually. To expedite the computations, the initial guesses for the component concentrations are the result of the previous solution (apart from the first one). If the Newton-Raphson function returns an error these initial guesses need to be improved. MatlabFile 3-10. Eq1.m % Eq1 spec_names = {'X' 'Y' 'Z' 'XY' 'XY2' 'XYZ' 'YZ'}; Model = [ 1 0 0 1 1 1 0; ... 0 1 0 1 2 1 1; ... 0 0 1 0 0 1 1]; log_beta = [ 0 0 0 3 7 10 4]; beta =10.^log_beta;
% X % Y % Z
c_0 = [1e-3 2e-3 0]; c_added = [0 0 1e-3]; ncomp = length(c_0);
% tot conc in initial solution X,Y,Z % tot conc titration solution X,Y,Z % number of components
v_0 v_added v_tot nvol
% % % %
= = = =
0.01; [0:0.001:.015]'; v_0+v_added; length(v_added);
initial volume (10mL) added solution (0-15mL) total volume number of solutions
for j=1:ncomp C_tot(:,j)=(v_0*c_0(j)+v_added*c_added(j))./v_tot; end c_comp_guess = [1e-10 1e-10 1e-10]; % initial guess for [X],[Y],[Z] for i=1:nvol C(i,:)=NewtonRaphson(Model,beta,C_tot(i,:),c_comp_guess,i); c_comp_guess=C(i,1:ncomp); % new guess = previous comp conc end set(gcf,'DefaultAxesLineStyleOrder', ... {'-',':','-.','--','-o','-s','-v','-^'}); plot(v_added,C); legend(spec_names); xlabel('vol. added (L)');ylabel('species conc. (M)')
58
Chapter 3
8
x 10
-4
X Y Z XY XY2 XYZ YZ
7
species conc. (M)
6 5 4 3 2 1 0 0
0.005 0.01 vol. added (L)
0.015
Figure 3-12. Concentration profiles for a complex titration.
Example: pH Titration of Acetic Acid The function NewtonRaphson.m is very general and can easily be adapted for any titration. The potentiometric titration of acetic acid with NaOH solution serves as an additional example of its usage. In aqueous solutions, whenever protonation equilibria are involved, the autoprotolysis of water needs to be incorporated into the model. Thus, for an acetic acid titration the model comprises two equilibria o AH A H m
(3.54)
o H 2O H OH m The autoprotolysis of water can be included into the algorithm in a very compact way. As already indicated in Table 1, hydroxo species are given the notation H-1. According to equation (3.22) we can write:
E 001
[ H 1 ] [ H ]1
[ H 1 ][ H ]
(3.55)
Thus [H-1]=[OH-] and E00-1=Kw, the ionic product of water. Refer also to the last column in the Model matrix.
Physical/Chemical Models
59
It is a titration of 10ml of 0.1M solution of acetic acid with 0.1M NaOH. Note the initial concentrations in the lines c_0 c_added
= [0.1 0.1]; = [0 -0.1];
% conc in initial solution A,H % conc titration solution A,H
The acetic acid solution is composed of 0.1 M acetate (A) and 0.1 M H+ (H). The sodium hydroxide solution is defined as -0.1 M H. A negative total concentration is of course only a formal notation. The Newton-Raphson routine interprets any negative [H+]tot correctly and computes the free [H+] and [OH-]. Replacing a few lines in Eq1.m results in Eq2.m: MatlabFile 3-11. Eq2.m % Eq2 spec_names = {'A' 'H' 'AH' 'OH'}; Model = [ 1 0 1 0; ... 0 1 1 -1]; ... log_beta = [ 0 0 5 -14]; beta =10.^log_beta; c_0 c_added ncomp
= [0.1 0.1]; = [0 -0.1]; = length(c_0);
v_0 v_added v_tot nvol
=0.01; =[0:0.0001:0.015]'; =v_0+v_added; =length(v_added);
% A % H
% tot conc in initial solution A,H % tot conc titration solution A,H % number of components % % % %
initial volume (10mL) added solution (0-15mL) total volume number of solutions
for j=1:ncomp c_tot(:,j)=(v_0*c_0(j)+v_added*c_added(j))./v_tot; end c_comp_guess = [1e-10 1e-10]; % initial guess for [A] and [H] for i=1:nvol C(i,:)=NewtonRaphson(Model,beta,c_tot(i,:),c_comp_guess,i); c_comp_guess=C(i,1:ncomp); end plot(v_added,-log10(C(:,2))); xlabel('vol. added (L)');ylabel('pH')
60
Chapter 3
13 12 11 10
pH
9 8 7 6 5 4 3 0
0.005 0.01 vol. added (L)
0.015
Figure 3-13. Potentiometric pH titration of acetic acid with strong base. There are very few other changes from the first titration program Eq1.m. The most obvious difference is in the plot where now the pH =-log[H+] is plotted against the added volume of base, instead of all concentrations. Equilibria in Excel It is possible to resolve complex equilibria in Excel. One feasible way of setting up a spreadsheet is represented in the example of Figure 3-14. It computes the equilibrium concentrations in a titration of a solution of a metal M with a ligand L. Two complexes are formed: ML and ML2. It is a 2 component 4-species system. The matrices defining the Model and the E-values are contained in the upper part of the spreadsheet. The given total concentrations of M and L are collected in the columns A and B, row 8 downwards. Initially guessed values for the component concentrations [M] and [L] for each solution are in the respective rows of columns D and E. The next two entries, [ML] and [ML2], are calculated from the component concentrations and the respective formation constant, according to equation (3.23). Next, the calculated total concentrations are computed, making sure the stoichiometric coefficients are incorporated correctly, equation (3.30). The task is to juggle the initially estimated component concentrations [M] and [L] in columns D and E until the calculated total concentrations collected in the columns I and J match the known concentrations in the columns A and B. The reader is encouraged
61
Physical/Chemical Models
to try to do it 'manually'. Since there are two parameters to change and they interact strongly, this task is very difficult. ExcelSheet 3-2. Chapter2.xls-eqML2
=F$5*D8*E8 =D8+F8+G8 =ABS(A8-I8)+ABS(B8-J8) =ABS(A8-I8)+ABS(B8-J8)
Figure 3-14. Excel spreadsheet for the modelling of a metal/ligand titration. The actual task of finding the correct free concentrations [M] and [L] is undertaken by the Solver. The Solver is a very powerful tool in Excel. It can be employed to maximise and minimise functions of many variables and to find solutions to functions of many variables. The Solver can be found in the Tools menu. If it is not there, it has to be installed as an Add-In , also found in the Tools menu. In the Matlab programs, every single absolute difference between known and computed total component concentrations defines the termination criterion. In Excel, the Solver termination criterion is the sum of all the absolute differences. It is advantageous to add constraints, as seen in Figure 3-1. The computed total concentrations have to be equal to the 'true' ones and the component concentrations have to be positive. While these constraints are not always required, in our experience they allow more robust calculations, e.g. starting from substantially wrong initial guesses.
62
Chapter 3
Figure 3-15. The Solver window for the analysis of one equilibrium. The Solver window shown in Figure 3-15 resolves the fifth solution in line 12. The Target Cell is set to L12, which is the sum of the absolute differences of the total concentrations for this particular solution of the titration. Make sure the Min button is selected. The default in solver is Max, which is certainly not what we want. The By Changing Cells box contains the references to the component concentrations [M] and [L] that are changed by the Solver until the objective function is minimised. Also, it is usually a good idea to select Set Automatic Scaling within the Options menu. All that needs to be done now is to click on Solver. A few comments and observations are appropriate. The quality of the initial guesses for the free concentrations of the components is more critical than in the Matlab Newton-Raphson routine introduced previously. The main disadvantage of the Solver, however, is the fact that it can only be applied to one instance. It cannot be 'dragged' around on the spreadsheet like most other functions of Excel. It means, for our present example, that for each solution the Solver needs to be set up individually, defining the Set Target Cell, the By Changing Cells and also the Subject to the Constraints list. A very cumbersome task for long titrations. Of course, it would be possible to set up a macro or implement Visual Basic functions; this is, however, beyond the aims of this book. Complex Equilibria Including Activity Coefficients In order to demonstrate the power of the Solver in Excel, let us return to the problem mentioned in the introduction to this chapter (p.31): What is the solubility of calcium sulphate? but this time taking into account activity coefficients. As it turns out, they are far from zero, even in a saturated solution of only slightly soluble gypsum.
63
Physical/Chemical Models
We need two equations. (a) The ionic strength μ of a solution is defined as half of the sum of all products of the concentrations cj multiplied by the square of their charges zj:
P
1 ¦ c j z 2j 2 j
(3.56)
And (b) the extended Debye-Hückel equation for the approximation of the activity coefficient Jj of the j-th ion. It needs the charge zi and the ionic radius Dj:
log J j
0.51z 2j P
1
(3.57)
Dj P 305
Any alternative approximation for the activity coefficients could be applied, the principle is the same. ExcelSheet 3-3. Chapter2.xls-CaSO4
=0.5*(D7*B7^2+D8*B8^2) =0.5*(D7*B7^2+D8*B8^2)
=10^((-0.51*B7^2*SQRT($B$10)) /(1+C7*SQRT($B$10)/305))
=B4/B1
=E7*D7*E8*D8
Figure 3-16. Spreadsheet to determine the solubility of CaSO4, including activity coefficients. The spreadsheet above 'functions' in the following way. The cell B4 contains a guess for the solubility. This allows the computation of the concentration for both ions in the cells D7 and D8. These in turn define the ionic strength in B10, computed by applying equation (3.56). Next,
64
Chapter 3
the activity coefficients for both ions are calculated in cells E7 and E8, based on equation (3.57). Note how small they are for the relatively low concentration! The calculated solubility product in cell B12 is the product of the activities of the two ions, as indicated by the curly brackets { }. The solver is given the task of finding that particular solubility in the cell B4 for which the correct solubility product (2.4u10-5M2) results in cell B12. Clicking the Solve button gives solubility=1.71g/l. Unfortunately this result is still not quite correct, the reason is that there is an ion pair CaSO4 formed that needs to be taken into account as well. Special Case: Explicit Calculation for Polyprotic Acids The Newton-Raphson algorithm we have developed can deal with any equilibrium situation. There is no limit to the number of components or species. The disadvantage is that the computations are iterative and this is clearly unsuitable for Excel applications. While it is possible to resolve complicated equilibria, it is inconvenient for complete titrations, as only one cell at a time can be evaluated. However, there are important special cases that can be solved explicitly. We deal with one here. Polyprotic acids are fairly important and their potentiometric pH titrations are common. For 2-component systems of this kind, it is possible to 'turn around' the computations and come up with explicit, non-iterative solutions. So far we have computed the species concentrations knowing the total component concentrations, which is an iterative process. This is the normal arrangement in titrations where volumes and total concentrations are known and the rest is computed, e.g. the [H+] and thus the pH. 'Turning around' things in this context means that one calculates the titration volume required to reach a given (measured) pH. One knows the 'y-value' and computes the corresponding 'x-value'. In this way there are explicit equations that can directly be implemented in Excel. We develop the equations for the special case of a weak two-protic acid titrated with a strong base, and then generalise. The following equilibria occur and the law of mass action applies. K
1 ZZZZ X A H YZZZ Z AH
K
2 ZZZZ X AH H YZZZ Z AH 2
K1 K2
[AH ] [ A ][H ] [AH 2 ] [AH ][H ]
(3.58)
The acid exists in three different forms, [ A ],[ AH ] and [AH 2 ] the sum of all of them, [Atot], is known.
65
Physical/Chemical Models
[ A ]tot
[ A ] [ AH ] [ AH 2 ]
D0 [ A ]tot D1[ A ]tot D 2[ A ]tot
(3.59)
with
D0
[A] [ A ]tot
D1
[ AH ] [ A ]tot
D2
[ AH 2 ] [ A ]tot
The D values (degree of dissociation) express the amount of a particular protonated/unprotonated form of the acid as a fraction of the total. Substitution into the equilibrium equations (3.58) results in K1
D1 D 0 [H ]
K2
D2 D1[H ]
(3.60)
Next, equations are derived for the fractions D1 and D2 in terms of D0 D1
D0 K1[H ]
D2
D1K 2 [H ]
(3.61)
D0 K1K 2 [H ]2
The sum of all fractions must be one: D0 D1 D2
D0 D 0 K1[H ] D 0 K1K 2 [H ]2
1
(3.62)
and this allows the computation of D0 D0
1 1 K1[H ] K1K 2 [H ]2
(3.63)
and subsequently
D1
K1[H ]
1 K1[H ] K1K 2[H ]2
D2
K1K 2[H ]2 1 K1[H ] K1K 2[H ]2
(3.64)
These equations are straightforwardly generalised for n-protic acids, defining K0=1. Di
K 0 K1K 2 ...K i [H ]i K 0 [H ] K1[H ] K1K 2[H ]2 ... K1K 2 ...K n [H ]n 0
1
(3.65)
66
Chapter 3
We are now in a position that allows the easy, non-iterative computation of the species distribution curves of any acid. The program EDTA.m performs the computations for edta, a 6-protic acid: MatlabFile 3-12. EDTA.m % Edta pH=[-1:.1:12]'; H=10.^(-pH); logK=[10.2 6.2 2.7 2 1.5 0]; K=10.^logK; n=length(logK);
% protonation constants EDTA
denom=zeros(size(H)); for i=0:n num(:,i+1)=H.^i*prod(K(1:i)); denom=denom+num(:,i+1); end alpha=diag(1./denom)*num;
% numerator % denominator
% number of protons
% alpha(:,1) contains alpha_0 etc
plot(pH, alpha) axis([-1 12 0 1]); xlabel('pH'); ylabel('alpha');
1 0.9 0.8 0.7
alpha
0.6 0.5 0.4 0.3 0.2 0.1 0
0
2
4
6
8
10
12
pH
Figure 3-17. Species distribution of edta. The extent of protonation increases from right to left One small remark is appropriate: in (3.59) the alphas conveniently start with
D0 for the fully deprotonated form of the acid. Vectors in Matlab cannot have a zero index and thus, in the program EDTA.m, Di is stored in the element alpha(i+1).
Physical/Chemical Models
67
In analytical chemistry, it is convenient and customary to define the fraction ) which is the fraction of the number of moles base, nB, relative to the number of moles of acid, nA. nB nA
)
c Bv B c Av A
(3.66)
In equation (3.66) cA is the initial concentration of the acid, cB is the concentration of the base solution used for the titration, vA is the initial volume of the acid and vB is the volume of added base. This fraction can be expressed as a function of pH and other known variables. (We do not develop the equation here and refer to standard texts in analytical chemistry.) The general formula for an n-protic acid is: Dn 1 2Dn 2 ... nD 0 ) 1
[H ] [OH ] cA
[H ] [OH ] cB
(3.67)
Knowing cA, cB and vA we can compute vB, the volume of added base. vB
)c Av A cB
(3.68)
We apply the principle to compute the titration curve of 25 ml of 5M phosphoric acid with 0.1M sodium hydroxide solution. The pH-values in column A are given, the amount of the base solution to reach these pHvalues are calculated in column J. ExcelSheet 3-4. Chapter2.xls-H3PO4
=1+$B$5*B10+$B$5*$C$5*B10^ = 1+$B$5*B10+$B$5*$C$5*B10^ 2+$B$5*$C$5*$D$5*B10^3 =I10*$B$2*$B$3/$B$1 =I10*$B$2*$B$3/$B$1 =$B$5*$C$5*$D $5*B10^3/D10
=(G10+2*F10+3*E10-(B10 =(G10+2*F10+3*E10-(B10C10)/$B$2)/(1+(B10-C10)/$B$1)
Figure 3-18. Calculations for titration curve of phosphoric acid.
68
Chapter 3
D in the column D is the denominator of equation (3.65). Note the negative volumes calculated for the first two pH values. The pH values are lower than would be expected for pure phosphoric acid of this concentration. The negative volume of added base can be interpreted as the volume of strong acid of the same concentration. The plot includes only the positive part. 14
12
10
pH
8 6 4 2 0 0
10
20 Vb (mL)
30
40
Figure 3-19. Titration curve of phosphoric acid. The fact that the three equilibrium constants are well separated in phosphoric acid is not relevant. The spreadsheet can deal automatically with any 3-protic acid. Figure 3-20 is the result of replacing the logarithms of the protonation constants in the cells B4:D4 with the values for citric acid (6.4, 4.8, 3.1): 14 12
pH
10 8 6 4 2 0 0
10
20 30 Vb (mL)
Figure 3-20. Titration curve of citric acid.
40
50
69
Physical/Chemical Models
3.3.4 Solving Non-Linear Equations The relative ease of solving the system of non-linear equations for rather complex equilibrium problems, as indicated by the shortness of the function NewtonRaphson.m and by the inconsequentiality of poor initial guesses, is misleading. As we will see shortly, this statement is particularly pertinent to cases of general systems of m equations with m parameters. Solving systems of equations is a common task and we give a short introduction. To start with, we investigate the simple case of one equation with one parameter. One Equation, One Parameter For the 1-dimensional case it is possible to represent the basic ideas graphically. This allows natural understanding and good insight into the potential shortfalls of the Newton-Raphson algorithm. We have chosen a truly irrational function y
cos( x ) log(x) S
(3.69)
In MATLAB this translates into: x=1:.1:50; y=cos(sqrt(x))./(log(x)+pi); plot(x,y) xlabel('x');ylabel('y');
0.2 0.15 0.1
y
0.05 0 -0.05 -0.1 -0.15 -0.2 0
10
20
30
40
50
x
Figure 3-21. Graph of equation (3.69), y
cos( x )/(log(x) S) .
70
Chapter 3
The task is to resolve (3.69) for y=0, i.e. to find the x-values that result in y=0, or in other words to find the intercepts of the curve with the x-axis. These x-intercepts are commonly call roots. There is a solution around x=20, another one near x=5 and possibly even more outside the window displayed in Figure 3-21.
f(x0)
f’(x0) 0
x0
x1
f(x1)
20
22
24
26
28
30
32
34
Figure 3-22. Schematic representation of the Newton-Raphson algorithm.
The Newton-Raphson technique is represented graphically in Figure 3-22. As a guess, we start with a value of x0 =30. We calculate its function value, f(x0) and the derivative f'(x0). The x-intercept of the tangent with slope f'(x0) through the point [x0,f(x0)] is a better estimate, x1, for the solution. We continue by recycling the idea. Calculate f(x1) and lay the tangent with slope f'(x1) through the point [x1,f(x1)]. Its intersection, x2, is already very close to the correct value (in the Figure it is not possible to draw this line and distinguish it from the graph of the function). Figure 3-23 illustrates one potential disaster that strikes if the initial guess x0 is in an unfortunate position, however, note, it is not very far from the solution. f'(x1) is zero or very small and thus the x-intercept is at infinity or very far away. Most likely the iterative process will collapse.
71
Physical/Chemical Models
f’(x1)
x0
0
f’(x0)
f(x0)
0
5
10
15
20
25
30
35
40
45
50
Figure 3-23. The Newton-Raphson algorithm with an unfortunate initial guess.
It all depends on the initial guess, and it is far from easy to know where to start if the approximate location of the root is not known a priori. Instead of adapting the NewtonRaphson.m function we just use the Matlab function fzero which is a general routine for that kind of one-dimensional problem. x = fzero(inline('cos(sqrt(x))./(log(x)+pi);'),30) x = 22.2066
Systems of Non-Linear Equations
The resolution of systems of m equations with m unknowns is dramatically more complicated! Rather than giving a theoretical account of that statement, we have chosen an example for illustration: a system of 2 equations with 2 unknowns: sin(x ) + cos(y ) - 0.5 = 0 x u e-y - 2 = 0
(3.70)
We can plot both equations as surfaces, the function value z as a function of x and y. The first function, z=sin(x)+cos(y)-0.5, is an 'egg carton' like surface with many solutions z=0, see Figure 3-24. The second surface, z=xe-y -2, is much less structured, Figure 3-25.
72
Chapter 3
MatlabFile 3-13. Egg_Carton.m % Egg_Carton x=-4:.2:10; y=-4:.2:10; [X,Y]=meshgrid(x,y); Z1 = sin(X) + cos(Y) - 0.5; Z2 = X .* exp(-Y) -2; figure(1) h=surfl(x,y,Z1); colormap gray set(h,'linestyle','none') xlabel('x');ylabel('y');zlabel('z');
Figure 3-24. The function z = sin(x)+cos(y)-0.5. MatlabFile 3-14. Egg_Carton.m …continued %Egg_Carton ...continued figure(2) h=surf(x,y,Z2,'linestyle','none'); colormap gray xlabel('x');ylabel('y');zlabel('z')
Figure 3-26 displays the solutions z=0 in the (x,y)-plane, the black circle like lines are the solutions for the first equation while the dotted line is the solution for the second. Obviously, solutions for both are the intersections of the two sets of lines. Within the window shown in the figure there are four solutions, there is an infinite number of additional solutions outside this window. Depending on the starting estimates, any of the solutions will be found.
73
Physical/Chemical Models
Figure 3-25. The function z=xe-y-2. MatlabFile 3-15. Egg_Carton.m …continued %Egg_Carton ...continued figure(3) contour(x,y,Z1,[0 0],'k') hold on contour(x,y,Z2,[0 0],'k:') hold off xlabel('x');ylabel('y')
10 8 6
y
4 2 0 -2 -4 -4
-2
0
2
4
6
8
10
x
Figure 3-26. The solutions for sin(x)+cos(y)-0.5=0 (full lines) and xe-y-2=0 (dotted line).
74
Chapter 3
As we have demonstrated, systems of non-linear equations with several unknowns are difficult to resolve. The task of developing a general program that can cope with all eventualities is huge. We are only offering a very minimal program that specifically analyses the system of equations (3.70). Instead of the two variables x and y we use a vector x with two elements; similarly, we use a vector z instead of z1 and z2. The elements of the required Jacobian J can be given explicitly (see Two_Equations.m). The shift vector delta_x is calculated as in equation (3.38). MatlabFile 3-16. Two_Equations.m % Two_Equations x=[3; 0]; it=0; while it<100 it=it+1; z=[sin(x(1)) + cos(x(2)) - 0.5; ... x(1) * exp(-x(2)) - 2]; if sum(abs(z)) <1e-10 break end
% break if z1, z2
J=[cos(x(1)) -sin(x(2)); exp(-x(2)) -x(1)*exp(-x(2))];
% Jacobian dz/dx
delta_x= -J\z; x=x + delta_x;
% shift vector
small
end x x = 3.4969 0.5587
Depending on the initial guess in the first line (x=[3; 0];), any of the infinite number of other solutions will result; or in some unfortunate instances divergence and a mild catastrophe might occur. It is very easy to set up a spreadsheet for this system of equations. Most of Figure 3-27 is self-explanatory. But note that instead of setting as a target abs(z1)+abs(z2)=0, the target is the sum of the squares of the two z values. The Solver works better this way. An alternative is to have the above normal termination criterion and the additional constraint $D$3:$E$3=0.
Physical/Chemical Models
75
ExcelSheet 3-5. Chapter2.xls-eqsys
=SIN(A3)+COS(B3)-0.5 =SIN(A3)+COS(B3)-0.5 =A3*EXP(-B3)-2 =A3*EXP(-B3)-2
=D3*D3+E3*E3
Figure 3-27. A spreadsheet, using the Solver to solve the system of non-linear equations.
It is worthwhile going back to the original task of solving equilibrium problems. After seeing Figure 3-26 it might come as a surprise that the Newton-Raphson algorithm, if applied to solution equilibria, converges as reliably as it does. One important reason is that we know there is only one solution for positive component concentrations. So, if there is convergence, it will be towards the correct solution. Imagine there were more than one possible equilibrium positions after mixing a few components. Say one is green, the other red. How could the solution determine whether it should turn green or red? The existence of one unique solution alone does not guarantee convergence. Experience has shown that guesses for the component concentrations that are too low seem to be less prone to divergence than estimates that are too large. It might be a good idea just to accept that as an empirical fact, rather than trying to explain it. Matlab does not include a routine of the kind of fzero for more than one variable. Only the function fsolve, which is part of the Optimisation Toolbox, can deal with systems of equations with several variables. Here we demonstrate the application of fsolve to the system of equations (3.70). MatlabFile 3-17. nonlineq.m function f=nonlineq(x) f=[sin(x(1))+cos(x(2))-0.5; x(1)*exp(-x(2))-2]; MatlabFile 3-18. Main_nonlineq.m %Main_nonlineq
76
Chapter 3
x0=[3; 0]; x=fsolve('nonlineq',x0) x = 3.4969 0.5587
It is possible to adapt fsolve to analyse equilibrium problems for which we developed NewtonRaphson.m. Attempts to do so are frustrated by the slow computation times, compared with NewtonRaphson.m. The reason lies in the fact that fsolve is a very general program that can deal with 'anything', while NewtonRaphson.m is dedicated to one specific task. Usually small, dedicated programs execute faster but are not easily adapted for other tasks.
3.4 Kinetics, Mechanisms, Rate Laws The investigation of the kinetics of a chemical reaction serves two purposes. A first goal is the determination of the mechanism of a reaction. Is it a first order reaction, AoB, or a second order reaction, 2AoB ? Is there an intermediate AoIoB ? and so on. The other goal of a kinetic investigation is the determination of the rate constant(s) of a reaction. The first aspect is much more difficult, it needs good experimentation as well as chemical knowledge and intuition. The second aspect is comparatively much simpler and we deal with it in the next chapter. A complete analysis, comprising both aspects, allows accurate predictions of the time behaviour of a reaction for any initial conditions. This can be an invaluable tool: e.g. the optimisation of an industrial chemical process. The core of all the above tasks is the modelling of the chemical reaction for a given mechanism and the corresponding set of rate constants, i.e. the computation of the concentration profiles of all reacting species as a function of time. In this sub-chapter, we demonstrate how to perform the computations for essentially any mechanism of any complexity. The limitations are of a numerical, not a fundamental, nature. Similar to the modelling of equilibrium titrations, we compute a matrix C that contains, as columns, the concentration profiles of the interacting species. There is one significant difference: in equilibrium modelling each solution is treated independently, i.e. each row of C is computed individually, independent from all the other rows (the only information that is carried from one solution to the next are the initial guesses, accelerating the computations but not influencing anything else). In kinetics the concentrations at any time completely depend on the concentrations before, the initial concentrations determine everything. The concentrations are computed sequentially and the complete matrix C is computed as one unit.
Physical/Chemical Models
77
3.4.1 The Rate Law The equivalent to the law of mass action, as encountered in the previous chapter (e.g. in equation (3.22)), are systems of differential equations, defined by the chemical model or the reaction mechanism and the corresponding rate constants. We start with a general chemical reaction, just to practise the notation Ɇ it is not a realistic example: k o cC aA bB m k
(3.71)
For this reaction, we can write the following system of ordinary differential equations (ODEs): [ A ] a
[B ] b
[C ] c
k [ A ]a [B ]b k [C ]c
Note the notation [ A ] for the derivative of [A] with respect to time [ A ]
(3.72) d[ A ] . dt
Multistep mechanisms are dealt with in the same way, as an example: k1 A oB k2 2B oC
[ A ] [B ]
k1[ A ]
[C ]
k2[B ]2
(3.73)
k1[ A ] 2k2 [B ]2
These systems of differential equations need to be integrated. The resulting concentration profiles depend on the rate constants and initial concentrations of the reacting species at time zero, [A]0, [B]0, [C]0. These initial concentrations are the boundary conditions that completely define the concentrations as a function of the reaction time. As in Chapter 3.3, Titrations, Equilibria, the Law of Mass Action, we start with the discussion of simple mechanisms for which the systems of differential equations can be solved explicitly. Later we explain how numerical integration routines can be employed to calculate concentration profiles for any mechanism.
3.4.2 Rate Laws with Explicit Solutions There is a limited number of reaction mechanisms with sets of ODEs that can be integrated analytically, i.e. for which there are explicit formulas to calculate the concentrations of the reacting species as a function of time.
78
Chapter 3
This set includes all reaction mechanisms that contain only first order reactions, as well as very few mechanisms with second order reactions. Any textbook on chemical kinetics or physical chemistry supplies a list. A few examples for such mechanisms are given below: a)
k A o B
b)
k 2 A o B
c)
k A + B o C
d)
k2 k1 A o B o C
(3.74)
For the above reactions the ODE's are: [A ] = -[B ] = -k [A ] [A ] = -2[B ] = -2k [A ]2
a) b)
[A ] = [B ] = -[C ] = -k [A ] [B ] [A ] = -k1 [A ], [B ] = k1 [A ] - k2 [B ], [C ] = k [B ]
c) d)
(3.75)
2
Integration of the ODEs results in the concentration profiles for all reacting species as a function of the reaction time and the initial concentrations. The explicit solutions for the ODEs above (3.75) are given below (3.76). We list the equations for one concentration only. The remaining concentrations can be calculated from the closure principle, which is nothing else but the law of conservation of mass (e.g. in the first example [B]=[A]0-[A], where [A]0 is the concentration of A at time zero). Only in example d) two concentrations need to be given to allow the determination of the third by subtraction. a)
[A ] = [A ]0e-kt
b)
[A ] =
[A ]0 1+2[A ]0kt
c)
[A ] =
[A ]0 ([B ]0 -[A ]0 ) [B ]0e([B ]0 -[A ]0 )kt -[A ]0
d)
[A ] = [A ]0e-kt , [B ] = [A ]0
k1 (e-k1t -e-k2t ) k2 -k1
([A ]0 z [B ]0 )
([B ]0
(3.76)
0, k1 z k2 )
k Modelling and visualisation of a reaction A o B require only a few lines of Matlab code. A plot of the concentration profiles is given in Figure 3-28. This task can be performed equally well in Excel.
79
Physical/Chemical Models MatlabFile 3-19. AtoB.m % AtoB t=[0:50]'; % time vector (column vector) A_0=1e-3; % initial concentration of A k=.05; % rate constant C(:,1)=A_0*exp(-k*t); % [A] C(:,2)=A_0-C(:,1); % [B] (Closure) plot(t,C); % plotting C vs t xlabel('time');ylabel('conc.');
1
x 10
-3
0.9 0.8 0.7
conc.
0.6 0.5 0.4 0.3 0.2 0.1 0 0
10
20
30
40
50
time
k Figure 3-28. Concentration profiles for a reaction A o B.
Solutions for the integration of ODE's, such as the ones given in equations (3.76), are not always readily available. For non-specialists it is difficult to determine if there is an explicit solution at all. The symbolic toolbox (which is not contained in the standard Matlab) provides very convenient means to integrate systems of differential equations and also to test whether there is k an explicit solution. As an example the reaction 2A o B: % 2A -> B, explicit solution d=dsolve('Da=-2*k*a^2','Db=k*a^2','a(0)=a_0','b(0)=0'); pretty(simplify(d.a)) a_0 ------------2 k t a_0 + 1
or in a readable form: [A]
[ A ]0 2 k t [ A ]0 1
(3.77)
80
Chapter 3
Note that Matlab’s symbolic toolbox demands lower case characters for species names. Below, the attempt to use the symbolic toolbox for the integration of a slightly more complex mechanism: k1 A oB k
(3.78)
2 oC 2B
% A -> B, explicit solution % 2B -> C d=dsolve('Da=-k1*a','Db=k1*a-2*k2*b^2','Dc=k2*b^2', ... 'a(0)=a_0','b(0)=0','c(0)=0'); pretty(simplify(d.c))
As it turns out, the explicit solution is very complex including several Bessel functions. We leave it to the reader to explore the output. As a matter of fact, most mechanisms do not have explicit solutions and require numerical integration. The few mechanisms discussed so far are exceptions rather than the rule. Fortunately, numerical integration is always possible and next we demonstrate how this can be achieved.
3.4.3 Complex Mechanisms that Require Numerical Integration Numerical integration of sets of differential equations is a well developed field of numerical analysis, e.g. most engineering problems involve differential equations. Here, we only give a very brief introduction to numerical integration. We start with the Euler method, proceed to the much more useful Runge-Kutta algorithm and finally demonstrate the use of the routines that are part of the Matlab package. The Euler Method The simplest method for the numerical integration of a system of differential equations is named after Euler. Not surprisingly, it can be seen as an adaptation of the truncated Taylor series expansion, equation (3.81), which is the standard tool for non-linear problems. We have already encountered it in Solving Complex Equilibria (p.48), and we employ it again for non-linear least-squares fitting in Chapter 4.3, Non-Linear Regression. Please note: the Euler method should not be used. It is very slow even if only a modest accuracy is required. But because of its simplicity it is ideally suited to demonstrate the general principles of the numerical integration of ordinary differential equations. As usual, we start with a simple example; consider the reversible reaction: k+ ZZZZ X 2 A YZZZ Z B k
(3.79)
81
Physical/Chemical Models
While there is an analytical solution for this mechanism, the formula for the calculation of the concentration profiles for A and B is fairly complex, involving the tan and atan functions (according to Matlab’s symbolic toolbox). We use it to demonstrate the basic ideas of numerical integration. The Euler method can be represented graphically, see Figure 3-29.
[A]0
[ A ]0
slope
[A]1
slope
[ A ]1
[A]2
[ A ]2
slope
[A]3
t0
t1
t2
t3
Figure 3-29. Euler method for numerical integration.
At any time t, knowing the concentrations [ A ]t and [B ]t , the derivatives of the concentrations of A and B, [ A ]t and [B ]t , can be calculated by applying the rate law.
[ A ]t = -2k+[A ]t2 + 2k- [B ]t [B ]t = -
1 [ A ]t = k+[A ]t2 - k- [B ]t 2
(3.80)
Figure 3-29 only deals with the concentration [A], the principle for the treatment of concentration [B] is identical. Starting at time t0 the initial concentrations are [A]0 and [B]0. The derivatives [ A ]0 and [B ]0 are calculated according to equation (3.80). This allows the computation of new concentrations, [A]1 and [B]1 after a short time interval 't = t1–t0. [A ]1 = [A ]0 + Ʀt [A ]0 [B ] = [B ] + Ʀt [B ] 1
0
(3.81)
0
These new concentrations at time t1 in turn allow the determination of new derivatives and thus another set of concentrations [A]2 and [B]2, after the second time interval t2–t1. As shown in Figure 3-29, this procedure is simply repeated until the desired final reaction time is reached. The main disadvantage of the Euler method is that the calculated approximation for the concentrations are systematically wrong for each step.
82
Chapter 3
In the example, the tangent [A ] always overestimates the change of the concentration for any time interval. The longer it is, the larger the deviation. Thus to maintain a good accuracy, step sizes have to be very small; but then computation times are very long and additionally other numerical problems start to interfere. Fortunately, there are many better methods available. Amongst them algorithms of the Runge-Kutta type are frequently used in chemical kinetics. In contrast to our preferred standard mode in this book, we do not develop a Matlab function for the task of numerical integration of the differential equations pertinent to chemical kinetics. While it would be fairly easy to develop basic functions that work reliably and efficiently with most mechanisms, it was decided not to include such functions since Matlab, in its basic edition, supplies a good suite of fully fledged ODE solvers. ODE solvers play a very important role in many applications outside chemistry and thus high level routines are readily available. An important aspect for fast computation is the automatic adjustment of the step-size, depending on the required accuracy. Also, it is important to differentiate between stiff and non-stiff problems. Proper discussion of the difference between the two is clearly outside the scope of this book, however, we indicate the stiffness of problems in a series of examples discussed later. So, instead of developing our own ODE solver in Matlab, we will learn how to use the routines supplied by Matlab. This will be done in a quite extensive series of examples. The situation is different for those readers who do not have access to Matlab and rely completely on Excel. In the following, we explain how a fourth order Runge-Kutta method can be incorporated into a spreadsheet and used to solve non-stiff ODE's. Fourth Order Runge-Kutta Method in Excel The fourth order Runge-Kutta method is the workhorse for the numerical integration of ODEs. Elaborate routines with automatic step-size control are available in Matlab. Here, we develop an Excel spreadsheet for the numerical integration of the k ZZZZ X reaction mechanism 2A YZZZ Z B based on the 4th order Runge-Kutta k
method, see Figure 3-30. The 4th order Runge-Kutta method requires four evaluations of concentrations and derivatives per step. This appears to be a serious disadvantage, but as it turns out, significantly larger step sizes can be taken for an acceptable accuracy and overall the computation times are very much shorter.
Physical/Chemical Models
83
ExcelSheet 3-6. Chapter2.xls-RungeKutta
=B5+E5/6*(F5+2*J5+2*N5+R5)
=A6-A5 =A6-A5 =2*$B$1*B5^2+2*$B$2*C5 =-2*$B$1*B5^2+2*$B$2*C5
=B5+E5/2*J5 =2*$B$1*L5^2+2*$B$2*M5 =-
=B5+E5/2*F5
=B5+E5*N5
=-2*$B$1*H5^2+2*$B$2*I5 =-2*$B$1*H5^2+2*$B$2*I5
=-2*$B$1*P5^2+2*$B$2*Q5
Figure 3-30. Excel spreadsheet for the numerical integration of the k ZZZZ X rate law for the reaction 2A YZZZ Z B using 4th order Runge-Kutta k
equations. The 4th order Runge-Kutta method is reasonably complex. Without developing the equations, we demonstrate their application by applying them stepwise in an Excel spreadsheet that is written specifically for the reaction k ZZZZ X 2A YZZZ Z B. k
We start at time zero where the initial concentrations are [A]0=1 and [B]0=0 (cells B5 and C5). The time interval 't=1 is calculated in cell E5. 1. Calculate the derivatives of the concentrations at time point 0:
[ A ]0 = -2k+ [A ]02 + 2k- [B ]0 [B ]0 = k+[A ]20 - k- [B ]0
(cell F5) (cell G5)
In the Excel language, for [ A ]0 , this translates into =-2*$B$1*B5^2+2*$B$2*C5, as indicated in Figure 3-30. Note, in the figure only the cell formulas for the computations of component A are given. 2. Calculate approximate concentrations at the intermediate time point 't/2, based on the concentrations and derivatives at time 0.
't [A ]0 2 't [B ]1 = [B ]0 + [B ]0 2 [A ]1 = [A ]0 +
(cell H5) (cell I 5)
Again, the Excel formula for component A is given in Figure 3-30.
84
Chapter 3
3. Calculate the derivatives at the intermediate time point:
[ A ]1 = -2k+ [A ]12 + 2k- [B ]1 [B ]1 = k+ [A- ]12 k- [B ]1
(cell J5) (cell K5)
4. Calculate another set of concentrations at the intermediate time point, now based on the concentrations at time 0 and the derivatives at the intermediate time point:
't [A ]1 2 't [B ]2 = [B ]0 + [B ]1 2 [A ]2 = [A ]0 +
(cell L5) (cell M5)
5. Compute a new set of derivatives at the intermediate time point, based on the concentrations just calculated:
[ A ]2 = -2k+ [A ]22 + 2k- [B ]2 [B ]2 = k+ [A ]22- k- [B ]2
(cell N5) (cell O5)
6. Next, the concentrations at the new time point 1, after the complete time interval, are computed, based on the concentrations at time point 0 and these new derivatives at the intermediate time point:
[A ]3 = [A ]0 + 't [A ]2 [B ] = [B ] + 't[B ] 3
0
2
(cell P5) (cell Q5)
7. Computation of the derivatives at time point 1:
[ A ]3 = -2k+ [A ]32 + 2k- [B ]3 [B ]3 = k+[A ]32- k- [B ]3
(cell R5) (cell S5)
8. Finally the new concentrations after the full time interval are computed as: 't [A ]new = [A ]0 + [A ]0 +2[A ]1 2[A ]2 [A ]3 (cell B6) 6 't [B ]new = [B ]0 + [B ]0 2[B ]1 2[B ]2 [B ]3 (cell C6) 6 These concentrations are put as the next elements into the columns B and C.
85
Physical/Chemical Models
These final equations can be compared with the equivalent in the Euler approach, equation (3.81). In the Runge-Kutta method a weighted average of 4 different approximations for the derivatives is used instead of the derivative at the beginning of the time interval. Figure 3-31 displays the resulting concentration profiles for species A and B. As it is a reversible reaction, an equilibrium is established after a certain time.
1
concentration
0.8
0.6
0.4
0.2
0 0
2
4
6
8
10
time
k ZZZZ X Figure 3-31. Concentration profiles for a reaction 2A YZZZ Z B ( k
[A], } [B]) as modelled in Excel using a 4th order Runge-Kutta for numerical integration. While the above spreadsheet looks moderately complex, it nevertheless allows the accurate numerical integration of a real chemical reaction in a very short time. We might get the impression that "this is it". Far from it, there are several important aspects that we can only point out very briefly here. As mentioned before, the spreadsheet has been set up specifically for the numerical integration of the differential equations for the reaction k ZZZZ X 2A YZZZ Z B . It is not convenient to adapt this spreadsheet for the k
computation of a different reaction scheme. Most equations need to be rewritten. Such manual changes are error prone, and ensuing wrong results can easily go unnoticed. It is of course possible to set up a more elaborate spreadsheet that more readily allows adaptation for different reaction mechanisms. (The web-site http://www.cse.csiro.au/poptools/index.htm offers a large downloadable selection of tools for Excel, amongst them a package for the numerical integration of ODE's)
86
Chapter 3
For fast computation the determination of the best step-size (interval) is crucial; steps that are too small result in correct concentrations at the expense of long computation times. On the other hand, steps that are too long save computation time but result in poor approximations. The best intervals lead to the fastest computation of concentration profiles within some pre-defined error limits. This of course requires knowledge about the required accuracy. The ideal step-size is not constant during the reaction and so needs to be adjusted continuously. If more complex mechanisms and thus systems of differential equations are to be integrated, adaptive step size control is absolutely essential. The Runge-Kutta algorithm cannot handle so-called stiff problems. Computation times are astronomical and thus the algorithm is useless, for that class of ordinary differential equations, specialised 'stiff solvers' have been developed. In our context, a system of ODEs sometimes becomes stiff if it comprises very fast and also very slow steps and/or very high and very low concentrations. As a typical example we model an oscillating reaction in The Belousov-Zhabotinsky (BZ) Reaction (p.95). It is well outside the scope of this chapter to expand on the intricacies of modern numerical integration routines. Matlab provides an excellent selection of routines for any situation. For further reading we refer to the relevant literature and the Matlab manuals. Thus, rather than trying to explain how they work, we demonstrate how they are used.
3.4.4 Interesting Kinetic Examples Fast and accurate ODE solvers are very complex algorithms. In particular the design of adaptive step size and the analysis of stiff problems require sophisticated algorithms. The development of such algorithms is beyond the scope of this book. Matlab supplies a good collection of routines that cater for all the needs of the kineticist dealing with any reasonably complex mechanism. In contrast to the other parts of this book, we do not develop a generally applicable algorithm. Instead, we demonstrate in a representative collection of interesting and exemplary mechanisms, how to use the Matlab solvers. In this chapter, we concentrate on the simulation of chemical kinetics, i.e. based on a given chemical mechanism and the relevant rate constants, the concentration profiles (the matrix C) of all reaction species is computed. The next chapter incorporates these functions into a general fitting routine that can be used to fit the optimal rate constants for a given mechanism to a particular measurement. We start with simple chemical examples, later we examine a few interesting and surprising non-chemical examples.
Physical/Chemical Models
87
Autocatalysis Processes are called autocatalytic if the products of a reaction accelerate their own formation. Autocatalytic reactions get faster as the reaction proceeds, sometimes dramatically, sometimes slowly and steadily. Exponential growth is a very basic non-chemical example. Of course the acceleration cannot be permanent; the reaction will slow down and eventually come to an end once the starting materials have been used up. Only economists believe in sustainable growth. An extreme example of an autocatalytic reaction is an explosion. In this case, it is not directly a chemical product that accelerates the reaction, it is the heat generated by the reaction. The more heat produced, the faster the reaction; the faster the reaction, the more heat, etc. There are many mechanisms that display autocatalytic behaviour. A minimal and very basic autocatalytic reaction scheme is presented below: k1 A o B
(3.82)
k
2 A + B o 2B
Starting with component A there is a relatively slow first order reaction to form the product B. The second reaction is of the order two, it opens another path for the formation of component B. As it is a second order reaction, the higher the concentration of B, the faster is the decomposition of A to form more B. The system of differential equations for this reaction scheme is given below (3.83): [A ] = -k1[A ] - k2 [A ][B ] [B ] = k [A ] + k [A ][B ] 1
(3.83)
2
The organisation of the Matlab ODE solvers requires some explanation. For this example, the core is a function, ode_autocat.m, that returns the derivatives of the concentrations at any particular time or better, for any set of concentration of the reacting species. Essentially it is the Matlab code for equation (3.83). MatlabFile 3-20. ode_autocat.m function c_dot=ode_autocat(t,c,flag,k) % A --> B % A + B --> 2 B c_dot(1,1)=-k(1)*c(1)-k(2)*c(1)*c(2); c_dot(2,1)= k(1)*c(1)+k(2)*c(1)*c(2);
% A_dot % B_tot
c_dot is a column vector of the two derivatives [A ] and [B ]. The vectors k and c contain the rate constants and the actual concentrations; t is the time at which the derivatives are computed; it is not used within this particular function. The flag is not used here either.
88
Chapter 3
This function is called numerous times from the Matlab ODE solver. In the example it is the ode45 which is the standard Runge-Kutta algorithm. ode45 requires as parameters the file name of the inner function, ode_autocat.m, the vector of initial concentrations, c0, the rate constants, k, and the total amount of time for which the reaction should be modelled (20 time units in the example). The solver returns the vector t at which the concentrations were calculated and the concentrations themselves, the matrix C. Note that due to the adaptive step size control, the concentrations are computed at times t which are not predefined. MatlabFile 3-21. autocat.m % autocat % A --> B % A + B --> 2 B c0=[1;0]; % initial conc of A and B k=[1e-6;2]; % rate constants k1 and k2 [t,C]=ode45('ode_autocat',20,c0,[],k); % call ode-solver plot(t,C) % plotting C vs t xlabel('time');ylabel('conc.');
1.2 1
conc.
0.8 0.6 0.4 0.2 0 -0.2 0
5
10 time
15
20
Figure 2-32. Concentration profiles for the autocatalytic reaction k2 k1 A o B ; A + B o 2B.
Figure 2-32 shows the calculated corresponding concentration profiles using the rate constants k1=10-6s-1 and k2=2M-1s-1 for initial concentrations [A]0=1M and [B]0=0M. After an induction period of some 5 time units the reaction accelerates dramatically. At around 10 time units, when the
Physical/Chemical Models
89
component A is almost used up, the reaction decelerates and quickly comes to the end. We are using the solvers here in their very basic version. Many additional parameters can be controlled, such as maximal step size or required accuracy. We refer to the original documentation for more information about these topics. In the above program, autocat.m, the 20 represents the total time. The ODE solver calculates the optimal step size automatically and returns the time vector t with the concentrations C. The ODE solver can also be forced to return concentrations at specific times by passing the complete vector of times instead of only the total time. 0th Order Reaction In strict terms, 0th order reactions do not really exist. They are always macroscopically observed reactions where the rate of the reaction is independent of the concentrations of the reactants for a certain time period. Formally, the ODE for a basic 0th order reaction is defined below: [A ] = -k [A ]0 = -k
(3.84)
A simple mechanism that mimics a 0th order reaction is the catalytic transformation of A to C. A reacts with the catalyst Cat to form an intermediate activated complex B. B in turn reacts further to form the product C releasing the catalyst that continues reacting with A. k1 A + Cat o B k
(3.85)
2 B o C + Cat
The total concentration of catalyst is much smaller than the concentrations of the reactants or products. Note, that in real systems, the reactions are reversible and usually there are more intermediates, but for the present purpose, this minimal reaction mechanism is sufficient. The system of ODEs: [ A ] ] [Cat [B ] [C ]
k1[A ][Cat ] k1[A ][Cat ] k2 [B ]
k1[A ][Cat ] k2[B ] k2[B ]
MatlabFile 3-22. ode_zero_order.m function c_dot=ode_zero_order(t,c,flag,k) % 0th order kinetics % A + Cat --> B % B --> C + Cat c_dot(1,1)=-k(1)*c(1)*c(2);
% A_dot
(3.86)
90
Chapter 3
c_dot(2,1)=-k(1)*c(1)*c(2)+k(2)*c(3); c_dot(3,1)= k(1)*c(1)*c(2)-k(2)*c(3); c_dot(4,1)= k(2)*c(3);
% Cat_dot % B_dot % C_dot
The production of C is governed by the amount of intermediate B which is constant over an extended period of time. As long as there is an excess of A with respect to the catalyst, essentially all of the catalyst exists as complex B and thus this concentration is constant. The crucial differential equation is the last one; it is a 0th order reaction as long as [B] is constant. The kinetic profiles displayed in Figure 3-32 have been integrated numerically with Matlab's stiff solver ode15s using the rate constants k1=1000 M-1s-1, k2=100 s-1 for the initial concentrations [A]0=1 M, [Cat]0=10 4 M and [B] =[C] =0 M. For this model the standard Runge-Kutta routine is 0 0 far too slow and thus useless. MatlabFile 3-23. zero_order.m % zero_order % A + Cat --> B % B --> C + Cat c0=[1;1e-4;0;0]; % initial conc of A, Cat, B and C k=[1000;100]; % rate constants k1 and k2 [t,C] = ode15s('ode_zero_order',200,c0,[],k); % call ode-solver figure(1); plot(t,C) % plotting C vs t xlabel('time');ylabel('conc.');
1 0.9 0.8 0.7
conc.
0.6 0.5 0.4 0.3 0.2 0.1 0 0
50
100 time
150
200
Figure 3-32. Concentration profiles for the reaction k2 k1 A + Cat o B , B o C + Cat . The reaction is th approximately 0 order for about 100 s.
91
Physical/Chemical Models
The Steady-State Approximation Traditionally, reaction mechanisms of the kind above have been analysed based on the steady-state approximation. The differential equations for this mechanism cannot be integrated analytically. Numerical integration was not readily available and thus approximations were the only options available to the researcher. The concentrations of the catalyst and of the intermediate, activated complex B are always only very low and even more so their ] and [B ] . In the steady-state approach these two derivatives derivatives [Cat are set to 0. [B ]
] [Cat
k1[A ][Cat ] k2[B ]
0
(3.87)
This equation allows the computation of the concentration [B] [B ]
k1[ A ][Cat ] k2
(3.88)
and the conservation of mass for the catalyst which either exists as Cat or as B [Cat ] [Cat ]0 [B ]
(3.89)
Introduction of (3.89) into (3.88) and a few rearrangements result in [B ]
k1[ A ][Cat ]0 k2 k1[ A ]
(3.90)
For most of the time, up to 100 sec, k2<
(3.91)
This relationship is best verified in Figure 3-33 below:

MatlabFile 3-24. zero_order.m …continued
% zero_order ...continued
figure(2);
plot(log10(t),log10(C))  % plotting logC vs logt
xlabel('log(time)');ylabel('log(conc.)')
legend('A','Cat','B','C');
Between about 10-2 and 102 time units, the concentration [B] ≈ 10-4 M = [Cat]0, confirming the steady-state approximation. With the availability of numerical ODE solvers, exercises of the kind just presented are superfluous. While the results are the same for large parts of the data, numerical integration delivers a complete analysis that covers the whole reaction from time 0 to the end.
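The steady-state window can also be confirmed directly from the numerical solution. A minimal sketch of ours, reusing t and C from zero_order.m (column 3 holds [B]):

idx = t>1e-2 & t<1e2;                 % steady-state window
dev = max(abs(C(idx,3)-1e-4))/1e-4;   % relative deviation of [B] from [Cat]_0
fprintf('max. relative deviation of [B] from [Cat]_0: %g\n',dev);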
Figure 3-33. The logarithms of the concentrations of Figure 3-32 on a logarithmic time axis.
Lotka-Volterra / Predator-Prey Systems

This example is not chemically relevant but exciting nonetheless. It models the dynamics of a population of predators and prey in a closed system. Consider an island with a population of sheep and wolves. Sheep eat the (never-ending) supply of grass and multiply. Wolves keep the number of sheep under control and multiply themselves. If the sheep population is large, there is a lot of food for the wolves; they multiply ferociously, decimate the sheep population and eventually reduce it to a small number. The wolves slow down their breeding and die a natural death (sheep do not die naturally, they are only eaten). The wolf population goes down and this gives the sheep a chance to breed and increase their number … and the cycle starts again. We can model this behaviour with a set of three 'reactions' and their differential equations. (a) In the first 'reaction' the sheep are breeding. Note that there is a constant supply of grass and this 'reaction' could go on forever; as it is written, it violates the law of conservation of mass and is only an empirical rate law. In the second 'reaction' (b), wolves eat sheep and breed themselves. The third 'reaction' (c) completes the system: wolves have to die a natural death.
sheep →(k1) 2 sheep
wolf + sheep →(k2) 2 wolves
wolf →(k3) dead wolf    (3.92)

The following system of differential equations has to be solved:
$\frac{d[sheep]}{dt} = k_1[sheep] - k_2[wolf][sheep]$
$\frac{d[wolf]}{dt} = k_2[wolf][sheep] - k_3[wolf]$    (3.93)
MatlabFile 3-25. ode_lotka_volterra.m
function c_dot=ode_lotka_volterra(t,c,flag,k)
% lotka volterra
% sheep --> 2 sheep
% wolf + sheep --> 2 wolves
% wolf --> dead wolf
c_dot(1,1)=k(1)*c(1)-k(2)*c(1)*c(2);  % sheep_dot
c_dot(2,1)=k(2)*c(1)*c(2)-k(3)*c(2);  % wolf_dot
The kinetic population profiles displayed in Figure 3-34 have been obtained by numerical integration using Matlab's Runge-Kutta solver ode45 with the rate constants k1=2, k2=5, k3=6 and the initial populations [sheep]0=2, [wolf]0=2. For simplicity, we ignore the units. ode_lotka_volterra.m contains the function that generates the differential equations; it is repeatedly called by the ODE-solver ode45.

MatlabFile 3-26. Lotka_Volterra.m
% Lotka_Volterra
% sheep --> 2 sheep
% wolf + sheep --> 2 wolves
% wolf --> dead wolf
c0=[2;2];   % initial 'conc' of sheep and wolves
k=[2;5;6];  % rate constants k1, k2 and k3
[t,C] = ode45('ode_lotka_volterra',10,c0,[],k);  % call ode-solver
figure(1);
plot(t,C)   % plotting C vs t
xlabel('time');ylabel('conc.')
legend('sheep','wolves');
Surprisingly, the dynamics of such a population are completely cyclic. All properties of the cycle depend on the initial populations and the 'rate constants'. This behaviour is best seen in a plot of the wolf vs. the sheep 'concentration'. For any set of initial 'concentrations' and 'rate constants', this cyclic behaviour is maintained.
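That the orbits are closed can also be checked numerically: for this 'mechanism' the quantity V = k2([sheep]+[wolf]) − k3·ln[sheep] − k1·ln[wolf] is conserved along the trajectory, a standard result for Lotka-Volterra systems. A minimal sketch of ours, reusing t, C and k from Lotka_Volterra.m:

V = k(2)*(C(:,1)+C(:,2)) - k(3)*log(C(:,1)) - k(1)*log(C(:,2));
fprintf('relative variation of the invariant: %g\n', ...
        (max(V)-min(V))/mean(V));  % small, and shrinks with tighter tolerances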
Figure 3-34. Lotka-Volterra's predator and prey 'kinetics'.

MatlabFile 3-27. Lotka_Volterra.m …continued
%Lotka_Volterra ...continued
figure(2);
plot(C(:,1),C(:,2))
xlabel('[sheep]');ylabel('[wolf]')
Figure 3-35. The concentration of wolves plotted versus the concentration of sheep in the Lotka-Volterra predator-prey kinetics.
Note the imperfect coincidence of the line: the orbit does not close exactly. This effect is due to small numerical errors; increasing the accuracy of the solver reduces these differences.

The Belousov-Zhabotinsky (BZ) Reaction

Chemical mechanisms for real oscillating reactions are very complex and presently not understood in every detail. Nevertheless, there are approximate mechanisms which correctly model several crucial aspects of real oscillating reactions. In these simplified systems, often not all physical laws are strictly obeyed, e.g. the law of conservation of mass. The Belousov-Zhabotinsky (BZ) reaction involves the oxidation of an organic species such as malonic acid (MA) by an acidified aqueous bromate solution in the presence of a metal ion catalyst such as the Ce(III)/Ce(IV) couple. At excess [MA] the stoichiometry of the net reaction is

2BrO3- + 3CH2(COOH)2 + 2H+ →(catalyst) 2BrCH(COOH)2 + 3CO2 + 4H2O    (3.94)
A short induction period is typically followed by an oscillatory phase, visible by the alternating colour of the aqueous solution due to the different oxidation states of the metal catalyst. Addition of a coloured redox indicator, such as the Fe(II)/(III)(phen)3 couple, results in more dramatic colour changes. Typically, several hundred oscillations with a periodicity of approximately one minute gradually die out within a couple of hours, and the system slowly drifts towards its equilibrium state. In order to understand the BZ system, Field, Körös and Noyes developed the so-called FKN mechanism. From this, Field and Noyes later derived the Oregonator model, an especially convenient kinetic model to match individual experimental observations and predict experimental conditions under which oscillations might arise.

BrO3- + Br- →(k1) HBrO2 + HOBr
BrO3- + HBrO2 →(k2) 2HBrO2 + 2Mox
HBrO2 + Br- →(k3) 2HOBr
2HBrO2 →(k4) BrO3- + HOBr
MA + Mox →(k5) 1/2 Br-    (3.95)
Mox represents the metal ion catalyst in its oxidised form (Ce(IV)). It is important to note that this model is based on an experimentally determined empirical rate law and clearly does not comprise stoichiometrically correct elementary processes. The five reactions in the model provide the means to kinetically describe the four essential stages of the BZ reaction:

- formation of HBrO2
- autocatalytic formation of HBrO2
- consumption of HBrO2
- oxidation of malonic acid (MA)
For the calculation of the kinetic profiles displayed in Figure 3-36, we used the rate constants k1=1.28 M-1s-1, k2=33.6 M-1s-1, k3=2.4×10^6 M-1s-1, k4=3×10^3 M-1s-1, k5=1 M-1s-1, which result in approximate concentration profiles in acidic solution. The initial concentrations are [BrO3-]0=0.063 M, [Ce(IV)]0=0.002 M (=[Mox]0) and [MA]0=0.275 M. The code is fairly complex and thus its development can be error prone.

MatlabFile 3-28. ode_BZ.m
function c_dot = ode_BZ(t,c,flag,k)
% BZ
% BrO3 + Br    --> HBrO2 + HOBr
% BrO3 + HBrO2 --> 2 HBrO2 + 2 Mox
% HBrO2 + Br   --> 2 HOBr
% 2 HBrO2      --> BrO3 + HOBr
% MA + Mox     --> 0.5 Br
c_dot(1,1)=-k(1)*c(1)*c(2)-k(2)*c(1)*c(3)+k(4)*c(3).^2;                  % BrO3_dot
c_dot(2,1)=-k(1)*c(1)*c(2)-k(3)*c(3)*c(2)+0.5*k(5)*c(6)*c(5);            % Br_dot
c_dot(3,1)= k(1)*c(1)*c(2)+k(2)*c(1)*c(3)-k(3)*c(3)*c(2)-2*k(4)*c(3).^2; % HBrO2_dot
c_dot(4,1)= k(1)*c(1)*c(2)+2*k(3)*c(3)*c(2)+k(4)*c(3).^2;                % HOBr_dot
c_dot(5,1)= 2*k(2)*c(1)*c(3)-k(5)*c(6)*c(5);                             % Mox_dot
c_dot(6,1)=-k(5)*c(6)*c(5);                                              % MA_dot

MatlabFile 3-29. BZ.m
% BZ
% BrO3 + Br    --> HBrO2 + HOBr
% BrO3 + HBrO2 --> 2 HBrO2 + 2 Mox
% HBrO2 + Br   --> 2 HOBr
% 2 HBrO2      --> BrO3 + HOBr
% MA + Mox     --> 0.5 Br
BrO3_0=0.063; Mox_0=0.002; MA_0=0.275;
k=[1.28;33.6;2.4e6;3e3;1];
options=odeset('RelTol',1e-6,'AbsTol',1e-10);
[t,C] = ode15s('ode_BZ',1000,[BrO3_0 0 0 0 Mox_0 MA_0],options,k);
plot(t,log10(C(:,[1 4 6])));axis([0 1000 -8 0]);      % BrO3,HOBr,MA
hold;plot(t,log10(C(:,[2 3 5])),'linewidth',2);hold;  % Br,HBrO2,Mox
xlabel('time');ylabel('log(conc)')
legend('BrO3','HOBr','MA','Br','HBrO2','Mox')
Figure 3-36. The BZ reaction as represented by the Oregonator model. The species Br-, HBrO2 and Mox display regular oscillations, while the species BrO3-, HOBr and MA change their concentrations slowly and more steadily.

One important note for this system: we had to increase the default accuracy of the integration (RelTol and AbsTol) and also use the stiff solver ode15s. We leave it to the reader to experiment with the Runge-Kutta solver ode45 or the default accuracy.

Chaos, the Lorenz Attractor

Our experience in chemistry suggests that systems of differential equations are stable, by which we mean that a small change in one of the conditions, either initial concentrations or rate constants, results in correspondingly small changes in the outcome. The classical example of a stable system is our solar system of planets orbiting the sun. Their trajectories are defined by their masses and initial locations and velocities, all of which are the initial parameters of a relatively simple system of differential equations. As we all know, the system is very stable and we can predict the trajectories with incredible precision, e.g. the eclipses and even the returns of comets. For a long time, humanity believed that the whole universe behaves in a similarly predictable way, of course much more complex but still essentially predictable. Descartes was the first to formally propose such a point of view.
In the 1960s the meteorologist Edward Lorenz worked on systems of differential equations describing weather patterns, and found something utterly different. The smallest modification of the initial conditions can have a dramatic effect, resulting in a completely different outcome after a certain time. Such behaviour is called chaotic. The sets of differential equations were initially rather complex, but later he developed a simpler set which shows the same effect.

$\dot{A} = k_1(B - A)$
$\dot{B} = k_2 A - B - AC$
$\dot{C} = AB - k_3 C$    (3.96)
MatlabFile 3-30. ode_Lorenz.m
function c_dot=ode_Lorenz(t,c,flag,k)
% A_dot = k1(B-A)
% B_dot = k2A-B-AC
% C_dot = AB-k3C
c_dot(1,1)= k(1)*(c(2)-c(1));          % A_dot
c_dot(2,1)= k(2)*c(1)-c(2)-c(1)*c(3);  % B_dot
c_dot(3,1)= c(1)*c(2)-k(3)*c(3);       % C_dot
Naturally, A, B and C, as well as the constants ki, have a completely different meaning from the ones we are used to in chemical kinetics; they are not species with a certain concentration. Chaotic behaviour is restricted to certain ranges of initial values and parameters. It is up to the reader to play with these options. The short program Lorenz.m calculates the 'concentrations' of A, B and C for the initial conditions c0=[1;1;20]. Figure 3-37 displays the trajectories in a fashion that is not common in chemical kinetics: it is a plot of the time evolution of the values of A vs. B vs. C (see also Figure 3-35). Most readers will recognise the characteristic butterfly shape of the trajectory. The important aspect is that, in contrast to Figure 3-35, the trajectory is different each time. This time, it is not the effect of numerical errors but an essential aspect of the outcome. Even if the starting values for A, B and C are away from the 'butterfly', the trajectory moves quickly into it; it is attracted by it, and thus the name, Lorenz attractor.

MatlabFile 3-31. Lorenz.m
% Lorenz
c0=[1;1;20];  % initial conc of A, B, C
k=[10;30;3];  % parameters k1, k2, k3
[t,C]=ode45('ode_Lorenz',30,c0,[],k);  % ode-solver
figure(1);
plot3(C(:,1),C(:,2),C(:,3));
grid;
xlabel('A');ylabel('B');zlabel('C');
Figure 3-37. The trajectory for the Lorenz attractor.

And now the chaotic aspect. Let's start with very similar initial conditions of c0=[1;1;20.00001] and store the result in the matrix C1. A plot of A alone for the two calculations as a function of time is most revealing.
MatlabFile 3-32. Lorenz.m …continued
% Lorenz ...continued
c0=[1,1,20.00001];
[t1,C1]=ode45('ode_Lorenz',30,c0,[],k);  % ode-solver
figure(2);
plot(t,C(:,1),t1,C1(:,1));
xlabel('time');ylabel('A');
axis([10 20 -20 20])
Figure 3-38. Two trajectories with very slightly different initial conditions. They are indistinguishable for a relatively long time and then suddenly move apart.
100
Chapter 3
For the first 14 time units the two traces are virtually indistinguishable and then, rather suddenly, they move apart. Each trajectory still stays within the original 'butterfly' but follows a completely different path.
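The divergence time itself can be extracted numerically. A minimal sketch of ours: both trajectories are re-integrated on a common time grid, so the traces can be subtracted directly (the threshold of 1 is an arbitrary choice):

tspan=(0:0.01:30)';
[t ,C ]=ode45('ode_Lorenz',tspan,[1;1;20],[],k);
[t1,C1]=ode45('ode_Lorenz',tspan,[1;1;20.00001],[],k);
i=find(abs(C(:,1)-C1(:,1))>1,1);  % first point where the A-traces differ by more than 1
fprintf('trajectories separate at about t = %g\n',tspan(i));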
4 Model-Based Analyses

Very rarely are measurements themselves of much use or of great interest. The statement "the absorption of the solution increased from 0.6 to 0.9 in ten minutes" is of much less use than the statement "the reaction has a half-life of 900 sec". The goal of the model-based analysis methods presented in this chapter is to facilitate this 'translation' from original data to useful chemical information. The result of a model-based analysis is a set of values for the parameters that quantitatively describe the measurement, ideally within the limits of experimental noise. The most important prerequisite is the model, the physical-chemical, or other, description of the process under investigation. An example helps clarify the statement. The measurement is a series of absorption spectra of a reaction solution; the spectra are recorded as a function of time. The model is a second order reaction A+B→C. The parameter of interest is the rate constant of the reaction.

The purpose of this chapter is to develop a collection of methods that allow the determination of the 'best' set of parameters for a particular given model and one or a collection of measurements. In other words, we fit the parameter(s) to the measurement(s). It cannot be over-stressed that the task of finding the 'best' model for the measurement is a much more difficult undertaking. A crucial difference between finding the optimal parameters for a given model and finding the optimal model lies in the fact that the parameters of a model form a continuous space, while models are discrete entities. Model-based parameter fitting relies on the continuous relationship between the quality of the fit and the parameters. There are no equivalent continuous transitions from one trial model to the next and thus all the powerful fitting algorithms are useless. A lot of chemical intuition, experience and knowledge is involved in the process of establishing the correct model. It is not the goal of this chapter to offer much help on this subject. The usual procedure is to choose a selection of reasonable models, fit them all, and subsequently decide on the 'best' or 'correct' one by analysing the individual results of these analyses. Some data fitting algorithms provide statistical information that allows an estimation of the quality of the fit and thus of the suitability of the model.

The tools we created in Chapter 3, Physical/Chemical Models, form the core of the fitting algorithms of this chapter. The model defines a mathematical function, either explicitly (e.g. first order kinetics) or implicitly (e.g. complex equilibria), which in turn is quantitatively described by one or several parameters. In many instances the function is based on such a physical model, e.g. the law of mass action. In other instances an empirical function is chosen because it is convenient (e.g. polynomials of any degree) or because it is a reasonable approximation (e.g. Gaussian functions and their linear combinations are used to represent spectral peaks).
A crucial point, not mentioned so far, is the question about the meaning of the expression 'best' parameters. Intuitively it seems to be clear; they are the parameters for which the calculated data match the measured data as closely as possible. Almost invariably the sum of the squares of the differences between the measured data and the calculated model function is minimised and is the measure for the quality of the fit.
4.1 Background to Least-Squares Methods

There are several reasons why the sum of squares, i.e. the sum of squared differences between the measured and modelled data, is used to define the quality of a fit and thus is minimised as a function of the parameters. It is instructive to consider alternatives to the sum of squares. (a) The minimal sum of differences is not an option, as positive and negative differences cancel each other out; huge deviations in both directions can result in zero sums. (b) This suggests minimising the sum of the absolute values of the differences. (c) Another possibility is to take the sum of the differences to the power of 4, 6, …; the higher the power, the more weight is applied to the relatively large differences. (d) The ultimate is to minimise the maximal difference, which is identical to taking a very high power of the differences prior to summation. It is beyond the scope of this book to discuss the statistical theories behind the very common least-squares fitting. We refer the reader to the list of reference books in Further Reading for more details. Here we give a glimpse into the reasoning. Given a set of measurements, it is obvious that parameters producing function values that are very different from the data are less likely correct than parameters producing function values very similar to the measurement. The statistical argument goes the other way: given a set of parameters, what is the probability that the particular measurement could have occurred? Assuming normally distributed errors in the measurements, one can start the mathematical formalism and come to the expected end: those parameters that result in the minimal sum of squares are the ones most likely to produce the actual measurement. Least-squares fitting delivers a maximum likelihood estimation of the parameters. We need to stress again: this is the case only if the measurement errors are independent, normally distributed and of constant standard deviation. Often these rather stringent statistical requirements are not met. However, ignoring this fact and applying the method of least-squares fitting anyway usually does not result in a disaster. All alternatives to the least-squares measure are computationally more demanding. Note that non-uniformly distributed noise is not a problem and can easily be incorporated into the fit. Weighting according to known standard deviations of the noise results in χ2 fitting, see Non-White Noise, χ2-Fitting (p.189).
4.1.1 The Residuals and the Sum of Squares

The measured data are approximated by an appropriate function. For each measured data pair (xi, yi) there is a calculated value ycalci for that particular xi. The value ycalci is computed as a function of the parameters and xi. The difference between the measurement yi and its calculated value ycalci is defined as the residual ri. This is represented in Figure 4-1.
Figure 4-1. The residual ri is the difference between the measured yi and the calculated ycalci.

$r_i = y_i - ycalc_i = y_i - f(x_i, parameters)$    (4.1)
The residuals are a function of the parameters. Note that they are also a function of the model and the data, but we take these as given and ignore this for the time being. The sum of squares, ssq, is the sum of all the squares of the individual residuals and thus is also a function of the parameters:

$ssq = \sum_i r_i^2 = \sum_i (y_i - ycalc_i)^2 = f(parameters)$    (4.2)
It is instructive to represent the situation for two typical examples.

Linear Example: Straight Line

For a straight line, the function for ycalc describing a vector of measurements is:

$ycalc_i = intercept + slope \times x_i$    (4.3)
Thus ssq is a function of the parameters intercept and slope and therefore can be displayed in a 3-dimensional plot, see Figure 4-3.
First some noisy data are generated; they are scattered around a straight line:

MatlabFile 4-1. Data_mxb.m
function [x,y]=Data_mxb
x=(1:10)';
y=20+6*x;
randn('seed',2);        % initialise random number generator
y=y+5*randn(size(y));   % adding normally distributed noise

MatlabFile 4-2. Main_mxb.m
% Main_mxb
[x,y]=Data_mxb;
plot(x,y,'+',[1 10],[26 80])
axis([0 11 0 100])
xlabel('x');ylabel('y');
Figure 4-2. Noisy data scattered around the underlying straight line.

We can calculate ssq for a range of values for slope and intercept and plot the result in a mesh-plot, see Figure 4-3.

MatlabFile 4-3. Main_mxb.m …continued
% Main_mxb ...continued
intercepts=-20:5:60;
slopes=0:12;
for i=1:length(intercepts);
  for j=1:length(slopes);
    SSQ(i,j)=sum((y-(intercepts(i)+slopes(j).*x)).^2);
  end
end
mesh(slopes,intercepts,SSQ+5e4);
colormap([0 0 0]);
hold on;
contour(slopes,intercepts,SSQ,50);
xlabel('slopes');
ylabel('intercepts');
zlabel('ssq+5e4');
hold off;
Figure 4-3. ssq vs. slope and intercept. The minimum of ssq is near the true values slope=6 and intercept=20 that were used to generate the data (see Data_mxb.m).

ssq is continuously increasing for parameters moving away from their optimal values. Analysing that behaviour more closely, we can observe that the valley is parabolic in all directions. In other words, any vertical plane cutting through the surface results in a parabola. In particular, this is also the case for vertical planes parallel to the axes, i.e. ssq versus only one parameter is also a parabola. This is a property of so-called linear parameters.

Non-Linear Example: Exponential Decay

In order to further explore the properties of the landscape formed by the sum of squares as a function of parameters, we concentrate on a slightly more
complex function. As an example, we use the exponential decay of the intensity of the radiation of a sample of a radioisotope.

$I = I_0\, e^{-kt}$    (4.4)
The two parameters defining this function are the rate constant k and the initial intensity I0. First we create and plot a noisy measurement:

MatlabFile 4-4. Data_Decay.m
function [t,y]=Data_Decay
t=(0:50)';
k=0.05;
I_0=100;
randn('seed',0);
y=I_0*exp(-k*t);
y=y+10*randn(size(y));

MatlabFile 4-5. Main_Decay_2d.m
% Main_Decay_2d
[t,y]=Data_Decay;
plot(t,y,'x');
xlabel('time');
ylabel('intensity');
Figure 4-4. Exponential decay of radiation intensity.

And now we repeat what we have done earlier with the straight line fit, i.e. calculating and plotting the sum of squares, ssq, as a function of a range of parameters.
MatlabFile 4-6. Main_Decay_ssq.m
% Main_Decay_ssq
[t,y]=Data_Decay;
I_0=0:10:200;
k=0:.01:.2;
for i=1:length(I_0);
  for j=1:length(k);
    SSQ(i,j)=sum((y-(I_0(i)*exp(-k(j).*t))).^2);
  end
end
mesh(k,I_0,SSQ+1e6);
colormap([0 0 0]);
hold on;
contour(k,I_0,SSQ,100);
xlabel('k');
ylabel('I_0');
zlabel('ssq+1e6');
hold off;
view(50,20);
Figure 4-5. ssq vs. initial intensity I0 and k.

Comparing Figure 4-5 with the corresponding plot from the straight line fit in Figure 4-3, an important difference is that the landscape is no longer parabolic. There is a flat region and a very steep increase at the back corner. Nevertheless, the contour lines clearly indicate that there is a minimum near the correct position.
More careful examination of this shape reveals two important facts. (a) Plots of ssq as a function of k at fixed I0 are not parabolas, while plots of ssq vs. I0 at fixed k are parabolas. This indicates that I0 is a linear parameter and k is not. (b) Close to the minimum, the landscape becomes almost parabolic, see Figure 4-6. We will see later in Chapter 4.3, Non-Linear Regression, that the fitting of non-linear parameters involves linearisation. The almost parabolic landscape close to the minimum indicates that the linearisation is a good approximation.
Figure 4-6. Close-up of ssq vs. I0 and k near the minimum. The surface is approximately parabolic.

The exact location of the minimum cannot be computed explicitly for non-linear parameters. Starting from a set of initial guesses for the parameters, the location of the minimum has to be approached iteratively. Some methods are robust, i.e. they converge reliably even if started far from the minimum. These methods, however, tend to be slow in localising the exact position of the minimum (e.g. in Chapter 4.4.2, The Simplex Algorithm). Several alternative algorithms require the computation of the first and sometimes the second derivatives, either of the sum of squares or of the residuals with respect to the parameters (e.g. The Newton-Gauss Algorithm, see Chapters 4.3.1 and 4.4.1). If the initial guesses for the parameters are poor, convergence is often not reliable; generally, these methods are not as robust as the simplex algorithm when started far from the minimum. However, if the initial guesses for the parameters are reasonable, the progress of the iterative process is very fast.
4.2 Linear Regression

This chapter on linear regression is central to the whole book; in fact, linear regression is central to almost all numerical computations. This might sound surprising, as linear regression is significantly simpler than non-linear regression or many other algorithms. The justification for the statement lies in the fact that most non-linear problems are linearised and solved iteratively, and each iterative linearisation step is solved by a linear regression calculation. Generally, the linearisation is based on a Taylor series expansion, truncated after the second term. We have already encountered the Taylor expansion in the Chapter The Newton-Raphson Algorithm (p.48). We meet it again in Chapter 4.3.1, The Newton-Gauss-Levenberg/Marquardt Algorithm. We can conclude that linear regression calculations are very, very common. They are continuously performed deep inside the non-linear problem solving routines. As it turns out, linear regression is, with a few exceptions, the most complex computation undertaken in any program. For this reason we specifically discuss numerical problems that may occur in certain situations. Matlab recognises the importance of linear regression calculations and introduced a very elegant and useful notation: the / forward-slash and \ back-slash operators, see p.117-118. Note that the term 'Linear Regression' is somewhat misleading. It is not restricted to the task of just fitting a straight line to some data. While this task is an example of linear regression, the expression covers much more. However, to start with, we return to the task of the straight line fit.
4.2.1 Straight Line Fit - Classical Derivation

It makes sense to start with the well known task of finding the best straight line through a set of (x,y)-data pairs. We can refer back to Figure 4-3, which displays the sum of squares, ssq, as a function of the two parameters defining a straight line, the slope and the intercept. The task is to find the position of the minimum, the values for slope and intercept that result in the least sum of squares. Earlier, we promised an explicit solution for the determination of linear parameters. We first change the original notation introduced in equation (4.3):

$ycalc_i = intercept + slope \times x_i = a_1 + a_2 x_i$    (4.5)
110
Chapter 4
The sum of squares, ssq, can be written as m
m
m
i =1
i =1
i =1
ssq = ∑ ri2 = ∑ (yi − ycalci )2 = ∑ (yi − (a1 + a2 xi ))2
(4.6)
where m denotes the number of elements in the data vector y. At the minimum, the derivatives of ssq with respect to a1 and to a2 are both zero. ∂ssq ∂ssq =0 = ∂a0 ∂a1
(4.7)
It is a matter of substituting (4.6) into (4.7) and a bit of straight algebra to arrive at: m
∂ssq = ∂a1
∂ ∑ (yi − (a1 + a 2xi ))2 i =1
∂a1 m
m
m
i =1
i =1
i =1
= ∑ −2(yi − a1 − a2 xi ) = −2∑ yi + 2ma1 + 2a 2 ∑ xi = 0
(4.8)
and m
∂ssq = ∂a 2
∂ ∑ (yi − (a1 + a 2xi ))2
i =1
∂a 2 m
m
m
m
i =1
i =1
i =1
i =1
= ∑ −2xi (yi − a1 − a 2 xi ) = −2∑ xi yi + 2a1 ∑ xi + 2a 2 ∑ xi2 = 0
This represents a system of 2 equations with 2 unknowns, a1 and a2. After division
by
2
and
introducing
a
short
notation
for
the
sums
m
(e.g. ∑ xi yi = Σxy ), we can write: i =1
a1m + a2Σx = Σy a1Σx + a2Σx 2 = Σxy
(4.9)
and this in turn is written as a matrix equation (see Chapter 2.2) ⎡m ⎢ ⎣ Σx with the solution
Σx ⎤ ⎡a1 ⎤ ⎡ Σy ⎤ ⎥⎢ ⎥ = ⎢ ⎥ Σx 2 ⎦ ⎣a2 ⎦ ⎣ Σxy ⎦
(4.10)
111
Model-Based Analyses
⎡a1 ⎤ ⎡ m ⎢a ⎥ = ⎢ ⎣ 2 ⎦ ⎣ Σx
Σx ⎤ ⎥ Σx 2 ⎦
−1
⎡ Σy ⎤ ⎢Σxy ⎥ ⎦ ⎣
(4.11)
The inverse of the 2-by-2 matrix can be calculated as ⎡m ⎢ ⎣ Σx
Σx ⎤ ⎥ Σx 2 ⎦
−1
=
⎡ Σx 2 1 2 ⎢ m Σx − (Σx ) ⎣ −Σx 2
−Σx ⎤ ⎥ m ⎦
(4.12)
Inserting (4.12) into the matrix product of (4.11) results in a set of two explicit equations for the two parameters:
Σx 2Σy − Σx Σxy m Σx 2 − (Σx )2 −Σx Σy + m Σxy a2 = m Σx 2 − (Σx )2 a1 =
(4.13)
Rather than writing a short program in Matlab for this result, we demonstrate how to perform the task of a straight line fit in Excel. Excel actually provides several ways of performing the job of fitting the best line through a set of data pairs. The most convenient is probably the Add Trendline … tool which delivers the result in a few clicks. Right-click on one of the points of the data series and a context menu appears as shown in Figure 4-7. ExcelSheet 4-1. Chapter3.xls-trendline
Figure 4-7. Using Trendline for a straight line fit. Select Add Trendline … to get the graphical input selection menu for the trendline as shown in Figure 4-8.
112
Chapter 4
Figure 4-8. The Add Trendline menu. Keep the default Linear Regression under the type tab and on the Options tab select Display equation on chart. This results in 14
y = 1.2667x + 1.6667
12 10 8 6 4 2 0 0
5
10
Figure 4-9. Fitted trendline with equation. One difficulty with the Trendline is that the equation only appears graphically. The values for slope and intercept have to be copied manually into the spreadsheet if they are to be used in later calculations. In Chapter 4.2.6, Excel Linest, we discuss the LINEST function of Excel which is much more versatile while still covering the best line fit. LINEST delivers the results into the spreadsheet where they can be used for further
calculations. Additionally, LINEST supplies a statistical analysis.
Model-Based Analyses
113
4.2.2 Matrix Notation A useful first step towards the fitting of more complex linear functions, is to translate the equations into a matrix oriented notation. Equation (4.5) is actually a system of m equations, where m is the number of (x,y)-data pairs. ycalc1 = a1 + a 2 x1 ycalc 2 = a1 + a 2 x 2 ycalci = a1 + a 2 xi
(4.14)
ycalcm = a1 + a 2 xm
This system of m equations can be written as one matrix equation.
⎡ ycalc1 ⎤ ⎡1 x ⎤ 1 ⎢ ⎥ ⎢ ⎥ ⎢ ycalc 2 ⎥ ⎢1 x 2 ⎥ ⎢ ⎥ ⎢ ⎥ ⎡a1 ⎤ ⎢ ⎥=⎢ ⎥ ⎢ ycalci ⎥ ⎢1 xi ⎥ ⎢⎣a2 ⎥⎦ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢y ⎥ ⎢1 xm ⎦⎥ ⎣ calcm ⎦ ⎣
(4.15)
y calc = F(x ) a
(4.16)
or
ycalc is a column vector containing the m individual elements ycalci, F(x), or shorter just F, is an m by 2 matrix; the first column is formed by ones and the second column is composed of the elements xi. The vector a contains the parameters a1 and a2.
Similarly, the vector of residuals r, as introduced in equation (4.1), can be defined in a matrix equation: y = y calc + r
r = y − y calc = y − F a
(4.17)
And the sum of squares can be written as m
ssq = ∑ ri2 = r t r
(4.18)
i =1
The task is to find that vector a for which ssq is minimal. Now we are in a better position to generalise to more complex linear functions. The prototype of linear least-squares fitting is the fitting of a
114
Chapter 4
polynomial of a higher degree. Remember, a straight line is a polynomial of degree one and hence is only a special case.
4.2.3 Generalised Matrix Notation Equations (4.15) or (4.16) represent the fit of a straight line to a set of data pairs. These equations can be written in an expanded form:
y calc = F a = f:,1 a1 + f:,2 a2
(4.19)
Recall the colon (:) notation as introduced in Chapter 2.1, Matrices, Vectors, Scalars. The first column of F, f:,1, contains m ones, while the second column, f:,2, contains the values x1,…,xm. The j-th column of F is multiplied by its corresponding j-th element of the parameter vector a and the products are summed. Equation (4.19) and its predecessors describe the special case for a polynomial of degree one. It is straightforward to generalise by adding any number of terms or columns in F and elements in a.
y calc = F a = f:,1 a1 + f:,2 a2 + ... + f:, j a j + ... + f:,np anp =
np
∑ f:, j a j
(4.20)
j =1
The prototype application is the fitting of the np linear parameters, a1,…,anp defining a higher order polynomial of degree np-1. The generalisation of equation (4.5) reads as:
ycalci = a1 + a2xi + a3 xi2 + ... + a j xij −1 + ... + anp xinp −1 =
np
∑ a j xij −1
(4.21)
j =1
In matrix notation, the equivalent of equation (4.15) can be written in the following way:
y calc
⎡1 x1 ⎢ ⎢1 x 2 ⎢ =⎢ ⎢1 xi ⎢ ⎢ ⎢ ⎣1 xm
x12
x1j −1
x 22
x 2j −1
xi2
xij −1
xm2
xmj −1
⎡a ⎤ x1np −1 ⎤ ⎢ 1 ⎥ ⎥ a2 ⎥ x 2np −1 ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ a3 ⎥ ⎥⎢ ⎥ xinp −1 ⎥ ⎢ a ⎥ j ⎥⎢ ⎥ ⎥⎢ ⎥ np −1 ⎥ xm ⎦ ⎢a ⎥ ⎣ np ⎦
(4.22)
where the j-th column f:,j contains the (j-1)-th power of the elements of the vector x. It is most important to realise that the columns of F can comprise any function of x, not just the different powers. Examples include sin(x), exp(
115
Model-Based Analyses
3x), 1./x, tan(ln(x+1)), … there is no end to the possibilities. The matrix F is often called design matrix . We repeat, the task of linear regression is to determine those values of the vector a for which the product vector ycalc=Fa is as close as possible to the actual measurements y. Closeness of course is defined by the sum of the squared differences between y and ycalc. There are several ways to derive the equations for the computation of the optimal vector a. One option would be to generalise the procedure we used for the straight line fit in equations (4.5) - (4.13), which would be rather cumbersome. In the following we use a different approach.
4.2.4 The Normal Equations We already stated that this chapter on linear regression is central to the whole book and to numerical methods in general. Thus, it is worthwhile putting extra effort into trying to fully understand the process, its limits, its dangers, etc. It is possible to represent the principles geometrically. However, due to the restriction of the human mind to comprehend three dimensions only, these geometrical representations necessarily are restricted as well. We can only deal with two columns in the design matrix F. y f:,2
r
ycalc=Fa f:,1
Figure 4-10. The best residual vector r is orthogonal to the plane spanned by the vectors f:,1 and f:,2 .
The two vectors f:,1 and f:,2 span a plane that is represented by the grey rectangle. Note that the two vectors are not orthogonal; they do not form a normal system of axes. Any point on this plane can be written as a linear combination of the two base vectors f:,1 and f:,2 or as Fa. Thus, any point on that plane is defined by a pair of numbers and this pair forms the vector a. The pair could be called coordinates but this might be misleading, as coordinates usually are based on an orthogonal system of axes. In this graph, the linear regression problem can be understood as finding the point on the plane that is closest to the measurement vector y. The vector of measured data y usually does not lie in the plane spanned by the columns of F. It would lie there if there were no measurement errors or no noise. Be reminded: the column vectors in F, y and r are m-dimensional vectors, thus there is a high dimensional space in which y can be outside the plane F.
116
Chapter 4
Unfortunately, it is not possible for us to 'see' the grey plane imbedded in an m-dimensional space, 3 dimensions have to suffice. Some people have a good 3 dimensional imagination. They immediately 'see' that the closest point on the plane is just vertically underneath the tip of the vector y. Using more appropriate expressions: the minimal residual vector r, which is the shortest difference between y and Fa, is orthogonal, or normal, to the plane defined by F. Now the expression 'Normal Equations' starts to make sense. The residual vector r is normal to the grey plane and thus normal to both vectors f:,1 and f:,2. As outlined earlier, in Chapter Orthogonal and Orthonormal Matrices (p.25), for orthogonal (normal) vectors the scalar product is zero. Thus, the scalar product between each column of F and vector r is zero. The system of equations corresponding to this statement is: t f:,1 r=0 t f:,2 r=0
(4.23)
This set of equations can be further simplified and written as one matrix equation:
Ft r = 0
(4.24)
where 0 is now a column vector with two 0's. All that is needed now are a few matrix algebraic manipulations to arrive at the equation for the calculation of the best a. F t r = F t (y − F a) = 0 thus t
(4.25)
t
F y = F Fa to result in a = (F t F )-1 F t y
(4.26)
This last equation is very crucial and we will spend considerable time investigating it further. Equations (4.11) and (4.26) have to be identical. Simple verification based on the rules for matrix multiplication:
117
Model-Based Analyses
⎡1 1 F tF = ⎢ ⎣ x1 x 2
1 ⎤ xm ⎥⎦
⎡1 1 Fty = ⎢ ⎣ x1 x 2
1 ⎤ xm ⎦⎥
⎡1 x1 ⎤ ⎢1 x ⎥ 2⎥ ⎢ ⎢ ⎥ ⎢ ⎥ x 1 m⎦ ⎣ y ⎡ 1⎤ ⎢ ⎥ ⎢ y2 ⎥ = ⎢ ⎥ ⎢ ⎥ ⎣ym ⎦
⎡m = ⎢ ⎣ Σx
⎡ Σy ⎤
⎢Σxy ⎥
⎣ ⎦
Σx ⎤ ⎥ Σx 2 ⎦
(4.27)
thus ⎡a ⎤ ⎡ m a = ⎢ 1⎥ = ⎢ ⎣a2 ⎦ ⎣Σx
Σx ⎤ ⎥ Σx 2 ⎦
−1
⎡ Σy ⎤ ⎢Σxy ⎥ ⎣ ⎦
The important point is that equation (4.26) is very general and does not require any calculations of derivatives of ssq with respect to the parameters, etc.
The Pseudo-Inverse Equation (4.17) +r can be written in a casual way y ≈ Fa
(4.28)
where the ≈ represents the least-squares solution. Don't forget that it is not a proper equation as y=Fa. As the solution to equation (4.28) one might be tempted to write " a = F -1 y "
(4.29)
which, of course, is mathematically incorrect as F is not square and thus cannot be inverted. The matrix (FtF)-1Ft in equation (4.26) replaces "F-1" in equation (4.29). It is known as the Pseudo-Inverse of F, for which the notation F+ is often used. Thus we can write a = F+y F + = (F t F )-1 F t
(4.30)
Matlab is, of course, aware of the fundamental importance of the pseudoinverse and created its own notation for it. In Matlab we could write a=inv(F'*F)*F'*y but it is numerically much more efficient to use the appropriate Matlab back-slash \ command as in a=F\y. It is to be read from the right to the left as 'y divided by F', implying, of course, the multiplication of the left pseudo-inverse of F with y as given in equation (4.30).
118
Chapter 4
The equation y≈Fa written more casually as y=Fa can be represented by the following general scheme
= =
y =
F
a
Figure 4-11. Schematic representation of the matrix equation y=Fa. y and a are column vectors while F is a matrix of the appropriate dimensions. It is possible to transpose the equation represented in Figure 4-11 and one could write yt=atFt. Renaming yt, at and Ft to y1, a1 and F1 and using other dimensions as in Figure 4-11, the next figure represents a generalised 'transposed' situation.
=
y1
=
a1
F1
Figure 4-12. Schematic representation of the matrix equation y1=a1F1. The least-squares solution for a1 in the above equation is a1 = y1 F1t (F1 F1t )−1 or a1 =
(4.31) y1 F1+
In Matlab we could write a=y*F'*(inv(F*F')); but again it is numerically much more efficient to use the appropriate Matlab forward slash / operator as in a=y/F, which again states 'y divided by F' but now reading from left to right and meaning the multiplication of y with the right pseudo-inverse of F, as stated in equation (4.31). Matrix multiplication is not commutative, (see Chapter 2.1.1, Elementary Matrix Operations) and thus the order of the factors is important and in a similar way the order is important with 'division'. There are the left and right pseudo-inverses and the / and \ slashes in Matlab represent them in a very elegant way. With beginners of Matlab, it is a very common mistake to use the wrong slash. In the best
119
Model-Based Analyses
case, an error occurs as the dimensions do not match. In the worst case the error goes unnoticed. It is very helpful to write down schematic matrix equations in order to verify the dimensions and correctness of the corresponding calculations. In Figure 4-13 and Figure 4-14 this was done to visualise equations (4.30) and (4.31) using the matrix/vector dimensions shown in Figure 4-11 and Figure 4-12.
=
=
a1 =
=
y1
F1t
(F1F1t)-1 =
y1
F 1+
Figure 4-14. Schematic representation of the matrix equations involving multiplication of y1 by the right pseudo-inverse F1+=F1t(F1F1t)-1.
Linear Dependence, Rank of a Matrix For the computation of the pseudo-inverse, it is crucial that the vectors f:,j are not parallel, or more correctly, that they are linearly independent. Otherwise, the matrix FtF is singular and cannot be inverted. Matlab issues a warning. We can gain a certain level of understanding by adapting Figure 4-10:
120
Chapter 4 y
r
f:,2
ycalc=Fa
f:,3 f:,1
Figure 4-15. Linear dependence, three vectors f:,1, f:,2, f:,3 lie in one plane Assume there are three columns in F that all lie in one plane, as indicated in Figure 4-15. In such a case, there is no unique set of coordinates defining the point on the plane that is nearest the measurement y. The coordinates can be given by any two of the three vectors. They cannot be calculated uniquely or, in other words, the three parameters of the vector a are not defined. In this case, when trying to perform a=F\y, Matlab would usually respond with an error message such as Warning: Matrix is singular to working precision. The number of linearly independent columns (or rows) in a matrix is called the rank of that matrix. The rank can be seen as the dimension of the space that is spanned by the columns (rows). In the example of Figure 4-15, there are three vectors but they only span a 2-dimensional plane and thus the rank is only 2. The rank of a matrix is a very important property and we will study rank analysis and its interpretation in chemical terms in great detail in Chapter 5, Model-Free Analyses.
Numerical Difficulties Figure 4-10 can be adapted further to represent another important aspect. In Figure 4-16 we see that the two vectors f:,1 and f:,2 are almost parallel, not exactly parallel, as then the rank would be one only. y f:,1
f:,2
r
ycalc=Fa
Figure 4-16. Near linear dependence if the base vectors are almost parallel
Model-Based Analyses
121
Because the two base vectors are almost parallel, the plane they lie in is not well defined. Figure 4-16 attempts to represent the problem: the plane can be turned about the two vectors like the pages of a book about the spine. Consequently the projection of y and the residuals r are poorly defined as well. The figure also indicates that the problem is less serious if y is close to the vectors f:,1 and f:,2, than if it is almost orthogonal. As linear regression is a very fundamental operation, several methods have been developed in order to improve the numerical stability of the calculation. It is beyond the objective of this book to discuss these issues in any detail. We do feel, however, that the reader has to be aware of the potential problems and should be able to avoid them as much as possible. The Matlab computations invoked by the back-slash \ and forward-slash / operators do not perform the calculation as given in equations (4.26) and (4.31). Here a short extract from the Matlab HELP: If F is not square and is full, then Householder reflections are used to compute an orthogonal-triangular factorization. F*P = Q*R where P is a permutation, Q is orthogonal and R is upper triangular (see qr). The least squares solution X for the equation B=AX is computed with X = P*(R\(Q'*B)
Without attempting to fully understand this, the essence is important: a=F\y
or a1=y1/F1
are numerically much better than a=inv(F'*F)*F'*y
or a1=y1*F1'*inv(F1*F1').
4.2.5 Errors in the Fitted Parameters As there is a difference between the measurements and the values of the calculated function, we can safely assume that the fitted parameters are not perfect. They are our best estimates for the true parameters and an obvious question is, how reliable are these fitted parameters? Are they tightly or they are loosely defined? As long as the assumption of random white noise applies, there are formulas that allow the computation of the standard deviation of the fitted parameters. While these answers should always be taken with a grain of salt, they do give an indication of how well defined the parameters are.
122
Chapter 4
We are not going to derive the formulas that allow the calculation of the standard deviations of the parameters. The reader is invited to refer to more specialised texts on statistics. We just give the formulas and also give ways of calculating the required information. In equation (4.32) the standard deviation of the parameter aj is given. σa j = σr d j , j
(4.32)
where σr is the standard deviation of the residuals σr =
ssq m − np
(4.33)
Here m is the number of points in y and np the number of fitted parameters. The difference m-np is the number of degrees of freedom, df. The elements dj,j in equation (4.32) are the diagonal elements of the inverse of the so-called curvature matrix, Curv, that contains the second derivatives of the sum of squares with respect to the parameters. The definition of the element Curvj,k is
Curv j ,k =
1 ∂ 2ssq 2 ∂a j ∂ak
(4.34)
This looks horrendous, but as will be shown in a moment it is not. In fact it is 'trivial' to compute. We start with the first derivative ∂ssq ∂ = ∂a j ∂a j
m
∑ ri2 i =1
2
np ⎛ ⎞ ⎜ yi − ∑ f i , j a j ⎟ ⎜ ⎟ j =1 ⎝ ⎠ np m ⎛ ⎞ = −2∑ ⎜ yi − ∑ f i , j a j ⎟ f i , j ⎜ ⎟ i =1 ⎝ j =1 ⎠ m
∂ =∑ i =1 ∂a j
= −2 r t f:, j and then the second derivative:
(4.35)
123
Model-Based Analyses
∂ 2ssq ∂ ∂ssq
= ∂ak ∂a j ∂ak ∂a j ∂ ∂ak
= −2
m
⎛
np
⎞
j =1
⎠
∑ ⎜⎜ yi − ∑ f i , ja j ⎟⎟ f i , j i =1 ⎝
⎛ ⎞ ⎛ np ⎞ ⎜ yi f i , j − ⎜ ∑ f i , j a j ⎟ f i , j ⎟ ⎜ ⎟ ⎜ ⎟ ⎝ j =1 ⎠ ⎝ ⎠ m ⎞ ∂ ⎛ np = 2∑ ⎜ ∑ fi , ja j ⎟ f i , j ⎜ ⎟ i =1 ∂ak ⎝ j =1 ⎠ m
∂ ∂a i =1 k
= −2∑
(4.36)
m
= 2∑ f i ,k f i , j i =1
= 2 f:,tk f:, j The complete set of first and second derivatives can be written elegantly in matrix notation: ∂ssq = −2 r t F ∂a and
(4.37)
2
∂ ssq = 2F t F ∂a∂a The elements dj,j, as required in equation (4.32), are the diagonal elements of the inverse of FtF. D = Curv −1 = (F t F)−1
(4.38)
It is possible to represent the situation graphically.
ssq
124
Chapter 4
If ssq increases sharply with small movement of the parameter away from the minimum, then the parameter is well defined. Otherwise the parameter is only loosely defined. Referring to Figure 4-17 the parameter ak is better defined than the parameter aj. The example below shows a short Matlab program that fits the function y=tan(x) with a polynomial of degree 3 defined by 4 linear parameters, i.e. the elements of a. The statistical analysis is problematic for this example as the residuals are obviously not normally distributed. Nonetheless, the high errors in the parameters and the large standard deviation of the residuals indicate a bad fit. MatlabFile 4-7. tan_poly.m % tan_poly % fitting y=tan(x0) with a polynomial of degree 3 x=(0.1:0.1:1.5)'; y=tan(x); nd=3; np=nd+1;
% degree of polynomial % number of parameters
F(:,1)=ones(size(x)); for j=1:nd F(:,j+1)=F(:,j).*x; end
% design matrix
a=F\y; r=y-F*a; df=length(x)-np; ssq=r'*r; sigma_r=sqrt(ssq/df); D=inv(F'*F); sigma_a=sigma_r*sqrt(diag(D));
% linear parameters % residuals % degrees of freedom % sum of squares % sigma_r % inverted curvature matrix % sigma_parameters
fprintf(1,'sigma_r: %g\n',sigma_r); % print sigma_r for i=1:np % print sigma_a fprintf(1,'a(%i): %g +- %g\n',i,a(i),sigma_a(i)); end plot(x,y,'.',x,F*a); xlabel('x'); ylabel('y'); sigma_r: 1.21374 a(1): -2.46752 +- 1.6462 a(2): 20.9388 +- 8.62303 a(3): -37.9646 +- 12.3167 a(4): 20.2432 +- 5.0712
125
Model-Based Analyses
16 14 12 10
y
8 6 4 2 0 -2 0
0.5
1
1.5
x
Figure 4-18. Fitting the function y=tan(x) with a polynomial of degree 3.
4.2.6 Excel Linest As indicated in Chapter 4.2.1, Straight Line Fit, the Excel function LINEST is a more general function than the TRENDLINE. In addition to allowing the fitting of any linear function, it also delivers a statistical analysis. In order to demonstrate its use, we fit the same polynomial to the same tan-function as we have done in the preceding section using Matlab and tan_poly.m. The columns A and B of the spreadsheet shown in Figure 4-19 are made up by the (x,y)-data-pairs; the array (F4:I18) is the equivalent of the matrix F. LINEST is an array function. The parameters have to be entered as demonstrated in Figure 4-20. The parameter Const has to be set to FALSE at this stage. We explain its meaning in a moment. Stats needs to be set to TRUE for the statistical analysis. As LINEST is an array function, the output area (F21:I25) needs to be selected (the area is shaded) and the command is executed by pressing Ctrl-Shift-Enter (all three keys together). The statistical analysis results are listed as explained by the text underneath the output. Note that, for unknown reasons, Excel reverses the order of the parameters, i.e. the parameter belonging to the first column is listed last. It is encouraging to see that the results of Matlab (see p.124) and Excel are identical. Excel returns a few additional statistical numbers, r2, F and ssreg in cells F23:F25. They are not vital for our present purpose and we refer to the Excel Help for details.
126
Chapter 4
ExcelSheet 4-2. Chapter3.xls-linest
=LINEST(B4:B18,F4:I18,FALSE,TRUE)
Figure 4-19. Excel spreadsheet demonstrating the use of the LINEST function.
Figure 4-20. The window for the LINEST parameter input.
Model-Based Analyses
127
Excel has an alternative way of doing the same. The column of 1's can be omitted, selecting only the three others and fitting with the LINEST function but having the Const option set to TRUE. Excel internally adds the columns of ones and delivers exactly the same results. That one special parameter is the y-intercept. Similar options exist in the trendline function.
4.2.7 Applications of Linear Least-Squares Fitting In the following we present several linear least-squares analyses. Linearisation of Non-Linear Problems In many instances non-linear functions can be linearised and in this way a non-linear, iterative fitting procedure can be reduced to an explicit linear fit. A typical example is the exponential decay of the intensity of the emission of a radioactive sample. We use the data already used for Figure 4-4, produced by the function Data_Decay.m. The equation describing the data is:
y = I 0 × e −kt
(4.39)
This equation can be rewritten in a logarithmic form ln(y ) = ln(I 0 ) - kt
(4.40)
Plotting ln(y) versus time t results in a straight line with slope -k and intercept ln(I0). These two parameters can be computed non-iteratively in a linear regression. The program could look like this: MatlabFile 4-8. Main_Decay.m % Main_Decay [t,y]=Data_Decay; yb=y(y>0); tb=t(y>0);
% we have to get rid of negative values
F=ones(length(tb),2); F(:,2)=tb; a=F\log(yb); I_0=exp(a(1)) k=-a(2) plot(tb,log(yb),'o',tb,F*a); xlabel('time'); ylabel('ln(y)'); I_0 = 100.4029 k =
128
Chapter 4 0.0502
5 4.5 4
ln(y)
3.5 3 2.5 2 1.5 1 0
10
20
30
40
50
time
Figure 4-21. The logarithmic transform of the exponential data set used in Figure 4-4. The fitted exponential curve appears as a straight line. Several observations have to be made: (a) negative y-values have to be deleted as their logarithms are not defined. In real emission experiments this is usually not a problem. Due to the non-uniform error structure of emission data, emission intensity readings are most likely not negative. We will further investigate this aspect in Non-White Noise, χ2-Fitting (p.189). In other applications, eg. spectrophotometric measurements, negative readings are to be expected if the absorbance reading is close enough to zero. (b) As is obvious from the figure, the distribution of the noise is not uniform and thus the later part of the measurement carries more weight than the earlier part in defining the parameters. (c) Thus, the fitted values are different from the best values resulting from a non-linear least-squares fit of the original (non linearised) data. In this example the difference is minimal and not relevant. (d) In the example, the reading of emission intensity is expected to reach zero after enough time. If the infinity reading is not zero, the formula cannot be applied directly. If the measurement is described by an equation of the form
y = I 0 × e −kt + const .
(4.41)
the logarithm of y minus the constant (which is the value at time infinity) has to be plotted versus time
Model-Based Analyses
129
ln(y - const ) = ln(I 0 ) - kt
(4.42)
This task can be problematic as the correct value for const is not necessarily accurately known. Subtracting a 'wrong' value obviously results in a flawed analysis, and this is not always easily detected. The calculations below demonstrate the problem. There is a constant offset added to the y-data, which is not subtracted according to equation (4.42). The plot in Figure 4-22 does not show any 'obvious' miss-fit, however, the calculated parameters are significantly wrong. MatlabFile 4-9. Data_Decay_Offset.m function [t,y]=Data_Decay_Offset t=(0:50)'; k=0.05; I_0=100; randn('seed',0); y=50+I_0*exp(-k*t); y=y+10*randn(size(y)); MatlabFile 4-10. Main_Decay_Offset.m % Main_Decay_Offset [t,y]=Data_Decay_Offset; yb=y(y>0); tb=t(y>0); F=ones(length(tb),2); F(:,2)=tb; a=F\log(yb); I_0=exp(a(1)) k=-a(2) plot(tb,log(yb),'o',tb,F*a); xlabel('time'); ylabel('ln(y)'); I_0 = 137.7157 k = 0.0204
It is worthwhile noting that if a smaller amount of noise is added to the ydata, the subtraction of the wrong constant offset manifests in a visible curvature of their logarithmic plot. This in turn could be misinterpreted as non-exponential behaviour.
130
Chapter 4
5.2 5 4.8
ln (y)
4.6 4.4 4.2 4 3.8 3.6 3.4 0
10
20
30
40
50
time
Figure 4-22. Logarithmic plot with incorrect offset showing no obvious deviation from linearity. Iterative processes are always time consuming and, if possible, should be avoided. Thus, it seems attractive to apply a linearisation procedure in order to avoid a direct iterative non-linear least-squares fit of the exponential. The linearisation approach might be perfectly acceptable in some instances (see the first example) but as a general recipe it is not recommended (e.g. refer to the second example). Another drawback is the non-uniform error distribution invoked by the linearisation of the exponential data. This should in fact be counteracted by appropriate weighting of the residuals as will be introduced in Non-White Noise, χ2-Fitting (p.189). Further, linearisations are only possible in a few selected cases and therefore are not of great general value. In summary: there are not many convincing reasons to do it. Polynomials, the Savitzky-Golay Digital Filter Polynomials do not play an important role in real chemical applications. Very few chemical data behave like polynomials. However, as a general data treatment tool, they are invaluable. Polynomials are used for empirical approximations of complex relationships, smoothing, differentiation and interpolation of data. Most of these applications have been introduced into chemistry by Savitzky and Golay and are known as Savitzky-Golay filters. Polynomial fitting is a linear, fast and explicit calculation, which, of course, explains the popularity.
131
Model-Based Analyses
Smoothing of Noisy Data Measured data are always corrupted by a certain level of noise. For graphical purposes, it is sometimes desirable to display, for example, a spectrum as a smooth line instead of a band of noisy data points. It is important to stress from the very beginning that data smoothing should only be a graphical aid. Data smoothing prior to any parametric or non-parametric analysis hardly ever improves the results. Data smoothing distorts the original values in ways that are difficult to control and the distortion of the results of the fitting is virtually impossible to correct. Nothing is won and a lot can be lost. The basic idea of the Savitzky-Golay digital filter is fairly straightforward. The y-value of a particular (x,y)-data pair is replaced by the value of a polynomial of a certain degree, which has been fitted to a number of neighbouring data points. The computation is repeated for all data. In Figure 4-23 the procedure is illustrated graphically: There are 100 data points in the graph. One particular point is marked by ×. It is the one for which we want to compute its smoothed equivalent. 41 neighbouring data points are marked in black points. They include 20 points to the right and 20 to the left plus the point × itself. A parabola has been fitted though these 41 points and its graph in the range of the fit is represented in the figure. The point 'o' on the parabola is the smoothed representation for the original '×'. 0.114 0.113 0.112 0.111
Figure 4-23. Savitzky-Golay filtering. A polynomial is fitted to a range of data points and the original point (×) is replaced by the value on the polynomial (o).
Two parameters define the Savitzky-Golay filter: the number of points, n, to the right and left of the centre that are used for the fit, and the degree, nd, of the polynomial to be fitted. It is crucial to choose these two parameters carefully; they have to be appropriate for the curves to be smoothed. Many data points and a low degree polynomial result in excellent smoothing, but narrow features are broadened. The extreme in this direction is to fit a polynomial of degree 0 to many data points; this is also called moving window averaging. The opposite choice, few data points and a high order polynomial, results in poor smoothing with much less distortion of narrow features.
The following short program creates a series of three noisy Gaussians of decreasing width. The density of data is constant along the x-axis and thus there is a decreasing number of points defining each Gaussian. The function SavGol.m performs a Savitzky-Golay smoothing. The parameters are the x- and y-vectors, the number (n) of neighbouring left or right data points that are used for one polynomial fit (i.e. if n=5, 2n+1=11 data points are fitted) and the degree (nd) of the polynomial to be fitted. Figure 4-24 displays the data and the results of the smoothing. The top panel contains the original true curve as well as the noisy data. The second panel displays the result of the Savitzky-Golay filter using 11 data points (2×5+1) to fit a fourth order polynomial. All features of the curve are reasonably well preserved, but the smoothing is much less efficient than in the lower part of the figure, where a parabola (polynomial of degree 2) was fitted to 21 (2×10+1) data points. However, in this very smooth curve, the features of the narrow Gaussians are completely lost. The higher the polynomial degree and the smaller the number of data points fitted, the better the filter follows narrow features, but the smoothing effect is diminished. The user has to find the appropriate compromise.

MatlabFile 4-11. Main_SavGol.m
% Main_SavGol
x=(1:150)';
y=gauss(x,50,25)+gauss(x,100,10)+gauss(x,125,2);
randn('state',0)
yn=y+0.1*randn(size(y));
y1=SavGol(x,yn,5,4);
y2=SavGol(x,yn,10,2);
subplot(3,1,1);plot(x,y,x,yn,'.k');
axis([0 150 -0.1 1.1]); ylabel('y');
subplot(3,1,2);plot(x,y,x,y1,'.k');
axis([0 150 -0.1 1.1]); ylabel('y');
subplot(3,1,3);plot(x,y,x,y2,'.k');
axis([0 150 -0.1 1.1]); xlabel('x'); ylabel('y');
Figure 4-24. In all panels the true data are represented by the line. The top panel displays the noisy (•) data; the middle panel shows the result of a 4th degree polynomial fitted through 11 noisy data points (•); and the bottom panel, the result of a 2nd degree smoothing through 21 noisy data points (•).

It is worthwhile discussing the function SavGol.m in some detail. There are some interesting aspects that illustrate a few issues of numerically reliable programming. It is tempting to write a routine such as SavGol_bad.m to perform the Savitzky-Golay filtering, but we will show its numerical weakness. F is built up from the appropriate range of x-values and used to calculate the polynomial coefficients as a=F\y(i-n:i+n), see e.g. equation (4.31).

MatlabFile 4-12. SavGol_bad.m
function y1=SavGol_bad(x,y,n,nd)
% Savitzky-Golay
% n: number of points to the right or left
% nd: degree of polynomial
y1=zeros(size(x));
for i=1+n:length(x)-n
    x1=x(i-n:i+n);
    F(:,1)=ones(size(x1));
    for j=1:nd
        F(:,j+1)=F(:,j).*x1;
    end
    a=F\y(i-n:i+n);
    y1(i)=F(1+n,:)*a;
end
% the first and last n points are lost
Using
y1=SavGol_bad(x,yn,5,4);
rather than
y1=SavGol(x,yn,5,4);
within Main_SavGol.m, the result is a large number of messages of the type
Warning: Rank deficient, rank = 4  tol = 5.2917e-006.
These messages indicate that the rank of the matrix F is not 5, as expected for the case of a fourth order polynomial, but only 4. Why is this the case? At the beginning of the loop F is

$$F = \begin{bmatrix} 1 & 1 & 1 & 1 & 1 \\ 1 & 2 & 2^2 & 2^3 & 2^4 \\ \vdots & & & & \vdots \\ 1 & 11 & 11^2 & 11^3 & 11^4 \end{bmatrix} \qquad (4.43)$$
This is perfectly ok. However, towards the end of the loop, the matrix F looks like

$$F = \begin{bmatrix} 1 & 140 & 140^2 & 140^3 & 140^4 \\ 1 & 141 & 141^2 & 141^3 & 141^4 \\ \vdots & & & & \vdots \\ 1 & 150 & 150^2 & 150^3 & 150^4 \end{bmatrix} \qquad (4.44)$$
In a strictly mathematical sense this matrix is not singular, but numerically it is rank deficient and has effectively a rank of only 4. Calculation of its pseudo-inverse is consequently impossible, or at least numerically unsafe. What can we do about that? It is important to realise that the fitting of a polynomial to a series of (x,y) data pairs is really independent of the actual values in the x-vector. In other words, in the above example, it does not matter whether the x-values are between 140 and 150, between 1 and 11, or between -5 and +5. What matters is the relationship between the x-values and their y-values. As the data are equidistant, any equidistant vector with the right number of values can be used to generate F. In the improved SavGol function we choose the values from -n to +n. The second important observation is that, consequently, we do not have to recalculate F for each point and, more importantly, we do not have to recalculate its pseudo-inverse F+, which is now computed once, outside the loop. The result is a routine which is much faster and numerically much sounder:

MatlabFile 4-13. SavGol.m
function y1=SavGol(x,y,n,nd)
% Savitzky-Golay
y1=zeros(size(x));
x1=(-n:n)';
F(:,1)=ones(size(x1));
for j=1:nd
    F(:,j+1)=F(:,j).*x1;
end
F_plus=inv(F'*F)*F';
for i=1+n:length(x)-n
    a=F_plus*y(i-n:i+n);
    y1(i)=F(1+n,:)*a;
end
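The effect of centring can be checked directly by comparing the conditioning of the two design matrices; an illustrative check, not part of the book's code (the exact numbers depend on the Matlab version):

xr=(140:150)'; xc=(-5:5)';    % raw and centred abscissae
Fr=ones(11,5); Fc=ones(11,5);
for j=2:5
    Fr(:,j)=Fr(:,j-1).*xr;    % columns 1, x, x^2, x^3, x^4
    Fc(:,j)=Fc(:,j-1).*xc;
end
cond(Fr)                      % enormous: numerically rank deficient
cond(Fc)                      % modest: safe to (pseudo-)invert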
The routine SavGol.m is very basic. It does not include the beginning and the end of the curve, and it does not allow asymmetric selection of data points for the polynomial fitting. Both these features could easily be implemented; we leave it to the reader to improve the function accordingly (one possible sketch for the end points follows below). In its present form it cannot be used for non-equidistant x-values: F would have to be recalculated within the loop, but the vector x1 used to generate F would still have to be centred around zero.
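One possible way to include the first and last n points is to use truncated, asymmetric windows at the ends. The following sketch is a hypothetical extension (the function name SavGol_edges and its details are ours, not the book's):

function y1=SavGol_edges(x,y,n,nd)
% Savitzky-Golay smoothing including the end points
y1=SavGol(x,y,n,nd);                       % interior points as before
for i=[1:n, length(x)-n+1:length(x)]
    i1=max(1,i-n); i2=min(length(x),i+n);  % truncated, asymmetric window
    x1=x(i1:i2)-x(i);                      % centre the abscissa at x(i)
    F=ones(length(x1),nd+1);
    for j=1:nd
        F(:,j+1)=F(:,j).*x1;
    end
    a=F\y(i1:i2);
    y1(i)=a(1);                            % polynomial value at x1=0
end

Because each window is centred at x(i), the smoothed value is simply the constant coefficient a(1).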
Calculation of the Derivative of a Curve
The computation of the derivative would be straightforward for perfect, noise-free data. The first derivative could be well approximated by

$$\frac{dy_i}{dx_i} \approx \frac{y_{i+1} - y_i}{x_{i+1} - x_i} \qquad (4.45)$$
and to calculate higher derivatives the process is repeated. For noisy data, this approach results in a disaster: the noise component is amplified with each differentiation and soon there is nothing left but amplified noise. A possible remedy is to use the Savitzky-Golay filter. An initial idea might be tempting: first smooth the data as just demonstrated, and subsequently differentiate the treated data using equation (4.45). There is a better way: the fitted polynomial can be differentiated analytically; this explicit computation is both faster and numerically safer. The data set is the same as the one used for smoothing, with the same amount of noise added. The 'true' derivatives are computed using the Savitzky-Golay algorithm on the noise-free data; this is not quite correct but suffices here. The three panels of Figure 4-25 display the results of three different computations of the derivative. The first plot shows the derivative calculated as simple differences between noisy data; there is only noise left. The next two panels show the derivatives calculated via 2nd and 4th order polynomials through 11 points. Again, a compromise has to be sought between noise reduction and the loss of narrow features. As with smoothing, a large number of data points defining the polynomial and a low polynomial order result in smooth curves that might suffer from loss of narrow features.
MatlabFile 4-14. Main_SavGol_Deriv.m
% Main_SavGol_Deriv.m
x=(1:150)';
y=gauss(x,50,25)+gauss(x,100,10)+gauss(x,125,2);
randn('state',0);
yn=y+0.1*randn(size(y));
yd=SavGol_deriv(x,y,2,2);          % the 'true' derivative
for i=2:length(y); ys(i)=yn(i)-yn(i-1); end
y1=SavGol_deriv(x,yn,5,2);
y2=SavGol_deriv(x,yn,5,4);
subplot(3,1,1);plot(x,ys,'.',x,yd,'k'); axis([0 150 -0.6 0.6]);
subplot(3,1,2);plot(x,y1,'.',x,yd,'k'); axis([0 150 -0.6 0.6]);
subplot(3,1,3);plot(x,y2,'.',x,yd,'k'); axis([0 150 -0.6 0.6]);
xlabel('x');
Figure 4-25. The top panel displays the true derivatives and those computed as the quotient of differences; the middle and bottom panels show the result of a 2nd and 4th degree polynomial fitted through 11 data points.
The calculation of the derivative of a general polynomial is straightforward:
$$y_i = a_1 + a_2 x_i + a_3 x_i^2 + a_4 x_i^3 + \dots + a_{nd+1} x_i^{nd} = \sum_{j=1}^{nd+1} a_j x_i^{j-1}$$

$$\left(\frac{dy}{dx}\right)_i = a_2 + 2a_3 x_i + 3a_4 x_i^2 + \dots + nd\,a_{nd+1} x_i^{nd-1} = \sum_{j=1}^{nd} j\,a_{j+1} x_i^{j-1} \qquad (4.46)$$
Defying Matlab elegance, one could write equation (4.46) as a loop, but it is certainly faster to vectorise the equation. The vectorised Matlab code (note that the polynomial degree equals the number of parameters minus one, nd=np-1)

dydx(i)=F(1+n,1:nd)*([1:nd]'.*a(2:nd+1));

is a bit less transparent. Here is the 'explanation':
$$\sum_{j=1}^{nd} j\,a_{j+1} x_i^{j-1} = \begin{bmatrix} x_i^0 & x_i^1 & x_i^2 & \dots & x_i^{nd-1} \end{bmatrix} \left( \begin{bmatrix} 1 \\ 2 \\ 3 \\ \vdots \\ nd \end{bmatrix} .\!* \begin{bmatrix} a_2 \\ a_3 \\ a_4 \\ \vdots \\ a_{nd+1} \end{bmatrix} \right)$$

$$= \begin{bmatrix} F(n{+}1,1) & F(n{+}1,2) & F(n{+}1,3) & \dots & F(n{+}1,nd) \end{bmatrix} \left( \begin{bmatrix} 1 \\ 2 \\ 3 \\ \vdots \\ nd \end{bmatrix} .\!* \begin{bmatrix} a(2) \\ a(3) \\ a(4) \\ \vdots \\ a(nd{+}1) \end{bmatrix} \right) \qquad (4.47)$$
MatlabFile 4-15. SavGol_deriv.m
function dydx=SavGol_deriv(x,y,n,nd)
% Savitzky-Golay
% polynomial interpolation, degree nd, through 2n+1 data points
dydx=zeros(size(x));
x1=(-n:+n)';
F(:,1)=ones(size(x1));
for j=1:nd
    F(:,j+1)=F(:,j).*x1;
end
F_plus=inv(F'*F)*F';
for i=1+n:length(x)-n
    a=F_plus*y(i-n:i+n);
    dydx(i)=F(1+n,1:nd)*([1:nd]'.*a(2:nd+1));
end
Polynomial Interpolation
The Savitzky-Golay algorithm can readily be adapted for polynomial interpolation; the computations are virtually identical to smoothing. In smoothing, a polynomial is fitted to a range of (x,y)-data pairs arranged around the x-value that needs to be smoothed. For polynomial interpolation, a polynomial is fitted to a set number of data points around the desired x-value, and the polynomial evaluated at that x gives the interpolated value. Polynomial fitting is a very important tool and, as expected, Matlab provides a set of functions for the task. Instead of adapting the Savitzky-Golay routine used previously, we demonstrate the handling of the Matlab routines polyfit.m and polyval.m. The developed function is a very general polynomial interpolation routine that deals with almost anything imaginable. An obvious name would be polypol; we couldn't resist the temptation and rearranged two consonants to turn it into lolipop.

MatlabFile 4-16. Main_lolipop.m
% Main_lolipop
x=(1:.5:6)';                 % x/y data pairs
y=gauss(x,4,3);
randn('seed',0);
yn=y+0.02*randn(size(y));
x1=1:0.1:5;                  % x1-values for which y1 is interpolated
y1=lolipop(x,yn,x1,3,6);     % interpolation
plot(x,yn,'+',x1,y1,'.k');
xlabel('x');ylabel('y');
Figure 4-26. The result of polynomial interpolation.
In Main_lolipop.m a Gaussian curve is calculated at 0.5 intervals and a small amount of noise is added; this also demonstrates the certain level of smoothing that results from polynomial fitting. The function lolipop.m interpolates the value y1 at position x1, using a polynomial of degree nd through the neighbouring (x,y)-data points whose x-values are closest to x1.

MatlabFile 4-17. lolipop.m
function y1=lolipop(x,y,x1,nd,npoints)
% General Polynomial Inter/Extrapolation, degree nd, using npoints
% x,y,x1,y1 vectors - x,y do not have to be the same length as x1,y1
% nd: degree of polynomials
% npoints: number of total points to define each polynomial
for i=1:length(x1)
    N=sortrows([x y abs(x-x1(i))],3);  % sort x,y by abs(x-x1(i))
    x_npoints=N(1:npoints,1);          % npoints nearest nodes
    y_npoints=N(1:npoints,2);
    a=polyfit(x_npoints-mean(x_npoints),y_npoints,nd);  % polyn. par.
    y1(i)=polyval(a,x1(i)-mean(x_npoints));             % interpolate
end
The Matlab functions polyfit.m and polyval.m need to be explained. a=polyfit(x,y,nd) fits a polynomial of degree nd to the x/y data pairs; a is the vector of coefficients defining the polynomial. Note that the elements of a are arranged in the opposite order to 'our' a as defined in (4.26). The command y1(i)=polyval(a,x1(i)) evaluates the polynomial defined by a at x1(i). Using lolipop.m, the routine Main_lolipop.m fits a polynomial of degree nd=3 through the 6 nodes closest to x1(i). These 6 nodes are determined by the first 3 lines of the loop, taking advantage of the sortrows function. Note that lolipop.m also allows for extrapolation, but choosing x1-values outside the range of x is not recommended.
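The coefficient ordering of polyfit is a common stumbling block; a quick illustrative check (hypothetical numbers, not from the book):

x=(0:5)'; y=3+2*x+0.5*x.^2;   % known parabola with a1=3, a2=2, a3=0.5
a=polyfit(x,y,2)              % returns [0.5 2 3]: highest power first
polyval(a,2)                  % evaluates to 3+2*2+0.5*2^2 = 9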
4.2.8 Linear Regression with Multivariate Data
In this chapter we expand the linear regression calculation into higher dimensions, i.e. instead of a vector y of measurements and a vector a of fitted linear parameters, we deal with matrices Y of data and A of parameters. We derive the new concept by using a chemical example based on absorption data. First, consider a consecutive reaction A→B→C, with rate constants k1 and k2, where the absorption at one particular wavelength was recorded as a function of time. Let's say our task is to determine the molar absorptivities of species A, B and C at this wavelength, knowing all individual concentrations at all reaction times. Previously we used the notation F for the matrix of the 'known' functions. In many chemical applications involving spectroscopic absorption
measurements, an equivalent matrix is made up of molar concentrations of several chemical species. In these circumstances, we call the matrix C, referring to Chapter 3.1, Beer-Lambert's Law. The above example of recording the kinetics of the reaction A→B→C at one wavelength is then best described by the matrix equation:

$$y = Ca + r \qquad (4.48)$$
The (ns×1) column vector y contains the absorption data at ns reaction times; the concentration profiles of the three species A, B and C form the columns of an (ns×3) matrix C and their molar absorptivities form a (3×1) column vector a. Vector r contains the residuals between y and C×a and has the same dimensions as y. Having the measurements y and supposedly knowing C, it is, as earlier in equation (4.30), a linear least-squares calculation that computes the best a:

$$a = C^+ y \qquad (4.49)$$
The next step is to imagine having measured whole absorption spectra as a function of time, e.g. by using a diode array spectrophotometer. The kinetic traces at nl different wavelengths are arranged as columns of a matrix Y and, similarly, the molar absorptivities as columns of a matrix A; thus (4.48) transforms into

$$Y = CA + R \qquad (4.50)$$
It is most helpful to recall the 'rectangle' notation for this equation introduced in Chapter 3.1, Beer-Lambert's Law:
$$\underset{(ns \times nl)}{\mathbf{Y}} = \underset{(ns \times nc)}{\mathbf{C}} \times \underset{(nc \times nl)}{\mathbf{A}} + \underset{(ns \times nl)}{\mathbf{R}}$$
Figure 4-27. The structure of Beer-Lambert's law in matrix notation.

It is important to realise that each column a:,j can be calculated independently from the appropriate column y:,j, irrespective of all the other wavelengths, using equation (4.49). The pseudo-inverse C+ is the same for all. The equivalent of (4.49) for all wavelengths can be written as

$$A = C^+ Y \qquad (4.51)$$

(in Matlab: A = C\Y)
Equation (4.51) minimises the sum of squares, ssq, of the residuals in R, defined by the multivariate equivalent to equation (4.2):
$$ssq = \sum_{i=1}^{ns} \sum_{j=1}^{nl} r_{i,j}^2 \qquad (4.52)$$
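In Matlab the double sum collapses to a single expression; a minimal sketch, assuming a residual matrix R is in the workspace:

ssq=sum(R(:).^2);             % equivalently: sum(sum(R.*R))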
The complete matrix A, containing the absorption spectra of the components A, B and C, is computed in one step! Presently, we are able to compute A knowing Y and C. Computing Y knowing C and A is trivial. What about calculating the concentration matrix C, knowing Y and A? We could transpose equation (4.50):

$$Y^t = A^t C^t \qquad (4.53)$$
Applying the equivalent of (4.51), we can write

$$C^t = (A^t)^+ Y^t \qquad (4.54)$$

and transposing back

$$C = Y A^+ \qquad (4.55)$$

(in Matlab: C = Y/A)
This is, of course, the matrix equivalent of equation (4.31). There is another, more casual, way of deriving equations (4.51) and (4.55). We start with

$$Y = CA \qquad (4.56)$$
multiply the equation with $A^t$ from the right

$$Y A^t = C A A^t \qquad (4.57)$$
and then with $(AA^t)^{-1}$, again from the right

$$Y A^t (AA^t)^{-1} = C\, A A^t (AA^t)^{-1} \qquad (4.58)$$
to result in

$$C = Y A^t (AA^t)^{-1} = Y A^+ \qquad (4.59)$$

noting that

$$A^+ = A^t (AA^t)^{-1} \qquad (4.60)$$
The same sequence for the calculation of A, given Y and C:
$$\begin{aligned} Y &= CA \\ C^t Y &= C^t C A \\ (C^t C)^{-1} C^t Y &= (C^t C)^{-1} C^t C A \\ A &= (C^t C)^{-1} C^t Y = C^+ Y \end{aligned} \qquad (4.61)$$
Of course, all these derivations and equations hold for any matrix product of the kind Y=CA, irrespective of the physical meaning of the matrices. In addition, vectors are just special matrices, so the equations also hold for vectors. The derivation of equation (4.59) and its equivalent (4.61) is not mathematically rigorous but, more importantly, the results are correct in the least-squares sense: they are identical to the ones derived via the normal equations, e.g. equation (4.26). Figure 4-28 represents the shapes of the matrices.
Figure 4-28. Schematic representations of the dimensions of the matrices in equations (4.50), (4.51) and (4.55).

Referring back to Matlab, it is very important to use the correct slash operator, \ or /, for the left and right pseudo-inverse. Applying the wrong one will invariably result in an error message or, worse, in a potentially undetected error.
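A small illustrative check of the two operators with arbitrary dimensions (none of these numbers come from the book):

C=rand(50,3);                 % ns x nc, concentration-like matrix
A0=rand(3,21);                % nc x nl, spectra-like matrix
Y=C*A0;                       % noise-free 'measurement'
A=C\Y;                        % left division, equation (4.51): recovers A0
C2=Y/A0;                      % right division, equation (4.55): recovers C
norm(A-A0), norm(C2-C)        % both zero to machine precision

With these rectangular matrices the wrong operator simply fails with a dimension error; with square matrices, however, it would silently produce a wrong result.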
Applications
First we construct a kinetic measurement which we then analyse in both of the above ways. The reaction is the set of two consecutive first order reactions with rate constants k1 and k2:

$$A \xrightarrow{k_1} B \xrightarrow{k_2} C \qquad (4.62)$$
Integration of the appropriate differential equations for the reaction scheme is straightforward; the resulting equations for the concentrations of A, B, and C as a function of time (see Chapter 3.4.2, Rate Laws with Explicit Solutions) are:

$$\begin{aligned} [A] &= [A]_0\, e^{-k_1 t} \\ [B] &= [A]_0 \frac{k_1}{k_2 - k_1} \left( e^{-k_1 t} - e^{-k_2 t} \right) \\ [C] &= [A]_0 - [A] - [B] \end{aligned} \qquad (4.63)$$
The absorption spectra are modelled by Gaussians (see 3.2 Chromatography / Gaussian Curves).

MatlabFile 4-18. Data_ABC.m
function [t,lam,Y,C,A]=Data_ABC
% absorbance data generation for A -> B -> C
t  =[0:25:4000]';    % reaction times
lam=400:10:600;      % wavelengths
k  =[.003; .0015];   % rate constants
A_0=1e-3;            % initial concentration of A
C(:,1)=A_0*exp(-k(1)*t);                                 % concentrations of A
C(:,2)=A_0*k(1)/(k(2)-k(1))*(exp(-k(1)*t)-exp(-k(2)*t)); % conc. of B
C(:,3)=A_0-C(:,1)-C(:,2);                                % concentrations of C
A(1,:)=1e3*gauss(lam,450,50);   % molar spectrum of A
A(2,:)=4e2*gauss(lam,500,50);   % molar spectrum of B
A(3,:)=5e2*gauss(lam,550,50);   % molar spectrum of C
Y=C*A;                          % applying Beer's law to generate Y
randn('seed',0);                % fixed start for random number generator
Y=Y+0.01*randn(size(Y));        % standard deviation 0.01
The short routine Main_ABC_3D.m reads in the absorbance data modelled by Data_ABC.m and plots them in Figure 4-29 against wavelength and reaction time.

MatlabFile 4-19. Main_ABC_3D
% Main_ABC_3D
[t,lam,Y]=Data_ABC;
mesh(lam,t,Y);
xlabel('wavelength')
ylabel('time')
zlabel('absorbance')
Figure 4-29. Mesh-plot of the absorption matrix for a consecutive reaction A→B→C, measured at several wavelengths.

We now have a set of data that we can analyse in the two ways.
Computation of Component Spectra, Known Concentrations
Assume we know the two rate constants k1 and k2, which allow the computation of C. Assume further that we have only measured spectra between time=200 and 1200 (a fast reaction with significant instrumental dead time). The task is to determine the three absorption spectra of the pure compounds A, B and C. None of the three is directly accessible in the range of available spectra because of severe overlap.

MatlabFile 4-20. Main_ABC_Lin1.m
% Main_ABC_Lin1
[t,lam,Y,C,A]=Data_ABC;
C_p=C(9:49,:);
Y_p=Y(9:49,:);
A_calc=C_p\Y_p;    % component spectra via multivariate linear regression
plot(lam,A,'k:',lam,A_calc,'k-');
xlabel('wavelength');ylabel('mol. absorptivity');
Figure 4-30. Calculated and true absorption spectra of species A, B and C (from left to right).
Computation of Component Concentrations, Known Spectra
The molar absorptivity spectra of species A, B and C have been determined independently. Main_ABC_Lin2.m calculates the corresponding concentration profiles of the 3 components using the complete data set from Data_ABC.m. They are shown in Figure 4-31.

MatlabFile 4-21. Main_ABC_Lin2.m
% Main_ABC_Lin2
[t,lam,Y,C,A]=Data_ABC;
C_calc=Y/A;        % conc. profiles via multivariate linear regression
plot(t,C,'k:',t,C_calc,'k-');
xlabel('time'); ylabel('concentration');
Figure 4-31. Calculated and true concentration profiles of species A, B and C.

It is probably more realistic to assume that we know neither the rate constants nor the absorption spectra for the above example. All we have is the measurement Y, and the task is to determine the best set of parameters, which include the rate constants k1 and k2 as well as the molar absorptivities, the whole matrix A. This looks like a formidable task as there are many parameters to be fitted: the two rate constants as well as all elements of A. In Multivariate Data, Separation of the Linear and Non-Linear Parameters (p.162), we start tackling this problem.
The Pseudo-Inverse in Excel
We have encountered Excel's LINEST as a tool for linear regression. Unfortunately, LINEST cannot be generalised from vectors to matrices. To deal with matrices, we have no option but to use equations (4.59) and (4.61). It is possible to do so, but not as convenient as in Matlab. In order to keep the spreadsheet reasonably small, the dimensions are much smaller than those in the Matlab examples. It is still a consecutive reaction scheme; the spectra were recorded at 11 times and at 6 wavelengths.
ExcelSheet 4-3. Chapter3.xls-pseudoinverse
The intermediate and final formulas used on the sheet are:
=TRANSPOSE(C5:E15)
=MMULT(G5:Q7,C5:E15)
=MINVERSE(G10:I12)
=MMULT(G5:Q7,A18:F28)
=MMULT(K10:M12,H15:M17)
=MMULT(MINVERSE(MMULT(TRANSPOSE(C5:E15),C5:E15)),MMULT(TRANSPOSE(C5:E15),A18:F28))
Figure 4-32. Excel spreadsheet applying the equation A=(Ct C)-1 Ct Y in two ways, stepwise and in one big formula.

There are two paths to reach the result. The first path is a stepwise construction of a series of intermediate matrices; they are framed in Figure 4-32:
(a) Ct
(b) Ct C
(c) (Ct C)-1
(d) Ct Y
and finally (e) (Ct C)-1 Ct Y
This approach uses a lot of space on the spreadsheet; in particular, the transpose of a long column is a very wide row. However, it is reasonably easy to detect potential errors in the formulas. The other path does the whole calculation in one step; the result is the shaded matrix Aone_eq. This is a very neat approach on the spreadsheet, but a very difficult equation to be entered in matrix mode:
=MMULT(MINVERSE(MMULT(TRANSPOSE(C5:E15),C5:E15)),MMULT(TRANSPOSE(C5:E15),A18:F28))
Remember to preselect a rectangle of the correct dimensions and use Ctrl-Shift-Enter for matrix equations in Excel!
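For comparison, the same calculation is a one-liner in Matlab; a sketch assuming matrices C and Y hold the contents of ranges C5:E15 and A18:F28:

A=inv(C'*C)*C'*Y;             % explicit normal equations, as in the spreadsheet
A=C\Y;                        % the numerically preferable equivalent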
4.3 Non-Linear Regression
Non-linear regression calculations are extensively used in most sciences. The goals are very similar to the ones discussed in the previous chapter on Linear Regression. Now, however, the function describing the measured data is non-linear and, as a consequence, instead of an explicit equation for the computation of the best parameters, we have to develop iterative procedures. Starting from initial guesses, the parameters are iteratively improved or 'fitted', i.e. those parameters are determined that result in the optimal 'fit' or, in other words, in the minimal sum of squares of the residuals. There are a multitude of methods for this task. Those that are conceptually simple are usually computationally intensive and slow, while the fast algorithms have a more complex mathematical background. We start this chapter with the Newton-Gauss-Levenberg/Marquardt algorithm, not because it is the simplest but because it is the most powerful and fastest method. We can't think of many instances where it is advantageous to use an alternative algorithm. Because of its relative complexity and tremendous usefulness, we develop the Newton-Gauss-Levenberg/Marquardt algorithm in several small steps and thus examine it in more detail than many of the other algorithms introduced in this book.
4.3.1 The Newton-Gauss-Levenberg/Marquardt Algorithm
Later, in Chapter 4.4, General Optimisation, we discuss non-linear least-squares methods where the sum of squares is minimised directly. What is meant by that statement is that ssq is calculated for different sets of parameters p, and the changes of ssq as a function of the changes in p are used to direct the parameter vector towards the minimum. In this section we demonstrate that it is possible to use the complete vector or matrix of residuals to drive the iterative refinement towards the minimum. As expected in an iterative algorithm, we start from an initial guess for the parameters. This parameter vector is subsequently improved by the addition of an appropriate parameter shift vector δp, resulting in a better, but probably still not perfect, fit. From this new parameter vector the process is repeated until the optimum is reached. As with almost any other non-linear problem that has to be solved iteratively, linearisation via a Taylor expansion, truncated after very few terms, is the solution. In Chapter 4.1, Background to Least-Squares Methods, e.g. in Figure 4-3 and Figure 4-5, we have seen that for univariate data the vector r of residuals, and thus the sum of squares ssq, is a function of the measurement y and the parameters p of the model of choice.
$$r = f(y, p) \qquad (4.64)$$
The basic principle of the algorithm is to add a shift vector δp to the parameter vector p. The shift vector is computed with the aim of producing a new parameter vector for which ssq is minimal, or at least smaller. The residuals r(p+δp) after the application of the shift vector are approximated by a Taylor series expansion; with sufficient terms, any precision for the approximation can be achieved.

$$r(p + \delta p) = r(p) + \frac{1}{1!}\frac{\partial r(p)}{\partial p}\,\delta p + \frac{1}{2!}\frac{\partial^2 r(p)}{\partial p^2}\,\delta p^2 + \dots \qquad (4.65)$$
As done previously, in The Newton-Raphson Algorithm (p.48), we neglect all but the first two terms in the expansion. This leaves us with an approximation that is not very accurate but, since it is a linear equation, is easy to deal with. Algorithms that include additional higher terms in the Taylor expansion often result in fewer iterations but require longer computation times due to the calculation of higher order derivatives.

$$r(p + \delta p) \cong r(p) + \frac{\partial r(p)}{\partial p}\,\delta p = r(p) + J\,\delta p \qquad (4.66)$$
The derivative ∂r(p)/∂p is known as the Jacobian J. The task is to compute the 'best' parameter shift vector δp that minimises the new residuals r(p+δp) in the least-squares sense. This is a linear regression problem with the explicit solution:

$$\delta p = -J^+\, r(p) \qquad (4.67)$$

Note that equation (4.66), $r(p+\delta p) \cong r(p) + J\,\delta p$, has the same structure as equation (4.17), $r = y - Fa$, where the calculation of a was $a = F^+ y$. The Taylor series expansion is always only an approximation and therefore the shift vector δp will not lead to the minimum directly. However, the new parameter vector p+δp will usually be better than the preceding p. Thus, an iterative process should move towards the optimal parameters.
A First, Minimal Algorithm
We are now in a position to devise a first, very crude program that should, starting from a set of initial guesses, move towards the best fit. Below, a flow diagram is given that represents the basic principle of the Newton-Gauss algorithm:
1. guess parameters, p = pstart
2. calculate residuals r(p) and the sum of squares, ssq
3. calculate the Jacobian J
4. calculate the shift vector δp, set p = p + δp, and return to step 2
Figure 4-33. First version of the Newton-Gauss algorithm.

The crucial part of this algorithm is the computation of J, the derivatives of the residuals with respect to the parameters. It might be best to demonstrate this by an example. An exponential curve, including some noise, is generated by the function Data_exp.m. The curve is defined by three parameters: the rate p1, the amplitude p2 and the value at infinite time p3.

$$y = p_3 + p_2\, e^{-p_1 t} \qquad (4.68)$$
MatlabFile 4-22. Data_exp.m
function [t,y]=Data_exp
p=[2e-2;-4;10];
t=(1:2:100)';
y=p(3)+p(2)*exp(-p(1)*t);
randn('seed',0);
y=y+5e-2*randn(size(y));

MatlabFile 4-23. Main_exp_2d.m
% Main_exp_2d
[t,y]=Data_exp;
plot(t,y,'.');
xlabel('time'); ylabel('y');
The main routine Main_exp_2d.m reads in the data and plots them in Figure 4-34.
Figure 4-34. An exponential function.

The derivatives of the vector r of residuals with respect to the parameter vector p are given by the following equations. Note that the first column of the Jacobian matrix contains the derivative of the residuals with respect to the first parameter, the second column with respect to the second, etc. In this example the derivatives can be computed explicitly; later we will introduce the computation of numerical derivatives.

$$\begin{aligned} r &= y - y_{calc} = y - \left( p_3 + p_2\, e^{-p_1 t} \right) \\ j_{:,1} &= \frac{\partial r}{\partial p_1} = p_2\, t\, e^{-p_1 t} \\ j_{:,2} &= \frac{\partial r}{\partial p_2} = -e^{-p_1 t} \\ j_{:,3} &= \frac{\partial r}{\partial p_3} = -1 \end{aligned} \qquad (4.69)$$
With that, we are in a position to write a short program, Main_NG1.m, that iterates towards the optimal set of parameters and fits the curve in Figure 4-34, provided the initial guesses are reasonable.

MatlabFile 4-24. Main_NG1.m
% Main_NG1
[t,y]=Data_exp;
p=[.01; -3; 15];               % initial guesses [rate const, amp, inf]
for i=1:10
    y_calc=p(3)+p(2)*exp(-p(1)*t);
    r=y-y_calc;
    ssq(i)=sum(r.*r);
    J(:,1)=p(2)*t.*exp(-p(1)*t);
    J(:,2)=-exp(-p(1)*t);
    J(:,3)=-1;
    delta_p=-J\r;              % calculate parameter shifts
    p=p+delta_p;               % add parameter shifts
end
p
subplot(1,2,1); plot(t,y,'.',t,y_calc);xlabel('time');ylabel('y');
subplot(1,2,2); plot(log(ssq),'+');xlabel('iteration');ylabel('log(ssq)');

p =
    0.0195
   -3.9882
   10.0252
Figure 4-35. Fitted exponential and iterative decrease of ssq.

The left panel of Figure 4-35 shows the result of the fit. The right panel displays the sum of squares for each iteration, featuring a continuous decrease. The program at this stage is very crude and needs several stages of improvement. For the next version we implement two new measures: a proper termination criterion and numerical calculation of the derivatives for the Jacobian.
Termination Criterion, Numerical Derivatives
We start with the termination criterion. The right panel of Figure 4-35 immediately tells us that iterations 6 to 10 are wasted: the minimal ssq has been reached at the fifth iteration and there is no further improvement. There are different ways of testing whether there is continuing improvement of the fit or whether the progress is finished and, hopefully, the minimal ssq has been reached. In the progress of the iterations, the shifts δp, as well as the sum of squares, usually decrease continuously; thus both could be inspected for constancy. The most common and intuitively correct test is the constancy of the sum of squares, as indicated in Figure 4-35. If a generally applicable routine is envisaged, it is not possible to test the absolute difference between old and new sum of squares: depending on the data, ssq can be very small or very large. Therefore, a convergence criterion analysing the relative change in ssq has to be applied. The iterations are stopped once the absolute value of the relative change is less than a preset value μ, typically μ=10-4.

$$\left| \frac{ssq_{old} - ssq}{ssq_{old}} \right| \le \mu \qquad (4.70)$$
1. guess parameters, p = pstart
2. calculate residuals r(p) and the sum of squares, ssq
3. ssq constant? yes: end, display results
4. no: calculate the Jacobian J, calculate the shift vector δp, set p = p + δp, and return to step 2

Figure 4-36. Improved Newton-Gauss algorithm, including a termination criterion.
And now we introduce numerical derivatives. In the example above, we used explicit formulas for the derivatives of the residuals with respect to the parameters. Often it is not easy, or even impossible, to work out the correct equations. Numerical computation of the derivatives is always possible; usually it is slower and also numerically less accurate. The general formula is:

$$\frac{\partial r}{\partial p_i} \cong \frac{r(p + \Delta p_i) - r(p)}{\Delta p_i} \qquad (4.71)$$

This is a rather casual notation and we need to clarify what is meant: p+Δpi is a new parameter vector with only the i-th parameter pi shifted by the small amount Δpi. In Main_NG2.m, Δpi is calculated as 1×10-4 pi. The factor 1×10-4 is somewhat arbitrary and experimentation is usually the best way of determining the optimal value. Note that $\frac{\partial r}{\partial p} = \frac{\partial (y - y_{calc})}{\partial p} = -\frac{\partial y_{calc}}{\partial p}$, since $\frac{\partial y}{\partial p} = 0$ due to the invariance of the measured data y; thus we only need to compute ycalc, not r, which is slightly quicker. In equation (4.72) this is worked out for the example.

$$\begin{aligned} j_{:,1} &= -\frac{\partial y_{calc}}{\partial p_1} = -\frac{(p_3 + p_2\, e^{-1.0001\, p_1 t}) - (p_3 + p_2\, e^{-p_1 t})}{10^{-4}\, p_1} \\ j_{:,2} &= -\frac{\partial y_{calc}}{\partial p_2} = -\frac{(p_3 + 1.0001\, p_2\, e^{-p_1 t}) - (p_3 + p_2\, e^{-p_1 t})}{10^{-4}\, p_2} \\ j_{:,3} &= -\frac{\partial y_{calc}}{\partial p_3} = -\frac{(1.0001\, p_3 + p_2\, e^{-p_1 t}) - (p_3 + p_2\, e^{-p_1 t})}{10^{-4}\, p_3} \end{aligned} \qquad (4.72)$$
The Matlab program Main_NG2.m implements the additions: a termination criterion and numerical derivatives. Refer to the Matlab Help Desk for information on the while ... end loop and the break command.

MatlabFile 4-25. Main_NG2.m
% Main_NG2
[t,y]=Data_exp;
p=[.01; -5; 15];        % initial guesses [rate const, amp, inf]
ssq_old=1e50;
while 1
    y_calc=p(3)+p(2)*exp(-p(1)*t);
    r=y-y_calc;
    ssq=sum(r.*r)
    if (ssq_old-ssq)/ssq_old<1e-4
        break
    end
    ssq_old=ssq;
    J(:,1)=-((p(3)+p(2)*exp(-1.0001*p(1)*t))-y_calc)/(1e-4*p(1));
    J(:,2)=-((p(3)+1.0001*p(2)*exp(-p(1)*t))-y_calc)/(1e-4*p(2));
    J(:,3)=-((1.0001*p(3)+p(2)*exp(-p(1)*t))-y_calc)/(1e-4*p(3));
    delta_p=-J\r;       % calculate parameter shifts
    p=p+delta_p;        % add parameter shifts
end

ssq = 636.6850
ssq = 39.6328
ssq = 0.1147
ssq = 0.1071
ssq = 0.1071
The reader is invited to investigate the robustness of this algorithm and check how bad the initial guesses can be before the whole process falls over. Remember, the computations are based on approximations and thus there is no guarantee that the process will converge. The further the starting guesses from the minimum, the more likely the catastrophe will occur. Suddenly the sum of squares increases dramatically and soon there are error messages and the process comes to a grinding halt. In such a case, one could start the iterations with a new set of initial guesses, but this is cumbersome and requires too much time. In order to make the program practically useful, we need another level of improvement.
The Levenberg/Marquardt Extension
This new version of the Newton-Gauss algorithm incorporates two further improvements. The first is the implementation of the Marquardt method of dealing with divergence. The second is the reformulation of the structure of the algorithm, using a function to compute the residuals. This makes the Newton-Gauss routine much more adaptable to new models, as only a different function for the calculation of the residuals has to be written; the actual Newton-Gauss-now-Levenberg/Marquardt algorithm remains the same. The convergence of the Newton-Gauss algorithm close to the minimum is usually excellent (quadratic). Refer to Figure 4-5 and Figure 4-6, which show the highly irregular ssq-surface far from the minimum and the parabolic behaviour close to the minimum. If the starting guesses are poorly chosen, the shift vector δp, as calculated by equation (4.67), can point in a wrong direction, or it can be too long, or both. The result is an increased ssq, divergence, and usually a quick and dramatic crash of the program.
Marquardt, based on ideas by Levenberg, suggested a very elegant and efficient method to manage the problems associated with divergence. The pseudo-inverse for the calculation of the shift vector in equation (4.67) has traditionally been computed as J+ = (JtJ)-1Jt. Adding a certain number, the Marquardt parameter mp, to the diagonal elements of the square matrix JtJ prior to its inversion has two consequences: (a) it shortens the shift vector δp and (b) it turns its direction towards steepest descent. The larger the Marquardt parameter, the larger the effect. In matrix formulation, we can write:

$$\delta p = -\left( J^t J + mp\, I \right)^{-1} J^t\, r(p) \qquad (4.73)$$
where I is the identity matrix of the appropriate size. If we want to use the preferable Matlab 'back-slash' notation, δp = -J\r(p), we achieve the same by appending a diagonal matrix containing the Marquardt parameter to the lower end of J and adjusting the length of the residual vector by appending np zeros to the end of the vector r(p). This is visualised in Figure 4-37. It might be a useful exercise for the reader to verify the equivalence of this approach with the original equation (4.73).
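The suggested exercise can be done numerically; one detail emerges in doing so: appending mp·I to J adds mp² (not mp) to the diagonal of JtJ, so the two forms agree once this is accounted for. An illustrative check with arbitrary numbers (not from the book):

J=randn(20,3); r=randn(20,1); mp=0.5;
dp1=-inv(J'*J+mp^2*eye(3))*J'*r;    % equation (4.73), with mp^2 on the diagonal
J_mp=[J; mp*eye(3)];                % augmented Jacobian, as in Figure 4-37
r_mp=[r; zeros(3,1)];               % augmented residual vector
dp2=-J_mp\r_mp;
norm(dp1-dp2)                       % zero to machine precision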
Figure 4-37. Appending the Marquardt parameter mp to the Jacobian J and residual vector r.
Depending on the change of the sum of squares the Marquardt parameter is adjusted; the parameter is reduced upon convergence and increased otherwise. There are no general rules on how exactly this should be done in detail; it depends on the specific case. Usually convergence occurs with no Marquardt parameter at all; in the Matlab program Main_Chrom.m, it is thus initialised as zero.
If required, the initial value for the Marquardt parameter, mp, in case of divergence has to be chosen sensibly as well; the original suggestion was to use the value of the largest diagonal element of JtJ. This is, however, not crucial, and in the Matlab function nglm.m further below we set this initial value, if required, to one. The complexity of the flow diagram shown below might be surprising. A few remarks are appropriate: it is possible that the Marquardt parameter reaches a high value and this results in a very small shift vector. Consequently, the change in ssq gets very small too and the algorithm decides prematurely that the minimum has been reached. To prevent this, one last iteration is done without a Marquardt parameter (mp=0) if the termination criterion is satisfied but mp is not yet zero.
1. guess parameters, p = pstart; choose the initial value for mp
2. calculate residuals r(p) and the sum of squares, ssq
3. compare ssqold with ssq:
   ssq smaller (convergence): mp = mp/3
   ssq approximately equal: if mp = 0, end and display results; otherwise set mp = 0 for one final iteration
   ssq larger (divergence): mp = mp×5 (mp = 1 if it was zero)
4. calculate the Jacobian J
5. calculate the shift vector δp, set p = p + δp, and return to step 2

Figure 4-38. The Newton-Gauss algorithm after implementation of the Marquardt strategy.
In most instances the algorithm converges straightaway. In order to test the Marquardt extension, we need more difficult data to analyse. The function Data_chrom.m generates an overlapping set of two Gaussian peaks. Each peak is characterised by peak height, peak position and peak width; altogether there are six parameters to be fitted. Data_chrom.m generates the data; they are represented in Figure 4-39. Recall the function gauss.m that generates Gaussians, as introduced in Chapter 3.2, Chromatography / Gaussian Curves.

MatlabFile 4-26. Data_Chrom.m
function [t,y]=Data_chrom
t=(1:2:100)';
y=2*gauss(t,40,25)+3*gauss(t,70,30);
randn('seed',0);
y=y+5e-2*randn(size(y));     % noise of standard deviation 5e-2
The function nglm.m accepts the string fname, which contains the name of the function file that computes the residuals for the given model, the parameters p, the independent variables t and the measurements y. If a different model is to be fitted, all that needs to be done is to replace fname in the calling routine and, of course, to supply the function with that new name; importantly, nglm.m is not affected at all. nglm.m returns the fitted parameters p and the minimal sum of squares. The function is a Matlab translation of the flow chart in Figure 4-38. The Jacobian is computed numerically, which makes the routine very generally applicable. The few parameters defined at the very beginning are chosen somewhat arbitrarily; depending on the specific problem, different values might have to be used. This is particularly the case for the Marquardt parameter: in stubborn cases, the user should try a different initial value.
The main program Main_chrom.m loads the measurement, generates initial guesses for the parameters, calls nglm.m and reports the fitted parameters. fprintf is a C-based Matlab command that allows compact output; the reader is invited to read up on this command in the Matlab manuals.

MatlabFile 4-27. Main_Chrom.m
% Main_chrom
[t,y]=Data_chrom;
p0=[2;40;20;3;50;10];                  % guesses [height,center,width;...]
[p,ssq]=nglm('Rcalc_chrom',p0,t,y);    % call ngl/m
fprintf(1,'\n');
for i=1:length(p)
    fprintf(1,'p(%i) = %g\n',i,p(i));
end

it=0, ssq=107.204, mp=0, conv_crit=1
it=1, ssq=601.46, mp=0, conv_crit=-4.61043
it=2, ssq=96.9914, mp=1, conv_crit=0.0952631
it=3, ssq=85.0839, mp=0.333333, conv_crit=0.122769
it=4, ssq=47.7829, mp=0.111111, conv_crit=0.438403
it=5, ssq=23.7545, mp=0.037037, conv_crit=0.502867
it=6, ssq=4.22909, mp=0.0123457, conv_crit=0.821967
it=7, ssq=10.7692, mp=0.00411523, conv_crit=-1.54646
it=8, ssq=5.00616, mp=0.0205761, conv_crit=-0.183744
it=9, ssq=0.79927, mp=0.102881, conv_crit=0.811006
it=10, ssq=0.174235, mp=0.0342936, conv_crit=0.782008
it=11, ssq=0.104042, mp=0.0114312, conv_crit=0.402863
it=12, ssq=0.103976, mp=0.00381039, conv_crit=0.00063486
it=13, ssq=0.103976, mp=0.00127013, conv_crit=3.87491e-007
it=14, ssq=0.103976, mp=0, conv_crit=3.87501e-007
p(1) = 2.00292
p(2) = 40.4661
p(3) = 25.9485
p(4) = 2.97704
p(5) = 70.3259
p(6) = 29.57

MatlabFile 4-28. nglm.m
function [p,ssq,Curv]=nglm(fname,p,t,y)
ssq_old=1e50;
mp=0;                          % Marquardt parameter
mu=1e-4;                       % convergence limit
delta=1e-6;                    % step size for numerical diff
it=0;
while it<50
    r0=feval(fname,p,t,y);     % call calc of residuals
    ssq=sum(r0.*r0);
    conv_crit=(ssq_old-ssq)/ssq_old;
    fprintf(1,'it=%i, ssq=%g, mp=%g, conv_crit=%g\n', ...
        it,ssq,mp,conv_crit);
    if abs(conv_crit) <= mu    % ssq_old=ssq, minimum reached !
        if mp==0
            break              % if Marquardt par zero, stop
        else                   % otherwise
            mp=0;              % set to 0, another iteration
            r0_old=r0;
        end
    elseif conv_crit > mu      % convergence !
        mp=mp/3;
        ssq_old=ssq;
        r0_old=r0;
        for i=1:length(p)
            p(i)=(1+delta)*p(i);
            r=feval(fname,p,t,y);
            J(:,i)=(r-r0)/(delta*p(i));
            p(i)=p(i)/(1+delta);
        end
    elseif conv_crit < -mu     % divergence !
        if mp==0
            mp=1;              % use Marquardt parameter
        else
            mp=mp*5;
        end
        p=p-delta_p;           % and take shifts back
    end
    J_mp=[J;mp*eye(length(p))];     % augment Jacobian matrix
    r0_mp=[r0_old;zeros(size(p))];  % augment residual vector
    delta_p=-J_mp\r0_mp;            % calculate parameter shifts
    p=p+delta_p;                    % add parameter shifts
    it=it+1;
end
Curv=J'*J;                          % Curvature matrix
The right panel of Figure 4-39 shows the measurements and the fitted curve; the left panel displays the development of ssq and mp. The ∗ markers represent ssq for each iteration and the • markers the Marquardt parameter, both on a logarithmic scale. An increase in ssq results in an increase of mp. The final mp=0 cannot be displayed on the logarithmic plot.
Figure 4-39. Development of ssq (∗) and mp (•) in the left panel and the final fit in the right panel.
The name of the function that computes the residuals is passed to the Newton-Gauss-Levenberg/Marquardt algorithm, ensuring that the latter is general and can be used for any data fitting task. Below is the listing of Rcalc_Chrom.m, which returns the residuals for the double-Gaussian fit. We are going to develop several other examples of similar functions for a very wide range of possible models.

MatlabFile 4-29. Rcalc_Chrom.m
function r=Rcalc_chrom(p,t,y)
y_calc=p(1)*gauss(t,p(2),p(3))+p(4)*gauss(t,p(5),p(6));
r=y-y_calc;
Standard Errors of the Parameters
We have already given the equations for the computation of the standard errors in the parameters optimised by linear regression, equation (4.32). The equations are very similar for parameters obtained via the Newton-Gauss algorithm; in fact, at the end of the iterative fitting, the relevant information has already been calculated. As before, the standard error σpi in the fitted parameters pi can be estimated from the expression

$$\sigma_{p_i} = \sigma_r \sqrt{d_{i,i}} \qquad (4.74)$$

where di,i is the i-th diagonal element of the inverted curvature matrix (JtJ)-1. Make sure there is no Marquardt parameter added to JtJ before inversion. σr represents the standard deviation of the residuals r; this is an estimate for the standard deviation of the measurement error in y.

$$\sigma_r = \sqrt{\frac{ssq}{df}} \qquad (4.75)$$

The denominator denotes the number of degrees of freedom, df; it is defined as the number of experimental values m (elements of y) minus the total number of optimised parameters np.

$$df = m - np \qquad (4.76)$$
The implementation into nglm.m is straightforward; the curvature matrix needs to be passed back to the main program:

function [p,ssq,Curv]=nglm(fname,p,t,y)
% guesses [height,center,width;...]
[p,ssq,Curv]=nglm('Rcalc_chrom',p0,t,y);
% call ngl/m
sig_r=sqrt(ssq/(length(y)-length(p))); % sigma_r sig_p=sig_r*sqrt(diag(inv(Curv))); % sigma_par fprintf(1,'\n'); for i=1:length(p) fprintf(1,'p(%i) = %g +-%g\n',i,p(i),sig_p(i)); end fprintf(1,'sig_r: %g\n',sig_r);
162 p(1) = p(2) = p(3) = p(4) = p(5) = p(6) = sig_r:
Chapter 4 2.00292 +-0.0295207 40.4661 +-0.337379 25.9485 +-0.576606 2.97704 +-0.0220969 70.3259 +-0.253657 29.57 +-0.472861 0.04861166
Statistical analysis is an important issue. For in-depth analysis, we refer to more specialised literature. Only a few short comments are given here. The calculated errors are correct for ideal conditions of normally and uniformly distributed noise. While this is probably hardly ever the case, the errors introduced by non-ideal data are often relatively small. Much more serious are 'non-statistical problems', such as fitting the wrong model, having solutions with slightly wrong concentrations, etc. the list is long. But even in these cases, the information about the standard deviations of the parameters is not useless, in particular, relative magnitudes clearly indicate which parameters are better defined than others. The availability of statistical information about the parameters is an important feature of the Newton-Gauss type algorithms. Alternative algorithms such as the simplex algorithms (see p.204), do not deliver standard errors for the parameters. A particularly ‘dangerous’ feature of the simplex algorithm is the possibility of inadvertently ‘fitting’ completely irrelevant parameters. The immediate result of the fit gives no indication about the relevance of the fitted parameters. This also applies to the Solver algorithm in Excel. Multivariate Data, Separation of the Linear and Non-Linear Parameters
The number of parameters used in the examples of non-linear fitting so far was reasonable; but reasonable, of course, is not a well defined quantity. A rough guess for an upper limit for the number of parameters might be something like a dozen, depending on the data. Consider now multivariate data, e.g. measurements at many wavelengths instead of only one, say kinetics followed by a diode-array spectrophotometer. Assume the instrument records the spectra at 1024 wavelengths. Compared with monovariate data (single wavelength), there is a dramatic increase in the number of parameters to be fitted. In addition to the rate constant, there are now 1024 molar absorptivities for each reacting component that need to be fitted. The algorithm devised so far cannot cope with that number of parameters. Rather than developing the principles theoretically, we base the following pages on the example of a consecutive reaction where absorption spectra were measured at 100 wavelengths during the course of the reaction.
163
Model-Based Analyses k2 k1 A ⎯⎯⎯ → B ⎯⎯⎯ →C
(4.77)
Recall equation (3.10), repeated here:
nl ns
Y
nc
=
C
nl
×
A
nl nc
+ ns
R
Figure 4-40. The structure of Beer-Lambert's law in matrix notation.
The central part of the Newton-Gauss algorithm is the computation of the residuals, which are now collected in the matrix R. R is a function of the measurement Y, the model, and the parameters. For the example, the parameters include the two rate constants k1 and k2, which we collect in the vector p and all molar absorptivities, all elements of the matrix A. For a given model we can write R = f (Y, p, A )
(4.78)
Easily hundreds of parameters! There is a widespread belief that such data cannot be fitted and there is a long list of attempts to overcome the problem of too many parameters. As we will see, it is easy. The secret is to realise that there are linear and non-linear parameters and that they can be separated, essentially reducing the number of parameters to be fitted iteratively, to the number of non-linear ones. The vector p defines the matrix C of concentrations and C, in turn, allows the computation of the best matrix A as a linear least-squares fit, A=C+Y, recall equation (4.61). Thus R can be computed as R = Y − CA = Y − C(C + Y ) = f (Y, p)
(4.79)
The crucial aspect of equation (4.79) is that the residuals are defined as a function of the rate constants only. We are back to a reasonable number of parameters, two for our example! Note, however, that the matrix A is always based on the matrix C. Thus, during the iterative refinement, where rate constants are still incorrect, C as well as A are incorrect too. Only at the very end will they be correct. Next, several deliberations with respect to the calculation of the Jacobian are appropriate. J=
∂R(p) ∂p
(4.80)
J is the derivative of a matrix with respect to a vector. What is the structure of such an object and more disturbingly, what is its pseudo-inverse?
164
Chapter 4
A straightforward way to organise J is as a 3-dimensional array: The derivative of R with respect to one particular parameter pi is a matrix of the same dimensions as R itself. The collection of all these ns×nl derivatives with respect to all the np non-linear parameters (e.g. rate constants) can be arranged in a 3-D array of dimensions ns×nl×np, all individual matrices ∂R/∂pi written slice-wise ‘behind’ each other. This is illustrated in Figure 4-41. nl
np
∂R ∂pi
ns
Figure 4-41. The Jacobian J as a 3-dimensional array, each slice is the derivative of R with respect to one particular parameter.
Organising J in a 3-D array is elegant, but it does not fit into the standard routines of Matlab for matrix manipulation. There is no command for the calculation of the pseudo-inverse J+ of such a 3-D array. There are several ways around this problem; one of them is discussed here. The matrices R(p) and R(p+δpi) as well as each matrix ∂R/∂pi are vectorised, i.e. unfolded into long column vectors. The np vectorised partial derivatives then form the columns of the matricised Jacobian J. The structure of the resulting analogue to equation (4.66) can be represented graphically in the following way: np
r(p +δp) =
+
J
× δp
+
nl×ns nl× ns
=
nl×ns nl× ns
nl×ns nl× ns
× np
r(p)
Figure 4-42. Structure of the equation r(p + δp) = r(p) + Jδp . Since J and R are now re-arranged into a matrix and a vector, we can write the solution without any difficulty.
Model-Based Analyses
δp = − J + × r(p)
165
(4.81)
Or, using the Matlab '\' notation for the pseudo-inverse:
δp = −J \ r(p)
(4.82)
Remember, p only comprises the non-linear parameters, i.e. in case of our consecutive reaction the rate constants k1 and k2. The linear parameters, the elements of the matrix A containing the molar absorptivities, have effectively been eliminated. For reaction mechanisms that have explicit solutions to the set of differential equations, it is always also possible to define the derivatives ∂C/∂p explicitly. In such cases the Jacobian can be calculated in explicit equations and time consuming numerical differentiations are not required. The equations are rather complex, although implementation in Matlab is straightforward. The calculation of numerical derivatives is always possible and for mechanisms that require numerical integration, it is the only option. We apply this Newton-Gauss-Levenberg/Marquardt algorithm to the consecutive reaction shown in Figure 4-29. This time we fit all parameters, rate constants and molar absorptivities. The group of Matlab programs comprises the data generation Data_ABC.m (see p.143), the main program Main_ABC.m, a new version of the Newton-Gauss-Marquardt algorithm nglm2.m and the function that calculates the residuals for the model and given parameters, Rcalc_ABC.m. The data generation and main programs are probably self-explanatory. There are very few changes to the NewtonGauss program. They only comprise the input and output arguments of the Newton-Gauss routine and the function call for the calculation of the residuals. The critical changes have been implemented in the function Rcalc_ABC.m, where the matrix A is computed from Y and C and the last line, where the matrix R is vectorised into a long vector r. Also note that for the calculation of the standard deviation, σr, of the residual matrix R, the degrees of freedom have to be adapted in accordance with the multivariate dimensions. Referring to equation (4.76), the number of degrees of freedom, df, is calculated by df = ns × nl − (np + nc × nl )
(4.83)
where ns, nl and nc denote the number of spectra (rows in Y and C), the number of wavelengths (columns in Y and A) and the number of components (columns in C and rows in A), as outlined in Figure 4-40. np is the number of non-linear parameters directly fitted by the Newton-GaussLevenberg/Marquardt algorithm, in this case the number of rate constants. Equation (4.83) calculates the degrees of freedom by taking the difference between the number of elements in Y or R and the total number of fitted parameters (np non-linear and nc×nl linear ones). MatlabFile 4-31. Main_ABC.m % Main_ABC
166
[t,lam,Y]=Data_ABC; A_0=1e-3; k0=[0.005;0.001];
Chapter 4
% get absorbance data % start parameter vector
[k,ssq,C,A,Curv]=nglm2('Rcalc_ABC',k0,A_0,t,Y);
% call ngl/m
sig_r=sqrt(ssq/(prod(size(Y))-length(k)-(prod(size(A)))));% sigma_r sig_k=sig_r*sqrt(diag(inv(Curv))); % sigma_par for i=1:length(k) fprintf(1,'k(%i): %g +- %g\n',i,k(i),sig_k(i)); end fprintf(1,'sig_r: %g\n',sig_r); subplot(2,1,1);plot(t,C);xlabel('time');ylabel('concentration'); subplot(2,1,2);plot(lam,A);xlabel('wavelength');ylabel('molar abs.'); it=0, ssq=0.67914, mp=0, conv_crit=1 it=1, ssq=0.348911, mp=0, conv_crit=0.486246 it=2, ssq=0.338371, mp=0, conv_crit=0.0302078 it=3, ssq=0.338344, mp=0, conv_crit=7.90499e-005 k(1): 0.00301802 +- 4.21232e-005 k(2): 0.00149765 +- 1.91055e-005 sig_r: 0.0101012 MatlabFile 4-32. nglm2.m function [p,ssq,C,A,Curv]=nglm2(fname,p,A_0,t,y) ssq_old=1e50; mp=0; mu=1e-4; delta=1e-6;
% Marquardt parameter % convergence limit % step size for numerical diff
it=0;
while it<50
    [r0,C,A]=feval(fname,p,A_0,t,y);     % call calc of residuals
    ssq=sum(r0.*r0);
    conv_crit=(ssq_old-ssq)/ssq_old;
    fprintf(1,'it=%i, ssq=%g, mp=%g, conv_crit=%g\n', ...
            it,ssq,mp,conv_crit);
    if abs(conv_crit) <= mu              % ssq_old=ssq, minimum reached !
        if mp==0
            break                        % if Marquardt par zero, stop
        else                             % otherwise
            mp=0;                        % set to 0, another iteration
            r0_old=r0;
        end
    elseif conv_crit > mu                % convergence !
        mp=mp/3;
        ssq_old=ssq;
        r0_old=r0;
        for i=1:length(p)
            p(i)=(1+delta)*p(i);
            r=feval(fname,p,A_0,t,y);
            J(:,i)=(r-r0)/(delta*p(i));
            p(i)=p(i)/(1+delta);
        end
    elseif conv_crit < -mu               % divergence !
        if mp==0
            mp=1;                        % use Marquardt parameter
        else
            mp=mp*5;
        end
        p=p-delta_p;                     % and take shifts back
    end
    J_mp=[J;mp*eye(length(p))];          % augment Jacobian matrix
    r0_mp=[r0_old;zeros(size(p))];       % augment residual vector
    delta_p=-J_mp\r0_mp;                 % calculate parameter shifts
    p=p+delta_p;                         % add parameter shifts
    it=it+1;
end
Curv=J'*J;                               % curvature matrix
MatlabFile 4-33. Rcalc_ABC.m
function [r,C,A]=Rcalc_ABC(k,A_0,t,Y)
C(:,1)=A_0*exp(-k(1)*t);                                 % concentrations of A
C(:,2)=A_0*k(1)/(k(2)-k(1))*(exp(-k(1)*t)-exp(-k(2)*t)); % conc. of B
C(:,3)=A_0-C(:,1)-C(:,2);                                % concentrations of C
A=C\Y;                                                   % elimination of linear parameters
R=Y-C*A;                                                 % residuals
r=R(:);                                                  % vectorising the residual matrix R
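As a quick check of this elimination mechanism, Rcalc_ABC can be called directly for any trial pair of rate constants; a minimal sketch (not part of the original package), assuming Data_ABC.m (p.143) is on the path and using arbitrary trial values:
[t,lam,Y]=Data_ABC;                      % simulated absorbance data
k_try=[0.004;0.002];                     % arbitrary trial rate constants
[r,C,A]=Rcalc_ABC(k_try,1e-3,t,Y);       % A is computed internally via A=C\Y
ssq_try=sum(r.*r)                        % sum of squares for the trial values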
Figure 4-43. Reaction A → B → C. Results of fitting the rate constants and component spectra. Shown are the concentration profiles C and component spectra A.
In Figure 4-43 the results of the data fitting process are illustrated. The top panel contains the matrix C, the bottom panel the matrix A. We leave it to the reader to compare these plots with the corresponding results from linear regression, shown in Figure 4-30 and Figure 4-31.

Constraint: Positive Component Spectra
Concentrations and molar absorptivities can only be positive, i.e. all elements of the matrices C and A must be positive numbers. Calculation of C, based on the chemical model and its mathematical function, automatically results in positive values only; if this is not the case, there is something wrong with the model or its translation into Matlab code. The linear regression A=C+Y, however, does not automatically result in exclusively positive values. This will often be the case during the iterative refinement, while the non-linear parameters are still not right, but it is also possible to encounter negative entries in A at the end of the fitting procedure. There are methods available that compute linear regressions with the restriction that all elements of the result are positive. The Matlab function nonneg.m, provided by C. Andersson, is freely available on the internet (http://www.models.kvl.dk/source/). Its use is demonstrated in the function Rcalc_ABC2, which can be employed instead of Rcalc_ABC.
MatlabFile 4-34. Rcalc_ABC2.m
function [r,C,A]=Rcalc_ABC2(k,A_0,t,Y)
C(:,1)=A_0*exp(-k(1)*t);                                 % concentrations of species A
C(:,2)=A_0*k(1)/(k(2)-k(1))*(exp(-k(1)*t)-exp(-k(2)*t)); % conc. of B
C(:,3)=A_0-C(:,1)-C(:,2);                                % concentrations of C
A=nonneg(Y',C')';                                        % pos spectra (Andersson)
R=Y-C*A;                                                 % residuals
r=R(:);                                                  % vectorising the residual matrix R
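If nonneg.m is not at hand, the same constraint can be imposed with Matlab's built-in lsqnonneg, which solves min||Ca−y|| subject to a≥0, one wavelength at a time. The following is a minimal alternative sketch, not the authors' code, for the single line A=nonneg(Y',C')':
for j=1:size(Y,2)
    A(:,j)=lsqnonneg(C,Y(:,j));          % non-negative spectrum at wavelength j
end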
Here is a typical application of this non-negative linear least-squares function. Chemical reaction schemes comprising only first-order reactions result in concentration profiles that are linear combinations of exponentials. However, such data can be fitted by several schemes of this kind and there is no unique relationship between the fitted exponentials and the mechanistic rate constants. In our example, the consecutive reaction A → B → C, the values for k1 and k2 can be swapped, resulting in identical fits. Depending on the initial guesses for the rate constants, the algorithm converges to either of the two possible solutions. This can easily be verified by the reader. Fortunately, results with interchanged rate constants often lead to meaningless (e.g. negative) or unreasonable molar absorptivity spectra A. Simple chemical reasoning and intuition usually help to resolve the ambiguity. Using Rcalc_ABC2, even when starting with the correct but swapped rate constants, the fitting ends up with the right result, as the calculated spectra would otherwise be negative. Note, however, that this is not always the case.
Depending on the spectra of the components, wrong results cannot always be identified by impossible, negative spectra (also see Chapter 4.4.3, Optimisation in Excel, the Solver, Figure 4-62 and Figure 4-63).

Structures, Fixing Parameters
In this sub-chapter, we introduce some drastic changes to the structure of the package of programs used so far, comprising the main program, the data generation, the Newton-Gauss program and the function that calculates the residuals. The changes do not affect any of the numerical aspects; they only involve the way information is passed between the functions. The aim is to make the package more adaptable for new fitting tasks. Several advantages can be gained: (a) adding and removing parameters that need to be fitted, and (b) fixing some parameters at particular values while leaving the others free to be fitted. In the present organisation of the programs, these tasks require the cumbersome rewriting of parameter lists and small but important changes in the different routines. Such processes are error prone and are better avoided.

In new and not yet understood chemical systems with many parameters to fit, it is sometimes difficult to come up with reasonable initial guesses for all of them. If some are seriously wrong, even the Marquardt algorithm collapses and there is no result. In such situations, it is often advantageous to keep a few reasonably well-defined parameters fixed while the others are fitted; sometimes a fair amount of fiddling is required to achieve the goal. At present this is difficult to accomplish, as it would require changes to several lines in several parts of the code.

These simplifications in the parameter handling are best organised using structures and cell arrays, as provided by Matlab to supplement the matrix as a basic data type. We introduce both, beginning with structures. Structures are Matlab arrays with named 'data containers' called fields. The fields of a structure can contain any kind of data: one field might contain a text string representing a name, another a scalar representing a billing amount, a third a matrix of medical test results, and so on (also see the Matlab Help on structures). Instead of having a multitude of variables, each one to be passed to and returned from functions, all can be collected in a structure that is passed around as one single unit. Using a structure to pass information into and out of functions makes life much easier: we put everything required by the different functions into one structure and pass the information around without having to write long variable lists as function arguments. More importantly, we do not have to change these lists for different kinds of tasks but simply add appropriate fields to, or remove them from, the existing structure.
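As a small illustration (the field names below are arbitrary examples, not part of the fitting package), a structure and a cell array are built like this:
s.name   = 'consecutive reaction';       % a text string
s.k      = [3e-3; 1.5e-3];               % a matrix (here a column vector)
s.labels = {'A' 'B' 'C'};                % a cell array of strings
disp(s.labels{2})                        % cell arrays are indexed with { }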
Until now, we have dealt with kinetic models and rate constants as the non-linear parameters to be fitted to spectrophotometric absorbance data. However, measurements can be of a different kind; in particular, titrations (e.g. pH-titrations) are often used for quantitative chemical analyses. In such instances, concentrations can also be parameters. In fact, any variable used to calculate the residuals is a potential parameter to be fitted. In the examples below, we use the structure s into which we put all the information needed in the different parts of the programs. We demonstrate the improvements by analysing a potentiometric pH-titration of a solution containing an unknown concentration of a diprotic acid AH2 and an additional unknown concentration of a strong acid. Unknown also are the two protonation constants. While it might be clear to the chemist that such an analysis is feasible, how to incorporate the task into the existing programs is probably much less clear. In particular, the option of fitting concentrations does not fit well into the old organisation. The function Data_EqAH2.m simulates the pH-titration of a weak diprotic acid, AH2, in acid excess, with a strong base. The computation of the equilibria is similar to the examples Eq1.m and Eq2.m given in the chapters Example: General 3-Component Titration (p.56) and Example: pH Titration of Acetic Acid (p.58). From the present point of view, the important aspect is that all variables are collected in one structure s. The model is now stored in s.Model, the logβ values in s.log_beta, etc. Importantly, all the information contained in s is returned to the invoking programs.
MatlabFile 4-35. Data_EqAH2.m
function s = Data_EqAH2
s.spec_names = {'A' 'H' 'AH' 'AH2' 'OH'};
s.Model = [ 1 0 1 1 0; ...               % component A
            0 1 1 2 -1];                 % component H
s.log_beta = [ 0 0 7 10 -14];            % formation constants
s.c_0      = [1e-1 2.3e-1];              % conc. init. solution, Atot, Htot
s.c_added  = [0 -2e-1];                  % conc. titration solution, Atot, OH
s.v_0      = 10;                         % initial volume
s.v_added  = [.01:0.1:15]';              % added volumes
s.v_tot    = s.v_0+s.v_added;            % total volumes
s.nvol     = length(s.v_added);          % number of additions
s.ncomp    = size(s.Model,1);            % number of components, 2

s.C_tot=(s.v_0*repmat(s.c_0,s.nvol,1)+ ...         % total conc. of comp.
        s.v_added*s.c_added)./repmat(s.v_tot,1,s.ncomp);
s.c_comp_guess0 =[1e-8 1e-8];            % init. guess for Newton-Raphson
c_comp_guess=s.c_comp_guess0;            % species concentrations

beta=10.^s.log_beta;
for i=1:s.nvol
    C(i,:)=NewtonRaphson(s.Model,beta,s.C_tot(i,:),c_comp_guess,i);
    c_comp_guess=C(i,1:s.ncomp);
end
s.pH=-log10(C(:,s.ncomp));               % pH values
randn('seed',0);
s.pH=s.pH+0.1*randn(size(s.pH));         % added noise
The structure s, as returned from Data_EqAH2, contains all the variables below:
s =
       spec_names: {'A' 'H' 'AH' 'AH2' 'OH'}
            Model: [2x5 double]
         log_beta: [0 0 7 10 -14]
              c_0: [0.1000 0.2300]
          c_added: [0 -0.2000]
              v_0: 10
          v_added: [150x1 double]
            v_tot: [150x1 double]
             nvol: 150
            ncomp: 2
            C_tot: [150x2 double]
    c_comp_guess0: [1.0000e-008 1.0000e-008]
               pH: [150x1 double]
Almost every field in the structure s is a matrix. The only exception is the cell array spec_names, containing the strings of the species names; due to the different lengths of the names, they cannot easily be collected in a 'normal' string matrix. Cell arrays are indicated by the curly brackets { }. In addition to the benefit of only having to transfer the one single structure s into and out of functions, the handling of the parameters can be streamlined. In the main program Main_EqAH2.m, a new field s.par_str is created; it contains a list of the names of all the variables within the structure that are to be fitted. All other variables are used as they are and not fitted. In the example, the fitted parameters include the protonation constants 's.log_beta(3)' (=log β11), 's.log_beta(4)' (=log β12) and 's.log_beta(5)' (=log β0,−1=−pKw), as well as the initial concentration 's.c_0(1)' of the anion of the diprotic acid [A] and 's.c_0(2)', the total amount of protons [H]. The list of strings is stored as the cell array s.par_str, allowing for different lengths of the strings. In this way, the information about which parameters need to be fitted is carried into the functions. This information is used in the Newton-Gauss-Levenberg/Marquardt routine nglm3.m, where the appropriate derivatives and shifts are computed. Using the function get_par.m, the values of the parameters corresponding to the names in s.par_str are collected into the vector of variable parameters s.par. The partner function put_par.m is used within nglm3.m to do the opposite, i.e. to update, after each iteration, all individual parameter variables corresponding to the names given in s.par_str.
MatlabFile 4-36. Main_EqAH2.m
% Main_EqAH2
s=Data_EqAH2;                            % get titration data into structure s
s.fname ='Rcalc_EqAH2';                  % file to calc residuals
% variables to be fitted
s.par_str ={'s.log_beta(3)'; 's.log_beta(4)'; 's.log_beta(5)';...
            's.c_0(1)'; 's.c_0(2)'};
s.log_beta(3:5)=[6; 10; -13];            % [logB1, logB2, -pKw] initial estimates
s.c_0(1:2) =[0.1; 0.2];                  % Atot, Htot, initial estimates
s.par=get_par(s);                        % collects variable param. into s.par

s=nglm3(s);                              % call ngl/m

s.sig_r=sqrt(s.ssq/(prod(size(s.pH))-length(s.par)));   % sigma_r
s.sig_par=s.sig_r*sqrt(diag(inv(s.Curv)));              % sigma_par

for i=1:length(s.par)
    fprintf(1,'%s: %g+-%g\n',s.par_str{i}(3:end),s.par(i),s.sig_par(i));
end
fprintf(1,'sig_r: %g\n',s.sig_r);
plot(s.v_added,s.pH,'.',s.v_added,s.pH_calc);
xlabel('ml');ylabel('pH');

log_beta(3): 7.01069+-0.0164246
log_beta(4): 10.0144+-0.0304164
log_beta(5): -13.9973+-0.0182982
c_0(1): 0.099851+-0.000327329
c_0(2): 0.229964+-9.01145e-005
sig_r: 0.104209
Figure 4-44. Fitted titration curve for the addition of base to a mixture of a diprotic and a strong acid.
MatlabFile 4-37. get_par.m
function par=get_par(s)
% collects variable parameters into s.par
for i=1:length(s.par_str)
    par(i,1)=eval(s.par_str{i});
end

MatlabFile 4-38. put_par.m
function s=put_par(s)
% updates all parameter variables in s (e.g. s.log_beta, etc)
for i=1:length(s.par_str)
    eval([s.par_str{i} '= s.par(i);']);
end
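The interplay of the two functions can be verified at the command line; a minimal sketch for a single parameter name, assuming get_par.m and put_par.m are on the path:
s.log_beta=[0 0 7 10 -14];
s.par_str ={'s.log_beta(3)'};
s.par=get_par(s);                        % s.par(1) is now 7
s.par(1)=7.5;                            % shift the parameter
s=put_par(s);                            % writes 7.5 back into s.log_beta(3)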
The Newton-Gauss-Levenberg/Marquardt function nglm3.m is essentially the same as before. The main changes in nglm3.m are given in the lines below; they concern the shifting of the parameters for the numerical computation of the derivatives,
for i=1:length(s.par)                    % slice wise num. diff.
    eval([s.par_str{i} '=' s.par_str{i} '*(1+delta);']);
    r=feval(s.fname,s);
    eval([s.par_str{i} '=' s.par_str{i} '/(1+delta);']);
    J(:,i)=(r-r0)/(delta*s.par(i));
end
and the parameter updating at the end of each iteration,
s=put_par(s);    % updates parameter variables in s (e.g. s.log_beta)
MatlabFile 4-39. nglm3.m
function s=nglm3(s)
ssq_old=1e50; mp=0;
mu=1e-4;                                 % convergence limit
delta=1e-6;                              % step size for numerical diff
it=0;
while it<50
    [r0,s]=feval(s.fname,s);             % calculation of residuals
    conv_crit=(ssq_old-s.ssq)/ssq_old;   % convergence criterion
    fprintf(1,'it=%i, ssq=%g, mp=%g, conv_crit=%g\n', ...
            it,s.ssq,mp,conv_crit);
    if abs(conv_crit) <= mu              % ssq_old=ssq, minimum reached
        if mp==0
            break                        % if Marquardt par zero, stop
        else                             % otherwise
            mp=0;                        % set mp to 0, next iteration
            r0_old=r0;
        end
    elseif conv_crit > mu                % convergence
        mp=mp/3;
        ssq_old=s.ssq;
        r0_old=r0;
        for i=1:length(s.par)            % slice wise num. diff.
            eval([s.par_str{i} '=' s.par_str{i} '*(1+delta);']);
            r=feval(s.fname,s);
            eval([s.par_str{i} '=' s.par_str{i} '/(1+delta);']);
            J(:,i)=(r-r0)/(delta*s.par(i));
        end
    elseif conv_crit < -mu               % divergence
        if mp==0
            mp=1;                        % use Marquardt parameter
        else
            mp=mp*5;
        end
        s.par=s.par-delta_par;           % and take shifts back
    end
    J_mp=[J;mp*eye(length(s.par))];      % augment Jacobian matrix
    r0_mp=[r0_old;zeros(size(s.par))];   % augment residual vector
    delta_par=-J_mp\r0_mp;               % calculate parameter shifts
    s.par=s.par+delta_par;               % add parameter shifts
    s=put_par(s);                        % updates parameter variables in s
    it=it+1;
end
s.Curv=J'*J;                             % curvature matrix
The central function Rcalc_EqAH2.m computes the residuals and is again very similar to the ones we developed earlier. First, the total concentrations are recalculated; this needs to be part of the calculation of the residuals, as we want to be able to fit the initial concentrations (s.c_0) as well. Subsequently, these total concentrations are passed to the function NewtonRaphson.m in order to calculate all species concentrations, see The Newton-Raphson Algorithm (p.48). The differences between measured and calculated pH define the residuals. Note that any variable used in this function to calculate the residuals can, in principle, be a parameter to be fitted to the data.
MatlabFile 4-40. Rcalc_EqAH2.m
function [r,s]=Rcalc_EqAH2(s)
s.v_tot=s.v_0+s.v_added;
s.C_tot=(s.v_0*repmat(s.c_0,s.nvol,1)+s.v_added*s.c_added) ...
        ./repmat(s.v_tot,1,s.ncomp);
beta=10.^s.log_beta;
c_comp_guess=s.c_comp_guess0;            % species concentrations
for i=1:s.nvol
    s.C(i,:)=NewtonRaphson(s.Model,beta,s.C_tot(i,:), ...
                           c_comp_guess,i);
    c_comp_guess=s.C(i,1:s.ncomp);
end
s.pH_calc=-log10(s.C(:,s.ncomp));
r=s.pH-s.pH_calc;
s.ssq=sum(r.*r);
if nargout==2
    figure(2);plot(s.v_added,s.pH,'.',s.v_added,s.pH_calc);
    xlabel('ml');ylabel('pH');drawnow
end
What is provided here is very powerful. Powerful tools can be used to great advantage, but they can also be abused and produce nonsense. It is very important to make sure that the right parameters, and also the right
number, are fitted: e.g. one can use an acid-base titration to determine either the concentration of the acid or of the base, but certainly not both at the same time. It would be very easy to tell the program to try. Care also has to be taken with the initial guesses for the parameters. Parameters can be very sensitive and the Newton-Gauss-Levenberg/Marquardt algorithm is easily misled, producing parameter shifts that point toward nonsensical regions such as negative concentrations. Usually, this results in an increase of the sum of squares and is then controlled by the Marquardt extension. This is, however, not always the case, and for negative concentrations the Newton-Raphson routine can collapse altogether. We remind the reader that the Newton-Raphson routine is not infallible. One could add many more checks and controls to enforce its stability. We have concentrated on demonstrating the basic principles and developed a program that does the job under normal conditions. We leave the addition of 'bells and whistles' to the reader. And maybe a last reminder: the above package of short routines is actually a fairly complete pH titration fitting package. In principle, it deals with any number of components, species, equilibria and concentrations; whatever the chemist wants to fit! All that is lacking is a user-friendly interface.

Known Spectra, Uncoloured Species
The package of functions developed so far for the analysis of kinetic absorbance data or pH titrations can cope with almost anything in terms of chemical models; there are no software-imposed restrictions on the complexity of the model that can be analysed. In practice, however, things are more complicated and the analysis is not always as straightforward as anticipated. Many problems that do arise are related to linearly dependent columns in the concentration matrix C, rendering the computation of the pseudo-inverse C+ impossible. Besides, concentration profiles of species not absorbing in the investigated wavelength region need to receive special treatment. Finally, in some instances the absorption spectra of one or more species are known independently, and we would like to take advantage of that additional knowledge. In this part, we offer the solution to this family of potential challenges. We return to spectrophotometric absorbance data and Beer's law, Y=CA. The matrix C of concentration profiles, as computed by the relevant function, contains the profiles of all species that are part of the model. Remember, in kinetics this relevant function is an ODE solver, e.g. Runge-Kutta, while for equilibrium investigations, such as titrations, it is the Newton-Raphson routine that computes C. The following considerations hold for both kinetic and titrimetric absorbance data Y. It makes no sense whatsoever to try to calculate, via the equation A=C+Y, the spectra of species that are known not to absorb. Often C will be rank deficient and cannot be pseudo-inverted; Matlab will issue an error message that needs to be taken seriously. Even if C is not rank deficient, it is not reasonable to allow the calculation of the spectrum of a non-absorbing
species. As will be shown in a moment, it is possible to handle known absorptivity spectra in exactly the same way: the spectrum of a non-absorbing species is only a special case of a known spectrum, for which we know that the molar absorptivities are all zero. The principle is the following. The base equation Y=CA can be written in a different way: the matrix Y contains the sum of all the individual contributions of the absorbing species, where the contribution of the j-th species is the product of the j-th column of C with the j-th row of A.
Y = CA = c:,1 a1,: + c:,2 a2,: + ... + c:,j aj,: + ... + c:,nc anc,:
(4.84)
It is possible to separate the products c:,j aj,: into two groups. We collect all the concentration profiles and absorption spectra of the species with known spectra in the two matrices Ck and Ak; for non-absorbing species, the appropriate rows of Ak contain only zeros. Correspondingly, the matrices Cuk and Auk contain the concentration profiles and spectra of the species with unknown spectra. Equation (4.84) can be arranged as:
Y = Yk + Yuk = Ck Ak + Cuk Auk
(4.85)
Now, prior to the computation of the unknown species spectra Auk, the known contribution Yk is subtracted from Y.
Yuk = Y − Yk = Y − Ck Ak
(4.86)
The difference Yuk allows the computation of Auk as
Auk = Cuk+ Yuk
(4.87)
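In Matlab, equations (4.85) to (4.87) translate into very few lines. The sketch below assumes a logical vector known that marks the species with known spectra; these are essentially the lines that reappear in Rcalc_EqFix.m further below:
C_k =C(:,known==1);                      % profiles of species with known spectra
C_uk=C(:,known==0);                      % profiles of species with unknown spectra
Y_uk=Y-C_k*A_k;                          % subtract known contribution, eq. (4.86)
A_uk=C_uk\Y_uk;                          % unknown spectra, eq. (4.87)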
Figure 4-45 represents the situation graphically. The parts of C and A in light grey represent the contributions of the 'known' species, dark grey characterises the 'unknown' species.
Figure 4-45. The separation of the contributions of species with known and unknown spectra.
The following example demonstrates the implementation of known spectra and non-absorbing species into the algorithms. It is an aqueous spectrophotometric titration, investigating the complexation of a metal M by a ligand L to form the complex ML. The ligand also acts as a diprotic base and, additionally, the autoprotolysis of the solvent water needs to be taken into account. The complete model is:
M + L ⇌ ML        β110
L + H ⇌ LH        β011
L + 2H ⇌ LH2      β012
(H2O) − H ⇌ OH    β00−1
(4.88)
There are seven species, H2O not included; charges are omitted for brevity. The proton and the hydroxide ion do not absorb, and we assume that the absorptivity spectrum of the free metal has been determined independently. For the data generation, we set the spectra of species 3 (H+) and 7 (OH−) to zero. In the main program Main_EqFix.m, we create a vector s.known, indicating the positions of the species for which we have a known spectrum, including the non-absorbing ones (i.e. zero absorptivity). Of course, the positioning of the species has to be consistent throughout the whole program, e.g. the column positions in s.Model and in C match the row positions in A. The corresponding known spectra are collected in the matrix s.A_k. There are no changes required in the Newton-Gauss-Levenberg/Marquardt routine nglm3.m, but we do need to implement equations (4.85) to (4.87) in the function Rcalc_EqFix.m, which computes the residuals.
Note that in the example, we only fitted the formation constant of the species ML and the initial concentration of L. We leave it to the reader to fit more or other parameters and to compare the results with the model data. In Figure 4-46 the fixed (−•−) and fitted molar absorptivity spectra are shown.
MatlabFile 4-41. Data_EqFix.m
function s = Data_EqFix
s.spec_names = {'M' 'L' 'H' 'LH' 'LH2' 'ML' 'OH'};
s.Model = [ 1 0 0 0 0 1 0; ...           % M
            0 1 0 1 1 1 0; ...           % L
            0 0 1 1 2 0 -1];             % H
s.log_beta = [ 0 0 0 7 10 9 -14];
s.c_0     = [.7e-1 1e-1 2.3e-1];         % init. conc. (Mtot,Ltot,Htot)
s.c_added = [ 0 0 -2e-1 ];               % conc in titration solution
s.ncomp   = size(s.Model,1);             % number of components
s.v_0     = 10;                          % initial volume
s.v_added = [.01:0.1:15]';               % added volumes
s.v_tot   = s.v_0+s.v_added;             % total volumes
s.ns      = length(s.v_added);           % number of spectra
s.C_tot=(s.v_0*repmat(s.c_0,s.ns,1)+s.v_added*s.c_added) ...
        ./repmat(s.v_tot,1,s.ncomp);     % total conc. of comp.
s.c_comp_guess0 =[1e-8 1e-8 1e-8];       % default guess for Newton-Raphson
c_comp_guess=s.c_comp_guess0;            % init. comp. conc.
beta=10.^s.log_beta;
for i=1:s.ns
    C(i,:)=NewtonRaphson(s.Model,beta,s.C_tot(i,:),c_comp_guess,i);
    c_comp_guess=C(i,1:s.ncomp);
end
s.lam =400:5:600;
s.nl  =length(s.lam);
s.A_sim(1,:)=10*gauss(s.lam,420,20);
s.A_sim(2,:)=30*gauss(s.lam,450,50);
s.A_sim(3,:)=0;
s.A_sim(4,:)=20*gauss(s.lam,500,50);
s.A_sim(5,:)=20*gauss(s.lam,530,50);
s.A_sim(6,:)=20*gauss(s.lam,560,50);
s.A_sim(7,:)=0;
randn('seed',0);
s.Y=C*s.A_sim;
s.Y=s.Y+0.01*randn(size(s.Y));

MatlabFile 4-42. Rcalc_EqFix.m
function [r,s]=Rcalc_EqFix(s)
s.v_tot=s.v_0+s.v_added;
s.C_tot=(s.v_0*repmat(s.c_0,s.ns,1)+s.v_added*s.c_added) ...
        ./repmat(s.v_tot,1,s.ncomp);
beta=10.^s.log_beta;
c_comp_guess=s.c_comp_guess0;            % reinit. comp. conc.
for i=1:s.ns
    s.C(i,:)=NewtonRaphson(s.Model,beta,s.C_tot(i,:), ...
                           c_comp_guess,i);
    c_comp_guess=s.C(i,1:s.ncomp);
end
if sum(s.known)>0
    C_k =s.C(:,s.known==1);              % conc with known spectra
    C_uk=s.C(:,s.known==0);              % conc with unknown spectra
    Y_k=C_k*s.A_k;                       % known part of Y
    Y_uk=s.Y-Y_k;                        % unknown part of Y
    A_uk=C_uk\Y_uk;                      % unknown spectra
    %A_uk=nonneg(Y_uk',C_uk')';          % non-negative spectra
    R=Y_uk-C_uk*A_uk;
    s.A(s.known==1,:)=s.A_k;
    s.A(s.known==0,:)=A_uk;
else
    s.A=s.C\s.Y;
    R=s.Y-s.C*s.A;
end
r=R(:);
s.ssq=sum(r.*r);

MatlabFile 4-43. Main_EqFix.m
% Main_EqFix
s=Data_EqFix;                            % get titration data into structure s
s.fname ='Rcalc_EqFix';                  % file to calc. residuals
s.par_str ={'s.log_beta(6)';'s.c_0(2)'}; % loose par
s.log_beta(6) =11;                       % logbeta110 initial estimate
s.c_0(2)=0.08;                           % Ltot initial estimates
s.par=get_par(s);                        % collects variable param. into s.par
s.known = [1 0 1 0 0 0 1];
s.A_k(1:sum(s.known),:) = s.A_sim(s.known==1,:);

s=nglm3(s);                              % call ngl/m
s.sig_r=sqrt(s.ssq/(s.ns*s.nl-length(s.par)-sum(s.known==0)*s.nl)); % sigma_r
s.sig_par=s.sig_r*sqrt(diag(inv(s.Curv)));                          % sigma_par
for i=1:length(s.par)
    fprintf(1,'%s: %g +- %g\n',s.par_str{i}(3:end),s.par(i), ...
            s.sig_par(i));
end
fprintf(1,'sig_r: %g\n',s.sig_r);
plot(s.lam,s.A_k,'.',s.lam,s.A);
xlabel('wavelength');ylabel('mol. abs');

it=0, ssq=3.19992, mp=0, conv_crit=1
it=1, ssq=2.4536, mp=0, conv_crit=0.233233
it=2, ssq=1.8691, mp=0, conv_crit=0.23822
it=3, ssq=1.23378, mp=0, conv_crit=0.339906
it=4, ssq=0.726495, mp=0, conv_crit=0.411164
it=5, ssq=0.599869, mp=0, conv_crit=0.174298
it=6, ssq=0.597756, mp=0, conv_crit=0.00352187
it=7, ssq=0.597756, mp=0, conv_crit=2.58034e-008
log_beta(6): 9.00604 +- 0.0299892
c_0(2): 0.0998105 +- 0.000103425
sig_r: 0.00999462
Figure 4-46. Fixed (−•−) and fitted molar absorptivity spectra for the species M, L, LH, LH2, ML (from left to right).
Note that for the calculation of the standard deviation σr the degrees of freedom df have to be adapted according to the number of actually fitted absorptivity spectra. Referring to equations (4.76) and (4.83) the degrees of freedom df are now calculated by
df = ns × nl − (np + nu × nl )
(4.89)
where nu denotes the number of unknown species spectra in A. Hence, if fixed spectra are incorporated, the total number of fitted linear parameters is reduced to nu×nl.

Reduced Eigenvector Space
Not many years ago, computer memory was very precious and it was important to write code that was economic with respect to memory requirements. More recently, this aspect of computing has changed dramatically and as a consequence this sub-chapter is no longer of vital
interest. Nevertheless, saving memory does no harm and there can also be a modest reduction in computation time. In the standard equation for multiwavelength spectrophotometric investigations, based on the Beer-Lambert law, the matrix Y is written as the product of the matrices C and A. According to the Singular Value Decomposition (SVD), Y can also be decomposed into the product of three matrices
Y = U S V
(4.90)
We discuss this decomposition again in great depth in Chapter 5, Model-Free Analyses. For the moment we need to identify a few essential properties of the Singular Value Decomposition, in particular of the matrices U, S and V. SVD is completely automatic; it is one of the most stable algorithms available and can thus be used 'blindly'. It is one command in Matlab: [U,S,Vt]=svd(Y,0). The matrices U and Vt contain as columns so-called eigenvectors. They are orthonormal (see Orthogonal and Orthonormal Matrices, p.25), which means that the products
Ut U = V Vt = I
(4.91)
result in identity matrices of the appropriate size. Since U and V are orthonormal, their transposed matrices are equivalent to their pseudo-inverses, Ut=U+, Vt=V+. S is a diagonal matrix with the so-called singular values as entries; they are arranged in decreasing order. Much will be said in Chapter 5, Model-Free Analyses, about the dimensions of the matrices U, S and V. For the moment it suffices to state that there are as many significant eigenvectors in U and V, and singular values in S, as there are absorbing species in the chemical system. In the matrices Ū, S̄ and V̄ we retain only those of significance. Essentially, the shapes of Ū and C are the same, and likewise those of V̄ and A.
(4.92)
Y Vt = C A Vt = U S V Vt
(4.93)
¯t post-multiplication with V
and renaming Y V t = U S = Yred and A V t = A red we can write Yred = C A red
(4.94)
Figure 4-47 below, demonstrates the relationships between the dimensions of the matrices involved. The process can be summarised as a reduction of the number of wavelengths from the original nl to nc, the number of
182
Chapter 4
components. The reduction in size of the original matrices Y and A can be substantial, depending, of course, on the number of wavelengths.
ns
nl
nc
Y
= C
nl
nc
A
nc
nc
=
ns
Yred = C
= U
nc
nl
S
V
nc
nc nc
Ared
Figure 4-47. Structure of the data matrices before and after representation in eigenvector space.
The obvious question arises if the number of components, nc, is not known a priori: how many eigenvectors should be used, and what are the consequences of choosing the wrong number, either too high or too low? In this context the answer is easy: it does not matter much. While this might sound surprising, it is related to the equivalent question of how many wavelengths are required for a particular measurement. There is no definite answer; it is theoretically possible to determine the parameters of a complex problem from data acquired at one wavelength only. It does no harm to have a few too many wavelengths, or columns in Y and A, and the same statement applies to the number of eigenvectors, ne, or columns in Yred and Ared. But it is also clear that there is not much additional information when spectra are taken at 1000 wavelengths or, similarly, when far too many eigenvectors are used. The reader is welcome to verify this by testing the routine Main_ABC_red.m below with different numbers of eigenvectors, ne. It fits the consecutive kinetic data set, Data_ABC.m (p.143), that we have already employed a few times. The important point is that the above manipulations do not affect the matrix C, which is directly related to the model and to the non-linear parameters; only the matrices Y and A are reduced to Yred and Ared. The proper, non-reduced pure component spectra A are computed at the very end in the usual way, A=C+Y, using the original matrix Y.
MatlabFile 4-44. Main_ABC_red.m
% Main_ABC_red
clear;
[t,lam,Y]=Data_ABC;                      % get absorbance data
A_0=1e-3;                                % initial concentration of A
k0=[0.005;0.001];                        % start parameter vector
[U,S,Vt]=svd(Y,0);
ne=4;                                    % one eigenvector more than needed
Y_red=U(:,1:ne)*S(1:ne,1:ne);

[k,ssq,C,A_red,Curv]=nglm2('Rcalc_ABC',k0,A_0,t,Y_red);   % call ngl/m
A=C\Y;

sig_r=sqrt(ssq/(prod(size(Y_red))-length(k)-(prod(size(A_red))))); % sigma_r
sig_k=sig_r*sqrt(diag(inv(Curv)));       % sigma_par
for i=1:length(k)
    fprintf(1,'k(%i): %g +- %g\n',i,k(i),sig_k(i));
end
fprintf(1,'sig_r: %g\n',sig_r);

it=0, ssq=0.417978, mp=0, conv_crit=1
it=1, ssq=0.0877061, mp=0, conv_crit=0.790166
it=2, ssq=0.0771321, mp=0, conv_crit=0.120562
it=3, ssq=0.0771062, mp=0, conv_crit=0.000334676
it=4, ssq=0.0771062, mp=0, conv_crit=1.39315e-008
k(1): 0.0030178 +- 4.64533e-005
k(2): 0.00149765 +- 2.12077e-005
sig_r: 0.011063
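The effect of the number of eigenvectors can be explored by wrapping the reduced fit in a loop; a minimal sketch (not part of the original program), reusing the variables of Main_ABC_red.m:
for ne=2:6                               % try several reduced dimensions
    Y_red=U(:,1:ne)*S(1:ne,1:ne);
    [k,ssq]=nglm2('Rcalc_ABC',k0,A_0,t,Y_red);
    fprintf(1,'ne=%i: k(1)=%g, k(2)=%g, ssq=%g\n',ne,k(1),k(2),ssq);
end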
The results obtained by Main_ABC_red.m are essentially identical to those computed previously on the complete matrix Y in Main_ABC.m (p.165). Note that the same routine, Rcalc_ABC.m, can be used to calculate the residuals and the sum of squares, ssq, if we also employ the earlier version nglm2.m (p.166) for the Newton-Gauss-Levenberg/Marquardt fitting. The sum of squares is smaller, as a large part of the noise is removed by replacing Y with Yred. Accordingly, the number of degrees of freedom used to compute the standard deviation of the residuals has to be based on the size of Yred rather than Y, as shown in the program. The changes are minimal: the computation of the SVD, the calculation of Yred, and the use of Yred and Ared for the computation of the standard deviations. Computation times are similar; in fact, depending on the dimensions of the matrices, more time is 'wasted' on the Singular Value Decomposition than gained by the data reduction.

Global Analysis
For complex chemical systems, and these are often the interesting ones, it is usually not possible to design one experiment that supplies reliable and robust information about all aspects of the system. A typical example would be the investigation of the complexation properties of a new ligand. If the investigation is performed in aqueous solution, there are two types of equilibria involved – the protonation of the ligand and the interaction between ligand and metal. In some cases, it might be possible to determine all the relevant equilibrium constants in one titration experiment. This would be rather exceptional and it is much safer to perform two types of
experiments. The first is a titration of the ligand alone, delivering good information on the protonation equilibria; the second is the titration of the metal with the ligand. It is possible to analyse the two experiments individually. The first titration would deliver the ligand protonation constants and the molar absorption spectra of the differently protonated forms of the ligand. The second titration would deliver the complexation constant and the spectra of the complexes. In this second analysis, the information from the first would need to be fixed, as described in Structures, Fixing Parameters (p.169) and Known Spectra, Uncoloured Species (p.175). While possible, this procedure is cumbersome and the error propagation is difficult to analyse. Global analysis of the complete set of measurements is the answer. There are additional advantages. Consider the reaction:
A + B → C        (rate constant k)
(4.95)
It is not possible to determine the spectra of the components A and B: the matrix C of the concentration profiles has a rank of 2 only and thus the pseudo-inverse C+ is not defined. Recall the equation A=C+Y for the calculation of the spectra. There are several possibilities of still fitting the rate constant; however, there is no way of determining the spectra. The simplest option is to declare one of the spectra of A or B as zero, as demonstrated in Known Spectra, Uncoloured Species (p.175). This results in the correct value for the rate constant k and for the spectrum of C, but obviously not for the spectra of the starting materials. A better option is to determine either the spectrum of A or B independently and use it as a known spectrum, again as demonstrated earlier. For this simple example, this approach is perfectly adequate. In a more complex mechanism, it is not always possible to independently determine one of the undefined absorption spectra, e.g. that of a reactive intermediate. In this chapter, we demonstrate the global approach. Globally analysing several measurements, acquired with different initial concentrations, breaks the rank deficiency in C and thus allows the complete analysis. As a matter of fact, global analysis of a series of measurements in most instances increases the robustness of the analysis, compared with analysing each data set individually and combining the results in a second step. There are several different ways to implement the idea of global analysis in a computer program. Possibly the simplest is shown in Figure 4-48, which displays graphically how the individual measurements can be appended to form a global set of measurements. We arrive at augmented matrices Yglob and Cglob, but the essential equation Yglob=Cglob×A has the same structure as the one we are used to. This allows us to maintain most of the functions we have developed so far: the numerical core is maintained, while the handling of the files and auxiliary information needs to be adapted. Figure 4-48 indicates that the number of spectra, nsi, in each individual matrix Yi can be different; the number of wavelengths, nl, must be identical.
Figure 4-48. Concatenation of individual files for global analysis.
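In Matlab, the concatenation of Figure 4-48 is a one-liner; a toy sketch with two random matrices standing in for the measurements:
Y1=rand(5,3); Y2=rand(8,3);              % two data sets, same number of wavelengths
Y_glob=vertcat(Y1,Y2);                   % equivalently [Y1; Y2]
size(Y_glob)                             % 13x3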
A possible way of implementing global analysis is demonstrated in the Matlab routines below. We use the kinetic example A + B ⇌ C, with forward rate constant k1 and backward rate constant k2.
The function given in odeApB_C_rev.m sets up the corresponding system of differential equations that can be called by an ODE solver; we use Matlab's ode45 to integrate the ODEs. Using this rate law, Data_Glob.m generates two complete absorbance data sets (Y1, Y2) from one common set of rate constants and component spectra (A), but two individual sets of initial concentrations and reaction times; the latter define two sets of concentration profiles (C1, C2). Many of the fields in the structure s now contain two entries. These are arranged as cell arrays; e.g. the field s.Y contains the arrays s.Y{1} and s.Y{2}, the field s.t contains the two vectors s.t{1} and s.t{2}, etc. Naturally, more than two data sets can be arranged in this way. A new field, s.nm, contains the number of measurements, nm (i.e. data sets). Recall that Matlab requires curly brackets when referring to elements of a cell array. This natural expansion of the structure requires very few changes in the other programs; as an example, the central fitting function nglm3.m is not affected at all.
MatlabFile 4-45. odeApB_C_rev.m
function c_dot=odeApB_C_rev(t,c,flag,k)  % A+B<->C
c_dot(1,1)=-k(1)*c(1)*c(2)+k(2)*c(3);    % A_dot
c_dot(2,1)=c_dot(1,1);                   % B_dot
c_dot(3,1)=-c_dot(1,1);                  % C_dot

MatlabFile 4-46. Data_Glob.m
function s = Data_Glob                   % A+B<>C
s.c_0 = {[1e-3 1.6e-3 0] [3e-3 1.6e-3 0]};   % initial conc A,B,C
s.nc  = length(s.c_0{1});
s.nm  = length(s.c_0);
s.k   = [1;5e-3];
s.t   = {[0:20:400]' [0:10:300]'};
for i=1:s.nm, s.ns{i}=length(s.t{i}); end
s.lam = 400:20:600;
s.nl  = length(s.lam);
s.A_sim(1,:) = 100*gauss(s.lam,450,50);
s.A_sim(2,:) = 300*gauss(s.lam,500,80);
s.A_sim(3,:) = 200*gauss(s.lam,550,60);
for i=1:s.nm                                 % calc. of conc profiles A+B<>C
    [t_dummy,C{i}]=ode45('odeApB_C_rev',s.t{i},s.c_0{i},[],s.k);
    s.Y{i}=C{i}*s.A_sim;
    randn('state',123*i);
    s.Y{i}=s.Y{i}+0.001*randn(size(s.Y{i})); % added noise
end

MatlabFile 4-47. Rcalc_Glob.m
function [r,s]=Rcalc_Glob(s)                 % A+B<>C
for i=1:s.nm                                 % for all nm measurements
    [t_dummy,s.C{i}]=ode45('odeApB_C_rev',s.t{i},s.c_0{i},[],s.k);
end
C_glob=vertcat(s.C{:});
Y_glob=vertcat(s.Y{:});
if sum(s.known)>0                            % if known spectra are used
    C_k =C_glob(:,s.known==1);               % conc with known spectra
    C_uk=C_glob(:,s.known==0);               % conc with unknown spectra
    Y_k=C_k*s.A_k;                           % known part of Y
    Y_uk=Y_glob-Y_k;                         % unknown part of Y
    A_uk=C_uk\Y_uk;                          % unknown absorptivities
    R=Y_uk-C_uk*A_uk;                        % residuals
    s.A(s.known==1,:)=s.A_k;                 % build A
    s.A(s.known==0,:)=A_uk;
else
    s.A=C_glob\Y_glob;                       % unknown absorptivities
    R=Y_glob-C_glob*s.A;                     % residuals
end
r=R(:);
s.ssq=sum(r.*r);
if nargout==2
    for i=1:s.nm
        subplot(3,2,i);
        plot(s.t{i},s.C{i});axis tight;
        xlabel('time');ylabel('conc.');
        subplot(3,2,i+2);
        plot(s.t{i},s.C{i}*s.A(:,[3,6,9]), ...
             s.t{i},s.Y{i}(:,[3,6,9]),'.');axis tight;
        legend(cellstr(int2str([s.lam([3 6 9])]')))
        xlabel('time');ylabel('absorbance');
    end
    subplot(3,2,5)
    plot(s.lam,s.A);axis tight;
    xlabel('wavelength');ylabel('absorptivity');
    drawnow;
end

MatlabFile 4-48. Main_Glob.m
% Main_Glob
% A+B<>C
clear;
s=Data_Glob;                             % get kinetic data into structure s
s.fname='Rcalc_Glob';
s.par_str={'s.k(1)' 's.k(2)'};           % variables to be fitted
s.k=[5;1e-2];                            % rate const initial estimates
s.par=get_par(s);                        % collects variable parameters into s.par
s.known=[0 0 0];                         % known spectra
s.A_k=[];                                % define known spectra

s=nglm3(s);                              % call ngl/m

s.sig_r=sqrt(s.ssq/(sum([s.ns{:}])*s.nl-length(s.par) ...
        -sum(s.known==0)*s.nl));         % sigma_r
s.sig_par=s.sig_r*sqrt(diag(inv(s.Curv)));  % sigma_par
for i=1:length(s.par)
    fprintf(1,'%s: %g +- %g\n',s.par_str{i}(3:end),s.par(i), ...
            s.sig_par(i));
end
fprintf(1,'sig_r: %g\n',s.sig_r);

rate(1): 0.965868 +- 0.0402616
rate(2): 0.00518603 +- 0.000131341
sig_r: 0.00100028
Clearly, both kinetic parameters are well defined, with standard errors of less than 5%. Also, all component spectra are clearly resolved, see Figure 4-49. The calculated standard deviation of the residuals matches the noise level of the generated data sets. To investigate the advantage of global analysis, we also analysed the two data sets individually. The quality of the fits, as represented by σr, is essentially the same. The main difference is that the standard deviations of the parameters are substantially larger, with errors between 25% and 50%, and the calculated parameters can be fairly far off.
Figure 4-49. Global analysis of two data sets. Top panels: concentration profiles; middle panels: representative fits; bottom panel: absorption spectra.
First data set only:
rate(1): 2.42965 +- 0.962225
rate(2): 0.00262281 +- 0.00137306
sig_r: 0.000997811

Second data set only:
rate(1): 1.13265 +- 0.362335
rate(2): 0.00460243 +- 0.00116169
sig_r: 0.000982706
Note that for the individual analyses, one of the two component spectra of species A or B needs to be set as colourless, i.e. non-absorbing (see Known Spectra, Uncoloured Species, p.175), as the matrices C1 and C2, containing the concentration profiles, are rank deficient for this kinetic model and hence their pseudo-inverse C+ is not defined.
A few changes in the programs were required to allow the analysis of one individual data set. These lines are highlighted within the main routine Main_Glob.m,
s.known=[0 1 0];                         % known spectra
s.A_k=[0*s.A_sim(2,:)];                  % define known spectra
and in the function Data_Glob.m,
s.c_0={[3e-3 1.6e-3 0]};                 % initial conc A,B,C, 2nd data set
s.t={[0:10:300]'};                       % times, 2nd data set
Again, for the computation of the standard deviation σr the degrees of freedom, df, need to be adapted according to the number of data sets, i.e. individual measurements, nm, comprising Yglob. Referring to equations (4.76), (4.83) and (4.89), the number of degrees of freedom, df, is now given by
df = Σi=1..nm nsi × nl − (np + nu × nl)
(4.96)
where nsi denotes the number of spectra in the i-th data matrix Yi.
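For the two data sets of this example, the numbers follow directly from Data_Glob.m; a minimal sketch of the bookkeeping:
ns=[21 31];                              % spectra in the two data sets
nl=11;                                   % wavelengths (400:20:600)
np=2; nu=3;                              % rate constants, unknown spectra
df=sum(ns)*nl-(np+nu*nl)                 % degrees of freedom, eq. (4.96)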
4.3.2 Non-White Noise, χ2-Fitting

The actual noise distribution of the data is often not known. The most common response is to ignore this fact and assume a normal, white distribution of the noise. Even if the assumption of white noise is incorrect, it is still useful to perform the least-squares fit; there is no real alternative and the results are generally not too wrong. White noise signifies that the experimental standard deviation, σy, is normally distributed and the same for all individual measurements, yi,j. The traditional least-squares fit delivers the most likely parameters only under the condition of white noise. If, however, the standard deviations, σyi,j, for all elements of the matrix Y are known or can be estimated reliably, it does make sense to use this information in the data analysis. Then, instead of the sum of squares, it is the sum of all appropriately weighted and squared residuals that has to be minimised. This is known as 'chi-square' or χ2-fitting. If the data matrix Y has the dimensions ns×nl, χ2 is defined by
χ2 = Σi=1..ns Σj=1..nl ( ri,j / σyi,j )2
(4.97)
If all σyi,j are the same (white noise), the calculated parameters of the χ2 -fit are the same as for least-squares fitting. If the σyi,j are not constant across the data set, the least-squares fit over-emphasises those parts of the data with high noise.
Linear χ2-Fitting
For the linear least-squares analysis of a monovariate data set, equation (4.97) reduces to
χ2 = Σi=1..ns ( ri / σyi )2
(4.98)
Recall equation (4.19), ycalc=Fa. To achieve minimal χ2 in a linear regression calculation, all we need to do is to divide each element of y and of the column vectors f:,j by its corresponding σyi, resulting in the weighted vectors yw and fw:,j.
ywi = yi / σyi
fwi,j = fi,j / σyi
(4.99)
Or, in Matlab:
y_w=y./sig_y;
F_w(:,j)=F(:,j)./sig_y;
Figure 4-50 displays the original and weighted situations.
Figure 4-50. Linear ssq and χ2 fitting.
Note that all vectors changed direction and length. Each element of the vectors is divided by its individual σyi; the vectors are not just multiplied by a constant factor. However, the orthogonality relationship between the
weighted residuals and the weighted base vectors is maintained, and the equation equivalent to (4.26) performs the linear regression
a = (Fwt Fw)−1 Fwt yw
(4.100)
or
a = Fw+ yw
or in Matlab
a=F_w\y_w;
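A complete monovariate example takes only a few lines; the sketch below fits a straight line to invented data with assumed standard deviations, purely for illustration:
x=(1:10)';
sig_y=0.1*x;                             % assumed known standard deviations
y=2+0.5*x+sig_y.*randn(10,1);            % invented data, noise grows with x
F=[ones(10,1) x];                        % two base vectors: constant and x
F_w=F./repmat(sig_y,1,2);                % weighted base vectors, eq. (4.99)
y_w=y./sig_y;                            % weighted observations
a=F_w\y_w                                % chi-square estimates of intercept and slope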
Instead of continuing with a monovariate example of this kind, we immediately proceed to multivariate data and revert to our standard equation Y=CA. Light emission spectroscopy provides a good example of data with non-uniformly distributed, but well defined, noise. This is in contrast to absorption measurements, where the noise is usually adequately described as white. In light emission, the noise is directly related to the intensity of the light: if there is no light, there is zero signal with zero noise; at high emission, the noise is high as well. Particularly in photon counting applications, there is a simple relationship between the number of counted photons, count, and its standard deviation:
σcount = √count
(4.101)
As an example of linear χ2-fitting, we analyse time-resolved emission spectra of a mixture of two components with overlapping emission spectra and similar lifetimes. In the 'experiment', the molecules in solution are excited with a very short flash and the emission intensity is measured at many wavelengths as a function of time. Below is the function Data_Emission.m used to generate the data. Note that experimental emission decays are exponentials and that the lifetime τ is used instead of the rate constant that is more customary in kinetics. Also, the matrix C here represents the concentrations of the excited states, not the 'normal' concentrations.
MatlabFile 4-49. Data_Emission.m
function s=Data_Emission                 % 2-component emission data
s.t  =[0:1:100]';                        % reaction times
s.ns =length(s.t);                       % number of spectra
s.tau=[10; 30];                          % life times
s.C_sim(:,1)=exp(-s.t/s.tau(1));         % concentrations of A
s.C_sim(:,2)=exp(-s.t/s.tau(2));         % conc. of B
s.lam=400:10:800;                        % wavelengths
s.A_sim(1,:)=500*gauss(s.lam,500,100);   % Emission spectrum of A
s.A_sim(2,:)=800*gauss(s.lam,600,100);   % Emission spectrum of B
s.nl=length(s.lam);                      % number of wavelengths
Y0=s.C_sim*s.A_sim;                      % noise-free data

randn('seed',0);                         % fixed start for random number generator
s.Sig_y=0.5*Y0;                          % noise proportional to signal
%s.Sig_y=sqrt(Y0);                       % noise proportional to root of signal
s.Y=Y0+s.Sig_y.*randn(size(Y0));
plot(s.t,Y0(:,20),s.t,s.Y(:,20),'.');
xlabel('time');ylabel('emission');
Figure 4-51. Noise free (−) and highly noisy (•) emission decay at one particular wavelength.
The noise level used to generate the data shown in Figure 4-51 is proportional to the intensity; it is far too high and not realistic. We use such a level to emphasise the difference between ssq- and χ2-fitting. For a linear fitting exercise, e.g. the calculation of the emission spectra A, we assume that the lifetimes τ, and hence the matrix Csim used for the generation of the measurement, are known. The linear regression has to be performed individually at each wavelength: at each wavelength λj the appropriate vector σy:,j is different, and each weighted matrix Cw and its pseudo-inverse need to be computed independently. There is no equivalent of the elegant A=C\Y notation.
MatlabFile 4-50. Main_Emission_lin.m
% Main_Emission_lin
s=Data_Emission;
% chi square (weighted)
for j=1:length(s.lam)
    C_w=s.C_sim./(repmat(s.Sig_y(:,j),1,2));
    y_w=s.Y(:,j)./s.Sig_y(:,j);
    A_w(:,j)=C_w\y_w;
    R_w(:,j)=y_w-C_w*A_w(:,j);
end
chi_2=sum(sum(R_w.^2))
% least squares (non-weighted)
A=s.C_sim\s.Y;
subplot(2,1,1)
plot(s.lam,s.A_sim,s.lam,A_w);ylabel('emission');
subplot(2,1,2)
plot(s.lam,s.A_sim,s.lam,A);xlabel('wavelength');ylabel('emission');

chi_2 = 4.0551e+003
Figure 4-52. Original (--) and calculated (−) emission spectra as the result of linear regression of very noisy data. Top panel: χ2 – fitting; bottom panel: traditional least-squares fitting.
The figure displays the clearly better defined emission spectra for the χ2 – fitting in the top panel. However, considering the high noise level (refer to Figure 4-51) we have to recognise that even the standard least-squares fit delivers useful results.
MatlabFile 4-51. Main_Emission_lin.m ...continued
% Main_Emission_lin ...continued
subplot(2,1,1)
plot(s.t,(s.Y-s.C_sim*A_w)./s.Sig_y,'.k');ylabel('r');
subplot(2,1,2)
plot(s.t,s.Y-s.C_sim*A,'.k');xlabel('time');ylabel('r');
Figure 4-53. Top panel: weighted residuals with constant standard deviation of one; bottom panel: uneven distribution of residuals.
The standard deviation of the weighted residuals is equal to one, see Figure 4-53, and χ2 is approximately equal to the number of elements in Y minus the number of fitted parameters, here the number of elements in A, i.e. equal to the degrees of freedom:
χ2 ≈ ns × nl − nc × nl
(4.102)
In our example the numbers are ns=101, nl=41, nc=2 and nc×nl=82. Thus we expect χ2 to be 4059, which is very close to the result of the fit (4055). This is a powerful test of the adequateness of the fit: knowing the standard deviation of the residuals, we know what value of χ2 to expect. If χ2 is too high, the fit is not good enough. In practice, however, it is difficult to accurately estimate the standard deviations of the errors in the measurement, and a too large χ2 could also indicate an underestimation of these standard deviations. Naturally, the argument also works the other way: a χ2 that is too small necessarily indicates an overestimation of the standard deviations of the data.
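The check is a one-liner; for the numbers of this example:
ns=101; nl=41; nc=2;
df=ns*nl-nc*nl                           % expected chi-square, 4059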
Non-Linear χ2-Fitting
We use the same measurement as in the previous section, but this time with a more realistic noise distribution. We replace the line
s.Sig_y=0.5*Y0;                          % noise proportional to signal
in Data_Emission.m with the line
s.Sig_y=sqrt(Y0);                        % noise proportional to root of signal
corresponding to equation (4.101), and now fit the spectra as well as the emission decay lifetimes. Compared with a standard least-squares fit, we need to rewrite the linear regression as just explained; additionally, we have to weight the residuals and change the statistics output at the very end. Note that, compared to equation (4.102), the degrees of freedom are further reduced by the number of non-linear parameters, np=2.
MatlabFile 4-52. Main_Emission_weighted.m
% Main_Emission_weighted
s=Data_Emission;                         % get emission data
s.fname ='Rcalc_Emission_weighted';      % file to calc weighted residuals
%s.fname ='Rcalc_Emission';              % file to calc non-weighted residuals
s.tau=[10;35];                           % start parameter vector
s.par_str={'s.tau(1)'; 's.tau(2)'};      % variable parameters
s.par=get_par(s);                        % collects variable parameters into s.par
s=nglm3(s);                              % call ngl/m

sig_r=sqrt(s.ssq/(prod(size(s.Y))-length(s.tau)...
      -(prod(size(s.A_sim)))));          % sigma_r
sig_tau=sig_r*sqrt(diag(inv(s.Curv)));   % sigma_par
for i=1:length(s.par)
    fprintf(1,'tau(%i): %g +- %g\n',i,s.tau(i),sig_tau(i));
end
fprintf(1,'sig_r: %g\n',sig_r);

tau(1): 10.0011 +- 0.0805663
tau(2): 29.9019 +- 0.147129
sig_r: 0.9992766
The following routine calculates the weighted residuals:
MatlabFile 4-53. Rcalc_Emission_weighted.m
function [r_w,s]=Rcalc_Emission_weighted(s)
s.C(:,1)=exp(-s.t/s.tau(1));             % concentrations of A
s.C(:,2)=exp(-s.t/s.tau(2));             % conc. of B
for j=1:length(s.lam)
    C_w=s.C./(repmat(s.Sig_y(:,j),1,2));
    y_w=s.Y(:,j)./s.Sig_y(:,j);
    s.A(:,j)=C_w\y_w;
end
R=s.Y-s.C*s.A;
R_w=R./s.Sig_y;
r_w=R_w(:);
s.ssq=sum(r_w.*r_w);                     % sum of squared weighted residuals
The results of a standard non-weighted least-squares fit, i.e. setting all Sig_y(i,j)=1 in Rcalc_Emission_weighted.m, are similar; the main difference is in the standard deviations of the parameters:
tau(1): 10.0822 +- 0.111536
tau(2): 30.1539 +- 0.135565
sig_r: 8.6782
One could argue that the differences in the standard deviations are insignificant and that there is no real advantage in doing χ2 fitting. In order to shed some additional light onto the situation, we performed 1000 χ2 and 1000 least-squares fits using the same data generated with different seeds for the random number generator. Figure 4-54 displays the distributions of the fitted parameters; the white bars represent the χ2 fitting results, the black ones the least-squares results. The means of all four distributions are essentially correct, and the standard deviations of the distributions are only slightly narrower for the χ2 fitting (χ2 fits: τ1=10.01 ± 0.08, τ2=30.00 ± 0.15; ssq-fits: τ1=9.98 ± 0.18, τ2=29.98 ± 0.22). The difference is hardly breathtaking.
Figure 4-54. Distributions of the fitted parameters for 1000 experiments. Top panel τ1 with a true value of 10 and bottom panel τ2 with a true value of 30. The white bars represent the χ2 fits, the black bars the ssq-fits.
Figure 4-55 displays the distributions of the computed standard deviations resulting from each fit. We expect these standard deviations to be similar to the standard deviations of the distributions given above. The coincidence is perfect for the χ2 fitting, as indicated by white bars and arrows, while the standard least-squares fitting seriously underestimates the standard deviations of the parameters.
Figure 4-55. Distributions of the calculated standard deviations of the fitted parameters for 1000 experiments. Top panel for τ1, bottom panel for τ2. The white bars represent the χ2 fits, the black bars the ssq-fits; the arrows represent the standard deviations of the distributions of Figure 4-54.
4.3.3 Finding the Correct Model

We have mentioned earlier that finding the correct model to fit the data is much more difficult than fitting the data to a given model. Whether the model is right or wrong is not relevant from the point of view of the fitting algorithm. Usually, chemical intuition will guide the choice of models, but there is not always one unique model that can be used; statistical tests are then the only way to distinguish the different options. Ockham's razor will always be the guiding principle. It states that the simplest model adequately fitting the data is the 'best' one or, more accurately, the one to accept. As a general rule, the more parameters are fitted, the smaller ssq will be. In the ideal case, the decrease is significant until the correct complexity of the model is reached and only marginal thereafter. In real life, there is often no well-defined change in the decrease of ssq when the correct model is reached. Statistical analysis is well established for the ideal case of pure white noise. Under such conditions, the development of the correct model can be guided
by pure statistical analyses. Real data suffer from systematic errors that are much harder to manage. The Newton-Gauss algorithm delivers standard deviations for the fitted parameters. Ideally, repeating the experiment should result in approximately the same standard deviation for the collection of fitted parameters. This is never the case, at least we have never experienced such a situation in our laboratories. Calculated standard deviations are reasonably useful for the comparison of the parameters of one data set, but they are not accurate estimates for the standard deviation of the experimental distribution of the parameters. On a very different note, in Chapter 5, Model-Free Analyses, we introduce methods that attempt a model-free analysis of the data. Typically, a matrix Y is automatically decomposed into the product of the matrices C and A. These analyses usually are not as robust as the fitting discussed in this chapter, however, the results can guide the researcher in finding the correct model.
4.4 General Optimisation
4.4.1 The Newton-Gauss Algorithm

In all the different versions of the Newton-Gauss algorithm we have used so far, we have not directly minimised the sum of squares! The iterative process was driven by the computation of the shift vector for the parameters, δp = −J⁺r(p). The sum of squares was only used to monitor the progress of the fitting and to formulate the termination criterion. Methods that work directly on ssq are often called direct methods. We will see later that, for non-linear least-squares fitting, the Newton-Gauss algorithms developed so far are superior. However, there are many optimisation tasks that are of a different nature. In chemistry, data fitting is probably the most important application, but by no means the only one. This chapter provides additional insight into fitting algorithms and also allows expansion of the programs for more general optimisation tasks. It is worth noting that optimisation includes both minimisation and maximisation. They are fundamentally identical: one of the two can always be formulated as the negative of the other.

We start with a simple example. Consider the function y = cos(x)/(log(x)+π), as introduced in equation (3.69) and Figure 3-21. In Chapter 3.3.4, Solving Non-Linear Equations, we solved the equation y = cos(x)/(log(x)+π) = 0. Now the task is to find the value of x at a minimum of the function (near x=10). This is clearly another non-linear problem, and the first thought might be: develop the function into a Taylor series, truncate after the first two elements, etc. This would be equivalent to drawing a tangent at a point on the curve, and it does not result in anything useful: the tangent does not have a minimum that could be used as an improved guess.
For the present task, we need to keep an extra term in the Taylor expansion

$$f(x+\delta x) = f(x) + f'(x)\,\delta x + \frac{1}{2}f''(x)\,\delta x^2 \qquad (4.103)$$
which is the equation of a parabola. The idea is to approximate the function at the initial point with a parabola, compute the minimum of the parabola and use that as an improved guess for the position of the minimum of the function. An iterative process suggests itself. Refer to Figure 4-56, where the parabola that approximates the curve at x=5 is drawn.
Figure 4-56. A parabola is fitted to the function y = cos(x)/(log(x)+π) at x=5.
Obviously, we have to make sure that the initial position for the parabola is sensible. In any iterative process the choice of initial guesses is important. Fitting a parabola at x=30 does not result in an improvement; recall also Figure 3-21. Instead of developing a polished program that performs the task just explained (a bare-bones sketch is given below), we move on to the 2-parameter case. Subsequently, we generalise to the np-parameter case and then analyse the relationship with the Newton-Gauss algorithm for least-squares fitting.
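The following few lines are a bare-bones sketch of our own of the one-parameter iteration just described; the derivatives are computed numerically by finite differences, and the step size h is an arbitrary choice.

% Sketch: iterative parabolic approximation for one parameter (our own code)
f=@(x) cos(x)./(log(x)+pi);     % the function of equation (3.69)
x=9;                            % sensible initial guess near the minimum
h=1e-4;                         % step size for numerical differentiation
for it=1:20
  f1=(f(x+h)-f(x-h))/(2*h);             % first derivative
  f2=(f(x+h)-2*f(x)+f(x-h))/h^2;        % second derivative
  dx=-f1/f2;                            % minimum of the local parabola
  x=x+dx;
  if abs(dx)<1e-8, break; end           % termination criterion
end
x                               % approx. 9.4, the minimum near x=10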
For the 2-parameter case, consider a function of the kind represented in Figure 4-5, where we plotted the sum of squares as a function of two parameters. In analogy to Figure 4-56, we start with an initial guess for the parameters p1 and p2 and at this point compute the first and second derivatives

$$\frac{\partial z}{\partial p_1},\ \frac{\partial z}{\partial p_2},\ \frac{\partial^2 z}{\partial p_1^2},\ \frac{\partial^2 z}{\partial p_2^2}\ \text{and}\ \frac{\partial^2 z}{\partial p_1\partial p_2} \qquad (4.104)$$
Note the generalisation of the nomenclature, using z instead of ssq. The parabolic surface approximates z at the point (p1, p2); the quality of the approximation decreases with increasing distance from (p1, p2). Having determined the first and second derivatives, either explicitly or numerically, the minimum of the parabolic surface has to be localised. The general equation for a parabolic surface is

$$z = a_1 + a_2p_1 + a_3p_2 + a_4p_1^2 + a_5p_2^2 + a_6p_1p_2 = a_1 + \begin{bmatrix} a_2 & a_3 \end{bmatrix}\begin{bmatrix} p_1 \\ p_2 \end{bmatrix} + \begin{bmatrix} p_1 & p_2 \end{bmatrix}\begin{bmatrix} a_4 & a_6/2 \\ a_6/2 & a_5 \end{bmatrix}\begin{bmatrix} p_1 \\ p_2 \end{bmatrix} \qquad (4.105)$$
The first derivatives are

$$\frac{\partial z}{\partial p_1} = a_2 + 2a_4p_1 + a_6p_2, \qquad \frac{\partial z}{\partial p_2} = a_3 + 2a_5p_2 + a_6p_1$$

$$\frac{\partial z}{\partial p} = \begin{bmatrix} \partial z/\partial p_1 \\ \partial z/\partial p_2 \end{bmatrix} = \begin{bmatrix} a_2 \\ a_3 \end{bmatrix} + 2\begin{bmatrix} a_4 & a_6/2 \\ a_6/2 & a_5 \end{bmatrix}\begin{bmatrix} p_1 \\ p_2 \end{bmatrix} \qquad (4.106)$$

and the second derivatives:
$$\frac{\partial^2 z}{\partial p_1^2} = 2a_4, \qquad \frac{\partial^2 z}{\partial p_2^2} = 2a_5, \qquad \frac{\partial^2 z}{\partial p_1\partial p_2} = \frac{\partial^2 z}{\partial p_2\partial p_1} = a_6$$

$$\frac{\partial^2 z}{\partial p^2} = \begin{bmatrix} \partial^2 z/\partial p_1^2 & \partial^2 z/\partial p_1\partial p_2 \\ \partial^2 z/\partial p_1\partial p_2 & \partial^2 z/\partial p_2^2 \end{bmatrix} = 2\begin{bmatrix} a_4 & a_6/2 \\ a_6/2 & a_5 \end{bmatrix} \qquad (4.107)$$
At the minimum, we know the two first derivatives to be zero:

$$\begin{bmatrix} \partial z/\partial p_1 \\ \partial z/\partial p_2 \end{bmatrix} = \begin{bmatrix} a_2 \\ a_3 \end{bmatrix} + 2\begin{bmatrix} a_4 & a_6/2 \\ a_6/2 & a_5 \end{bmatrix}\begin{bmatrix} p_1 \\ p_2 \end{bmatrix}_{min} = 0 \qquad (4.108)$$
Thus

$$\begin{bmatrix} p_1 \\ p_2 \end{bmatrix}_{min} = -\frac{1}{2}\begin{bmatrix} a_4 & a_6/2 \\ a_6/2 & a_5 \end{bmatrix}^{-1}\begin{bmatrix} a_2 \\ a_3 \end{bmatrix} \qquad (4.109)$$
In order to compute the minimum we need to know the coefficients a2 to a6. The polynomial coefficients a2 and a3 are defined by the first derivatives, equation (4.106); the coefficients a4, a5 and a6 are defined by the second derivatives, equation (4.107). It is of course possible to generalise the above equations to any number of parameters. For a column parameter vector p of length np, we can write:

$$z(p) = a_1 + a_2p + p^tA_3p, \qquad \frac{\partial z}{\partial p} = a_2 + 2A_3p, \qquad \frac{\partial^2 z}{\partial p^2} = 2A_3, \qquad p_{min} = -\frac{1}{2}A_3^{-1}a_2 \qquad (4.110)$$
where the polynomial coefficients are collected in a scalar a1, a row vector a2 of length np and a matrix A3 of size np×np. Compare equations (4.110) with equations (4.105)-(4.109) for the 2-parameter case.

The next question: how does this compare with what we discussed in Chapter 4.3, Non-Linear Regression? Replacing z in the preceding equations with the sum of squared residuals, ssq, the first derivatives

$$\frac{\partial ssq}{\partial p_j} = \sum_{i=1}^{m}\frac{\partial r_i(p)^2}{\partial p_j} = \sum_{i=1}^{m}2\,\frac{\partial r_i(p)}{\partial p_j}\,r_i(p) = 2\,j_{:,j}^t\,r, \quad\text{thus}\quad \frac{\partial ssq}{\partial p} = 2J^tr \qquad (4.111)$$

turn out to be the product 2Jᵗr. The second derivatives, or the Hessian,
$$\frac{\partial^2 ssq}{\partial p_j\partial p_k} = \frac{\partial}{\partial p_k}\left(\frac{\partial ssq}{\partial p_j}\right) = \frac{\partial}{\partial p_k}\left(\sum_{i=1}^{m}2\,\frac{\partial r_i(p)}{\partial p_j}\,r_i(p)\right) = 2\sum_{i=1}^{m}\left(\frac{\partial r_i(p)}{\partial p_j}\frac{\partial r_i(p)}{\partial p_k} + r_i(p)\frac{\partial^2 r_i(p)}{\partial p_j\partial p_k}\right) = 2\left(j_{:,j}^t\,j_{:,k} + \sum_{i=1}^{m}r_i(p)\frac{\partial^2 r_i(p)}{\partial p_j\partial p_k}\right), \quad\text{thus}\quad \frac{\partial^2 ssq}{\partial p^2} \approx 2J^tJ \qquad (4.112)$$

turn out to be almost 2JᵗJ. JᵗJ is an approximation for the curvature matrix; it is approximately 0.5 times the Hessian matrix of second derivatives of ssq with respect to the parameters. What can we say about the term $\sum_{i=1}^{m}r_i(p)\,\partial^2 r_i(p)/\partial p_j\partial p_k$, which is the difference between JᵗJ and the Hessian matrix? It is the sum of the products of the residuals times the second derivatives. Close to the minimum, the elements of r are approximately randomly distributed around zero, with similar numbers of positive and negative elements. Thus the sum of the products approximately cancels and the term is small.

What is the effect on the iterative refinement of the parameters? The minimum is defined by Jᵗr=0. The curvature matrix is only required to guide the iterative process towards the minimum, and thus the approximation JᵗJ for the curvature matrix does not compromise the exact location of the minimum. The approximation only results in a slightly different path taken by the algorithm towards the minimum. Ignoring the terms $\sum_{i=1}^{m}r_i(p)\,\partial^2 r_i(p)/\partial p_j\partial p_k$ generally does not affect the iterative process in a seriously negative way.

It turns out that the Newton-Gauss algorithm, as introduced in Chapter 4.3.1, The Newton-Gauss-Levenberg/Marquardt Algorithm, is a very elegant and fast way of approximating the curvature matrix, as it only requires the computation of the first derivatives. Minimising the sum of squares directly, as defined by equation (4.110) and its predecessors, is much more computationally intensive, as it requires the calculation of the second derivatives. Within chemistry, there are not many applications of function minimisation that are not sum-of-squares minimisations. This is why we do not supply a Matlab program that optimises general functions based on the Newton-Gauss algorithm, nor one for the numerical calculation of second derivatives. Additionally, Matlab provides fminunc for unconstrained function optimisation in the Optimisation Toolbox. fminunc has many options that allow optimal usage for a wide range of optimisation problems. Note also that the Newton-Gauss algorithm for function optimisation is the standard option in Excel's Solver.

The following few equations recapitulate the relationship between our 'original' least-squares formulas, introduced in Chapter 4.3, Non-Linear Regression, and those developed here. The first derivatives are:
$$\frac{\partial ssq}{\partial p} = a_2 + 2A_3p = 2J^tr \qquad (4.113)$$

and the second derivatives:

$$\frac{\partial^2 ssq}{\partial p^2} = 2A_3 \approx 2J^tJ \qquad (4.114)$$

and now the calculation of the shift vector δp:
$$\delta p = -(J^tJ)^{-1}J^tr \approx -\frac{1}{2}A_3^{-1}(a_2 + 2A_3p) = -\frac{1}{2}A_3^{-1}a_2 - p, \quad\text{thus}\quad p_{min} = p + \delta p = -\frac{1}{2}A_3^{-1}a_2 \qquad (4.115)$$

which is the same as equation (4.110).
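As mentioned above, Matlab's fminunc can be used for general unconstrained minimisation. A minimal call for the function of equation (3.69) might look as follows; this is a sketch of our own with all options left at their defaults, and it requires the Optimisation Toolbox.

% Sketch: unconstrained minimisation with fminunc (Optimisation Toolbox)
f=@(x) cos(x)./(log(x)+pi);
x_min=fminunc(f,9)          % start near the minimum; converges to approx. 9.4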
4.4.2 The Simplex Algorithm

The simplex algorithm is conceptually a very simple method. It is reasonably fast for small numbers of parameters, robust and reliable. For high-dimensional tasks with many parameters, however, it quickly becomes painfully slow. Further, the simplex algorithm does not deliver any statistical information about the parameters; e.g., it is possible to 'fit' parameters that are completely independent of the data, and the algorithm delivers a value without indicating its uselessness.

A simplex is a multidimensional geometrical object with n+1 vertices in an n-dimensional space. In 2 dimensions the simplex is a triangle, in 3 dimensions it is a tetrahedron, etc. The simplex algorithm can be used for function minimisation as well as maximisation; we formulate the process for minimisation.

At the beginning of the process, the function values at all corners of the simplex have to be determined. Next, the corner with the highest function value is determined. This vertex is deleted and a new simplex is constructed by reflecting the old simplex at the face opposite the deleted corner. Importantly, only one new function value has to be determined for the new simplex. The new simplex is treated in the same way: the highest vertex is determined and the simplex reflected, etc. The process is illustrated in Figure 4-57. In the initial simplex the highest value is 14 (we are searching for the minimum) and the simplex has to be reflected at the opposite face, marked in grey. A new function value of 7 is determined in the new simplex. The next move would be deletion of corner 11 and reflection at the face (8,9,7).
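The reflection step just described can be written in a few lines of Matlab. This is a toy sketch of our own; the function and the vertex coordinates are arbitrary and not taken from the book's examples.

% Sketch: one reflection step of a 2-dimensional simplex (toy example)
f=@(p) (p(1)-2)^2+(p(2)-1)^2;        % an arbitrary function to be minimised
X=[0 0; 1 0; 0 1];                   % 3 vertices of the simplex, one per row
fv=[f(X(1,:)) f(X(2,:)) f(X(3,:))];  % function values at the vertices
[fmax,ih]=max(fv);                   % index of the worst (highest) vertex
keep=setdiff(1:3,ih);
xc=mean(X(keep,:),1);                % centroid of the opposite face
X(ih,:)=2*xc-X(ih,:);                % reflect the worst vertex through the centroid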
Figure 4-57. The original simplex is reflected at the grey face opposite the corner with the highest value (14).

In the simplex algorithm, the size of the simplex plays an important role. If the simplex is too large, fine details are not resolved. If it is too small, progress towards the optimum is painfully slow. In a well-designed algorithm, the size should be fairly large at the beginning, as we want to proceed in big steps towards the minimum, but it should be small near the minimum, as we want an accurate resolution of the minimum. The simplex algorithm implemented in Matlab has, in addition to the reflection steps just introduced, additional expansion and contraction steps. The simplex moves fast, in growing steps, towards the minimum in those parts of the function that are unstructured, simple slopes; it shrinks in narrow valleys and close to the minimum. We do not design our own algorithm here but use the fminsearch.m function supplied by Matlab, which is based on the original Nelder-Mead simplex algorithm.

As an example, we re-analyse our exponential decay data Data_Decay.m (see p.106), this time fitting both parameters, the rate constant and the amplitude. Compare the results with those from the linearisation of the exponential curve, followed by a linear least-squares fit, as performed in Linearisation of Non-Linear Problems (p.127). The arguments passed into fminsearch are the name of the function that delivers the function value for the parameters, initial guesses for the parameters to be fitted, an empty matrix (here specific minimisation options could be included; refer to the manual for more details), and the actual data t and y. fminsearch returns the optimal parameters.

MatlabFile 4-54. Main_Decay_Simplex.m
% Main_Decay_Simplex
[t,y]=Data_Decay;
par=fminsearch('SsqCalc_Decay',[10 0.02],[],t,y)
ssq=SsqCalc_Decay(par,t,y)

par =
  106.8371    0.0540
ssq =
  4.3885e+003
MatlabFile 4-55. SsqCalc_Decay.m
function ssq=SsqCalc_Decay(par,t,y)
I_0=par(1);                % amplitude
k=par(2);                  % rate constant
y_calc=I_0*exp(-k*t);      % model: exponential decay
r=y-y_calc;                % residuals
ssq=sum(r.*r);             % sum of squares
In Figure 4-58 two simplex paths used by fminsearch are shown: one starting from [200 0.04], the other from [10 0.02]. The moves of the simplex are clearly visible: the steps grow at the beginning and shrink towards the end, close to the minimum.
Figure 4-58. Two paths of the simplex for the fitting of the decay data. Starting parameters are [200 0.04] and [10 0.02].

As the second example, we re-analyse the consecutive reaction A→B→C, Data_ABC.m, where data were 'acquired' at many wavelengths. As outlined in Multivariate Data, Separation of the Linear and Non-Linear Parameters (p.162), it is crucial to eliminate the linear parameters by calculating the matrix A of molar absorptivities as a function of C and thus the rate constants. In fact, the function SsqCalc_ABC is almost identical to Rcalc_ABC (p.167). The only difference is that the sum of squares, ssq, is now returned instead of the residuals.
MatlabFile 4-56. Main_ABC_simplex.m
% Main_ABC_Simplex
[time,lam,Y]=Data_ABC;        % get absorbance data
A_0=1e-3;
k0=[0.01;0.001];              % start parameter vector

[k,ssq] = fminsearch('SsqCalc_ABC',k0,[],A_0,time,Y);
sig_r=sqrt(ssq/(prod(size(Y))-length(k)));   % sigma_r
for i=1:length(k)
  fprintf(1,'k(%i): %g\n',i,k(i));
end
fprintf(1,'sig_r: %g\n',sig_r);

k(1): 0.00301105
k(2): 0.00149269
sig_r: 0.0100068

MatlabFile 4-57. SsqCalc_ABC.m
function ssq=SsqCalc_ABC(k,A_0,t,Y)
C(:,1)=A_0*exp(-k(1)*t);                                   % concentrations of A
C(:,2)=A_0*k(1)/(k(2)-k(1))*(exp(-k(1)*t)-exp(-k(2)*t));   % conc. of B
C(:,3)=A_0-C(:,1)-C(:,2);                                  % concentrations of C
A=C\Y;                                                     % elimination of linear parameters
R=Y-C*A;                                                   % residuals
ssq=sum(sum(R.*R));
The results of the simplex optimisation are essentially the same as those produced by the Newton-Gauss fit on p.166. The main differences are the lack of standard deviations for the parameters and the longer computation time.
4.4.3 Optimisation in Excel, the Solver

The Excel Solver Add-In is a very powerful tool. We have already used it to solve systems of non-linear equations, see Chapters 3.3.3 Solving Complex Equilibria and 3.3.4 Solving Non-Linear Equations. The Solver includes optimisation as one of its options. Its main application, within this chapter on data analysis, is data fitting based on the minimisation of a sum of squares. There is little information available in the Excel documentation about the algorithms and techniques used by the Solver, but this is irrelevant for most users.

In a few representative examples we demonstrate the ability and power of the Solver for non-linear data fitting tasks. Several examples are based on the fitting tasks already solved by the Newton-Gauss-Levenberg/Marquardt method in the earlier parts of this chapter. In the first example we re-analyse the double Gaussian from Data_Chrom.m (p.158).
In the Excel spreadsheet of Figure 4-59, columns A and B contain the data, the time vector and the vector y of measurements. Columns C and D contain the individual Gaussians (see Chapter 3.2, Chromatography / Gaussian Curves) as defined by the parameters in the cells I3:J5. Column E is the sum of the two Gaussians, i.e. the model for the double Gaussian. The squares of the differences between this model and the data are collected in column F. The sum of all these squared residuals makes up cell I8. With the present initial guesses for the parameters, the two model Gaussians are visible as the dashed lines, the sum of the two as the full line.

ExcelSheet 4-4. Chapter3.xls-chrom
=I$3*EXP(-(LN(2)*($A3-I$4)^2)/(I$5^2/4))
=SUM(C3+D3)
=(B3-E3)^2
=SUM(F3:F52)
Figure 4-59. The 'measured' double Gaussian and the sum of two Gaussians as defined by the parameters in the spreadsheet, prior to the fit.

The Solver window in Figure 4-60 indicates the set-up: the sum of squares in I8 is minimised as a function of the parameters I3:J5. Make sure the Min radio button is selected before hitting the Solve button.
Figure 4-60. The solver window set for the task of fitting the parameters in Figure 4-59.
Figure 4-61. The fitted double Gaussian and its parameters.

The parameters determined by the Solver are virtually identical to those determined by the Newton-Gauss-Levenberg/Marquardt fit; see Main_Chrom.m (p.158). In the next example, we re-analyse the consecutive reaction, Data_ABC.m (p.143) and (p.165). This time, however, we use fewer data in order to keep the Excel spreadsheet reasonably compact. The important concept of treating linear and non-linear parameters separately can be implemented in Excel as well.
ExcelSheet 4-5. Chapter3.xls-kinetics
=SUMXMY2(E3:O13,E21:O31)
=MMULT(MINVERSE(MMULT(TRANSPOSE(A21:C31),A21:C31)),MMULT(TRANSPOSE(A21:C31),E3:O13))
=MMULT(A21:C31,E16:O18)
=Q$4*EXP(-R$4*A3)
=Q$4*R$4/(S$4-R$4)*(EXP(-R$4*A3)-EXP(-S$4*A3))
=Q$4-A21-B21
Figure 4-62. Spreadsheet for the fitting of the reaction scheme A→B→C to multivariate data.

The spreadsheet in Figure 4-62 is heavily matrix based (see Chapter 2 for an introduction to basic matrix functions in Excel); this is the only way to keep the structure reasonably simple. The matrix C in cells A21:C31 is computed in the usual way, see equation (4.63); the parameters required to compute the concentration matrix are in cells Q4:S4 and include the initial concentration of species A and the two rate constants that are to be fitted. In cells E16:O18 the computation of the best absorptivity matrix A for any given concentration matrix C is done as a matrix equation, as demonstrated in The Pseudo-Inverse in Excel (p.146). Similarly, the matrix Ycalc in cells E21:O31 is written as the matrix product CA. Even the calculation of the sum of squares of the residuals in cell R7 is written in a compact way, using the Excel function SUMXMY2, which is specifically designed for this purpose. We refer to the Excel Help for more information on this and similar functions. The small plots below the data in Figure 4-62 show, from left to right, the calculated concentration profiles, the absorption spectra, and the fits at three selected wavelengths.

The spreadsheet in Figure 4-62 is shown before the fitting. Application of the Solver results in the rate constants 0.0031 and 0.00161. Due to the smaller number of data, they are not as well defined as the results of the analysis of the complete data set (p.165).
Figure 4-63. The solver window for the fitting of the rate constants in Figure 4-62. To prevent an interchange of the two rate constants, the constraint k1≥k2 has been added (see Constraint: Positive Component Spectra, p.168).
χ2-Fitting in Excel

As a last example, we demonstrate the versatility of the Solver by performing a χ2-fit of the emission data taken from Data_Emission.m (p.191). In order to keep the Excel spreadsheet reasonably concise, we selected the data at one wavelength only (500nm). At this wavelength, the correct amplitudes for the two species are 500 and 50, with lifetimes of 10 and 30 time units. Data are available from times 0 to 100 time units.

Figure 4-64 displays the results of the χ2-fitting on the left and of normal least-squares fitting on the right. The distribution of the weighted residuals is uniform with a mean of 0 and a standard deviation of 1; the distribution of the residuals from ssq-fitting is non-uniform. More importantly, the fitted parameters are significantly closer to the correct values for the χ2-fitting.

While this analysis seems straightforward, there are a few issues that deserve closer attention. How do we define the standard deviation of the error in the measurement? For the χ2-fitting in Non-White Noise, χ2-Fitting (p.189), we assumed the standard deviations of the errors to be known from some independent source and used these values for the weighting of the residuals. As given in equation (4.101), for photon counting experiments the standard deviation of the error of a reading equals the square root of the reading. Thus the root of the measurement is an estimate of the error, and the entry in, e.g., cell D11 of the spreadsheet shown in Figure 4-64 should be SQRT(B11). For low intensities, the reading reaches 0 and so too does its estimated error; weighting such a reading would result in a division-by-0 error. In the spreadsheet, we therefore assign an error of 1 to a reading of 0, i.e. 0±1. This issue could be avoided if the error were computed as the square root of the calculated value (e.g. using cell C11 instead of B11). For the correct model this would be reasonable, but with wrong models and a poor fit, the estimated errors would be wrong.

ExcelSheet 4-6. Chapter3.xls-emission
=SUM(F11:F110)
=$B$2*EXP(-A11/$B$1)+$B$4*EXP(-A11/$B$3)
=$I$2*EXP(-A11/$I$1)+$I$4*EXP(-A11/$I$3)
=IF(B11>0,SQRT(B11),1)
=(B11-C11)/D11
=E11^2
=(B11-H11)
=I11^2
Figure 4-64. The result of the χ2-fit (left) and sum-of-squares fit (right). Note the uniformly distributed weighted residuals for the χ2 fit.

An additional observation for photon counting data: there are no fractions of photons, and thus a count can only be an integer. The 'measurements' in column B are therefore rounded down to the nearest integer. It seems reasonable to do the same with the calculated values in column C. However, a test in Excel reveals that such an attempt does not work. The reason is that the Solver's Newton-Gauss algorithm requires the computation of the derivatives of the objective (χ2 or ssq) with respect to the parameters. Rounding would destroy the continuity of the function and effectively wipe out the derivatives.
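For reference, the weighting rule used in the spreadsheet translates into a few Matlab lines. This is a sketch of our own; y stands for the vector of measured counts and y_calc for the corresponding model values, neither of which is defined in the book under these names.

% Sketch: chi-square objective with Poisson weighting, as in the spreadsheet
sig=sqrt(max(y,1));       % error = sqrt(count); a count of 0 is assigned an error of 1
r_w=(y-y_calc)./sig;      % weighted residuals
chi2=sum(r_w.^2);         % the objective minimised by the Solver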
5 Model-Free Analyses

In the preceding Chapter 4, Model-Based Analyses, we investigated how a given measurement is analysed based on a predetermined model. The model can be a simple mathematical function, such as a polynomial that is fitted to a mono-variate data set; it can also be a complex chemical system, such as an oscillating chemical reaction. In Chapter 4, we provided a range of Matlab routines for this task. In Chapter 4.3.3, we touched on the subject of determining the right model to be fitted, say the degree of the polynomial, the exact chemical reaction mechanism in kinetics, or the correct equilibrium model in a titration experiment. This task, unfortunately, is a very difficult one. There are no generally applicable tools available to guide the researcher towards finding the model that correctly describes the chemical process under investigation. Model-fitting is much easier than model-finding.

There is a good collection of methods available for performing so-called model-free or soft-modelling analyses. What exactly does this mean? Why should anyone be bothered with trying to find the correct physical/chemical model for successful data fitting, if there are model-free analyses that deliver satisfactory results? One drawback of these model-free methods is that they do not deliver crisp and directly useful results such as a set of rate constants in a kinetic investigation or equilibrium constants in a complexation study. Typically, these methods deliver the shapes of the concentration profiles of all reacting components as well as the shapes of their absorption spectra. Such information can be very useful in supplying preliminary information about the system under investigation and ultimately could guide the researcher towards the correct model.

In many instances, however, there is no model or mathematical function at all that could be used to quantitatively describe the process under investigation. Then, the concentration profiles, in conjunction with the pure component spectra, are all there is to be extracted from the data. There is nothing that could be added to the results of these model-free analyses. An example is chromatography. There are no generally applicable functions that could form the basis for a model-fitting approach; model-free analyses are essentially all there is. Secondary analysis, such as a library search of the computed component spectra, is possible and can be useful.

Most, but not all, model-free methods are based on Factor Analysis, and we start this chapter with a fairly detailed and comprehensive discussion of this topic.
5.1 Factor Analysis, FA

The term 'Factor Analysis', FA, has a very wide range of interpretations; there is no general agreement on its exact meaning. From an abstract mathematical point of view, Factor Analysis is easily defined: it is the decomposition of a matrix into a product of two or three or more matrices. Such an interpretation of the term is too general and not very useful, even if any such decomposition could, of course, be called factor analysis. In 'proper' Factor Analysis, the resulting factor matrices have very particular properties: they are orthogonal matrices − sometimes they are even orthonormal (see Orthogonal and Orthonormal Matrices, p.25). There is still a long list of different interpretations of the expression Factor Analysis. All the meanings of the term can be explained on the basis of the Singular Value Decomposition.
5.1.1 The Singular Value Decomposition, SVD

The Singular Value Decomposition, SVD, has superseded earlier algorithms that perform Factor Analysis, e.g. the NIPALS or vector iteration algorithms. SVD is one of the most stable, robust and powerful algorithms in the world of numerical computing. It is clearly the only algorithm that should be used for any calculation in the realm of Factor Analysis. According to the SVD, any matrix Y can be decomposed into the product of three matrices
$$Y = U\,S\,V \qquad (5.1)$$
Please note that we continue with our preference of not explicitly indicating column and row matrices. It is common to write equation (5.1) as Y=USVᵗ, indicating that V is a column matrix that needs to be transposed for the Singular Value Decomposition. Matlab, too, uses this transposed notation. It is not possible to consistently, logically and generally distinguish between row and column matrices. In many matrices both rows and columns have a particular meaning; e.g., for kinetics, the matrix Y has spectra as rows and kinetic wavelength traces as columns: Y is neither a row nor a column matrix. We decided to drop the transposition of V and write the SVD as in (5.1). The price to pay is that we need to remember that Matlab's SVD routine returns Vᵗ.
Figure 5-1. Graphical representation of the Singular Value Decomposition.
The dimensions are as follows: Y is an m×n matrix where m≥n, U is an m×n matrix as well, while S and V are n×n matrices. Matlab delivers the above 'economy sized' dimensions only if the following command is used:

[U,S,Vt]=svd(Y,0);
If the 0 is not included in the list of parameters passed into svd ([U,S,Vt]=svd(Y)), the resulting dimensions are U (m×m), S (m×n) and Vt (n×n). The matrices are larger but do not contain additional useful information. The important special properties of the three product matrices U, S and V are the following: S is a diagonal matrix containing the so-called singular values in descending order; note that the singular values of real matrices are always positive and real. U and V are orthonormal matrices, which means they are comprised of orthonormal vectors. In matrix notation:

$$U^tU = VV^t = I \qquad (5.2)$$

where I is the identity matrix of dimensions (n×n). A more traditional notation for Factor Analysis is
$$Y = T\,L \qquad (5.3)$$
T is often called the score matrix and L the loadings matrix. The relationship between decompositions (5.1) and (5.3) is

$$T = U\,S \quad\text{and}\quad L = V \qquad (5.4)$$
L contains normalised rows while T is weighted by the matrix S. This, however, is somewhat ambiguous, as the decomposition of the transposed matrix Yᵗ is equally possible, and then the score and loading matrices are simply exchanged. For this reason, we do not use the expressions 'scores' and 'loadings'. The Singular Value Decomposition maintains some kind of symmetry between the decompositions of Y and Yᵗ. The matrices U and V contain, as columns and rows, the eigenvectors of the square matrices YYᵗ and YᵗY. This can easily be shown using the Singular Value Decomposition:

$$YY^tU = USVV^tSU^tU = US^2 = U\Lambda \qquad (5.5)$$
Remember that the eigenvectors of a matrix are those vectors that, when multiplied by the matrix, become multiples of themselves. As Λ=S² is a diagonal matrix, each column of the product UΛ is a multiple of the corresponding column of U, and thus the columns of U are eigenvectors of YYᵗ. The diagonal elements of Λ=S² are the eigenvalues for the corresponding columns of U. In a similar way we can prove the relationship for V:

$$Y^tYV^t = V^tSU^tUSVV^t = V^tS^2 = V^t\Lambda \qquad (5.6)$$
Let us confirm all of this using a few Matlab lines:

MatlabFile 5-1. Main_SVD1.m
% Main_SVD1
rand('state',0)            % initialise random number generator
Y=rand(4,3)                % random numbers
[U,S,Vt]=svd(Y,0)          % economy sized svd
UtU=U'*U
VVt=Vt'*Vt

Y =
    0.9501    0.8913    0.8214
    0.2311    0.7621    0.4447
    0.6068    0.4565    0.6154
    0.4860    0.0185    0.7919
U =
   -0.7164   -0.1651   -0.5108
   -0.3821   -0.5813    0.7183
   -0.4557    0.1157   -0.1563
   -0.3649    0.7883    0.4457
S =
    2.1373         0         0
         0    0.6248         0
         0         0    0.2538
Vt =
   -0.5721    0.2594   -0.7781
   -0.5355   -0.8367    0.1148
   -0.6212    0.4823    0.6176
UtU =
    1.0000   -0.0000    0.0000
   -0.0000    1.0000    0.0000
    0.0000    0.0000    1.0000
VVt =
    1.0000   -0.0000   -0.0000
   -0.0000    1.0000   -0.0000
   -0.0000   -0.0000    1.0000
It is hard to imagine at this stage how powerful this fairly simple and straightforward Singular Value Decomposition is. An astonishing wealth of information can be extracted from the decomposition; the next few chapters deal with that. As a reminder: no model or chemical knowledge of any kind is required for the SVD. It is completely automatic, and no user input is required. SVD is the core of almost all model-free methods.
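As a quick check of equation (5.1) and of the transposition convention, the product of the three factors restores Y. This one-line sketch of our own continues Main_SVD1.m; recall that the variable named Vt actually holds Matlab's V.

% Continuing Main_SVD1: Matlab returns V (here named Vt), so Y = U*S*Vt'
norm(Y-U*S*Vt')            % zero to machine precision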
5.1.2 The Rank of a Matrix

The rank of a matrix Y is the number of linearly independent rows or columns of the matrix. The columns of Y are linearly dependent if one of the column vectors y:,j can be written as a linear combination of the other columns; the same holds for rows:
$$y_{:,j} = \sum_{i\neq j} y_{:,i}\,a_i \qquad\qquad y_{j,:} = \sum_{i\neq j} y_{i,:}\,b_i \qquad (5.7)$$
ai and bi represent any corresponding coefficients. There is an immediate relationship between rank and the chemical process under investigation. Consider a matrix Y of perfect, noise-free spectra, measured during a chromatographic experiment. Assume there are 3 components with different spectra and that all 3 components are at least partially resolved. In this situation, every row of Y is a linear combination of the three component spectra. Thus the maximum number of linearly independent rows is 3 and therefore the rank of the matrix is 3. The same holds for the columns of Y: each chromatogram is a linear combination of the 3 component concentration profiles. The row and column ranks of any matrix Y are always identical.

In a casual way one could state that the rank of the matrix equals the number of different species that exist in the mixture. However, such a statement is not generally true and needs to be qualified in several ways:
(a) only species that absorb in the observed wavelength range contribute to the rank; solvents and electrolytes often do not absorb and thus do not contribute;
(b) the active species need to have distinguishable spectra, or more precisely, linearly independent spectra;
(c) the concentration profiles need to be linearly independent, i.e. two exactly co-eluting species cannot be distinguished and contribute only one to the total rank;
(d) the species need to take part in the process, i.e. they have to change concentration; spectator concentration profiles are just a constant, and any number of such components increases the rank by a total of one;
(e) the statements above apply directly only to noise-free data.

A first question in model-free analysis is: how many components are there in a system? Or, in other words, what is the rank of the matrix Y? In particular, what is the influence of noise? Providing an answer to these questions is a first, extremely powerful result of SVD. Equation (5.1) can be written in a different way
$$Y = \sum_{i=1}^{n} u_{:,i}\,s_{i,i}\,v_{i,:} \qquad (5.8)$$
Recall that the singular values si,i are ordered in decreasing magnitude, s1,1 ≥ s2,2 ≥ s3,3 ≥ … ≥ sn,n. This means the eigenvectors u:,i in U and vi,: in V continuously lose importance, and once the singular values si,i are small enough, their contribution can be ignored altogether. In the ideal noise-free case, all 'small' singular values are zero; with real, noisy data they are merely 'small'. So, instead of summing over all n terms in equation (5.8), we only sum over the ne significant terms, often referred to as principal components:

$$Y \approx \sum_{i=1}^{n_e} u_{:,i}\,s_{i,i}\,v_{i,:} \qquad (5.9)$$
There are many advantages in selecting only the significant ne eigenvectors and singular values for the representation of Y. In fact, from now on we only use this selection and introduce an appropriate nomenclature:

$$\bar{Y} = \sum_{i=1}^{n_e} u_{:,i}\,s_{i,i}\,v_{i,:} = \bar{U}\,\bar{S}\,\bar{V} \qquad (5.10)$$
where Ū, S̄ and V̄ contain the significant parts of the total matrices U, S and V. The graphical representation in Figure 5-2 is instructive.
Figure 5-2. The shapes of the matrices after selecting the significant parts of U, S and V.

The matrices Ū and V̄ are much thinner and S̄ is smaller. Depending on the dimensions of Y and the number of significant eigenvalues, the reduction in the sizes of U, S and V can be dramatic.

Real data are never noise-free, and in purely mathematical terms the rank of a noisy data matrix is always the smaller of the number of rows or columns. So, the question obviously is: where do we stop? What is the correct number of independent species, or the correct rank of the matrix Y? How many singular values are statistically relevant? Most importantly for the chemist: what is the practical or chemical rank; how many components are there in the system?
There is extensive literature on this question. We do not examine the subject from the statistical point of view in any detail. Instead, we start with three graphical ways of answering the question and include a crude statistical analysis.

Magnitude of the Singular Values

It is easiest to examine an example. We generate a set of three overlapping peaks in a chromatogram, add two different levels of noise and analyse the two data sets.

MatlabFile 5-2. Data_Chrom2.m
function [t,lam,Y,C,A]=Data_Chrom2
lam=400:10:600;
A(1,:)=1000*gauss(lam,450,120);    % molar component spectra A
A(2,:)=2000*gauss(lam,350,120);
A(3,:)=1000*gauss(lam,500,50);
t=(1:1:100)';
C(:,1)=1e-3*gauss(t,35,15);        % elution profiles C
C(:,2)=9e-4*gauss(t,50,16);
C(:,3)=2e-3*gauss(t,70,17);
Y=C*A;                             % absorbance data Y
randn('seed',0);
Y=Y+1e-3*randn(size(Y));
Figure 5-3 displays the two data matrices used to demonstrate different ways of estimating the rank of a matrix. The top matrix has a noise level of 10⁻³ and the lower one of 1.01×10⁻¹. The mean of all elements of Y is about 0.2 and the maximum is 2. Thus, the noise levels amount to some 0.5% and 50% of the mean, and 0.05% and 5% of the maximal value of Y.

MatlabFile 5-3. Main_SVD2.m
% Main_SVD2
[t,lam,Y,C,A]=Data_Chrom2;       % generating data
[U,S,Vt]=svd(Y,0);               % do svd
Y1=Y+1e-1*randn(size(Y));        % add additional noise
[U1,S1,Vt1]=svd(Y1,0);           % do another svd
subplot(2,1,1);
mesh(lam,t,Y); axis tight
xlabel('wavelength');ylabel('time'); zlabel('abs');
subplot(2,1,2);
mesh(lam,t,Y1); axis tight
xlabel('wavelength');ylabel('time'); zlabel('abs');
Figure 5-3. Two matrices Y with noise levels of 10⁻³ and 1.01×10⁻¹.

Logarithmic plots of the magnitude of the singular values are often instructive and allow a simple analysis.

MatlabFile 5-4. Main_SVD2.m …continued
% Main_SVD2, ...continued
subplot(2,1,1);
plot(1:length(lam),log10(diag(S)),'+');
ylabel('log(S)'); axis([0 25 -3 2]);
subplot(2,1,2);
plot(1:length(lam),log10(diag(S1)),'x');
ylabel('log(S1)'); axis([0 25 -3 2]);
Plots of the kind shown in Figure 5-4 are often crisp and clear, as is the case in the top panel; sometimes less so, as can be seen in the lower panel. The rank is the number of singular values above the noise level, which is represented by the series of much smaller and usually similar singular values. In both panels the rank can clearly be identified as three. Naturally, the difference between significant and noise singular values is much easier to discern for the measurement with a small noise level than it is for the increased noise level. If significantly more noise is present in the data, the third singular value 'disappears' in the noise and there will only be two significant ones remaining. Eventually, with high enough noise, the second singular value also disappears. It is interesting to observe that the significant singular values are hardly affected by increasing noise, while the noise singular values move up together.
Figure 5-4. Log of the singular values for a 3 component chromatogram. Upper panel (+) for the data matrix Y with a noise level of 1×10-3 and lower panel (×) for Y1 with additional noise of 1×10-1.
The Structure of the Eigenvectors

In a similar, slightly more complex, eye-based analysis, one can investigate the noisiness of the eigenvectors. The real eigenvectors or principal components are smooth: they have broad structures, while noise eigenvectors oscillate wildly and show no underlying structure. Of course, as before, the difference can be more or less pronounced. We analyse the same data as before:

MatlabFile 5-5. Main_SVD2.m …continued
% Main_SVD2, ...continued
subplot(2,1,1)
plot(t,U(:,1:4),'-'); hold on;
plot(t,U(:,3),'-','LineWidth',3); hold off;
xlabel('time');ylabel('U');
subplot(2,1,2)
plot(t,U1(:,1:4),'-'); hold on
plot(t,U1(:,3),'-','LineWidth',3); hold off
xlabel('time');ylabel('U1');
Figure 5-5. Different noisiness of the eigenvectors resulting from the SVD of the data in Figure 5-3. The 3rd eigenvector is highlighted.

The upper panel of Figure 5-5 shows a clear distinction between the first three real eigenvectors, while the 4th eigenvector represents pure noise. Note that the third eigenvector is highlighted in both panels. In the lower panel all eigenvectors are noisier; in particular, the 3rd eigenvector only just shows some broad structure that is almost completely hidden by the relatively large amount of noise. Another interesting observation can be made: the signs of the eigenvectors are not defined − they result arbitrarily from the Singular Value Decomposition. Apart from the amount of noise, the matrices Y and Y1 are identical, but the resulting eigenvectors have opposite signs.

The Structure of the Residuals
Ȳ in equation (5.9) is a good representation of the original matrix Y, but not an identical one. There is a residual matrix R, of decreasing significance the more eigenvectors are used to compute Ȳ:

$$Y = \bar{Y} + R \qquad (5.11)$$

Similar to the structure of U and V, which reveals the significance of the eigenvectors, the structure of R allows the identification of the correct rank.

MatlabFile 5-6. Main_SVD2.m …continued
% Main_SVD2, ...continued
for i=0:3
  R=Y-U(:,1:i)*S(1:i,1:i)*Vt(:,1:i)';
  subplot(2,2,i+1)
  mesh(lam,t,R); axis tight
  xlabel('wavelength');ylabel('time');
  zlabel(['R(' num2str(i) ')']);
end
Figure 5-6. The structure of the residuals after the subtraction of the contribution of the first 0, 1, 2 and 3 eigenvectors, see equations (5.9) and (5.11).

In this example, we are analysing the data set with the low noise level and, accordingly, the distinction between structured and noise residuals is crisp and unambiguous. Only noise is left after the subtraction of the contributions of three eigenvectors, equations (5.9) and (5.11).

The Standard Deviation of the Residuals

As an alternative to observing the structure of the residuals, statistical information about their magnitude is also readily available. In essence, after removing the contributions of the correct number of eigenvectors, the standard deviation of the residuals should be the same as the noise level of the instrument, or in our case, the level of noise added.
MatlabFile 5-7. Main_SVD2.m …continued
% Main_SVD2, ...continued
for i=0:5
  R=Y-U(:,1:i)*S(1:i,1:i)*Vt(:,1:i)';
  sig_R(i+1)=std(R(:));
end
sig_R

sig_R =
    0.3492    0.2085    0.0591    0.0009    0.0009    0.0008

Recall that the standard deviation of the added noise in Y was 1×10⁻³. It is reached approximately after the removal of 3 sets of eigenvectors (i=3, the 4th element of sig_R). Note that, from a strictly statistical point of view, it is not quite appropriate to use Matlab's std function for the determination of the residual standard deviation, since it does not properly take into account the gradual reduction in the degrees of freedom in the calculation of R. But it is not our intention to go into the depths of statistics here. For more rigorous statistical procedures to determine the number of significant factors, we refer to the relevant chemometrics literature on this topic.
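The eye-based criteria above can be complemented by a crude automatic count. The following sketch continues Main_SVD2.m; the threshold is our own heuristic, based on the fact that, for an m×n matrix of pure noise with standard deviation sig, the largest singular value is roughly sig(√m+√n).

% Sketch: crude automatic rank estimate based on the known noise level
sig=1e-3;                          % noise level of the simulated data
[m,n]=size(Y);
thresh=2*sig*(sqrt(m)+sqrt(n));    % safely above the noise plateau (heuristic)
ne=sum(diag(S)>thresh)             % number of significant singular values, here 3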
5.1.3 Geometrical Interpretations
Two Components

So far in this chapter, all our elaborations have been completely abstract; there has been no attempt at an interpretation or understanding of the results of Factor Analysis in chemical terms. Abstract Factor Analysis is the core of most applications of Factor Analysis within chemistry but, nevertheless, much more insight can be gained than the results of the rank analysis we have seen so far. How can we relate the factors U and V to something chemically meaningful? Very sensibly, these factors are called abstract factors, in contrast to real factors such as the matrices C and A containing the concentration profiles and pure component spectra. Is there a useful relationship between U, V, C and A?

Let us start with an example: the Matlab function Data_AB.m models the absorption spectra of a reacting solution as a function of time. The spectra are stored as rows of the matrix Y. The reaction is a simple first order reaction A → B, as introduced in Chapter 3.4.2, Rate Laws with Explicit Solutions. Recall Beer-Lambert's law (Chapter 3.1):
$$Y = C\,A \qquad (5.12)$$

MatlabFile 5-8. Data_AB.m
function [t,lam,Y,C,A]=Data_AB
A_0=1e-3;                          % init. conc. A
k=2e-2;
lam=400:10:600;
A(1,:)=1000*gauss(lam,450,120);    % component spectra
A(2,:)=2000*gauss(lam,350,120);
t=(1:2:100)';
C(:,1)=A_0*exp(-k*t);              % [A]
C(:,2)=A_0-C(:,1);                 % [B] (closure)
Y=C*A;
randn('seed',0);
Y=Y+1e-3*randn(size(Y));
MatlabFile 5-9. Main_plot_AB.m
% Main_Plot_AB
[t,lam,Y,C,A]=Data_AB;
plot(lam,Y,'-k');
xlabel('wavelength'); ylabel('absorbance');
Figure 5-7. Observed spectra during a reaction A → B.

Each spectrum yi,: (the i-th row of Y) in Figure 5-7 and equation (5.12) is an nl-dimensional vector (in the example, the number of wavelengths is nl=21). We cannot represent, nor can we comprehend, such a high-dimensional vector, but we can make a reduced representation in three dimensions without losing many important aspects. The equivalent would be to measure the spectra at three wavelengths only. Figure 5-8 shows, on the left, a spectrum recorded at three wavelengths and, on the right, its vector representation; the absorbances at the three wavelengths form the coordinates in a 3-dimensional space.
Figure 5-8. The spectrum vector yi,: measured at three wavelengths and its representation in a 3-dimensional space.

The original spectra, measured at 21 wavelengths, are of course represented as 21-dimensional spectral vectors in a 21-dimensional space. An important question arises: where in the 21-dimensional space (or 3-dimensional space) can all the measured vectors be found? Is it possible to restrict the potential locations of the spectral vectors to a subspace? A first restriction is obvious: as absorbances can only be positive, only those parts of the space with positive coordinates are available to the spectral vectors. Is there anything more specific? Figure 5-9 represents this question in a 3-dimensional space.
Figure 5-9. Possible (?) path for the intermediate spectra yi,: as a function of the reaction time.
The vector y1,: represents the initial spectrum of the reaction solution, containing only the component A with concentration [A]0. The i-th spectrum is represented by the i-th row vector yi,:. The vector ym,: is the last measured spectrum, the m-th row of Y. If the reaction were finished, the final solution would contain only B, with the concentration [A]0. What is the path taken by the series of measured spectra? (It is easy to guess that it is not the wild loop shown in Figure 5-9.) First we recognise that each spectrum yi,: is a linear combination of the two spectra of the components A and B: the row vectors a1,: and a2,:, containing the molar absorption spectra, are multiplied by the concentrations [A] and [B].
$$y_{i,:} = [A]\,a_{1,:} + [B]\,a_{2,:} = c_{i,1}\,a_{1,:} + c_{i,2}\,a_{2,:} = c_{i,:}\,A \qquad (5.13)$$

All intermediate spectra yi,: are linear combinations of the molar component spectra aj,: and therefore they all lie in a plane defined by these component spectra. This is a first, very important result! Can the spectra be localised more precisely? We know that the sum of the two component concentrations is constant, [A]+[B]=[A]0 or, equivalently, ci,1+ci,2=ctot. A bit of algebra demonstrates that any spectrum yi,: is the sum of a fixed vector ctot a1,: plus the difference vector (a2,:−a1,:) multiplied by the concentration [B]:

$$y_{i,:} = [A]\,a_{1,:} + [B]\,a_{2,:} = ([A]_0-[B])\,a_{1,:} + [B]\,a_{2,:} = [A]_0\,a_{1,:} + [B]\,(a_{2,:}-a_{1,:}) = y_{1,:} + [B]\,(a_{2,:}-a_{1,:}) = c_{tot}\,a_{1,:} + c_{i,2}\,(a_{2,:}-a_{1,:}) \qquad (5.14)$$
All spectra yi,: lie on a straight line between the initial and the final spectrum; see Figure 5-10 for a graphical representation. This figure also shows the plane in which all the action occurs; the visible part is drawn as the grey triangle, limited by the positive part of the space, i.e. where all 3 coordinates are positive. Note also that [A]0 is about 0.7M: y1,:≈0.7a1,:. Further, the final spectrum ym,: has not reached the vector a2,:; the measurements were stopped before the reaction reached completion. The spectra lie on a straight line because the sum of the concentrations is constant; we call such a system a closed system.
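This statement can be checked numerically: subtracting the first spectrum from every row of Y leaves (apart from the noise) only one linearly independent direction, the difference vector (a2,:−a1,:). The following short sketch is our own, not part of the book's file collection.

% Sketch: all spectra of a closed 2-component system lie on a straight line
[t,lam,Y,C,A]=Data_AB;
D=Y-ones(length(t),1)*Y(1,:);   % subtract the first spectrum from every row
sv=svd(D);
sv(1:3)'                        % one dominant singular value; the rest is noise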
Figure 5-10. Graphical representation of equations (5.13) and (5.14). The grey triangle represents the plane of action.
Reduction in the Number of Dimensions

Back to Factor Analysis and eigenvectors. The fact that in a two-component system all intermediate spectra lie on a plane allows us to define the position of the spectra on that plane with only two coordinates. Specifically, if spectra were acquired at 1024 wavelengths (with a diode array instrument), the original spectral vectors are defined by 1024 coordinates, but they could equally be defined by only 2 coordinates: a tremendous reduction! Recall that Figure 5-9 and Figure 5-10 are simplifications for human consumption; they are misleading insofar as spectra are usually measured at many more than three wavelengths and need to be represented by vectors in a much higher dimensional space.

To be able to represent the spectral vectors in the plane, we need a system of axes, preferably an orthonormal system. As it turns out, the two eigenvectors in V̄ form an orthonormal system of axes in that plane. This is represented in Figure 5-11.
Figure 5-11. The eigenvectors V̄ form a system of orthonormal axes in the plane spanned by the spectra.

The dark grey part of the plane is the same as in Figure 5-10. The first eigenvector v1,: is approximately parallel to the average of all measured spectra. The second eigenvector v2,: is orthogonal to v1,: and thus has negative elements. To indicate that fact, the grey plane has been expanded into the region of negative values. We are now in a position to grab that plane, turn it around and put it onto the plane of the paper; Figure 5-12 represents the new situation. The next question arises immediately: how do we determine the coordinates b of the spectral vectors yi,: in this new system of axes V̄?

$$y_{i,:} = b\,\bar{V} \qquad (5.15)$$

Due to the orthonormality of V̄, this is a particularly simple linear regression calculation. The vector b is computed as:

$$b = y_{i,:}\,\bar{V}^+ = y_{i,:}\,\bar{V}^t(\bar{V}\bar{V}^t)^{-1} = y_{i,:}\,\bar{V}^t \qquad (5.16)$$
Figure 5-12. Everything on a plane defined by the system of axes v1,: and v2,:.

Equation (5.15) holds for one specific vector yi,:. Naturally, it can be expanded into a matrix equation for all yi,: in Y:
$$Y = B\,\bar{V} \quad\text{and}\quad B = Y\,\bar{V}^t \qquad (5.17)$$

Recall equation (5.10),

$$\bar{Y} = \bar{U}\bar{S}\bar{V}, \quad\text{and thus}\quad B = \bar{U}\bar{S}\bar{V}\bar{V}^t = \bar{U}\bar{S} \qquad (5.18)$$

The rows of the matrix ŪS̄ are the coordinates of the spectral vectors Y in the coordinate system V̄. (Note, we use the notation (ŪS̄) for the product Ū×S̄.) This is represented in Figure 5-13.
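Equations (5.17) and (5.18) are easily verified numerically. The two lines below are a sketch of our own, assuming that Ū, S̄ and V̄ have been extracted as in the program LawtonSylvestre.m that follows (variables U_bar, S_bar and V_bar).

% Sketch: numerical check of equations (5.17) and (5.18)
B=Y*V_bar';                 % coordinates of the spectra in the V_bar axis system
norm(B-U_bar*S_bar)         % zero to machine precision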
Figure 5-13. The coordinates of the vector yi,: in the system of axes V̄.

All this is not completely new. In Reduced Eigenvector Space (p.180), we did just that: the matrix ŪS̄ was used to represent the complete matrix Y; the matrix ŪS̄ we called Yred. The component spectra A can also be represented in the eigenvector axes: Ared=AV̄ᵗ. As mentioned then, the reduction in the size of the matrices Y and A can be substantial.

Lawton-Sylvestre

The insight we have gained so far forms the basis of what is arguably the first 'chemometrics' method. Chemometrics is not easily defined. A Google search offers: "Chemometrics is the science of relating measurements made on a chemical system or process to the state of the system via application of mathematical or statistical methods." More casually, one can state: "Chemometrics is the art of extracting useful information from chemical data." Both definitions would include the calculation of the average of a few numbers, which most chemometricians would not accept as a chemometrics method; chemometrics is more exciting. There is no doubt that the method of Lawton and Sylvestre is proper chemometrics.

Consider the data in Figure 5-7, spectra that were collected during the progress of the reaction A → B. For the present application, not the whole reaction was covered: the first spectrum is taken a while after the reaction began and the last spectrum before the reaction reached completion. Thus, the data include neither the pure spectrum of the starting material A nor the spectrum of the product B. The spectrum of pure A is 'somewhere' before the first measured spectrum and the spectrum of pure B 'somewhere' past the last measured spectrum. But where exactly? Of course it is not possible to define the spectra perfectly, but it is possible to be more precise than the above statement. As a reminder, fitting the data with a physical/chemical model would produce the 'perfect' result, but we are now in the chapter on model-free analysis and the present task is to extract useful information without a model.

The foundation of the method of Lawton and Sylvestre is the recognition that all spectra lie on a line, and that the pure spectra must be on that line as well − the spectrum of A somewhere before the first measured spectrum and the spectrum of B somewhere past the last spectrum. We have already narrowed down the regions of 'past' and 'before' to be a line; we still hope to be able to localise the regions precisely. Referring to Figure 5-10, one must recognise that the line must poke somewhere through one of the planes spanned by the λ-axes. On this side of the planes, all absorptivities are positive; on the other side of one of the planes, at least one absorptivity is negative. These planes are the limits on the line where the spectra can be found.
Figure 5-14. Schematic of the Lawton-Sylvestre method. The bold dotted lines represent the feasible regions.

In Figure 5-14 the bold dotted lines outside the first, y1,:, and last, ym,:, measured spectra represent the ranges in which the pure component spectra a1,: and a2,: can be found. The sections of these lines are limited by the locations where they poke through the λ1/λ2 and the λ2/λ3 planes. These intersections could be computed with explicit equations, since everything is linear. We take a more intuitive approach: starting from the first spectrum, we 'walk backwards' on the line until we 'hit the wall'. This happens when the first molar absorptivity becomes negative. We repeat the same from the last measured spectrum, 'walking' in the opposite direction.

Doing the 'walking' is easy; just refer to Figure 5-14 and Figure 5-15. Fit a straight line through the (ūs̄):,1/(ūs̄):,2 data pairs and continue on the line. The points thus calculated need to be translated back into real spectra before the test for negativity can be made. We only use the significant parts Ū, S̄ and V̄, i.e. keep the first two factors. Referring to Figure 5-13, one can see that the i-th spectrum is defined by the i-th row of ŪS̄:

$$y_{i,:} = (\bar{u}\bar{s})_{i,:}\,\bar{V} \qquad (5.19)$$

where (ūs̄)i,: contains the coordinates of yi,: in the eigenvector space V̄.
Figure 5-15. Determination of the limits for the feasible solutions of the Lawton-Sylvestre method. The vector dir is the direction vector pointing along the line of the spectra.

The program LawtonSylvestre.m performs the analysis using Data_AB.m (p.224), shown in Figure 5-7, after removing the first and last 10 spectra (rows of Y). First the original spectra are represented in the eigenvector space V̄, where the coordinates are ŪS̄. Next, the equation of the best straight line through the data in the eigenvector space needs to be computed; we use the Matlab function polyfit. The vector dir (see Figure 5-15) points along the line of all spectra. Increasing contributions of dir are added to the last spectrum. The resulting points in the eigenvector space are re-expressed in the real space, see equation (5.20), and all their coordinates are tested for negative values:

$$y_{extrap} = (us)_{extrap}\,\bar{V} \qquad (5.20)$$

The process is repeated in the other direction.

MatlabFile 5-10. LawtonSylvestre.m
% LawtonSylvestre
[t,lam,Y,C,A]=Data_AB;
Y(41:50,:)=[];                       % remove end and beginning of Y
Y(1:10,:)=[];
m=length(Y(:,1));                    % number of spectra left
[U,S,Vt]=svd(Y,0);
U_bar=U(:,1:2);S_bar=S(1:2,1:2);V_bar=Vt(:,1:2)';
US_bar=U_bar*S_bar;                  % coordinates in V_bar-space
subplot(1,2,1)                       % plot in the eigenvector space
plot(US_bar(:,1),US_bar(:,2),'k.');
xlabel('(us)_{:,1}');ylabel('(us)_{:,2}');
hold on
b = polyfit(US_bar(:,1),US_bar(:,2),1);   % straight line fit
dir=[1 b(1)];                        % vector in direction of spectra

inc=0;                               % init. step size for move
y_extrap=Y(m,:);
while all(y_extrap>0)                % check if all absorbances>0
  us_extrap=[US_bar(m,:)+inc*dir];   % move
  plot(us_extrap(1),us_extrap(2),'k+');
  y_extrap=us_extrap*V_bar;          % spect. in real absorbances
  inc=inc+.05;
end
y_last=y_extrap;                     % first impossible spectrum

inc=0;                               % init. step size for move
y_extrap=Y(1,:);
while all(y_extrap>0)                % check if all absorbances>0
  us_extrap=[US_bar(1,:)-inc*dir];   % move
  plot(us_extrap(1),us_extrap(2),'k+');
  y_extrap=us_extrap*V_bar;          % spect. in real absorbances
  inc=inc+.05;
end
y_first=y_extrap;                    % first impossible spectrum

subplot(1,2,2)
plot(lam,Y,'-',lam,y_first,'--',lam,y_last,'--');
xlabel('wavelength');ylabel('absorbance');
Figure 5-16 displays the results of the analysis. On the left, the • markers represent the measured spectra in the eigenvector space V̄, the + markers the extrapolated values. The full lines in the right panel are the series of spectra that were used for the analysis and the dashed lines are the extrapolated boundary spectra. The spectrum of B is fairly well defined while the spectrum of A is not; a long extrapolation is required until its spectrum turns negative at 400nm. This exemplifies the limitation of model-free methods: they rely on very simple constraints, and in certain cases the range of feasible answers can be very wide, sometimes too wide to be useful. This will be discussed later in Chapter 5.4.3, Rotational Ambiguity.
While the Lawton-Sylvestre method is very elegant and simple, it is virtually impossible to extend the principle to 3 and more components.
Figure 5-16. The Lawton-Sylvestre analysis in action. The double arrows cover the feasible regions of positive absorbances.
Three and More Components

So far we restricted our deliberations to 2-component systems. It is possible to increase this number to 3 and still comprehend the action in a 3-dimensional space. We can even project the 3-dimensional space onto the plane of the paper or computer screen and 'see' what is going on. As usual, we demonstrate the procedures based on a chemical process. Instead of another kinetics example, we use a spectrophotometric titration. The experiment follows the deprotonation of a diprotic acid by measuring the absorption spectra of the solution as a function of pH. The equilibria are quantitatively described by equation (5.21).

A + H ⇌ AH        (log K1)
AH + H ⇌ AH2      (log K2)        (5.21)
The concentrations of the differently protonated species, as a function of pH, are calculated with the explicit function we developed in Special Case: Explicit Calculation for Polyprotic Acids (p.64). A data matrix Y is constructed as before. Data_eqAH2a.m generates the data; it is called by Main_eqAH2a.m.

MatlabFile 5-11. Data_eqAH2a.m
function [pH,lam,Y,C,A]=Data_eqAH2a

pH=[2:.1:12]';                          % pH range
H=10.^(-pH);
logK=[8 6]; K=10.^logK;                 % protonation constants
C_tot=1e-3;                             % [AH2]+[AH]+[A]
n=length(logK);                         % number of protons

denom=zeros(size(H));
for i=0:n
  num(:,i+1)=H.^i*prod(K(1:i));         % numerator
  denom=denom+num(:,i+1);               % denominator
end
alpha=diag(1./denom)*num;               % degree of dissociation
C=C_tot*alpha;                          % concentration profiles

lam=400:10:600;                         % wavelength range
A(1,:)=1000*gauss(lam,450,120);         % component spectra
A(2,:)=2000*gauss(lam,350,120);
A(3,:)=1000*gauss(lam,500,50);

Y=C*A;                                  % absorbance data
randn('seed',0);
Y=Y+1e-3*randn(size(Y));                % noise level 0.001

MatlabFile 5-12. Main_eqAH2a.m
% Main_eqAH2a
[pH,lam,Y,C,A]=Data_eqAH2a;
subplot(2,1,1);
plot(lam,A);
xlabel('wavelength');ylabel('absorptivity');
subplot(2,1,2);
plot(pH,C);
xlabel('pH');ylabel('concentration');
Each spectrum, measured during the titration, forms a row of the data matrix Y and is a vector in an nl-dimensional space (in the example nl=21). As this is a three-component system, all the vectors lie in a 3-dimensional sub-space. Each measured spectrum is a linear combination of the 3 component spectra shown in Figure 5-17.
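As a quick numerical check, the following short sketch (ours, in the spirit of the printed listings but not one of the numbered MatlabFiles) displays the first few singular values of Y; only three rise clearly above the noise level:

% sketch: rank check for the diprotic acid data
[pH,lam,Y,C,A]=Data_eqAH2a;
s=svd(Y);
s(1:4)'                                 % three values clearly above the fourth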
Figure 5-17. Molar absorption spectra and concentration profiles for the titration of a diprotic acid with logK values of 8 and 6.

Thus, the rank of Y is 3 − there are 3 significant eigenvectors in V̄ and Ū. The row vectors of V̄ form a set of three basis vectors in the spectral space. The coordinates of each vector yi,:, in this new system of axes, are given by the i-th row of the matrix ŪS̄, see equations (5.17) and (5.18), or in a different notation:

Y V̄ᵗ = Ū S̄ V̄ V̄ᵗ = Ū S̄        (5.22)
MatlabFile 5-13. Main_EV_space.m
% Main_EV_space
[pH,lam,Y,C,A]=Data_eqAH2a;
[U,S,Vt]=svd(Y,0);
US=U*S;
plot3(US(:,1),US(:,2),US(:,3),'.',US(:,1),US(:,2),0*US(:,3)-.5)
grid on
xlabel('us_{:,1}');ylabel('us_{:,2}');zlabel('us_{:,3}');
To support comprehension of the 3-dimensional character of Figure 5-18, we added the projection of the curve onto the bottom of the plot at z=-0.5. The titration starts at the right hand end of the trace. As can be seen from Figure 5-17, the first spectrum at pH 2 is essentially pure AH2. There is not much change in the concentrations up to pH 4 and the spectrum vectors are very similar. Then, as the pH approaches the first logK-value, the measured spectrum starts to move towards the spectrum of the intermediate AH.
Figure 5-18. Representation of the measured spectra yi,: in V̄ space for an AH2 titration.

At pH 7 there is a maximum in the concentration of AH and the measured spectrum is close to that of pure AH. With further increase in pH the spectrum veers towards the spectrum of fully deprotonated A, which is represented at the end of the series.

A small side issue deserves mentioning: as discussed in connection with Figure 5-13 and equation (5.22), the system of axes is formed by the eigenvectors V̄; the coordinates of the spectra in Y are the rows of the matrix ŪS̄. It is not automatically clear how the axes in a plot like Figure 5-18 should be labelled; should it be V̄ or ŪS̄? The equivalent question can be posed for Figure 5-17; should the abscissa in the top panel be labelled 'wavelength' or 'nm'? Both are correct.

The matrix Y can be regarded as a row or as a column matrix and consequently we can also concentrate on the columns of Y rather than the rows. The columns are linear combinations of the concentration profiles of the species and they all lie in a 3-dimensional space as well. The columns of the matrix Ū form a basis in this space. And the coordinates of each column vector of Y are contained in the columns of the matrix S̄V̄.

MatlabFile 5-14. Main_EV_space.m …continued
% Main_EV_space, ...continued
SV=S*Vt';
plot3(SV(1,:),SV(2,:),SV(3,:),'.',SV(1,:),SV(2,:),0*SV(3,:)-2)
grid on
xlabel('sv_{1,:}');ylabel('sv_{2,:}');zlabel('sv_{3,:}');
Figure 5-19. Representation of the measured absorption profiles in the Ū space.

The lowest wavelength at 400nm is represented in Figure 5-19 at the end of the trace on the top left. For increasing wavelengths, the profiles move to the right. As expected, the trace in Figure 5-19 is less ordered than the equivalent in Figure 5-18. Concentration profiles are governed by the law of mass action and closure, and thus the trace, following the rows of ŪS̄, is structured accordingly. No such law governs the relative shape of the absorption spectra and the trace following the columns of S̄V̄.

Mean Centring, Closure

In Figure 5-10, we have seen that the law of conservation of mass dictates that in the 2-component case all measured spectra lie on a straight line. In the present context this property is called closure. In general terms it means that the sum of all species concentrations is constant during an experiment. In the 2-component case the spectral action occurs in a 2-dimensional subspace; if the system is closed, the action is concentrated in a 1-dimensional space. Similarly, in a closed 3-component case, the action is concentrated in a 2-dimensional subspace. Back to the data set for the titration of the diprotic acid, Data_eqAH2a.m: due to closure of the chemical system, the sum of all concentrations [A]+[AH]+[AH2] is constant, and as a result, the curve in Figure 5-18 lies in a plane.
The fact that the spectral vectors in a closed system lie in a further reduced sub-space (in a 2-component system they lie on a straight line, in a 3-component system in a plane, etc.) suggests that we could move the origin of the system of axes into that sub-space; in this way the number of relevant dimensions is reduced by one. We subtract the mean spectrum from each measured spectrum yi,:, and as a result the origin of the system of axes is moved into the mean − in the above example, into the plane of all spectral vectors. This is called mean-centring. Mean-centring is numerically superior to subtraction of one particular spectrum, e.g. the first one. The Matlab program Main_MeanCenter.m performs mean-centring on the titration data and displays the resulting curve in such a way that we see the zero us:,3 component, i.e. the fact that the origin (+) lies in the (us:,1,us:,2)-plane.

MatlabFile 5-15. Main_MeanCenter.m
% Main_MeanCenter
[pH,lam,Y,C,A]=Data_eqAH2a;
Y_mc=Y-repmat(mean(Y,1),length(pH),1);  % subtract mean spectrum
[U,S,Vt]=svd(Y_mc,0);
US=U*S;

plot3(US(:,1),US(:,2),US(:,3),'.',0,0,0,'+')
grid on
axis([-1.5 1.5 -1.5 1.5 -1.5 1.5]);
view(-70, 2);
xlabel('us_{:,1}');ylabel('us_{:,2}');zlabel('us_{:,3}');
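The claimed rank reduction can be verified numerically. The following short sketch (ours, not one of the numbered listings) compares the singular values of Y before and after mean-centring; for the closed titration data the third significant value drops to the noise level:

% sketch: closure check via singular values
[pH,lam,Y,C,A]=Data_eqAH2a;
s=svd(Y);                               % three significant values
Y_mc=Y-repmat(mean(Y,1),size(Y,1),1);   % subtract mean spectrum
s_mc=svd(Y_mc);                         % only two significant values left
[s(1:4) s_mc(1:4)]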
Figure 5-20. Mean-centring moved the origin of the system of axes into the centre of the action. This reduces the dimension of the subspace by one.
The argument can be turned around: if mean-centring reduces the rank of the matrix by one, the data set is closed. We have to be careful, though. The symmetry between columns and rows of the matrix Y is not complete. Closure is a property of the concentration profiles only and thus applies only in one dimension. The command mean(Y,1) computes the mean of each column of Y, and the resulting mean spectrum is subtracted from each individual spectrum. There is no equivalent in the other direction. Subtracting the mean column from the columns does not reduce the rank; however, it moves the origin to the centre of the action. While not reducing the rank, it does reduce the absolute values of the numbers and improves the numerical accuracy of the computations; the improvement is usually marginal. We generally refrained from performing mean-centring, with the exception of PCR and PLS in Chapter 5.6.

HELP Plots

Plots of the kind represented in Figure 5-18 and in Figure 5-19 are more than just graphically appealing. A considerable amount of useful information can be extracted from these plots of the spectra or concentration profiles in their respective eigenvector spaces. Consider the multivariate chromatographic data, Data_Chrom2.m (p.219), of a 3-component system as shown in Figure 5-3.

MatlabFile 5-16. Main_HELPP.m
% Main_HELPP
[t,lam,Y,C,A]=Data_Chrom2;
[U,S,Vt]=svd(Y,0);
US=U*S;
plot3(US(:,1),US(:,2),US(:,3),'.',US(:,1),US(:,2),0*US(:,3)-1);
hold on;plot3(0,0,0,'o','MarkerSize',10);hold off;
grid on
xlabel('us_{:,1}');ylabel('us_{:,2}');zlabel('us_{:,3}');
From Figure 5-21, there are a few observations we can make. This is a three-component system, but as it is not closed, the action occupies all three dimensions; it does not occur in a plane. The path starts and ends at the origin (marked by o). Figure 5-22 reveals that there are no components eluting at the beginning and end of the chromatogram, and therefore the respective spectral vectors contain just noise.
Figure 5-21. The spectra yi,: of a 3-component chromatogram in V̄ space.

The path takes off from the origin in an almost straight line and returns to the origin in an almost straight line. This is exploited in HELP plots (Heuristic Evolving Latent Projections). If a section of the path is on a straight line and its extension goes through the origin, this is an indication that there exists only one component in that section of the measurement. In the example, this is the case at the beginning and end of the overlapped concentration profiles. Figure 5-22 reveals that during times 15-25 only the first component is present and during times 70-95 only the third.

MatlabFile 5-17. Main_HELPP.m …continued
% Main_HELPP, ...continued
plot(t,C)
xlabel('time');ylabel('concentration');
The useful aspect of this is the following: we can determine the regions in the series of spectra in which there is only one component. There, the spectral vectors are all parallel, and the average over all spectra in the region is a good estimate for the pure component spectrum (see the sketch below). The main difficulty with this approach is to decide when exactly the deviation from a straight line starts, and thus which selection of spectra we need to average.
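The following sketch (ours, not a numbered MatlabFile; it assumes, as in Data_Chrom2a.m, that the rows of Y correspond to times 1,2,...,100) estimates the first component spectrum by averaging over the single-component region suggested by the HELP plot and compares it with the true spectrum:

% sketch: pure spectrum estimate from a one-component region
[t,lam,Y,C,A]=Data_Chrom2;
a1_est=mean(Y(15:25,:),1);              % average over times 15-25 (assumed rows)
a1_est=a1_est/max(a1_est);              % normalise to max=1
a1_true=A(1,:)/max(A(1,:));
plot(lam,a1_est,'-',lam,a1_true,'.');
xlabel('wavelength');ylabel('norm. absorptivity');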
Figure 5-22. The concentration profiles from Data_Chrom2.m.
Noise Reduction

Retaining only the significant singular values S̄ and their respective eigenvectors Ū and V̄, as indicated in Figure 5-2 and equation (5.10), results in a substantial reduction of the size of the matrices needed to represent the original matrix Y as Ȳ. There is an additional valuable benefit: Ȳ not only represents all relevant information contained in the original Y, it is also somewhat better, as it contains much less noise than Y. This is demonstrated in Main_NoiseRed1.m, using the kinetic data set Data_AB.m (p.224).

MatlabFile 5-18. Main_NoiseRed1.m
% Main_NoiseRed1
[t,lam,Y0,C,A]=Data_AB;
Y=Y0+.05*randn(size(Y0));
ne=2;
[U,S,Vt]=svd(Y,0);
U_bar=U(:,1:ne);Vt_bar=Vt(:,1:ne);S_bar=S(1:ne,1:ne);
Y_bar=U_bar*S_bar*Vt_bar';
subplot(3,1,1);
plot(lam,Y0,'-');axis tight;ylabel('Y0');
subplot(3,1,2);
plot(lam,Y,'-');axis tight;ylabel('Y');
subplot(3,1,3);
plot(lam,Y_bar,'-');axis tight;
xlabel('wavelength');ylabel('Y_{bar}');
Figure 5-23. Absorbance spectra with noise level 10⁻³ (top panel) and an increased noise level of 5×10⁻² (second panel). The third panel represents Ȳ retaining 2 significant factors.

The graphs in Figure 5-23 are convincing. The top panel displays the original data for a simple first order reaction A→B. The next panel shows the same data after the addition of a substantial amount of noise. The third panel features the reconstructed matrix Ȳ=ŪS̄V̄ with 2 eigenvectors. Clearly a substantial amount, but not all, of the noise was removed.

It is worthwhile investigating this particular aspect of Factor Analysis more deeply. Data_AB2.m generates data for a first order reaction where only the first component A absorbs. The rank of Y is then only one.

MatlabFile 5-19. Data_AB2.m
function [t,lam,Y,C,A]=Data_AB2

A_0=1e-3;                               % initial concentration
k=2e-2;                                 % rate constant

lam=400:10:600;
A(1,:)=1000*gauss(lam,450,120);         % spectrum of component A only
t=(1:2:100)';

C(:,1)=A_0*exp(-k*t);
Y=C*A;
randn('seed',0);
Y=Y+5e-2*randn(size(Y));

The program Main_NoiseRed2.m plots three columns of Y and Ȳ against each other.

MatlabFile 5-20. Main_NoiseRed2.m
% Main_NoiseRed2
[t,lam,Y,C,A]=Data_AB2;
ne=1;
[U,S,Vt]=svd(Y,0);
U_bar=U(:,1:ne);Vt_bar=Vt(:,1:ne);S_bar=S(1:ne,1:ne);
Y_bar=U_bar*S_bar*Vt_bar';
plot3(Y(:,5),Y(:,10),Y(:,15),'+',Y_bar(:,5),Y_bar(:,10),Y_bar(:,15),'.')
xlabel('Y_{:,5}');ylabel('Y_{:,10}');zlabel('Y_{:,15}');grid on
In Figure 5-24, the +'s are the noisy original (yi,5, yi,10, yi,15) data points. The •'s represent the corresponding factor-analytically reproduced data points. The noise reduction is obvious; note, however, that the distribution of the •'s along the line is still noisy, which manifests itself in the irregular spacing of the markers.
Figure 5-24. 3-dimensional plot of the 5th column of Y (+) or Ȳ (•) versus the 10th versus the 15th.
Figure 5-25 attempts to provide a geometrical representation of the situation.
Figure 5-25. The relationship between the original measurement vector yi,:, its projection into the eigenvector space, and the 'true' vector ytrue,i,:.

The eigenvectors v1,: and v2,: span the grey plane; yi,: is the i-th spectrum, and its projection onto the V̄-plane is ȳi,: = (us)i,: V̄. The hypothetical true, noise-free spectrum ytrue,i,: (which is not known) usually lies close to, but generally not exactly on, the plane. Figure 5-26 concentrates on the triangle defined by the tips of the vectors yi,:, ȳi,: and ytrue,i,:; these are represented as small circles on that figure. The difference vector ȳi,: − yi,: is orthogonal to the plane spanned by V̄.
Figure 5-26. Detailed view of Figure 5-25.

The projection of yi,: into V̄ is ȳi,:, which is usually much closer to the true spectrum ytrue,i,: than yi,: itself. A substantial amount of the noise is removed in the projection, but not all.
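How much noise is removed can be quantified. The sketch below (ours, not one of the numbered listings) rebuilds the noise-free data of Data_AB2.m as C*A and compares the error of the raw and of the projected data:

% sketch: quantifying the noise removed by the projection
[t,lam,Y,C,A]=Data_AB2;
Y0=C*A;                                 % the (normally unknown) true data
[U,S,Vt]=svd(Y,0);
Y_bar=U(:,1)*S(1,1)*Vt(:,1)';           % rank-1 reconstruction
norm(Y-Y0,'fro')                        % real noise
norm(Y_bar-Y0,'fro')                    % noise left after projection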
5.2 Target Factor Analyses, TFA

We continue considering multivariate data sets, e.g. a series of spectra measured as a function of time, reagent addition etc. In short, a matrix of
data that can be decomposed in the usual way: Y=CA. The spectra are measured at nl wavelengths and thus they are nl-dimensional vectors. The whole series of spectra follows a particular path in an nl-dimensional space. We have recognised in the preceding Chapter 5.1, Factor Analysis, that this path is concentrated in a much lower dimensional sub-space. Usually, for an nc-component system, the sub-space has nc dimensions; e.g. for a two-component system, all spectra lie in a plane. Recall that, if the system is closed, the dimension of the sub-space can be further reduced by mean-centring.

To start with, we do not know the spectra A of the components in the system under investigation. Factor Analysis delivers an orthonormal system of axes V̄ that defines the sub-space of Y and A in an optimal way. Importantly, this is done automatically, and there is no input from the chemist regarding the components in the system or their spectra.

The basic idea of Target Factor Analysis is very simple. In order to test whether a certain compound is taking part in the process, i.e. whether its spectrum exists in the measurement, we test whether that spectrum lies in V̄. If such a test spectrum is outside V̄, there is no doubt that the component does not take part in the process under investigation. If it is in the sub-space, we cannot positively conclude that the species is there; the test spectrum could be a linear combination of the existing spectra.

A typical application can be found in chromatography. A group of components elute in a strongly overlapping peak cluster. We suspect that a particular chemical, for which we know the spectrum, might be in the unknown mixture, but due to overlap, its spectrum does not appear pure in the matrix Y.

Due to inevitable experimental noise, the test spectrum vector will never be exactly in the subspace V̄, and consequently the question is whether the test vector is close to V̄. The initial idea might be to compute the distance r of the test row vector t from V̄. As indicated in Figure 5-27, r is the difference between t and its projection tproj into V̄.
Figure 5-27. The distance of two test vectors to V̄. While r1 is shorter, t2 is a better test vector.
Figure 5-27 shows the principle for two test vectors t1 and t2. The fact that the distance r1 to the sub-space V̄ is shorter than the distance of t2 does not mean that t1 is a better candidate. The test vectors need to be normalised in order to be able to compare these distances. One could also use the angle between a test vector t and its projection tproj as a measure.
Figure 5-28. The angle α is a good measure for the closeness of the test vector t to the space V̄.

The angle α is defined by the following equation; the last equality holds because the normalised test vector has unit length, |tn| = 1:

sin α = |r| / |t| = |rn| / |tn| = |rn|        (5.23)
The projection tn,proj is computed as given in equation (5.26). The Matlab file Main_TFA.m generates a three-component overlapping chromatogram from Data_Chrom2.m (p.219). Two test spectra t1 and t2 are generated: t1 is the original spectrum of one of the components, t2 is slightly shifted, see Figure 5-29. Both are normalised. The output includes the lengths of the residuals and the angles between the test spectra and the plane V̄.

MatlabFile 5-21. Main_TFA.m
% Main_TFA
[t,lam,Y,C,A]=Data_Chrom2;
ne=size(C,2);
[U,S,Vt]=svd(Y,0);V_bar=Vt(:,1:ne)';
t1=1000*gauss(lam,450,120);             % component spectrum, max at 450nm
t2=1000*gauss(lam,460,120);             % slightly shifted spectrum
plot(lam,t1,lam,t2);
xlabel('wavelength');ylabel('absorptivity');
t1n=t1/norm(t1); t2n=t2/norm(t2);       % normalisation of t1 and t2
t1n_proj=t1n*V_bar'*V_bar;              % projections
t2n_proj=t2n*V_bar'*V_bar;
r1n=t1n-t1n_proj;                       % residuals
r2n=t2n-t2n_proj;
distance(1)=norm(r1n);
distance(2)=norm(r2n)
angles=asin(distance)/pi*180            % angles in degrees

distance =
    0.0003    0.0424
angles =
    0.0160    2.4287
While both angles (given in degrees) and distances are small, the ones for the correct spectrum are significantly smaller. The principle of Target Factor Analysis is not restricted to the testing of spectra or, more generally, to row vectors. Exactly the same principles apply, of course, to column vectors or concentration profiles. In mathematical terms, there is a complete symmetry between the two. However, in chemical terms the two dimensions are different. Along the concentration profiles, we usually have a function that quantitatively describes the action while there is nothing of that kind along the spectral dimension. In Chapter 5.2.3, Target Transform Search/Fit, we take advantage of the functional definition that is available in the column space.
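To illustrate this symmetry, here is a minimal sketch (ours, not a numbered MatlabFile) that target tests a column vector, the normalised true concentration profile of the first component, by projecting it into Ū and measuring the residual, the column-space analogue of the calculation in Main_TFA.m:

% sketch: target testing in the column space
[t,lam,Y,C,A]=Data_Chrom2;
ne=size(C,2);
[U,S,Vt]=svd(Y,0);
U_bar=U(:,1:ne);
c_test=C(:,1)/norm(C(:,1));             % normalised true profile as target
r=c_test-U_bar*(U_bar'*c_test);         % residual of the projection
norm(r)                                 % small: the profile lies close to U_bar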
Figure 5-29. Correct (—) and slightly shifted (...) species spectrum, both used as target spectra.
5.2.1 Projection Matrices

The projection of a vector into the subspace defined by eigenvectors, and the subsequent calculation of the residual vector between the original and its projection, is a very common task; refer back to equations (5.15) and (5.16). It is worthwhile investigating the computations in some detail. The determination of the projections can be regarded as a linear least-squares fit; only now we have an orthogonal set of vectors V̄, as in Figure 5-28, rather than a general set of non-orthogonal vectors in F in the equivalent Figure 4-12. The projected test vector tproj is a linear combination of the vectors V̄:

tproj = b V̄        (5.24)

The computation of the linear parameters b is easy, as the pseudo-inverse of an orthonormal matrix is equal to its transpose,

b = t V̄⁺ = t V̄ᵗ        (5.25)

and thus the projected vector is

tproj = t V̄ᵗ V̄        (5.26)

and the residuals are

r = t − tproj = t − t V̄ᵗ V̄ = t (I − V̄ᵗ V̄)        (5.27)

The equivalent operations are valid for columns:

tproj = Ū b = Ū Ūᵗ t        (5.28)

with

r = t − tproj = t − Ū Ūᵗ t = (I − Ū Ūᵗ) t        (5.29)

The above equations are valid for orthonormal sets of basis vectors. They can be written in very similar ways for general non-orthogonal bases (e.g. F in Figure 4-12). The only difference is the computation of the pseudo-inverse, which can be numerically demanding, but is trivial for orthonormal bases.
tproj = b F,   r = t (I − F⁺ F)        (5.30)

Similarly, for a column vector tproj, we can write in accordance with Figure 4-11:

tproj = F b,   r = (I − F F⁺) t        (5.31)

While the notations r = t(I − V̄ᵗV̄), r = t(I − F⁺F), r = (I − ŪŪᵗ)t and r = (I − FF⁺)t are elegant, they are inefficient ways of performing the calculations. The matrices V̄ᵗV̄ and ŪŪᵗ are often very large square matrices which take time to compute, store and also to multiply with the vectors t. It is faster to calculate r = t − Ū(Ūᵗ t) rather than r = (I − ŪŪᵗ) t. The same, of course, is valid for the equations (5.29)-(5.31).
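The efficiency argument is easy to demonstrate. The following sketch (ours; the sizes are arbitrary) computes the same residual in both ways:

% sketch: two equivalent ways of computing the residual of a column vector
ns=2000; ne=3;
U_bar=orth(randn(ns,ne));               % orthonormal basis, ns x ne
t_col=randn(ns,1);                      % test (column) vector
r_slow=(eye(ns)-U_bar*U_bar')*t_col;    % forms an ns x ns matrix first
r_fast=t_col-U_bar*(U_bar'*t_col);      % matrix-vector products only
norm(r_slow-r_fast)                     % identical up to rounding errors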
5.2.2 Iterative Target Transform Factor Analysis, ITTFA As the name Iterative Target Transform Factor Analysis indicates, this is an iterative extension of Target Factor Analysis. This time, we apply Target Factor Analysis to column vectors or concentration profiles. The basic idea is straightforward. First, we somehow guess a concentration profile, preferably close to a true one. Call it ctest. In the 3-component chromatographic example Data_Chrom2a.m, we use a delta function (often called a needle in the context of ITTFA) with a maximum (52) close to the true maximum of the second concentration profile (50). ¯ . We iteratively Such a test vector ctest normally is not in the sub-space U ¯ , applying equation (5.28). improve it in the following way: project ctest into U ¯ , is not correct, e.g. it contains negative This projected vector, while lying in U elements. A correction is applied that makes the profile physically possible ¯ . Nevertheless, this new vector is a better estimate but it removes it from U than the original one. As the projection invariably results in a shortening of the vector ctest, we re-normalise it to a maximum of one in each iteration. Ideally, the iterations are continued until things are perfect. Unfortunately this is easier said than done, as convergence is notoriously slow. This is illustrated in Figure 5-30, the correct profile will essentially 'never' be reached. MatlabFile 5-22. Data_Chrom2a.m function [t,lam,Y,C,A]=Data_Chrom2a lam=400:10:600; A(1,:)=1000*gauss(lam,450,120); A(2,:)=2000*gauss(lam,350,120); A(3,:)=1000*gauss(lam,500,50);
% component spectra
t=(1:1:100)';
C(:,1)=1e-3*gauss(t,35,30);             % elution profiles
C(:,2)=9e-4*gauss(t,50,31);
C(:,3)=2e-3*gauss(t,70,32);

Y=C*A;
randn('seed',0);
Y=Y+1e-3*randn(size(Y));

MatlabFile 5-23. Main_ITTFA.m
% Main_ITTFA
[t,lam,Y,C_sim,A_sim]=Data_Chrom2a;
ne=size(C_sim,2);
[U,S,Vt]=svd(Y,0);
U_bar=U(:,1:ne);
c_sim_n=C_sim(:,2)/max(C_sim(:,2));     % true conc profile, normalised
c_test=zeros(size(t)); c_test(52)=1;    % init guess, delta function at t=52
for i=1:20;
  C_test(:,i)=c_test;                   % improved test vectors in C_test
  c_new=U_bar*(U_bar'*c_test);          % projection into U
  c_test=c_new.*(c_new>=0);             % negative values=0
  c_test=c_test/max(c_test);            % normalisation to max=1
end

plot(t,C_test,'-',t,c_sim_n,':');
xlabel('time');ylabel('norm. conc.');
Figure 5-30. Progress of the ITTFA algorithm for one particular concentration profile. The dashed line represents the correct concentration profile.
We are using the function Data_Chrom2a for data generation; it produces slightly wider concentration profiles than Data_Chrom2 (p.219). Starting from the initial needle, we observe a rapid improvement towards the correct profile; however, the iterative process soon slows down and essentially never reaches the correct profile. This is a typical result, as convergence in algorithms of this kind tends to be fast in the beginning and subsequently slows down dramatically. Defining a reliable termination criterion for the iterative process that copes with such behaviour is very difficult. This is the exact opposite of what we have experienced in the Newton-Gauss type algorithms of Chapter 4.3.1, where convergence accelerates towards the minimum. The sketch below makes the slowdown visible.
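The following sketch (ours, not a numbered MatlabFile) records the size of each iteration's correction; the steps shrink rapidly but never quite vanish:

% sketch: monitoring the per-iteration change of the ITTFA test vector
[t,lam,Y,C_sim,A_sim]=Data_Chrom2a;
ne=size(C_sim,2);
[U,S,Vt]=svd(Y,0);
U_bar=U(:,1:ne);
c_test=zeros(size(t)); c_test(52)=1;    % needle, as in Main_ITTFA.m
for i=1:20
  c_new=U_bar*(U_bar'*c_test);          % projection into U_bar
  c_new=c_new.*(c_new>=0);              % negative values=0
  c_new=c_new/max(c_new);               % normalisation to max=1
  change(i)=norm(c_new-c_test);         % size of this iteration's step
  c_test=c_new;
end
semilogy(change);
xlabel('iteration');ylabel('change');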
5.2.3 Target Transform Search/Fit

Traditional Target Factor Analysis just determines whether a particular test vector is close to the sub-space spanned by the significant eigenvectors, either V̄ or Ū. As an introduction to the method of Target Transform Searching, consider the example discussed in Figure 5-29. One of the spectra lies in V̄, the other does not. The almost obvious idea is to move the spectrum along the wavelength axis, to continuously check the distance to V̄ and to determine the minimum.

Moving spectra around, in this or similar ways, does not have many applications for the chemist. The reason is that there is no functional relationship that usefully defines absorption spectra. The general idea of target testing has more potential in the column direction. There, we deal with concentration profiles that often are defined by mathematical functions, based on chemical/physical laws. In the following, we develop the principle vaguely defined above and see how concentration profiles and their parameters can be determined by analysing and minimising distances.

We develop the idea using a kinetic example. Any reaction scheme that consists exclusively of first order reactions results in concentration profiles that are linear combinations of exponentials. There is no limit to the number of reacting components nc. The set of differential equations describing such a scheme of exclusively first order reactions can always be written in the following way:

⎡[ċ1] ⎤   ⎡ k1,1   k1,2   ...  k1,nc  ⎤ ⎡[c1] ⎤
⎢[ċ2] ⎥ = ⎢ k2,1                      ⎥ ⎢[c2] ⎥
⎢ ... ⎥   ⎢ ...                       ⎥ ⎢ ... ⎥        (5.32)
⎣[ċnc]⎦   ⎣                    knc,nc ⎦ ⎣[cnc]⎦
or

ċ = K c        (5.33)
Recall the notation [Ẋ] = d[X]/dt for the derivative of the concentration of X with respect to time. The vector ċ contains all the derivatives, the vector c all concentrations, and the matrix K is formed by the rate constants describing the reaction mechanism. Usually, most entries in K are zero. It is best to use an example.
For the reaction A --k1--> B --k2--> C, equation (5.32) reduces to

⎡[Ȧ]⎤   ⎡ -k1    0    0 ⎤ ⎡[A]⎤
⎢[Ḃ]⎥ = ⎢  k1  -k2    0 ⎥ ⎢[B]⎥        (5.34)
⎣[Ċ]⎦   ⎣   0   k2    0 ⎦ ⎣[C]⎦
Equation (5.34) is the equivalent of equation (3.75)(d) in matrix notation. For the same set of components but including reversible reactions, A ⇌ B ⇌ C (forward rate constants k1 and k2, backward rate constants k3 and k4), equation (5.32) becomes:

⎡[Ȧ]⎤   ⎡ -k1      k3      0 ⎤ ⎡[A]⎤
⎢[Ḃ]⎥ = ⎢  k1  -k2-k3     k4 ⎥ ⎢[B]⎥        (5.35)
⎣[Ċ]⎦   ⎣   0      k2    -k4 ⎦ ⎣[C]⎦
Such systems of differential equations are called homogeneous. Their solutions are linear combinations of exponential functions whose exponents are given by the eigenvalues, λi, of the matrix K. In the first, irreversible example, equation (5.34), the eigenvalues of K are λ1=-k1, λ2=-k2 and λ3=0. Thus, the concentration profiles are linear combinations of the vectors e^(-λi t), where t is the vector of times. In matrix notation we can write

C = E TE        (5.36)
where C contains the concentration profiles in the usual way, E contains the column vectors e:,i = e^(-λi t) and TE is a transformation matrix that establishes the relationship between C and E. The elements of TE are defined by the reaction scheme and the initial concentrations of the reacting species.

In the above example, equation (5.34), it is relatively straightforward to determine the eigenvalues of K. In the example (5.35) it is much more difficult to develop the equations. The Symbolic Toolbox of Matlab can be employed for the task.

MatlabFile 5-24. Main_Sym_ABC.m
% Main_Sym_ABC
syms k1 k2 K                            % symbolic variables
K=[-k1   0   0;                         % A->B->C
    k1 -k2   0;
     0  k2   0];
lambdas=eig(K)                          % eigenvalues of K

lambdas =
 -k1
 -k2
  0
and for the more interesting case of example (5.35):

MatlabFile 5-25. Main_Sym_ABC_rev.m
% Main_Sym_ABC_rev
syms k1 k2 k3 k4 K                      % symbolic variables

K=[-k1     k3    0;                     % A<->B<->C
    k1 -k2-k3   k4;
     0     k2  -k4];

lambdas=eig(K)                          % eigenvalues of K

lambdas =
 0
 -1/2*k1-1/2*k2-1/2*k3-1/2*k4+1/2*(k1^2-2*k1*k2+2*k1*k3-2*k1*k4+k2^2+2*k2*k3+2*k4*k2+k3^2-2*k3*k4+k4^2)^(1/2)
 -1/2*k1-1/2*k2-1/2*k3-1/2*k4-1/2*(k1^2-2*k1*k2+2*k1*k3-2*k1*k4+k2^2+2*k2*k3+2*k4*k2+k3^2-2*k3*k4+k4^2)^(1/2)
or in a more civilised form:

λ1 = 0
λ2 = -(k1+k2+k3+k4)/2 + sqrt(k1²+k2²+k3²+k4² - 2k1k2 + 2k1k3 - 2k1k4 + 2k2k3 + 2k2k4 - 2k3k4)/2        (5.37)
λ3 = -(k1+k2+k3+k4)/2 - sqrt(k1²+k2²+k3²+k4² - 2k1k2 + 2k1k3 - 2k1k4 + 2k2k3 + 2k2k4 - 2k3k4)/2

Interestingly, analysis of measured data only delivers the 2 (or 3, if zero is included) λ-values. There is not enough information to resolve two equations into 4 rate constants. Or, in chemical terms, without independent additional information it is impossible to determine all 4 rate constants. The Symbolic Toolbox can even cope with initial concentrations and thus delivers the equations for the concentration profiles.

MatlabFile 5-26. Main_Sym_ABC_rev.m …continued
% Main_Sym_ABC_rev, ...continued
C=dsolve('Da=-k1*a+k3*b','Db=k1*a-(k2+k3)*b+k4*c', ...
    'Dc=k2*b-k4*c','a(0)=A0','b(0)=0','c(0)=0');
C.a
C.b
C.c

The output of this short program is 9500 characters, too much to be included here. We leave it to the readers to perform the task on their computer.
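A quick numerical check (ours; the rate constants are hypothetical) confirms equation (5.37): the eigenvalues of K computed by eig match the explicit formula:

% sketch: numerical check of equation (5.37)
k=[0.10 0.03 0.02 0.01];                % hypothetical values for k1..k4
K=[-k(1)        k(3)     0;
    k(1) -k(2)-k(3)   k(4);
      0         k(2) -k(4)];
sort(eig(K))'                           % numerical eigenvalues
root=sqrt(sum(k.^2)-2*k(1)*k(2)+2*k(1)*k(3)-2*k(1)*k(4) ...
     +2*k(2)*k(3)+2*k(2)*k(4)-2*k(3)*k(4));
lambda_23=-0.5*sum(k)+[0.5 -0.5]*root   % lambda2 and lambda3 from (5.37)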
Back to Target Factor Analysis. C is both a linear combination of E (equation (5.36)) and also of Ū (see later in equation (5.49)). Combining the two equations,

C = E TE    and    C = Ū TU

E = C TE⁻¹ = Ū TU TE⁻¹ = Ū T        (5.38)
This demonstrates that the columns e:,i of E are linear combinations of Ū. Thus, e:,i lies in or close to Ū if the correct eigenvalue λi is used; otherwise it is at a distance. This is nothing but target testing a test vector etest = e^(-λtest t). The application Main_TTF.m uses data generated in Data_ABC2.m for a consecutive reaction A→B→C.

MatlabFile 5-27. Data_ABC2.m
function [t,lam,Y,C,A]=Data_ABC2
% A -> B -> C

t   = [0:100]';                         % reaction times
lam = 400:5:600;                        % wavelengths
k   = [0.1 0.03];                       % rate constants
A_0 = 1e-3;                             % initial concentration of A

C(:,1)=A_0*exp(-k(1)*t);                % concentrations of species A
C(:,2)=A_0*(k(1)/(k(2)-k(1))*(exp(-k(1)*t)-exp(-k(2)*t)));  % conc. of B
C(:,3)=A_0-C(:,1)-C(:,2);               % concentrations of C

A(1,:)=1e3*gauss(lam,450,120);          % molar spectrum of species A
A(2,:)=2e3*gauss(lam,350,120)+1e3*gauss(lam,500,50);  % mol. spect. of B
A(3,:)=1e3*gauss(lam,500,50);           % molar spectrum of C

Y=C*A;                                  % Beer's law
randn('seed',0);                        % fixed start for random number generator
Y=Y+0.001*randn(size(Y));               % standard deviation 0.001
MatlabFile 5-28. Main_TTF.m
% Main_TTF
[t,lam,Y,C,A]=Data_ABC2;
ne=size(C,2);
[U,S,Vt]=svd(Y,0);
U_bar=U(:,1:ne);
lambda_test=-.05:.001:.2;
for i=1:length(lambda_test)
  e_test=exp(-lambda_test(i)*t);        % test exponential vector
  e_test=e_test/norm(e_test);           % normalise
  r=e_test-U_bar*(U_bar'*e_test);       % residual vector
  distance(i)=norm(r);                  % distance
end

plot(lambda_test,log10(distance));
xlabel('lambda_{test}');ylabel('log(\midr\mid)');
Figure 5-31. Distance of normalised exponential functions, exp(-λtest t), to the subspace Ū.
It is worthwhile examining point 3 above in some additional detail. Equation (5.38) C = U TU is, of course, not completely correct. It is only an approximation; it should be written as
258
Chapter 5
C = U TU + R U
(5.39)
The matrix C is defined by the non-linear parameters (rate constants). It is possible to minimise RU, i.e. the corresponding ssq, as a function of these parameters in a 'normal' Newton-Gauss algorithm. The chain of equations goes as follows TU = U t C R U = C − U TU = C − UU t C
(5.40)
t
= (I − UU ) C
The advantage is that there is no pseudo-inverse to be calculated in this way. The computation of TU, which comprises linear parameters is easier ¯ is an orthonormal matrix, U ¯ +=U ¯ t. As mentioned before, in than 'usual' as U equation (5.29), it is advantageous to compute the residuals as R U = C − U(U t C) ; it is considerably faster. The 'standard' chain of equations using Beer-Lambert's law is:
Y = CA +R A = C+ Y R = Y − CC+ Y
(5.41)
= (I − CC + ) Y The main difference to equations (5.40) is the computation of the pseudoinverse C+. For the sake of completeness, we also include the relevant equations if data reduction, according to Reduced Eigenvector Space (p.180), is applied: Yred = C A red + R red = US A red = C + US R red = US − CC+ US
(5.42)
= (I − CC + ) US
The dimensions of the corresponding matrices in equation (5.42) are all the same as in equation (5.40); the main difference is still the computation of C⁺ instead of Ūᵗ.

The two approaches, equation (5.40) versus (5.41)/(5.42), are not equivalent. Figure 5-32 attempts to represent the situation graphically. Due to the limitation of our mind to 3 dimensions, this endeavour is not easy, or rather it is impossible, as we are running out of dimensions. The 'vectors' ŪS̄ or Y represent the subspace ŪS̄ or the complete space Y; the curved line C=f(k) represents the space defined by the whole matrix C; the 'vector' C(kC) represents the minimum for equation (5.41) or (5.42) and C(kU) the minimum
for equation (5.40); the 'vectors' R (Rred) and RU are the corresponding residual matrices.
Figure 5-32. Graphical representation of equations (5.40) to (5.42).

The 'normal' residuals R are orthogonal to the space C; they are defined by the projection of the column vectors in ŪS̄ or Y into C. This is a straightforward linear least-squares calculation, equivalent to Figure 4-10. C(kC) is the closest the space C gets to the 'vectors' ŪS̄ or Y.

The residuals RU are defined by the projections of the 'vectors' C into the space Ū; they are orthogonal to Ū. This projection is simpler due to the orthogonal base vectors. RU is the closest the 'vectors' C get to Ū.

The figure, however incomplete, demonstrates that the two minima are not the same; often they are very similar. Probably a more important difference is that the vectors ŪS̄ have a length defined by the measurement; thus there is a weight given to the vectors which is relevant. Such information is completely lost in Target Transform Fitting employing equation (5.40); RU is minimised without any reference to the length of ŪS̄. In fact, the shorter C, the shorter is RU; thus some normalising of C, as a function of the parameters k, is required. Refer also to Figure 5-27 for a similar situation.
5.3 Evolving Factor Analyses, EFA

The Singular Value Decomposition of a matrix Y into the product ŪS̄V̄ is full of rich and powerful information. The model-free analyses we discussed so far are based on the examination of the matrices of eigenvectors Ū and V̄. Evolving Factor Analysis, EFA, is primarily based on the analysis of the matrix S̄ of singular values.

Previously, we have seen in Magnitude of the Singular Values (p.219) that the number of significant singular values in S equals the number of linearly
independent rows or columns of the matrix Y of measurement, which ideally equals the number of changing chemical components in the process investigated. So far, the complete data matrix Y has been analysed and thus the result reflects the total measurement, the number of components existing anywhere in the measurement. Evolving Factor Analyses investigate the evolving character of the singular values − how they change as a function of the progress of the measurement. Information about the evolution of the rank and thus the appearance of new components is revealed. Naturally, this only makes sense if there is an inherent order in the data, usually an order in the acquisition of the spectra that make up the matrix Y. Factor analytical methods have been developed by social scientists; their samples are individuals for whom they have a 'spectrum' of properties. In this collection of samples there is no inherent order and thus, methods that rely on an inherent order of the samples, such as EFA, are of no use. As a typical example of ordered data we will investigate chromatography, where spectra are measured as a function of elution time.
5.3.1 Evolving Factor Analysis, Classical EFA

The basic principle of EFA is very simple. Instead of subjecting the complete matrix Y to the Singular Value Decomposition, specific sub-matrices of Y are analysed. In the original EFA, these sub-matrices are formed by the first i spectra of Y, where i increases from 1 to the total number of spectra, ns. The appearance of a new compound during the acquisition of the data is indicated by the emergence of a new significant singular value. The procedure is best explained graphically in Figure 5-33. The sub-matrix, indicated in grey, is subjected to the SVD and the resulting ne significant singular values are stored as a row vector in a matrix EFA of the same number of rows as Y.
Figure 5-33. Schematic of Forward EFA
The example used for the introduction of EFA is based on the three-component chromatogram, Data_Chrom2.m (p.219), we have used several times earlier. While most of the Matlab listing in Main_EFA1.m is close to self-explanatory, a few statements might need clarification. The singular values are stored in the matrix EFA_f, which has ns rows and ne columns. It is advantageous to plot the logarithms of the singular values; their values span several orders of magnitude and cannot be represented in a normal plot. The rank is the number of significant singular values. The significance level can be estimated as the first non-significant singular value of the total matrix Y. The three sub-plots in Figure 5-34 clearly indicate the relationship between the concentration profiles, the evolving singular values and the evolving rank.

MatlabFile 5-29. Main_EFA1.m
% Main_EFA1
[t,lam,Y,C,A]=Data_Chrom2;
[ns,nc]=size(C);
ne=nc+1;                                % one extra singular value

EFA_f=NaN(ns,ne);                       % NaNs to prevent log(0)
for i=1:ns
  s_f=svd(Y(1:i,:));                    % svd of the first i rows of Y
  if i<ne
    EFA_f(i,1:i)=s_f';                  % relevant SV are stored
  else
    EFA_f(i,:)=s_f(1:ne)';
  end
end
sig_level=EFA_f(ns,ne);                 % significance level
Rank_f=sum((EFA_f>sig_level)');         % number of significant SV

subplot(3,1,1);
plot(t,C);
ylabel('conc.');
subplot(3,1,2);
plot(t,log10(EFA_f));
ylabel('log(s_f)');
subplot(3,1,3);
plot(t,Rank_f,'x');
axis([0 t(ns) 0 ne]);
xlabel('time');ylabel('rank');
Figure 5-34. Forward EFA. Top panel: concentration profiles; second panel: evolving singular values; third panel: evolving rank.

Evolving factor analysis can, and should, be performed in both forward and backward directions. The forward plot, calculated above and shown in Figure 5-34, indicates the appearance of new components. The backward plots of Figure 5-36 are calculated similarly by determination of the singular values of the set of the last 1, 2, 3, ... spectra in Y, as seen in the schematic of Figure 5-35. These plots indicate the disappearance of the components.
Figure 5-35. Schematic of Backward EFA
MatlabFile 5-30. Main_EFA1.m …continued
% Main_EFA1 ...continued
EFA_b=NaN(ns,ne);                       % NaNs to prevent log(0)
for i=1:ns
  s_b=svd(Y(ns-i+1:ns,:));              % svd of the last i rows of Y
  if i<ne
    EFA_b(ns-i+1,1:i)=s_b';             % relevant SV are stored
  else
    EFA_b(ns-i+1,:)=s_b(1:ne)';
  end
end
Rank_b=sum((EFA_b>sig_level)');         % number of species

subplot(3,1,1);
plot(t,C);
ylabel('conc.');
subplot(3,1,2);
plot(t,log10(EFA_b));
ylabel('log(s_b)');
subplot(3,1,3);
plot(t,Rank_b,'x');
axis([0 t(ns) 0 ne]);
xlabel('time');ylabel('rank');
Figure 5-36. Backward EFA. Top panel: concentration profiles; second panel: evolving singular values; bottom panel: evolving rank.
The combined interpretation of the plots of the singular values in forward and backward direction is fairly straightforward. The increase of the rank by one indicates the appearance of a species during the process monitored. In well-behaved chromatograms, i.e. non-overloaded columns, the width of the elution profiles increases continuously with increasing elution time. In such instances, it is possible to connect the appearance and the disappearance of the individual components: the first compound to appear is also the first to disappear, etc. Concentration windows can be established for all compounds. These are regions along the time axis during which a component exists; outside these windows the concentration is known to be zero. The connection between the forward and backward singular values can be made in a one-line Matlab command.

MatlabFile 5-31. Main_EFA1.m …continued
% Main_EFA1 ...continued
for i=1:3                               % windows of existence for the components
  C_window(:,i)=EFA_f(:,i)>sig_level & EFA_b(:,ne-i)>sig_level;
end
subplot(4,1,1)
plot(t,C);
ylabel('conc.');
subplot(4,1,2);
plot(t,log10(EFA_f));
ylabel('log(s_f)');
subplot(4,1,3);
plot(t,log10(EFA_b));
ylabel('log(s_b)');
subplot(4,1,4);
plot(t,C_window(:,1),t,C_window(:,2)+0.3,t,C_window(:,3)+0.6);
xlabel('time');ylabel('conc. window');
EFA plots can be used to estimate the rank of a matrix. EFA plots have similarities with the singular value plots shown in Figure 5-4 but they clearly contain more information and thus are more instructive. In order to demonstrate this enhanced capability of EFA plots for the determination of the number of components, we generate a series of spectrophotometric titrations of a diprotic acid with different noise levels, employing Data_eqAH2a.m (p.236) and analyse with Main_EFA2.m. Forward EFA plots, next to the original data, are presented in Figure 5-38. A few observations can be made: the significant singular values are not much affected by the noise level, only the non-significant ones move up continuously with increasing noise. This behaviour is similar to the one observed in Figure 5-4. The lowest panels of Figure 5-38 demonstrate that even at very high noise levels, EFA facilitates the determination of the correct number of components.
Figure 5-37. Complete EFA. Concentration profiles; forward and backward evolving singular values; bottom panel: concentration windows.

The human eye is very good at detecting patterns − in this case the appearance of a new significant singular value. The appearance of a new component, as indicated by the point where a new significant singular value rises above the noise level, is delayed by increasing noise.

MatlabFile 5-32. Main_EFA2.m
% Main_EFA2
[pH,lam,Y,C,A]=Data_eqAH2a;
[ns,nc]=size(C);
ne=nc+1;                                % one extra singular value
noise=[.05 .1 .2];

for j=1:3;
  Yn=Y+noise(j)*randn(size(Y));         % add different noise levels
  EFA_f=EFA(Yn,ne);
  subplot(3,2,2*j-1);
  plot(pH,Yn,'-');
  axis([2 12 -1 2]);
  if j==3,xlabel('pH');end;ylabel('abs.');
  subplot(3,2,2*j);
  plot(pH,log10(EFA_f));
  axis([2 12 -.5 1.5]);
  if j==3,xlabel('pH');end;ylabel('log(s_f)');
end
Figure 5-38. EFA forward plots for a data set with increasing noise levels.

EFA.m is a short Matlab function that computes forward and backward EFA matrices for a given number, ne, of singular values. Its structure is essentially identical to the one discussed for Main_EFA2.m.

MatlabFile 5-33. EFA.m
function [EFA_f,EFA_b]=EFA(Y,ne)
[ns,nl]=size(Y);
EFA_f=NaN(ns,ne);
EFA_b=NaN(ns,ne);
for i=1:ns
  s_f=svd(Y(1:i,:));                    % forward SV
  s_b=svd(Y(ns-i+1:ns,:));              % backward SV
  EFA_f(i,1:min(i,ne))=s_f(1:min(i,ne))';
  EFA_b(ns-i+1,1:min(i,ne))=s_b(1:min(i,ne))';
end
Interestingly, EFA was originally developed for the analysis of spectrophotometric titration data. Concentration profiles in chromatography and equilibrium studies can be surprisingly similar. The main difference is that in chromatography the data set generally starts and ends without any component present (Figure 5-37), while in titrations there is usually one particular species at the beginning and another one at the end (Figure 5-39). While the algorithm is not affected, the concentration windows are different.

MatlabFile 5-34. Main_EFA3.m
% Main_EFA3
[pH,lam,Y,C,A]=Data_eqAH2a;
[ns,nc]=size(C);
ne=nc+1;
[EFA_f,EFA_b]=EFA(Y,ne);
sig_level=EFA_f(ns,ne);
for i=1:nc                              % windows of existence for the components
  C_window(:,i)=EFA_f(:,i)>sig_level & EFA_b(:,ne-i)>sig_level;
end
subplot(3,1,1);
plot(pH,C);
ylabel('conc.');
subplot(3,1,2);
plot(pH,log10(EFA_f),'-',pH,log10(EFA_b),':');
ylabel('log(s_f),log(s_b)');
subplot(3,1,3);
plot(pH,C_window(:,1),pH,C_window(:,2)+0.3,pH,C_window(:,3)+0.6);
xlabel('pH');ylabel('conc. window');
Figure 5-39. Concentration profiles for the titration of a di-protic acid; EFA plots and concentration windows.
5.3.2 Fixed-Size Window EFA, FSW-EFA

There are different strategies for the selection of sub-matrices for evolving-type factor analyses. The classical, original mode has been presented so far. The most important alternative procedure is based on a moving window of fixed size. In other words, a window of a pre-defined number of consecutive spectra is moved along the columns of the matrix Y. Each window is subjected to SVD, the singular values are stored and their logarithms are plotted.
Figure 5-40. Schematic of FSW-EFA

Fixed-size window EFA plots reveal the number of different species that coexist in the particular window. More precisely, it is the number of species with linearly independent concentration profiles. Here is the appropriate Matlab program, Main_FSW_EFA.m. The data, generated by Data_eqAH4a.m, mimic a spectrophotometric titration of a tetra-protic acid AH4 with log(K) values of 8, 7, 6 and 2. The equilibria are quantitatively described by equation (5.43).
A + H ⇌ AH        (log K1)
AH + H ⇌ AH2      (log K2)
AH2 + H ⇌ AH3     (log K3)        (5.43)
AH3 + H ⇌ AH4     (log K4)
The concentrations of the differently protonated species as a function of pH are calculated with the explicit function we developed in Special Case: Explicit Calculation for Polyprotic Acids, p.64.

MatlabFile 5-35. Data_eqAH4a.m
function [pH,lam,Y,C,A]=Data_eqAH4a

pH=[0:.1:12]';                          % pH range
H=10.^(-pH);
logK=[8 7 6 2]; K=10.^logK;             % protonation constants
n=length(logK);                         % number of protons
denom=zeros(size(H));
for i=0:n
  num(:,i+1)=H.^i*prod(K(1:i));         % numerator
  denom=denom+num(:,i+1);               % denominator
end
alpha=diag(1./denom)*num;               % degree of dissociation
C=1e-3*alpha;                           % concentration profiles
lam=400:10:600;                         % wavelength range
A(1,:)=1000*gauss(lam,450,120);         % component spectra
A(2,:)=2000*gauss(lam,350,120);
A(3,:)=1000*gauss(lam,500,50);
A(4,:)=1000*gauss(lam,550,50);
A(5,:)=1000*gauss(lam,580,50);
Y=C*A;                                  % absorbance data
randn('seed',0);
Y=Y+1e-3*randn(size(Y));                % noise level 0.001
MatlabFile 5-36. Main_FSW_EFA.m
% Main_FSW_EFA
[pH,lam,Y,C,A]=Data_eqAH4a;             % eq. data 4-protic acid
[ns,nc]=size(C);

size_w=3;                               % small windows
EFA_w=zeros(ns-size_w+1,size_w);
for i=1:ns-size_w+1
  s_w=svd(Y(i:i+size_w-1,:));
  EFA_w(i,:)=s_w';
end
subplot(3,1,1);
plot(pH,C,'k');
ylabel('conc');
subplot(3,1,2);
plot(pH(0.5*(size_w+1):ns-0.5*(size_w-1)),log10(EFA_w),'k');
ylabel('log(s_w)');

size_w=9;                               % large windows
EFA_w=zeros(ns-size_w+1,size_w);
for i=1:ns-size_w+1
  s_w=svd(Y(i:i+size_w-1,:));
  EFA_w(i,:)=s_w';
end
subplot(3,1,3);
plot(pH(0.5*(size_w+1):ns-0.5*(size_w-1)),log10(EFA_w),'-k');
xlabel('pH');
ylabel('log(s_w)');
Figure 5-41. Concentration profiles for a titration of a 4-protic acid. Second panel: FSW-EFA plot for a window size of 3; third panel: for a window size of 9.

Figure 5-41 displays the results of two FSW-EFA analyses with different window sizes, 3 and 9 (second and third panels). The small window of size 3 naturally cannot detect more than 3 different components within the window. The relatively high noise level resulting from a window of only 3 spectra, together with the high overlap of the concentration profiles, makes the third singular value in the middle plot, expected between pH 6-8, hardly discernible. With windows of size 9, up to 4 singular values are clearly identifiable. The price to pay for large window sizes is the spreading out of the information. Around pH 4, there are only 2 components co-existing, but within the large window there is a total of 3 components. Due to that broadening effect, the
beginnings of concentration profiles are not easily detected. However, the 4 components coexisting at pH 7 are well distinguished in the FSW-EFA plot. There are advantages and disadvantages in this approach compared to classical EFA:

a) In big systems with many species and spectra, the detection of new species deteriorates with increasing window size. This effect is clearly noise dependent and can qualitatively be observed in Figure 5-38; it is the effect of a continuous increase of the noise singular values. In FSW-EFA, the fixed-size window maintains the magnitude of the singular values related to noise.

b) The classical EFA plots are easier to interpret − compare Figure 5-37 with Figure 5-41.

c) Consider the following, rather unlikely example: components 1 and 5 in a co-eluting peak system in chromatography have the same spectrum. Under such circumstances, component 5 in the cluster is not detected at all in classical EFA. Window EFA does not suffer from this shortcoming as long as there is no overlap between the 2 components with identical spectra.

d) A decision has to be made for the size of the window in FSW-EFA. As outlined above, this decision is important.
5.3.3 Secondary Analyses Based on Window Information

The location of the concentration windows is the distinctive result of the classical EFA plots, as in Figure 5-37. Apart from information on peak purity in chromatography, the concentration windows themselves are not of much direct use. In this section, we develop methods that, based on these concentration windows, result in complete concentration profiles C and subsequently the corresponding species spectra A.

Iterative Refinement of the Concentration Profiles
This algorithm has many aspects similar to Iterative Target Transform Factor Analysis, ITTFA, as discussed in Chapter 5.2.2, and to Alternating Least-Squares, ALS, as introduced later in Chapter 5.4. The main difference is the inclusion of the window information as provided by the EFA plots. A brief description of the algorithm (as usual, everything is based on Y=CA):

(a) Initial guess for the matrix C of concentrations; often this is not crucial. Possible choices are combined EFA plots (see Figure 5-44); the window matrix of 0's and 1's is adequate as well (Figure 5-37).
(b) Calculate A as A=C\Y.
(c) Corrections on A, e.g. negative values=0.
(d) Calculate C as C=Y/A.
(e) Corrections on C: negative values and values outside the concentration windows =0. (This is the main difference to ITTFA.)
(f) If the fit is not 'perfect', return to (b).

Instead of a proper termination criterion, which is difficult to develop, we just iterate 100 times. We employ the same chromatographic data, Data_Chrom2a.m (p.251), as in ITTFA. Figure 5-42 displays the results.

MatlabFile 5-37. Main_It_EFA.m
% Main_It_EFA
[t,lam,Y,C_sim,A_sim]=Data_Chrom2a;
[ns,nc]=size(C_sim);
ne=nc+1;                                % one extra singular value
[EFA_f,EFA_b]=EFA(Y,ne);                % perform EFA
EFA_f(isnan(EFA_f)==1)=0;               % replace NaN's by zeros
EFA_b(isnan(EFA_b)==1)=0;               % replace NaN's by zeros
C=min(EFA_f(:,1:nc),fliplr(EFA_b(:,1:nc)));  % combined SV curves
sig_level=0.01;                         % define cut off level
C_window=C>sig_level;                   % build window matrix of 0's and 1's

for it=1:100
  C=C/diag(max(C));                     % normalization to max of 1

  A=C\Y;                                % spectra
  A=A.*(A>0);                           % positive

  C=Y/A;                                % conc profiles
  C=C.*(C>0);                           % positive
  C=C.*C_window;                        % apply windows

  R=Y-C*A;                              % residuals
  ssq(it)=sum(sum(R.*R));
end

[C_n,A_n]=norm_max(C,A);                % norm. C and C_sim to unit height
[C_sim_n,A_sim_n]=norm_max(C_sim,A_sim);  % and recalc. A, A_sim
subplot(3,1,1);
plot(lam,A_n,'-',lam,A_sim_n,'.');
xlabel('wavelength');ylabel('absorptivity');
subplot(3,1,2);
plot(t,C_n,'-',t,C_sim_n,'.');
xlabel('time');ylabel('concentration');
subplot(3,1,3);
plot(log10(ssq));
xlabel('iteration');ylabel('log(ssq)');
Figure 5-42. Iterative EFA. Top panel: component spectra; middle panel: concentration profiles; bottom panel: progress of the quality of the fit. Markers (•) represent the true values and the lines the iterative EFA estimates after 100 iterations.

The width of the concentration windows is defined by the chosen level of significance of the singular values. Figure 5-43 represents the effect of choosing two different levels: the dotted horizontal line is defined by the first non-significant singular value (fine dotted line) of the complete matrix Y. The intersection of this horizontal line with the EFA trace of the second singular value is the beginning of the second concentration window. The lower significance level at 0.01, represented by the full line, results in an intersection at an earlier time and thus a wider concentration window. In all future iterative analyses of these chromatography data, Data_Chrom2a.m, the level of significance is set to 0.01.
Figure 5-43. Different significance levels defining the concentration windows for Data_Chrom2a.

The initial guess for the concentration profiles is computed as the combination of the forward and backward EFA graphs; the smaller of each forward/backward pair is used. We display them in Figure 5-44, as we use these initial guesses in most other upcoming model-free analyses.
C=min(EFA_f(:,1:nc),fliplr(EFA_b(:,1:nc)));  % combined SV curves
plot(t,C);
xlabel('time');ylabel('concentration');
Figure 5-44. Initial guesses for the concentration profiles, computed as the combination of the singular value traces for forward and backward EFA.
Several additional comments are due. As observed in Chapter 5.2.2, Iterative Target Transform Factor Analysis, ITTFA, iterative progress is relatively fast at the beginning and slows down continuously with the number of iterations. The third panel of Figure 5-42 demonstrates that the minimum has still not been reached after 100 iterations. While the concentration profiles are reasonably well reproduced, there are some problems with the absorption spectra; one spectrum has a substantial contribution from another. Nevertheless, considering the simplicity of the algorithm, the results are astoundingly accurate.
Model-free methods supply absolute information neither about the concentrations nor about the spectra. Essentially, they only deliver the shapes of the profiles. In this and subsequent examples, we normalise the concentration profiles in C to a maximum of 1 and adjust the species spectra of A in such a way that the product CA is correct. This is done in the function norm_max.m.
MatlabFile 5-38. norm_max.m
function [Cn,An]=norm_max(C,A)
coef=1./max(C);              % normalisation coefficients
Cn=C*diag(coef);             % apply to C
if nargin==2
   An=diag(1./coef)*A;       % apply inverse to A
end
It is worthwhile to compare this iterative refinement of concentration profiles, as given on p.271, with ITTFA, the other iterative process we introduced in Chapter 5.2.2. Instead of computing A=C⁺Y, as in (b) on p.271, we replace Y with ŪS̄V̄, see equation (5.10).

    Y = CA = ŪS̄V̄                                            (5.44)

The component spectra A can then be determined by the following rearrangements. Multiplication with (ŪᵗC)⁻¹Ūᵗ from the left results in

    (ŪᵗC)⁻¹ŪᵗCA = (ŪᵗC)⁻¹ŪᵗŪS̄V̄   or   A = (ŪᵗC)⁻¹S̄V̄         (5.45)

The next step is C=YA⁺, as in (d) on p.271. Applying equation (5.45) to compute the pseudo-inverse of A,

    C = YA⁺ = ŪS̄V̄ V̄ᵗS̄⁻¹ŪᵗC = ŪŪᵗC                           (5.46)

which is exactly the formula used in ITTFA. If no changes are applied to C and A during the iterative cycles, there is no difference between the two methods. The advantage of the refinement, as outlined in steps (a)-(f) on p.271, lies in the possibility of incorporating extra information on A, such as the non-negativity constraint.
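The equivalence (5.44)-(5.46) can be checked numerically on synthetic, noise-free data. The following minimal sketch is our own illustration (all variable names are ours, not part of the book's toolbox): one unconstrained refinement step, with A computed via equation (5.45), equals the ITTFA projection ŪŪᵗC.

% Numerical check of equivalence (5.46) on synthetic rank-2 data
t=(1:50)';
C_true=[exp(-(t-20).^2/40) exp(-(t-32).^2/60)];  % two elution profiles
A_true=rand(2,30);                               % two random spectra
Y=C_true*A_true;                                 % noise-free data, rank 2
[U,S,Vt]=svd(Y,0);
U_bar=U(:,1:2); S_bar=S(1:2,1:2); V_bar=Vt(:,1:2)';
C=C_true+0.05*randn(50,2);                       % a distorted guess for C
A=(U_bar'*C)\(S_bar*V_bar);                      % equation (5.45)
C_new=Y/A;                                       % C=Y*A+, equation (5.46)
disp(norm(C_new-U_bar*U_bar'*C))                 % ~0: the ITTFA projection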
Explicit Computation of the Concentration Profiles
As we have seen with the previous iterative refinement and with ITTFA, convergence generally is very sluggish. Even for moderately complex systems, it is often too slow to be useful. There are alternative, non-iterative methods that compare favourably with the above iterative algorithms.
We start the derivation with the standard equations Y=CA and Y=ŪS̄V̄, which we combine, see equation (5.44). Post-multiplication with V̄ᵗ(AV̄ᵗ)⁻¹ results in

    C A V̄ᵗ(AV̄ᵗ)⁻¹ = C = ŪS̄(AV̄ᵗ)⁻¹                           (5.47)

To compute C, the only unknown is A. It is advantageous to regard the product S̄(AV̄ᵗ)⁻¹ as the unknown. Dimensional analysis shows that it is an nc×nc square matrix (nc = number of components); we call it a transformation matrix T:

    T = S̄(AV̄ᵗ)⁻¹                                            (5.48)

And now

    C = ŪT                                                  (5.49)

This is a very interesting and useful equation and we will return to it several times in later parts of this chapter. Equation (5.49) relates the concentration matrix C to the matrix Ū of eigenvectors. It is worthwhile representing the equation graphically.
Figure 5-45. Graphical representation of C = ŪT; the dimensions are C: ns×nc, Ū: ns×nc and T: nc×nc.
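Because Ū has orthonormal columns, any concentration matrix whose profiles lie in the space spanned by Ū can be recovered from equation (5.49) with T=ŪᵗC. The following minimal sketch (our own illustration on synthetic data, not one of the book's files) confirms this numerically:

% Synthetic two-component data; verify C = U_bar*T with T = U_bar'*C
t=(1:50)';
C=[exp(-(t-20).^2/40) exp(-(t-32).^2/60)];  % concentration profiles
A=rand(2,30);                               % spectra
Y=C*A;                                      % noise-free, rank 2
[U,S,Vt]=svd(Y,0);
U_bar=U(:,1:2);                             % orthonormal basis for col(Y)
T=U_bar'*C;                                 % 2x2 transformation matrix
disp(norm(U_bar*T-C))                       % ~0: C is exactly U_bar*T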
For an nc=2 component system, the square transformation matrix T has only 4 elements; for a 3 component system there are 9 elements, etc. This relatively small number of unknown elements can be calculated explicitly! The crucial information is contained in the concentration windows, as determined by EFA. We know that the elements of the matrix C within the concentration windows are positive, while outside these windows the elements of C are zero. We also know the complete matrix Ū. The known elements of C and Ū are represented as the shaded areas in Figure 5-46. The white parts of the matrices have to be calculated.

Figure 5-46. The shaded parts of C and Ū are known.

The idea is to compute the elements of T in such a way that the product ŪT results in zeros for the shaded part of C. In order to achieve this, we can separate the columns of C and treat them individually. The i-th column c_{:,i} of C is the product Ū×t_{:,i}, where t_{:,i} is the i-th column of T.

    c_{:,i} = Ū × t_{:,i}                                   (5.50)
Figure 5-47. The i-th column c_{:,i} of C is the product Ū×t_{:,i}.
This equation can be split up again. We can remove the unknown, white part of c_{:,i}, i.e. the window of existence of the i-th component, and also the corresponding part (rows) in Ū (in white). What is left, in grey, is the product

    c⁰_{:,i} = Ū⁰ × t_{:,i} = 0                             (5.51)

where the superscript '0' represents the zero parts of c_{:,i} and the corresponding parts of Ū.

Figure 5-48. The homogeneous system of equations Ū⁰ × t_{:,i} = 0.

This represents a homogeneous system of equations. There is, as always with homogeneous equations, a trivial solution: t_{:,i}=0. There are, however, and fortunately, non-trivial solutions as well. This is due to the fact that Ū⁰ does not have full rank. We removed all the information on the i-th component and consequently the rank of Ū⁰ is one less than the rank of the complete Ū. In such a system of equations, a solution t_{:,i} is not completely defined, as it can be multiplied by any factor (≠0). Thus, we can freely choose one element, e.g. the first element in t_{:,i} as one, t_{1,i}=1. Equation (5.51) can now be written as

    0 = Ū⁰ × t_{:,i} = ū⁰_{:,1} × 1 + Ū⁰_{:,2:nc} × t_{2:nc,i}      (5.52)
Figure 5-49. Graphical representation of equation (5.52).
This allows the computation of the other elements t_{2:nc,i} by linear regression as

    t_{2:nc,i} = −(Ū⁰_{:,2:nc})⁺ ū⁰_{:,1}                   (5.53)

This process is repeated for all individual columns of C to result in the complete matrix T (containing 1's in the first row). Finally, the product ŪT is the complete matrix C.
As always with model-free analyses, only the shapes of the concentration profiles are determined; they have to be normalised in some way. We have seen in The Structure of the Eigenvectors (p.221) that the sign of the eigenvectors is not defined. In the present application, a concentration profile can be positive or, if negative, needs to be multiplied by -1. The two lines
neg=-min(C)>max(C);          % sign of conc profiles
C=C*diag((-1).^neg);         % reverse if negative
in Main_Non_It_EFA.m check for negative concentrations and, if necessary, correct them. The subsequent computation of the absorption spectra A from C and Y is a simple linear regression. This is followed by the normalisation of the concentration profiles to a maximum of one, as outlined in the preceding chapter Iterative Refinement of the Concentration Profiles. The normalisation is done using the routine norm_max.m (p.275).
MatlabFile 5-39. Main_Non_It_EFA.m
% Main_Non_It_EFA
[t,lam,Y,C_sim,A_sim]=Data_Chrom2a;
[ns,nc]=size(C_sim);
ne=nc;
[EFA_f,EFA_b]=EFA(Y,ne+1);               % perform EFA
sig_level=0.01;                          % define cut-off level
% build window matrix
C_window=EFA_f(:,1:nc)>sig_level & fliplr(EFA_b(:,1:nc))>sig_level;
[U,S,Vt]=svd(Y,0);
U_bar=U(:,1:ne);
T=ones(nc,nc);
for i=1:nc
   U_i_0=U_bar(~C_window(:,i),:);
   T(2:nc,i)=-U_i_0(:,2:nc)\U_i_0(:,1);
end
C=U_bar*T;
neg=-min(C)>max(C);                      % sign of conc profiles
C=C*diag((-1).^neg);                     % reverse if negative
A=C\Y;
[C_n,A_n]=norm_max(C,A);                 % normalisation of calc. C and A
[C_sim_n,A_sim_n]=norm_max(C_sim,A_sim); % norm. of true C and A
subplot(2,1,1)
plot(lam,A_n,'-',lam,A_sim_n,'.');
xlabel('wavelength');ylabel('absorptivity');
subplot(2,1,2);
plot(t,C_n,'-',t,C_sim_n,'.');
xlabel('time');ylabel('concentration');
Figure 5-50. Result of non-iterative EFA.

It is instructive to compare Figure 5-50 with Figure 5-42. The explicit computation is not only much faster, it also produces better results. This clearly is a consequence of the poor convergence of the iterative version of EFA.
5.4 Alternating Least-Squares, ALS
The method of Alternating Least-Squares, ALS, is very simple and exactly for that reason it can be very powerful. ALS has found widespread application and it is an important method in the collection of model-free analyses. In contrast to most other model-free analyses, ALS is not based on Factor Analysis. ALS should more correctly be called Alternating Linear Least-Squares, as every step in the iterative cycle is a linear least-squares calculation followed by some correction of the results. The main advantage and strength of ALS is the ease with which any conceivable constraint can be implemented; its main weakness is the inherent poor convergence. This is a property ALS shares with the very similar methods of Iterative Target Transform Factor Analysis, ITTFA, and Iterative Refinement of the Concentration Profiles, discussed in Chapters 5.2.2 and 5.3.3.
We start with the flow diagram in Figure 5-51 demonstrating the basic ideas. Of course, the data matrix is still Y and the goal is to decompose it into the product of the concentration matrix C and matrix A of molar absorptivities according to Chapter 3.1, Beer-Lambert's Law.
Initial guess for C
→ A = C⁺Y; corrections to A → Ã
→ C = YÃ⁺; corrections to C → C
→ R = Y − CA; ssq = ΣΣ R²_{i,j}
   if ssq_new < ssq_old: start the next cycle
   if ssq_new > ssq_old: 'do something'
   if ssq_new = ssq_old: end

Figure 5-51. Flow diagram for the ALS algorithm.

The diagram starts with initial guesses for the concentration profiles C. It is, of course, equally possible to start with initial guesses for the component spectra A and to swap the order of the linear regression/correction steps, calculating first C and then A, while the structure of the rest stays the same.
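The comparison of ssq_new with ssq_old in Figure 5-51 suggests a simple, if imperfect, termination criterion based on the relative change of the sum of squares. The following fragment is our own sketch (the tolerance and iteration cap are arbitrary assumptions, and the correction function constraints_positiveCA.m is introduced below); it could replace the fixed 100 iterations used in the programs of this chapter:

% Sketch of a relative-change termination criterion for the ALS loop;
% tol and max_it are arbitrary choices (assumptions)
tol=1e-6; max_it=1000; ssq_old=inf;
for it=1:max_it
   [C,A]=constraints_positiveCA(Y,C);   % one ALS cycle with corrections
   R=Y-C*A; ssq=sum(sum(R.*R));
   if abs(ssq_old-ssq)/ssq_old < tol    % hardly any progress: stop
      break
   end
   ssq_old=ssq;
end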
5.4.1 Initial Guesses for Concentrations or Spectra
Experience shows that often the quality of the initial guesses made for the concentration matrix C (or the matrix A of component spectra) is not crucial. As demonstrated in Figure 5-42, progress in this kind of algorithm typically is fast initially and slows down dramatically towards the minimum. Nevertheless, it cannot harm to have good initial starting matrices. Commonly implemented options include:
• combined eigenvalue curves, such as in Figure 5-44
• the non-iterative EFA result, Figure 5-50
• concentration windows (matrices formed by 1's and 0's), such as the bottom panel of Figure 5-39
5.4.2 Alternating Least-Squares and Constraints
By far the most important aspect of the ALS algorithm is the ease of implementing restrictions. In the following, we demonstrate this using a number of examples.
The program Main_ALS.m forms the backbone of the ALS algorithm. It reads in the data set Data_Chrom2a (p.251), which simulates an overlapping chromatogram of three components. It is the data set we used previously in Chapter 5.3.3 to demonstrate the concepts of iterative and explicit computation of the concentration profiles, based on the window information from EFA. In order to facilitate the comparison of the results and the progress of the iterative process, we start all iterative attempts with the same concentration profiles. They are the combined eigenvalue traces of EFA, as shown in Figure 5-44.
The very basic ALS program does not include a termination criterion for the iterative cycle; just 100 iterations are performed. As convergence invariably slows down towards the minimum, it is not trivial to introduce a generally reliable termination criterion. The algorithm also does not incorporate steps that are required if there is divergence in an iteration. This is indicated by 'do something' in Figure 5-51. Again, it is not easy to develop generally applicable measures that force the iterations towards a good direction. There is no equivalent to the Levenberg/Marquardt method that deals with divergence in the Newton-Gauss algorithm.
MatlabFile 5-40. Main_ALS.m
% Main_ALS
[t,lam,Y,C_sim,A_sim]=Data_Chrom2a;
[ns,nc]=size(C_sim);
nl=length(lam);
ne=nc+1;                                     % one extra singular value
[EFA_f,EFA_b]=EFA(Y,ne);                     % perform EFA
EFA_f(isnan(EFA_f)==1)=0;                    % replace NaN's by zeros
EFA_b(isnan(EFA_b)==1)=0;                    % replace NaN's by zeros
% combined singular value curves
C=min(EFA_f(:,1:nc),fliplr(EFA_b(:,1:nc)));
for it=1:100
   C=norm_max(C);                            % normalization
   [C,A]=constraints_positiveCA(Y,C);        % constraints
   R=Y-C*A;                                  % residuals
   ssq(it)=sum(sum(R.*R));
end
[C_n,A_n]=norm_max(C,A);                     % norm. C, C_sim to max. 1
[C_sim_n,A_sim_n]=norm_max(C_sim,A_sim);     % and recalc. A, A_sim
subplot(3,1,1);
plot(lam,A_n,'-',lam,A_sim_n,'.');
xlabel('wavelength');ylabel('absorptivity');
subplot(3,1,2);
plot(t,C_n,'-',t,C_sim_n,'.');
xlabel('time');ylabel('concentration');
subplot(3,1,3);
plot(log10(ssq));
xlabel('iteration');ylabel('log(ssq)');
axis([0 100 -3 0]);
Figure 5-52. 100 iterations of ALS using the simplest constraint of setting negative values of A and C to zero. The markers represent the true spectra and concentration profiles, the lines the ALS result. The bottom panel shows the progress of the sum of squares.

There are several types of constraints that can be used; generally, the more constraints applied, the better the convergence and the better defined the results.
The most important and almost universally applicable constraint is the non-negativity of all elements of C and A. Obviously, neither concentrations nor molar absorptivities can be negative. In many ALS algorithms, this constraint is enforced by simply setting all negative entries in C and A to zero:
MatlabFile 5-41. constraints_positiveCA.m
function [C,A]=constraints_positiveCA(Y,C)
A=C\Y;                  % spectra
A=A.*(A>0);             % positive
C=Y/A;                  % conc profiles
C=C.*(C>0);             % positive
There are exceptions to the universality of this non-negativity constraint: e.g. CD or ESR spectra can be negative. Apart from that, both spectroscopies produce a signal that is a linear function of concentration, and thus the equivalent of Beer-Lambert's law holds. In other words, the equation Y=CA applies and thus also the ALS algorithm.
The alternating computation of the matrices A and C in linear least-squares fits, each followed by setting negative values to zero, is simple but very crude. This fact is reflected in the slow progress of the sum of squares minimisation. Matlab supplies the function lsqnonneg that performs a non-negative least-squares fit of the kind y=Ca+r, where y and a are column vectors. The function computes the best vector a with only non-negative entries. This equation corresponds to data acquired at only one wavelength. In our application, the columns of A have to be computed individually in a loop over all wavelengths, in each instance using the appropriate column of Y. C is the complete matrix of concentrations; it is, of course, the same for all wavelengths. The following function constraints_lsqnonneg.m replaces the function constraints_positiveCA.m. (Naturally, the call in the main program, Main_ALS.m, needs to be adapted.) All columns a_{:,j} of A are computed sequentially in a loop. In the alternate computation of the best C from A, the same function can be used. It computes the rows of C in an analogous loop, using the appropriate row of Y and the complete matrix A. The computation of non-negative rows of C using lsqnonneg requires the appropriate transpositions for the rows of C and Y and the matrix A.
MatlabFile 5-42. constraints_lsqnonneg.m
function [C,A]=constraints_lsqnonneg(Y,C)
[ns,nl]=size(Y);
for j=1:nl                       % pos spectra (MATLAB)
   A(:,j)=lsqnonneg(C,Y(:,j));
end
for j=1:ns                       % pos conc. (MATLAB)
   C(j,:)=lsqnonneg(A',Y(j,:)')';
end
Figure 5-53. ALS using the Matlab function lsqnonneg.m for non-negative linear least-squares fitting.

Compared to Figure 5-52, the resulting concentration profiles and spectra appear to be very similar. The plot of the development of the quality of the fit indicates that a smaller sum of squares is achieved in fewer iterations. However, the calculation is much slower and there is no obvious and significant benefit.
In Constraint: Positive Component Spectra (p.168), we introduced an improved, much faster matrix-based function nonneg.m (provided by C. Andersson) that is more efficient than the Matlab function lsqnonneg.m. The result of implementing the function constraints_nonneg.m is identical but achieved much faster.
MatlabFile 5-43. constraints_nonneg.m
function [C,A]=constraints_nonneg(Y,C)
A=nonneg(Y',C')';       % pos spectra (Andersson)
C=nonneg(Y,A);          % pos conc. (Andersson)
The secondary hump in the third concentration profile in Figure 5-52 and Figure 5-53 obviously is not correct. In fact, it is often independently known that the concentration profiles are unimodal, i.e. they have only one maximum and continuously decrease on both sides of the maximum. This is certainly the case for chromatographic concentration profiles. The function constraints_nonneg_unimod.m implements this additional constraint by levelling off secondary maxima. It also uses nonneg.m for non-negative linear least-squares fits. The effect, as demonstrated in Figure 5-54, is clear: not only has the secondary maximum been suppressed, but, as a consequence, the absorption spectra also are closer to the true ones.
MatlabFile 5-44. constraints_nonneg_unimod.m
function [C,A]=constraints_nonneg_unimod(Y,C)
[ns,nc]=size(C);
A=nonneg(Y',C')';                            % pos spectra (Andersson)
C=nonneg(Y,A);                               % pos conc. (Andersson)
for j=1:nc                                   % unimodal conc. profiles
   [m,p]=max(C(:,j));
   for i=p:ns-1
      if C(i+1,j)>C(i,j); C(i+1,j)=C(i,j); end
   end
   for i=p:-1:2
      if C(i-1,j)>C(i,j); C(i-1,j)=C(i,j); end
   end
end
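The nested loops above level off secondary maxima by a running minimum moving away from the peak position. In more recent Matlab releases (R2014b or later), the same correction can be written without loops using cummin; this variant is our own suggestion, not part of the original toolbox, but it behaves identically:

% Loop-free unimodality correction using cummin;
% equivalent to the nested if-loops of constraints_nonneg_unimod.m
for j=1:nc
   [~,p]=max(C(:,j));
   C(p:end,j) =cummin(C(p:end,j));    % non-increasing after the maximum
   C(p:-1:1,j)=cummin(C(p:-1:1,j));   % non-increasing before the maximum
end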
Figure 5-54. ALS using the function constraints_nonneg_unimod.m, performing non-negative linear least-squares and removing secondary maxima in the concentration profiles.
The unimodality constraint distorts the least-squares improvements, and this is evident from the slower convergence. However, the constraint forces the concentration profiles to physically possible shapes.
Another, most powerful constraint is based on the concentration windows provided by EFA.
MatlabFile 5-45. constraints_nonneg_window.m
function [C,A]=constraints_nonneg_window(Y,C,C_window)
[ns,nc]=size(C);
A=nonneg(Y',C')';       % pos spectra (Andersson)
C=nonneg(Y,A);          % pos conc. (Andersson)
C=C.*C_window;          % apply windows from EFA
Figure 5-55. ALS applying the function constraints_nonneg_window.m, using non-negative linear least-squares and the window matrix applied to C.

The matrix C_window contains the window information. It is composed of 1's and 0's indicating whether the particular value in the corresponding entry in the matrix C is known to be positive or zero, see Figure 5-37. This matrix C_window is computed directly before the ALS loop:
sig_level=0.01;              % define cut off level
C_window=C>sig_level;        % build window matrix
Of course, the matrix C_window has to be passed as an argument into constraints_nonneg_window.m. Refer to Figure 5-43 for a discussion of the level of significance used above.
5.4.3 Rotational Ambiguity
The one major problem with all model-free methods is the fact that often there is no unique solution for the task of decomposing the matrix Y of measurements into the product of two positive matrices C and A. In many instances, there is a whole range of possible solutions. Recall the original model-free method by Lawton-Sylvestre (p.231) that clearly results in bands of feasible solutions, see Figure 5-16. In the literature on model-free analyses, the expression 'rotational ambiguity' has been coined for such situations.
In instances where there is rotational ambiguity, algorithms like ALS converge to one particular point within the range of possibilities. Importantly, the algorithm does not detect such situations and thus does not warn the user of the potential non-uniqueness of the result. It is difficult to generalise, but such ambiguous situations often occur when the concentration windows overlap in specific ways. Kinetic investigations are typical examples of rotational ambiguity as a result of very wide concentration windows. Using Data_ABC2.m (p.256), producing data mimicking a reaction A→B→C, instead of the chromatography data and applying constraints_nonneg.m results in Figure 5-56. The main program Main_ALS2.m is not listed here; it is virtually identical with Main_ALS.m.
While the resulting concentration profiles, and in particular the computed spectra, seem to be reasonably close to the true ones, there are significant discrepancies, typical for model-free analyses:
(a) The computed concentration profile for the intermediate component reaches zero at the end of the measurement.
(b) The initial part of the concentration profile for the final product is wrong; it does not start with zero concentration.
Both discrepancies are the result of rotational ambiguity. The minimal ssq, reached after relatively few iterations, reflects the noise of the data and not a misfit between CA and Y; ssq does not improve if the correct matrices C and A are used.
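Since ALS itself gives no warning, a pragmatic check is to restart the fit from several perturbed initial guesses and compare the converged results: essentially identical ssq values combined with visibly different profiles point towards rotational ambiguity. A minimal sketch of this idea (our own suggestion; Y, t, nc, EFA_f, EFA_b and the constraint function are taken from the surrounding programs):

% Restart ALS from perturbed initial guesses; similar ssq with
% different converged profiles suggests rotational ambiguity
C0=min(EFA_f(:,1:nc),fliplr(EFA_b(:,1:nc)));  % reference initial guess
for trial=1:5
   C=C0.*(1+0.2*rand(size(C0)));              % randomly perturbed start
   for it=1:100
      C=norm_max(C);
      [C,A]=constraints_nonneg(Y,C);
   end
   R=Y-C*A;
   fprintf('trial %i: ssq = %g\n',trial,sum(sum(R.*R)));
   plot(t,norm_max(C)); hold on               % overlay converged profiles
end
hold off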
Figure 5-56. ALS analysis of the kinetic data Data_ABC2.m using constraints_nonneg.m.

Implementation of additional constraints can help remove, or at least reduce, rotational ambiguity. A possibility to narrow down the range is to utilise a known component spectrum. The simplest way of implementing known component spectra is to replace the appropriate spectrum within the iterations with the 'correct', known one; see constraints_nonneg_known_spec.m. We repeat: a powerful property of the ALS algorithm is the ease with which additional known information can be implemented.
MatlabFile 5-46. constraints_nonneg_known_spec.m
function [C,A]=constraints_nonneg_known_spec(Y,C,A_sim)
A=nonneg(Y',C')';       % pos spectra (Andersson)
A(2,:)=A_sim(2,:);      % known spectrum
C=nonneg(Y,A);          % pos conc. (Andersson)
Figure 5-57. ALS with known intermediate spectrum.

The improvement resulting from the incorporation of the correct spectrum for the intermediate is subtle but significant. All resulting spectra are improved, with the intermediate spectrum, of course, correct. The new concentration profiles for the starting material A and the product C are now correct, while the profile for the intermediate B remains wrong. Compared to Figure 5-56, the minimal ssq after the incorporation of the correct spectrum is not improved, and this is a clear indication of rotational ambiguity. The still incorrect concentration profile of the intermediate B, together with small errors in the spectra, shows that the solution is still not unique: with the introduction of the one correct spectrum, the range of rotational ambiguity has been reduced but not totally removed.
5.5 Resolving Factor Analysis, RFA
Resolving Factor Analysis, RFA, is an attempt to introduce the strengths of the Newton-Gauss algorithm into the model-free analysis methodology. As we have seen in many instances, the iterative progress in model-free analyses can be very slow. The Newton-Gauss algorithm, in contrast, accelerates as it converges towards the optimum. Combining the two methodologies is a promising idea that should result in much faster computations.
Equation (5.49), C=ŪT, is the core of RFA; see also its graphical representation in Figure 5-45. Ū is known from the SVD of the measurement Y; C, and also A (see equation (5.58)), can be calculated as a function of a transformation matrix T. The residuals and the sum of squares are defined as

    R(T) = Y − C(T) × A(T),    ssq = ΣᵢΣⱼ R²_{i,j}          (5.54)

and are minimised by the Newton-Gauss method. For a three-component system, the matrix T has nine elements and thus it appears that C, and eventually the sum of squares, are a function of nine parameters. As we will see in a moment, there are actually fewer, only six, parameters to be fitted. The idea of RFA is to use the Newton-Gauss algorithm to fit this rather small number of parameters in T.
To start the iterative cycle, we need a set of initial guesses, T_guess, for the parameters T. In order to compare the properties of RFA with the previous iterative methods, we use the same data set as generated by Data_chrom2a.m (p.251). Since the latest Newton-Gauss algorithm, nglm3.m (p.173), requires Matlab structures, the equivalent data are generated by the appropriate function Data_RFA_chrom2a.m. The Newton-Gauss algorithm requires initial estimates for the parameters in T. These can be computed from the same estimated concentration profiles C_guess as before (Figure 5-44):

    T_guess = Ūᵗ C_guess                                    (5.55)
An important issue needs to be discussed next. Multiplying a column of C with any number and its corresponding row of A with the inverse of that number does not affect the product CA, and thus this factor is not determined at all; it can be freely chosen. Due to this multiplicative ambiguity, only the shapes of the concentration profiles (and component spectra) can be determined by any model-free method, and only additional quantitative information allows the absolute determination of C and A. Multiplying a concentration profile, or column of C, with a factor is equivalent to multiplying the corresponding column of T with the same factor. Any one element of each column vector of T can be chosen freely, while the other elements in that column define the shape of the concentration profile. In order to avoid numerical problems with very small or very large numbers in each column of T, we choose the largest absolute element of each column of the matrix of initial guesses T_guess and keep it fixed during the iterative refinement of the others. This reduces the number of parameters that need fitting to nc(nc−1), or in our example of a three-component system from 9 to 6.
The Newton-Gauss algorithm (nglm3.m) is called from Main_RFA.m and requires a Matlab function that computes the residuals as a function of the parameters T, as defined in equation (5.54). This calculation is performed in the Matlab function Rcalc_RFA.m. First, C is computed as C=ŪT, see Figure 5-45. Elements of C that are outside the concentration window and negative elements are set to zero. A is computed as T⁻¹S̄V̄. This somewhat surprising equation can be derived in the following way:
    CA = ŪS̄V̄                                               (5.56)

introducing the identity matrix TT⁻¹ in the appropriate position:

    CA = ŪT T⁻¹S̄V̄                                           (5.57)

as C=ŪT, A must be:

    A = T⁻¹S̄V̄                                               (5.58)
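This factorisation is easy to check numerically on synthetic, noise-free data; the following sketch is our own (not one of the book's files): for any invertible T, the pair C=ŪT, A=T⁻¹S̄V̄ reproduces Y exactly.

% For any invertible T, C=U_bar*T and A=inv(T)*S_bar*V_bar rebuild Y
Y=rand(50,2)*rand(2,30);               % synthetic rank-2 data
[U,S,Vt]=svd(Y,0);
U_bar=U(:,1:2); S_bar=S(1:2,1:2); V_bar=Vt(:,1:2)';
T=randn(2,2);                          % an arbitrary invertible T
C=U_bar*T;
A=T\(S_bar*V_bar);                     % A = inv(T)*S_bar*V_bar
disp(norm(Y-C*A))                      % ~0: Y is reproduced exactly

The task of RFA is thus reduced to finding the particular T for which C and A also satisfy the physical constraints.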
Next, negative elements of A are set to zero, and the residuals and the sum of squares are computed as indicated in Figure 5-51. The derivatives of the residuals with respect to the parameters are computed numerically by the Newton-Gauss algorithm.
MatlabFile 5-47. Rcalc_RFA.m
function [r,s]=Rcalc_RFA(s)
s.C=s.U_bar*s.T;                 % calc. conc
s.C=s.C.*s.C_window;             % apply windows from EFA
s.C(s.C<0)=0;                    % C > 0
s.A=inv(s.T)*s.S_bar*s.V_bar;    % calc. A
s.A(s.A<0)=0;                    % A > 0
R=s.Y-s.C*s.A;                   % residuals
r=R(:);
s.ssq=sum(r.^2);
s.ssq_all=[s.ssq_all;s.ssq];
There is one detail that necessitates a few additional comments: nglm3.m requires a vector (s.par_str) of strings that contains the names of all variables that are fitted. In the RFA application these are the nc(nc−1) elements of T, s.par_str=[s.T(1,1), s.T(1,2), …]. The function build_par_str.m does the job of finding the maximum element of each column of T and including the other elements into par_str.
MatlabFile 5-48. build_par_str.m
function par_str=build_par_str(T)
[maxT,index]=max(abs(T));
k=0;
for j=1:3
   for i=1:3
      if i~=index(j)
         k=k+1;
         par_str{k}=['s.T(' int2str(i) ',' int2str(j) ')'];
      end
   end
end
MatlabFile 5-49. Main_RFA.m
% Main_RFA
s=Data_RFA_Chrom2a;                  % get data into structure s
s.fname='Rcalc_RFA';                 % file to calc residuals
ne=s.nc;
[EFA_f,EFA_b]=EFA(s.Y,ne);           % perform EFA on ne sing. values
EFA_f(isnan(EFA_f)==1)=0;            % replace NaN's by zeros
EFA_b(isnan(EFA_b)==1)=0;            % replace NaN's by zeros
% combined singular value curves
C_guess=min(EFA_f(:,1:s.nc),fliplr(EFA_b(:,1:s.nc)));
sig_level=0.01;                      % cut off level
s.C_window=C_guess>sig_level;        % build window matrix
C_guess=norm_max(C_guess);           % normalise
[U,S,Vt]=svd(s.Y,0);
s.U_bar=U(:,1:ne);
s.S_bar=S(1:ne,1:ne);
s.V_bar=Vt(:,1:ne)';
s.T=s.U_bar'*C_guess;                % initial guesses for s.T from C_guess
s.par_str=build_par_str(s.T);        % cell array of 's.T(i,j)' strings
s.ssq_all=[];
s.par=get_par(s);                    % collects variable parameters into s.par
s=nglm3(s);                          % call nglm (Newton-Gauss)
fprintf(1,'s.T = \n');disp(s.T);fprintf(1,'\n');     % display T
s.sig_r=sqrt(s.ssq/(s.ns*s.nl-length(s.par)));       % sigma_r
s.sig_par=s.sig_r*sqrt(diag(inv(s.Curv)));           % sigma_par
for i=1:length(s.par)
   fprintf(1,'%s: %g +- %g\n',s.par_str{i}(3:end), ...
           s.par(i),s.sig_par(i));
end
fprintf(1,'sig_r: %g\n',s.sig_r);
[C_sim_n,A_sim_n]=norm_max(s.C_sim,s.A_sim);
[C_n,A_n]=norm_max(s.C,s.A);
subplot(3,1,1);
plot(s.lam,A_n,'-',s.lam,A_sim_n,'.');
xlabel('wavelength');ylabel('absorptivity');
subplot(3,1,2);
plot(s.t,C_n,'-',s.t,C_sim_n,'.');
xlabel('time');ylabel('concentration');
subplot(3,1,3);
plot(log10(s.ssq_all),'.');
xlabel('iteration');ylabel('log(ssq)');
it=0, ssq=7.9338, mp=0, conv_crit=1
it=1, ssq=0.941161, mp=0, conv_crit=0.881373
it=2, ssq=0.176043, mp=0, conv_crit=0.812951
it=3, ssq=0.0444385, mp=0, conv_crit=0.74757
it=4, ssq=0.0145205, mp=0, conv_crit=0.673245
it=5, ssq=0.00511355, mp=0, conv_crit=0.647839
it=6, ssq=0.0029535, mp=0, conv_crit=0.422417
it=7, ssq=0.00241231, mp=0, conv_crit=0.183239
it=8, ssq=0.00237849, mp=0, conv_crit=0.0140165
it=9, ssq=0.00237848, mp=0, conv_crit=6.54114e-006
s.T =
   -5.7280   -2.4717   -3.9865
    5.2605    0.7339   -3.1774
    2.4481   -1.2217   -0.3969
T(1,1): -5.72798 +- 0.0237521
T(2,1): 5.26049 +- 0.0167212
T(2,2): 0.733879 +- 0.0025855
T(3,2): -1.2217 +- 0.00220923
T(2,3): -3.17735 +- 0.0110746
T(3,3): -0.396851 +- 0.00511213
sig_r: 0.00106576
Figure 5-58. RFA analysis of chromatography data.
It is illustrative to compare Figure 5-58 with the equivalent result of the ALS analysis in Figure 5-55. The most striking difference is the number of iterations required to reach the outcome. RFA arrives at the optimal resolution, within the constraints, in 10 iterations. ALS, using equivalent constraints, results in acceptable matrices C and A, but even after 100 iterations the optimum clearly has not been reached.
5.6 Principal Component Regression and Partial Least Squares, PCR and PLS
Principal Component Regression, PCR, and Partial Least Squares, PLS, are the most widely known and applied chemometrics methods. This is particularly the case for PLS, for which there is a tremendous number of applications and a never-ending stream of proposed improvements. The details of these latest modifications are not within the scope of this book and we concentrate on the essential, classical aspects.
One could argue whether PCR and PLS should be part of the chapter Model-Based Analyses or Model-Free Analyses. Both PCR and PLS are clearly not hard-model fitting methods in the way presented in Chapter 4, nor are they pure model-free analyses. They are somewhere in between, maybe closer to model-free analyses, and that is the reason for discussing them here.
PCR and PLS establish a mathematical relationship (calibration) between the matrix that is formed by the spectra taken of a collection of samples and the vector of properties or qualities for these same samples. Additionally, both methods allow the prediction of the quality for new samples, based solely on their spectra. In contrast to most methods discussed so far, PCR and PLS do not require any order in the data set.
In this chapter, we deviate from our well-established principle of generating 'measurements' and analysing them subsequently with the methods developed for the purpose. Such a procedure does not make much sense for PCR/PLS; at least it would be rather difficult to generate realistic data sets that are amenable to analysis by PCR or PLS. We decided to use a publicly available data set; the file corn.mat can be downloaded from http://software.eigenvector.com/Data/Corn/index.html. This data set contains near infrared (NIR) spectra of a collection of 80 corn samples measured on three different instruments, together with the qualities 'Moisture', 'Oil', 'Protein' and 'Starch' for each sample. We use the example of 'Protein' measured on instrument 'mp6' to demonstrate the principles of the PCR/PLS analyses.
In order to chemically analyse a sample of corn for its protein content, a rather complex analytical procedure (e.g. Kjeldahl analysis) is required, a slow and expensive process. In our example, the PCR/PLS group of methods replaces this procedure with a much faster spectroscopic analysis. First, a mathematical relationship is established from a calibration set, comprising a matrix of NIR-spectra of the collection of samples and the vector of corresponding qualities. This calibration can subsequently be used to predict the particular quality for a new sample from its NIR-spectrum alone, thus avoiding an expensive experimental analysis.
A more traditional spectroscopy-based approach for corn analysis would be to investigate whether there is a peak in the NIR-spectrum that correlates well with the protein content, or whether there is a ratio of peaks that correlates well, or … whatever else the scientist can think of and is prepared to try. Evidently, there is a tremendous number of potential combinations and permutations one could try. PCR/PLS do this job in a much more elegant and efficient way.
5.6.1 Principal Component Regression, PCR
In the example, we deal with a collection of ns=80 corn samples for which we have the NIR spectra, measured at nl=700 wavelengths, and 80 corresponding qualities (protein contents). The Matlab script Main_PCR.m first reads in the complete corn data. Then it executes stepwise all the tasks that are described in the following.
In order to test PCR and later PLS, we remove a random selection of 10 test samples from the total data set; the 10 test spectra are collected row-wise in the matrix Y_s and the corresponding 'known' qualities in a column vector q_{s,known}. The remaining spectra are organised in the same way in the matrix Y of dimensions 70×700. For each one of these samples we also know the protein content; we collect these qualities in the vector q with 70 entries. In the following, Y and q serve as the calibration set that is used later to predict the unknown qualities q_s for the test set Y_s. The predicted q_s can then be compared with the 'known' qualities q_{s,known}.
MatlabFile 5-50. Main_PCR.m
% Main_PCR
load corn.mat mp6spec propvals;      % load corn data set
Y_data=mp6spec.data;                 % NIR spectra
q_data=propvals.data(:,3);           % protein qualities
ns=length(q_data);
lam=[1100:2:2498];
plot(lam,Y_data);
xlabel('wavelength')
rand('seed',1);                      % initialise random number generator
s=ceil(rand(10,1)*ns);               % random selection of 10 samples
Y=Y_data; Y(s,:)=[];                 % calibration set excluding 10 samples
q=q_data; q(s)=[];
Y_s=Y_data(s,:);                     % 'unknown' test samples
q_s_k=q_data(s);                     % their 'known' qualities
Figure 5-59. Collection of NIR spectra of 80 corn samples.

There is a fair amount of structure in the NIR spectra, but the differences between the spectra are rather subtle. Obviously, the protein content cannot easily be read from any of the peaks.
Mean-Centring, Normalisation
There are numerous publications proposing a glut of data treatment methods prior to PCR/PLS. Well established, tested and essentially universally applied are mean-centring and normalisation of the data. We have seen in Mean Centring, Closure (p.239) that mean centring reduces the dimensionality by one, which of course cannot harm. In PCR/PLS, it is also common to normalise to the standard deviation of the signals. Both are implemented in Main_PCR.m.
MatlabFile 5-51. Main_PCR.m … continued
% Main_PCR ... continued (data pre-treatment)
meanY=mean(Y);                       % mean centring Y and q
meanq=mean(q);
Y_mc=Y-repmat(meanY,size(Y,1),1);
q_mc=q-meanq;
norm_coef=1./std(Y_mc);              % normalisation of Y_mc
Y_mc_n=Y_mc*diag(norm_coef);
Mean-centring and normalisation are optional. The PCR (and PLS) algorithms are essentially independent of the nature of the pre-treatment of the data; only the centring has to be reversed in the prediction step. In the programs, we indicate the levels of pre-treatment, i.e. Y→Y_mc→Y_mc,n and q→q_mc, while in the equations in the text we do not.
PCR Calibration
It is possible to develop the ideas behind PCR, and to a lesser extent behind PLS, based on chemical ideas and intuition. Naturally, this is not the only way, and both PCR and PLS have also been developed on more theoretically oriented pathways. While we do not know the components of the corn samples, nor their component spectra, nor even how many components there are, we can still assume that Beer-Lambert's law holds and we can write Y=CA (see Chapter 3.1). There is nothing new here.
For a chemist, it intuitively makes sense to assume that the quality 'protein content' is related to the concentrations of the components in the mixture. The simplest assumption is that the protein content is the weighted average of the concentrations of the relevant individual components. Some components have a high protein content, others might even have a negative influence. This is best expressed in a matrix equation:

    q = C b' + r                                            (5.59)

where q and r are ns×1 vectors, C is ns×nc and b' is nc×1.
All we know at present is the vector of qualities q, and that q might be approximated by the product Cb'. We have no idea about the number of components, nc, nor about C or b'. Now we remember equation (5.49), C=ŪT: C is the product of Ū and a transformation matrix T. Introduction of this equation into equation (5.59) results in
    q = ŪTb' + r = Ūb + r                                   (5.60)
where the product Tb' is replaced with the column vector b. The quality vector q is now approximated by a linear combination of the columns of the eigenvector matrix Ū. This might be surprising, but if (5.59) makes sense, (5.60) does as well. The important aspect is that Ū is known from the SVD of Y. Note, we still do not know how many components there are, or how many eigenvectors we should retain in Ū; we come back to that question shortly.
The computation of the best b, the one for which the residual vector r is minimal, is a linear least-squares calculation. Due to the orthonormality of Ū, it is particularly easy:

    b = Ūᵗq                                                 (5.61)

This allows us to compute the PCR approximation q_PCR for the quality vector q:

    q_PCR = Ūb = ŪŪᵗq                                       (5.62)
This should remind the reader of e.g. equation (5.28). The vector q_PCR is nothing but the projection of the quality vector q into the space Ū. The PCR calibration is good if the vector q is close to the space Ū; it is bad otherwise.
The Matlab function PCR_calibration.m performs the PCR calibration according to equation (5.62). Note that we use ne=12 eigenvectors in the above calculations. This is the optimal number for prediction, as we show in Cross Validation (p.303). The reader is invited to play with this number and observe the effect. The routine PCR_calibration.m also returns a 'prognostic vector' v_prog. It is used for prediction and its function is explained in the next section, PCR Prediction.
MatlabFile 5-52. Main_PCR.m … continued
% Main_PCR ... continued (calibration)
ne=12;                                % no of factors for calibration
[q_PCR,v_prog]=PCR_calibration(Y_mc_n,q_mc,ne);  % calibration
q_PCR=q_PCR+meanq;                    % undo mean centering in q_PCR
plot(q,q_PCR,'.')
xlabel('q');ylabel('q_P_C_R')
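The projection interpretation of equation (5.62) can be made tangible with a few lines of Matlab (a sketch of our own on random stand-in data): because Ū has orthonormal columns, the least-squares solution Ū\q coincides with Ūᵗq, and ŪŪᵗq is the resulting fit.

% The PCR fit is an orthogonal projection of q onto the columns of U_bar
Y=randn(70,700); q=randn(70,1);        % stand-in data, arbitrary numbers
[U,S,Vt]=svd(Y,0);
U_bar=U(:,1:12);
b=U_bar\q;                             % least-squares; equals U_bar'*q
disp(norm(b-U_bar'*q))                 % ~0 due to orthonormal columns
q_fit=U_bar*b;                         % the projection, equation (5.62)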
Figure 5-60. PCR calibration of the corn data using 12 factors: q_PCR versus the actual qualities q.
MatlabFile 5-53. PCR_calibration.m
function [q_PCR,v_prog]=PCR_calibration(Y_mc_n,q_mc,ne)
[U,S,Vt]=svd(Y_mc_n,0);
U_bar=U(:,1:ne);
S_bar=S(1:ne,1:ne);
V_bar=Vt(:,1:ne)';
q_PCR=U_bar*U_bar'*q_mc;
v_prog=V_bar'/S_bar*U_bar'*q_mc;     % prognostic vector
PCR Prediction
So far, the program Main_PCR.m covers the calibration part of PCR. As Figure 5-60 demonstrates, there is a very reasonable mathematical relationship between the quality 'protein' and the NIR-spectra of the collection of corn samples; the correlation between measured and PCR-modelled protein contents is convincing. How can we use these results to predict the quality q_s of a new sample, just based on its NIR-spectrum? The relevant equation is (5.60), q=Ūb. Each individual quality q_i is approximated by the product of the corresponding row ū_{i,:} of Ū and b. The calculation of the quality q_s for a new sample is done in an analogous way:

    q_s = ū_s b                                             (5.63)

However, we need to determine the row vector ū_s corresponding to the new sample. Figure 5-61 attempts to represent the relationship between the spectrum of a new sample, y_s, and the Singular Value Decomposition of Y itself.

Figure 5-61. Relationship between a new sample spectrum y_s and its representation ū_s in the eigenvector space. The new spectrum, y_s, is shown as the grey row underneath Y and the corresponding ū_s as the grey row underneath Ū.

    y_s = ū_s S̄V̄                                            (5.64)

The spectrum of the new sample is the product of ū_s and the matrix S̄V̄; ū_s contains the coordinates of the spectrum y_s in the eigenvector space spanned by S̄V̄. Rearranging equation (5.64), ū_s is computed as:

    ū_s = y_s V̄ᵗS̄⁻¹                                         (5.65)

Inserting this equation into equation (5.63) and substituting b using equation (5.61) leads to:

    q_s = ū_s b = y_s V̄ᵗS̄⁻¹Ūᵗq                              (5.66)

The matrices Ū, S̄, V̄ and the vector b are determined by the calibration set Y and q; thus the product V̄ᵗS̄⁻¹Ūᵗq is also completely determined by the calibration. It is a column vector of dimension nl×1 (nl is the number of wavelengths at which the spectra are taken); we call it the prognostic vector, v_prog. The quality q_s for any new sample can be predicted by the product of its spectrum y_s and the prognostic vector v_prog:

    q_s = y_s v_prog    with    v_prog = V̄ᵗS̄⁻¹Ūᵗq           (5.67)

This prognostic vector can reveal interesting insight into the relationship between the qualities q and the spectra of the calibration set Y. Note that the prognostic vector v_prog has already been computed in the function PCR_calibration.
MatlabFile 5-54. Main_PCR.m … continued
% Main_PCR ... continued (calibration)
subplot(2,1,1);plot(lam,meanY);axis tight;
subplot(2,1,2);plot(lam,v_prog);axis tight;
xlabel('wavelength')
Figure 5-62. The mean spectrum and the prognostic vector.
Figure 5-62 displays the mean spectrum and the prognostic vector v_prog. The vector product y_s×v_prog is the sum over all the products of pairs of elements in y_s and v_prog. Thus, if at a certain wavelength the prognostic vector v_prog has a positive value, a high value in y_s at this wavelength adds to the quality; a negative value in v_prog subtracts. As an example, consider the wavelength 2100 nm highlighted by the dotted line in Figure 5-62: the prognostic vector is negative, indicating that a peak in the spectrum at this wavelength is 'bad' for the protein content. Samples with peaks shifted towards longer wavelengths would have an increased quality q_s, as the prognostic vector has a strong positive contribution at around 2200 nm.
For the sake of completeness, we introduce an alternative at this stage and present a more theoretical but quicker path for the development of the PCR calibration and prediction equations. The starting concept is to assume there is a linear relationship between q and the matrix Y:
    q = Yb''                                                (5.68)

Figure 5-63. Graphical representation of equation (5.68).

The optimal vector b'' cannot be computed as b''=Y⁺q, since the pseudo-inverse Y⁺ is not defined; its calculation would include the inversion of a rank-deficient matrix. The way out is to replace Y with ŪS̄V̄; b'' can then be computed as
    b'' = V̄ᵗS̄⁻¹Ūᵗq                                          (5.69)

and the prediction of q_s of a new sample is simply

    q_s = y_s b'' = y_s V̄ᵗS̄⁻¹Ūᵗq = y_s v_prog               (5.70)
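That the truncated pseudo-inverse route of (5.70) reproduces the projection fit of (5.62) can be verified numerically; the following minimal sketch (our own, on random stand-in data) shows the two expressions agree on the calibration set:

% Check: Y*v_prog equals the projection U_bar*U_bar'*q of equation (5.62)
Y=randn(20,50); q=randn(20,1); ne=5;   % arbitrary stand-in data
[U,S,Vt]=svd(Y,0);
U_bar=U(:,1:ne); S_bar=S(1:ne,1:ne); V_bar=Vt(:,1:ne)';
v_prog=V_bar'/S_bar*U_bar'*q;          % equation (5.70)
disp(norm(Y*v_prog-U_bar*U_bar'*q))    % ~0: identical fits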
The equations, of course, are the same as developed in the derivations given previously, and it turns out that b'' is the prognostic vector, b''=v_prog.
Now we use the information gathered so far for the prediction of the 10 test samples Y_s removed from the complete data set at the very beginning. The function PCR_PLS_pred.m does the work according to equation (5.70). Importantly, the mean-centring and normalisation have to be performed in exactly the same way as in the calibration.
MatlabFile 5-55. PCR_PLS_pred.m
function q_s_pred=PCR_PLS_pred(Y_s,meanY,meanq,norm_coef,v_prog)
Y_s_mc   = Y_s-repmat(meanY,size(Y_s,1),1);   % mean centre
Y_s_mc_n = Y_s_mc*diag(norm_coef);            % normalise
q_s_mc   = Y_s_mc_n*v_prog;                   % predict
q_s_pred = q_s_mc+meanq;                      % undo mean centring
MatlabFile 5-56. Main_PCR.m … continued
% Main_PCR ... continued (prediction)
q_s_pred=PCR_PLS_pred(Y_s,meanY,meanq,norm_coef,v_prog);
plot(q_s_k,q_s_pred,'.');
xlabel('q_{s,known}');ylabel('q_{s,pred}');
axis([7.5 10 7.5 10])
Figure 5-64. PCR prediction of 10 new corn samples.

The prediction result, as shown in Figure 5-64, for the 10 samples that were removed from the total data set is convincing.
Remember, for the calculations so far we used ne=12 eigenvectors in ŪS̄V̄. Now we need to return to the question of how this number is determined. The main goal of PCR/PLS is the prediction of the qualities of new samples based on prior calibration using a suitable known calibration set. The best number of eigenvectors is the number that results in the best prediction. It's as easy as that.
Cross Validation
The optimal number of eigenvectors in Ū, S̄ and V̄ is determined prior to the prediction of new samples; it can be seen as part of the calibration. The process is only discussed here because it contains both calibration and prediction steps.
The most common and intuitive method for the determination of this number of eigenvectors is called Cross Validation. The idea is to remove one (or several) samples from the calibration set, use what is left for the computation of a new calibration, and use it to predict the quality of the removed sample(s). Each prediction is compared with the actual quality, which is known, as the removed sample really is part of the total calibration set. In a loop, all samples are removed, either one by one or in groups, and after recalibration with the reduced calibration set their qualities are predicted and compared with the true values. In order to determine the best number of eigenvectors, this procedure is repeated in a big loop, systematically trying all numbers of eigenvectors. This complete procedure is called Cross Validation.
The continuation of Main_PCR.m calls the function PCR_cross.m. This function performs the systematic cross validation for up to ne_max=40 eigenvectors. The computations result in a plot of the accuracy of the prediction as a function of the number of eigenvectors. Hopefully, the graph has a clear minimum!
MatlabFile 5-57. PCR_cross.m
function [q_s_cross,PRESS]=PCR_cross(Y,q,ne_max)
ns=length(q);
for i=1:ns
   i
   Y_cal=Y(i~=1:ns,:);            % elim. i-th row of Y for new calib. set
   q_cal=q(i~=1:ns,:);            % elim. i-th element of q, new qual. set
   y_s=Y(i,:);                    % extract i-th row of Y as new sample
   meanY_cal=mean(Y_cal);         % mean centring Y and q
   meanq_cal=mean(q_cal);
   Y_cal_mc=Y_cal-repmat(meanY_cal,ns-1,1);
   q_cal_mc=q_cal-meanq_cal;
   y_s_mc=y_s-meanY_cal;
   norm_coef=1./std(Y_cal_mc);    % normalising Y
   Y_cal_mc_n=Y_cal_mc*diag(norm_coef);
   y_s_mc=y_s_mc.*norm_coef;
   [U,S,Vt]=svd(Y_cal_mc_n,0);    % PCR calibration
   for k=1:ne_max
      U_bar=U(:,1:k);
      S_bar=S(1:k,1:k);
      V_bar=Vt(:,1:k)';
      V_prog(:,k)=V_bar'/S_bar*U_bar'*q_cal_mc;
   end
   q_s_cross(i,:)=y_s_mc*V_prog;             % prediction
   q_s_cross(i,:)=q_s_cross(i,:)+meanq_cal;  % undo mean centring
end
PRESS=sum((q_s_cross-repmat(q,1,ne_max)).^2);
MatlabFile 5-58. Main_PCR.m … continued
% Main_PCR ... continued (cross validation)
ne_max=40;
[q_s_cross,PRESS]=PCR_cross(Y,q,ne_max);
plot(1:ne_max,PRESS);ylabel('PRESS');xlabel('factors')
Figure 5-65. PRESS-PCR for the corn data set.

PRESS, the prediction sum of squares, is the measure for the accuracy of the prediction. It is the sum over all squared differences between cross-validation predicted and true, known qualities.

    PRESS = Σ_{i=1}^{ns} (q_{s,cross,i} − q_i)²             (5.71)
In Main_PCR.m, the PRESS values for all 1 to ne_max eigenvectors used in Ū, S̄ and V̄ to compute the predicted qualities q_{s,cross} are stored in a vector PRESS that is displayed in Figure 5-65. The figure does not show a clear minimum. In Figure 5-66, we show the results of the cross-validation for ne=12; this number has already been used for the calibration in Figure 5-60.
MatlabFile 5-59. Main_PCR.m … continued
% Main_PCR ... continued (cross validation)
plot(q,q_s_cross(:,ne),'.');
xlabel('q');ylabel('q_{s,cross}');
axis([7.5 10 7.5 10])
Figure 5-66. PCR cross-validation for the corn data set.

Figure 5-66 shows the relationship between the true and cross-validation predicted qualities. Note the small but significant drop in the correlation compared to pure calibration, as shown in Figure 5-60. Calibration invariably produces a better correlation than prediction.
5.6.2 Partial Least Squares, PLS
Partial Least Squares is the chemometrics method 'par excellence'. There is a tremendous number of published applications and also a large number of minor improvements to the original PLS algorithm. In order to understand the difference between the PCR and PLS methods, we first return to PCR. The two central equations of PCR are:

    Y = ŪS̄V̄   and   q = Ūb                                  (5.72)
The first equation is the well-known Singular Value Decomposition. In the context of PCR, the eigenvectors Ū form the basis for the column vectors of Y. The second equation in (5.72) attempts to also represent the column vector q of qualities in the same space Ū. If both representations are good, then PCR works well, resulting in accurate predictions. A potential drawback of PCR is the fact that Ū is defined solely by Y. Even if there is good reasoning for a relationship between q and Ū, as indicated in the derivation of equation (5.60), it is somehow 'accidental'.
The basic idea of PLS is to find a better set of basis vectors that adequately represent both Y and q. In the PLS literature, this basis T is often called 'scores'. Note, matrix T must not be confused with the transformation matrices T we have used several times earlier in this chapter. As required for a decent set of basis vectors, the columns of T have to be orthogonal. The ideal basis for q would be q itself, but it might not be a good basis for Y. Ideally, T is a compromise that serves as a good basis for both q and Y. Obviously, there is not just one compromise, and this to some extent explains the large number of modified PLS algorithms.
Below, we give the complete program Main_PLS.m that runs the PLS computations. It is identical to Main_PCR.m with the exception of the PLS functions PLS_calibration.m and PLS_cross.m that are called instead of the corresponding PCR routines; additional minor differences include the axis labelling etc. Mean-centring and normalisation are implemented as in PCR. Here, the complete listing is included, while the equivalent Main_PCR.m has only been given in many little fragments.
MatlabFile 5-60. Main_PLS.m
% Main_PLS
load corn.mat mp6spec propvals;      % load corn data set
Y_data=mp6spec.data;                 % NIR spectra
q_data=propvals.data(:,3);           % protein qualities
ns=length(q_data);
lam=[1100:2:2498];
plot(lam,Y_data);
xlabel('wavelength')
rand('seed',1);                      % initialise random number generator
s=ceil(rand(10,1)*ns);               % random selection of 10 samples
Y=Y_data; Y(s,:)=[];                 % calibration set excluding 10 samples
q=q_data; q(s)=[];
Y_s=Y_data(s,:);                     % 'unknown' test samples
q_s_k=q_data(s);                     % their 'known' qualities
% Main_PLS ... continued (data pre-treatment)
meanY=mean(Y);                       % mean centering Y and q
meanq=mean(q);
Y_mc=Y-repmat(meanY,size(Y,1),1);
q_mc=q-meanq;
norm_coef=1./std(Y_mc);              % normalisation of Y_mc
Y_mc_n=Y_mc*diag(norm_coef);
% Main_PLS ... continued (calibration)
ne=10;                               % no of factors for calibration
[q_PLS,v_prog]=PLS_calibration(Y_mc_n,q_mc,ne);  % calibration
q_PLS=q_PLS+meanq;                   % undo mean centering in q_PLS
plot(q,q_PLS,'.')
xlabel('q');ylabel('q_P_L_S')
subplot(2,1,1);plot(lam,meanY);axis tight;
subplot(2,1,2);plot(lam,v_prog);axis tight;
xlabel('wavelength')
% Main_PLS ... continued (prediction)
q_s_pred=PCR_PLS_pred(Y_s,meanY,meanq,norm_coef,v_prog);
plot(q_s_k,q_s_pred,'.');
xlabel('q_{s,known}');ylabel('q_{s,pred}');
axis([7.5 10 7.5 10])
% Main_PLS ... continued (cross validation)
ne_max=40;
[q_s_cross,PRESS]=PLS_cross(Y,q,ne_max);
plot(1:ne_max,PRESS);xlabel('factors');ylabel('PRESS');
plot(q,q_s_cross(:,ne),'.');
xlabel('q');ylabel('q_{s,cross}');
axis([7.5 10 7.5 10])
PLS calibration
The PLS equations can be written in analogy to equation (5.72):

    Y = TP   and   q = Tb                                   (5.73)
In the original PLS, the matrices T (scores), P (loadings) and the vector b are computed sequentially in the following way:
(a) Take q as a first estimate for the first basis vector t_{:,1}.
(b) q is assumed to be a basis for Y. Hence, we can approximate Y as Y=q w_{1,:} and calculate the best w_{1,:} (loading weights) in a linear least-squares fit

    w_{1,:} = q\Y                                           (5.74)

In the standard PLS algorithm, w_{1,:} is normalised to unit length; subsequently, t_{:,1} is calculated as

    t_{:,1} = Y/w_{1,:}                                     (5.75)
This can be interpreted as one ALS iteration. If the iterations were continued, this process would converge to the first eigenvector u_{:,1}. The PLS compromise is to stop at one iteration.
(c) This t_{:,1} is the first basis vector; it is the PLS analogue to the first eigenvector u_{:,1} in PCR.
(d) Both Y and q are projected onto t_{:,1} and the residuals calculated. Importantly, the residuals are orthogonal to t_{:,1}.

    r_q = q − t_{:,1}b₁
    R_y = Y − t_{:,1}p_{1,:}                                (5.76)

where b₁ and p_{1,:} are computed in linear least-squares fits

    b₁ = t_{:,1}\q
    p_{1,:} = t_{:,1}\Y                                     (5.77)
(e) The remaining basis vectors t:,2:ne are computed in an adaptation of the NIPALS algorithm: the residual vector rq replaces q and serves as the estimate for the next basis vector t:,2; Ry replaces Y; and the computation continues at (a).

The cycle (a)-(d) is repeated ne times; the optimal number ne is determined by cross-validation. The vectors t:,k, wk,:, pk,: and the scalars bk (k=1…ne) are collected in the matrices T, W, P and the vector b. The function PLS_calibration.m is the PLS equivalent of PCR_calibration.m. The iterative loop implements equations (5.73)-(5.77). The prognostic vector vprog is introduced in the next section.

MatlabFile 5-61. PLS_calibration.m
function [q_PLS,v_prog]=PLS_calibration(Y_mc_n,q_mc,ne)

rq=q_mc;
Ry=Y_mc_n;
for k=1:ne
   W(k,:)=rq\Y_mc_n;                    % loading weights, eq. (5.74)
   W(k,:)=W(k,:)/norm(W(k,:));          % normalise to unit length
   T(:,k)=Ry/W(k,:);                    % score vector, eq. (5.75)
   P(k,:)=T(:,k)\Ry;                    % loadings, eq. (5.77)
   b(k,1)=T(:,k)\rq;
   Ry=Ry-T(:,k)*P(k,:);                 % residuals, eq. (5.76)
   rq=rq-T(:,k)*b(k,1);
end
q_PLS=T*b;
v_prog=W'*((P*W')\b);                   % prognostic vector, eq. (5.78)
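The convergence remark made after equation (5.75) is easy to verify numerically. A minimal sketch, assuming nothing beyond core Matlab (the matrix Y and the starting vector here are arbitrary random data; PLS itself would start from q):

% continuing the w/t updates of eqs. (5.74)/(5.75) is a power iteration on
% Y*Y'; the score vector t converges to the first left singular vector U(:,1)
Y=randn(20,50);                         % any data matrix
t=randn(20,1);                          % arbitrary starting vector
for it=1:50
   w=t\Y;                               % eq. (5.74)
   w=w/norm(w);
   t=Y/w;                               % eq. (5.75)
end
[U,S,V]=svd(Y,0);
acosd(abs(U(:,1)'*t)/norm(t))           % angle between t and U(:,1), ~0 degrees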
PLS Prediction / Cross Validation
PLS prediction can be performed in analogy to PCR with the function PCR_PLS_pred.m introduced earlier (p.302). For this, we need to determine the prognostic vector, vprog. Using the results of the PLS calibration, it can be computed as:
v_prog = W^t (P W^t)^{-1} b        (5.78)
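For the calibration data themselves, the prognostic vector should reproduce the fitted qualities, Y_mc_n v_prog = T b, which is why a single vector suffices for prediction. A minimal sketch of this consistency check, assuming the matrices W, P, T and the vector b from the loop of PLS_calibration.m are still in the workspace:

% numerical check: prediction via the prognostic vector reproduces T*b
v_prog=W'*((P*W')\b);                   % eq. (5.78)
norm(Y_mc_n*v_prog-T*b)                 % expected ~0, rounding errors only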
During cross-validation, it is most convenient to compute this prognostic vector as a function of the number of factors. This results in a collection of prognostic vectors, conveniently stored in a matrix V_prog. After determination of the optimal number of factors, the appropriate column can be selected as the prognostic vector. The predicted quality qs for an unknown sample with spectrum ys is then calculated as:

q_s = y_s v_prog        (5.79)
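In practice, the raw spectrum must first receive the same pre-treatment as the calibration data. A minimal sketch, mirroring PCR_PLS_pred.m and assuming the calibration quantities meanY, meanq, norm_coef and v_prog are at hand:

% predict the quality of one new, raw spectrum y_s
y_s_mc=(y_s-meanY).*norm_coef;          % mean-centre and normalise as in calibration
q_s=y_s_mc*v_prog+meanq;                % eq. (5.79), back on the original scale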
For PCR and PLS, the principle of cross-validation is identical; we refer to Cross Validation (p.303). The following function PLS_cross.m differs from PCR_cross.m only in the few lines performing the calibration and calculating the prognostic vectors.

MatlabFile 5-62. PLS_cross.m
function [q_s_cross,PRESS]=PLS_cross(Y,q,ne_max)

ns=length(q);
for i=1:ns
   i                                    % display current sample index
   Y_cal=Y(i~=1:ns,:);                  % elim. i-th row of Y for new calib. set
   q_cal=q(i~=1:ns,:);                  % elim. i-th elem. of q for new qual. set
   y_s=Y(i,:);                          % extract i-th row of Y as new sample

   meanY_cal=mean(Y_cal);               % mean centring Y and q
   meanq_cal=mean(q_cal);
   Y_cal_mc=Y_cal-repmat(meanY_cal,ns-1,1);
   q_cal_mc=q_cal-meanq_cal;
   y_s_mc=y_s-meanY_cal;

   norm_coef=1./std(Y_cal_mc);          % normalising Y
   Y_cal_mc_n=Y_cal_mc*diag(norm_coef);
   y_s_mc=y_s_mc.*norm_coef;

   rq=q_cal_mc;                         % PLS calibration
   Ry=Y_cal_mc_n;
   for k=1:ne_max
      W(k,:)=rq\Y_cal_mc_n;
      W(k,:)=W(k,:)/norm(W(k,:));
      T(:,k)=Ry/W(k,:);
      P(k,:)=T(:,k)\Ry;
      b(k,1)=T(:,k)\rq;
      Ry=Ry-T(:,k)*P(k,:);
      rq=rq-T(:,k)*b(k,1);
      V_prog(:,k)=W(1:k,:)'*inv(P(1:k,:)*W(1:k,:)')*b(1:k,1);  % eq. (5.78) for k factors
   end

   q_s_cross(i,:)=y_s_mc*V_prog;        % prediction
   q_s_cross(i,:)=q_s_cross(i,:)+meanq_cal;   % undo mean centring
end

PRESS=sum((q_s_cross-repmat(q,1,ne_max)).^2);
5.6.3 Comparing PCR and PLS

Figure 5-67 displays the results of the cross-validation computations on the corn data with PCR and PLS. The graph is fairly typical: PLS is consistently better at small numbers of factors, and the predictions are very similar at the optimal number of factors ne, which is 10 for PLS and 12 for PCR. Experience has shown that it is 'dangerous' to use an excessive number of factors (over-fitting) for the prediction of new unknown samples. This is why we selected ne=12 rather than 23 for PCR.
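A simple way to automate such a conservative choice is to accept the smallest number of factors whose PRESS comes close to the global minimum. A minimal sketch, assuming the PRESS vector returned by PLS_cross.m (the 5% tolerance is an arbitrary illustrative choice, not a rule from this chapter):

% pick the smallest ne whose PRESS is within 5% of the minimum
PRESS_min=min(PRESS);
ne=find(PRESS<=1.05*PRESS_min,1)        % first 'almost optimal' number of factors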
[Figure: PRESS versus the number of factors (1-40) for PCR and PLS]
Figure 5-67. Comparison of the cross-validation results for PCR and PLS.

In view of the similarity of the PRESS results for PCR and PLS, it is not surprising that the predicted qualities are very similar for the two methods if the optimal number of factors is used. Figure 5-68 summarises the comparison.
[Figure, three panels comparing PCR and PLS: top, q_s,cross versus q; middle, the prognostic vectors v_prog versus wavelength (1200-2400 nm); bottom, q_s,pred versus q_s,known]
Figure 5-68. Comparison of predictions and prognostic vectors for PCR and PLS.
The top panel compares the cross-validation predictions at the optimal numbers of factors, 12 for PCR and 10 for PLS. The middle panel shows the similarity of the two prognostic vectors. The bottom panel compares the predictions for the 10 test samples that were removed from the total data set prior to cross-validation. If a conclusion can be drawn, it is that PCR and PLS are virtually indistinguishable in their outcomes; PLS, however, appears to reach optimal prediction with fewer factors than PCR.
List of Matlab Files

MatlabFile 3-1. GAS_LAWS.M (p.29)
MatlabFile 3-2. GAS_LAWS.M …continued (p.30)
MatlabFile 3-3. BEER_LAMBERT.M (p.34)
MatlabFile 3-4. GAUSS.M (p.37)
MatlabFile 3-5. GAUSS_CURVE.M (p.37)
MatlabFile 3-6. GAUSS_CURVE2.M (p.38)
MatlabFile 3-7. GAUSS_SK.M (p.39)
MatlabFile 3-8. GAUSS_SKEWED.M (p.39)
MatlabFile 3-9. NEWTONRAPHSON.M (p.53)
MatlabFile 3-10. EQ1.M (p.57)
MatlabFile 3-11. EQ2.M (p.59)
MatlabFile 3-12. EDTA.M (p.66)
MatlabFile 3-13. EGG_CARTON.M (p.72)
MatlabFile 3-14. EGG_CARTON.M …continued (p.72)
MatlabFile 3-15. EGG_CARTON.M …continued (p.73)
MatlabFile 3-16. TWO_EQUATIONS.M (p.74)
MatlabFile 3-17. NONLINEQ.M (p.75)
MatlabFile 3-18. MAIN_NONLINEQ.M (p.75)
MatlabFile 3-19. ATOB.M (p.79)
MatlabFile 3-20. ODE_AUTOCAT.M (p.87)
MatlabFile 3-21. AUTOCAT.M (p.88)
MatlabFile 3-22. ODE_ZERO_ORDER.M (p.89)
MatlabFile 3-23. ZERO_ORDER.M (p.90)
MatlabFile 3-24. ZERO_ORDER.M …continued (p.91)
MatlabFile 3-25. ODE_LOTKA_VOLTERRA.M (p.93)
MatlabFile 3-26. LOTKA_VOLTERRA.M (p.93)
MatlabFile 3-27. LOTKA_VOLTERRA.M …continued (p.94)
MatlabFile 3-28. ODE_BZ.M (p.96)
MatlabFile 3-29. BZ.M (p.96)
MatlabFile 3-30. ODE_LORENZ.M (p.98)
MatlabFile 3-31. LORENZ.M (p.98)
MatlabFile 3-32. LORENZ.M …continued (p.99)
MatlabFile 4-1. DATA_MXB.M (p.104)
MatlabFile 4-2. MAIN_MXB.M (p.104)
MatlabFile 4-3. MAIN_MXB.M …continued (p.104)
MatlabFile 4-4. DATA_DECAY.M (p.106)
MatlabFile 4-5. MAIN_DECAY_2D.M (p.106)
MatlabFile 4-6. MAIN_DECAY_SSQ.M (p.107)
MatlabFile 4-7. TAN_POLY.M (p.124)
MatlabFile 4-8. MAIN_DECAY.M (p.127)
MatlabFile 4-9. DATA_DECAY_OFFSET.M (p.129)
MatlabFile 4-10. MAIN_DECAY_OFFSET.M (p.129)
MatlabFile 4-11. MAIN_SAVGOL.M (p.132)
MatlabFile 4-12. SAVGOL_BAD.M (p.133)
MatlabFile 4-13. SAVGOL.M (p.134)
MatlabFile 4-14. MAIN_SAVGOL_DERIV.M (p.136)
MatlabFile 4-15. SAVGOL_DERIV.M (p.137)
MatlabFile 4-16. MAIN_LOLIPOP.M (p.138)
MatlabFile 4-17. LOLIPOP.M (p.139)
MatlabFile 4-18. DATA_ABC.M (p.143)
MatlabFile 4-19. MAIN_ABC_3D (p.143)
MatlabFile 4-20. MAIN_ABC_LIN1.M (p.144)
MatlabFile 4-21. MAIN_ABC_LIN2.M (p.145)
MatlabFile 4-22. DATA_EXP.M (p.150)
MatlabFile 4-23. MAIN_EXP_2D.M (p.150)
MatlabFile 4-24. MAIN_NG1.M (p.151)
MatlabFile 4-25. MAIN_NG2.M (p.154)
MatlabFile 4-26. DATA_CHROM.M (p.158)
MatlabFile 4-27. MAIN_CHROM.M (p.158)
MatlabFile 4-28. NGLM.M (p.159)
MatlabFile 4-29. RCALC_CHROM.M (p.160)
MatlabFile 4-30. MAIN_CHROM2.M (p.161)
MatlabFile 4-31. MAIN_ABC.M (p.165)
MatlabFile 4-32. NGLM2.M (p.166)
MatlabFile 4-33. RCALC_ABC.M (p.167)
MatlabFile 4-34. RCALC_ABC2.M (p.168)
MatlabFile 4-35. DATA_EQAH2.M (p.170)
MatlabFile 4-36. MAIN_EQAH2.M (p.172)
MatlabFile 4-37. GET_PAR.M (p.173)
MatlabFile 4-38. PUT_PAR.M (p.173)
MatlabFile 4-39. NGLM3.M (p.173)
MatlabFile 4-40. RCALC_EQAH2.M (p.174)
MatlabFile 4-41. DATA_EQFIX.M (p.178)
MatlabFile 4-42. RCALC_EQFIX.M (p.178)
MatlabFile 4-43. MAIN_EQFIX.M (p.179)
MatlabFile 4-44. MAIN_ABC_RED.M (p.182)
MatlabFile 4-45. ODEAPB_C_REV.M (p.185)
MatlabFile 4-46. DATA_GLOB.M (p.185)
MatlabFile 4-47. RCALC_GLOB.M (p.186)
MatlabFile 4-48. MAIN_GLOB.M (p.187)
MatlabFile 4-49. DATA_EMISSION.M (p.191)
MatlabFile 4-50. MAIN_EMISSION_LIN.M (p.192)
MatlabFile 4-51. MAIN_EMISSION_LIN.M …continued (p.194)
MatlabFile 4-52. MAIN_EMISSION_WEIGHTED.M (p.195)
MatlabFile 4-53. RCALC_EMISSION_WEIGHTED.M (p.195)
MatlabFile 4-54. MAIN_DECAY_SIMPLEX.M (p.205)
MatlabFile 4-55. SSQCALC_DECAY.M (p.206)
MatlabFile 4-56. MAIN_ABC_SIMPLEX.M (p.207)
MatlabFile 4-57. SSQCALC_ABC.M (p.207)
MatlabFile 5-1. MAIN_SVD1.M (p.216)
MatlabFile 5-2. DATA_CHROM2.M (p.219)
MatlabFile 5-3. MAIN_SVD2.M (p.219)
MatlabFile 5-4. MAIN_SVD2.M …continued (p.220)
MatlabFile 5-5. MAIN_SVD2.M …continued (p.221)
MatlabFile 5-6. MAIN_SVD2.M …continued (p.222)
MatlabFile 5-7. MAIN_SVD2.M …continued (p.224)
MatlabFile 5-8. DATA_AB.M (p.224)
MatlabFile 5-9. MAIN_PLOT_AB.M (p.225)
MatlabFile 5-10. LAWTONSYLVESTRE.M (p.234)
MatlabFile 5-11. DATA_EQAH2A.M (p.236)
MatlabFile 5-12. MAIN_EQAH2A.M (p.236)
MatlabFile 5-13. MAIN_EV_SPACE.M (p.237)
MatlabFile 5-14. MAIN_EV_SPACE.M …continued (p.238)
MatlabFile 5-15. MAIN_MEANCENTER.M (p.240)
MatlabFile 5-16. MAIN_HELPP.M (p.241)
MatlabFile 5-17. MAIN_HELPP.M …continued (p.242)
MatlabFile 5-18. MAIN_NOISERED1.M (p.243)
MatlabFile 5-19. DATA_AB2.M (p.244)
MatlabFile 5-20. MAIN_NOISERED2.M (p.245)
MatlabFile 5-21. MAIN_TFA.M (p.248)
MatlabFile 5-22. DATA_CHROM2A.M (p.251)
MatlabFile 5-23. MAIN_ITTFA.M (p.252)
MatlabFile 5-24. MAIN_SYM_ABC.M (p.254)
MatlabFile 5-25. MAIN_SYM_ABC_REV.M (p.255)
MatlabFile 5-26. MAIN_SYM_ABC_REV.M …continued (p.255)
MatlabFile 5-27. DATA_ABC2.M (p.256)
MatlabFile 5-28. MAIN_TTF.M (p.256)
MatlabFile 5-29. MAIN_EFA1.M (p.261)
MatlabFile 5-30. MAIN_EFA1.M …continued (p.263)
MatlabFile 5-31. MAIN_EFA1.M …continued (p.264)
MatlabFile 5-32. MAIN_EFA2.M (p.265)
MatlabFile 5-33. EFA.M (p.266)
MatlabFile 5-34. MAIN_EFA3.M (p.267)
MatlabFile 5-35. DATA_EQAH4A.M (p.268)
MatlabFile 5-36. MAIN_FSW_EFA.M (p.269)
MatlabFile 5-37. MAIN_IT_EFA.M (p.272)
MatlabFile 5-38. NORM_MAX.M (p.275)
MatlabFile 5-39. MAIN_NON_IT_EFA.M (p.279)
MatlabFile 5-40. MAIN_ALS.M (p.282)
MatlabFile 5-41. CONSTRAINTS_POSITIVECA.M (p.284)
MatlabFile 5-42. CONSTRAINTS_LSQNONNEG.M (p.284)
MatlabFile 5-43. CONSTRAINTS_NONNEG.M (p.285)
MatlabFile 5-44. CONSTRAINTS_NONNEG_UNIMOD.M (p.286)
MatlabFile 5-45. CONSTRAINTS_NONNEG_WINDOW.M (p.287)
MatlabFile 5-46. CONSTRAINTS_NONNEG_KNOWN_SPEC.M (p.289)
MatlabFile 5-47. RCALC_RFA.M (p.292)
MatlabFile 5-48. BUILD_PAR_STR.M (p.292)
MatlabFile 5-49. MAIN_RFA.M (p.293)
MatlabFile 5-50. MAIN_PCR.M (p.296)
MatlabFile 5-51. MAIN_PCR.M …continued (p.297)
MatlabFile 5-52. PCR_CALIBRATION.M (p.299)
MatlabFile 5-53. MAIN_PCR.M …continued (p.300)
MatlabFile 5-54. MAIN_PCR.M …continued (p.301)
MatlabFile 5-55. PCR_PLS_PRED.M (p.302)
MatlabFile 5-56. MAIN_PCR.M …continued (p.303)
MatlabFile 5-57. PCR_CROSS.M (p.304)
MatlabFile 5-58. MAIN_PCR.M …continued (p.305)
MatlabFile 5-59. MAIN_PCR.M …continued (p.305)
MatlabFile 5-60. MAIN_PLS.M (p.307)
MatlabFile 5-61. PLS_CALIBRATION.M (p.309)
MatlabFile 5-62. PLS_CROSS.M (p.310)
List of Excel Sheets

ExcelSheet 3-1. CHAPTER2.XLS-FESCN (p.42)
ExcelSheet 3-2. CHAPTER2.XLS-EQML2 (p.61)
ExcelSheet 3-3. CHAPTER2.XLS-CASO4 (p.63)
ExcelSheet 3-4. CHAPTER2.XLS-H3PO4 (p.67)
ExcelSheet 3-5. CHAPTER2.XLS-EQSYS (p.75)
ExcelSheet 3-6. CHAPTER2.XLS-RUNGEKUTTA (p.83)
ExcelSheet 4-1. CHAPTER3.XLS-TRENDLINE (p.111)
ExcelSheet 4-2. CHAPTER3.XLS-LINEST (p.126)
ExcelSheet 4-3. CHAPTER3.XLS-PSEUDOINVERSE (p.147)
ExcelSheet 4-4. CHAPTER3.XLS-CHROM (p.208)
ExcelSheet 4-5. CHAPTER3.XLS-KINETICS (p.210)
ExcelSheet 4-6. CHAPTER3.XLS-EMISSION (p.212)