Linear and Nonlinear Models: Fixed Effects, Random Effects, and Mixed Models
Erik W. Grafarend
Walter de Gruyter · Berlin · New York
Author: Erik W. Grafarend, em. Prof. Dr.-Ing. habil. Dr. tech. h.c. mult. Dr.-Ing. E.h. mult., Geodätisches Institut, Universität Stuttgart, Geschwister-Scholl-Str. 24/D, 70174 Stuttgart, Germany. E-Mail: [email protected]
Printed on acid-free paper which falls within the guidelines of the ANSI to ensure permanence and durability.
Library of Congress Cataloging-in-Publication Data
Grafarend, Erik W. Linear and nonlinear models : fixed effects, random effects, and mixed models / by Erik W. Grafarend. p. cm. Includes bibliographical references and index. ISBN-13: 978-3-11-016216-5 (hardcover : acid-free paper), ISBN-10: 3-11-016216-4 (hardcover : acid-free paper). 1. Regression analysis. 2. Mathematical models. I. Title. QA278.2.G726 2006 519.5136-dc22 2005037386
Bibliographic information published by Die Deutsche Bibliothek. Die Deutsche Bibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data is available on the Internet at <http://dnb.ddb.de>.
ISBN-13: 978-3-11-016216-5, ISBN-10: 3-11-016216-4
© Copyright 2006 by Walter de Gruyter GmbH & Co. KG, 10785 Berlin. All rights reserved, including those of translation into foreign languages. No part of this book may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopy, recording, or any information storage and retrieval system, without permission in writing from the publisher. Printed in Germany. Cover design: Rudolf Hübler, Berlin. Typeset using the author's word files: M. Pfizenmaier, Berlin. Printing and binding: Hubert & Co. GmbH & Co. KG, Göttingen.
Preface

"All exact science is dominated by the idea of approximation." B. Russell

"You must always invert." C. G. J. Jacobi

"Well, Mr. Jacobi; here it is: all the generalized inversion of two generations of inventors who knowingly or unknowingly subscribed and extended your dictum. Please, forgive us if we have over-inverted, or if we have not always inverted in the natural and sensible way. Some of us have inverted with labor and pain by using hints from a dean or a tenure and promotion committee that "you better invert more, or else you would be inverted."" M. Z. Nashed, L. B. Rall

There is a certain intention in reviewing linear and nonlinear models from the point of view of fixed effects, random effects and mixed models. First, we want to portray the different models from the algebraic point of view – for instance a minimum norm, least squares solution (MINOLESS) – versus the stochastic point of view – for instance a minimum bias, minimum variance "best" solution (BLIMBE). We are especially interested in the question under which assumption the algebraic solution coincides with the stochastic solution, for instance when MINOLESS is identical to BLIMBE. The stochastic approach is richer with respect to modeling. Beside the first order moments, the expectation of a random variable, we also need a design for the central second order moments, the variance-covariance matrix of the random variable, as long as we deal with second order statistics. Second, we therefore set up a unified approach to estimate (predict) the first order moments, for instance by BLUUE (BLUUP), and the central second order moments, for instance by BIQUUE, if they exist. In short, BLUUE (BLUUP) stands for "Best Linear Uniformly Unbiased Estimation" (Prediction) and BIQUUE alternatively for "Best Invariant Quadratic Uniformly Unbiased Estimation". A third criterion is the decision whether the observation vector is inconsistent or random, whether the unknown parameter vector is random or not, whether the "first design matrix" within a linear model is random or not and finally whether the "mixed model" E{y} = Aξ + CE{z} + E{X}γ has to be applied if we restrict ourselves to linear models. How to handle a nonlinear model where we have a priori information about approximate values will be outlined in detail. As a special case we also deal with "condition equations with unknowns" BE{y} − c = Aξ where the matrices/vector {A, B, c} are given and the observation vector y is again a random variable.
A fourth problem is related to the question of what is happening when we take observations not over ℝ^n (real line, n-dimensional linear space) but over S^n (circle S^1, sphere S^2, …, hypersphere S^n), over E^n (ellipse E^1, ellipsoid E^2, …, hyperellipsoid E^n), in short over a curved manifold. We show in particular that the circular variables are elements of a von Mises distribution or that the spherical variables are elements of a von Mises-Fisher distribution. A more detailed discussion is in front of you.

The first problem of algebraic regression in Chapter one is constituted by a consistent system of linear observational equations of type underdetermined system of linear equations. So we may say "more unknowns than equations". We solve the corresponding system of linear equations by an optimization problem which we call the minimum norm solution (MINOS). We discuss the semi-norm solution of Special Relativity and General Relativity and alternative norms of type ℓ_p. For "MINOS" we identify the typical generalized inverse and the eigenvalue decomposition for G_x-MINOS. For our Front Page Example we compute canonical MINOS. Special examples are Fourier series and Fourier-Legendre series, namely circular harmonic and spherical harmonic regression. Special nonlinear models include Taylor polynomials and generalized Newton iteration, for the case of a planar triangular network as an example whose nodal points are a priori coordinated. The representation of the proper objective function of type "MINOS" is finally given for a defective network (P-diagram, E-diagram). The transformation groups for observed coordinate differences (translation groups T(2), T(3), …, T(n)), for observed distances (groups of motion T(2) ⋊ SO(2), T(3) ⋊ SO(3), …, T(n) ⋊ SO(n)), for observed angles or distance ratios (conformal groups C₄(2), C₇(3), …, C_{(n+1)(n+2)/2}(n)) and for observed cross-ratios of area elements in the projective plane (projective group) are reviewed with their datum parameters.

Alternatively, the first problem of probabilistic regression – the special Gauss-Markov model with datum defect – namely the setup of the linear uniformly minimum bias estimator of type LUMBE for fixed effects is introduced in Chapter two. We define the first moment equations Aξ = E{y} and the second central moment equations Σ_y = D{y} and estimate the fixed effects by the homogeneous linear setup ξ̂ = Λy of type S-LUMBE by the additional postulate of minimum bias ‖B‖²_S = ‖I_m − ΛA‖²_S where B := I_m − ΛA is the bias matrix. When are G_x-MINOS and S-LUMBE equivalent? The necessary and sufficient condition is G_x = S⁻¹ or G_x⁻¹ = S, a key result. We give at the end an extensive example.

The second problem of algebraic regression in Chapter three treats an inconsistent system of linear observational equations of type overdetermined system of linear equations. Or we may say "more observations than unknowns". We solve the corresponding system of linear equations by an optimization problem which we call the least squares solution (LESS). We discuss the signature of the observation space when dealing with Special Relativity and alternative norms of type ℓ_p, namely ℓ₂, …, ℓ_p, …, ℓ_∞. For extensive applications we discuss various objective functions like (i) optimal choice of the weight matrix G_y: second order design SOD, (ii) optimal choice of the weight matrix G_y by means of condition equations, and (iii) robustifying objective functions.
In all detail we introduce the second order design SOD by an optimal choice of a criterion matrix of weights.
What is the proper choice of an ideal weight matrix G_x? Here we propose the Taylor-Karman matrix borrowed from the Theory of Turbulence which generates a homogeneous and isotropic weight matrix G_x (ideal). Based upon the fundamental work of G. Kampmann, R. Jurisch and B. Krause we robustify G_y-LESS and identify outliers. In particular we identify Grassmann-Plücker coordinates which span the normal space R(A)^⊥. We pay a short tribute to Fuzzy Sets. In some detail we identify G_y-LESS and its generalized inverse. Canonical LESS is based on the eigenvalue decomposition of G_y-LESS, illustrated by an extensive example. As a case study we pay attention to partial redundancies, latent conditions, high leverage points versus break points, direct and inverse Grassmann coordinates, Plücker coordinates, the "hat" matrix, right eigenspace analysis, multilinear algebra, "join" and "meet", the Hodge star operator, dual Grassmann coordinates, dual Plücker coordinates and the Gauss-Jacobi Combinatorial Algorithm, concluding with a historical note on C. F. Gauss, A.-M. Legendre and the invention of Least Squares and its generalization.

Alternatively, the second problem of probabilistic regression in Chapter four – the special Gauss-Markov model without datum defect – namely the setup of the best linear uniformly unbiased estimator for the first order moments of type BLUUE and of the best invariant quadratic uniformly unbiased estimator for the central second order moments of type BIQUUE for random observations is introduced. First, we define ξ̂ of type Σ_y-BLUUE via two lemmas and a theorem. Second, we alternatively set up, by four definitions and by six corollaries, five lemmas and two theorems, IQE ("invariant quadratic estimation") and best IQUUE ("best invariant quadratic uniformly unbiased estimator"). Alternative estimators of type MALE ("maximum likelihood") are reviewed. Special attention is paid to the "IQUUE" of variance-covariance components of Helmert type called "HIQUUE" and "MIQE". For the case of one variance component, we are able to find necessary and sufficient conditions when LESS agrees with BLUUE, namely G_y = Σ_y⁻¹ or G_y⁻¹ = Σ_y, a key result.

The third problem of algebraic regression in Chapter five – the inconsistent system of linear observational equations with datum defect: overdetermined-underdetermined system – presents us with three topics. First, by one definition and five lemmas we document the minimum norm, least squares solution ("MINOLESS"). Second, we review the general eigenspace analysis versus the general eigenspace synthesis. Third, special estimators of type "α-hybrid approximation solution" ("α-HAPS") and "Tykhonov-Phillips regularization" round up the alternative estimators.

Alternatively, the third problem of probabilistic regression in Chapter six – the special Gauss-Markov model with datum problem – namely the setup of estimators of type "BLIMBE" and "BLE" for the moments of first order and of type "BIQUUE" and "BIQE" for the central moments of second order, is reviewed. First, we define ξ̂ as homogeneous Σ_y, S-BLUMBE ("Σ_y, S best linear uniformly minimum bias estimator") and compute via two lemmas and three theorems "hom Σ_y, S-BLUMBE", E{y}, D{Aξ̂}, D{ε_y} as well as "σ̂² BIQUUE" and "σ̂² BIQE" of σ². Second, by three definitions and one lemma and three theorems we work on "hom BLE", "hom S-BLE", "hom α-BLE". Extensive examples are given. For the case of one variance component we are able to find
necessary and sufficient conditions when MINOLESS agrees with BLIMBE, namely G_x = S⁻¹, G_y = Σ_y⁻¹ or G_x⁻¹ = S, G_y⁻¹ = Σ_y, a key result.

As a spherical problem of algebraic representation we treat in Chapter seven an incomplete system of directional observational equations, namely an overdetermined system of nonlinear equations on curved manifolds (circle, sphere, hypersphere S^p). We define what we mean by minimum geodesic distance on S^1 and S^2 and present two lemmas on S^1 and two lemmas on S^2 of type minimum geodesic distances. In particular, we take reference to the von Mises distribution on the circle, to the Fisher spherical distribution on the sphere and, in general, to the Langevin distribution on the sphere S^p ⊂ ℝ^{p+1}. The minimal geodesic distance ("MINGEODISC") is computed for Λ_g and (Λ_g, Φ_g). We solve the corresponding nonlinear normal equations. In conclusion, we present a historical note on the von Mises distribution and generalize to the two-dimensional generalized Fisher distribution by an oblique map projection. At the end, we summarize the notion of angular metric and give an extensive case study.

The fourth problem of probabilistic regression in Chapter eight as a special Gauss-Markov model with random effects is described as "BLIP" and "VIP" for the moments of first order. Definitions are given for hom BLIP ("homogeneous best linear Mean Square Predictor"), S-hom BLIP ("homogeneous linear minimum S-modified Mean Square Predictor") and hom α-VIP ("homogeneous linear minimum variance-minimum bias in the sense of a weighted hybrid norm solution"). One lemma and three theorems collect the results for (i) hom BLIP, (ii) hom S-BLIP and (iii) hom α-VIP. In all detail, we compute the predicted solution for the random effects, its bias vector and the Mean Square Prediction Error MSPE. Three cases of nonlinear error propagation with random effects are discussed.

In Chapter nine we specialize towards the fifth problem of algebraic regression, namely the system of conditional equations of homogeneous and inhomogeneous type. We follow two definitions, one theorem and three lemmas of type G_y-LESS before we present an example from angular observations.

As Chapter ten we treat the fifth problem of probabilistic regression, the Gauss-Markov model with mixed effects, in setting up BLUUE estimators for the moments of first order, a special case of Kolmogorov-Wiener prediction. After defining Σ_y-BLUUE of ξ and E{z} where z is a random variable, we present two lemmas and one theorem how to construct estimators ξ̂, Ê{z} on the basis of Σ_y-BLUUE of ξ and E{z}. By a separate theorem we fix a homogeneous quadratic setup of the variance component σ̂² within the first model of fixed effects and random effects superimposed. As an example we present "collocation", enriched by a set of comments about A. N. Kolmogorov – N. Wiener prediction, the so-called "yellow devil".

Chapter eleven leads us to the "sixth problem of probabilistic regression", the celebrated random effect model "errors-in-variables". We outline the model and sum up the theory of normal equations. Our example is the linear equation E{y} = E{X}γ where the first order design matrix is random. An alternative name is "straight line fit by total least squares". Finally we give a detailed example and a literature list.
C. F. Gauss and F. R. Helmert introduced the sixth problem of generalized algebraic regression, the system of conditional equations with unknowns, which we proudly present in Chapter twelve. First, we define W-LESS of the model Ax + Bi = By where i is an inconsistency parameter. In two lemmas we solve its normal equations and discuss the condition on the matrix [A, B]. Two alternative solutions based on R, W-MINOLESS (two lemmas, one definition) and R, W-HAPS (one lemma, one definition) are given separately. An example is reviewed as a height network. For shifted models of type Ax + Bi = By − c similar results are summarized.

For the special nonlinear problem of the 3d datum transformation in Chapter thirteen we review the famous Procrustes Algorithm. With the algorithm we consider the coupled unknowns of type dilatation, also called scale factor, translation and rotation for random variables of 3d coordinates in an "old system" and in a "new system". After the definition of the conformal group C₇(3) in a three-dimensional network with 7 unknown parameters we present four corollaries and one theorem: first, we reduce the translation parameters, second the scale parameters and, last not least, third the rotation parameters, bound together in a theorem. A special result is the computation of the variance-covariance matrix of the error matrix E := Y₁ − x₁Y₂X₃′ − 1x₂′ as a function of Σ_{vec Y₁′}, Σ_{vec Y₂′} and the factor Iₙ ⊗ x₁X₃. A detailed example of type I-LESS is given, including a discussion about ‖E_l‖ and |||E_l||| precisely defined. Here we conclude with a reference list.
Chapter fourteen as our sixth problem of type generalized algebraic regression "revisited" deals with "The Grand Linear Model", namely the split level of conditional equations with unknowns (general Gauss-Helmert model). The linear model consists of three components: (i) B₁i = B₁y − c₁, (ii) A₂x + B₂i = B₂y − c₂, c₂ ∈ R(B₂), and (iii) A₃x − c₃ = 0 or A₃x + c₃ = 0, c₃ ∈ R(A₃). The first equation contains only conditions on the observation vector. In contrast, the second equation balances both condition equations between the unknown parameters in the form of A₂x and the conditions on B₂y. Finally, the third equation is a condition exclusively on the unknown parameter vector. For our model Lemma 14.1 presents the W-LESS solution, Lemma 14.2 the R, W-MINOLESS solution and Lemma 14.3 the R, W-HAPS solution. As an example we treat a planar triangle whose coordinates are determined from three measured distances under a datum condition.

Chapter fifteen is concerned with three topics. First, we generalize the univariate Gauss-Markov model to the multivariate Gauss-Markov model with and without constraints. We present two definitions, one lemma about multivariate LESS, one lemma about its counterpart of type multivariate Gauss-Markov modeling and one theorem of type multivariate Gauss-Markov modeling with constraints. Second, by means of a MINOLESS solution we present the celebrated "n-way classification model" to answer the question of how to compute a basis of unbiased estimable quantities. Third, we take into account the fact that in addition to observational models we also have dynamical system equations. In some detail, we review the Kalman Filter (Kalman-Bucy Filter) and models of type ARMA and ARIMA. We illustrate the notions of "steerability" and "observability" by two examples. The state differential equation as well as the observational equation are simultaneously solved by Laplace transformation. At the end we focus on
the modern theory of dynamic nonlinear models and comment on the theory of chaotic behavior as its up-to-date counterpart.

In the appendices we specialize on specific topics. Appendix A is a review of matrix algebra, namely special matrices, scalar measures and inverse matrices, eigenvalues and eigenvectors and generalized inverses. The counterpart is matrix analysis which we outline in Appendix B. We begin with derivations of scalar-valued and vector-valued vector functions, followed by a chapter on derivations of trace forms and determinantal forms. A specialty is the derivation of a vector/matrix function of a vector/matrix. We learn how to derive the Kronecker-Zehfuß product and matrix-valued symmetric or antisymmetric matrix functions. Finally we show how to compute higher order derivatives. Appendix C is an elegant review of Lagrange multipliers. The lengthy Appendix D introduces sampling distributions and their use: confidence intervals and confidence regions. As peculiar vehicles we show how to transform random variables. A first confidence interval of Gauss-Laplace normally distributed observations is computed for the case μ, σ² known, with the Three Sigma Rule as an example. A second confidence interval for the mean is built on sampling from the Gauss-Laplace normal distribution under the assumption that the variance is known. The alternative sampling from the Gauss-Laplace normal distribution leads to the third confidence interval for the mean, variance unknown, based on the Student sampling distribution. The fourth confidence interval, for the variance, is based on the analogue sampling for the variance based on the χ²-Helmert distribution. For both intervals of confidence, namely based on the Student sampling distribution for the mean, variance unknown, and based on the χ²-Helmert distribution for the sample variance, we compute the corresponding Uncertainty Principle. The case of a multidimensional Gauss-Laplace normal distribution is outlined for the computation of confidence regions for fixed parameters in the linear Gauss-Markov model. Key statistical notions like moments of a probability distribution, the Gauss-Laplace normal distribution (quasi-normal distribution), error propagation as well as the important notions of identifiability and unbiasedness are reviewed. We close with bibliographical indices.

Here we are not solving rank-deficient or ill-posed problems by using UTV or QR factorization techniques. Instead we refer to A. Björck (1996), P. Businger and G. H. Golub (1965), T. F. Chan and P. C. Hansen (1991, 1992), S. Chandrasekaran and I. C. Ipsen (1995), R. D. Fierro (1998), R. D. Fierro and J. R. Bunch (1995), R. D. Fierro and P. C. Hansen (1995, 1997), L. V. Foster (2003), G. Golub and C. F. van Loan (1996), P. C. Hansen (1990 a, b, 1992, 1994, 1995, 1998), Y. Hosada (1999), C. L. Lawson and R. J. Hanson (1974), R. Mathias and G. W. Stewart (1993), A. Neumaier (1998), H. Ren (1996), G. W. Stewart (1992, 1992, 1998), L. N. Trefethen and D. Bau (1997).

My special thanks for numerous discussions go to J. Awange (Kyoto/Japan), A. Bjerhammar (Stockholm/Sweden), F. Brunner (Graz/Austria), J. Cai (Stuttgart/Germany), A. Dermanis (Thessaloniki/Greece), W. Freeden (Kaiserslautern/Germany), R. Jurisch (Dessau/Germany), J. Kakkuri (Helsinki/Finland), G. Kampmann (Dessau/Germany), K. R. Koch (Bonn/Germany), F. Krumm (Stuttgart/Germany), O. Lelgemann (Berlin/Germany), H. Moritz (Graz/Austria), F. Sanso (Milano/Italy), B. Schaffrin (Columbus/Ohio/USA), L. Sjoeberg (Stockholm/Sweden),
N. Sneeuw (Calgary/Canada), L. Svensson (Gävle/Sweden), P. Vanicek (Fredericton/New Brunswick/Canada). For the book production I want to thank in particular J. Cai, F. Krumm, A. Vollmer, M. Paweletz, T. Fuchs, A. Britchi, and D. Wilhelm (all from Stuttgart/Germany). At the end my sincere thanks go to the Walter de Gruyter Publishing Company for including my book into their Geoscience Series, in particular to Dr. Manfred Karbe and Dr. Robert Plato for all support.
Stuttgart, December 2005
Erik W. Grafarend
Contents

1 The first problem of algebraic regression – consistent system of linear observational equations – underdetermined system of linear equations: {Ax = y | A ∈ ℝ^{n×m}, y ∈ R(A), rk A = n, n = dim Y}
1-1 Introduction
1-11 The front page example
1-12 The front page example in matrix algebra
1-13 Minimum norm solution of the front page example by means of horizontal rank partitioning
1-14 The range R(f) and the kernel N(A)
1-15 Interpretation of "MINOS" by three partitionings
1-2 The minimum norm solution: "MINOS"
1-21 A discussion of the metric of the parameter space X
1-22 Alternative choice of the metric of the parameter space X
1-23 G_x-MINOS and its generalized inverse
1-24 Eigenvalue decomposition of G_x-MINOS: canonical MINOS
1-3 Case study: Orthogonal functions, Fourier series versus Fourier-Legendre series, circular harmonic versus spherical harmonic regression
1-31 Fourier series
1-32 Fourier-Legendre series
1-4 Special nonlinear models
1-41 Taylor polynomials, generalized Newton iteration
1-42 Linearized models with datum defect
1-5 Notes

2 The first problem of probabilistic regression – special Gauss-Markov model with datum defect – Setup of the linear uniformly minimum bias estimator of type LUMBE for fixed effects
2-1 Setup of the linear uniformly minimum bias estimator of type LUMBE
2-2 The Equivalence Theorem of G_x-MINOS and S-LUMBE
2-3 Examples

3 The second problem of algebraic regression – inconsistent system of linear observational equations – overdetermined system of linear equations: {Ax + i = y | A ∈ ℝ^{n×m}, y ∉ R(A), rk A = m, m = dim X}
3-1 Introduction
3-11 The front page example
3-12 The front page example in matrix algebra
3-13 Least squares solution of the front page example by means of vertical rank partitioning
3-14 The range R(f) and the kernel N(f), interpretation of the least squares solution by three partitionings
3-2 The least squares solution: "LESS"
3-21 A discussion of the metric of the parameter space X
3-22 Alternative choices of the metric of the observation space Y
3-221 Optimal choice of weight matrix: SOD
3-222 The Taylor-Karman criterion matrix
3-223 Optimal choice of the weight matrix: the space R(A) and R(A)^⊥
3-224 Fuzzy sets
3-23 G_x-LESS and its generalized inverse
3-24 Eigenvalue decomposition of G_y-LESS: canonical LESS
3-3 Case study: Partial redundancies, latent conditions, high leverage points versus break points, direct and inverse Grassmann coordinates, Plücker coordinates
3-31 Canonical analysis of the hat matrix, partial redundancies, high leverage points
3-32 Multilinear algebra, "join" and "meet", the Hodge star operator
3-33 From A to B: latent restrictions, Grassmann coordinates, Plücker coordinates
3-34 From B to A: latent parametric equations, dual Grassmann coordinates, dual Plücker coordinates
3-35 Break points
3-4 Special linear and nonlinear models: A family of means for direct observations
3-5 A historical note on C. F. Gauss, A.-M. Legendre and the invention of Least Squares and its generalization

4 The second problem of probabilistic regression – special Gauss-Markov model without datum defect – Setup of BLUUE for the moments of first order and of BIQUUE for the central moment of second order
4-1 Introduction
4-11 The front page example
4-12 Estimators of type BLUUE and BIQUUE of the front page example
4-13 BLUUE and BIQUUE of the front page example, sample median, median absolute deviation
4-14 Alternative estimation Maximum Likelihood (MALE)
4-2 Setup of the best linear uniformly unbiased estimators of type BLUUE for the moments of first order
4-21 The best linear uniformly unbiased estimation ξ̂ of ξ: Σ_y-BLUUE
4-22 The Equivalence Theorem of G_y-LESS and Σ_y-BLUUE
4-3 Setup of the best invariant quadratic uniformly unbiased estimator of type BIQUUE for the central moments of second order
4-31 Block partitioning of the dispersion matrix and linear space generated by variance-covariance components
4-32 Invariant quadratic estimation of variance-covariance components of type IQE
4-33 Invariant quadratic uniformly unbiased estimations of variance-covariance components of type IQUUE
4-34 Invariant quadratic uniformly unbiased estimations of one variance component (IQUUE) from Σ_y-BLUUE: HIQUUE
4-35 Invariant quadratic uniformly unbiased estimators of variance-covariance components of Helmert type: HIQUUE versus HIQE
4-36 Best quadratic uniformly unbiased estimations of one variance component: BIQUUE

5 The third problem of algebraic regression – inconsistent system of linear observational equations with datum defect – overdetermined-underdetermined system of linear equations: {Ax + i = y | A ∈ ℝ^{n×m}, y ∉ R(A), rk A < min{m, n}}
5-1 Introduction
5-11 The front page example
5-12 The front page example in matrix algebra
5-13 Minimum norm - least squares solution of the front page example by means of additive rank partitioning
5-14 Minimum norm - least squares solution of the front page example by means of multiplicative rank partitioning
5-15 The range R(f) and the kernel N(f), interpretation of "MINOLESS" by three partitionings
5-2 MINOLESS and related solutions like weighted minimum norm - weighted least squares solutions
5-21 The minimum norm-least squares solution: "MINOLESS"
5-22 (G_x, G_y)-MINOS and its generalized inverse
5-23 Eigenvalue decomposition of (G_x, G_y)-MINOLESS
5-24 Notes
5-3 The hybrid approximation solution: α-HAPS and Tykhonov-Phillips regularization

6 The third problem of probabilistic regression – special Gauss-Markov model with datum problem – Setup of BLUMBE and BLE for the moments of first order and of BIQUUE and BIQE for the central moment of second order
6-1 Setup of the best linear minimum bias estimator of type BLUMBE
6-11 Definitions, lemmas and theorems
6-12 The first example: BLUMBE versus BLE, BIQUUE versus BIQE, triangular leveling network
6-121 The first example: I₃, I₃-BLUMBE
6-122 The first example: V, S-BLUMBE
6-123 The first example: I₃, I₃-BLE
6-124 The first example: V, S-BLE
6-2 Setup of the best linear estimators of type hom BLE, hom S-BLE and hom α-BLE for fixed effects

7 A spherical problem of algebraic representation – Inconsistent system of directional observational equations – overdetermined system of nonlinear equations on curved manifolds
7-1 Introduction
7-2 Minimal geodesic distance: MINGEODISC
7-3 Special models: from the circular normal distribution to the oblique normal distribution
7-31 A historical note of the von Mises distribution
7-32 Oblique map projection
7-33 A note on the angular metric
7-4 Case study

8 The fourth problem of probabilistic regression – special Gauss-Markov model with random effects – Setup of BLIP and VIP for the moments of first order
8-1 The random effect model
8-2 Examples

9 The fifth problem of algebraic regression – the system of conditional equations: homogeneous and inhomogeneous equations {By = Bi versus c + By = Bi}
9-1 G_y-LESS of a system of inconsistent homogeneous conditional equations
9-2 Solving a system of inconsistent inhomogeneous conditional equations
9-3 Examples

10 The fifth problem of probabilistic regression – general Gauss-Markov model with mixed effects – Setup of BLUUE for the moments of first order (Kolmogorov-Wiener prediction)
10-1 Inhomogeneous general linear Gauss-Markov model (fixed effects and random effects)
10-2 Explicit representations of errors in the general Gauss-Markov model with mixed effects
10-3 An example for collocation
10-4 Comments

11 The sixth problem of probabilistic regression – the random effect model – "errors-in-variables"
11-1 Solving the nonlinear system of the model "errors-in-variables"
11-2 Example: The straight line fit
11-3 References

12 The sixth problem of generalized algebraic regression – the system of conditional equations with unknowns – (Gauss-Helmert model)
12-1 Solving the system of homogeneous condition equations with unknowns
12-11 W-LESS
12-12 R, W-MINOLESS
12-13 R, W-HAPS
12-14 R, W-MINOLESS against R, W-HAPS
12-2 Examples for the generalized algebraic regression problem: homogeneous conditional equations with unknowns
12-21 The first case: I-LESS
12-22 The second case: I, I-MINOLESS
12-23 The third case: I, I-HAPS
12-24 The fourth case: R, W-MINOLESS, R positive semidefinite, W positive semidefinite
12-3 Solving the system of inhomogeneous condition equations with unknowns
12-31 W-LESS
12-32 R, W-MINOLESS
12-33 R, W-HAPS
12-34 R, W-MINOLESS against R, W-HAPS
12-4 Conditional equations with unknowns: from the algebraic approach to the stochastic one
12-41 Shift to the center
12-42 The condition of unbiased estimators
12-43 The first step: unbiased estimation of ξ̂ and Ê{ξ}
12-44 The second step: unbiased estimation of N₁ and N₂

13 The nonlinear problem of the 3d datum transformation and the Procrustes Algorithm
13-1 The 3d datum transformation and the Procrustes Algorithm
13-2 The variance-covariance matrix of the error matrix E
13-3 Case studies: The 3d datum transformation and the Procrustes Algorithm
13-4 References

14 The seventh problem of generalized algebraic regression revisited: The Grand Linear Model: The split level model of conditional equations with unknowns (general Gauss-Helmert model)
14-1 Solutions of type W-LESS
14-2 Solutions of type R, W-MINOLESS
14-3 Solutions of type R, W-HAPS
14-4 Review of the various models: the sixth problem

15 Special problems of algebraic regression and stochastic estimation: multivariate Gauss-Markov model, the n-way classification model, dynamical systems
15-1 The multivariate Gauss-Markov model – a special problem of probabilistic regression
15-2 n-way classification models
15-21 A first example: 1-way classification
15-22 A second example: 2-way classification without interaction
15-23 A third example: 2-way classification with interaction
15-24 Higher classifications with interaction
15-3 Dynamical Systems

Appendix A: Matrix Algebra
A1 Matrix-Algebra
A2 Special Matrices
A3 Scalar Measures and Inverse Matrices
A4 Vector-valued Matrix Forms
A5 Eigenvalues and Eigenvectors
A6 Generalized Inverses

Appendix B: Matrix Analysis
B1 Derivations of Scalar-valued and Vector-valued Vector Functions
B2 Derivations of Trace Forms
B3 Derivations of Determinantal Forms
B4 Derivations of a Vector/Matrix Function of a Vector/Matrix
B5 Derivations of the Kronecker-Zehfuß product
B6 Matrix-valued Derivatives of Symmetric or Antisymmetric Matrix Functions
B7 Higher order derivatives

Appendix C: Lagrange Multipliers
C1 A first way to solve the problem

Appendix D: Sampling distributions and their use: Confidence Intervals and Confidence Regions
D1 A first vehicle: Transformation of random variables
D2 A second vehicle: Transformation of random variables
D3 A first confidence interval of Gauss-Laplace normally distributed observations: μ, σ² known, the Three Sigma Rule
D31 The forward computation of a first confidence interval of Gauss-Laplace normally distributed observations: μ, σ² known
D32 The backward computation of a first confidence interval of Gauss-Laplace normally distributed observations: μ, σ² known
D4 Sampling from the Gauss-Laplace normal distribution: a second confidence interval for the mean, variance known
D41 Sampling distributions of the sample mean μ̂, σ² known, and of the sample variance σ̂²
D42 The confidence interval for the sample mean, variance known
D5 Sampling from the Gauss-Laplace normal distribution: a third confidence interval for the mean, variance unknown
D51 Student's sampling distribution of the random variable (μ̂ − μ)/σ̂
D52 The confidence interval for the sample mean, variance unknown
D53 The Uncertainty Principle
D6 Sampling from the Gauss-Laplace normal distribution: a fourth confidence interval for the variance
D61 The confidence interval for the variance
D62 The Uncertainty Principle
D7 Sampling from the multidimensional Gauss-Laplace normal distribution: the confidence region for the fixed parameters in the linear Gauss-Markov model

Appendix E: Statistical Notions
E1 Moments of a probability distribution, the Gauss-Laplace normal distribution and the quasi-normal distribution
E2 Error propagation
E3 Useful identities
E4 The notions of identifiability and unbiasedness

Appendix F: Bibliographic Indexes

References

Index
1 The first problem of algebraic regression – consistent system of linear observational equations – underdetermined system of linear equations: {Ax = y | A ∈ ℝ^{n×m}, y ∈ R(A), rk A = n, n = dim Y}

Fast track reading: Read only Lemma 1.3.
"The guideline of chapter one: definitions, lemmas and corollary": Definition 1.1 (x_m, G_x-MINOS of x), Lemma 1.2 (x_m, G_x-MINOS of x), Lemma 1.3 (x_m, G_x-MINOS of x), Lemma 1.4 (characterization of G_x-MINOS), Definition 1.5 (adjoint operator A^#), Lemma 1.6 (adjoint operator A^#), Lemma 1.7 (eigenspace analysis versus eigenspace synthesis), Corollary 1.8 (symmetric pair of eigensystems), Lemma 1.9 (canonical MINOS).
The minimum norm solution of a system of consistent linear equations Ax = y subject to A ∈ ℝ^{n×m}, rk A = n, n < m, is presented by Definition 1.1, Lemma 1.2 and Lemma 1.3. Lemma 1.4 characterizes the solution of the quadratic optimization problem in terms of the (1,2,4)-generalized inverse, in particular the right inverse. The system of consistent nonlinear equations Y = F(X) is solved by means of two examples. Both examples are based on distance measurements in a planar network, namely a planar triangle. In the first example Y = F(X) is linearized at the point x, which is given by prior information, and solved by means of Newton iteration. The minimum norm solution is applied to the consistent system of linear equations Δy = AΔx and interpreted by means of first and second moments of the nodal points. In contrast, the second example aims at solving the consistent system of nonlinear equations Y = F(X) in a closed form. Since distance measurements as Euclidean distance functions are left equivariant under the action of the translation group as well as the rotation group – they are invariant under translation and rotation of the Cartesian coordinate system – at first a TR-basis (translation-rotation basis) is established. Namely the origin and the axes of the coordinate system are fixed. With respect to the TR-basis (a set of free parameters has been fixed) the bounded parameters are analytically fixed. Since no prior information is built in, we prove that two solutions of the consistent system of nonlinear equations Y = F(X) exist. In the chosen TR-basis the solution vector X is not of minimum norm. Accordingly, we apply a datum transformation X ↦ x of type group of motion (decomposed into the translation group and the rotation group). The parameters of the group of motion (2 for translation, 1 for rotation) are determined under the condition of minimum norm of the unknown vector x, namely by means of a special Procrustes algorithm. As soon as the optimal datum parameters are determined we are able to compute the unknown vector x which is of minimum norm. Finally, the Notes are an attempt to explain the origin of the injectivity rank deficiency, namely the dimension of the null space N(A), m − rk A, of the consistent system of linear equations Ax = y subject to A ∈ ℝ^{n×m} and rk A = n, n < m, as well as of the consistent system of nonlinear equations F(X) = Y subject to a Jacobi matrix J ∈ ℝ^{n×m} and rk J = n, n < m = dim X. The fundamental relation to the datum transformation, also called transformation groups (conformal group, dilatation group /scale/, translation group, rotation group and projective group), as well as to the "soft" Implicit Function Theorem is outlined.

By means of a certain algebraic objective function which geometrically is called minimum distance function we solve the first inverse problem of linear and nonlinear equations, in particular of algebraic type, which relate observations to parameters. The system of linear or nonlinear equations we are solving here is classified as underdetermined. The observations, also called measurements, are elements of a certain observation space Y of integer dimension, dim Y = n,
which may be metrical, especially Euclidean, pseudo-Euclidean, in general a differentiable manifold. In contrast, the parameter space X of integer dimension, dim X = m, is metrical as well, especially Euclidean, pseudo-Euclidean, in general a differentiable manifold, but its metric is unknown. A typical feature of algebraic regression is the fact that the unknown metric of the parameter space X is induced by the functional relation between observations and parameters. We shall outline three aspects of any discrete inverse problem: (i) set-theoretic (fibering), (ii) algebraic (rank partitioning, "IPM", the Implicit Function Theorem) and (iii) geometrical (slicing). Here we treat the first problem of algebraic regression: a consistent system of linear observational equations Ax = y, A ∈ ℝ^{n×m}, rk A = n, n < m, also called "underdetermined system of linear equations", in short "more unknowns than equations", is solved by means of an optimization problem. The Introduction presents us a front page example of two inhomogeneous linear equations with three unknowns. In terms of five boxes and five figures we review the minimum norm solution of such a consistent system of linear equations which is based upon the trinity of the set-theoretical, the algebraic and the geometric approach.
1-1 Introduction

With the introductory paragraph we explain the fundamental concepts and basic notions of this section. For you, the analyst, who has the difficult task to deal with measurements, observational data, modeling and modeling equations, we present numerical examples and graphical illustrations of all abstract notions. The elementary introduction is written not for a mathematician, but for you, the analyst, with limited remote control of the notions given hereafter. May we gain your interest.

Assume an n-dimensional observation space Y, here a linear space parameterized by n observations (finite, discrete) as coordinates y = [y₁, …, y_n]′ ∈ ℝ^n, in which an r-dimensional model manifold is embedded (immersed). The model
manifold is described as the range of a linear operator f from an m-dimensional parameter space X into the observation space Y. The mapping f is established by the mathematical equations which relate all observables to the unknown parameters. Here the parameter space X, the domain of the linear operator f, will be restricted also to a linear space which is parameterized by coordinates x = [x₁, …, x_m]′ ∈ ℝ^m. In this way the linear operator f can be understood as a coordinate mapping A : x ↦ y = Ax. The linear mapping f : X → Y is geometrically characterized by its range R(f), namely R(A), defined by R(f) := {y ∈ Y | y = f(x) for all x ∈ X}, which in general is a linear subspace of Y, and its null space defined by N(f) := {x ∈ X | f(x) = 0}. Here we restrict the range R(f), namely R(A), to coincide with the n = r-dimensional observation space Y such that y ∈ R(f), namely y ∈ R(A).

Example 1.1 will therefore demonstrate the range space R(f), namely R(A), which here coincides with the observation space Y (f is surjective or "onto"), as well as the null space N(f), namely N(A), which is not empty (f is not injective or one-to-one). Box 1.1 will introduce the special linear model of interest. By means of Box 1.2 it will be interpreted as a polynomial of degree two based upon two observations and three unknowns, namely as an underdetermined system of consistent linear equations. Box 1.3 reviews the formal procedure in solving such a system of linear equations by means of "horizontal" rank partitioning and the postulate of the minimum norm solution of the unknown vector. In order to identify the range space R(A), the null space N(A) and its orthogonal complement N(A)^⊥, Box 1.4 by means of algebraic partitioning ("horizontal" rank partitioning) outlines the general solution of the system of homogeneous linear equations Ax = 0. With this background Box 1.5 presents the diagnostic algorithm for solving an underdetermined system of linear equations. In contrast, Box 1.6 is a geometric interpretation of a special solution of a consistent system of inhomogeneous linear equations of type "minimum norm" (MINOS). The g-inverse A_m⁻ of type "MINOS" is finally characterized by three conditions collected in Box 1.7. Figure 1.1 demonstrates the range space R(A), while Figures 1.2 and 1.3 demonstrate the null space N(A) as well as its orthogonal complement N(A)^⊥. Figure 1.4 illustrates the orthogonal projection of an element of the null space N(A) onto the range space R(A⁻), where A⁻ is a generalized inverse. In terms of fibering the set of points of the parameter space as well as of the observation space, Figure 1.5 introduces the related Venn diagrams.
1-11 The front page example
Example 1.1
(polynomial of degree two, consistent system of linear equations Ax = y, x ∈ X = ℝ^m, dim X = m, y ∈ Y = ℝ^n, dim Y = n, r = rk A = dim Y):
First, the introductory example solves the front page consistent system of linear equations,

x₁ + x₂ + x₃ = 2
x₁ + 2x₂ + 4x₃ = 3,

obviously in general dealing with the linear space X = ℝ^m ∋ x, dim X = m, here m = 3, called the parameter space, and the linear space Y = ℝ^n ∋ y, dim Y = n, here n = 2, called the observation space.
1-12 The front page example in matrix algebra
Second, by means of Box 1.1 and according to A. Cayley's doctrine let us specify the consistent system of linear equations in terms of matrix algebra.

Box 1.1: Special linear model: polynomial of degree two, two observations, three unknowns

y = [y₁; y₂] = [a₁₁ a₁₂ a₁₃; a₂₁ a₂₂ a₂₃] [x₁; x₂; x₃]

y = Ax : [2; 3] = [1 1 1; 1 2 4] [x₁; x₂; x₃]

x′ = [x₁, x₂, x₃], y′ = [y₁, y₂] = [2, 3], x ∈ ℝ^{3×1}, y ∈ ℤ₊^{2×1} ⊂ ℝ^{2×1}

A := [1 1 1; 1 2 4] ∈ ℤ₊^{2×3} ⊂ ℝ^{2×3}, r = rk A = dim Y = n = 2.

The matrix A ∈ ℝ^{n×m} is an element of ℝ^{n×m}, the n×m array of real numbers. dim X = m defines the number of unknowns (here: m = 3), dim Y = n the number of observations (here: n = 2). A mapping f is called linear if f(x₁ + x₂) = f(x₁) + f(x₂) and f(λx) = λf(x) holds. Beside the range R(f), the range space R(A), the linear mapping is characterized by the kernel N(f) := {x ∈ ℝ^m | f(x) = 0}, the null space N(A) := {x ∈ ℝ^m | Ax = 0}, to be specified later on.
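The special linear model of Box 1.1 is easily reproduced numerically. The following minimal Python/NumPy sketch, using the front page data only, sets up A and y and checks that rk A = dim Y = 2, i.e. that the system is consistent and underdetermined.

```python
import numpy as np

# Front page example: two observations, three unknowns
A = np.array([[1.0, 1.0, 1.0],
              [1.0, 2.0, 4.0]])   # first design matrix, A in R^{2x3}
y = np.array([2.0, 3.0])          # observation vector

n, m = A.shape                    # n = 2 observations, m = 3 unknowns
r = np.linalg.matrix_rank(A)      # r = rk A = 2 = n = dim Y

# consistency: y lies in the column space R(A) iff appending y does not raise the rank
consistent = np.linalg.matrix_rank(np.column_stack([A, y])) == r
print(n, m, r, consistent)        # 2 3 2 True
```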
? Why is the front page system of linear equations called "underdetermined"?

Just observe that we are left with only two linear equations for three unknowns (x₁, x₂, x₃). Indeed the system of inhomogeneous linear equations is "underdetermined". Without any additional postulate we shall be unable to invert those equations for (x₁, x₂, x₃). In particular we shall outline how to find such an additional postulate. Beforehand we have to introduce some special notions from the theory of operators.

Within matrix algebra the index of the linear operator A is the rank r = rk A, here r = 2, which coincides with the dimension of the observation space, here n = dim Y = 2. A system of linear equations is called consistent if rk A = dim Y. Alternatively we say that the mapping f : x ↦ y = f(x) ∈ R(f) or A : x ↦ Ax = y ∈ R(A) takes an element x ∈ X into the range R(f) or the range space R(A), also called the column space of the matrix A:

f : x ↦ y = f(x), y ∈ R(f);  A : x ↦ Ax = y, y ∈ R(A).

Here the column space is spanned by the first column c₁ and the second column c₂ of the matrix A, the 2×3 array, namely

R(A) = span{[1; 1], [1; 2]}.
Let us continue with operator theory. The right complementary index of the linear operator A ∈ ℝ^{n×m} accounts for the injectivity defect, given by d = m − rk A (here d = m − rk A = 1). "Injectivity" relates to the kernel N(f), or "the null space", which we shall constructively introduce later on. How can such a linear model of interest, namely a system of consistent linear equations, be generated? Let us assume that we have observed a dynamical system y(t) which is represented by a polynomial of degree two with respect to time t ∈ ℝ, namely

y(t) = x₁ + x₂t + x₃t².

Due to ÿ(t) = 2x₃ it is a dynamical system with constant acceleration or constant second derivative with respect to time t. The unknown polynomial coefficients are collected in the column array x = [x₁, x₂, x₃]′, x ∈ X = ℝ³, dim X = 3, and constitute the coordinates of the three-dimensional parameter space X. If the dynamical system y(t) is observed at two instants, say y(t₁) = y₁ = 2 and y(t₂) = y₂ = 3 at t₁ = 1 and t₂ = 2, respectively, and if we collect the observations in the column array y = [y₁, y₂]′ = [2, 3]′, y ∈ Y = ℝ², dim Y = 2, they constitute the coordinates of the two-dimensional observation space Y. Thus we are left with a special linear model interpreted in Box 1.2. We use "~" as the symbol for "equivalence".
Box 1.2: Special linear model: polynomial of degree two, two observations, three unknowns

y = [y₁; y₂] = [1 t₁ t₁²; 1 t₂ t₂²] [x₁; x₂; x₃]

t₁ = 1, y₁ = 2; t₂ = 2, y₂ = 3 : [2; 3] = [1 1 1; 1 2 4] [x₁; x₂; x₃] ~ y = Ax, r = rk A = dim Y = n = 2.

Third, let us begin with a more detailed analysis of the linear mapping f : Ax = y, namely of the linear operator A ∈ ℝ^{n×m}, r = rk A = dim Y = n. We shall pay special attention to the three fundamental partitionings, namely
(i) algebraic partitioning called rank partitioning of the matrix A,
(ii) geometric partitioning called slicing of the linear space X,
(iii) set-theoretical partitioning called fibering of the domain D(f).
1-13 Minimum norm solution of the front page example by means of horizontal rank partitioning
Let us go back to the front page consistent system of linear equations, namely the problem to determine three unknown polynomial coefficients from two sampling points, which we classified as an underdetermined one. Nevertheless we are able to compute a unique solution of the underdetermined system of inhomogeneous linear equations Ax = y, y ∈ R(A) or rk A = dim Y, here A ∈ ℝ^{2×3}, x ∈ ℝ^{3×1}, y ∈ ℝ^{2×1}, if we determine the coordinates of the unknown vector x of minimum norm (minimal Euclidean length, ℓ₂-norm), here

‖x‖²_I = x′x = x₁² + x₂² + x₃² = min.

Box 1.3 outlines the solution of the related optimization problem.

Box 1.3: Minimum norm solution of the consistent system of inhomogeneous linear equations, horizontal rank partitioning

The solution of the optimization problem

{‖x‖²_I = min_x | Ax = y, rk A = dim Y}

is based upon the horizontal rank partitioning of the linear mapping
f : x ↦ y = Ax, rk A = dim Y,

which we already introduced. As soon as we decompose x₁ = −A₁⁻¹A₂x₂ + A₁⁻¹y and implement it in the norm ‖x‖²_I, we are prepared to compute the first derivatives of the unconstrained Lagrangean

L(x₁, x₂) := ‖x‖²_I = x₁² + x₂² + x₃² =
= (y − A₂x₂)′(A₁A₁′)⁻¹(y − A₂x₂) + x₂′x₂ =
= y′(A₁A₁′)⁻¹y − 2x₂′A₂′(A₁A₁′)⁻¹y + x₂′A₂′(A₁A₁′)⁻¹A₂x₂ + x₂′x₂ = min_{x₂},

∂L/∂x₂ (x₂m) = 0  ⇔  −A₂′(A₁A₁′)⁻¹y + [A₂′(A₁A₁′)⁻¹A₂ + I]x₂m = 0
⇔  x₂m = [A₂′(A₁A₁′)⁻¹A₂ + I]⁻¹A₂′(A₁A₁′)⁻¹y,

which constitutes the necessary condition. (The theory of vector derivatives is presented in Appendix B.) Following Appendix A devoted to matrix algebra, namely

(I + AB)⁻¹A = A(I + BA)⁻¹, (BA)⁻¹ = A⁻¹B⁻¹,

for appropriate dimensions of the involved matrices, the identities

x₂m = [A₂′(A₁A₁′)⁻¹A₂ + I]⁻¹A₂′(A₁A₁′)⁻¹y = A₂′(A₁A₁′)⁻¹[A₂A₂′(A₁A₁′)⁻¹ + I]⁻¹y

hold, and we finally derive

x₂m = A₂′(A₁A₁′ + A₂A₂′)⁻¹y.

The second derivatives

∂²L/∂x₂∂x₂′ (x₂m) = 2[A₂′(A₁A₁′)⁻¹A₂ + I] > 0,

due to the positive-definiteness of the matrix A₂′(A₁A₁′)⁻¹A₂ + I, generate the sufficiency condition for obtaining the minimum of the unconstrained Lagrangean. Finally let us backward transform

x₂m ↦ x₁m = −A₁⁻¹A₂x₂m + A₁⁻¹y,
x₁m = −A₁⁻¹A₂A₂′(A₁A₁′ + A₂A₂′)⁻¹y + A₁⁻¹y.

Let us right multiply the identity A₁A₁′ = −A₂A₂′ + (A₁A₁′ + A₂A₂′) by (A₁A₁′ + A₂A₂′)⁻¹ such that
A₁A₁′(A₁A₁′ + A₂A₂′)⁻¹ = −A₂A₂′(A₁A₁′ + A₂A₂′)⁻¹ + I

holds, and left multiply by A₁⁻¹, namely

A₁′(A₁A₁′ + A₂A₂′)⁻¹ = −A₁⁻¹A₂A₂′(A₁A₁′ + A₂A₂′)⁻¹ + A₁⁻¹.

Obviously we have generated the linear form

x₁m = A₁′(A₁A₁′ + A₂A₂′)⁻¹y,
x₂m = A₂′(A₁A₁′ + A₂A₂′)⁻¹y,

or

[x₁m; x₂m] = [A₁′; A₂′](A₁A₁′ + A₂A₂′)⁻¹y, i.e. x_m = A′(AA′)⁻¹y.

A numerical computation with respect to the introductory example is

A₁A₁′ + A₂A₂′ = [3 7; 7 21], (A₁A₁′ + A₂A₂′)⁻¹ = (1/14)[21 −7; −7 3],

A₁′(A₁A₁′ + A₂A₂′)⁻¹ = (1/14)[14 −4; 7 −1],
A₂′(A₁A₁′ + A₂A₂′)⁻¹ = (1/14)[−7, 5],

x₁m = [8/7, 11/14]′, x₂m = 1/14, ‖x_m‖_I = (3/14)√42,

y(t) = 8/7 + (11/14)t + (1/14)t²,

∂²L/∂x₂∂x₂′ (x₂m) = 2[A₂′(A₁A₁′)⁻¹A₂ + I] = 28 > 0.
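The closed-form MINOS computation of Box 1.3 can be verified numerically. The sketch below assumes the front page data A, y of Box 1.1 and checks that x_m = A′(AA′)⁻¹y coincides with the Moore-Penrose solution and reproduces the observations y(1) = 2, y(2) = 3.

```python
import numpy as np

A = np.array([[1.0, 1.0, 1.0],
              [1.0, 2.0, 4.0]])
y = np.array([2.0, 3.0])

# MINOS via the right inverse: x_m = A' (A A')^{-1} y
x_m = A.T @ np.linalg.solve(A @ A.T, y)
print(x_m)                                 # [1.1428... 0.7857... 0.0714...] = [8/7, 11/14, 1/14]

# the Moore-Penrose pseudoinverse yields the same minimum norm solution
assert np.allclose(x_m, np.linalg.pinv(A) @ y)
assert np.allclose(A @ x_m, y)             # consistency: A x_m = y
print(np.linalg.norm(x_m))                 # 1.3887... = (3/14) * sqrt(42)

# reproduced polynomial y(t) = 8/7 + (11/14) t + (1/14) t^2 at t = 1, 2
t = np.array([1.0, 2.0])
print(x_m[0] + x_m[1] * t + x_m[2] * t**2)  # [2. 3.]
```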
1-14 The range R(f) and the kernel N(f)
Fourthly, let us go into the detailed analysis of R(f), N(f) and N(f)^⊥ with respect to the front page example. How can we actually identify the range space R(A), the null space N(A) or its orthogonal complement N(A)^⊥? The range space

R(A) := {y ∈ ℝ^n | Ax = y, x ∈ ℝ^m}

is conveniently described by the first column c₁ = [1, 1]′ and the second column c₂ = [1, 2]′ of the matrix A, namely by the 2-leg
{e₁ + e₂, e₁ + 2e₂ | O} or {c₁, c₂ | O}, with respect to the orthogonal base vectors e₁ and e₂, respectively, attached to the origin O. Symbolically we write R(A) = span{e₁ + e₂, e₁ + 2e₂ | O}. As a linear space, R(A) ⊆ Y is illustrated by Figure 1.1.

Figure 1.1: Range R(f), range space R(A), y ∈ R(A)

By means of Box 1.4 we identify N(f) or "the null space N(A)" and give its illustration by Figure 1.2. Such a result has paved the way to the diagnostic algorithm for solving an underdetermined system of linear equations by means of rank partitioning presented in Box 1.5.

Box 1.4: The general solution of the system of homogeneous linear equations Ax = 0, "horizontal" rank partitioning

The matrix A is called "horizontally rank partitioned" if

{A ∈ ℝ^{n×m} | A = [A₁, A₂], A₁ ∈ ℝ^{n×r}, A₂ ∈ ℝ^{n×d}, r = rk A = rk A₁ = n, d = d(A) = m − rk A}

holds. (In the introductory example A ∈ ℝ^{2×3}, A₁ ∈ ℝ^{2×2}, A₂ ∈ ℝ^{2×1}, rk A = 2, d(A) = 1 applies.) A consistent system of linear equations Ax = y, rk A = dim Y, is "horizontally rank partitioned" if

Ax = y, rk A = dim Y  ⇔  A₁x₁ + A₂x₂ = y
for a partitioned unknown vector

{x ∈ ℝ^m | x = [x₁; x₂], x₁ ∈ ℝ^{r×1}, x₂ ∈ ℝ^{d×1}}

applies. The "horizontal" rank partitioning of the matrix A as well as the "horizontally rank partitioned" consistent system of linear equations Ax = y, rk A = dim Y, of the introductory example is

A = [1 1 1; 1 2 4], A₁ = [1 1; 1 2], A₂ = [1; 4],

Ax = y, rk A = dim Y  ⇔  A₁x₁ + A₂x₂ = y,
x₁ = [x₁, x₂]′ ∈ ℝ^{2×1}, x₂ = [x₃] ∈ ℝ,

[1 1; 1 2][x₁; x₂] + [1; 4]x₃ = y.

By means of the horizontal rank partitioning of the system of homogeneous linear equations an identification of the null space N(A), namely

N(A) = {x ∈ ℝ^m | Ax = A₁x₁ + A₂x₂ = 0},

is

A₁x₁ + A₂x₂ = 0  ⇔  x₁ = −A₁⁻¹A₂x₂,

particularly in the introductory example

[x₁; x₂] = −[2 −1; −1 1][1; 4]x₃,
x₁ = 2x₃ = 2τ, x₂ = −3x₃ = −3τ, x₃ = τ.

Here the two equations Ax = 0 for any x ∈ X = ℝ³ constitute the linear space N(A), dim N(A) = 1, a one-dimensional subspace of X = ℝ³. For instance, if we introduce the parameter x₃ = τ, the other coordinates of the parameter space X = ℝ³ amount to x₁ = 2τ, x₂ = −3τ. In geometric language the linear space N(A) is a parameterized straight line L¹₀ through the origin illustrated by Figure 1.2. The parameter space X = ℝ^m (here m = 3) is sliced by the subspace, the linear space N(A), also called linear manifold, dim N(A) = d(A) = d, here a straight line L¹₀ through the origin O.
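The null space identified in Box 1.4 can be checked in the same setting; the sketch below, assuming again the front page matrix A, computes the direction x₁ = 2τ, x₂ = −3τ, x₃ = τ from the horizontal rank partitioning A = [A₁, A₂].

```python
import numpy as np

A = np.array([[1.0, 1.0, 1.0],
              [1.0, 2.0, 4.0]])

# horizontal rank partitioning A = [A1, A2], A1 regular 2x2, A2 2x1
A1, A2 = A[:, :2], A[:, 2:]

# null space direction: x1 = -A1^{-1} A2 x2 with x2 = tau = 1 as free parameter
direction = np.concatenate([-np.linalg.solve(A1, A2).ravel(), [1.0]])
print(direction)        # [ 2. -3.  1.]  i.e. x1 = 2*tau, x2 = -3*tau, x3 = tau
print(A @ direction)    # [0. 0.]  -> the direction spans N(A)
```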
1-15 Interpretation of "MINOS" by three partitionings: (i) algebraic (rank partitioning), (ii) geometric (slicing), (iii) set-theoretical (fibering)
Figure 1.2: The parameter space X = ℝ³ (x₃ is not displayed) sliced by the null space, the linear manifold N(A) = L¹₀
The diagnostic algorithm for solving an underdetermined system of linear equations y = Ax, rk A = dim Y = n, n < m = dim X, y ∈ R(A), by means of rank partitioning is presented to you by Box 1.5.

Box 1.5: The diagnostic algorithm for solving an underdetermined system of linear equations y = Ax, rk A = dim Y, y ∈ R(A), by means of rank partitioning

Determine the rank of the matrix A: rk A = dim Y = n.

Compute the "horizontal rank partitioning": A = [A₁, A₂], A₁ ∈ ℝ^{r×r} = ℝ^{n×n}, A₂ ∈ ℝ^{n×(m−r)} = ℝ^{n×(m−n)}. "m − r = m − n = d is called the right complementary index." "A as a linear operator is not injective, but surjective."
Compute the null space N(A): N(A) := {x ∈ ℝ^m | Ax = 0} = {x ∈ ℝ^m | x₁ + A₁⁻¹A₂x₂ = 0}.

Compute the unknown parameter vector of type MINOS (Minimum Norm Solution x_m): x_m = A′(AA′)⁻¹y.
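The diagnostic algorithm of Box 1.5 translates directly into a small routine. The sketch below is one possible rendering under the assumption that the leading r columns of A form a regular block A₁, as in the front page example; for a general A a column permutation would be needed first.

```python
import numpy as np

def minos(A, y):
    """Box 1.5 style diagnostic for a consistent underdetermined system."""
    n, m = A.shape
    r = np.linalg.matrix_rank(A)
    if r != n:
        raise ValueError("system is not of full row rank, rk A != dim Y")
    # horizontal rank partitioning A = [A1, A2] (leading r columns assumed regular)
    A1, A2 = A[:, :r], A[:, r:]
    # basis of the null space N(A): columns of [-A1^{-1} A2; I_{m-r}]
    N = np.vstack([-np.linalg.solve(A1, A2), np.eye(m - r)])
    # minimum norm solution x_m = A'(AA')^{-1} y
    x_m = A.T @ np.linalg.solve(A @ A.T, y)
    return x_m, N

A = np.array([[1.0, 1.0, 1.0], [1.0, 2.0, 4.0]])
y = np.array([2.0, 3.0])
x_m, N = minos(A, y)
print(x_m)         # [8/7, 11/14, 1/14]
print(N.ravel())   # [ 2. -3.  1.]
```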
While we have characterized the general solution of the system of homogeneous linear equations Ax = 0, we are left with the problem of solving the consistent system of inhomogeneous linear equations. Again we take advantage of the rank partitioning of the matrix A summarized in Box 1.4.

Box 1.6: A special solution of a consistent system of inhomogeneous linear equations Ax = y, "horizontal" rank partitioning

Ax = y, rk A = dim Y, y ∈ R(A)  ⇔  A₁x₁ + A₂x₂ = y.

Since the matrix A₁ is of full rank it can be regularly inverted (Cayley inverse). In particular, we solve for

x₁ = −A₁⁻¹A₂x₂ + A₁⁻¹y,

or

[x₁; x₂] = −[2 −1; −1 1][1; 4]x₃ + [2 −1; −1 1][y₁; y₂],
x₁ = 2x₃ + 2y₁ − y₂, x₂ = −3x₃ − y₁ + y₂.

For instance, if we introduce the parameter x₃ = τ, the other coordinates of the parameter space X = ℝ³ amount to x₁ = 2τ + 2y₁ − y₂, x₂ = −3τ − y₁ + y₂. In geometric language the admissible parameter space is a family of one-dimensional parallel straight lines dependent on y = [y₁, y₂]′, here [2, 3]′, in particular
L¹(y₁, y₂) := {x ∈ ℝ³ | x₁ = 2x₃ + 2y₁ − y₂, x₂ = −3x₃ − y₁ + y₂},

including the null space L¹(0, 0) = N(A). Figure 1.3 illustrates
(i) the admissible parameter space L¹(y₁, y₂),
(ii) the line L¹⊥ which is orthogonal to the null space, called N(A)⊥,
(iii) the intersection L¹(y₁, y₂) ∩ N(A)⊥, generating the solution point x_m, as will be proven now.
Figure 1.3: The range space R(A⁻) (the admissible parameter space), parallel straight lines L¹(y₁, y₂), namely L¹(2, 3):
L¹(y₁, y₂) := {x ∈ ℝ³ | x₁ = 2x₃ + 2y₁ − y₂, x₂ = −3x₃ − y₁ + y₂}.
The geometric interpretation of the minimum norm solution ‖x‖_I = min is the following. With reference to Figure 1.4 we decompose the vector

x = x_{N(A)} + x_{N(A)⊥},

where x_{N(A)} is an element of the null space N(A) (here: the straight line L¹(0, 0)) and x_{N(A)⊥} is an element of the orthogonal complement N(A)⊥ of the null space N(A), while the inconsistency parameter i_{N(A)} = i_m is an element of the range space R(A⁻) (here: the straight line L¹(y₁, y₂), namely L¹(2, 3)) of the generalized inverse matrix A⁻ of type MINOS ("minimum norm solution").

‖x‖²_I = ‖x_{N(A)⊥} + x_{N(A)}‖² = ‖x_{N(A)⊥}‖² + 2⟨x_{N(A)⊥} | i⟩ + ‖i‖²

is minimal if and only if the inner product ⟨x_{N(A)⊥} | i⟩ = 0, that is, x_{N(A)⊥} and i_m = i_{N(A)} are orthogonal. The solution point x_m is the orthogonal projection of the null point onto R(A⁻):
P_{R(A⁻)} x = A⁻Ax = A⁻y for all x ∈ D(A).

Alternatively, if the vector x_m of minimal length is orthogonal to the null space N(A), being an element of N(A)⊥, we may say that N(A)⊥ intersects R(A⁻) in the solution point x_m. Or the normal space N L¹₀ with respect to the tangent space T L¹₀ (which in linear models is identical to L¹₀, the null space N(A)) intersects the tangent space T L¹_y, the range space R(A⁻), in the solution point x_m. In summary, x_m ∈ N(A)⊥ ∩ R(A⁻).
Figure 1.4: Orthogonal projection of an element of N(A) onto the range space R(A⁻)

Let the algebraic partitioning and the geometric partitioning be merged to interpret the minimum norm solution of the consistent system of linear equations of type "underdetermined" MINOS. As a summary of such a merger we take reference to Box 1.7.

The first condition: AA⁻A = A.
Let us depart from MINOS of y = Ax, x ∈ X = ℝ^m, y ∈ Y = ℝ^n, r = rk A = n, namely x_m = A⁻_m y = A'(AA')⁻¹y. Then

Ax_m = AA⁻_m y = AA⁻_m Ax_m ⇒ AA⁻A = A.

The second condition: A⁻AA⁻ = A⁻.
x_m = A'(AA')⁻¹y = A⁻_m y = A⁻_m Ax_m ⇒ x_m = A⁻_m y = A⁻_m AA⁻_m y,
A⁻_m y = A⁻_m AA⁻_m y ⇒ A⁻AA⁻ = A⁻.

The identity rk A⁻_m = rk A is interpreted as follows: the g-inverse of type MINOS is a generalized inverse of minimal rank, since in general rk A⁻ ≥ rk A holds.

The third condition: A⁻A = P_{R(A⁻)}.

x_m = A⁻_m y = A⁻_m Ax_m ⇒ A⁻A = P_{R(A⁻)}.
Obviously A⁻_m A is an orthogonal projection onto R(A⁻_m), while i_m = (I_m − A⁻_m A)x is the orthogonal projection onto its orthogonal complement R(A⁻_m)⊥. If the linear mapping f: x ↦ y = f(x), y ∈ R(f), is given, we are aiming at a generalized inverse (linear) mapping y ↦ x = g(y) such that y = f(x) = f(g(y)) = f(g(f(x))), or f = f∘g∘f, is fulfilled as a first condition. Alternatively we are going to construct a generalized inverse A⁻: y ↦ A⁻y = x such that the first condition y = Ax = AA⁻Ax, or AA⁻A = A, holds. Though the linear mapping f: x ↦ y = f(x) ∈ R(f), or the system of linear equations Ax = y, rk A = dim Y, is consistent, it suffers from the (injectivity) deficiency of the linear mapping f(x) or of the matrix A. Indeed it recovers from the (injectivity) deficiency if we introduce the projection x ↦ g(f(x)) = q ∈ R(g), or x ↦ A⁻Ax = q ∈ R(A⁻), as the second condition. Note that the projection matrix A⁻A is idempotent, which follows from P² = P or (A⁻A)(A⁻A) = A⁻AA⁻A = A⁻A.

Box 1.7: The general solution of a consistent system of linear equations
f: x ↦ y = Ax, x ∈ X = ℝ^m (parameter space), y ∈ Y = ℝ^n (observation space), r = rk A = dim Y, A⁻ a generalized inverse of MINOS type

Condition #1:
f(x) = f(g(y)), i.e. f = f∘g∘f   versus   Ax = AA⁻Ax, i.e. AA⁻A = A.

Condition #2 (reflexive g-inverse mapping / reflexive g-inverse):
x = g(y) = g(f(x))   versus   x = A⁻y = A⁻AA⁻y, i.e. A⁻AA⁻ = A⁻.

Condition #3:
g(f(x)) = x_{R(A⁻)}, g∘f = proj_{R(A⁻)}   versus   A⁻Ax = x_{R(A⁻)}, A⁻A = proj_{R(A⁻)}.
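The three conditions of Box 1.7 are easy to verify numerically. The following check is my own sketch, not from the book; it assumes NumPy and the front-page matrix, and takes A⁻ = A'(AA')⁻¹ as the MINOS-type generalized inverse.

```python
# Verify AA^-A = A, A^-AA^- = A^-, and that A^-A is an orthogonal projector.
import numpy as np

A = np.array([[1.0, 1.0, 1.0],
              [1.0, 2.0, 4.0]])
A_minus = A.T @ np.linalg.inv(A @ A.T)              # right inverse

P = A_minus @ A                                     # candidate projector
assert np.allclose(A @ A_minus @ A, A)              # condition #1
assert np.allclose(A_minus @ A @ A_minus, A_minus)  # condition #2 (reflexive)
assert np.allclose(P, P @ P)                        # condition #3: idempotent
assert np.allclose(P, P.T)                          # ... and symmetric
```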
The set-theoretical partitioning, the fibering of the set system of points which constitute the parameter space X, the domain D(f), will finally be outlined. Since the set system X (the parameter space) is ℝ^m, the fibering is called "trivial". Non-trivial fibering is reserved for nonlinear models, in which case we are dealing with a parameter space X which is a differentiable manifold. Here the fibering D(f) = N(f) ∪ N(f)⊥ produces the trivial fibers N(f) and N(f)⊥, where the trivial fiber N(f)⊥ is the quotient set ℝ^m/N(f). By means of a Venn diagram (John Venn 1834-1923), also called Euler circles (Leonhard Euler 1707-1783), Figure 1.5 illustrates the trivial fibers of the set system X = ℝ^m generated by N(f) and N(f)⊥. The set system of points which constitute the observation space Y is not subject to fibering since all points of the set system D(f) are mapped into the range R(f).
Figure 1.5: Venn diagram, trivial fibering of the domain D(f), trivial fibers N(f) and N(f)⊥; f: ℝ^m = X → Y = ℝ^n, Y = R(f); X set system of the parameter space, Y set system of the observation space.
1-2 The minimum norm solution: "MINOS"

The system of consistent linear equations Ax = y subject to A ∈ ℝ^{n×m}, rk A = n < m, allows certain solutions which we introduce by means of Definition 1.1
as a solution of a certain optimization problem. Lemma 1.2 contains the normal equations of the optimization problem. The solution of such a system of normal equations is presented in Lemma 1.3 as the minimum norm solution with respect to the G_x-seminorm. Finally we discuss the metric of the parameter space and alternative choices of its metric before we identify by Lemma 1.4 the solution of the quadratic optimization problem in terms of the (1,2,4)-generalized inverse.

Definition 1.1 (minimum norm solution with respect to the G_x-seminorm):
A vector x_m is called G_x-MINOS (Minimum Norm Solution with respect to the G_x-seminorm) of the consistent system of linear equations

Ax = y, y ∈ Y ≡ ℝ^n, A ∈ ℝ^{n×m}, rk A = rk(A, y) = n < m, y ∈ R(A),   (1.1)

if both

Ax_m = y   (1.2)

and, in comparison to all other solution vectors x ∈ X ≡ ℝ^m, the inequality

‖x_m‖²_{G_x} := x_m'G_x x_m ≤ x'G_x x =: ‖x‖²_{G_x}   (1.3)
holds. The system of inverse linear equations A⁻y + i = x, rk A⁻ ≠ m, or x ∉ R(A⁻), is inconsistent.

By Definition 1.1 we characterized G_x-MINOS of the consistent system of linear equations Ax = y subject to A ∈ ℝ^{n×m}, rk A = n < m (algebraic condition), or y ∈ R(A) (geometric condition). Loosely speaking, we are confronted with a system of linear equations with more unknowns m than equations n, namely n < m. G_x-MINOS will enable us to find a solution of this underdetermined problem. By means of Lemma 1.2 we shall write down the "normal equations" of G_x-MINOS.

Lemma 1.2 (minimum norm solution with respect to the G_x-(semi)norm):
A vector x_m ∈ X ≡ ℝ^m is G_x-MINOS of (1.1) if and only if the system of normal equations

[G_x A'; A 0][x_m; λ_m] = [0; y]   (1.4)
with the vector λ_m ∈ ℝ^{n×1} of "Lagrange multipliers" is fulfilled. x_m always exists and is in particular unique if

rk[G_x, A'] = m   (1.5)
holds, or equivalently, if the matrix G_x + A'A is regular.

:Proof:
G_x-MINOS is based on the constrained Lagrangean

L(x, λ) := x'G_x x + 2λ'(Ax − y) = min over (x, λ),

such that the first derivatives

(1/2) ∂L/∂x (x_m, λ_m) = G_x x_m + A'λ_m = 0,
(1/2) ∂L/∂λ (x_m, λ_m) = Ax_m − y = 0,
that is, [G_x A'; A 0][x_m; λ_m] = [0; y],

constitute the necessary conditions. The second derivatives

(1/2) ∂²L/∂x∂x' (x_m, λ_m) = G_x ≥ 0,

due to the positive semidefiniteness of the matrix G_x, generate the sufficiency condition for obtaining the minimum of the constrained Lagrangean. Due to the assumption rk A = rk[A, y] = n, or y ∈ R(A), the existence of G_x-MINOS x_m is guaranteed. In order to prove uniqueness of G_x-MINOS x_m we have to consider case (i) G_x positive definite and case (ii) G_x positive semidefinite.

Case (i): G_x positive definite.
Due to rk G_x = m, |G_x| ≠ 0, the partitioned system of normal equations

[G_x A'; A 0][x_m; λ_m] = [0; y]

is uniquely solved. The theory of inverse partitioned matrices (IPM) is presented in Appendix A.

Case (ii): G_x positive semidefinite.
Follow these algorithmic steps: multiply the second normal equation by A' in order to produce A'Ax − A'y = 0 or A'Ax = A'y, and add the result to the first normal equation, which generates
G_x x_m + A'Ax_m + A'λ_m = A'y, or (G_x + A'A)x_m + A'λ_m = A'y. The augmented first normal equation and the original second normal equation build up the equivalent system of normal equations

[G_x + A'A  A'; A  0][x_m; λ_m] = [A'; I_n] y, |G_x + A'A| ≠ 0,

which is uniquely solved due to rk(G_x + A'A) = m, |G_x + A'A| ≠ 0.   ∎
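As a numerical aside (my own sketch, not part of the original text; it assumes NumPy, G_x = I₃ and the front-page data), the partitioned normal equations (1.4) can be solved directly and compared with the closed-form right-inverse solution that follows in Lemma 1.3.

```python
# Solve [[G_x, A'], [A, 0]] [x_m; lambda_m] = [0; y] and compare with
# x_m = A'(AA')^{-1} y, lambda_m = -(AA')^{-1} y.
import numpy as np

A = np.array([[1.0, 1.0, 1.0],
              [1.0, 2.0, 4.0]])
y = np.array([2.0, 3.0])
n, m = A.shape
G_x = np.eye(m)

K = np.block([[G_x, A.T],
              [A, np.zeros((n, n))]])
sol = np.linalg.solve(K, np.concatenate([np.zeros(m), y]))
x_m, lam = sol[:m], sol[m:]

assert np.allclose(x_m, A.T @ np.linalg.solve(A @ A.T, y))
assert np.allclose(lam, -np.linalg.solve(A @ A.T, y))
```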
The solution of the system of normal equations leads to the linear form x_m = Ly which is G_x-MINOS of (1.1) and can be represented as follows.

Lemma 1.3 (minimum norm solution with respect to the G_x-(semi)norm):
x_m = Ly is G_x-MINOS of the consistent system of linear equations (1.1) Ax = y, rk A = rk(A, y) = n (or y ∈ R(A)), if and only if L ∈ ℝ^{m×n} is represented by

Case (i): G_x = I_m

L = A⁻_R = A'(AA')⁻¹   (right inverse)   (1.6)
x_m = A⁻_R y = A'(AA')⁻¹y   (1.7)
x = x_m + i_m   (1.8)

is an orthogonal decomposition of the unknown vector x ∈ X ≡ ℝ^m into the I-MINOS vector x_m ∈ L^n and the I-MINOS vector of inconsistency i_m ∈ L^d subject to

x_m = A'(AA')⁻¹Ax,   (1.9)
i_m = x − x_m = [I_m − A'(AA')⁻¹A]x.   (1.10)

(Due to x_m = A'(AA')⁻¹Ax, I-MINOS has the reproducing property. As projection matrices, A'(AA')⁻¹A with rk A'(AA')⁻¹A = rk A = n and [I_m − A'(AA')⁻¹A] with rk[I_m − A'(AA')⁻¹A] = m − rk A = d are independent.) Their corresponding norms are positive semidefinite, namely

‖x_m‖²_I = y'(AA')⁻¹y = x'A'(AA')⁻¹Ax = x'G_m x,   (1.11)
‖i_m‖²_I = x'[I_m − A'(AA')⁻¹A]x.   (1.12)
Case (ii): G_x positive definite

L = G_x⁻¹A'(AG_x⁻¹A')⁻¹   (weighted right inverse)   (1.13)
x_m = G_x⁻¹A'(AG_x⁻¹A')⁻¹y   (1.14)
x = x_m + i_m   (1.15)

is an orthogonal decomposition of the unknown vector x ∈ X ≡ ℝ^m into the G_x-MINOS vector x_m ∈ L^n and the G_x-MINOS vector of inconsistency i_m ∈ L^d subject to

x_m = G_x⁻¹A'(AG_x⁻¹A')⁻¹Ax,   (1.16)
i_m = x − x_m = [I_m − G_x⁻¹A'(AG_x⁻¹A')⁻¹A]x.   (1.17)

(Due to x_m = G_x⁻¹A'(AG_x⁻¹A')⁻¹Ax, G_x-MINOS has the reproducing property. As projection matrices, G_x⁻¹A'(AG_x⁻¹A')⁻¹A with rk G_x⁻¹A'(AG_x⁻¹A')⁻¹A = n and [I_m − G_x⁻¹A'(AG_x⁻¹A')⁻¹A] with rk[I_m − G_x⁻¹A'(AG_x⁻¹A')⁻¹A] = m − rk A = d are independent.) The corresponding norms are positive semidefinite, namely

‖x_m‖²_{G_x} = y'(AG_x⁻¹A')⁻¹y = x'A'(AG_x⁻¹A')⁻¹Ax = x'G_m x,   (1.18)
‖i_m‖²_{G_x} = x'[G_x − A'(AG_x⁻¹A')⁻¹A]x.   (1.19)
Case (iii): G_x positive semidefinite

L = (G_x + A'A)⁻¹A'[A(G_x + A'A)⁻¹A']⁻¹   (1.20)
x_m = (G_x + A'A)⁻¹A'[A(G_x + A'A)⁻¹A']⁻¹y   (1.21)
x = x_m + i_m   (1.22)

is an orthogonal decomposition of the unknown vector x ∈ X ≡ ℝ^m into the (G_x + A'A)-MINOS vector x_m ∈ L^n and the (G_x + A'A)-MINOS vector of inconsistency i_m ∈ L^d subject to

x_m = (G_x + A'A)⁻¹A'[A(G_x + A'A)⁻¹A']⁻¹Ax,   (1.23)
i_m = {I_m − (G_x + A'A)⁻¹A'[A(G_x + A'A)⁻¹A']⁻¹A}x.   (1.24)

Due to x_m = (G_x + A'A)⁻¹A'[A(G_x + A'A)⁻¹A']⁻¹Ax, (G_x + A'A)-MINOS has the reproducing property. As projection matrices,

(G_x + A'A)⁻¹A'[A(G_x + A'A)⁻¹A']⁻¹A, with rank rk A = n,

and

{I_m − (G_x + A'A)⁻¹A'[A(G_x + A'A)⁻¹A']⁻¹A}, with rank m − rk A = d,

are independent. The corresponding norms are positive semidefinite, namely

‖x_m‖²_{G_x + A'A} = y'[A(G_x + A'A)⁻¹A']⁻¹y = x'A'[A(G_x + A'A)⁻¹A']⁻¹Ax = x'G_m x,   (1.25)
‖i_m‖²_{G_x + A'A} = x'{I_m − (G_x + A'A)⁻¹A'[A(G_x + A'A)⁻¹A']⁻¹A}x.   (1.26)
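The three representations of Lemma 1.3 can be compared numerically. The sketch below is my own, not from the book; it assumes NumPy, the front-page data, an arbitrary positive definite weight for case (ii) and an arbitrary positive semidefinite weight for case (iii).

```python
# Evaluate cases (i)-(iii) of Lemma 1.3 and check that each solves Ax = y;
# for a positive definite G_x, cases (ii) and (iii) coincide, since
# x'(G_x + A'A)x differs from x'G_x x only by the constant |y|^2 on Ax = y.
import numpy as np

A = np.array([[1.0, 1.0, 1.0],
              [1.0, 2.0, 4.0]])
y = np.array([2.0, 3.0])

def minos_i(A, y):                                   # (1.7)
    return A.T @ np.linalg.solve(A @ A.T, y)

def minos_ii(A, y, G):                               # (1.14)
    Gi = np.linalg.inv(G)
    return Gi @ A.T @ np.linalg.solve(A @ Gi @ A.T, y)

def minos_iii(A, y, G):                              # (1.21)
    S = np.linalg.inv(G + A.T @ A)
    return S @ A.T @ np.linalg.solve(A @ S @ A.T, y)

G_pd = np.diag([1.0, 2.0, 4.0])                      # positive definite
G_psd = np.diag([1.0, 1.0, 0.0])                     # positive semidefinite

for x in (minos_i(A, y), minos_ii(A, y, G_pd), minos_iii(A, y, G_psd)):
    assert np.allclose(A @ x, y)

assert np.allclose(minos_ii(A, y, G_pd), minos_iii(A, y, G_pd))
```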
:Proof:
A basis of the proof could be C. R. Rao's Pandora Box, the theory of inverse partitioned matrices (Appendix A: Fact: Inverse Partitioned Matrix (IPM) of a symmetric matrix). Due to the rank identity rk A = rk(AG_x⁻¹A') = n < m, the normal equations of case (i) as well as case (ii) can be solved faster directly by Gauss elimination:

[G_x A'; A 0][x_m; λ_m] = [0; y] ⇔ G_x x_m + A'λ_m = 0, Ax_m = y.

Multiply the first normal equation by AG_x⁻¹ and subtract the second normal equation from the modified first one:

Ax_m + AG_x⁻¹A'λ_m = 0, Ax_m = y ⇒ λ_m = −(AG_x⁻¹A')⁻¹y.

The forward reduction step is followed by the backward reduction step. Implement λ_m into the first normal equation and solve for x_m:

G_x x_m − A'(AG_x⁻¹A')⁻¹y = 0 ⇒ x_m = G_x⁻¹A'(AG_x⁻¹A')⁻¹y.

Thus G_x-MINOS x_m and λ_m are represented by

x_m = G_x⁻¹A'(AG_x⁻¹A')⁻¹y, λ_m = −(AG_x⁻¹A')⁻¹y.
For case (iii), to the first normal equation we add the term A'Ax_m = A'y and write the modified normal equations

[G_x + A'A  A'; A  0][x_m; λ_m] = [A'; I_n] y.

Due to the rank identity rk A = rk[A(G_x + A'A)⁻¹A'] = n < m, the modified normal equations are again solved directly by Gauss elimination:

(G_x + A'A)x_m + A'λ_m = A'y, Ax_m = y.

Multiply the first modified normal equation by A(G_x + A'A)⁻¹ and subtract the second normal equation from the modified first one:

Ax_m + A(G_x + A'A)⁻¹A'λ_m = A(G_x + A'A)⁻¹A'y, Ax_m = y
⇒ A(G_x + A'A)⁻¹A'λ_m = [A(G_x + A'A)⁻¹A' − I_n]y
⇒ λ_m = [A(G_x + A'A)⁻¹A']⁻¹[A(G_x + A'A)⁻¹A' − I_n]y = [I_n − (A(G_x + A'A)⁻¹A')⁻¹]y.

The forward reduction step is followed by the backward reduction step. Implement λ_m into the first modified normal equation and solve for x_m:

(G_x + A'A)x_m − A'[A(G_x + A'A)⁻¹A']⁻¹y + A'y = A'y
⇒ (G_x + A'A)x_m − A'[A(G_x + A'A)⁻¹A']⁻¹y = 0
⇒ x_m = (G_x + A'A)⁻¹A'[A(G_x + A'A)⁻¹A']⁻¹y.

Thus G_x-MINOS of (1.1) in terms of this particular generalized inverse is obtained as

x_m = (G_x + A'A)⁻¹A'[A(G_x + A'A)⁻¹A']⁻¹y, λ_m = [I_n − (A(G_x + A'A)⁻¹A')⁻¹]y.
∎

1-21 A discussion of the metric of the parameter space X
With the completion of the proof we have to discuss the basic results of Lemma 1.3 in more detail. At first we have to observe that the matrix G x of the metric of the parameter space X has to be given a priori. We classified MINOS according to (i) G x = I m , (ii) G x positive definite and (iii) G x positive semidefinite. But how do we know the metric of the parameter space? Obviously we need prior information about the geometry of the parameter space X , namely from
the empirical sciences like physics, chemistry, biology, geosciences, social sciences. If the parameter space X ∈ ℝ^m is equipped with an inner product ⟨x₁ | x₂⟩ = x₁'G_x x₂, x₁ ∈ X, x₂ ∈ X, where the matrix G_x of the metric ‖x‖² = x'G_x x is positive definite, we refer to the metric space X ∈ ℝ^m as Euclidean, E^m. In contrast, if the parameter space X ∈ ℝ^m is restricted to a metric space with a matrix G_x of the metric which is positive semidefinite, we call the parameter space semi-Euclidean, E^{m₁,m₂}: m₁ is the number of positive eigenvalues, m₂ the number of zero eigenvalues of the positive semidefinite matrix G_x of the metric (m = m₁ + m₂). In various applications, namely in the adjustment of observations which refer to Special Relativity or General Relativity, we have to generalize the metric structure of the parameter space X: if the matrix G_x of the pseudometric ‖x‖² = x'G_x x is built on m₁ positive eigenvalues (signature +), m₂ zero eigenvalues and m₃ negative eigenvalues (signature −), we call the pseudometric parameter space pseudo-Euclidean, E^{m₁,m₂,m₃}, m = m₁ + m₂ + m₃. For such a parameter space MINOS has to be generalized to ‖x‖²_{G_x} = extr, for instance a "maximum norm solution".
1-22 Alternative choice of the metric of the parameter space X
Another problem associated with the parameter space X is the norm choice problem. Here we have used the ℓ₂-norm, for instance

ℓ₂-norm: ‖x‖₂ := √(x'x) = √(x₁² + x₂² + ... + x_{m−1}² + x_m²),
ℓ_p-norm: ‖x‖_p := (|x₁|^p + |x₂|^p + ... + |x_{m−1}|^p + |x_m|^p)^{1/p},
as ℓ_p-norms (1 ≤ p < ∞) are alternative norms of choice. Beside the choice of the matrix G_x of the metric within the ℓ₂-norm and of the norm ℓ_p itself, we like to discuss the resulting MINOS matrix G_m of the metric. Indeed we have constructed MINOS from an a priori choice of the metric G, called G_x, and were led to the a posteriori choice of the metric G_m of type (1.27), (1.28) and (1.29). The matrices

(i) G_m = A'(AA')⁻¹A   (1.27)
(ii) G_m = A'(AG_x⁻¹A')⁻¹A   (1.28)
(iii) G_m = A'[A(G_x + A'A)⁻¹A']⁻¹A   (1.29)

are (i) idempotent, (ii) G_x⁻¹-idempotent and (iii) (G_x + A'A)⁻¹-idempotent, namely projection matrices. Similarly, the norms ‖i_m‖² of the type (1.30), (1.31) and (1.32) measure the distance of the solution point x_m ∈ X from the null space N(A). The matrices

(i) I_m − A'(AA')⁻¹A   (1.30)
(ii) G_x − A'(AG_x⁻¹A')⁻¹A   (1.31)
(iii) I_m − (G_x + A'A)⁻¹A'[A(G_x + A'A)⁻¹A']⁻¹A   (1.32)

are (i) idempotent, (ii) G_x⁻¹-idempotent and (iii) idempotent, namely projection matrices.
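As a numerical side note (my own sketch, not part of the original text; it assumes NumPy and the front-page matrix), the a posteriori matrix of the metric (1.27) and its complement (1.30) can be checked to be complementary idempotent projection matrices.

```python
# G_m = A'(AA')^{-1}A projects onto N(A)^perp, I_m - G_m projects onto N(A).
import numpy as np

A = np.array([[1.0, 1.0, 1.0],
              [1.0, 2.0, 4.0]])
m = A.shape[1]

G_m = A.T @ np.linalg.solve(A @ A.T, A)        # (1.27)
Q = np.eye(m) - G_m                            # (1.30)

assert np.allclose(G_m @ G_m, G_m)             # idempotent
assert np.allclose(Q @ Q, Q)                   # idempotent
assert np.allclose(G_m @ Q, np.zeros((m, m)))  # complementary projectors
```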
1-23 G_x-MINOS and its generalized inverse
A more formal version of the generalized inverse which is characteristic for G_x-MINOS is presented by

Lemma 1.4 (characterization of G_x-MINOS):
x_m = Ly is I-MINOS of the consistent system of linear equations (1.1) Ax = y, rk A = rk(A, y) (or y ∈ R(A)), if and only if the matrix L ∈ ℝ^{m×n} fulfils

ALA = A, LA = (LA)'.   (1.33)

The reflexive matrix L is the A⁻₁,₂,₄ generalized inverse. x_m = Ly is G_x-MINOS of the consistent system of linear equations (1.1) Ax = y, rk A = rk(A, y) (or y ∈ R(A)), if and only if the matrix L ∈ ℝ^{m×n} fulfils the two conditions

ALA = A, G_x LA = (G_x LA)'.   (1.34)
The reflexive matrix L is the G_x-weighted A⁻₁,₂,₄ generalized inverse.

:Proof:
According to the theory of the general solution of a system of linear equations, which is presented in Appendix A, the conditions ALA = A, or L = A⁻, guarantee the solution x = Ly of (1.1), rk A = rk(A, y). The general solution x = x_m + (I − LA)z with an arbitrary vector z ∈ ℝ^{m×1} leads to the appropriate representation of the G_x-seminorm by means of

‖x_m‖²_{G_x} = ‖Ly‖²_{G_x} ≤ ‖x‖²_{G_x} = ‖x_m + (I − LA)z‖²_{G_x}
= ‖x_m‖²_{G_x} + 2x_m'G_x(I − LA)z + ‖(I − LA)z‖²_{G_x}
= ‖Ly‖²_{G_x} + 2y'L'G_x(I − LA)z + ‖(I − LA)z‖²_{G_x}
= y'L'G_x Ly + 2y'L'G_x(I − LA)z + z'(I − A'L')G_x(I − LA)z,

which holds for arbitrary vectors y ∈ Y ≡ ℝ^n if and only if y'L'G_x(I − LA)z = 0 for all z ∈ ℝ^{m×1}, or A'L'G_x(I − LA) = 0, or A'L'G_x' = A'L'G_x LA. The right-
hand side is a symmetric matrix. Accordingly the left-hand side must have this property, too, namely G_x LA = (G_x LA)', which had to be shown. Reflexivity of the matrix L originates from the consistency condition, namely (I_n − AL)y = 0 for all y ∈ ℝ^{n×1}, or AL = I_n; the reflexive condition of the G_x-weighted minimum norm generalized inverse, G_x LAL = G_x L, is a direct consequence. Consistency of the normal equations (1.4), or equivalently the uniqueness of G_x x_m, follows from

G_x L₁y = A'L₁'G_x L₁y = G_x L₁AL₁y = G_x L₁AL₂y = A'L₁'A'L₂'G_x L₂y = A'L₂'G_x L₂y = G_x L₂y

for arbitrary matrices L₁ ∈ ℝ^{m×n} and L₂ ∈ ℝ^{m×n} which satisfy (1.34).   ∎
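The characterization of Lemma 1.4 is again easy to test numerically. The sketch below is my own, assuming NumPy, the front-page matrix and an arbitrary positive definite G_x.

```python
# Check that the weighted right inverse L = G_x^{-1}A'(A G_x^{-1} A')^{-1}
# is a G_x-weighted reflexive minimum norm g-inverse: ALA = A, G_x L A symmetric.
import numpy as np

A = np.array([[1.0, 1.0, 1.0],
              [1.0, 2.0, 4.0]])
G_x = np.diag([1.0, 2.0, 4.0])                 # assumed weight matrix

Gi = np.linalg.inv(G_x)
L = Gi @ A.T @ np.linalg.inv(A @ Gi @ A.T)     # (1.13)

assert np.allclose(A @ L @ A, A)               # ALA = A
S = G_x @ L @ A
assert np.allclose(S, S.T)                     # G_x L A = (G_x L A)'
assert np.allclose(L @ A @ L, L)               # reflexivity
```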
1-24 Eigenvalue decomposition of G_x-MINOS: canonical MINOS
In the empirical sciences we quite often meet the inverse problem to determine the infinite set of coefficients of a series expansion of a function or a functional (Taylor polynomials) from a finite set of observations. First example: determine the Fourier coefficients (discrete Fourier transform, trigonometric polynomials) of a harmonic function with circular support from observations in a one-dimensional lattice. Second example: determine the spherical harmonic coefficients (discrete Fourier-Legendre transform) of a harmonic function with spherical support from observations on a two-dimensional lattice. Both examples will be dealt with later on in a case study. Typically such expansions generate an infinite-dimensional linear model based upon orthogonal (orthonormal) functions. Naturally such a linear model is underdetermined since a finite set of observations is confronted with an infinite set of unknown parameters. In order to make such an infinite-dimensional linear model accessible to the computer, the expansion into orthogonal (orthonormal) functions is truncated or band-limited. Observables y ∈ Y, dim Y = n, are related to parameters x ∈ X, dim X = m ≥ n = dim Y, namely the unknown coefficients, by a linear operator A ∈ ℝ^{n×m} which is given in the form of an eigenvalue decomposition. We are confronted with the problem to construct "canonical MINOS", also called the eigenvalue decomposition of G_x-MINOS.

First, we intend to canonically represent the parameter space X as well as the observation space Y. Here we shall assume that both spaces are Euclidean,
equipped with a symmetric, positive definite matrix of the metric, G_x and G_y respectively. Enjoy the diagonalization procedure of both matrices reviewed in Box 1.8. The inner products aa' and bb', respectively, constitute the matrices of the metric G_x and G_y. The base vectors {a₁, ..., a_m | O} span the parameter space X, dim X = m, the base vectors {b₁, ..., b_n | O} the observation space Y, dim Y = n. Note the rank identities rk G_x = m, rk G_y = n, respectively. The left norm ‖x‖²_{G_x} = x'G_x x is taken with respect to the left matrix of the metric G_x. In contrast, the right norm ‖y‖²_{G_y} = y'G_y y refers to the right matrix of the metric G_y. In order to diagonalize the left quadratic form as well as the right quadratic form we transform G_x ↦ G*_x = Diag(λ_1^x, ..., λ_m^x) = V'G_x V (1.35), (1.37), (1.39), as well as G_y ↦ G*_y = Diag(λ_1^y, ..., λ_n^y) = U'G_y U (1.36), (1.38), (1.40), into the canonical form by means of the left orthonormal matrix V and the right orthonormal matrix U. Such a procedure is called "eigenspace analysis of the matrix G_x" as well as "eigenspace analysis of the matrix G_y". Λ_x constitutes the diagonal matrix of the left positive eigenvalues (λ_1^x, ..., λ_m^x), the m-dimensional left spectrum; Λ_y the diagonal matrix of the right positive eigenvalues (λ_1^y, ..., λ_n^y), the n-dimensional right spectrum. The inverse transformation G*_x = Λ_x ↦ G_x (1.41) as well as G*_y = Λ_y ↦ G_y (1.42) is denoted by "left eigenspace synthesis" as well as "right eigenspace synthesis".
Box 1.8: Canonical representation of the matrix of the metric, parameter space versus observation space

"parameter space X": span{a₁, ..., a_m} = X, ⟨a_{j₁} | a_{j₂}⟩ = g_{j₁j₂}, aa' = G_x, j₁, j₂ ∈ {1, ..., m}, rk G_x = m
"observation space Y": Y = span{b₁, ..., b_n}, ⟨b_{i₁} | b_{i₂}⟩ = g_{i₁i₂}, bb' = G_y, i₁, i₂ ∈ {1, ..., n}, rk G_y = n

"left norm" ‖x‖²_{G_x} = x'G_x x = (x*)'x*   versus   "right norm" (y*)'y* = y'G_y y = ‖y‖²_{G_y}

"eigenspace analysis of the matrix G_x":
G*_x = V'G_x V = Diag(λ_1^x, ..., λ_m^x) =: Λ_x   (1.35)
versus
"eigenspace analysis of the matrix G_y":
G*_y = U'G_y U = Diag(λ_1^y, ..., λ_n^y) =: Λ_y   (1.36)

subject to

VV' = V'V = I_m   (1.37)   versus   UU' = U'U = I_n   (1.38)
(G_x − λ_j^x I_m)v_j = 0   (1.39)   versus   (G_y − λ_i^y I_n)u_i = 0   (1.40)

"eigenspace synthesis of the matrix G_x":
G_x = VG*_x V' = VΛ_x V'   (1.41)
versus
"eigenspace synthesis of the matrix G_y":
G_y = UG*_y U' = UΛ_y U'.   (1.42)
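A small numerical illustration of Box 1.8 (my own sketch, not from the book; it assumes NumPy and an arbitrary positive definite metric matrix):

```python
# Eigenspace analysis/synthesis of a metric matrix and the canonical norm.
import numpy as np

G_x = np.array([[2.0, 1.0, 0.0],
                [1.0, 3.0, 1.0],
                [0.0, 1.0, 2.0]])            # symmetric positive definite

lam, V = np.linalg.eigh(G_x)                 # analysis: G*_x = V'G_x V = Lambda_x
Lambda_x = np.diag(lam)

assert np.allclose(V @ V.T, np.eye(3))       # VV' = V'V = I_m
assert np.allclose(V @ Lambda_x @ V.T, G_x)  # synthesis: G_x = V Lambda_x V'
assert np.all(lam > 0)

# canonical coordinates reproduce the G_x-norm as a plain Euclidean norm
x = np.array([1.0, -2.0, 0.5])
x_star = np.diag(np.sqrt(lam)) @ V.T @ x
assert np.isclose(x_star @ x_star, x @ G_x @ x)
```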
Second, we study the impact of the left diagonalization of the matrix of the metric G_x as well as the right diagonalization of the matrix of the metric G_y on the coordinates x ∈ X and y ∈ Y, the parameter systems of the left Euclidean space X, dim X = m, and of the right Euclidean space Y. Enjoy the way how we have established the canonical coordinates x* := [x₁*, ..., x_m*]' of X as well as the canonical coordinates y* := [y₁*, ..., y_n*]', called the left and right star coordinates of X and Y, respectively, in Box 1.9. In terms of those star coordinates (1.45) as well as (1.46), the left norm ‖x*‖² of type (1.43) as well as the right norm ‖y*‖² of type (1.44) take the canonical left and right quadratic form. The transformations x ↦ x* as well as y ↦ y* of type (1.45) and (1.46) are special versions of the left and right polar decomposition: a rotation constituted by the matrices {U, V} is followed by a stretch constituted by the diagonal matrices {Λ_x^{1/2}, Λ_y^{1/2}}. The forward transformations (1.45), (1.46), x ↦ x* and y ↦ y*, are inverted by the backward transformations (1.47), (1.48), x* ↦ x and y* ↦ y. Λ_x^{1/2} and Λ_y^{1/2}, respectively, denote those diagonal matrices which are generated by the positive roots of the left and right eigenvalues; (1.49)-(1.52) are the corresponding direct and inverse matrix identities. We conclude with the proof that the ansatz (1.45), (1.46) indeed leads to the canonical representation (1.43), (1.44) of the left and right norms.
Box 1.9: Canonical coordinates x* ∈ X and y* ∈ Y, parameter space versus observation space

"canonical coordinates of the parameter space" versus "canonical coordinates of the observation space":

‖x*‖² = (x*)'x* = x'G_x x = ‖x‖²_{G_x}   (1.43)   versus   ‖y*‖² = (y*)'y* = y'G_y y = ‖y‖²_{G_y}   (1.44)

ansatz:

x* = Λ_x^{1/2} V'x   (1.45)   versus   y* = Λ_y^{1/2} U'y   (1.46)
and, conversely,

x = VΛ_x^{-1/2} x*   (1.47)   versus   y = UΛ_y^{-1/2} y*   (1.48)

Λ_x^{1/2} := Diag(√λ_1^x, ..., √λ_m^x)   (1.49)   versus   Diag(√λ_1^y, ..., √λ_n^y) =: Λ_y^{1/2}   (1.50)
Λ_x^{-1/2} := Diag(1/√λ_1^x, ..., 1/√λ_m^x)   (1.51)   versus   Diag(1/√λ_1^y, ..., 1/√λ_n^y) =: Λ_y^{-1/2}   (1.52)

"the ansatz proof" (G_x = VΛ_x V'):
‖x‖²_{G_x} = x'G_x x = x'VΛ_x^{1/2}Λ_x^{1/2}V'x = (x*)'Λ_x^{-1/2}V'VΛ_x^{1/2}Λ_x^{1/2}V'VΛ_x^{-1/2}x* = (x*)'x* = ‖x*‖²

"the ansatz proof" (G_y = UΛ_y U'):
‖y‖²_{G_y} = y'G_y y = y'UΛ_y^{1/2}Λ_y^{1/2}U'y = (y*)'Λ_y^{-1/2}U'UΛ_y^{1/2}Λ_y^{1/2}U'UΛ_y^{-1/2}y* = (y*)'y* = ‖y*‖².
- 12
Third, let us discuss the dual operations of coordinate transformations x 6 x* , y 6 y * , namely the behavior of canonical bases, also called orthonormal bases e x , e y , or Cartan frames of reference e1x ,..., emx | 0 spanning the parameter space X as well as e1y ,..., eny | 0 spanning the observation space Y , here a 6 e x , b 6 e y . In terms of orthonormal bases e x and e y as outlined in Box 1.10, the matrix of the metric e x e xc = I m and e yce y = I n takes the canonical form (“modular”). Compare (1.53) with (1.55) and (1.54) with (1.56) are achieved by the changes of bases (“CBS”) of type left e x 6 a , a 6 ex - (1.57), (1.59) - and of type right e y 6 b , b 6 e y - (1.58), (1.60). Indeed these transformations x 6 x* , a 6 e x - (1.45), (1.57) - and y 6 y * , b 6 e y - (1.46), (1.58) - are dual or inverse.
{
}
{
}
Box 1.10: General bases versus orthonormal bases spanning the parameter space X as well as the observation space Y “left” “parameter space X ” “general left base”
“right” “observation space” “general right base”
span {a1 ,..., am } = X
Y = span {b1 ,..., bn }
30
1 The first problem of algebraic regression
: matrix of the metric : (1.54) bbc = G y
: matrix of the metric : (1.53) aac = G x “orthonormal left base”
{
x 1
span e ,..., e
x m
“orthonormal right base”
}=X
{
Y = span e1y ,..., eny
: matrix of the metric : (1.56) e y ecy = I n
: matrix of the metric : (1.55) e x ecx = I m
(1.57)
(1.59)
}
“base transformation”
“base transformation”
1 2
a = ȁ x Ve x
b = ȁ y Ue y
versus
versus
1 2
- 12
- 12
e y = Ucȁ y b
e x = V cȁ x a
{
(1.58)
}
{
span e1x ,..., emx = X
(1.60)
}
Y = span e1y ,..., eny .
Fourth, let us begin the eigenspace analysis versus eigenspace synthesis of the rectangular matrix A \ n× m , r := rk A = n , n < m . Indeed the eigenspace of the rectangular matrix looks differently when compared to the eigenspace of the quadratic, symmetric, positive definite matrix G x \ m × m , rk G x = m and G y \ n×n , rk G y = n of the left and right metric. At first we have to generalize the transpose of a rectangular matrix by introducing the adjoint operator A # which takes into account the matrices {G x , G y } of the left, right metric. Definition 1.5 of the adjoint operator A # is followed by its representation, namely Lemma 1.6. Definition 1.5 (adjoint operator A # ): The adjoint operator A # \ m× n of the matrix A \ n× m is defined by the inner product identity y | Ax G = x | A # y , (1.61) y
Gx
where the left inner product operates on the symmetric, full rank matrix G y of the observation space Y , while the right inner product is taken with respect to the symmetric full rank matrix G x of the parameter space X . Lemma 1.6 (adjoint operator A # ): A representation of the adjoint operator A # \ m × n of the matrix A \ n× m is A # = G -1x A cG y . (1.62)
31
1-2 The minimum norm solution: “MINOS”
For the proof we take advantage of the symmetry of the left inner product, namely y | Ax
Gy
= y cG y Ax
x | A#y
versus
Gx
= xcG x A # y
y cG y Ax = xcA cG y y = xcG x A # y A cG y = G x A # G x1A cG y = A # .
ƅ Five, we solve the underdetermined system of linear equations
{y = Ax | A \
n× m
}
, rk A = n, n < m
by introducing
• •
the eigenspace of the rectangular matrix A \ n× m of rank r := rk A = n , n < m : A 6 A* the left and right canonical coordinates: x o x* , y o y *
as supported by Box 1.11. The transformations (1.63) x 6 x* , (1.64) y 6 y * from the original coordinates ( x1 ,..., xm ) , the parameters of the parameter space X , to the canonical coordinates x1* ,..., xm* , the left star coordinates, as well as from the original coordinates ( y1 ,..., yn ) , the parameters of the observation space Y , to the canonical coordinates y1* ,..., yn* , the right star coordinates are polar decompositions: a rotation {U, V} is followed by a general stretch G y , G x . The matrices G y as well as G x are product decompositions of type G y = S y S yc and G x = S xcS x . If we substitute S y = G y or S x = G x symbolically, we are led to the methods of general stretches G y and G x respectively. Let us substitute the inverse transformations (1.65) x* 6 x = G x Vx* and (1.66) * * y 6 y = G y Uy into our system of linear equations (1.67) y = Ax or its dual (1.68) y * = A* x* . Such an operation leads us to (1.69) y * = f x* as well as (1.70) y = f ( x ) . Subject to the orthonormality conditions (1.71) U cU = I n and (1.72) V cV = I m we have generated the matrix A* of left–right eigenspace analysis (1.73)
(
)
(
{
1 2
1 2
}
1 2
)
1 2
1 2
1 2
1 2
1 2
1 2
1 2
( )
A* = [ ȁ, 0] subject to the horizontal rank partitioning of the matrix V = [ V1 , V2 ] . Alternatively, the left-right eigenspace synthesis (1.74) ªV c º A = G y U [ ȁ, 0 ] « 1 » G x «V c » ¬ 2¼ 1 2
1 2
- 12
is based upon the left matrix (1.75) L := G y U and the right matrix (1.76) R := G x V . Indeed the left matrix L by means of (1.77) LLc = G -1y reconstructs the inverse matrix of the metric of the observation space Y . Similarly, the right 1 2
32
1 The first problem of algebraic regression
matrix R by means of (1.78) RR c = G -1x generates the inverse matrix of the metric of the parameter space X . In terms of “L, R” we have summarized the eigenvalue decompositions (1.79)-(1.84). Such an eigenvalue decomposition helps us to canonically invert y * = A* x* by means of (1.85), (1.86), namely the rank partitioning of the canonical unknown vector x* into x*1 \ r and x*2 \ m r to determine x*1 = ȁ -1 y * , but leaving x*2 underdetermined. Next we shall proof that x*2 = 0 if x* is MINOS. A
X x
y Y
1 2
1 2
U cG y
V cG x
X x*
y* Y
A*
Figure 1.6: Commutative diagram of coordinate transformations Consult the commutative diagram for a short hand summary of the introduced transformations of coordinates, both of the parameter space X as well as the observation space Y . Box 1.11: Canonical representation, underdetermined system of linear equations “parameter space X ” versus “observation space Y ” (1.63) y * = U cG y y (1.64) x* = V cG x x and and 1 2
1 2
- 12
- 12
y = G y Uy *
x = G x Vx*
(1.65)
(1.66)
“underdetermined system of linear equations” y = Ax | A \ n× m , rk A = n, n < m
{
}
y = Ax
(1.67) - 12
- 12
G y Uy * = AG x Vx*
(
1 2
y * = A * x*
versus
- 12
)
(1.69) y * = U cG y AG x V x*
1 2
(1.68) 1 2
U cG y y = A* V cG x x
(
- 12
1 2
)
y = G y UA* V cG x x (1.70)
33
1-2 The minimum norm solution: “MINOS”
subject to U cU = UUc = I n
(1.71)
V cV = VV c = I m
versus
(1.72)
“left and right eigenspace” “left-right eigenspace “left-right eigenspace analysis” synthesis” A* = U cG y AG x [ V1 , V2 ] 1 2
(1.73)
1 2
ªV c º A = G y U [ ȁ, 0] « 1 » G x (1.74) «V c » ¬ 2¼ 1 2
= [ ȁ, 0]
1 2
“dimension identities” ȁ\
r×r
, 0 \ r × ( m r ) , r := rk A = n, n < m
V1 \ m × r , V2 \ m × ( m r ) , U \ r × r “left eigenspace” - 12 y
“right eigenspace” - 12
1 2
1 2
R := G x V R -1 = V cG x (1.76)
(1.75) L := G U L = U cG y -1
- 12
- 12
R 1 := G x V1 , R 2 := G x V2 1 2
1 2
R 1- := V1cG x , R -2 := V2cG x (1.77) LLc = G -1y (L-1 )cL-1 = G y (1.79)
A = LA* R -1 1
RR c = G -1x (R -1 )cR -1 = G x (1.78) versus
A* = L-1 AR A = [ ȁ, 0] =
(1.80)
*
ªR º (1.81) A = L [ ȁ, 0] « - » ¬« R 2 ¼»
versus
AA # L = Lȁ 2
versus
(1.83)
= L-1 A [ R 1 , R 2 ] ª A # AR 1 = R 1 ȁ 2 « # «¬ A AR 2 = 0
(1.82)
(1.84)
“underdetermined system of linear equations solved in canonical coordinates” (1.85)
ª x* º x* \ r ×1 y * = A* x* = [ ȁ, 0] « 1* » = ȁx*1 , * 1 ( m r )×1 x2 \ «¬ x 2 »¼ ª x*1 º ª ȁ -1 y * º « *» = « * » ¬« x 2 ¼» ¬ x 2 ¼
( )
“if x* is MINOS, then x*2 = 0 : x1*
(1.86)
m
= ȁ -1 y * .”
34
1 The first problem of algebraic regression
Six, we prepare ourselves for MINOS of the underdetermined system of linear equations
{y = Ax | A \
n× m
}
, rk A = n, n < m, || x ||G2 = min x
by introducing Lemma 1.7, namely the eigenvalue - eigencolumn equations of the matrices A # A and AA # , respectively, as well as Lemma 1.9, our basic result on “canonical MINOS”, subsequently completed by proofs. (eigenspace analysis versus eigenspace synthesis of the matrix A \ n× m , r := rkA = n < m ) The pair of matrices {L, R} for the eigenspace analysis and the eigenspace synthesis of the rectangular matrix A \ n× m of rank r := rkA = n < m , namely versus A* = L-1 AR A = LA* R -1 Lemma 1.7
{
}
or
or
A = [ ȁ, 0 ] = L A [ R 1 , R 2 ] *
-1
versus
ª R -1 º A = L [ ȁ, 0] « 1-1 » , ¬« R 2 ¼»
are determined by the eigenvalue – eigencolumn equations (eigenspace equations) of the matrices A # A and AA # , respectively, namely versus A # AR 1 = R 1 ȁ 2 AA # L = Lȁ 2 subject to ªO12 … 0 º « » ȁ 2 = « # % # » , ȁ = Diag + O12 ,..., + Or2 . « 0 " Or2 » ¬ ¼
)
(
Let us prove first AA # L = Lȁ 2 , second A # AR 1 = R 1 ȁ 2 . (i) AA # L = Lȁ 2 AA # L = AG -1x A cG y L = ªV c º ªȁº = L [ ȁ, 0] « 1 » G x G -1x (G x )c [ V1 , V2 ] « » U c(G y )cG y G y U, c 0 «V c » ¬ ¼ ¬ 2¼ 1 2
1 2
ª V cV AA # L = L [ ȁ, 0] « 1 1 «V cV ¬ 2 1 ªI AA # L = L [ ȁ, 0] « r ¬0
1 2
1 2
V1c V2 º ª ȁ º » « », V2c V2 »¼ ¬ 0c ¼ 0 º ªȁº . I m -r »¼ «¬ 0c »¼
ƅ
35
1-2 The minimum norm solution: “MINOS”
(ii) A # AR 1 = R 1 ȁ 2 A # AR = G -1x AcG y AR = ªȁº = G -1xG x V « » U c(G y )cG y G y U [ ȁ, 0] V cG x G x V, ¬ 0c ¼ ª ȁ 2 0º ªȁº A # AR = G x V « » [ ȁ, 0] = G x [ V1 , V2 ] « », ¬ 0c ¼ ¬ 0 0¼ 1 2
1 2
1 2
1 2
1 2
1 2
1 2
A # A [ R 1 , R 2 ] = G x ª¬ V1 ȁ 2 , 0 º¼ 1 2
A # AR 1 = R 1 ȁ 2 .
ƅ
{
}
The pair of eigensystems AA # L = Lȁ 2 , A # AR 1 = R 1 ȁ 2 is unfortunately based upon non-symmetric matrices AA # = AG -1x A cG y and A # A = G -1x A cG y A which make the left and right eigenspace analysis numerically more complex. It appears that we are forced to use the Arnoldi method rather than the more efficient Lanczos method used for symmetric matrices. In this situation we look out for an alternative. Indeed when we substitute
{L, R}
{
- 12
}
- 12
by G y U, G x V
- 12
into the pair of eigensystems and consequently left multiply AA # L by G x , we achieve a pair of eigensystems identified in Corollary 1.8 relying on symmetric matrices. In addition, such a symmetric pair of eigensystems produces the canonical base, namely orthonormal eigencolumns. Corollary 1.8 (symmetric pair of eigensystems): The pair of eigensystems (1.87)
1 2
1 2
G y AG -1x A c(G y )cU = ȁ 2 U versus 1 2
- 12
- 12
- 12
(G x )cA cG y AG x V1 = V1 ȁ 2 (1.88) - 12
- 12
(1.89) G y AG -1x Ac(G y )c Ȝ i2 I r = 0 versus (G x )cA cG y AG x Ȝ 2j I m = 0 (1.90) is based upon symmetric matrices. The left and right eigencolumns are orthogonal. Such a procedure requires two factorizations, 1 2
1 2
- 12
- 12
G x = (G x )cG x , G -1x = G x (G x )c
and
1 2
- 12
- 12
G y = (G y )cG y , G -1y = G y (G y )c
via Cholesky factorization or eigenvalue decomposition of the matrices G x and Gy .
36
1 The first problem of algebraic regression
Lemma 1.9 (canonical MINOS): Let y * = A* x* be a canonical representation of the underdetermined system of linear equations
{y = Ax | A \
n× m
}
, r := rkA = n, n < m .
Then the rank partitioning of x*m ª x* º ª ȁ -1 y * º x1* = ȁ -1 y * * or , x1 \ r ×1 , x*2 \ ( m r )×1 x*m = « *1 » = « » * x2 = 0 ¬x2 ¼ ¬ 0 ¼
(1.91)
is G x -MINOS. In terms of the original coordinates [ x1 ,..., xm ]c of the parameter space X a canonical representation of G x -MINOS is ª ȁ -1 º xm = G x [ V1 , V2 ] « » U cG y y , ¬ 0c ¼ 1 2
1 2
- 12
1 2
xm = G x V1 ȁ -1 U cG y = 5 1 ȁ -1 /-1 y. The G x -MINOS solution xm = A m- y - 12
1 2
A m- = G x V1 ȁ -1 U cG y is built on the canonical ( G x , G y ) weighted reflexive inverse of A . For the proof we depart from G x -MINOS (1.14) and replace the matrix A \ n× m by its canonical representation, namely eigenspace synthesis.
(
xm = G -1x Ac AG -1x Ac
)
-1
y
ªV c º A = G y U [ ȁ, 0 ] « 1 » G x «V c » ¬ 2¼ 1 2
1 2
ªVc º ªȁº AG -1x Ac = G y U [ ȁ, 0] « 1 » G x G -1x (G x )c [ V1 , V2 ] « » Uc(G y )c «V c » ¬0¼ ¬ 2¼ 1 2
1 2
- 12
- 12
1 2
(
AG -1x Ac = G y Uȁ 2 Uc(G y )c, AG -1x Ac
1 2
)
-1
( )
c = G y Uȁ -2 UcG y 1 2
( )c [V , V ] «¬ªȁ0 »¼º Uc (G )c (G )c Uȁ 1 2
xm = G -1x G x
1
2
- 12 y
1 2
y
1 2
-2
1 2
U cG y y
37
1-2 The minimum norm solution: “MINOS”
ª ȁ -1 º xm = G x [ V1 , V2 ] « » U cG y y ¬ 0 ¼ 1 2
1 2
- 12
1 2
xm = G x V1 ȁ -1 U cG y y = A m- y - 12
1 2
A m- = G x V1 ȁ -1 U cG y A1,2,4 G x
( G x weighted reflexive inverse of A ) ª x* º ª ȁ -1 º ª ȁ -1 º ª ȁ -1 y * º ƅ x*m = « *1 » = V cG x xm = « » U cG y y = « » y * = « ». ¬ 0 ¼ ¬ 0 ¼ ¬ 0 ¼ ¬x2 ¼ The important result of x*m based on the canonical G x -MINOS of {y * = A* x* | A* \ n× m , rkA* = rkA = n, n < m} needs a short comment. The rank partitioning of the canonical unknown vector x* , namely x*1 \ r , x*2 \ m r again paved the way for an interpretation. First, we acknowledge the “direct inversion” 1 2
1 2
(
)
x*1 = ȁ -1 y * , ȁ = Diag + O12 ,..., + Or2 , for instance [ x1* ,..., xr* ]c = [O11 y1 ,..., Or1 yr ]c . Second, x*2 = 0 , for instance [ xr*+1 ,..., xm* ]c = [0,..., 0]c introduces a fixed datum for the canonical coordinates ( xr +1 ,..., xm ) . Finally, enjoy the commutative diagram of Figure 1.7 illustrating our previously introduced transformations of type MINOS and canonical MINOS, by means of A m and ( A* )m . A m xm X Y y
1 2
1 2
UcG y
Y y*
V cG x
(A ) *
x*m X
m
Figure 1.7: Commutative diagram of inverse coordinate transformations Finally, let us compute canonical MINOS for the Front Page Example, specialized by G x = I 3 , G y = I 2 .
38
1 The first problem of algebraic regression
ª x1 º ª 2 º ª1 1 1 º « » y = Ax : « » = « » « x2 » , r := rk A = 2 ¬ 3 ¼ ¬1 2 4 ¼ « » ¬ x3 ¼ left eigenspace AA U = AAcU = Uȁ #
right eigenspace A # AV1 = A cAV1 = V1 ȁ 2
2
A # AV2 = A cAV2 = 0 ª2 3 5 º « 3 5 9 » = A cA « » «¬ 5 9 17 »¼
ª3 7 º AA c = « » ¬7 21¼ eigenvalues AA c Oi2 I 2 = 0
A cA O j2 I 3 = 0
O12 = 12 + 130, O22 = 12 130, O32 = 0 left eigencolumns 2 1
ª3 O (1st) « ¬ 7
7 º ª u11 º »« » = 0 21 O12 ¼ ¬u21 ¼
right eigencolumns ª 2 O12 « (1st) « 3 « 5 ¬
3 5 º ª v11 º » 2 5 O1 9 » «« v 21 »» = 0 9 17 O12 »¼ «¬ v31 »¼
subject to
subject to
2 u112 + u21 =1
2 v112 + v 221 + v31 =1
(3 O12 )u11 + 7u21 = 0
versus
ª(2 O12 )v11 + 3v 21 + 5v31 = 0 « 2 ¬3v11 + (5 O1 )v 21 + 9v31 = 0
49 49 ª 2 « u11 = 49 + (3 O 2 ) 2 = 260 + 18 130 1 « 2 2 « 2 (3 O1 ) 211 + 18 130 = «u21 = 2 2 O 49 + (3 ) 260 + 18 130 ¬« 1 2 ª v11 º 1 « 2» « v 21 » = (2 + 5O 2 ) 2 + (3 9O 2 ) 2 + (1 + 7O 2 O 4 ) 2 1 1 1 1 2 » « v31 ¬ ¼
ª (2 + 5O12 ) 2 º « » 2 2 « (3 9O1 ) » « (1 7O12 + O14 ) 2 » ¬ ¼
39
1-2 The minimum norm solution: “MINOS”
(
)
ª 62 + 5 130 2 º « » ªv º « 2» 1 « » « 105 9 130 » «v » = » « v » 102700 + 9004 130 « ¬ ¼ « 191 + 17 130 2 » ¬« ¼» 2 11 2 21 2 31
ª3 O22 (2nd) « ¬ 7
( (
ª 2 O22 7 º ª u12 º « = 0 (2nd) « 3 » 2»« 21 O2 ¼ ¬u22 ¼ « 5 ¬
) )
3 5 º ª v12 º » 2 5 O2 9 » «« v 22 »» = 0 9 17 O22 »¼ «¬ v32 »¼
subject to
subject to
u +u =1
2 v + v 222 + v32 =1
2 12
2 22
2 12
(3 O22 )u12 + 7u22 = 0
versus
ª (2 O22 )v12 + 3v 22 + 5v32 = 0 « 2 ¬ 3v12 + (5 O2 )v 22 + 9v32 = 0
49 49 ª 2 « u12 = 49 + (3 O 2 ) 2 = 260 18 130 2 « 2 2 « 2 (3 O2 ) 211 18 130 = «u22 = 2 2 + 49 (3 O ) «¬ 260 18 130 2 2 ª v12 º 1 « 2 » « v 22 » = (2 + 5O 2 ) 2 + (3 9O 2 ) 2 + (1 + 7O 2 O 4 ) 2 2 2 2 2 2 » « v32 ¬ ¼
(
ª (2 + 5O22 ) 2 º « » 2 2 « (3 9O2 ) » « (1 7O22 + O24 ) 2 » ¬ ¼
)
ª 62 5 130 2 º 2 « » ª v12 º « 2» 1 « 2 » « 105 + 9 130 » « v 22 » = 102700 9004 130 « » 2 » « v32 ¬ ¼ « 191 17 130 2 » «¬ »¼
( (
ª 2 3 5 º ª v13 º (3rd) «« 3 5 9 »» «« v 23 »» = 0 «¬ 5 9 17 »¼ «¬ v33 »¼
subject to
) )
2 v132 + v 223 + v33 =1
2v13 + 3v 23 + 5v33 = 0 3v13 + 5v 23 + 9v33 = 0 ª v13 º ª 2 3º ª v13 º ª 5º ª 5 3º ª 5º « 3 5» « v » = « 9» v33 « v » = « 3 2 » «9» v33 ¬ ¼ ¬ 23 ¼ ¬ ¼ ¬ ¼¬ ¼ ¬ 23 ¼ v13 = 2v33 , v 23 = 3v33
40
1 The first problem of algebraic regression
v132 =
2 9 1 2 2 , v 23 = , v33 = . 7 14 14
There are four combinatorial solutions to generate square roots. 2 ª u11 u12 º ª ± u11 « = «u » 2 ¬ 21 u22 ¼ «¬ ± u21
ª v11 «v « 21 «¬ v31
v12 v 22 v32
2 ª v13 º « ± v11 v 23 »» = « ± v 221 « v33 »¼ « ± v 2 31 ¬
± u122 º » 2 » ± u22 ¼ 2 ± v12
± v 222 2 ± v32
2 º ± v13 » ± v 223 » . » 2 » ± v33 ¼
Here we have chosen the one with the positive sign exclusively. In summary, the eigenspace analysis gave the result as follows. ȁ = Diag
( 12 +
130 , 12 130
7 ª « « 260 + 18 130 U=« « 211 + 18 130 « ¬ 260 + 18 130 ª 62 + 5 130 « « 102700 + 9004 130 « 105 + 9 130 « V=« « 102700 + 9004 130 « 191 + 17 130 « «« 102700 + 9004 130 ¬
)
7
º » 260 18 130 » » 211 + 18 130 » » 260 18 130 ¼
62 5 130 102700 9004 130 105 9 130 102700 9004 130 191 + 17 130 102700 9004 130
º 2 » » 14 » 3 » » = [ V1 , V2 ] . 14 » 1 » » 14 » »¼
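The closed-form eigenspace analysis above can be cross-checked with a few lines of code. This is my own sketch, not part of the original text; it assumes NumPy and the front-page values A = [[1,1,1],[1,2,4]], y = [2,3]'.

```python
# The squared singular values of A are the nonzero eigenvalues of AA',
# i.e. lambda^2 = 12 +/- sqrt(130); canonical MINOS (G_x = I_3, G_y = I_2)
# reproduces x_m = A'(AA')^{-1} y.
import numpy as np

A = np.array([[1.0, 1.0, 1.0],
              [1.0, 2.0, 4.0]])
y = np.array([2.0, 3.0])

lam2 = np.sort(np.linalg.eigvalsh(A @ A.T))[::-1]
assert np.allclose(lam2, [12.0 + np.sqrt(130.0), 12.0 - np.sqrt(130.0)])

U, s, Vt = np.linalg.svd(A, full_matrices=True)        # A = U [Lambda, 0] V'
x_m = Vt[:2].T @ np.diag(1.0 / s) @ U.T @ y            # x1* = Lambda^{-1} y*, x2* = 0
assert np.allclose(x_m, A.T @ np.linalg.solve(A @ A.T, y))
```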
1-3 Case study: Orthogonal functions, Fourier series versus Fourier-Legendre series, circular harmonic versus spherical harmonic regression In empirical sciences, we continuously meet the problems of underdetermined linear equations. Typically we develop a characteristic field variable into orthogonal series, for instance into circular harmonic functions (discrete Fourier transform) or into spherical harmonics (discrete Fourier-Legendre transform) with respect to a reference sphere. We are left with the problem of algebraic regression to determine the values of the function at sample points, an infinite set of coefficients of the series expansion from a finite set of observations. An infi-
nite set of coefficients, the coordinates in an infinite-dimensional Hilbert space, cannot be determined by finite computer manipulations. Instead, band-limited functions are introduced. Only a finite set of coefficients of a circular harmonic expansion or of a spherical harmonic expansion can be determined. It is the art of the analyst to fix the degree / order of the expansion properly. In a peculiar way the choice of the highest degree / order of the expansion is related to the Uncertainty Principle, namely to the width of lattice of the sampling points. Another aspect of any series expansion is the choice of the function space. For instance, if we develop scalar-valued, vector-valued or tensor-valued functions into scalar-valued, vector-valued or tensor-valued circular or spherical harmonics, we generate orthogonal functions with respect to a special inner product, also called “scalar product” on the circle or spherical harmonics are eigenfunctions of the circular or spherical Laplace-Beltrami operator. Under the postulate of the Sturm-Liouville boundary conditions the spectrum (“eigenvalues”) of the Laplace-Beltrami operator is positive and integer. The eigenvalues of the circular Laplace-Beltrami operator are l 2 for integer values l {0,1,..., f} , of the spherical Laplace-Beltrami operator k (k + 1), l 2 for integer values k {0,1,..., f} , l {k , k + 1,..., 1, 0,1,..., k 1, k} . Thanks to such a structure of the infinite-dimensional eigenspace of the Laplace-Beltrami operator we discuss the solutions of the underdetermined regression problem (linear algebraic regression) in the context of “canonical MINOS”. We solve the system of linear equations
{
}
{Ax = y | A \ n× m , rk A = n, n m} by singular value decomposition as shortly outlined in Appendix A. 1-31
Fourier series
? What are Fourier series ? Fourier series (1.92) represent the periodic behavior of a function x(O ) on a circle S1 . They are also called trigonometric series since trigonometric functions {1,sin O , cos O ,sin 2O , cos 2O ,...,sin AO , cos AO} represent such a periodic signal. Here we have chosen the parameter “longitude O ” to locate a point on S1 . Instead we could exchange the parameter O by time t , if clock readings would substitute longitude, a conventional technique in classical navigation. In such a setting, 2S O = Zt = t = 2SQ t , T t AO = AZt = 2S A = 2S AQ t T
42
1 The first problem of algebraic regression
longitude O would be exchanged by 2S , the product of ground period T and time t or by 2S , the product of ground frequency Q . In contrast, AO for all A {0,1,..., L} would be substituted by 2S the product of overtones A / T or AQ and time t . According to classical navigation, Z would represent the rotational speed of the Earth. Notice that A is integer, A Z . Box 1.12: Fourier series x(O ) = x1 + (sin O ) x2 + (cos O ) x3 +
(1.92)
+(sin 2O ) x4 + (cos 2O ) x5 + O3 (sin AO , cos AO ) x(O ) = lim
L of
+L
¦ e (O ) x
A = L
A
(1.93)
A
ª cos AO A > 0 « eA (O ) := « 1 A = 0 «¬sin A O A < 0. Example
(1.94)
(approximation of order three):
x (O ) = e0 x1 + e 1 x2 + e +1 x3 + e 2 x4 + e +2 x5 + O3 .
(1.95)
Fourier series (1.92), (1.93) can be understood as an infinite-dimensional vector space (linear space, Hilbert space) since the base functions (1.94) eA (O ) generate a complete orthogonal (orthonormal) system based on trigonometric functions. The countable base, namely the base functions eA (O ) or {1,sin O , cos O , sin 2O , cos 2O , ..., sin AO , cos AO} span the Fourier space L2 [0, 2S [ . According to the ordering by means of positive and negative indices { L, L + 1,..., 1, 0, +1, ..., L 1, L} (1.95) x (O ) is an approximation of the function x(O ) up to order three, also denoted by x L . Let us refer to Box 1.12 as a summary of the Fourier representation of a function x(O ), O S1 . Box 1.13: The Fourier space “The base functions eA (O ), A { L, L + 1,..., 1, 0, +1,..., L 1, L} , span the Fourier space L2 [ 0, 2S ] : they generate a complete orthogonal (orthonormal) system of trigonometric functions.” “inner product” : x FOURIER and y FOURIER : f
x y :=
1 1 ds * x( s*) y ( s*) = ³ s0 2S
2S
³ d O x(O ) 0
y (O )
(1.96)
43
1-3 Case study
“normalization” < eA (O ) | eA (O ) >:= 1
2
1 2S
2S
³ dO e
A1
(O ) eA (O ) = OA G A A 2
1
1 2
(1.97)
0
ª OA = 1 A1 = 0 subject to « 1 «¬ OA = 2 A1 z 0 1
1
“norms, convergence” || x ||
2
=
1 2S
2S 2 ³ d O x (O ) = lim
Lof
0
lim || x x L ||2 = 0
+L
¦Ox A
2 A
(1.98)
A=L
(convergence in the mean)
Lof
(1.99)
“synthesis versus analysis”
(1.100)
x = lim
L of
xA =
+L
¦e x
A = L
versus
A A
x = lim
L of
+L
1
¦Oe
A = L
A
1 := 2SOA
1 < eA | x >= OA 2S
(1.101)
³ dO e (O ) x (O ) A
0
< x | eA >
(1.102)
A
“canonical basis of the Hilbert space FOURIER” ª 2 sin AO A > 0 « e := « 1 A = 0 « ¬ 2 cos AO A < 0
A
(1.103)
(orthonormal basis) (1.104) (1.106)
1
eA
versus
eA = OA e*A
xA* = OA xA
versus
xA =
e*A =
OA
x = lim
L of
+L
¦e
A = L
* A
< x | e*A >
1
OA
xA*
(1.105) (1.107)
(1.108)
“orthonormality” < e*A (x) | e*A (x) >= G A A 1
2
1 2
(1.109)
44
1 The first problem of algebraic regression
Fourier space Lof FOURIER = span{e L , e L +1 ,..., e 1 , e0 , e1 ,..., e L 1 , e L } dim FOURIER = lim(2 L + 1) = f L of
“ FOURIER = HARM L ( S ) ”. 2
1
? What is an infinite dimensional vector space ? ? What is a Hilbert space ? ? What makes up the Fourier space ? An infinite dimensional vector space (linear space) is similar to a finite dimensional vector space: As in an Euclidean space an inner product and a norm is defined. While the inner product and the norm in a finite dimensional vector space required summation of their components, the inner product (1.96), (1.97) and the norm (1.98) in an infinite-dimensional vector space force us to integration. Indeed the inner products (scalar products) (1.96), (1.97) are integrals over the line element of S1r applied to the vectors x(O ) , y (O ) or eA , eA , respectively. Those integrals are divided by the length s of a total arc of S1r . Alternative representations of < x | y > and < eA | eA > (Dirac’s notation of brackets, decomposed into “bra” and “ket”) based upon ds = rd O , s = 2S r , lead us directly to the integration over S1 , the unit circle. 1
1
2
2
A comment has to be made to the normalization (1.97). Thanks to < eA (O ) | eA (O ) >= 0 for all A1 z A 2 , 1
2
for instance < e1 (O ) | e1 (O ) > = 0 , the base functions eA (O ) are called orthogonal. But according to < eA (O ) | eA (O ) > = 12 , for instance < e1 (O ) | e1 (O ) > = || e1 (O ) ||2 = 12 , < e 2 (O ) | e 2 (O ) > = || e 2 (O ) ||2 = 12 , they are not normalized to 1. A canonical basis of the Hilbert space FOURIER has been introduced by (1.103) e*A . Indeed the base functions e*A (O ) fulfil the condition (1.109) of orthonormality. The crucial point of an infinite dimensional vector space is convergency. When we write (1.93) x(O ) as an identity of infinite series we must be sure that the series converge. In infinite dimensional vector space no pointwise convergency is required. In contrast, (1.99) “convergence in the mean” is postulated. The norm (1.98) || x ||2 equals the limes of the infinite sum of the OA weighted, squared coordinate xA , the coefficient in the trigonometric function (1.92),
45
1-3 Case study
|| x ||2 = lim
L of
+L
¦Ox
A = L
2 A A
< f,
which must be finite. As soon as “convergence in the mean” is guaranteed, we move from a pre-Fourier space of trigonometric functions to a Fourier space we shall define more precisely lateron. Fourier analysis as well as Fourier synthesis, represented by (1.100) versus (1.101), is meanwhile well prepared. First, given the Fourier coefficients x A we are able to systematically represent the vector x FOURIER in the orthogonal base eA (O ) . Second, the projection of the vector x FOURIER onto the base vectors eA (O ) agrees analytically to the Fourier coefficients as soon as we take into account the proper matrix of the metric of the Fourier space. Note the reproducing representation (1.37) “from x to x ”. The transformation from the orthogonal base eA (O ) to the orthonormal base e*A , also called canonical or modular as well as its inverse is summarized by (1.104) as well as (1.105). The dual transformations from Fourier coefficients x A to canonical Fourier coefficients x*A as well as its inverse is highlighted by (1.106) as well as (1.107). Note the canonical reproducing representation (1.108) “from x to x ”. The space ª FOURIER = span {e L , e L +1 ,..., e L 1 , e L }º L of « » « » L + 1) = f » «¬ dim FOURIER = Llim(2 of ¼ has the dimension of hyperreal number f . As already mentioned in the introduction FOURIER = HARM L ( S ) 2
1
is identical with the Hilbert space L2 (S1 ) of harmonic functions on the circle S1 . ? What is a harmonic function which has the unit circle S1 as a support ? A harmonic function “on the unit circle S1 ” is a function x(O ) , O S1 , which fulfils (i) the one-dimensional Laplace equation (the differential equation of a harmonic oscillator) and (ii) a special Sturm-Liouville boundary condition. (1st) '1 x(O ) = 0 (
d2 + Z 2 ) x (O ) = 0 dO2
46
1 The first problem of algebraic regression
x(0) = x(2S ) ª « (2nd) «[ d x(O )](0) = [ d x(O )](2S ). «¬ d O dO The special Sturm-Liouville equations force the frequency to be integer, shortly proven now. ansatz: x(O ) = cZ cos ZO + sZ sin ZO x(0) = x(2S ) cZ = cZ cos 2SZ + sZ sin 2SZ [
d d x(O )](0) = [ x(2S )](2S ) dO dO
sZZ = cZZ sin 2SZ + sZZ cos 2SZ
cos 2SZ = 0 º Z = A A {0,1,..., L 1, L} . sin 2SZ = 0 »¼
Indeed, Z = A , A {0,1,..., L 1, L} concludes the proof. Box 1.14: Fourier analysis as an underdetermined linear model “The observation space Y ” ª y1 º ª x(O1 ) º « y » « x (O ) » 2 » « 2 » « « # » := « # » = [ x(Oi ) ] i {1,.., I }, O [ 0, 2S ] « » « » « yn 1 » « x(On 1 ) » «¬ yn »¼ «¬ x(On ) »¼
(1.110)
dim Y = n I “equidistant lattice on S1 ”
Oi = (i 1)
2S i {1,..., I } I
(1.111)
Example ( I = 2) : O1 = 0, O2 = S 180° Example ( I = 3) : O1 = 0, O2 = Example ( I = 4) : O1 = 0, O2 =
2S 4
6S 5
120°, O3 =
4S 3
240°
90°, O3 = S 180°, O4 =
Example ( I = 5) : O1 = 0, O2 =
O4 =
2S 3
2S 5
216°, O5 =
72°, O3 = 8S 5
288°
4S 5
3S 2
144°,
270°
47
1-3 Case study
“The parameter space X ” x1 = x0 , x2 = x1 , x3 = x+1 , x4 = x2 , x5 = x+2 ,..., xm 1 = x L , xm = xL (1.112) dim X = m 2 L + 1 “The underdetermined linear model” n < m : I < 2L + 1 cos O1 ª y1 º ª1 sin O1 « y » «1 sin O cos O2 2 « 2 » « y := « ... » = « « » « « yn 1 » «1 sin On 1 cos On 1 «¬ yn »¼ «¬1 sin On cos On
... ...
sin LO1 sin LO2
... sin LOn 1 ... sin LOn
cos LO1 º ª x1 º cos LO2 »» «« x2 »» » « ... » . (1.113) »« » cos LOn 1 » « xm 1 » cos LOn »¼ «¬ xm »¼
? How can we setup a linear model for Fourier analysis ? The linear model of Fourier analysis which relates the elements x X of the parameter space X to the elements y Y of the observation space Y is setup in Box 1.14. Here we shall assume that the observed data have been made available on an equidistant angular grid, in short “equidistant lattice” of the unit circle S1 parameterized by ( O1 ,..., On ) . For the optimal design of the Fourier linear model it has been proven that the equidistant lattice 2S i {1,..., I } Oi = (i 1) I is “D-optimal”. Box 1.14 contains three examples for such a lattice. In summary, the finite dimensional observation space Y , dim Y = n , n = I , has integer dimension I . I =2 0° 180° 360° level L = 0
I =3 0°
level L = 1
120°
240°
360°
level L = 2 level L = 3
I =4 0°
90° 180° 270° 360°
I =5 0° 72° 144° 216° 288° 360° Figure 1.8: Fourier series, Pascal triangular graph, weights of the graph: unknown coefficients of Fourier series
Figure 1.9: Equidistant lattice on S1 I = 2 or 3 or 4 or 5
48
1 The first problem of algebraic regression
In contrast, the parameter space X , dim X = f , is infinite dimensional. The unknown Fourier coefficients, conventionally collected in a Pascal triangular graph of Figure 1.8, are vectorized by (1.112) in a peculiar order. X = span{x0 , x1 , x+1 ,..., x L , x+ L } L of
dim X = m = f . Indeed, the linear model (1.113) contains m = 2 L + 1 , L o f , m o f , unknowns, a hyperreal number. The linear operator A : X o Y is generated by the base functions of lattice points. yi = y (Oi ) = lim
L of
L
¦ e (O ) x
A = L
A
i
A
i {1,..., n}
is a representation of the linear observational equations (1.113) in Ricci calculus which is characteristic for Fourier analysis. number of observed data at lattice points
versus
number of unknown Fourier coefficients
n=I
m = 2L + 1 o f
(finite)
(infinite)
Such a portray of Fourier analysis summarizes its peculiarities effectively. A finite number of observations is confronted with an infinite number of observations. Such a linear model of type “underdetermined of power 2” cannot be solved in finite computer time. Instead one has to truncate the Fourier series, a technique or approximation to make up Fourier series “finite” or “bandlimited”. We have to consider three cases. n>m
n=m
n<m
overdetermined case
regular case
underdetermined case
First, we can truncate the infinite Fourier series such that n > m holds. In this case of an overdetermined problem , we have more observations than equations. Second, we alternatively balance the number of unknown Fourier coefficients such that n = m holds. Such a model choice assures a regular linear system. Both linear Fourier models which are tuned to the number of observations suffer from a typical uncertainty. What is the effect of the forgotten unknown Fourier coefficients m > n ? Indeed a significance test has to decide upon any truncation to be admissible. We are in need of an objective criterion to decide upon the degree m of bandlimit. Third, in order to be as objective as possible we follow the third case of “less observations than unknowns” such that
49
1-3 Case study
n < m holds. Such a Fourier linear model which generates an underdetermined system of linear equations will consequently be considered. The first example (Box 1.15: n m = 1 ) and the second example (Box 1.16: n m = 2 ) demonstrate “MINOS” of the Fourier linear model. Box 1.15: The first example: Fourier analysis as an underdetermined linear model: n rk A = n m = 1, L = 1 “ dim Y = n = 2, dim X = m = 3 ” ªx º cos O1 º « 1 » x y = Ax cos O2 »¼ « 2 » «¬ x3 »¼
ª y1 º ª1 sin O1 « y » = «1 sin O ¬ 2¼ ¬ 2
Example ( I = 2) : O1 = 0°, O2 = 180° sin O1 = 0, cos O1 = 1,sin O2 = 0, cos O2 = 1 ª1 0 1 º ª1 sin O1 A := « »=« ¬1 0 1¼ ¬1 sin O2
cos O1 º \ 2× 3 cos O2 »¼
AA c = 2I 2 ( AA c) 1 = 12 I 2 2 1 + sin O1 sin O2 + cos O1 cos O2 º ª AA c = « » 2 ¬1 + sin O2 sin O1 + cos O2 cos O1 ¼ if Oi = (i O )
2S , then I
1 + 2sin O1 sin O2 + 2 cos O1 cos O2 = 0 or L = 1:
+L
¦ e (O A
A = L
L = 1:
i1
+L
¦ e (O
A = L
A
i1
)eA (Oi ) = 0 i1 z i2 2
)eA (Oi ) = L + 1 i1 = i2 2
ª x1 º ª1 1 º ª y1 + y2 º 1« 1« « » » 1 x A = « x2 » = A c( AA c) y = « 0 0 » y = « 0 »» 2 2 «¬ x3 »¼ A «¬1 1»¼ «¬ y1 y2 »¼ || x A ||2 = 12 y cy .
50
1 The first problem of algebraic regression
Box 1.16: The second example: Fourier analysis as an underdetermined linear model: n rk A = n m = 2, L = 2 “ dim Y = n = 3, dim X = m = 5 ”
ª y1 º ª1 sin O1 « y » = «1 sin O 2 « 2» « «¬ y3 »¼ «¬1 sin O3
cos O1 cos O2
sin 2O1 sin 2O2
cos O3
sin 2O3
ª x1 º cos 2O1 º «« x2 »» cos 2O2 »» « x3 » « » cos 2O3 »¼ « x4 » «¬ x5 »¼
Example ( I = 3) : O1 = 0°, O2 = 120° , O3 = 240° sin O1 = 0,sin O2 =
1 2
3,sin O3 = 12 3
cos O1 = 1, cos O2 = 12 , cos O3 = 12 sin 2O1 = 0,sin 2O2 = 12 3,sin 2O3 =
1 2
3
cos 2O1 = 1, cos 2O2 = 12 , cos 2O3 = 12 0 1 ª1 « 1 A := «1 2 3 12 « 1 1 ¬1 2 3 2
0 1 2
1 2
1º » 3 12 » » 3 12 ¼
AA c = 3I 3 ( AAc) 1 = 13 I 3 AA c = ª « « «1 + sin O « + sin 2O «1 + sin O « + sin 2O ¬
1 + sin O1 sin O2 + cos O1 cos O2 + + sin 2O1 sin 2O2 + cos 2O1 cos 2O 2
1 + sin O1 sin O3 + cos O1 cos O3 + + sin 2O1 sin 2O3 + cos 2O1 cos 2O3
sin O1 + cos O2 cos O1 + sin 2O1 + cos 2O2 cos 2O1
3
1 + sin O2 sin O3 + cos O2 cos O3 + + sin 2O2 sin 2O3 + cos 2O 2 cos 2O3
sin O1 + cos O3 cos O1 + sin 2O1 + cos 2O3 cos 2O1
1 + sin O3 sin O2 + cos O3 cos O 2 + + sin 2O3 sin 2O2 + cos 2O3 cos 2O2
3
3
2
2
3
3
if Oi = (i 1)
2S , then I
1 + sin O1 sin O2 + cos O1 cos O2 + sin 2O1 sin 2O2 + cos 2O1 cos 2O2 = = 1 12 12 = 0
º » » » » » » ¼
51
1-3 Case study
1 + sin O1 sin O3 + cos O1 cos O3 + sin 2O1 sin 2O3 + cos 2O1 cos 2O3 = = 1 12 12 = 0 1 + sin O2 sin O3 + cos O2 cos O3 + sin 2O2 sin 2O3 + cos 2O2 cos 2O3 = = 1 34 14 14 + 14 = 0 L = 2:
+L
¦ e (O
A = L
L = 2:
A
i1
+L
¦ e (O
A = L
A
i1
)eA (Oi ) = 0 i1 z i2 2
)eA (Oi ) = L + 1 i1 = i2 2
1 1 º ª1 « » 1 1 «0 2 3 2 3 » ª y1 º 1 12 12 » «« y2 »» x A = Ac( AAc) 1 y = «1 « » 3 «0 12 3 12 3 » «¬ y3 »¼ « » 12 12 »¼ «¬1 ª y1 + y2 + y3 º ª x1 º « 1 » «x » 1 « 2 3 y2 2 3 y3 » « 2» 1 x A = « x3 » = « y1 12 y2 12 y3 » , » 3« « » « 12 3 y2 + 12 3 y3 » « x4 » « » «¬ x5 »¼ «¬ y1 12 y2 12 y3 »¼ A
1 || x ||2 = y cy . 3
Lemma 1.10 (Fourier analysis): If finite Fourier series ª x1 º « x2 » « x3 » yi = y (Oi ) = [1,sin Oi , cos Oi ,..., cos( L 1)Oi ,sin LOi , cos LOi ] « # » (1.114) « xm 2 » «x » « xm 1 » ¬ m ¼ or y = Ax, A \ n× m , rk A = n, I = n < m = 2 L + 1 A O ( n ) := {A \ n × m | AA c = ( L + 1)I n }
(1.115)
are sampled at observations points Oi S1 on an equidistance lattice (equiangular lattice)
52
1 The first problem of algebraic regression
Oi = (i 1)
2S I
i, i1 , i2 {1,..., I } ,
(1.116)
then discrete orthogonality AA c = ( L + 1)I n
+L
¦ e (O A
A=L
i1
ª 0 i1 z i2 )eA (Oi ) = « ¬ L + 1 i1 = i2 2
(1.117)
applies. A is an element of the orthogonal group O(n) . MINOS of the underdetermined system of linear equations (1.95) is xm = 1-32
1 A cy, L +1
xm
2
=
1 y cy. L +1
(1.118)
Fourier-Legendre series ? What are Fourier-Legendre series ?
Fourier-Legendre series (1.119) represent the periodic behavior of a function x(O , I ) on a sphere S 2 . They are called spherical harmonic functions since {1, P11 (sin I ) sin O , P10 (sin I ), P11 (sin I ) cos I ,..., Pkk (sin I ) cos k O} represent such a periodic signal. Indeed they are a pelicular combination of Fourier’s trigonometric polynomials {sin AO , cos AO} and Ferrer’s associated Legendre polynomials Pk A (sin I ) . Here we have chosen the parameters “longitude O and latitude I ” to locate a point on S 2 . Instead we could exchange the parameter O by time t , if clock readings would submit longitude, a conventional technique in classical navigation. In such a setting,
O = Zt =
2S t = 2SQ t , T
t = 2S AQ t , T longitude O would be exchanged by 2S the product of ground period T and time t or by 2S the product of ground frequency Q . In contrast, AO for all A { k , k + 1,..., 1, 0,1,..., k 1, k} would be substituted by 2S the product of overtones A / T or AQ and time t . According to classical navigation, Z would represent the rotational speed of the Earth. Notice that both k , A are integer, k, A Z . AO = AZt = 2S A
Box 1.17: Fourier-Legendre series x (O , I ) =
(1.119)
P00 (sin I ) x1 + P11 (sin I ) sin O x2 + P10 (sin I ) x3 + P1+1 (sin I ) cos O x4 + + P2 2 (sin I ) sin 2O x5 + P21 (sin I ) sin O x6 + P20 (sin I ) x7 + P21 (sin I ) cos O x8 +
53
1-3 Case study
+ P22 (sin I ) cos 2O x9 + O3 ( Pk A (sin I )(cos AO ,sin AO )) K
+k
x(O , I ) = lim ¦ ¦ e k A (O , I ) xk A
(1.120)
Pk A (sin I ) cos AO A > 0 ° ek A (O , I ) := ® Pk 0 (sin I ) A = 0 ° P (sin I ) sin | A | O A < 0 ¯ kA
(1.121)
K of
K
k = 0 A = k
k
x(O , I ) = lim ¦ ¦ Pk A (sin I )(ck A cos AO + sk A sin AO ) K of
(1.122)
k = 0 A = k
“Legendre polynomials of the first kind” recurrence relation k Pk (t ) = (2k 1) t Pk 1 (t ) (k 1) Pk 2 (t ) º » initial data : P0 (t ) = 1, P1 (t ) = t ¼ Example: 2 P2 (t ) = 3tP1 (t ) P0 (t ) = 3t 2 1 P2 (t ) = 32 t 2 12 “if t = sin I , then P2 (sin I ) = 32 sin 2 I 12 ” “Ferrer’s associates Legendre polynomials of the first kind” Pk A (t ) := (1 t 2 )
l
2
d A Pk (t ) dt A
Example: P11 (t ) = 1 t 2
d P1 (t ) dt
P11 (t ) = 1 t 2 “if t = sin I , then P11 (sin I ) = cos I ” Example: P21 (t ) = 1 t 2 P21 (t ) = 1 t 2
d P2 (t ) dt
d 3 2 1 ( t 2) dt 2
P21 (t ) = 3t 1 t 2
(1.123)
54
1 The first problem of algebraic regression
“if t = sin I , then P21 (sin I ) = 3sin I cos I ” Example: P22 (t ) = (1 t 2 )
d2 P2 (t ) dt 2
P22 (t ) = 3(1 t 2 ) “if t = sin I , then P22 (sin I ) = 3cos 2 I ” Example (approximation of order three): x (O , I ) = e00 x1 + e11 x2 + e10 x3 + e11 x4 +
(1.124)
+e 2 2 x5 + e 21 x6 + e 20 x7 + e 21 x8 + e 22 x9 + O3 recurrence relations vertical recurrence relation Pk A (sin I ) = sin I Pk 1, A (sin I ) cos I ¬ª Pk 1, A +1 (sin I ) Pk 1, A 1 (sin I ) ¼º initial data: P0 0 (sin I ) = 1, Pk A = Pk A A < 0 k 1, A 1
k 1, A
k 1, A + 1
k,A Fourier-Legendre series (1.119) can be understood as an infinite-dimensional vector space (linear space, Hilbert space) since the base functions (1.120) e k A (O , I ) generate a complete orthogonal (orthonormal) system based on surface spherical functions. The countable base, namely the base functions e k A (O , I ) or {1, cos I sin O ,sin I , cos I cos O ,..., Pk A (sin I ) sin AO} span the Fourier-Legendre space L2 {[0, 2S [ × ] S / 2, +S / 2[} . According to our order xˆ(O , I ) is an approximation of the function x(O , I ) up to order Pk A (sin I ) {cos AO ,sin A O } for all A > 0, A = 0 and A < 0, respectively. Let us refer to Box 1.17 as a summary of the Fourier-Legendre representation of a function x(O , I ), O [0, 2S [, I ] S/2, +S/2[.
55
1-3 Case study
Box 1.18: The Fourier-Legendre space “The base functions e k A (O , I ) , k {1,..., K } , A { K , K + 1,..., 1, 0,1,..., K 1, K } span the Fourier-Legendre space L2 {[0, 2S ] ×] S 2, + S 2[} : they generate a complete orthogonal (orthonormal) system of surface spherical functions.” “inner product” : x FOURIER LEGENDRE and y FOURIER LEGENDRE : 1 dS x(O , I )T y (O , I ) = S³
< x | y >= =
1 4S
(1.125)
+S 2
2S
³ dO 0
dI cosI x(O , I )y (O , I )
³
S 2
“normalization” <: e k A (O , I ) | e k A > (O , I ) > = 1 1
2 2
1 4S
³ dO 0
= Ok A G k k G A A 1 1
1 2
1 1
³
dI cos I e k A (O , I )e k A (O , I ) (1.126) 1 1
2 2
S 2
1 2
Ok A =
+S 2
2S
k1 , k2 {0,..., K } A 1 , A 2 { k ,..., + k}
1 (k1 A1 )! 2k1 + 1 (k1 + A1 )!
(1.127)
“norms, convergence” +S 2 2S 1 O || x ||2 = d dI cos I x 2 (O , I ) = ³ 4S ³0 S 2 K
(1.128)
+k
= lim ¦ ¦ Ok A x k2A < f K of
k = 0 A = k
lim || x x K ||2 = 0 (convergence in the mean)
K of
(1.129)
“synthesis versus analysis” K
+k
(1.130) x = lim ¦ ¦ e k A xk A K of
xk A =
versus
k = 0 A = k
:=
1 4SOk A
2S
³ dO 0
1 < e k A | x >:= O
(1.131)
+S 2
³
S 2
dI cos I e k A (O , I )x(O , I )
56
1 The first problem of algebraic regression +k
K
1 ekA < x | ekA > K of k = 0 A = k Ok A
x = lim ¦ ¦
(1.132)
“canonical basis of the Hilbert space FOURIER-LEGENDRE” ª 2 cos AO A > 0 « 1 e (O , I ) := 2k + 1 Pk A (sin I ) « A = 0 (k A )! « ¬ 2 sin A O A < 0 (k + A )!
* kA
(1.133)
(orthonormal basis) 1
(1.134)
e*k A =
ek A
versus
e k A = Ok A e*k A
(1.136)
xk*A = Ok A xk A
versus
xk A =
Ok A
K
1
Ok A
xk*A
(1.137)
+k
x = lim ¦ ¦ e*k A < x | e*k A > K of
(1.135)
(1.138)
k = 0 A = k
“orthonormality” < e*k A (O , I ) | e*k A (O , I ) > = G k k G A A 1 2
(1.139)
1 2
Fourier-Legendre space K of FOURIER LEGENDRE = span {e K , L , e K , L +1 ,..., e K , 1 , e K ,0 , e K ,1 ,..., e K , L 1 , e K , L } dim FOURIER LEGENDRE = lim ( K + 1) 2 = f K of
“ FOURIER LEGENDRE = HARM L ( S ) ”. 2
2
An infinite-dimensional vector space (linear space) is similar to a finitedimensional vector space: As in an Euclidean space an inner product and a norm is defined. While the inner product and the norm in a finite-dimensional vector space required summation of their components, the inner product (1.125), (1.126) and the norm (1.128) in an infinite-dimensional vector space force us to integration. Indeed the inner products (scalar products) (1.125) (1.126) are integrals over the surface element of S 2r applied to the vectors x(O , I ), y (O , I ) or e k A , e k A respectively. 1 1
2 2
Those integrals are divided by the size of the surface element 4S of S 2r . Alternative representations of < x, y > and <e k A , e k A > (Dirac’s notation of a bracket 1 1
2 2
57
1-3 Case study
decomposed into “bra” and “txt” ) based upon dS = rd O dI cos I , S = 4S r 2 , lead us directly to the integration over S 2r , the unit sphere. Next we adopt the definitions of Fourier-Legendre analysis as well as FourierLegendre synthesis following (1.125) - (1.139). Here we concentrate on the key problem: ?What is a harmonic function which has the sphere S 2 as a support? A harmonic function “on the unit sphere S 2 ” is a function x(O, I), (O, I) [0, 2S[ × ] S / 2, +S / 2[ which fulfils (i)
the two-dimensional Laplace equation (the differential equation of a two-dimensional harmonic osculator) and
(ii)
a special Sturm-Liouville boundary condition (1st) ' k A x(O , I ) = 0 (
d2 + Z ) x(O ) = 0 dO2
plus the harmonicity condition for ' k ª x(0) = x(2S ) d (2nd) « d «¬[ d O x(O )](0) = [ d O x(O )](2S ). The special Strum-Liouville equation force the frequency to be integer! Box 1.19: Fourier-Legendre analysis as an underdetermined linear model - the observation space Y “equidistant lattice on S 2 ” (equiangular) S S O [0, 2S [, I ] , + [ 2 2 2S ª O = (i 1) i {1,..., I } I = 2J : « i I « j {1,..., I } Ij ¬ S S J ª k {1,..., } « Ik = J + (k 1) J 2 J even: « «Ik = S (k 1) S k { J + 2 ,..., J } «¬ J J 2 S J +1 ª « Ik = (k 1) J + 1 k {1,..., 2 } J odd: « J +3 «Ik = (k 1) S k { ,..., J } J +1 2 ¬«
58
1 The first problem of algebraic regression
longitudinal interval: 'O := Oi +1 Oi =
2S I
S ª « J even : 'I := I j +1 I j = J lateral interval: « « J odd : 'I := I j +1 I j = S «¬ J +1 “initiation: choose J , derive I = 2 J ” 'I J ª k {1,..., } « Ik = 2 + (k 1)'I 2 J even: « ' I J + 2 «Ik = (k 1)'I k { ,..., J } 2 2 ¬ J +1 ª k {1,..., } « Ik = (k 1)'I 2 J odd: « «Ik = (k 1)'I k { J + 3 ,..., J } ¬ 2 Oi = (i 1)'O i {1,..., I } and I = 2 J “multivariate setup of the observation space X ” yij = x(Oi , I j ) “vectorizations of the matrix of observations” Example ( J = 1, I = 2) : Sample points Observation vector y (O1 , I1 ) ª x(O1 , I1 ) º 2×1 « x (O , I ) » = y \ (O2 , I1 ) ¬ 2 1 ¼ Example ( J = 2, I = 4) : Sample points Observation vector y (O1 , I1 ) ª x(O1 , I1 ) º « x (O , I ) » (O2 , I1 ) « 2 1 » (O3 , I1 ) « x(O3 , I1 ) » « » (O4 , I1 ) « x(O4 , I1 ) » = y \8×1 (O1 , I2 ) « x(O1 , I2 ) » « » (O2 , I2 ) « x(O2 , I2 ) » « x (O , I ) » (O3 , I2 ) « 3 2 » (O4 , I2 ) ¬« x(O4 , I2 ) ¼» Number of observations: n = IJ = 2 J 2 Example: J = 1 n = 2, J = 3 n = 18 J = 2 n = 8, J = 4 n = 32.
59
1-3 Case study
?How can we setup a linear model for Fourier-Legendre analysis? The linear model of Fourier-Legendre analysis which relates the elements of the parameter space X to the elements y Y of the observations space Y is again setup in Box 1.19. Here we shall assume that the observed data have been made available on a special grid which extents to O [0, 2S[, I ] S / 2, +S / 2[ 2S ª O = (i 1) , i {1,..., I } I = 2J : « i I « Ij, j {1,..., I }! ¬ longitudinal interval: 2S 'O =: O i +1 O i = I lateral interval: S J even: 'I =: I j + i I j = J S J odd: 'I =: I j +1 I j = . J +1 In addition, we shall review the data sets fix J even as well as for J odd. Examples are given for (i) J = 1, I = 2 and (ii) J = 2, I = 4 . The number of observations which correspond to these data sets have been (i) n = 18 and (ii) n = 32 . For the optimal design of the Fourier-Legendre linear model it has been shown that the equidistant lattice ª J even: 2S O i = (i 1) , I j = « I ¬ J odd: ª S S J½ «Ik = J + (k 1) J , k ®1,..., 2 ¾ ¯ ¿ J even: « « S S J +2 ½ ,..., J ¾ «Ik = (k 1) , k ® J J ¯ 2 ¿ ¬ ª S J + 1½ «Ik = (k 1) J + 1 , k ®1,..., 2 ¾ ¯ ¿ J odd: « « S J +3 ½ , k ® ,..., J ¾ «Ik = (k 1) J + 1 2 ¯ ¿ ¬ is “D-optimal”. Table 1.1 as well Table 1.2 are samples of an equidistant lattice on S 2 especially in a lateral and a longitudinal lattice.
60
1 The first problem of algebraic regression
Table 1.1: Equidistant lattice on S 2 - the lateral lattice J 1 2 3 4 5 6 7 8 9 10
'I
1
2
3
lateral grid 5 6
4
0° +45° -45° 90° 0° +45° -45° 45° 45° +22,5° +67.5° -22.5° -67.5° 0° +30° +60° -30° 30° 15° +45° +75° -15° 30° 0° +22.5° +45° +67.5° 22.5° 22.5° +11.25° +33.75° +56.25° +78.75° 0° +18° +36° +54° 18° +90° +27° +45° +63° 18°
7
8
9
10
-60° -45°
-75°
-22.5°
-45°
-67.5°
-11.25° -33.75° -56.25° -78.75° +72°
-18°
-36°
-54°
-72°
+81°
-9°
-27°
-45°
-63°
-81°
Longitudinal grid 5 6 7
8
9
10
288°
324°
Table 1.2: Equidistant lattice on S 2 - the longitudinal lattice J I = 2 J 'O 1 2 3 4 5
2 4 6 8 10
180° 90° 60° 45° 36°
1
2
3
4
0°
180°
0°
90°
180°
270°
0°
60°
120°
180°
240°
300°
0°
45°
90°
135°
180°
225°
270° 315°
0°
36°
72°
108°
144°
180°
216° 252°
In summary, the finite-dimensional observation space Y, dimY = IJ , I = 2J has integer dimension. As samples, we have computed via Figure 1.10 various horizontal and vertical sections of spherical lattices for instants (i) J = 1, I = 2, (ii) J = 2, I = 4, (iii) J = 3, I = 6, (iv) J = 4, I = 8 and (v) J = 5, I = 10 . By means of Figure 1.11 we have added the corresponding Platt-Carré Maps. Figure 1.10: Spherical lattice left: vertical section, trace of parallel circles right: horizontal section, trace of meridians vertical section meridian section
horizontal section perspective of parallel circles
61
1-3 Case study
+
I
J = 1, I = 2
J = 2, I = 4
J = 2, I = 4
J = 3, I = 6
J = 3, I = 6
J = 4, I = 8
J = 4, I = 8
J = 5, I = 10
J = 5, I = 10
S 2
0°
J = 1, I = 2
S 2
+
S
2S
I
O
S 2
S 2
Figure 1.11 a: Platt-Carré Map of S 2 longitude-latitude lattice, case: J = 1, I = 2, n = 2
62
1 The first problem of algebraic regression +
I
S 2
0°
S 2
+
S 2
I
0°
S 2
+
S 2
I
0°
S 2
+
S 2
I
0°
S 2
+
S
2S
O
S
2S
O
S
2S
I
S 2
+
S 2
I
S 2
+
S 2
I
S 2
+
S 2
O
S
2S
I
O
S 2
S 2
Figure 1.11 b: Platt-Carré Map of S 2 longitude-latitude lattice, case: J = 2, I = 4, n = 8
Figure 1.11 c: Platt-Carré Map of S 2 longitude-latitude lattice, case: J = 3, I = 6, n = 18
Figure 1.11 d: Platt-Carré Map of S 2 longitude-latitude lattice, case: J = 4, I = 8, n = 32
Figure 1.11 e: Platt-Carré Map of S 2 longitude-latitude lattice, case: J = 5, I = 10, n = 50.
In contrast, the parameter space X, dimX = f is infinite-dimensional. The unknown Fourier-Legendre coefficients, collected in a Pascal triangular graph of Figure 1.10 are vectorized by X = span{x00 , x11 , x10 , x11 ,..., xk A } k of k =0ok | A |= 0 o k dim X = m = f .
63
1-3 Case study
Indeed the linear model (1.138) contains m = IJ = 2 J 2 , m o f, unknows, a hyperreal number. The linear operator A : X o Y is generated by the base functions of lattice points K
+k
jij = y ( xij ) = lim ¦ ¦ e k A (Oi , I j )x k A K of
k = 0 A = k
i, j {1,..., n} is a representation of the linear observational equations (1.138) in Ricci calculus which is characteristic for Fourier-Legendre analysis. number of observed data at lattice points
number of unknown Fourier-Legendre coefficients K
n = IJ = 2 J 2
versus
+k
m = lim ¦ ¦ e k A K of
(finite)
k = 0 A = k
(infinite)
Such a portray of Fourier-Legendre analysis effectivly summarizes its peculiarrities. A finite number of observations is confronted with an infinite number of observations. Such a linear model of type “underdetermined of power 2” cannot be solved in finite computer time. Instead one has to truncate the FourierLegendre series, leaving the serier “bandlimited”. We consider three cases. n>m
n=m
n<m
overdetermined case
regular cases
underdetermined case
First, we have to truncate the infinite Fourier-Legendre series that n > m hold. In this case of an overdetermined problem, we have more observations than equations. Second, we alternativly balance the number of unknown FourierLegendre coefficients such that n = m holds. Such a model choice assures a regular linear system. Both linear Fourier-Legendre models which are tuned to the number of observations suffer from a typical uncertainty. What is the effect of the forgotten unknown Fourier-Legendre coefficients m > n ? Indeed a significance test has to decide upon any truncation to be admissible. We need an objective criterion to decide upon the degree m of bandlimit. Third, in order to be as obiective as possible we again follow the third case of “less observations than unknows” such that n < m holds. Such a Fourier-Legendre linear model generating an underdetermined system of linear equations will consequently be considered.
64
1 The first problem of algebraic regression
A first example presented in Box 1.20 demonstrates “MINOS” of the FourierLegendre linear model for n = IJ = 2 J 2 = 2 and k = 1, m = (k + 1) 2 = 4 as unknowns and observations. Solving the system of linear equations Z and four unknows [x1 , x2 , x3 , x4 ](MINOS) = ª y1 + y2 º ª1 1 º « » «0 0 » 1 » = 1 « 0 ». = « 2 «0 0 » 2 « 0 » « » « » ¬1 1¼ ¬ y1 y2 ¼ The second example presented in Box 1.21 refers to “MINOS” of the FourierrLegendre linear model for n = 8 and m = 9 . We have computed the design matrix A .
Box 1.20 The first example: Fourier-Legendre analysis as an underdetermined linear model: m rk A = m n = 2 dim Y = n = 2 versus dim X = m = 4 J = 1, I = 2 J = 2 n = IJ = 2 J 2 = 2 versus K = 1 m = ( k + 1) 2 = 4 ª x1 º « » ª º ª y1 º 1 P11 ( sin I1 ) sin O1 P10 ( sin I1 ) P11 ( sin I1 ) cos O1 « x2 » = « » « y » 1 P sin I sin O P sin I P sin I cos O « x » ¬ 2 ¼ «¬ 11 ( 2) 2 10 ( 2 ) 11 ( 2) 2» ¼« 3» ¬ x4 ¼ subject to
( O1 , I1 ) = (0D , 0D ) and ( O2 , I2 ) = (180D , 0D ) {y = Ax A \ 2×4 , rkA = n = 2, m = 4, n = m = 2} ª1 0 0 1 º ª1 P11 ( sin I1 ) sin O1 P10 ( sin I1 ) P11 ( sin I1 ) cos O1 º A := « » »=« ¬1 0 0 1¼ «¬1 P11 ( sin I2 ) sin O2 P10 ( sin I2 ) P11 ( sin I2 ) cos O2 »¼ P11 ( sin I ) = cos I , P10 ( sin I ) = sin I P11 ( sin I1 ) = P11 (sin I2 ) = 1 , P10 ( sin I1 ) = P10 (sin I2 ) = 0 sin O1 = sin O2 = 0, cos O1 = 1, cos O2 = 1
65
1-3 Case study
AAc = 1 + P11 ( sin I1 ) P11 (sin I2 ) sin O1 sin O2 + º » + P10 (sin I1 ) P10 (sin I2 ) + » » + P11 ( sin I1 ) P11 (sin I2 ) cos O1 cos O2 » » » » 1 + P112 ( sin I2 ) + P102 (sin I2 ) » ¼»
ª « 2 2 « 1 + P11 ( sin I1 ) + P10 (sin I1 ) « « «1 + P ( sin I ) P (sin I ) sin O sin O + 11 2 11 1 2 1 « « + P10 ( sin I2 ) P10 (sin I2 ) + « ¬« + P11 ( sin I2 ) P11 (sin I1 ) cos O2 cos O1
1 AAc = 2I 2 (AAc)-1 = I 2 2 K =1
¦
+ k1 , + k 2
e k A (Oi , Ii ) e k A (Oi , Ii ) = 0, i1 z i2
¦
k1 , k 2 = 0 A1 =-k1 , A 2 = k 2 K =1
¦
+ k1 , + k 2
¦
1 1
1
1
2 2
2
2
e k A (Oi , Ii ) e k A (Oi , Ii ) = 2, i1 = i2
k1 , k 2 = 0 A1 =-k1 , A 2 = k 2
1 1
1
1
2 2
2
2
ª x1 º ª c00 º «x » «s » 1 2» « xA = ( MINOS ) = « 11 » ( MINOS ) = A cy = « x3 » « c10 » 2 « » « » ¬ x4 ¼ ¬ c11 ¼ ª y1 + y2 º ª1 1 º « » «0 0 » y 1 » ª 1º = 1 « 0 ». = « « » 2 « 0 0 » ¬ y2 ¼ 2 « 0 » « » « » ¬1 1¼ ¬ y1 y2 ¼
Box 1.21 The second example: Fourier-Legendre analysis as an underdetermined linear model: m rk A = m n = 1 dim Y = n = 8 versus dim X 1 = m = 9 J = 2, I = 2 J = 4 n = IJ = 2 J 2 = 8 versus k = 2 m = (k + 1) 2 = 9
66
1 The first problem of algebraic regression
ª y1 º ª1 P11 ( sin I1 ) sin O1 P10 ( sin I1 ) P11 ( sin I1 ) cos O1 « ... » = «" … … … « » « ¬ y2 ¼ «¬ 1 P11 ( sin I8 ) sin O8 P10 ( sin I8 ) P11 ( sin I8 ) cos O8 P22 ( sin I1 ) sin 2O1 P21 ( sin I1 ) sin O1 P20 ( sin I1 ) … … … P22 ( sin I8 ) sin 2O8 P21 ( sin I8 ) sin O8 P20 ( sin I8 ) P21 ( sin I1 ) cos O1 P22 ( sin I1 ) cos 2O1 º ª x1 º »« » … … » «... » P21 ( sin I8 ) cos O8 P22 ( sin I8 ) cos 2O8 »¼ «¬ x9 »¼ “equidistant lattice, longitudinal width 'O , lateral width 'I ” 'O = 90D , 'I = 90D (O1 , I1 ) = (0D , +45D ), (O2 , I2 ) = (90D , +45D ), (O3 , I3 ) = (180D , +45D ), (O4 , I4 ) = (270D , +45D ), (O5 , I5 ) = (0D , 45D ), (O6 , I6 ) = (90D , 45D ), (O7 , I7 ) = (180D , 45D ), (O8 , I8 ) = (270D , 45D ) P11 (sin I ) = cos I , P10 (sin I ) = sin I P11 (sin I1 ) = P11 (sin I2 ) = P11 (sin I3 ) = P11 (sin I4 ) = cos 45D = 0,5 2 P11 (sin I5 ) = P11 (sin I6 ) = P11 (sin I7 ) = P11 (sin I8 ) = cos( 45D ) = 0,5 2 P10 (sin I1 ) = P10 (sin I2 ) = P10 (sin I3 ) = P10 (sin I4 ) = sin 45D = 0,5 2 P10 (sin I5 ) = P10 (sin I6 ) = P10 (sin I7 ) = P10 (sin I8 ) = sin( 45D ) = 0,5 2 P22 (sin I ) = 3cos 2 I , P21 (sin I ) = 3sin I cos I , P20 (sin I ) = (3 / 2) sin 2 I (1/ 2) P22 (sin I1 ) = P22 (sin I2 ) = P22 (sin I3 ) = P22 (sin I4 ) = 3cos 2 45D = 3 / 2 P22 (sin I5 ) = P22 (sin I6 ) = P22 (sin I7 ) = P22 (sin I8 ) = 3cos 2 ( 45D ) = 3 / 2 P21 (sin I1 ) = P21 (sin I2 ) = P21 (sin I3 ) = P21 (sin I4 ) = 3sin 45D cos 45D = 3 / 2 P21 (sin I5 ) = P21 (sin I6 ) = P21 (sin I7 ) = P21 (sin I8 ) = 3sin( 45D ) cos( 45D ) = 3 / 2 P20 (sin I1 ) = P20 (sin I2 ) = P20 (sin I3 ) = P20 (sin I4 ) = (3 / 2) sin 2 45D (1/ 2) = 1/ 4 P20 (sin I5 ) = P20 (sin I6 ) = P20 (sin I7 ) = P20 (sin I8 ) = (3 / 2) sin 2 ( 45D ) (1/ 2) = 1/ 4 sin O1 = sin O3 = sin O5 = sin O7 = 0 sin O2 = sin O6 = +1, sin O4 = sin O8 = 1 cos O1 = cos O5 = +1, cos O2 = cos O4 = cos O6 = cos O8 = 0
67
1-3 Case study
cos O3 = cos O7 = 1 sin 2O1 = sin 2O2 = sin 2O3 = sin 2O4 = sin 2O5 = sin 2O6 = sin 2O7 = sin 2O8 = 0 cos 2O1 = cos 2O3 = cos 2O5 = cos 2O7 = +1 cos 2O2 = cos 2O4 = cos 2O6 = cos 2O8 = 1 A \8×9 ª1 0 «1 2/2 « 0 «1 1 2/2 « A=« 1 0 « 2/2 «1 1 0 « «¬1 2 / 2
0 0 1/ 4 3 / 2 3 / 2 º 0 3 / 2 1/ 4 0 3 / 2 » » 0 0 1/ 4 3 / 2 3 / 2 » 0 3 / 2 1/ 4 0 3 / 2 » . 0 0 1/ 4 3 / 2 3 / 2 » » 0 3 / 2 1/ 4 0 3 / 2 » 0 0 1/ 4 3 / 2 3 / 2 » 0 3 / 2 1/ 4 0 3 / 2 »¼ rkA < min{n, m} < 8.
2/2 2/2 2/2 0 2/2 2/2 2/2 0 2/2 2/2 2/2 0 2/2 2/2 2/2 0
Here “the little example” ends, since the matrix A is a rank smaller than 8! In practice, one is taking advantage of • Gauss elimination or • weighting functions in order to directly compute the Fourier-Legendre series. In order to understand the technology of “weighting function” better, we begin with rewriting the spherical harmonic basic equations. Let us denote the letters f k A :=
1 4S
+S / 2
³
S / 2
2S
dI cos I ³ d O Z (I )e k A (O , I ) f (O , I ) , 0
the spherical harmonic expansion f k A of a spherical function f (O , I ) weighted by Z (I ) , a function of latitude. A band limited representation could be specified by +S / 2 2S K 1 f k A := d I I d O Z I e k A (O , I )e k A (O , I ) f kA cos ( ) ¦ ³0 4S S³/ 2 k ,A 1 1
1
fkA =
K
¦
f k,A 1
1 4S
1
k1 , A1
=
S /2
³
S / 2
2S
dI cos I ³ d O w(I )e k A (O , I )e k A ( O , I ) = 1 1
0
K
¦ f ¦g e k1 , A1
k1 , A1
ij
kA
( Oi , I j )e k A ( Oi , I j ) = 1 1
i, j
K
= ¦ gij ¦ gij ekA (Oi ,I j )ek A (Oi ,I j ) = i, j
1 1
1
k1 ,A1
1 1
68
1 The first problem of algebraic regression
= ¦ gij f ( Oi , I j )e k A ( Oi , I j ) . i, j
As a summary, we design the weighted representation of Fourier-Legendre synthesis J
f = ¦ g j Pk*A (sin I j ) f A (I j ) kA
j =1
J
1st: Fourier f (I j ) = ¦ g j eA (O ) f (Oi , I j ) A
i =1
J
2nd: Legendre f k,A ( I , J ) = ¦ g j Pk*A (sin I j ) f A (I j ) j =1
lattice: (Oi , I j ) .
1-4 Special nonlinear models As an example of a consistent system of linearized observational equations Ax = y , rk A = rk( A, y ) where the matrix A R n× m is the Jacobi matrix (Jacobi map) of the nonlinear model, we present a planar triangle whose nodal points have to be coordinated from three distance measurements and the minimum norm solution of type I -MINOS. 1-41
Taylor polynomials, generalized Newton iteration
In addition we review the invariance properties of the observational equations with respect to a particular transformation group which makes the a priori indeterminism of the consistent system of linearized observational equations plausible. The observation vector Y Y { R n is an element of the column space Y R ( A) . The geometry of the planar triangle is illustrated in Figure 1.12. The point of departure for the linearization process of nonlinear observational equations is the nonlinear mapping X 6 F ( X) = Y . The B. Taylor expansion Y = F( X) = F(x) + J (x)( X x) + H( x)( X x)
( X x) + + O [( X x)
( X x)
( X x)], which is truncated to the order O [( X x)
( X x)
( X x)], J ( x), H( x) , respectively, represents the Jacobi matrix of the first partial derivatives, while H , the Hesse matrix of second derivatives, respectively, of the vectorvalued function F ( X) with respect to the coordinates of the vector X , both taken at the evaluation point x . A linearized nonlinear model is generated by truncating the vector-valued function F(x) to the order O [( X x)
( X x)] , namely 'y := F( X) F(x) = J (x)( X x) + O [( X x)
( X x)]. A generalized Newton iteration process for solving the nonlinear observational equations by solving a sequence of linear equations of (injectivity) defect by means of the right inverse of type G x -MINOS is the following algorithm.
1-4 Special nonlinear models
69
Newton iteration Level 0: x 0 = x 0 , 'y 0 = F( X) F(x 0 ) 'x1 = [ J (x 0 ) ]R 'y 0
Level 1:
x1 = x 0 + 'x1 , 'y1 = F (x) F (x1 ) 'x 2 = [ J (x1 ) ]R 'y1
Level i:
xi = xi 1 + 'xi , 'y i = F(x) F(xi ) 'xi +1 = [ J (xi ) ]R 'y i
Level n: 'x n +1 = 'x n (reproducing point in the computer arithmetric) I -MINOS, rk A = rk( A, y ) The planar triangle PD PE PJ is approximately an equilateral triangle pD pE pJ whose nodal points are a priori coordinated by Table 1.3. Table 1.3: Barycentric rectangular coordinates of the equilateral triangle pD pE pJ of Figure 1.12 1 1 ª ª ª xJ = 0 « xD = 2 « xE = 2 pD = « , pE = « , pJ = « «y = 1 3 1 1 «y = «y = 3 3 «¬ J 3 D E «¬ «¬ 6 6 Obviously the approximate coordinates of the three nodal points are barycentric, namely characterized by Box 1.22: Their sum as well as their product sum vanish.
Figure 1.12: Barycentric rectangular coordinates of the nodal points, namely of the equilateral triangle
70
1 The first problem of algebraic regression
Box 1.22: First and second moments of nodal points, approximate coordinates x B + x C + x H = 0, yB + y C + y H = 0 J xy = xD yD + xE yE + xJ yJ = 0 J xx = ( yD2 + yE2 + yJ2 ) = 12 , J yy = ( xD2 + xE2 + xJ2 ) = 12 : J xx = J yy ª xD + xE + xJ º ª0 º »=« » ¬« yD + yE + yJ ¼» ¬0 ¼
ªI º
[ Ii ] = « I x » = « ¬ y¼
for all i {1, 2} 2 2 2 xD yD + yE xE + xJ yJ º ª I xx I xy º ª ( yD + yE + yJ ) ª¬ I ij º¼ = « »= »=« 2 2 2 «¬ I xy I yy »¼ «¬ xD yD + yE xE + xJ yJ ( xD + xE + xJ ) »¼ ª 1 0 º =« 2 = 12 I 2 1» 0 ¬ 2¼
for all i, j {1, 2} .
Box 1.23: First and second moments of nodal points, inertia tensors 2
I1 = ¦ ei I i = e1 I1 + e 2 I 2 i =1
for all i, j {1, 2} : I i =
+f
³
f
I2 =
2
¦e
i , j =1
i
+f
dx ³ dy U ( x, y ) xi f
e j I ij = e1
e1 I11 + e1
e 2 I12 + e 2
e1 I 21 + e 2
e 2 I 22
for all i, j {1, 2} : I ij =
+f
³
f
+f
dx ³ dy U ( x, y )( xi x j r 2G ij ) f
subject to r 2 = x 2 + y 2
U ( x, y ) = G ( x, y, xD yD ) + G ( x, y, xE yE ) + G ( x, y , xJ yJ ) . The product sum of the approximate coordinates of the nodal points constitute the rectangular coordinates of the inertia tensor I=
2
¦e
i , j =1
I ij =
+f
i
e j I ij
+f
³ dx ³ dy U ( x, y)( x x i
f
f
j
r 2G ij )
1-4 Special nonlinear models
71
for all i, j {1, 2} , r 2 = x 2 + y 2 ,
U ( x, y ) = G ( x, y, xD yD ) + G ( x, y, xE yE ) + G ( x, y , xJ yJ ) . The mass density distribution U ( x, y ) directly generates the coordinates I xy , I xx , I yy of the inertia tensor in Box 1.22. ( G (., .) denotes the Dirac generalized function.). The nonlinear observational equations of distance measurements are generated by the Pythagoras representation presented in Box 1.24:
Nonlinear observational equations of distance measurements in the plane, (i) geometric notation versus (ii) algebraic notation 2 Y1 = F1 ( X) = SDE = ( X E X D ) 2 + (YE YD ) 2
Y2 = F2 ( X) = S EJ2 = ( X J X E ) 2 + (YJ YE ) 2 Y3 = F3 ( X) = SJD2 = ( X D X J ) 2 + (YD YJ ) 2 .
sB. Taylor expansion of the nonlinear distance observational equationss 2 Y c := ª¬ SDE , S EJ2 , SJD2 º¼ , Xc := ª¬ X D , YD , X E , YE , X J , YJ º¼
xc = ª¬ xD , yD , xE , yE , xJ , yJ º¼ = ª¬ 12 , 16 3, 12 , 16 3, 0, 13 3 º¼ sJacobi maps ª w F1 « « wX D «wF J (x) := « 2 « wX D « « w F3 « wX D ¬
w F1 wYD
w F1 wX E
w F1 wYE
w F1 wX J
w F2 wYD
w F2 wX E
w F2 wYE
w F2 wX J
w F3 wYD
w F3 wX E
w F3 wYE
w F3 wX J
ª 2( xE xD ) 2( y E yD ) 2( xE xD ) 2( y E yD ) « 0 0 2( xJ xE ) 2( yJ y E ) « « 2( xD xJ ) 2( yD yJ ) 0 0 ¬ 0 2 0 0 ª 2 « =«0 0 1 3 1 « 0 1 ¬ 1 3 0 Let us analyze sobserved minus computed s
w F1 º » wYJ » w F2 » » ( x) = wYJ » » w F3 » wYJ »¼ º 0 0 » 2( xJ xE ) 2( yJ y E ) » = 2( xD xJ ) 2( yD yJ ) »¼ 0º » 3» » 3¼
72
1 The first problem of algebraic regression
'y := F( X) F(x) = J (x)( X x) + O [ ( X x)
( X x) ] = = J'x + O [ ( X x)
( X x) ] ,
here specialized to Box 1.25: Linearized observational equations of distance measurements in the plane, I -MINOS, rkA = dimY sObserved minus computeds 'y := F( X) F(x) = J (x)( X x) + O [ ( X x)
( X x) ] = = J'x + O [ ( X x)
( X x) ] ,
2 2 2 ª 'sDE º ª SDE º ª1.1 1 º ª 1 º sDE 10 « 2 » « 2 » « « 1» » 2 « 'sEJ » = « S EJ sEJ » = «0.9 1» = « 10 » « 2 » « 2 2 » « » « 1» «¬ 'sJD »¼ «¬ SJD sJD »¼ ¬1.2 1 ¼ ¬ 5 ¼
2 ª 'sDE º ª aDE bDE aDE bDE « 2 » « 0 aEJ bEJ « 'sEJ » = « 0 « 2 » « a bJD 0 0 ¬« 'sJD ¼» ¬ JD
ª 'xD º « » 'yD » 0 0 º« » « 'xE » » aEJ bEJ » « 'yE » « aJD bJD »¼ « » « 'xJ » « 'y » ¬ J¼
slinearized observational equationss y = Ax, y R 3 , x R 6 , rkA = 3 0 2 0 0 ª 2 « A=«0 0 1 3 1 « 0 1 ¬ 1 3 0 ª 9 « « 3 1 «« 9 A c( AA c) 1 = 36 « 3 « « 0 « 2 3 ¬
0º » 3» » 3¼
3 º » 3 5 3 » 3 3 » » 5 3 3 » » 6 6 » 4 3 4 3 »¼ 3
sminimum norm solutions
1-4 Special nonlinear models
73
ª ª 'xD º « « » « « 'yD » « 'xE » 1 « « »= xm = « « 'yE » 36 « « « » « « 'xJ » « « 'y » ¬ J¼ ¬ 2
9 y1 + 3 y2 3 y3
º » 3 y1 + 3 y2 5 3 y3 » » 9 y1 + 3 y2 3 y3 » 3 y1 5 3 y2 + 3 y3 » » 6 y2 + 6 y3 » » 3 y1 + 4 3 y2 + 4 3 y3 ¼
1 ª º xcm = 180 ¬ 9, 5 3, 0, 4 3, +9, 3 ¼
(x + 'x)c = ª¬ xD + 'xD , yD + 'yD , xE + 'xE , yE + 'yE , xJ + 'xJ , yJ + 'yJ º¼ = 1 ª º = 180 ¬ 99, 35 3, +90, 26 3, +9, +61 3 ¼ .
The sum of the final coordinates is zero, but due to the non-symmetric displacement field ['xD , 'yD , 'xE , 'yE , 'xJ , 'yJ ]c the coordinate J xy of the inertia tensor does not vanish. These results are collected in Box 1.26. Box 1.26: First and second moments of nodal points, final coordinates yD + 'yD + yE + 'yE + yJ + 'yJ = yD + yE + yJ + 'yD + 'yE + 'yJ = 0 J xy = I xy + 'I xy = = ( xD + 'xD )( yD + 'yD ) + ( xE + 'xE )( yE + 'yE ) + ( xJ + 'xJ )( yJ + 'yJ ) = = xD yD + xE yE + xJ yJ + xD 'yD + yD 'xD + xE 'yE + yE 'xE + xJ 'yJ + yJ 'xJ + + O ('xD 'yD , 'xE 'yE , 'xJ 'yJ ) = 3 /15 J xx = I xx + 'I xx = = ( yD + 'yD ) 2 ( yE + 'yE ) 2 ('yJ yJ ) 2 = = ( yD2 + yE2 + yJ2 ) 2 yD 'yD 2 yE 'yE 2 yJ 'yJ O ('yD2 , 'yE2 , 'yJ2 ) = = 7 /12 J yy = I yy + 'I yy = = ( xD + 'xD ) 2 ( xE + 'xE ) 2 ('xJ xJ ) 2 = = ( xD2 + xE2 + xJ2 ) 2 xD 'xD 2 xE 'xE 2 xJ 'xJ O ('xD2 , 'xE2 , 'xJ2 ) = 11/ 20 .
ƅ
74 1-42
1 The first problem of algebraic regression
Linearized models with datum defect
More insight into the structure of a consistent system of observational equations with datum defect is gained in the case of a nonlinear model. Such a nonlinear model may be written Y = F ( X) subject to Y R n , X R m , or {Yi = Fi ( X j ) | i {1, ..., n}, j {1, ..., m}}. A classification of such a nonlinear function can be based upon the "soft" Implicit Function Theorem which is a substitute for the theory of algebraic partioning, namely rank partitioning. (The “soft” Implicit Function Theorem is reviewed in Appendix C.) Let us compute the matrix of first derivatives [
wFi ] R n× m , wX j
a rectangular matrix of dimension n × m. The set of n independent columns builds up the Jacobi matrix ª wF1 « wX « 1 « wF2 « A := « wX 1 «" « « wFn « wX ¬ 1
wF1 wX 2 wF2 wX 2 wFn wX 2
wF1 º wX n » » wF2 » " » wX n » , r = rk A = n, » » wFn » " wX n »¼ "
the rectangular matrix of first derivatives A := [ A1 , A 2 ] = [J, K ] subject to A R n× m , A1 = J R n× n = R r × r , A 2 = K R n× ( m n ) = R n× ( m r ) . m-rk A is called the datum defect of the consistent system of nonlinear equations Y = F ( X) which is a priori known. By means of such a rank partitioning we have decomposed the vector of unknowns Xc = [ X1c , Xc2 ] into “bounded parameters” X1 and “free parameters” X 2 subject to X1 R n = R r , and X 2 R m n = R m r . Let us apply the “soft” Implicit Function Theorem to the nonlinear observational equations of distance measurements in the plane which we already have intro-
1-4 Special nonlinear models
75
duced in the previous example. Box 1.27 outlines the nonlinear observational 2 equations for Y1 = SDE , Y2 = S EJ2 , Y3 = SJD2 . The columns c1 , c 2 , c3 of the matrix [wFi / wX j ] are linearly independent and accordingly build up the Jacobi matrix J of full rank. Let us partition the unknown vector Xc = [ X1c , Xc2 ] , namely into the "free parameters" [ X D , YD , YE ]c and the "bounded parameters" [ X E , X J , YJ ]c. Here we have made the following choice for the "free parameters": We have fixed the origin of the coordinate system by ( X D = 0, YD = 0). Obviously the point PD is this origin. The orientation of the X-axis is given by YE = 0. In consequence the "bounded parameters" are now derived by solving a quadratic equation, indeed a very simple one: Due to the datum choice we find 2 (1st) X E = ± SDE = ± Y1 2 (2 nd) X J = ± ( SDE S EJ2 + SJD2 ) /(2SDE ) = ±(Y1 Y2 + Y3 ) /(2 Y1 ) 2 2 (3rd) YJ = ± SJD2 ( SDE S EJ2 + SJD2 ) 2 /(4 SDE ) = ± Y3 (Y1 Y2 + Y3 ) 2 /(4Y1 ) .
Indeed we meet the characteristic problem of nonlinear observational equations. There are two solutions which we indicated by "± " . Only prior information can tell us what the realized one in our experiment is. Such prior information has been built into by “approximate coordinates” in the previous example, a prior information we lack now. For special reasons here we have chosen the "+" solution which is in agreement with Table 1.3. An intermediate summary of our first solution of a set of nonlinear observational equations is as following: By the choice of the datum parameters (here: choice of origin and orientation of the coordinate system) as "free parameters" we were able to compute the "bounded parameters" by solving a quadratic equation. The solution space which could be constructed in a closed form was non-unique. Uniqueness was only achieved by prior information. The closed form solution X = [ X 1 , X 2 , X 3 , X 4 , X 5 , X 6 ]c = [ X D , YD , X E , YE , X J , YJ ]c has another deficiency. X is not MINOS: It is for this reason that we apply the datum transformation ( X , Y ) 6 ( x, y ) outlined in Box 1.28 subject to & x &2 = min, namely I-MINOS. Since we have assumed distance observations, the datum transformation is described as rotation (rotation group SO(2) and a translation (translation group T(2) ) in toto with three parameters (1 rotation parameter called I and two translational parameters called t x , t y ). A pointwise transformation ( X D , YD ) 6 ( xD , yD ), ( X E , YE ) 6 ( xE , yE ) and ( X J , YJ ) 6 ( xJ , yJ ) is presented in Box 1.26. The datum parameters ( I , t x , t y ) will be determined by IMINOS, in particular by a special Procrustes algorithm contained in Box 1.28. There are various representations of the Lagrangean of type MINOS outlined in Box 1.27. For instance, we could use the representation
76
1 The first problem of algebraic regression
2 of & x &2 in terms of observations ( Y1 = SDE , Y2 = S EJ2 , Y3 = SJD2 ) which transforms 2 2 (i) & x & into (ii) & x & (Y1 , Y2 , Y3 ) . Finally (iii) & x &2 is equivalent to minimizing the product sums of Cartesian coordinates.
Box 1.27: nonlinear observational equations of distance measurements in the plane (i) geometric notation versus (ii) algebraic notation "geometric notation" 2 SDE = ( X E X D ) 2 + (YE YD ) 2
S EJ2 = ( X J X E ) 2 + (YJ YE ) 2 SJD2 = ( X D X J ) 2 + (YD YJ ) 2 "algebraic notation" 2 Y1 = F1 ( X) = SDE = ( X E X D ) 2 + (YE YD ) 2
Y2 = F2 ( X) = S EJ2 = ( X J X E ) 2 + (YJ YE ) 2 Y3 = F3 ( X) = SJD2 = ( X D X J ) 2 + (YD YJ ) 2 2 Y c := [Y1 , Y2 , Y3 ] = [ SDE , S EJ2 , SJD2 ]
Xc := [ X 1 , X 2 , X 3 , X 4 , X 5 , X 6 ] = [ X D , YD , X E , YE , X J , YJ ] "Jacobi matrix" [
wFi ]= wX j
0 0 ª ( X 3 X 1 ) ( X 4 X 2 ) ( X 3 X 1 ) ( X 4 X 2 ) º « =2 0 0 ( X 5 X 3 ) ( X 6 X 4 ) ( X 5 X 3 ) ( X 6 X 4 ) » « » «¬ ( X 1 X 5 ) ( X 2 X 6 ) 0 0 ( X 1 X 5 ) ( X 2 X 6 ) ¼» wF wF rk[ i ] = 3, dim[ i ] = 3 × 6 wX j wX j ª ( X 3 X 1 ) ( X 4 X 2 ) ( X 3 X 1 ) º J = «« 0 0 ( X 5 X 3 ) »» , rk J = 3 «¬ ( X 1 X 5 ) »¼ (X2 X6) 0 0 0 ª (X4 X2) º K = «« ( X 6 X 4 ) ( X 5 X 3 ) ( X 6 X 4 ) »» . «¬ 0 ( X 1 X 5 ) ( X 2 X 6 ) »¼
1-4 Special nonlinear models
77
"free parameters"
"bounded parameters"
X1 = X D = 0
X 3 = X E = + SDE
X 2 = YD = 0
2 X 5 = X J = + SDE S EJ2 + SJD2 = + Y32 Y22 + Y12
X 4 = YE = 0
2 X 6 = YJ = + S EJ2 SDE = + Y22 Y12
( )
( )
()
( )
( )
Box 1.28: Datum transformation of Cartesian coordinates ª xº ª X º ªtx º « y » = R « Y » + «t » ¬ ¼ ¬ ¼ ¬ y¼ R SO(2):={R R 2× 2 | R cR = I 2 and det R = +1} Reference: Facts :(representation of a 2×2 orthonormal matrix) of Appendix A: ª cos I R=« ¬ sin I
sin I º cos I »¼
xD = X D cos I + YD sin I t x yD = X D sin I + YD cos I t y xE = X E cos I + YE sin I t x yE = X E sin I + YE cos I t y xJ = X J cos I + YJ sin I t x yJ = X J sin I + YJ cos I t y . Box 1.29: Various forms of MINOS (i ) & x &2 = xD2 + yD2 + xE2 + yE2 + xJ2 + yJ2 = min I ,tx ,t y
2 (ii ) & x &2 = 12 ( SDE + S EJ2 + SJD2 ) + xD xE + xE xJ + xJ xD + yD yE + yE yJ + yJ yD = min
I ,tx ,t y
(iii ) & x &2 = min
ª xD xE + xE xJ + xJ xD = min « y y + y y + y y = min . E J J D ¬ D E
The representation of the objective function of type MINOS in term of the obser2 vations Y1 = SDE , Y2 = S EJ2 , Y3 = SJD2 can be proven as follows:
78
1 The first problem of algebraic regression
Proof: 2 SDE = ( xE xD ) 2 + ( yE yD ) 2
= xD2 + yD2 + xE2 + yE2 2( xD xE + yD yE )
1 2
2 SDE + xD xE + yD yE = 12 ( xD2 + yD2 + xE2 + yE2 )
& x &2 = xD2 + yD2 + xE2 + yE2 + xJ2 + yJ2 = 2 = 12 ( SDE + S EJ2 + SJD2 ) + xD xE + xE xJ + xJ xD + yD yE + yE yJ + yJ yD
& x &2 = 12 (Y1 + Y2 + Y3 ) + xD xE + xE xJ + xJ xD + yD yE + yE yJ + yJ yD .
Figure1.13: Commutative diagram (P-diagram) P0 : centre of polyhedron (triangle PD PE PJ ) action of the translation group
ƅ
Figure1.14:
Commutative diagram (E-diagram) P0 : centre of polyhedron (triangle PD PE PJ orthonormal 2-legs {E1 , E1 | P0 } and {e1 , e1 | P0 } ) at P0 action of the translation group
As soon as we substitute the datum transformation of Box 1.28 which we illustrated by Figure 1.9 and Figure 1.10 into the Lagrangean L (t x , t y , I ) of type MINOS ( & x &2 = min ) we arrive at the quadratic objective function of Box 1.30. In the first forward step of the special Procrustes algorithm we obtain the minimal solution for the translation parameters (tˆx , tˆy ) . The second forward step of the special Procrustes algorithm is built on (i) the substitution of (tˆx , tˆy ) in the original Lagrangean which leads to the reduced Lagrangean of Box 1.29 and (ii) the minimization of the reduced Lagrangean L (I ) with respect to the rotation parameter I . In an intermediate phase we introduce "centralized coordinates" ('X , 'Y ) , namely coordinate differences with respect to the centre Po = ( X o , Yo ) of the polyhedron, namely the triangle PD , PE , PJ . In this way we are able to generate the simple (standard form) tan 2I of the solution I the argument of L1 = L1 (I ) = min or L2 = L2 (I ) .
1-4 Special nonlinear models
79
Box 1.30: Minimum norm solution, special Procrustes algorithm, 1st forward step & x &2 := := xD2 + yD2 + xE2 + yE2 + xJ2 + yJ2 = min
tx ,t y ,I
"Lagrangean "
L (t x , t y , I ) := := ( X D cos I + YD sin I t x ) 2 + ( X D sin I + YD cos I t y ) 2 + ( X E cos I + YE sin I t x ) 2 + ( X E sin I + YE cos I t y ) 2 + ( X J cos I + YJ sin I t x ) 2 + ( X J sin I + YJ cos I t y ) 2 1st forward step 1 wL (t x ) = ( X D + X E + X J ) cos I + (YD + YE + YJ ) sin I 3t x = 0 2 wt x 1 wL (t y ) = ( X D + X E + X J ) sin I + (YD + YE + YJ ) cos I 3t y = 0 2 wt y t x = + 13 {( X D + X E + X J ) cos I + (YD + YE + YJ ) sin I} t y = + 13 {( X D + X E + X J ) sin I + (YD + YE + YJ ) cos I} (t x , t y ) = arg{L (t x , t y , I ) = min} . Box 1.31: Minimum norm solution, special Procrustes algorithm, 2nd forward step "solution t x , t y in Lagrangean: reduced Lagrangean"
L (I ) := := { X D cos I + YD sin I [( X D + X E + X J ) cos I + (YD + YE + YJ ) sin I ]}2 + 1 3
+ { X E cos I + YE sin I 13 [( X D + X E + X J ) cos I + (YD + YE + YJ ) sin I ]}2 + + { X J cos I + YJ sin I 13 [( X D + X E + X J ) cos I + (YD + YE + YJ ) sin I ]}2 + + { X D sin I + YD cos I 13 [( X D + X E + X J ) sin I + (YD + YE + YJ ) cos I ]}2 + + { X E sin I + YE cos I 13 [( X D + X E + X J ) sin I + (YD + YE + YJ ) cos I ]}2 + + { X J sin I + YJ cos I 13 [( X D + X E + X J ) sin I + (YD + YE + YJ ) cos I ]}2 = min I
80
1 The first problem of algebraic regression
L (I ) = = {[ X D ( X D + X E + X J )]cos I + [YD 13 (YD + YE + YJ )]sin I }2 + 1 3
+ {[ X E 13 ( X D + X E + X J )]cos I + [YE 13 (YD + YE + YJ )]sin I }2 + + {[ X J 13 ( X D + X E + X J )]cos I + [YJ 13 (YD + YE + YJ )]sin I }2 + + {[ X D 13 ( X D + X E + X J )]sin I + [YD 13 (YD + YE + YJ )]cos I }2 + + {[ X E 13 ( X D + X E + X J )]sin I + [YE 13 (YD + YE + YJ )]cos I }2 + + {[ X J 13 ( X D + X E + X J )]sin I + [YJ 13 (YD + YE + YJ )]cos I }2 "centralized coordinate" 'X := X D 13 ( X D + X E + X J ) = 13 (2 X D X E X J ) 'Y := YD 13 (YD + YE + YJ ) = 13 (2YD YE YJ ) "reduced Lagrangean"
L1 (I ) = ('X D cos I + 'YD sin I ) 2 + + ('X E cos I + 'YE sin I ) 2 + + ('X J cos I + 'YJ sin I ) 2
L2 (I ) = ('X D sin I + 'YD cos I ) 2 + + ('X E sin I + 'YE cos I ) 2 + + ('X J sin I + 'YJ cos I ) 2 1 wL (I ) = 0 2 wI ('X D cos I + 'YD sin I )('X D sin I + 'YD cos I ) + + ('X E cos I + 'YE sin I ) 2 ('X E sin I + 'YE cos I ) + + ('X J cos I + 'YJ sin I ) 2 ('X J sin I + 'YJ cos I ) = 0 ('X D2 + 'X E2 + 'X J2 ) sin I cos I + + ('X D 'YD + 'X E 'YE + 'X J 'YJ ) cos 2 I ('X D 'YD + 'X E 'YE + 'X J 'YJ ) sin 2 I + ('YD2 + 'YE2 + 'YJ2 ) sin I cos I = 0 [('X D2 + 'X E2 + 'X J2 ) ('YD2 + 'YE2 + 'YJ2 )]sin 2I = = 2['X D 'YD + 'X E 'YE + 'X J 'Y ]cos 2I
1-4 Special nonlinear models tan 2I = 2
81 'X D 'YD + 'X E 'YE + 'X J 'Y
('X + 'X E2 + 'X J2 ) ('YD2 + 'YE2 + 'YJ2 ) 2 D
"Orientation parameter in terms of Gauss brackets" tan 2I =
2['X'Y] ['X 2 ] ['Y 2 ]
I = arg{L1 (I ) = min} = arg{L2 (I ) = min}. The special Procrustes algorithm is completed by the backforward steps outlined in Box 1.32: At first we convert tan 2I to (cos I ,sin I ) . Secondly we substitute (cos I ,sin I ) into the translation formula (t x , t y ) . Thirdly we substitute (t x , t y , cos I ,sin I ) into the Lagrangean L (t x , t x , I ) , thus generating the optimal objective function & x &2 at (t x , t y , I ) . Finally as step four we succeed to compute the centric coordinates ª xD xE xJ º «y » ¬ D yE yJ ¼ with respect to the orthonormal 2-leg {e1 , e 2 | Po } at Po from the given coordinates ª XD X E XJ º «Y » ¬ D YE YJ ¼ with respect to the orthonormal 2-leg {E1 , E2 | o} at o , and the optimal datum parameters t x , t y , cos I ,sin I . Box 1.32: Special Procrustes algorithm backward steps step one tan 2I =
2['X'Y] ['X 2 ] ['Y 2 ]
ª cos I « ¬ sin I step two
t x = 13 ([ X]cos I + [Y]sin I ) t y = 13 ([ X]sin I + [Y]cos I ) step three
82
1 The first problem of algebraic regression
& x &2 = L (t x , t y , I ) step four ª xD «y ¬ D
xE yE
xJ º ª cos I = yJ »¼ «¬ sin I
sin I º ª X D »« cos I ¼ ¬ YD
XE YE
X J º ªt x º « » 1c . YJ »¼ «¬t y »¼
We leave the proof for [x] = xD + xE + xJ = 0, [y ] = yD + yE + yJ = 0, [xy ] = xD yD + xE yE + xJ yJ z 0 to the reader as an exercise. A numerical example is SDE = 1.1, S EJ = 0.9, SJD = 1.2, Y1 = 1.21, Y2 = 0.81, Y3 = 1.44, X D = 0, X E = 1.10, X J = 0.84, YD = 0 , YE = 0
, YJ = 0.86,
'X D = 0.647, 'X E = +0.453, 'X J = +0.193, 'YD = 0.287 , 'YE = 0.287, 'YJ = +0.573, test: ['X] = 0, ['Y] = 0, ['X'Y] = 0.166, ['X 2 ] = 0.661, ['Y 2 ] = 0.493, tan 2I = 1.979, I = 31D.598,828, 457,
I = 31D 35c 55cc.782, cos I = 0.851, 738, sin I = 0.523,968, t x = 0.701, t y = 0.095, ª xD = 0.701, xE = +0.236, xJ = +0.465, « ¬ yD = +0.095, yE = 0.481, yJ = +0.387, test: [x] = xD + xE + xJ = 0, [y ] = yD + yE + yJ = 0, [xy ] = +0.019 z 0 . ƅ
1-5 Notes What is the origin of the rank deficiency three of the linearized observational equations, namely the three distance functions observed in a planar triangular network we presented in paragraph three?
1-5 Notes
83
In geometric terms the a priori indeterminancy of relating observed distances to absolute coordinates placing points in the plane can be interpreted easily: The observational equation of distances in the plane P 2 is invariant with respect to a translation and a rotation of the coordinate system. The structure group of the twodimensional Euclidean space E 2 is the group of motion decomposed into the translation group (two parameters) and the rotation group (one parameter). Under the action of the group of motion (three parameters) Euclidean distance functions are left equivariant. The three parameters of the group of motion cannot be determined from distance measurements: They produce a rank deficiency of three in the linearized observational equations. A detailed analysis of the relation between the transformation groups and the observational equations has been presented by E. Grafarend and B. Schaffrin (1974, 1976). More generally the structure group of a threedimensional Weitzenboeck space W 3 is the conformal group C7 (3) which is decomposed into the translation group T3 (3 parameters), the special orthogonal group SO(3) (3 parameters) and the dilatation group ("scale", 1 parameter). Under the action of the conformal group C7 (3) – in total 7 parameters – distance ratios and angles are left equivariant. The conformal group C7 (3) generates a transformation of Cartesian coordinates covering R 3 which is called similarity transformation or datum transformation. Any choice of an origin of the coordinate system, of the axes orientation and of the scale constitutes an S-base following W. Baarda (1962,1967,1973,1979,1995), J. Bossler (1973), M. Berger (1994), A. Dermanis (1998), A. Dermanis and E. Grafarend (1993), A. Fotiou and D. Rossikopoulis (1993), E. Grafarend (1973,1979,1983), E. Grafarend, E. H. Knickmeyer and B. Schaffrin (1982), E. Grafarend and G. Kampmann (1996), G. Heindl (1986), M. Molenaar (1981), H. Quee (1983), P. J. G. Teunissen (1960, 1985) and H. Wolf (1990). In projective networks (image processing, photogrammetry, robot vision) the projective group is active. The projective group generates a perspective transformation which is outlined in E. Grafarend and J. Shan (1997). Under the action of the projective group cross-ratios of areal elements in the projective plane are left equivariant. For more details let us refer to M. Berger (1994), M. H. Brill and E. B. Barrett (1983), R. O. Duda and P.E. Heart (1973), E. Grafarend and J. Shan (1997), F. Gronwald and F. W. Hehl (1996), M. R. Haralick (1980), R. J. Holt and A. N. Netrawalli (1995), R. L. Mohr, L. Morin and E. Grosso (1992), J. L. Mundy and A. Zisserman (1992a, b), R. F. Riesenfeldt (1981), J. A. Schonten (1954). In electromagnetism (Maxwell equations) the conformal group C16 (3,1) is active. The conformal group C16 (3,1) generates a transformation of "space-time" by means of 16 parameters (6 rotational parameters – three for rotation, three for "hyperbolic rotation", 4 translational parameters, 5 "involutary" parameters, 1 dilatation – scale – parameter) which leaves the Maxwell equations in vacuum as
84
1 The first problem of algebraic regression
well as pseudo – distance ratios and angles equivariant. Sample references are A. O. Barut (1972), H. Bateman (1910), F. Bayen (1976), J. Beckers, J. Harnard, M, Perroud and M. Winternitz (1976), D. G. Boulware, L. S. Brown, R. D. Peccei (1970), P. Carruthers (1971), E. Cunningham (1910), T. tom Dieck (1967), N. Euler and W. H. Steeb (1992), P. G. O. Freund (1974), T. Fulton, F. Rohrlich and L. Witten (1962), J. Haantjes (1937), H. A. Kastrup (1962,1966), R. Kotecky and J. Niederle (1975), K. H. Marivalla (1971), D. H. Mayer (1975), J. A. Schouten and J. Haantjes (1936), D. E. Soper (1976) and J. Wess (1990). Box 1.33: Observables and transformation groups observed quantities
transformation group
datum parameters
coordinate differences in R 2 coordinate differences in R 3 coordinate differences in R n Distances in R 2 Distances in R 3 Distances in R n angles, distance ratios in R 2 angles, distance ratios in R 3 angles, distance ratios in R n
translation group T(2) translation group T(3) translation group T( n ) group of motion T(2) , SO(2) group of motion T(3) , SO(3) group of motion T(n) , SO(n) conformal group C 4 (2) conformal group C7 (3) conformal group C( n +1)( n + 2) / 2 (n)
2
cross-ratios of area elements in the projective plane
projective group
3 n 3 3+3=6 n+(n+1)/2 4 7 (n+1)(n+2)/2 8
Box 1.33 contains a list of observables in R n , equipped with a metric, and their corresponding transformation groups. The number of the datum parameters coincides with the injectivity rank deficiency in a consistent system of linear (linearized) observational equations Ax = y subject to A R n× m , rk A = n < m, d ( A) = m rk A .
2
The first problem of probabilistic regression – special Gauss-Markov model with datum defect – Setup of the linear uniformly minimum bias estimator of type LUMBE for fixed effects.
In the first chapter we have solved a special algebraic regression problem, namely the inversion of a system of consistent linear equations classified as “underdetermined”. By means of the postulate of a minimum norm solution || x ||2 = min we were able to determine m unknowns ( m > n , say m = 106 ) from n observations (more unknowns m than equations n, say n = 10 ). Indeed such a mathematical solution may surprise the analyst: In the example “MINOS” produces one million unknowns from ten observations. Though “MINOS” generates a rigorous solution, we are left with some doubts. How can we interpret such an “unbelievable solution”? The key for an evaluation of “MINOS” is handed over to us by treating the special algebraic regression problem by means of a special probabilistic regression problem, namely as a special Gauss-Markov model with datum defect. The bias generated by any solution of an underdetermined or ill-posed problem will be introduced as a decisive criterion for evaluating “MINOS”, now in the context of a probabilistic regression problem. In particular, a special form of “LUMBE”, the linear uniformly minimum bias estimator || LA - I ||2 = min , leads us to a solution which is equivalent to “MINOS”. Alternatively we may say that in the various classes of solving an underdetermined problem “LUMBE” generates a solution of minimal bias. ? What is a probabilistic regression problem? By means of a certain statistical objective function, here of type “minimum bias”, we solve the inverse problem of linear and nonlinear equations with “fixed effects” which relates stochastic observations to parameters. According to the Measurement Axiom observations are elements of a probability space. In terms of second order statistics the observation space Y of integer dimension, dim Y = n , is characterized by the first moment E {y} , the expectation of y Y , and the central second moment D {y} , the dispersion matrix or variancecovariance matrix Ȉ y . In the case of “fixed effects” we consider the parameter space X of integer dimension, dim X = m , to be metrical. Its metric is induced from the probabilistic measure of the metric, the variance-covariance matrix Ȉ y of the observations y Y . In particular, its variance-covariance-matrix is pulled-back from the variance-covariance-matrix Ȉ y . In the special probabilistic regression model “fixed effects” ȟ Ȅ (elements of the parameter space) are estimated. Fast track reading: Consult Box 2.2 and read only Theorem 2.3
86
2 The first problem of probabilistic regression
Please pay attention to the guideline of Chapter two, say definitions, lemma and theorems.
Lemma 2.2 ˆȟ hom S -LUMBE of ȟ Definition 2.1 ˆȟ hom S -LUMBE of ȟ
Theorem 2.3 ˆȟ hom S -LUMBE of ȟ Theorem 2.4 equivalence of G x -MINOS and S -LUMBE
“The guideline of chapter two: definition, lemma and theorems”
2-1 Setup of the linear uniformly minimum bias estimator of type LUMBE Let us introduce the special consistent linear Gauss-Markov model specified in Box 2.1, which is given for the first order moments again in the form of a consistent system of linear equations relating the first non-stochastic (“fixed”), realvalued vector ȟ of unknowns to the expectation E{y} of the stochastic, realvalued vector y of observations, Aȟ = E{y}. Here, the rank of the matrix A , rkA equals the number n of observations, y \ n . In addition, the second order central moments, the regular variance-covariance matrix Ȉ y of the observations, also called dispersion matrix D{y} , constitute the second matrix Ȉ y \ n×n as unknowns to be specified as a linear model further on, but postponed to the fourth chapter. Box 2.1: Special consistent linear Gauss-Markov model {y = Aȟ | A \ n× m , rk A = n, n < m} 1st moments Aȟ = E{y}
(2.1)
2nd moments Ȉ y = D{y} \
n× n
, Ȉ y positive definite, rk Ȉ y = n
ȟ unknown Ȉ y unknown or known from prior information.
(2.2)
2-1 Setup of the linear uniformly minimum bias estimator
87
Since we deal with a linear model, it is “a natural choice” to setup a homogeneous linear form to estimate the parameters ȟ of fixed effects, at first, namely ȟˆ = Ly ,
(2.3)
where L \ m × n is a matrix-valued fixed unknown. In order to determine the real-valued m × n matrix L , the homogeneous linear estimation ȟˆ of the vector ȟ of foxed effects has to fulfil a certain optimality condition we shall outline. Second, we are trying to analyze the bias in solving an underdetermined system of linear equations. Take reference to Box 2.2 where we systematically introduce (i) the bias vector ȕ , (ii) the bias matrix, (iii) the S -modified bias matrix norm as a weighted Frobenius norm. In detail, let us discuss the bias terminology: For a homogeneous linear estimation ȟˆ = Ly the vector-valued bias ȕ := E{ȟˆ ȟ} = E{ȟˆ} ȟ takes over the special form ȕ := E{ȟˆ} ȟ = [I LA] ȟ ,
(2.4)
which led us to the definition of the bias matrix ( I - LA )c . The norm of the bias vector ȕ , namely || ȕ ||2 := ȕcȕ , coincides with the ȟȟ c weighted Frobenius norm 2 of the bias matrix B , namely || B ||ȟȟ c . Here, we meet the central problem that the c c weight matrix ȟȟ , rk ȟȟ = 1, has rank one. In addition, ȟȟ c is not accessible since ȟ is unknown. In this problematic case we replace the matrix ȟȟ c by a fixed positive-definite m × m matrix S , rk S = m , C. R. Rao’s substitute matrix and define the S -weighted Frobenius matrix norm || B ||S2 := trBcSB = tr(I m LA)S(I m LA )c .
(2.5)
Indeed, the substitute matrix S constitutes the matrix of the metric of the bias space. Box 2.2: Bias vector, bias matrix Vector and matrix bias norms Special consistent linear Gauss-Markov model of fixed effects A \ n×m , rk A = n, n < m E{y} = Aȟ, D{y} = Ȉ y “ansatz” ȟˆ = Ly
(2.6)
“bias vector” ȕ := E{ȟˆ ȟ} = E{ȟˆ} ȟ z 0 ȟ \ m
(2.7)
ȕ = LE{y} ȟ = [I m LA]ȟ = 0 ȟ \ m
(2.8)
88
2 The first problem of probabilistic regression
“bias matrix” Bc = I m LA
(2.9)
“bias norms” || ȕ ||2 = ȕcȕ = ȟ c[I m LA ]c [I m LA ]ȟ
(2.10)
2 || ȕ ||2 = tr ȕȕc = tr[I m LA]ȟȟ c[I m LA]c = || B ||[[ c
(2.11)
|| ȕ ||S2 := tr[I m LA]S[I m LA]c =:|| B ||S2 .
(2.12)
Being prepared for optimality criteria we give a precise definition of ȟˆ of type hom S-LUMBE. Definition 2.1 ( ȟˆ hom S -LUMBE of ȟ ): An m × 1 vector ȟˆ is called hom S-LUMBE (homogeneous Linear Uniformly Minimum Bias Estimation) of ȟ in the special consistent linear Gauss-Markov model of fixed effects of Box 2.1, if (1st)
ȟˆ is a homogeneous linear form ȟˆ = Ly ,
(2nd)
(2.13)
in comparison to all other linear estimation ȟˆ has the minimum bias in the sense of || B ||S2 := || (I m LA)c ||S2 .
(2.14)
The estimations ȟˆ of type hom S-LUMBE can be characterized by Lemma 2.2 ( ȟˆ hom S -LUMBE of ȟ ): An m × 1 vector ȟˆ is hom S-LUMBE of ȟ in the special consistent linear Gauss-Markov model with fixed effects of Box 2.1, if and only if the matrix Lˆ fulfils the normal equations ASA cLˆ c = AS .
(2.15)
: Proof : The S -weighted Frobenius norm || ( I m LA )c ||S2 establishes the Lagrangean
L (L) := tr ( I m LA ) S ( I m LA )c = min L
(2.16)
2-1 Setup of the linear uniformly minimum bias estimator
89
for S -LUMBE. The necessary conditions for the minimum of the quadratic Lagrangean L (L) are wL ˆ c L := 2 ª¬ ASA cLˆ c AS º¼ = 0. wL
( )
(2.17)
The theory of matrix derivatives is reviewed in Appendix B “Facts: derivative of a scalar-valued function of a matrix: trace”. The second derivatives w2L Lˆ > 0 w (vec L)w (vec L)c
( )
(2.18)
at the “point” Lˆ constitute the sufficiency conditions. In order to compute such a mn × mn matrix of second derivatives we have to vectorize the matrix normal equation by wL ˆ ˆ c SAcº L = 2 ª¬LASA ¼ wL
(2.19)
wL ˆ c SAcº Lˆ = vec 2 ª¬LASA ¼ w (vec L)
(2.20)
wL Lˆ = 2 [ ASAc
I m ] vec Lˆ 2 vec ( SAc ) . w (vec L)
(2.21)
( )
( )
( )
The Kronecker-Zehfuß poduct A
B of two arbitrary matrices as well as ( A + B )
C = A
C + B
C of three arbitrary matrices subject to the dimension condition dim A = dim B is introduced in Appendix A, “Definition of Matrix Algebra: multiplication of matrices of the same dimension (internal relation) and Laws”. The vec operation (vectorization of an array) is reviewed in Appendix A as well, namely “Definition, Facts: vec AB = (Bc
I Ac ) vec A for suitable matrices A and B ”. No we are prepared to compute w2L ( Lc ) = 2( ASAc)
Im > 0 w (vec L)w (vec L)c
(2.22)
as a positive-definite matrix. The useful theory of matrix derivatives which applies here is reviewed in Appendix B, “Facts: derivative of a matrix-valued function of a matrix namely w (vec X) / w (vec X)c ”. The normal equations of hom S-LUMBE wL / wL(Lˆ ) = 0 agree to (2.15).
ƅ For an explicit representation of ȟˆ as hom LUMBE in the special consistent linear Gauss-Markov model of fixed effects of Box 2.1, we solve the normal equations (2.15) for Lˆ = arg{L (L) = min} . L
90
2 The first problem of probabilistic regression
Beside the explicit representation of ȟˆ of type hom LUMBE we compute the related dispersion matrix D{ȟˆ} in Theorem 2.3 ( ȟˆ hom LUMBE of ȟ ): Let ȟˆ = Ly be hom LUMBE in the special consistent linear Gauss-Markov model of fixed effects of Box 2.1. Then the solution of the normal equation is ȟˆ = SA c( ASA c) 1 y
(2.23)
completed by the dispersion matrix D{ȟˆ} = SAc( ASAc) 1 Ȉ y ( ASAc) AS
(2.24)
and by the bias vector ȕ := E{ȟˆ} ȟ = = ª¬I m SAc( ASAc) 1 A º¼ ȟ
(2.25)
for all ȟ \ m . The proof of Theorem 2.3 is straight forward. At this point we have to comment what Theorem 2.3 is actually telling us. hom LUMBE has generated the estimation ȟˆ of type (2.23), the dispersion matrix D{ȟˆ} of type (2.24) and the bias vector of type (2.25) which all depend on C. R. Rao’s substitute matrix S , rk S = m . Indeed we can associate any element of the solution vector, the dispersion matrix as well as the bias vector with a particular weight which can be “designed” by the analyst.
2-2 The Equivalence Theorem of Gx -MINOS and S -LUMBE We have included the second chapter on hom S -LUMBE in order to interpret G x -MINOS of the first chapter. The key question is open: ? When are hom S -LUMBE and G x -MINOS equivalent ? The answer will be given by Theorem 2.4 (equivalence of G x -MINOS and S -LUMBE) With respect to the special consistent linear Gauss-Markov model (2.1), (2.2) ȟˆ = Ly is hom S -LUMBE for a positive-definite matrix S , if ȟ m = Ly is G x -MINOS of the underdetermined system of linear equations (1.1) for G x = S -1 G -1x = S .
(2.26)
91
2-2 The Equivalence Theorem of G x -MINOS and S-LUMBE
The proof is straight forward if we directly compare the solution (1.14) of G x MINOS and (2.23) of hom S -LUMBE. Obviously the inverse matrix of the metric of the parameter space X is equivalent to the matrix of the metric of the bias space B . Or conversely, the inverse matrix of the metric of the bias space B determines the matrix of the metric of the parameter space X . In particular, the bias vector ȕ of type (2.25) depends on the vector ȟ which is inaccessible. The situation is similar to the one in hypothesis testing. We can produce only an estimation ȕˆ of the bias vector ȕ if we identify ȟ by the hypothesis ȟ 0 = ȟˆ . A similar argument applies to the second central moment D{y} Ȉ y of the “random effect” y , the observation vector. Such a dispersion matrix D{y} Ȉ y has to be known a priori in order to be able to compute the dispersion matrix D{ȟˆ} Ȉȟˆ . Again we have to apply the argument that we are only able to conˆ and to setup a hypothesis about Ȉ . struct an estimate Ȉ y y
2-3 Examples Due to the Equivalence Theorem G x -MINOS ~ S -LUMBE the only new items of the first problem of probabilistic regression are the dispersion matrix D{ȟˆ | hom LUMBE} and the bias matrix B{ȟˆ | hom LUMBE} . Accordingly the first example outlines the simple model of the variance-covariance matrix D{ȟˆ} =: Ȉȟˆ and its associated Frobenius matrix bias norm || B ||2 . New territory is taken if we compute the variance-covariance matrix D{ȟˆ * } =: Ȉȟˆ and its related Frobenius matrix bias norm || B* ||2 for the canonical unknown vector ȟ* of star coordinates [ȟˆ1* ,..., ȟˆ *m ]c , lateron rank partitioned. *
Example 2.1 (simple variance-covariance matrix D{ȟˆ | hom LUMBE} , Frobenius norm of the bias matrix || B(hom LUMBE) || ): The dispersion matrix Ȉ := D{ȟˆ} of ȟˆ (hom LUMBE) is called ȟˆ
Simple, if S = I m and Ȉ y := D{y} = I n ı y2 . Such a model is abbreviated “i.i.d.”
and
“u.s.”
or
or
independent identically distributed observations (one variance component)
unity substituted (unity substitute matrix).
Such a simple dispersion matrix is represented by Ȉȟˆ = A c( AA c) 2 Aı 2y .
(2.27)
The Frobenius norm of the bias matrix for such a simple invironment is derived by
92
2 The first problem of probabilistic regression
|| B ||2 = tr[I m A c( AA c) 1 A]
(2.28)
|| B ||2 = d = m n = m rk A,
(2.29)
since I m A c(AA c)1 A and A c( AAc) 1 A are idempotent. According to Appendix A, notice the fact “ tr A = rk A if A is idempotent”. Indeed the Frobenius norm of the u.s. bias matrix B ( hom LUMBE ) equalizes the square root m n = d of the right complementary index of the matrix A . Table 2.1 summarizes those data of the front page examples of the first chapter relating to D{ȟˆ | hom LUMBE}
and
|| B(hom BLUMBE) || .
Table 2.1: Simple variance-covariance matrix (i.i.d. and u.s.) Frobenius norm of the simple bias matrix Front page example 1.1 A \ 2×3 , n = 2, m = 3 1 ª 21 7 º ª1 1 1 º ª3 7 º A := « , AAc = « , ( AAc) 1 = « » » 14 ¬ 7 3 »¼ ¬1 2 4 ¼ ¬7 21¼ rk A = 2 A c( AA c) 1 =
( AA c) 2 =
ª14 4 º ª10 6 2º 1 « 1 7 1» , A c( AA c) 1 A = « 6 5 3 » 14 « 7 5 » 14 « 2 3 13 » ¬ ¼ ¬ ¼
ª106 51 59 º 1 ª 245 84 º 1 « 2 c c , A ( AA ) A = 51 25 27 » 98 ¬« 84 29 ¼» 98 « 59 27 37 » ¬ ¼
ª106 51 59 º 1 « Ȉȟˆ = A c( AA c) AV = 51 25 27 » V y2 98 « 59 27 37 » ¬ ¼ 2
2 y
|| B ||2 = tr ¬ªI m A c( AA c) 1 A º¼ = tr I 3 tr A c( AA c) 1 A || B ||2 = 3 141 (10 + 5 + 13) = 3 2 = 1 = d || B || = 1 = d .
93
2-3 Examples
Example 2.2 (canonically simple variance-covariance matrix D{ȟˆ * | hom LUMBE} , Frobenius norm of the canonical bias matrix || B* (hom LUMBE) || ): The dispersion matrix Ȉȟˆ := D{ȟˆ * } of the rank partitioned vector of canonical coordinates ȟˆ * = 9 cS ȟˆ of type hom LUMBE is called *
1 2
canonically simple, if S = I m and Ȉ y := D{y * } = I nV y2 . In short, we denote such a model by *
*
“i.i.d.”
and
“u.s.”
or
Or
independent identically distributed observations (one variance component)
unity substituted (unity substitute matrix).
Such a canonically simple dispersion matrix is represented by ° ª ȟ* º ½° ª ȁ-2Vy2 D{ȟˆ* } = D ® « *1 » ¾ = « °¯ ¬«ȟ2 ¼» °¿ ¬« 0
*
0º » 0 ¼»
(2.30)
or ª1 1 º var ȟˆ 1* = Diag « 2 ,..., 2 » V y2 , ȟ1* \ r ×1 , Or ¼ ¬ O1 *
(
)
cov ȟˆ 1* , ȟˆ *2 = 0, var ȟˆ *2 = 0, ȟ*2 \ ( m r )×1 . If the right complementary index d := m rk A = m n is interpreted as a datum defect, we may say that the variances of the “free parameters” ȟˆ *2 \ d are zero. Let us specialize the canonical bias vector ȕ* as well as the canonical bias matrix B* which relates to ȟˆ * = L* y * of type “canonical hom LUMBE” as follows. Box 2.3: Canonical bias vector, canonical bias matrix “ansatz” ȟˆ * = L* y * E{y * } = A*ȟ* , D{y * } = Ȉ y “bias vector” ȕ := E{ȟ* } ȟ * ȟ * \ m *
*
94
2 The first problem of probabilistic regression
ȕ* = L* E{y * } ȟ* ȟ* \ m ȕ* = (I m L* A* )ȟ* ȟ* \ m ª ȕ* º ªI ȕ* (hom LUMBE) = « 1* » = ( « r ¬0 ¬ȕ 2 ¼
0 º ª ȁ 1 º ª ȟ1* º ȁ , 0 ) ] « *» « »[ I d »¼ ¬ 0 ¼ ¬ȟ 2 ¼
(2.31)
for all ȟ*1 \ r , ȟ*2 \ d ª ȕ* º ª0 0 º ª ȟ*1 º ȕ* (hom LUMBE) = « *1 » = « »« *» ¬ 0 I d ¼ ¬ȟ 2 ¼ ¬ȕ 2 ¼
(2.32)
ª ȕ* º ª0º ȕ* (hom LUMBE) = « *1 » = « * » ȟ *2 \ d ¬ȟ 2 ¼ ¬ȕ 2 ¼
(2.33)
“bias matrix” (B* )c = I m L* A* ªI ª¬B* (hom LUMBE) º¼c = « r ¬0
0 º ª ȁ 1 º « » [ ȁ, 0 ] I d »¼ ¬ 0 ¼
ª0 0 º ª¬B* (hom LUMBE) º¼c = « » ¬0 I d ¼
(2.34)
(2.35)
“Frobenius norm of the canonical bias matrix” ª0 0 º || B* (hom LUMBE) ||2 = tr « » ¬0 I d ¼
(2.36)
|| B* (hom LUMBE) || = d = m n .
(2.37)
d = = m n = m rk A of Box 2.3 agrees to the value of the Frobenius norm of the ordinary bias matrix.
It is no surprise that the Frobenius norm of the canonical bias matrix
3
The second problem of algebraic regression – inconsistent system of linear observational equations – overdetermined system of linear equations: {Ax + i = y | A \ n×m , y R ( A ) rk A = m, m = dim X} :Fast track reading: Read only Lemma 3.7.
Lemma 3.2 x A G y -LESS of x
Lemma 3.3 x A G y -LESS of x
Lemma 3.4 x A G y -LESS of x constrained Lagrangean Lemma 3.5 x A G y -LESS of x constrained Lagrangean
Theorem 3.6 bilinear form
Lemma 3.7 Characterization of G y -LESS
“The guideline of chapter three: theorem and lemmas”
96
3 The second problem of algebraic regression
By means of a certain algebraic objective function which geometrically is called a minimum distance function, we solve the second inverse problem of linear and nonlinear equations, in particular of algebraic type, which relate observations to parameters. The system of linear or nonlinear equations we are solving here is classified as overdetermined. The observations, also called measurements, are elements of a certain observation space Y of integer dimension, dim Y = n, which may be metrical, especially Euclidean, pseudo–Euclidean, in general a differentiable manifold. In contrast, the parameter space X of integer dimension, dim X = m, is metrical as well, especially Euclidean, pseudo–Euclidean, in general a differentiable manifold, but its metric is unknown. A typical feature of algebraic regression is the fact that the unknown metric of the parameter space X is induced by the functional relation between observations and parameters. We shall outline three aspects of any discrete inverse problem: (i) set-theoretic (fibering), (ii) algebraic (rank partitioning, “IPM”, the Implicit Function Theorem) and (iii) geometrical (slicing). Here we treat the second problem of algebraic regression: A inconsistent system of linear observational equations: Ax + i = y , A R n× m , rk A = m, n > m, also called “overdetermined system of linear equations”, in short “more observations than unknowns” is solved by means of an optimization problem. The introduction presents us a front page example of three inhomogeneous equations with two unknowns. In terms of 31 boxes and 12 figures we review the least-squares solution of such a inconsistent system of linear equations which is based upon the trinity.
3-1 Introduction
97
3-1 Introduction With the introductory paragraph we explain the fundamental concepts and basic notions of section. For you, the analyst, who has the difficult task to deal with measurements, observational data, modeling and modeling equations we present numerical examples and graphical illustrations of all abstract notions. The elementary introduction is written not for a mathematician, but for you, the analyst, with limited remote control of the notions given hereafter. May we gain your interest. Assume an n-dimensional observation space Y, here a linear space parameterized by n observations (finite, discrete) as coordinates y = [ y1 ," , yn ]c R n in which an m-dimensional model manifold is embedded (immersed). The model manifold is described as the range of a linear operator f from an m-dimensional parameter space X into the observation space Y. The mapping f is established by the mathematical equations which relate all observables to the unknown parameters. Here the parameter space X , the domain of the linear operator f, will also be restricted to a linear space which is parameterized by coordinates x = [ x1 ," , xm ]c R m . In this way the linear operator f can be understood as a coordinate mapping A : x 6 y = Ax. The linear mapping f : X o Y is geometrically characterized by its range R(f), namely R(A), defined by R(f):= {y Y | y = f (x) for all x X} which in general is a linear subspace of Y and its kernel N(f), namely N(f), defined by N ( f ) := {x X | f (x) = 0}. Here the range R(f), namely R(A), does not coincide with the n-dimensional observation space Y such that y R (f ) , namely y R (A) . In contrast, we shall assume that the null space element N(f) = 0 “is empty”: it contains only the element x = 0 . Example 3.1 will therefore demonstrate the range space R(f), namely the range space R(A), which dose not coincide with the observation space Y, (f is not surjective or “onto”) as well as the null space N(f), namely N(A), which is empty. f is not surjective, but injective. Box 3.20 will introduce the special linear model of interest. By means of Box 3.21 it will be interpreted. 3-11 The front page example
Example 3.1 (polynomial of degree two, inconsistent system of linear equations Ax + i = y, x X = R m , dim X = m, y Y = R n , r = rk A = dim X = m, y N ( A ) ):
98
3 The second problem of algebraic regression
First, the introductory example solves the front page inconsistent system of linear equations, x1 + x2 1 x1 + 2 x2 3
x1 + x2 + i1 = 1 x1 + 2 x2 + i2 = 3
or
x1 + 3 x2 4
x1 + 3x2 + i3 = 4,
obviously in general dealing with the linear space X = R m x, dim X = m, here m=2, called the parameter space, and the linear space Y = R n y , dim Y = n, here n =3 , called the observation space. 3-12 The front page example in matrix algebra Second, by means of Box 3.1 and according to A. Cayley’s doctrine let us specify the inconsistent system of linear equations in terms of matrix algebra. Box 3.1: Special linear model: polynomial of degree one, three observations, two unknowns ª y1 º ª a11 y = «« y2 »» = «« a21 «¬ y3 »¼ «¬ a31
a12 º ªx º a22 »» « 1 » x a32 »¼ ¬ 2 ¼
ª1 º ª1 1 º ª i1 º ª1 1 º ª x1 º « » « » « » y = Ax + i : « 2 » = «1 2 » « » + «i2 » A = ««1 2 »» x «¬ 4 »¼ «¬1 3 »¼ ¬ 2 ¼ «¬ i3 »¼ «¬1 3 »¼ xc = [ x1 , x2 ], y c = [ y1 , y2 , y3 ] = [1, 2, 3], i c = [i1 , i2 , i3 ] , A Z +3× 2 R 3× 2 , x R 2×1 , y Z +3×1 R 3×1 r = rk A = dim X = m = 2 . As a linear mapping f : x 6 y Ax can be classified as following: f is injective, but not surjective. (A mapping f is called linear if f ( x1 + x2 ) = f ( x1 ) + f ( x2 ) holds.) Denote the set of all x X by the domain D(f) or the domain space D($). Under the mapping f we generate a particular set called the range R(f) or the range space R(A). Since the set of all y Y is not in the range R(f) or the range space R(A), namely y R (f ) or y R (A) , the mapping f is not surjective. Beside the range R(f), the range space R(A), the linear mapping is characterized by the kernel N ( f ) := {x R m | f (x) = 0} or the null space N ( A) := {x R m | Ax = 0} . Since the inverse mapping
99
3-1 Introduction
g : R ( f ) y / 6 x D( f ) is one-to-one, the mapping f is injective. Alternatively we may identify the kernel N(f), or the null space N ( A ) with {0} . ? Why is the front page system of linear equations called inconsistent ? For instance, let us solve the first two equations, namely x1 = 0, x2 = 1. As soon as we substitute this solution in the third one, the inconsistency 3 z 4 is met. Obviously such a system of linear equations needs general inconsistency parameters (i1 , i2 , i3 ) in order to avoid contradiction. Since the right-hand side of the equations, namely the inhomogeneity of the system of linear equations, has been measured as well as the model (the model equations) have been fixed, we have no alternative but inconsistency. Within matrix algebra the index of the linear operator A is the rank r = rk A, here r = 2, which coincides with the dimension of the parameter space X, dim X = m, namely r = rk A = dim X = m, here r=m=2. In the terminology of the linear mapping f, f is not “onto” (surjective), but “one-to-one” (injective). The left complementary index of the linear operator A R n× m , which account for the surjectivity defect is given by d s = n rkA, also called “degree of freedom” (here d s = n rkA = 1 ). While “surjectivity” related to the range R(f) or “the range space R(A)” and “injectivity” to the kernel N(f) or “the null space N(A)” we shall constructively introduce the notion of range R (f ) range space R (A)
versus
kernel N ( f ) null space N ( f )
by consequently solving the inconsistent system of linear equations. But beforehand let us ask: How can such a linear model of interest, namely a system of inconsistent linear equations, be generated ? With reference to Box 3.2 let us assume that we have observed a dynamical system y(t) which is represented by a polynomial of degree one with respect to time t R + , namely y(t ) = x1 + x2t. (Due to y• (t ) = x2 it is a dynamical system with constant velocity or constant first derivative with respect to time t.) The unknown polynomial coefficients are collected in the column array x = [ x1 , x2 ]c, x X = R 2 , dim X = 2, and constitute the coordinates of the two-dimensional parameter space X. If the dynamical system y(t) is observed at three instants, say y(t1) = y1 = 1, y(t2) = y2 = 2, y(t3) = y3 = 4, and if we collect the observations in the column array y = [ y1 , y2 , y3 ]c = [1, 2, 4]c, y Y = R 3 , dim Y = 3, they constitute the coordinates of the three-dimensional observation space Y. Thus we are left with the
100
3 The second problem of algebraic regression
problem to compute two unknown polynomial coefficients from three measurements. Box 3.2: Special linear model: polynomial of degree one, three observations, two unknowns ª y1 º ª1 t1 º ª i1 º ª x1 º « » « » « » y = « y2 » = «1 t2 » « » + «i2 » x «¬ y3 »¼ «¬1 t3 »¼ ¬ 2 ¼ «¬ i3 »¼ ª t1 = 1, y1 = 1 ª 1 º ª1 1 º ª i1 º ª x1 º « » « « » « » «t2 = 2, y2 = 2 : « 2 » = «1 2 » « » + «i2 » ~ x «¬ t3 = 3, y3 = 4 «¬ 4 »¼ «¬1 3 »¼ ¬ 2 ¼ «¬ i3 »¼ ~ y = Ax + i, r = rk A = dim X = m = 2 . Thirdly, let us begin with a more detailed analysis of the linear mapping f : Ax y or Ax + i = y , namely of the linear operator A R n× m , r = rk A = dim X = m. We shall pay special attention to the three fundamental partitionings, namely (i)
algebraic partitioning called rank partitioning of the matrix A,
(ii) geometric partitioning called slicing of the linear space Y (observation space), (iii) set-theoretical partitioning called fibering of the set Y of observations. 3-13 Least squares solution of the front page example by means of vertical rank partitioning Let us go back to the front page inconsistent system of linear equations, namely the problem to determine two unknown polynomial coefficients from three sampling points which we classified as an overdetermined one. Nevertheless we are able to compute a unique solution of the overdetermined system of inhomogeneous linear equations Ax + i = y , y R ( A) or rk A = dim X , here A R 3× 2 x R 2×1 , y R 3×1 if we determine the coordinates of the unknown vector x as well as the vector i of the inconsistency by least squares (minimal Euclidean length, A2-norm), here & i &2I = i12 + i22 + i32 = min . Box 3.3 outlines the solution of the related optimization problem.
101
3-1 Introduction
Box 3.3: Least squares solution of the inconsistent system of inhomogeneous linear equations, vertical rank partitioning The solution of the optimization problem {& i &2I = min | Ax + i = y , rk A = dim X} x
is based upon the vertical rank partitioning of the linear mapping f : x 6 y = Ax + i, rk A = dim X , which we already introduced. As soon as ª y1 º ª A1 º ª i1 º r ×r « y » = « A » x + « i » subject to A1 R ¬ 2¼ ¬ 2¼ ¬ 2¼ 1 1 x = A1 i1 + A1 y1 y 2 = A 2 A11i1 + i 2 + A 2 A11y1
i 2 = A 2 A11i1 A 2 A11 y1 + y 2
is implemented in the norm & i &2I we are prepared to compute the first derivatives of the unconstrained Lagrangean
L (i1 , i 2 ) := & i &2I = i1ci1 + i c2 i 2 = = i1ci1 + i1c A1c1A c2 A 2 A11i1 2i1c A1c1A c2 ( A 2 A11y1 y 2 ) + +( A 2 A11y1 y 2 )c( A 2 A11y1 y 2 ) = = min i1
wL (i1l ) = 0 wi1 A1c1A c2 ( A 2 A11y1 y 2 ) + [ A1c1Ac2 A 2 A11 + I ] i1l = 0 i1l = [I + A1c1Ac2 A 2 A11 ]1 A1c1 A c2 ( A 2 A11y1 y 2 ) which constitute the necessary conditions. The theory of vector derivatives is presented in Appendix B. Following Appendix A, “Facts: Cayley inverse: sum of two matrices , namely (s9), (s10) for appropriate dimensions of the involved matrices”, we are led to the following identities:
102
3 The second problem of algebraic regression
1st term (I + A1c1A c2 A 2 A11 ) 1 A1c1A c2 A 2 A11y1 = ( A1c + A c2 A 2 A11 ) 1 A c2 A 2 A11y1 = A1 ( A1c A1 + A c2 A 2 ) 1 A c2 A 2 A11y1 = A1 ( A1c A1 + A c2 A 2 ) 1 A1cy1 + + A1 ( A1c A1 + A c2 A 2 ) 1 ( A c2 A 2 A11 + A1c )y1 = A1 ( A1c A1 + A c2 A 2 ) 1 A1cy1 + +( A1c A1 + A c2 A 2 ) 1 ( A c2 A 2 + A1c A1 )y1 = A1 ( A1c A1 + A c2 A 2 ) 1 A1cy1 + y1 2nd term (I + A1c1A c2 A 2 A11 ) 1 A1c1A 2 y 2 = ( A1c + A c2 A 2 A11 ) 1 A 2 y 2 = = A1 ( A1c A1 + A c2 A 2 ) 1 A c2 y 2 i1l = A1 ( A1c A1 + A c2 A 2 ) 1 ( A1cy1 + A c2 y 2 ) + y1 . The second derivatives w2L (i1l ) = 2[( A 2 A11 )c( A 2 A11 ) + I] > 0 wi1wi1c due to positive-definiteness of the matrix ( A 2 A11 )c( A 2 A11 ) + I generate the sufficiency condition for obtaining the minimum of the unconstrained Lagrangean. Finally let us backward transform i1l 6 i 2 l = A 2 A11i1l A 2 A11 y1 + y 2 . i 2 l = A 2 ( A1c A1 + Ac2 A 2 ) 1 ( A1cy1 + Ac2 y 2 ) + y 2 . Obviously we have generated the linear form i1l = A1 ( A1c A1 + A c2 A 2 ) 1 ( A1cy1 + Ac2 y 2 ) + y1 i 2l = A 2 ( A1c A1 + Ac2 A 2 ) 1 ( A1cy1 + Ac2 y 2 ) + y 2 or ª i1l º ª A1 º ª y1 º ª y1 º 1 «i » = « A » ( A1c A1 + A c2 A 2 ) [ A1c , A c] « y » + « y » ¬ 2¼ ¬ 2¼ ¬ 2¼ ¬ 2l ¼ or i l = A( A cA) 1 y + y. Finally we are left with the backward step to compute the unknown vector of parameters x X : xl = A11i1l + A11 y1 xl = ( A1c A1 + A c2 A 2 ) 1 ( A1cy1 + A c2 y 2 ) or xl = ( A cA) 1 Acy.
103
3-1 Introduction
A numerical computation with respect to the introductory example is ª3 6 º ª14 6º A1c A1 + A c2 A 2 = « , ( A1c A1 + A c2 A 2 ) 1 = 16 « » », ¬6 14 ¼ ¬ 6 3 ¼ ª 8 3 º A1 ( A1c A1 + A c2 A 2 ) 1 = 16 « », ¬2 0 ¼ A 2 ( A1c A1 + A c2 A 2 ) 1 = 16 [4, 3], ª8º ª1 º A1cy1 + Ac2 y 2 = « » , y1 = « » , y 2 = 4 ¬19 ¼ ¬ 3¼ ª 1º i1l = 16 « » , i 2 l = 16 , & i l & I = 16 6, ¬2¼ ª 2 º xl = 16 « » , & xl &= 16 85, ¬9¼ y(t ) = 13 + 32 t ª 2 2 º 1 w 2L (x 2m ) = [( A 2 A11 )c( A 2 A11 ) + I] = « » > 0, 2 wx 2 wxc2 ¬ 2 5 ¼ § ª 2 2 º · § ª 2 2 º · " first eigenvalue O1 ¨ « ¸ = 6", " second eigenvalue O2 ¨ « » » ¸ = 1". © ¬ 2 5 ¼ ¹ © ¬ 2 5 ¼ ¹ The diagnostic algorithm for solving an overdetermined system of linear equations y = Ax, rk A = dim X = m, m < n = dim Y, y Y by means of rank partitioning is presented to you by Box 3.4. 3-14
The range R(f) and the kernel N(f), interpretation of “LESS” by three partitionings: (i) algebraic (rank partitioning) (ii) geometric (slicing) (iii) set-theoretical (fibering)
Fourthly, let us go into the detailed analysis of R(f), R ( f ) A , N(f), with respect to the front page example. Beforehand we begin with a comment. We want to emphasize the two step procedure of the least squares solution (LESS) once more: The first step of LESS maps the observation vector y onto the range space R(f) while in the second step the LESS point y R ( A) is uniquely mapped to the point xl X , an element of the parameter space. Of
104
3 The second problem of algebraic regression
course, we directly produce xl = ( A cA) 1 Acy just by substituting the inconsistency vector i = y – Ax into the l2 norm & i &2I = (y Ax)c(y Ax) = min . Such a direct procedure which is common practice in LESS does not give any insight into the geometric structure of LESS. But how to identify the range R(f), namely the range space R(A), or the kernel N(f), namely the null space N(A) in the front page example? By means of Box 3.4 we identify R(f) or “the null space R(A)” and give its illustration by Figure 3.1. Such a result has paved the way to the diagnostic algorithm for solving an overdetermined system of linear equations by means of rank partitioning presented in Box 3.5. The kernel N(f) or “the null space” is immediately identified as {0} = N ( A ) = {x R m | Ax) = 0} = {x R m | A1 x = 0} by means of rank partitioning ( A1x = 0 x = 0} . Box 3.4: The range space of the system of inconsistent linear equations Ax + i = y, “vertical” rank partitioning The matrix A is called “vertically rank partitioned”, if r = rk A = rk A1 = m, ªA º {A R n×m A = « 1 » A1 R r ×r , A 2 R d ×r } ¬ A 2 ¼ d = d ( A) = m rk A holds. (In the introductory example A R 3× 2 , A1 R 2× 2 , A 2 R1× 2 , rk A = 2, d ( A) = 1 applies.) An inconsistent system of linear equations Ax = y, rk A = dim X = m, is “vertically rank partitioned” if ªA º ªi º Ax = y , rk A = dim X y = « 1 » x + « 1 » ¬A2 ¼ ¬i 2 ¼ ª y = A1x + i1 « 1 ¬y 2 = A 2x + i 2 for a partitioned observation vector ªy º {y R n y = « 1 » | y1 R r ×1 , y 2 R d ×1 } ¬y 2 ¼ and a partitioned inconsistency vector ªi º {i R n i = « 1 » | i1 R r ×1 , i 2 R d ×1 }, ¬i 2 ¼ respectively, applies. (The “vertical” rank partitioning of the
105
3-1 Introduction
matrix A as well as the “vertically rank partitioned” inconsistent system of linear equations Ax + i = y , rk A = dim X = m , of the introductory example is ª1 1 º ª1 1 º «1 2 » = A = ª A1 º = «1 2» , « » « « » » ¬ A 2 ¼ «1 3» «¬1 3 »¼ ¬ ¼
ª1 º ª y1 º « » 2 ×1 « » = «3 » , y1 R , y 2 R . y ¬ 2 ¼ «4» ¬ ¼ By means of the vertical rank partitioning of the inconsistent system of inhomogeneous linear equations an identification of the range space R(A), namely R ( A) = {y R n | y 2 A 2 A11 y1 = 0} is based upon y1 = A1x + i1 x1 = A11 (y1 i1 ) y 2 = A 2 x + i 2 x 2 = A 2 A11 (y1 i1 ) + i 2 y 2 A 2 A11 y1 = i 2 A 2 A11i1 which leads to the range space R(A) for inconsistency zero, particularly in the introductory example 1
ª1 1 º ª y1 º y3 [1, 3] « » « » = 0. ¬1 2 ¼ ¬ y2 ¼ For instance, if we introduce the coordinates y1 = u , y2 = v, the other coordinate y3 of the range space R(A) Y = R 3 amounts to ª 2 1º ªu º ªu º y3 = [1, 3] « » « v » = [ 1, 2] « v » 1 1 ¬ ¼¬ ¼ ¬ ¼ y3 = u + 2v. In geometric language the linear space R(A) is a parameterized plane 2 P 0 through the origin illustrated by Figure 3.1. The observation space Y = R n (here n = 3) is sliced by the subspace, the linear space (linear manifold) R(A), dim R ( A) = rk( A) = r , namely a straight line, a plane (here), a higher dimensional plane through the origin O.
106
3 The second problem of algebraic regression
y 2 0
e3
ec1 e2 e1
Figure 3.1: Range R(f), range space R(A), y R(A), observation space Y = R 3 , slice by R ( A) = P02 R 3 , y = e1u + e 2 v+ e3 (u + 2 v) R ( A) Box 3.5: Algorithm Diagnostic algorithm for solving an overdetermined system of linear equations y = Ax + i, rk A = dim X , y R ( A) by means of rank partitioning Determine the rank of the matrix A rk A = dimX = m
107
3-1 Introduction
Compute the “vertical rank partitioning” ªA º A = « 1 » , A1 R r × r = R m × n , A 2 R ( n r )× r = R ( n m )× m ¬A2 ¼ “n – r = n – m = ds is called left complementary index” “A as a linear operator is not surjective, but injective”
Compute the range space R(A) R ( A) := {y R n | y 2 A 2 A11 y1 = 0}
Compute the inconsistency vector of type LESS i l = A( AcA) 1 y + y test : A ci l = 0
Compute the unknown parameter vector of type LESS xl = ( A cA) 1 Acy .
h What is the geometric interpretation of the least-squares solution & i &2I = min ? With reference to Figure 3.2 we additively decompose the observation vector accordingly to y = y R(A) + y R(A) , A
where y R ( A ) R ( A) is an element of the range space R ( A) , but the inconsistency vector i l = i R ( A ) R ( A) A an element of its orthogonal complement, the normal space R ( A) A . Here R ( A) is the central plane P02 , y R ( A ) P02 , but A
108
3 The second problem of algebraic regression
R ( A) A the straight line L1 , i l R ( A) A . & i &2I = & y y R ( A ) &2 = min can be understood as the minimum distance mapping of the observation point y Y onto the range space R ( A) . Such a mapping is minimal, if and only if the inner product ¢ y R ( A ) | i R ( A ) ² = 0 approaches zero, we say A
A
" y R ( A ) and i R ( A ) are orthogonal". A
The solution point y R ( A ) is the orthogonal projection of the observation point y Y onto the range space R ( A), an m-dimensional linear manifold, also called a Grassmann manifold G n , m .
Figure 3.2: Orthogonal projection of the observation vector an y Y onto the range space R ( A), R ( A) := {y R n | y 2 A 2 A11 y1 = 0} , i l R ( A) A , here: y R ( A ) P02 (central plane), y L1 (straight line ), representation of y R ( A ) (LESS) : y = e1u + e 2 v + e3 (u + 2v) R 3 , R ( A) = span{eu , e v } ª eu := Du y R ( A ) / & Du y R ( A ) &= (e1 e3 ) / 2 « Dv y R ( A ) < Dv y R ( A ) | eu > eu « Gram - Schmidt : «ev := = (e1 + e 2 + e3 ) / 3 & Dv y R ( A ) < Dv y R ( A ) | eu > eu & « « < eu | ev > = 0, Dv y R ( A ) = e 2 + 2e3 ¬ As an “intermezzo” let us consider for a moment the nonlinear model by means of the nonlinear mapping " X x 6 f (x) = y R ( A ) , y Y ".
109
3-1 Introduction
In general, the observation space Y as well as the parameter space X may be considered as differentiable manifolds, for instance “curved surfaces”. The range R(f) may be interpreted as the differentiable manifolds. X embedded or more generally immersed, in the observation space Y = R n , for instance: X Y. The parameters [ x1 ,… , xm ] constitute a chart of the differentiable manifolds X = M m M n = Y. Let us assume that a point p R ( f ) is given and we are going to attach the tangent space Tp M m locally. Such a tangent space Tp M m at p R ( f ) may be constructed by means of the Jacobi map, parameterized by the Jacobi matrix J, rk J = m, a standard procedure in Differential Geometry. An observation point y Y = R n is orthogonally projected onto the tangent space Tp M m at p R ( f ) , namely by LESS as a minimum distance mapping. In a second step – in common use is the equidistant mapping – we bring the point q Tp M m which is located in the tangent space Tp M m at p R ( f ) back to the differentiable manifold, namely y R ( f ). The inverse map " R ( f ) y 6 g ( y ) = xl X " maps the point y R ( f ) to the point xl of the chosen chart of the parameter space X as a differentiable manifold. Examples follow lateron. Let us continue with the geometric interpretation of the linear model of this paragraph. The range space R(A), dim R ( A) = rk( A) = m is a linear space of dimension m, here m = rk A, which slices R n . In contrast, the subspace R ( A) A corresponds to a n rk A = d s dimensional linear space Ln r , here n - rk A = n – m, r = rk A= m. Let the algebraic partitioning and the geometric partitioning be merged to interpret the least squares solution of the inconsistent system of linear equations as a generalized inverse (g-inverse) of type LESS. As a summary of such a merger we take reference to Box 3.6. The first condition: AA A = A Let us depart from LESS of y = Ax + i, namely xl = A l y = ( AcA) 1 Acy, i l = (I AA l ) y = [I A( AcA) 1 Ac]y. º Axl = AA l y = AA l ( Axl + i l ) » 1 1 A ci l = Ac[I A( AcA) Ac]y = 0 A l i l = ( A cA) Aci l = 0 ¼ Axl = AA l Axl AA A = A . The second condition A AA = A
110
3 The second problem of algebraic regression
xl = ( A cA) 1 A cy = A l y = A l ( Axl + i l ) º » A l i l = 0 ¼ xl = A l y = A l AA l y
A l y = A l AA l y A AA = A . rk A l = rk A is interpreted as following: the g-inverse of type LESS is the generalized inverse of maximal rank since in general rk A d rk A holds. The third condition AA = PR ( A )
y = Axl + i l = AA l + (I AA l )y y = Axl + i l = A( A cA) 1 A cy + [I A( A cA) 1 A c]y º » y = y R(A) + i R(A) »¼ A
A A = PR ( A ) , (I AA ) = PR ( A
A
)
.
Obviously AA l is an orthogonal projection onto R ( A) , but I AA l onto its orthogonal complement R ( A) A . Box 3.6: The three condition of the generalized inverse mapping (generalized inverse matrix) LESS type Condition #1 f (x) = f ( g (y )) f = f DgD f Condition #2 (reflexive g-inverse mapping)
Condition #1 Ax = AA Ax AA A = A Condition #2 (reflexive g-inverse)
x = g (y ) =
x = A y = A AA y
= g ( f (x))
A AA = A
Condition #3
Condition #3
f ( g (y )) = y R ( A )
A Ay = y R (A)
f D g = projR (f)
A A = PR (A) .
3-2 The least squares solution: “LESS”
111
The set-theoretical partitioning, the fibering of the set system of points which constitute the observation space Y, the range R(f), will be finally outlined. Since the set system Y (the observation space) is R n , the fibering is called “trivial”. Non-trivial fibering is reserved for nonlinear models in which case we are dealing with a observation space as well as an range space which is a differentiable manifold. Here the fibering Y = R( f ) R( f )A produces the trivial fibers R ( f ) and R ( f ) A where the trivial fibers R ( f ) A is the quotient set R n /R ( f ) . By means of a Venn diagram (John Venn 1834-1928) also called Euler circles (Leonhard Euler 1707-1783) Figure 3.3 illustrates the trivial fibers of the set system Y = R n generated by R ( f ) and R ( f ) A . The set system of points which constitute the parameter space X is not subject to fibering since all points of the set system R(f) are mapped into the domain D(f).
Figure 3.3: Venn diagram, trivial fibering of the observation space Y, trivial fibers R ( f ) and R ( f ) A , f : R m = X o Y = R ( f ) R ( f ) A , X set system of the parameter space, Y set system of the observation space.
3-2 The least squares solution: “LESS” The system of inconsistent linear equations Ax + i = y subject to A R n×m , rk A = m < n , allows certain solutions which we introduce by means of Definition 3.1 as a solution of a certain optimization problem. Lemma 3.2 contains the normal equations of the optimization problem. The solution of such a system of normal equations is presented in Lemma 3.3 as the least squares solution with respect to the G y - norm . Alternatively Lemma 3.4 shows the least squares solution generated by a constrained Lagrangean. Its normal equations are solved for (i) the Lagrange multiplier, (ii) the unknown vector of inconsistencies by Lemma 3.5. The unconstrained Lagrangean where the system of linear equations has been implemented as well as the constrained Lagrangean lead to the identical solution for (i) the vector of inconsistencies and (ii) the vector of unknown parameters. Finally we discuss the metric of the observation space and alternative choices of its metric before we identify the solution of the quadratic optimization problem by Lemma 3.7 in terms of the (1, 2, 3)-generalized inverse.
112
3 The second problem of algebraic regression
Definition 3.1 ( least squares solution w.r.t. the G y -seminorm): A vector xl X = R m is called G y - LESS (LEast Squares Solution with respect to the G y -seminorm) of the inconsistent system of linear equations ª rk A = dim X = m Ax + i = y , y Y { R , «« or «¬ y R ( A) n
(3.1)
(the system of inverse linear equations A y = x, rk A = dim X = m or x R ( A ) , is consistent) if in comparison to all other vectors x X { R m the inequality & y Axl &G2 = (y Axl )cG y (y Axl ) d y
d (y Ax)cG y (y Ax) = & y Ax &G2
(3.2)
y
holds, in particular if the vector of inconsistency i l := y Axl has the least G y -seminorm. The solution of type G y -LESS can be computed as following Lemma 3.2 (least squares solution with respect to the G y -seminorm) : A vector xl X { R m is G y -LESS of (3.1) if and only if the system of normal equations A cG y Axl = AcG y y
(3.3)
is fulfilled. xl always exists and is in particular unique, if A cG y A is regular. : Proof : G y -LESS is constructed by means of the Lagrangean b ± b 2 4ac 2a = xcA cG y Ax 2y cG y Ax + y cG y y = min
L(x) := & i &2G = & y Ax &2G = y
y
x
such that the first derivatives w i cG y i wL (xl ) = (xl ) = 2 A cG y ( Axl y ) = 0 wx wx constitute the necessary conditions. The theory of vector derivative is presented in Appendix B. The second derivatives
3-2 The least squares solution: “LESS”
113
w 2 i cG y i w2L (xl ) = (xl ) = 2 A cG y A t 0 wx wxc wx wxc due to the positive semidefiniteness of the matrix A cG y A generate the sufficiency condition for obtaining the minimum of the unconstrained Lagrangean. Because of the R ( A cG y A) = R ( A cG y ) there always exists a solution xl whose uniqueness is guaranteed by means of the regularity of the matrix A cG y A .
ƅ
It is obvious that the matrix A cG y A is in particular regular, if rk A = dim X = m , but on the other side the matrix G y is positive definite, namely & i &2G is a G y norm. The linear form xl = Ly which for arbitrary observation vectors y Y { R n leads to G y -LESS of (3.1) can be represented as following. y
Lemma 3.3 (least squares solution with respect to the G y - norm, rk A = dim X = m or ( x R ( A ) ): xl = Ly is G y -LESS of the inconsistent system of linear equations (3.1) Ax + i = y , restricted to rk ( A cG y A) = rk A = dim X (or R ( A cG y ) = R ( A c) and x R ( A ) ) if and only if L R m × n is represented by Case (i) : G y = I Lˆ = A L = ( AcA) 1 Ac
(left inverse)
(3.4)
xl = A L y = ( A cA) 1 Acy. y = yl + il
(3.5) (3.6)
is an orthogonal decomposition of the observation vector y Y { R n into the I -LESS vector y l Y = R n and the I LESS vector of inconsistency i l Y = R n subject to (3.7) y l = Axl = A( A cA) 1 A cy i l = y y l =[I n A( A cA) 1 Ac] y.
(3.8)
Due to y l = A( AcA) 1 Acy , I-LESS has the reproducing property. As projection matrices A( A cA) 1 A c and [I n A( AcA) 1 Ac] are independent. The “goodness of fit” of I-LESS is & y Axl &2I =& i l &2I = y c[I n A( A cA) 1 A c]y .
(3.9)
Case (ii) : G y positive definite, rk ( A cG y A) = rk A Lˆ = ( A cG y A ) 1 A cG y (weighted left inverse)
(3.10)
xl = ( A cG y A) AcG y y.
(3.11)
1
114
3 The second problem of algebraic regression
(3.12) y = y l + il is an orthogonal decomposition of the observation vector y Y { R n into the G y -LESS vector y l Y = R n and the G y LESS vector of inconsistency i l Y = R n subject to y l = Axl = A( A cG y A) 1 AcG y y , (3.13) i l = y Axl =[I n A( AcG y A) 1 AcG y ] y .
(3.14)
Due to y l = A( A cG y A) 1 A cG y y G y -LESS has the reproducing property. As projection matrices A( A cG y A) 1 A cG y and [I n A ( A cG y A ) 1 A cG y ] are independent. The “goodness of fit” of G y -LESS is & y Axl &2G =& i l &2G = y c[I n A ( A cG y A ) 1 A cG y ]y . y
y
(3.15)
The third case G y positive semidefinite will be treated independently. The proof of Lemma 3.1 is straightforward. The result that LESS generates the left inverse, G y -LESS the weighted left inverse will be proved later. An alternative way of producing the least squares solution with respect to the G y - seminorm of the linear model is based upon the constrained Lagrangean (3.16), namely L(i, x, Ȝ ) . Indeed L(i, x, Ȝ ) integrates the linear model (3.1) by a vector valued Lagrange multiplyer to the objective function of type “least squares”, namely the distance function in a finite dimensional Hilbert space. Such an approach will be useful when we apply “total least squares” to the mixed linear model (error-in-variable model). Lemma 3.4 (least squares solution with respect to the G y - norm, rk A = dim X , constrained Lagrangean): G y -LESS is assumed to be defined with respect to the constrained Lagrangean L(i, x, Ȝ ) := i cG y i + 2Ȝ c( Ax + i y ) = min . i , x, Ȝ
(3.16)
A vector [i cl , xcl , Ȝ cl ]c R ( n + m + n )×1 is G y -LESS of (3.1) in the sense of the constrained Lagrangean L(i, x, Ȝ ) = min if and only if the system of normal equations ªG y 0 I n º ª i l º ª 0 º « 0 0 A c» « x » = « 0 » (3.17) « »« l» « » «¬ I n A 0 »¼ «¬ Ȝ l »¼ «¬ y »¼ with the vector Ȝ l R n×1 of “Lagrange multiplyer” is fulfilled. (i l , xl , Ȝ l ) exists and is in particular unique, if G y is positive semidefinite. There holds (i l , xl , Ȝ l ) = arg{L(i, x, Ȝ ) = min} .
(3.18)
3-2 The least squares solution: “LESS”
115
: Proof : G y -LESS is based on the constrained Lagrangean L(i, x, Ȝ ) := i cG y i + 2Ȝ c( $x + i y ) = min i , x, Ȝ
such that the first derivatives wL (i l , xl , Ȝ l ) = 2(G y i l + Ȝ l ) = 0 wi wL (i l , xl , Ȝ l ) = 2$ cȜ l = 0 wx wL (i l , xl , Ȝ l ) = 2( $xl + i l y ) = 0 wȜ or ªG y « 0 « «¬ I n
0 0
I n º ª il º ª 0 º A c»» «« xl »» = «« 0 »» A 0 »¼ «¬ Ȝ l »¼ «¬ y »¼
constitute the necessary conditions. (The theory of vector derivative is presented in Appendix B.) The second derivatives 1 w2L ( xl ) = G y t 0 2 w i w ic due to the positive semidefiniteness of the matrix G y generate the sufficiency condition for obtaining the minimum of the constrained Lagrangean.
ƅ Lemma 3.5 (least squares solution with respect to the G y - norm, rk A = dim X , constrained Lagrangean): If G y -LESS of the linear equations (3.1) is generated by the constrained Lagrangean (3.16) with respect to a positive definite weight matrix G y , rk G y = n, then the normal equations (3.17) are uniquely solved by xl = ( AcG y A) 1 AcG y y,
(3.19)
i l =[I n A( A cG y A) 1 A cG y ] y,
(3.20)
Ȝ l =[G y A( A cG y A) 1 A c I n ] G y y.
(3.21)
:Proof : A basis of the proof could be C. R. Rao´s Pandora Box, the theory of inverse partitioned matrices (Appendix A: Fact: Inverse Partitioned Matrix /IPM/ of a
116
3 The second problem of algebraic regression
symmetric matrix). Due to the rank identities rk G y = n, rk A = rk ( A cG y A) = m < n, the normal equations can be solved faster directly by Gauss elimination. G y il + Ȝ l = 0 A cȜ l = 0 Axl + i l y = 0. Multiply the third normal equation by A cG y , multiply the first normal equation by Ac and substitute A c Ȝ l from the second normal equation in the modified first one. A cG y Axl + AcG y i l A cG y y = 0 º » A cG y i l + A cȜ l = 0 » »¼ A cȜ l = 0
A cG y Axl + AcG y i l A cG y y = 0 º » A cG y i l = 0 ¼ A cG y Axl A cG y y = 0,
xl = ( A cG y A) 1 AcG y y. Let us subtract the third normal equation and solve for i l . i l = y Axl , i l =[I n A( AcG y A) 1 AcG y ] y. Finally we determine the Lagrange multiplier: substitute i l in the first normal equation in order to find Ȝ l = G y i l Ȝ l =[G y A( AcG y A) 1 A cG y G y ] y.
ƅ Of course the G y -LESS of type (3.2) and the G y -LESS solution of type constrained Lagrangean (3.16) are equivalent, namely (3.11) ~ (3.19) and (3.14) ~ (3.20). In order to analyze the finite dimensional linear space Y called “the observation space”, namely the case of a singular matrix of its metric, in more detail, let us take reference to the following.
3-2 The least squares solution: “LESS”
117
Theorem 3.6 (bilinear form) : Suppose that the bracket i i or g (i,i) : Y × Y o \ is a bilinear form or a finite dimensional linear space Y , dim Y = n , for instance a vector space over the field of real numbers. There exists a basis {e1 ,..., en } such that ei e j = 0 or g (ei , e j ) = 0 for i z j
(i)
(
)
ei ei = +1 or g ei , ei = +1 for 1 d i1 d p ° ° ® ei ei = 1 or g ei , ei = 1 for p + 1 d i2 d p + q = r ° ei ei = 0 or g ei , ei = 0 for r + 1 d i3 d n . °¯ 1
(ii)
2
1
(
2
3
3
2
1
)
2
(
1
3
3
)
The numbers r and p are determined exclusively by the bilinear form. r is called the rank, r p = q is called the relative index and the ordered pair (p,q) the signature. The theorem states that any two spaces of the same dimension with bilinear forms of the same signature are isometrically isomorphic. A scalar product (“inner product”) in this context is a nondegenerate bilinear form, for instance a form with rank equal to the dimension of Y . When dealing with low dimensional spaces as we do, we will often indicate the signature with a series of plus and minus signs when appropriate. For instance the signature of \ 14 may be written (+ + + ) instead of (3,1). Such an observation space Y is met when we are dealing with observations in Special Relativity. For instance, let us summarize the peculiar LESS features if the matrix G y \ n×n of the observation space is semidefinite, rk G y := ry < n . By means of Box 3.7 we have collected the essential items of the eigenspace analysis as well as the eigenspace synthesis G *y versus G y of the metric. ȁ y = = Diag(O1 ,..., Or ) denotes the matrix of non-vanishing eigenvalues {O1 ,..., Or } . Note the norm identity y
y
|| i ||G2 = || i ||2U ȁ U c , y
1
y
(3.22)
1
which leads to the U1 ȁ y U1c -LESS normal equations A cU1 ȁ y U1c x A = A cU1 ȁ y U1c y. Box 3.7: Canonical representation of the rank deficient matrix of the matrix of the observation space Y rk G y =: ry , ȁ y := Diag(O1 ,..., Or ) . y
(3.23)
118
3 The second problem of algebraic regression
“eigenspace analysis”
“eigenspace synthesis”
ªU c º (3.24) G *y = « 1 » G y [ U1 , U 2 ] = «U c » ¬ 2¼ ªȁ =« y ¬ 02
ªȁ G y = [ U1 , U 2 ] « y ¬ 02
01 º ª U1c º « » (3.25) 03 »¼ « U c » ¬ 2¼
\ n× n
01 º \ n× n 03 »¼ subject to
{
}
U SO(n(n 1) / 2) := U \ n× n | UcU = I n , U = +1 U1 \ 01 \
n× ry
ry × n
, U2 \
, 02 \
n×( n ry )
( n ry )× ry
, ȁy \
, 03 \
ry ×ry
( n ry )×( n ry )
“norms” (3.26)
|| i ||G2 = || i ||2U ȁ U c y
1
y
i cG y i = i cU1 ȁ y U1ci
~
1
(3.27)
LESS: || i ||G2 = min || i ||2U ȁ U c = min y
x
1
y
1
x
A cU1 ȁ y U1c xA = A cU1 ȁ y U1c y . Another example relates to an observation space Y = \ 12 k
( k {1,..., K })
of even dimension, but one negative eigenvalue. In such a pseudo-Euclidean space of signature (+ + ) the determinant of the matrix of metric G y is negative, namely det G y = O1 ...O2 K 1 O2 K . Accordingly x max = arg{|| i ||G2 = max | y = Ax + i, rk A = m} y
is G y -MORE (Maximal ObseRvational inconsistEncy solution), but not G y LESS. Indeed, the structure of the observational space, either pseudo-Euclidean or Euclidean, decides upon MORE or LESS. 3-21 A discussion of the metric of the parameter space X With the completion of the proof we have to discuss the basic results of Lemma 3.3 in more detail. At first we have to observe that the matrix G y of the met-
3-2 The least squares solution: “LESS”
119
ric of the observation space Y has to be given a priori. We classified LESS according to (i) G y = I n , (ii) G y positive definite and (iii) G y positive semidefinite. But how do we know the metric of the observation space Y? Obviously we need prior information about the geometry of the observation space Y, namely from the empirical sciences like physics, chemistry, biology, geosciences, social sciences. If the observation space Y R n is equipped with an inner product ¢ y1 | y 2 ² = y1cG y y 2 , y1 Y, y 2 Y where the matrix G y of the metric & y &2 = y cG y y is positive definite, we refer to the metric space Y R n as Euclidean E n . In contrast, if the observation space is positive semidefinite we call the observation space semi Euclidean E n , n . n1 is the number of positive eigenvalues, n2 the number of zero eigenvalues of the positive semidefinite matrix G y of the metric (n = n1 + n2 ). In various applications, namely in the adjustment of observations which refer to Special Relativity or General Relativity we have to generalize the metric structure of the observation space Y: If the matrix G y of the pseudometric & y &2 = y cG y y is built on n1 positive eigenvalues (signature +), n2 zero eigenvalues and n3 negative eigenvalues (signature -), we call the pseudometric parameter space pseudo Euclidean E n , n , n , n = n1 + n2 + n3 . For such an observation space LESS has to be generalized to & y Ax &2G = extr , for instance "maximum norm solution" . 1
2
1
2
3
y
3-22 Alternative choices of the metric of the observation space Y Another problem associated with the observation space Y is the norm choice problem. Up to now we have used the A 2 -norm, for instance A 2 -norm: & y Ax & 2 := ( y Ax)( y Ax) = i c i = = i12 + i22 + " + in21 + in2 , A p -norm: & y Ax & p :=
p
p
p
p
p
i1 + i2 + " + in 1 + in ,
1< p < f A f -norm: & i & f := max | ii | 1di d n
are alternative norms of choice. Beside the choice of the matrix G y of the metric within the weighted A 2 -norm we like to discuss the result of the LESS matrix G l of the metric. Indeed we have constructed LESS from an a priori choice of the metric G called G y and were led to the a posteriori choice of the metric G l of type (3.9) and (3.15). The matrices (i) G l = I n A( A cA) 1 Ac (ii) G l = I n A( A cG y A) A cG y 1
are (i) idempotent and (ii) G y1 idempotent, in addition.
(3.9) (3.15)
120
3 The second problem of algebraic regression
There are various alternative scales or objective functions for projection matrices for substituting Euclidean metrics termed robustifying. In special cases those objective functions operate on (3.11) xl = Hy subject to H x = ( AcG y A) 1 AG y , (3.13) y A = H y y subject to H y = A( A cG y A) 1 AG y , (3.14) i A = H A y subject to H A = ª¬I n A( A cG y A) 1 AG y º¼ y , where {H x , H y , H A } are called “hat matrices”. In other cases analysts have to accept that the observation space is non-Euclidean. For instance, direction observations in R p locate points on the hypersphere S p 1 . Accordingly we have to accept an objective function of von Mises-Fisher type which measures the spherical distance along a great circle between the measurement points on S p 1 and the mean direction. Such an alternative choice of a metric of a non- Euclidean space Y will be presented in chapter 7. Here we discuss in some detail alternative objective functions, namely
• • •
optimal choice of the weight matrix G y : second order design SOD optimal choice of the weight matrix G y by means of condition equations robustifying objective functions
3-221 Optimal choice of weight matrix: SOD The optimal choice of the weight matrix , also called second order design (SOD), is a traditional topic in the design of geodetic networks. Let us refer to the review papers by A. A. Seemkooei (2001), W. Baarda (1968, 1973), P. Cross (1985), P. Cross and K. Thapa (1979), E. Grafarend (1970, 1972, 1974, 1975), E. Grafarend and B. Schaffrin (1979), B. Schaffrin (1981, 1983, 1985), F. Krumm (1985), S. L. Kuang (1991), P. Vanicek, K. Thapa and D. Schröder (1981), B. Schaffrin, E. Grafarend and G. Schmitt (1977), B. Schaffrin, F. Krumm and D. Fritsch (1980), J. van Mierlo (1981), G. Schmitt (1980, 1985), C. C. Wang (1970), P. Whittle (1954, 1963), H. Wimmer (1982) and the textbooks by E. Grafarend, H. Heister, R. Kelm, H. Knopff and B. Schaffrin (1979) and E. Grafarend and F. Sanso (1985, editors). What is an optimal choice of the weight matrix G y , what is “a second order design problem”? Let us begin with Fisher’s Information Matrix which agrees to the half of the Hesse matrix, the matrix of second derivatives of the Lagrangean L(x):=|| i ||G2 = || y Ax ||G2 , namely y
y
3-2 The least squares solution: “LESS” G x = A c(x)G y A(x) =
121 1 2
w2L =: FISHER wx A wxcA
at the “point“ x A of type LESS. The first order design problem aims at determining those points x within the Jacobi matrix A by means of a properly chosen risk operating on “FISHER”. Here, “FISHER” relates the weight matrix of the observations G y , previously called the matrix of the metric of the observation space, to the weight matrix G x of the unknown parameters, previously called the matrix of the metric of the parameter space. Gx
Gy
weight matrix of
weight matrix of
the unknown parameters
the observations
or
or
matrix of the metric of the parameter space X
matrix of the metric of the observation space Y .
Being properly prepared, we are able to outline the optimal choice of the weight matrix G y or X , also called the second order design problem, from a criterion matrix Y , an ideal weight matrix G x (ideal) of the unknown parameters, We hope that the translation of G x and G y “from metric to weight” does not cause any confusion. Box 3.8 elegantly outlines SOD. Box 3.8: Second order design SOD, optimal fit to a criterion matrix of weights “weight matrix of the parameter space“ (3.28)
Y :=
1 2
w2L wx A wx Ac
3-21 “weight matrix of the observation space”
(
)
X := G y = Diag g1y ,..., g ny (3.29)
= Gx
x := ª¬ g1y ,..., g ny º¼c
“inconsistent matrix equation of the second order design problem“ A cXA + A = Y
(3.30)
“optimal fit” || ǻ ||2 = tr ǻcǻ = (vec ǻ)c(vec ǻ) = min
(3.31)
x S := arg{|| ǻ ||2 = min | A cXA + ǻ = Y, X = Diag x}
(3.32)
X
122
3 The second problem of algebraic regression
vec ǻ = = vec Y vec( A cXA) = vec Y ( A c
A c) vec X
(3.33)
vec ǻ = vec Y ( A c
A c)x
(3.34)
x \ n , vec Y \ n ×1 , vech Y \ n ( n +1) / 2×1 2
vec ǻ \ n ×1 , vec X \ n ×1 , ( A c
A c) \ n ×n , A c : A c \ n ×n 2
2
2
2
x S = [ ( A c : A c)c( A c : A c) ] ( Ac : Ac) vec Y . 1
2
(3.35)
In general, the matrix equation A cXA + ǻ = Y is inconsistent. Such a matrix inconsistency we have called ǻ \ m × m : For a given ideal weight matrix G x (ideal ) , A cG y A is only an approximation. The unknown weight matrix of the observations G y , here called X \ n× n , can only be designed in its diagonal form. A general weight matrix G y does not make any sense since “oblique weights” cannot be associated to experiments. A natural restriction is therefore X = Diag g1y ,..., g ny . The “diagonal weights” are collected in the unknown vector of weights
(
)
x := ª¬ g1y ,..., g ny º¼c \ n . The optimal fit “ A cXA to Y “ is achieved by the Lagrangean || ǻ ||2 = min , the optimum of the Frobenius norm of the inconsistency matrix ǻ . The vectorized form of the inconsistency matrix, vec ǻ , leads us first to the matrix ( A c
A c) , the Zehfuss product of Ac , second to the Kronecker matrix ( A c : A c) , the Khatri- Rao product of Ac , as soon as we implement the diagonal matrix X . For a definition of the Kronecker- Zehfuss product as well as of the Khatri- Rao product and related laws we refer to Appendix A. The unknown weight vector x is LESS, if x S = [ ( A c : A c)c( A c : A c) ] ( A c : Ac)c vec Y . 1
Unfortunately, the weights x S may come out negative. Accordingly we have to build in extra condition, X = Diag( x1 ,..., xm ) to be positive definite. The given references address this problem as well as the datum problem inherent in G x (ideal ) . Example 3.2 (Second order design):
3-2 The least squares solution: “LESS”
123 PȖ
y3 = 6.94 km
Pį y1 = 13.58 km
y2 = 9.15 km
PĮ
Pȕ
Figure 3.4: Directed graph of a trilateration network, known points {PD , PE , PJ } , unknown point PG , distance observations [ y1 , y2 , y3 ]c Y The introductory example we outline here may serve as a firsthand insight into the observational weight design, also known as second order design. According to Figure 3.4 we present you with the graph of a two-dimensional planar network. From three given points {PD , PE , PJ } we measure distances to the unknown point PG , a typical problem in densifying a geodetic network. For the weight matrix G x Y of the unknown point we postulate I 2 , unity. In contrast, we aim at an observational weight design characterized by a weight matrix G x X = Diag( x1 , x2 , x3 ) . The second order design equation A c Diag( x1 , x2 , x3 ) A + ǻ = I 2 is supposed to supply us with a circular weight matrix G y of the Cartesian coordinates ( xG , yG ) of PG . The observational equations for distances ( sDG , sEG , sJG ) = (13.58 km, 9.15 km, 6.94 km) have already been derived in chapter 1-4. Here we just take advantage of the first design matrix A as given in Box 3.9 together with all further matrix operations. A peculiar situation for the matrix equation A cXA + ǻ = I 2 is met: In the special configuration of the trilateration network the characteristic equation of the second order design problem is consistent. Accordingly we have no problem to get the weights
124
3 The second problem of algebraic regression
0 0 º ª0.511 « 0.974 0 »» , Gy = « 0 «¬ 0 0 0.515»¼ which lead us to the weight G x = I 2 a posteriori. Note that the weights came out positive. Box 3.9: Example for a second order design problem, trilateration network ª 0.454 0.891º A = «« 0.809 +0.588»» , X = Diag( x1 , x2 , x3 ), Y = I 2 «¬ +0.707 +0.707 »¼ A c Diag( x1 , x2 , x3 ) A = I 2 ª0.206 x1 + 0.654 x2 + 0.5 x3 « ¬ 0.404 x1 0.476 x2 + 0.5 x3
0.404 x1 0.476 x2 + 0.5 x3 º = 0.794 x1 + 0.346 x2 + 0.5 x3 »¼
ª1 0 º =« » ¬0 1 ¼ “inconsistency ǻ = 0 ” (1st) 0.206 x1 + 0.654 x2 + 0.5 x3 = 1 (2nd) 0.404 x1 0.476 x2 + 0.5 x3 = 0 (3rd) 0.794 x1 + 0.346 x2 + 0.5 x3 = 1 x1 = 0.511, x2 = 0.974, x3 = 0.515. 3-222 The Taylor Karman criterion matrix ? What is a proper choice of the ideal weight matrix G x ? There has been made a great variety of proposals. First, G x (ideal ) has been chosen simple: A weight matrix G x is called ideally simple if G x (ideal ) = I m . For such a simple weight matrix of the unknown parameters Example 3.2 is an illustration of SOD for a densification problem in a trilateration network. Second, nearly all geodetic networks have been SOD optimized by a criterion matrix G x (ideal ) which is homogeneous and isotropic in a two-dimensional or
3-2 The least squares solution: “LESS”
125
three-dimensional Euclidean space. In particular, the Taylor-Karman structure of a homogeneous and isotropic weight matrix G x (ideal ) has taken over the SOD network design. Box 3.10 summarizes the TK- G x (ideal ) of a two-dimensional, planar network. Worth to be mentioned, TK- G x (ideal ) has been developed in the Theory of Turbulence, namely in analyzing the two-point correlation function of the velocity field in a turbulent medium. (G. I. Taylor 1935, 1936, T. Karman (1937), T. Karman and L. Howarth (1936), C. C. Wang (1970), P. Whittle (1954, 1963)). Box 3.10: Taylor-Karman structure of a homogeneous and isotropic tensor- valued, two-point function, two-dimensional, planar network ª gx x « « gy x Gx = « « gx x «g «¬ y x
1 1
1 1
2 1
2 1
gx y gy y
1 1
gx x gy x
gx y gy y
gx x gy x
1 1
2 1
2 1
gx y º » gy y » » G x (xD , x E ) gx y » g y y »» ¼
1 2
1 2
1 2
1 2
2 2
2 2
2 2
2 2
PD (xD , yD )
“Euclidean distance function of points PE (x E , y E ) ”
and
sDE :=|| xD x E ||= ( xD xE ) 2 + ( yD yE ) 2 “decomposition of the tensor-valued, two-point weight function G x (xD , x E ) into the longitudinal weight function f A and the transversal weight function f m ” G x (xD , x E ) = ª¬ g j j (xD , x E ) º¼ = 1 2
ª x j ( PD ) x j ( PE ) º¼ ª¬ x j ( PD ) x j ( PE ) º¼ = f m ( sDE )G j j + ª¬ f A ( sDE ) f m ( sDE ) º¼ ¬ (3.36) s2 1
1
2
2
1 2
DE
j1 , j2 {1, 2} , ( xD , yD ) = ( x1 , y1 ), ( xE , yE ) = ( x2 , y2 ). 3-223 Optimal choice of the weight matrix: The space R ( A ) and R ( A) A In the introductory paragraph we already outlined the additive basic decomposition of the observation vector into y = y R (A) + y R
( A )A
y R ( A ) = PR ( A ) y , y R where PR( A ) and PR
( A )A
= y A + iA ,
( A )A
= PR
( A )A
y,
are projectors as well as
126
3 The second problem of algebraic regression
y A R ( A ) is an element of the range space R ( A ) , in general the tangent space Tx M of the mapping f (x)
i A R ( A ) is an element of its orthogonal complement in general the normal A space R ( A ) . A
versus
G y -orthogonality y A i A
Gy
= 0 is proven in Box 3.11.
Box 3.11 G y -Orthogonality of y A = y ( LESS ) and i A = i ( LESS ) “ G y -orthogonality” yA iA yA iA
GA
Gy
=0
(3.37)
= y c ¬ªG y A( AcG y A) 1 A c¼º G y ¬ª I n A( AcG y A) 1 A cG y ¼º y =
= y cG y A( A cG y A) 1 A cG y y cG y A ( A cG y A ) 1 A cG y A( A cG y A) 1 A cG y y = = 0. There is an alternative interpretation of the equations of G y -orthogonality i A y A G = i AcG y y A = 0 of i A and y A . First, replace iA = PR A y where PR A is ( ) ( ) a characteristic projection matrix. Second, substitute y A = Ax A where x A is G y LESS of x . As outlined in Box 3.12, G y -orthogonality i AcG y y A of the vectors i A and y A is transformed into the G y -orthogonality of the matrices A and B . The columns of the matrices A and B are G y -orthogonal. Indeed we have derived the basic equations for transforming +
y
parametric adjustment
into
y A = Ax A ,
adjustment of conditional equations BcG y y A = 0,
by means of BcG y A = 0. Box 3.12 G y -orthogonality of A and B i A R ( A) A , dim R ( A) A = n rk A = n m y A R ( A ) , dim R ( A ) = rk A = m
+
3-2 The least squares solution: “LESS” iA yA
Gy
127
= 0 ª¬I n A( AcG y A) 1 A cG y º¼c G y A = 0
rk ª¬ I n A( A cG y A) 1 A cG y º¼ = n rk A = n m
(3.38) (3.39)
“horizontal rank partioning” ª¬I n A( A cG y A) 1 A cG y º¼ = [ B, C]
(3.40)
B \ n× ( n m ) , C \ n× m , rk B = n m iA yA
Gy
= 0 BcG y A = 0 .
(3.41)
Example 3.3 finally illustrates G y -orthogonality of the matrices A und B . Example 3.3 (gravimetric leveling, G y -orthogonality of A and B ). Let us consider a triangular leveling network {PD , PE , PJ } which consists of three observations of height differences ( hDE , hEJ , hJD ) . These height differences are considered holonomic, determined from gravity potential differences, known as gravimetric leveling. Due to hDE := hE hD , hEJ := hJ hE , hJD := hD hJ the holonomity condition
³9 dh = 0
or
hDE + hEJ + hJD = 0
applies. In terms of a linear model the observational equations can accordingly be established by ª hDE º ª 1 0 º ªiDE º ª hDE º « » « » « » « hEJ » = « 0 1 » « h » + « iEJ » « hJD » «¬ 1 1»¼ ¬ EJ ¼ « iJD » ¬ ¼ ¬ ¼ ª hDE º ª1 0º ª hDE º « » y := « hEJ » , A := «« 0 1 »» , x := « » ¬ hEJ ¼ « hJD » «¬ 1 1»¼ ¬ ¼ y \ 3×1 , A \ 3× 2 , rk A = 2, x \ 2×1 . First, let us compute ( x A , y A , i A ,|| i A ||) I -LESS of ( x, y , i,|| i ||) . A. Bjerhammar’s left inverse supplies us with
128
3 The second problem of algebraic regression
ª y1 º ª 2 1 1º « » x A = A A y = ( AcA) 1 Acy = 13 « » « y2 » ¬ 1 2 1¼ « » ¬ y3 ¼ ª hDE º ª 2 y y2 y3 º xA = « » = 13 « 1 » h ¬ y1 + 2 y2 y3 ¼ ¬ EJ ¼ A ª 2 1 1º 1 1« c c y A = AxA = AA y = A( A A) A y = 3 « 1 2 1»» y «¬ 1 1 2 »¼ A
ª 2 y1 y2 y3 º y A = «« y1 + 2 y2 y3 »» «¬ y1 y2 + 2 y3 »¼ 1 3
(
)
i A = y Ax A = I n AA A y = ª¬I n A ( A cA) 1 A cº¼ y ª1 1 1º i A = ««1 1 1»» y = «¬1 1 1»¼ 1 3
1 3
ª y1 + y2 + y3 º «y + y + y » 2 3» « 1 «¬ y1 + y2 + y3 »¼
|| i A ||2 = y c(I n AA A )y = y c ª¬I n A( AcA) 1 A cº¼ y ª1 1 1º ª y1 º || i A ||2 = [ y1 , y2 , y3 ] 13 ««1 1 1»» «« y2 »» «¬1 1 1»¼ «¬ y3 »¼ || i A ||2 = 13 ( y12 + y22 + y32 + 2 y1 y2 + 2 y2 y3 + 2 y3 y1 ) . Second, we identify the orthogonality of A and B . A is given, finding B is the problem of horizontal rank partitioning of the projection matrix. ª1 1 1º G A := I n H y = I n AA = I n A ( A cA ) A c = ««1 1 1»» \ 3×3 , «¬1 1 1»¼ A
1
1 3
with special reference to the “hat matrix H y := A( A cA) 1 Ac ”. The diagonal elements of G A are of special interest for robust approximation. They amount to the uniform values hii = 13 (2, 2, 2), ( gii )A = (1 hii ) = 13 (1,1,1).
3-2 The least squares solution: “LESS”
129
Note
(
)
det G A = det I n AA A = 0, rk(I n AA A ) = n m = 1 ª1 1 1º « » G A = ª¬I 3 AA º¼ = [ B, C] = «1 1 1» «¬1 1 1»¼ A
1 3
B \ 3×1 , C \ 3× 2 . The holonomity condition hDE + hEJ + hJD = 0 is reestablished by the orthogonality of BcA = 0 . ª1 0º BcA = 0 [1,1,1] «« 0 1 »» = [ 0, 0] . «¬ 1 1»¼ 1 3
ƅ The G y -orthogonality condition of the matrices A and B has been successfully used by G. Kampmann (1992, 1994, 1997), G. Kampmann and B. Krause (1996, 1997), R. Jurisch, G. Kampmann and B. Krause (1997), R. Jurisch and G. Kampmann (1997, 1998, 2001 a, b, 2002), G. Kampmann and B. Renner (1999), R. Jurisch, G. Kampmann and J. Linke (1999 a, b, c, 2000) in order to balance the observational weights, to robustify G y -LESS and to identify outliers. The A Grassmann- Plücker coordinates which span the normal space R ( A ) will be discussed in Chapter 10 when we introduce condition equations. 3-224 Fuzzy sets While so far we have used geometry to classify the objective functions as well as the observation space Y, an alternative concept considers observations as elements of the set Y = [ y1 ," , yn ] . The elements of the set get certain attributes which make them fuzzy sets. In short, we supply some references on “fuzzy sets”, namely G. Alefeld and J. Herzberger (1983), B. F. Arnold and P. Stahlecker (1999), A. Chaturvedi and A. T. K. Wan (1999), S. M. Guu, Y. Y. Lur and C. T. Pang (2001), H. Jshibuchi, K. Nozaki and H. Tanaka (1992), H. Jshibuchi, K. Nozaki, N. Yamamoto and H. Tanaka (1995), B. Kosko (1992), H. Kutterer (1994, 1999), V. Ravi, P. J. Reddy and H. J. Zimmermann (2000), V. Ravi and H. J. Zimmermann (2000), S. Wang, T. Shi and C. Wu (2001), L. Zadch (1965), H. J. Zimmermann (1991). 3-23 G x -LESS and its generalized inverse A more formal version of the generalized inverse which is characteristic for G y LESS is presented by
130
3 The second problem of algebraic regression
Lemma 3.7 (characterization of G y -LESS): x A = Ly is I-LESS of the inconsistent system of linear equations (3.1) Ax + i = y , rk A = m , (or y R ( A) ) if and only if the matrix L \ m× n fulfils ª ALA = A « AL = ( AL)c. ¬
(3.42)
The matrix L is the unique A1,2,3 generalized inverse, also called left inverse A L . x A = Ly is G y -LESS of the inconsistent system of linear equations (3.1) Ax + i = y , rk A = m (or y R ( A) ) if and only if the matrix L fulfils ª G y ALA = G y A «G AL = (G AL)c. y ¬ y
(3.43)
The matrix L is the G y weighted A1,2,3 generalized inverse, in short A A , also called weighted left inverse. : Proof : According to the theory of the generalized inverse presented in Appendix A x A = Ly is G y -LESS of (3.1) if and only if A cG y AL = AcG y is fulfilled. Indeed A cG y AL = AcG y is equivalent to the two conditions G y ALA = G y A and G y AL = (G y AL)c . For a proof of such a statement multiply A cG y AL = AcG y left by Lc and receive LcA cG y AL = LcAcG y . The left-hand side of such a matrix identity is a symmetric matrix. In consequence, the right-hand side has to be symmetric, too. When applying the central symmetry condition to A cG y AL = AcG y
or
G y A = LcAcG y A ,
we are led to G y AL = LcA cG y AL = (G y AL)c , what had to be proven. ? How to prove uniqueness of A1,2,3 = A A ? Let us fulfil G y Ax A by G y AL1 y = G y AL1 AL1 y = L1c AcG y AL1 y = L1c AcL1c AcG y y =
3-2 The least squares solution: “LESS”
131
= L1c A cLc2 A cG y y = L1c A cG y L 2 y = G y AL1 AL 2 y = G y AL 2 y , in particular by two arbitrary matrices L1 and L 2 , respectively, which fulfil (i) G y ALA = G y A as well as (ii) G y AL = (G y AL)c . Indeed we have derived one result irrespective of L1 or L 2 .
ƅ If the matrix of the metric G y of the observation space is positive definite, we can prove the following duality Theorem 3.8 (duality): Let the matrix of the metric G x of the observation space be positive definite. Then x A = Ly is G y -LESS of the linear model (3.1) for any observation vector y \ n , if x ~m = Lcy ~ is G y1 -MINOS of the linear model y ~ = A cx ~ for all m × 1 columns y ~ R ( A c) . : Proof : If G y is positive definite, there exists the inverse matrix G y1 . (3.43) can be transformed into the equivalent condition A c = A cLcA
and
G y1LcAc = (G y1LcAc)c ,
which is equivalent to (1.33). 3-24 Eigenvalue decomposition of G y -LESS: canonical LESS For the system analysis of an inverse problem the eigenspace analysis and eigenspace synthesis of x A G y -LESS of x is very useful and gives some peculiar insight into a dynamical system. Accordingly we are confronted with the problem to construct “canonical LESS”, also called the eigenvalue decomposition of G y -LESS. First, we refer to the canonical representation of the parameter space X as well as the observation space introduced to you in the first Chapter, Box 1.8 and Box 1.9. But here we add by means of Box 3.13 the comparison of the general bases versus the orthonormal bases spanning the parameter space X as well as the observation space Y . In addition, we refer to Definition 1.5 and Lemma 1.6 where the adjoint operator A # has been introduced and represented. Box 3.13: General bases versus orthonormal bases spanning the parameter space X as well as the observation space Y
132
3 The second problem of algebraic regression
“left”
“right”
“parameter space”
“observation space”
“general left base”
“general right base”
span{a1 ,..., am } = X
Y = span{b1 ,..., bn }
: matrix of the metric :
: matrix of the metric : bbc = G y
aac = G x
(3.44)
(3.45)
“orthonormal left base”
“orthonormal right base”
span{e1x ,..., emx } = X
Y = span{e1y ,..., eny }
: matrix of the metric :
: matrix of the metric :
e x ecx = I m
e y ecy = I n
“base transformation”
“base transformation”
a = ȁ x 9e x
1 2
b = ȁ y Ue y
versus
versus
(3.46)
(3.48)
(3.47)
1 2
- 12
(3.49)
- 12
e x = V cȁ x a
e y = Ucȁ y b
span{e1x ,..., e xm } = X
Y = span{e1y ,..., e yn } .
(3.50)
(3.51)
Second, we are going to solve the overdetermined system of {y = Ax | A \ n× m , rk A = n, n > m} by introducing
• •
the eigenspace of the rectangular matrix A \ n× m of rank r := rk A = m , n > m : A 6 A* the left and right canonical coordinates: x o x* , y o y *
as supported by Box 3.14. The transformations x 6 x* (3.52), y 6 y * (3.53) from the original coordinates ( x1 ,..., xm ) to the canonical coordinates ( x1* ,..., xm* ) , the left star coordinates, as well as from the original coordinates ( y1 ,..., yn ) to the canonical coordinates ( y1* ,..., yn* ) , the right star coordinates, are polar decompositions: a rotation {U, V} is followed by a general stretch {G y , G x } . Those root matrices are generated by product decompositions of type G y = (G y )cG y as well as G x = (G x )cG x . Let us substitute the inverse transformations (3.54) x* 6 x = G x Vx* and (3.55) y * 6 y = G y Uy * into the system of linear equa1 2
1 2
1 2
1 2
1 2
1 2
1 2
1 2
3-2 The least squares solution: “LESS”
133
tions (3.1) y = Ax + i or its dual (3.57) y * = A* x* + i* . Such an operation leads us to (3.58) y * = f x* as well as (3.59) y = f ( x ) . Subject to the orthonormality conditions (3.60) U cU = I n and (3.61) V cV = I m we have generated the left– right eigenspace analysis (3.62)
( )
ªȁº ȁ* = « » ¬0¼ subject to the horizontal rank partitioning of the matrix U = [ U1 , U 2 ] . Alternatively, the left–right eigenspace synthesis (3.63) ªȁº A = G y [ U1 , U 2 ] « » V cG x ¬0¼ 1 2
1 2
- 12
is based upon the left matrix (3.64) L := G y U and the right matrix (3.65) R := G x V . Indeed the left matrix L by means of (3.66) LLc = G -1y reconstructs the inverse matrix of the metric of the observation space Y . Similarly, the right matrix R by means of (3.67) RR c = G -1x generates the inverse matrix of the metric of the parameter space X . In terms of “ L , R ” we have summarized the eigenvalue decompositions (3.68)-(3.73). Such an eigenvalue decomposition helps us to canonically invert y * = A* x* + i* by means of (3.74), (3.75), namely the rank partitioning of the canonical observation vector y * into y1* \ r×1 and y *2 \ ( n r )×1 to determine x*A = ȁ -1 y1* leaving y *2 “unrecognized”. Next we shall proof i1* = 0 if i1* is LESS. 1 2
Box 3.14: Canonical representation, overdetermined system of linear equations “parameter space X ” (3.52) x* = V cG x x
“observation space Y ” y * = U cG y y (3.53)
versus
1 2
1 2
and - 12 x
and
x = G Vx
(3.54)
- 12
y = G y Uy *
*
(3.55)
“overdetermined system of linear equations” {y = Ax + i | A \ n× m , rk A = m, n > m} y = Ax + i
(3.56) - 12
- 12
- 12
G y Uy * = AG x Vx* + G y Ui*
(
1 2
y * = A * x* + i *
versus
- 12
)
(3.58) y * = UcG y AG x V x* + i*
1 2
1 2
(3.57) 1 2
U cG y y = A* V cG x x + U cG y i
(
1 2
1 2
)
y = G y UA* V cG x x + i (3.59)
134
3 The second problem of algebraic regression
subject to
subject to
U cU = UUc = I n
(3.60)
V cV = VV c = I m
versus
(3.61)
“left and right eigenspace” “left-right eigenspace analysis”
“left-right eigenspace synthesis”
ª Uc º ªȁº ªȁº A = G y [ U1 , U 2 ] « » V cG x (3.63) (3.62) A* = « 1 » G y AG x V = « » ¬0¼ ¬0¼ ¬ U c2 ¼ “dimension identities” 1 2
1 2
1 2
1 2
ȁ \ r × r , U1 \ n × r 0 \ ( n r )× r , U 2 \ n × ( n r ) , V \ r × r r := rk A = m, n > m “right eigenspace”
“left eigenspace” - 12
1 2
(3.64) L := G y U L-1 = U cG y 12
- 12
1 2
versus R := G x V R -1 = V cG x (3.65)
12
L1 := G y U1 , L 2 := G y U 2 -1 -1 -1 (3.66) LLc = G -y1 (L-1 )cL-1 = G y versus RR c = G x (R )cR = G x (3.67)
ª L º ª Uc º L1 = « 1 » G y =: « 1 » ¬ U c2 ¼ ¬L 2 ¼ 1 2
(3.68)
A = LA* R -1
(3.70) A = [ L1 , L 2 ] A # R 1
(3.72)
A* = L-1 AR
versus
ª A # AL1 = L1 ȁ 2 « # «¬ A AL 2 = 0
versus
ª ȁ º ª L º A* = « » = « 1 » AR (3.71) ¬ 0 ¼ ¬L 2 ¼
versus
AA # R = Rȁ 2
“overdetermined system of linear equations solved in canonical coordinates” (3.74)
(3.69)
ªi* º ª y * º ªȁº y * = A* x* + i* = « » x* + « 1* » = « *1 » ¬0¼ ¬«i 2 ¼» ¬ y 2 ¼ “dimension identities”
(3.73)
3-2 The least squares solution: “LESS”
135
y *1 \ r ×1 , y *2 \ ( n r )×1 , i*1 \ r ×1 , i*2 \ ( n r )×1 y *1 = ȁx* + i*1 x* = ȁ 1 (y *1 i*1 )
(3.75)
“if i* is LESS, then x*A = ȁ 1 y *1 , i*1 = 0 ”. Consult the commutative diagram of Figure 3.5 for a shorthand summary of the newly introduced transformations of coordinates, both of the parameter space X as well as the observation space Y . Third, we prepare ourselves for LESS of the overdetermined system of linear equations {y = Ax + i | A \ n×m , rk A = m, n > m,|| i ||G2 = min} y
by introducing Lemma 3.9, namely the eigenvalue-eigencolumn equations of the matrices A # A and AA # , respectively, as well as Lemma 3.11, our basic result of “canonical LESS”, subsequently completed by proofs. Throughout we refer to the adjoint operator which has been introduced by Definition 1.5 and Lemma 1.6. X x
A
y R(A) Y
1 2
1 2
U cG y
V cG x
X x*
*
y* R(A* ) Y
A Figure 3.5:Commutative diagram of coordinate transformations Lemma 3.9
(eigenspace analysis versus eigenspace synthesis of the matrix {A \ n× m , r := rk A = m < n} )
The pair of matrices {L, R} for the eigenspace analysis and the eigenspace synthesis of the rectangular matrix A \ n× m of rank r := rk A = m < n , namely A* = L-1 AR or ª ȁ º ª L º A* = « » = « 1 » AR ¬ 0 ¼ ¬L 2 ¼
versus
A = LA* R -1 or
versus
ªȁº A = [ L1 , L 2 ] « » R ¬0¼
136
3 The second problem of algebraic regression
are determined by the eigenvalue–eigencolumn equations (eigenspace equations) of the matrices A # A and AA # , respectively, namely A # AR = Rȁ 2
ª AA # L1 = L1 ȁ 2 « # ¬ AA L 2 = 0
versus subject to
ªO12 … 0 º « » ȁ 2 = « # % # » , ȁ = Diag + O12 ,..., + Or2 . « 0 " Or2 » ¬ ¼
)
(
Let us prove first A # AR = Rȁ 2 , second A # AL1 = L1 ȁ 2 , AA # L 2 = 0 . (i) A # AR = Rȁ 2 A # AR = G -1x AcG y AR = ª Uc º ªȁº = G -1xG x V [ ȁ, 0c] « 1 » (G y )cG y G y [ U1 , U 2 ] « » V cG x G x V ¬0 ¼ ¬ U c2 ¼ ªȁº A # AR = G x V [ ȁ, 0c] « » = G x Vȁ 2 0 ¬ ¼ 1 2
1 2
1 2
1 2
1 2
1 2
1 2
A # AR = Rȁ 2 .
ƅ
(ii) AA # L1 = L1 ȁ 2 , AA # L 2 = 0 AA # L = AG -1x A cG y L = ª Uc º ªȁº = G y [ U1 , U 2 ] « » V cG x G -1x G x V [ ȁ, 0c] « 1 » (G y )cG y G y [ U1 , U 2 ] c ¬0¼ ¬U2 ¼ ª U c U U1c U 2 º ªȁº AA # L = [ L1 , L 2 ] « » [ ȁ, 0c] « 1 1 » ¬0¼ ¬ U c2 U1 U c2 U 2 ¼ 1 2
1 2
1 2
ªȁ2 AA # L = [ L1 , L 2 ] « ¬0
1 2
0c º ª I r »« 0¼¬0
1 2
0 º I n-r »¼
AA # [ L1 , L 2 ] = ª¬ L1 ȁ 2 , 0 º¼ , AA # L1 = L1 ȁ 2 , AA # L 2 = 0.
ƅ
The pair of eigensystems {A # AR = Rȁ 2 , AA # [L1 , L 2 ] = ª¬L1 ȁ 2 ,0º¼} is unfortunately based upon non-symmetric matrices AA # = AG -1x A cG y and A # A = G -1x AcG y A which make the left and right eigenspace analysis numerically more complex. It appears that we are forced to use the Arnoldi method rather than the more efficient Lanczos method used for symmetric matrices.
3-2 The least squares solution: “LESS”
137
In this situation we look out for an alternative. Actually as soon as we substitute - 12
- 12
{L, R} by {G y U, G x V} - 12
into the pair of eigensystems and consequently multiply AA # L by G x , we achieve a pair of eigensystems identified in Corollary 3.10 relying on symmetric matrices. In addition, such a pair of eigensystems produces the canonical base, namely orthonormal eigencolumns. Corollary 3.10 (symmetric pair of eigensystems): The pair of eigensystems 1 2
1 2
- 12
- 12
(3.76) G y AG -1x A c(G y )cU1 = ȁ 2 U1 versus (G x )cA cG y AG x V = Vȁ 2 (3.77) 1 2
- 12 y
(3.78) | G y AG Ac(G )c Ȝ I |= 0 versus -1 x
2 i n
- 12
- 12
| (G x )cA cG y AG x Ȝ 2j I r |= 0 (3.79)
is based upon symmetric matrices. The left and right eigencolumns are orthogonal. Such a procedure requires two factorizations, 1 2
1 2
- 12
- 12
G x = (G x )cG x , G -1x = G x (G x )c
and
1 2
1 2
- 12
- 12
G y = (G y )cG y , G -1y = G y (G y )c
via Choleskifactorization or eigenvalue decomposition of the matrices G x and Gy . Lemma 3.11 (canonical LESS): Let y * = A* x* + i* be a canonical representation of the overdetermined system of linear equations {y = Ax + i | A \ n× m , r := rk A = m, n > m} . Then the rank partitioning of y * = ª¬(y *1 )c, (y *2 )cº¼c leads to the canonical unknown vector (3.80)
ª y* º ª y* º y * \ r ×1 x*A = ª¬ ȁ -1 , 0 º¼ « *1 » = ȁ -1 y *1 , y * = « *1 » , * 1 ( n r )×1 ¬y 2 ¼ ¬y 2 ¼ y 2 \ and to the canonical vector of inconsistency
(3.82)
ª i* º ª y* º ª ȁ º i* = 0 i*A = « *1 » := « *1 » « » ȁ -1 y *1 or *1 * i2 = y2 ¬i 2 ¼ A ¬ y 2 ¼ A ¬ 0 ¼
(3.81)
138
3 The second problem of algebraic regression
of type G y -LESS. In terms if the original coordinates x X a canonical representation of G y -LESS is ª Uc º x A = G x V ª¬ ȁ -1 , 0 º¼ « 1 » G y y ¬ U c2 ¼ 1 2
1 2
- 12
(3.83)
1 2
x A = G x Vȁ -1 U1c G y y = Rȁ -1 L-1 y.
(3.84)
x A = A A y is built on the canonical (G x , G y ) weighted right inverse. For the proof we depart from G y -LESS (3.11) and replace the matrix A \ n× m by its canonical representation, namely by eigenspace synthesis. -1 x A = ( A cG y A ) A cG y y º » » ªȁº A = G y [ U1 , U 2 ] « » V cG x » ¬0¼ ¼» 1 2
1 2
ª Uc º ªȁº A cG y A = (G x )cV [ ȁ, 0] « 1 » (G y )cG y G y [ U1 , U 2 ] « » V cG x ¬0¼ ¬ U c2 ¼ 1 2
1 2
1 2
1 2
A cG y A = (G x )cVȁ 2 V cG x , ( AcG y A ) = G x Vȁ -2 V c(G x )c 1 2
-1
1 2
- 12
- 12
ª Uc º x A = G x Vȁ 2 V c(G x )c(G x )cV [ ȁ, 0] « 1 » (G y )cG y y ¬ U c2 ¼ 1 2
1 2
1 2
1 2
ª Uc º x A = G x V ª¬ ȁ -1 , 0 º¼ « 1 » G y y ¬ U c2 ¼ 1 2
1 2
- 12
1 2
x A = G x Vȁ -1 U1c G y y = A -A y - 12
1 2
A A- = G x Vȁ -1 U1c G y A1,2,3 G y
( G y weighted reflexive inverse) º ª Uc º x*A = V cG x x A = ȁ -1 U1c G y y = ª¬ ȁ -1 , 0 º¼ « 1 » G y y » ¬ U c2 ¼ » » * ª y º ª Uc º » y * = « *1 » = « 1 » G y y c U »¼ y ¬ 2¼ ¬ 2¼ 1 2
1 2
1 2
1 2
ª y* º x*A = ª¬ ȁ -1 , 0 º¼ « *1 » = ȁ -1 y 1* . ¬y 2 ¼
3-2 The least squares solution: “LESS”
139
Thus we have proven the canonical inversion formula. The proof for the canonical representation of the vector of inconsistency is a consequence of the rank partitioning ª i* º ª y* º ª ȁ º i* , y * \ r ×1 i*l = « 1* » := « 1* » « » x*A , * 1 * 1 ( n r )×1 , i2 , y2 \ ¬i 2 ¼ A ¬ y 2 ¼ ¬ 0 ¼ ª i* º ª y * º ª ȁ º ª0º i*A = « 1* » = « 1* » « » ȁ -1 y1* = « * » . ¬y 2 ¼ ¬i 2 ¼ A ¬ y 2 ¼ ¬ 0 ¼
ƅ The important result of x*A based on the canonical G y -LESS of {y * = A* x* + i* | A* \ n× m , rk A* = rk A = m, n > m} needs a comment. The rank partitioning of the canonical observation vector y * , namely y1* \ r , y *2 \ n r again paved the way for an interpretation. First, we appreciate the simple “direct inversion” x*A = ȁ -1 y1* , ȁ = Diag + O12 ,..., + Or2 , for instance
)
(
ª x º ª Ȝ1-1y1* º « » « » « ... » = « ... » . « x*m » « Ȝ -1r y *r » ¬ ¼A ¬ ¼ Second, i1* = 0 , eliminates all elements of the vector of canonical inconsistencies, for instance ª¬i1* ,..., ir* º¼ c = 0 , while i*2 = y *2 identifies the deficient elements of the A vector of canonical inconsistencies with the vector of canonical observations for * * instance ª¬ir +1 ,..., in º¼ c = ª¬ yr*+1 ,..., yn* º¼ c . Finally, enjoy the commutative diagram A A of Figure 3.6 illustrating our previously introduced transformations of type LESS and canonical LESS, by means of A A and A* , respectively. * 1
Y y
AA
1 2
A
xA X
1 2
UcG y
Y y*
( )
V cG x
( A* )A
x*A X
Figure 3.6: Commutative diagram of inverse coordinate transformations A first example is canonical LESS of the Front Page Example by G y = I 3 , Gx = I2 .
140
3 The second problem of algebraic regression
ª1 º ª1 1 º ª i1 º ªx º y = Ax + i : «« 2 »» = ««1 2 »» « 1 » + ««i2 »» , r := rk A = 2 x «¬ 4 »¼ «¬1 3 »¼ ¬ 2 ¼ «¬ i3 »¼ left eigenspace
right eigenspace
AA # U1 = AAcU1 = U1 ȁ 2
A # AV = A cAV = Vȁ 2
AA U 2 = AAcU 2 = 0 #
ª2 3 4 º AA c = «« 3 5 7 »» «¬ 4 7 10 »¼
ª3 6 º «6 14 » = A cA ¬ ¼ eigenvalues
| AAc Oi2 I 3 |= 0
| A cA O j2 I 2 |= 0
i {1, 2,3}
j {1, 2}
O12 =
17 1 17 1 + 265, O22 = 265, O32 = 0 2 2 2 2
left eigencolumns ª 2 O12 « (1st) « 3 « 4 ¬
right eigencolumns
3 4 º ª u11 º » 2 5 O1 7 » ««u21 »» = 0 7 10 O12 »¼ «¬u31 »¼
ª3 O12 6 º ª v11 º (1st) « »« » = 0 14 O12 ¼ ¬ v 21 ¼ ¬ 6
subject to
subject to
2 u112 + u21 + u312 = 1
v112 + v 221 = 1
ª(2 O12 )u11 + 3u21 + 4u31 = 0 « 2 ¬ 3u11 + (5 O1 )u21 + 7u31 = 0
versus
(3 O12 ) v11 + 6 v 21 = 0
36 72 ª 2 « v11 = 36 + (3 O 2 ) 2 = 265 + 11 265 1 « 2 « 2 (3 O1 ) 2 193 + 11 265 = « v 21 = 2 2 36 + (3 O1 ) «¬ 265 + 11 265 ª u112 º 1 « 2» «u21 » = (1 + 4O 2 ) 2 + (2 7O 2 ) 2 + (1 7O 2 + O 4 ) 2 1 1 1 1 « 2» ¬u31 ¼
ª (1 + 4O12 ) 2 º « » 2 2 « (2 7O1 ) » 2 4 2» « ¬ (1 7O1 + O1 ) ¼
3-2 The least squares solution: “LESS”
141
(
)
(
)
ª 35 + 2 265 2 º « » ª u112 º « 2» 2 « 2» «§ 115 + 7 265 · » ¨ ¸ » «u21 » = 43725 + 2685 265 «© 2 2 ¹ 2 » «u31 « » ¬ ¼ 2 « 80 + 5 265 » »¼ ¬« ª3 O22 (2nd) « ¬ 7
ª 2 O22 « (2nd) « 3 « 5 ¬
7 º ª u12 º »« » = 0 21 O22 ¼ ¬u22 ¼
subject to 2 u122 + u22 + u322 = 1
3 5 º ª v12 º » 2 5 O2 9 » «« v 22 »» = 0 9 17 O22 »¼ «¬ v32 »¼ subject to 2 v122 + v 22 =1
ª(2 O22 )u12 + 3u22 + 4u32 = 0 « 2 ¬ 3u12 + (5 O2 )u22 + 7u32 = 0
versus
(3 O22 ) v12 + 6 v 22 = 0
36 72 ª 2 « v12 = 36 + (3 O 2 ) 2 = 265 11 265 2 « 2 2 « 2 (3 O2 ) 193 11 265 = « v 22 = 2 2 36 + (3 ) O 265 11 265 ¬« 2 ª u122 º 1 « 2» «u22 » = (1 + 4O 2 ) 2 + (2 7O 2 ) 2 + (1 7O 2 + O 4 ) 2 2 2 2 2 2 » «u32 ¬ ¼
ª (1 + 4O22 ) 2 º « » 2 2 « (2 7O2 ) » « (1 7O22 + O24 ) 2 » ¬ ¼
ª (35 2 265) 2 º ª u122 º « » 2 115 7 « 2» 2» « ( 265) «u22 » = » 43725 2685 265 « 2 2 2 » «u32 « » 2 ¬ ¼ ¬« (80 5 265) »¼ ª 2 3 4 º ª u13 º (3rd) «« 3 5 7 »» ««u23 »» = 0 «¬ 4 7 10 »¼ «¬u33 »¼
subject to
2 u132 + u23 + u332 = 1
2u13 + 3u23 + 4u33 = 0 3u13 + 5u23 + 7u33 = 0 ª u13 º ª 2 3º ª u13 º ª 4 º ª 5 3º ª 4º « 3 5» «u » = « 7 » u33 «u » = « 3 2 » « 7 » u33 ¬ ¼ ¬ 23 ¼ ¬ ¼ ¬ ¼¬ ¼ ¬ 23 ¼
142
3 The second problem of algebraic regression
u13 = +u33 , u23 = 2u33 1 2 1 2 u132 = , u23 = , u332 = . 6 3 6 There are four combinatorial solutions to generate square roots. ª u11 u12 «u « 21 u22 «¬u31 u32 ª v11 «v ¬ 21
2 ª u13 º « ± u11 2 u23 »» = « ± u21 « u33 »¼ « ± u 2 31 ¬ 2 v12 º ª ± v11 « = v 22 »¼ « ± v 2 21 ¬
± u122 2 ± u22 2 ± u32
± u132 º » 2 » ± u23 » 2 » ± u33 ¼
2 º ± v12 ». ± v 222 »¼
Here we have chosen the one with the positive sign exclusively. In summary, the eigenspace analysis gave the result as follows. § 17 + 265 17 265 ȁ = Diag ¨ , ¨ 2 2 © ª « « « « U=« « « « « ¬
2 2 2 2
35 + 2 265 43725 + 2685 265 115 + 7 265 43725 + 2685 265 80 + 5 265 43725 + 2685 265
· ¸ ¸ ¹
35 2 265
2
43725 2685 265 2 2
115 7 265 43725 2685 265 80 5 265
2
43725 2685 265
72 ª « « 265 + 11 265 V=« « 193 + 11 265 « ¬ 265 + 11 265
72
º » 265 11 265 » ». 193 11 265 » » 265 11 265 ¼
º 1 » 6» 6 » 1 » 6 = [ U1 , U 2 ] 3 » » 1 » 6 6 » » ¼
3-3 Case study
143
3-3 Case study: Partial redundancies, latent conditions, high leverage points versus break points, direct and inverse Grassmann coordinates, Plücker coordinates This case study has various targets. First we aim at a canonical analysis of the hat matrices Hx and Hy for a simple linear model with a leverage point. The impact of a high leverage point is studied in all detail. Partial redundancies are introduced and interpreted in their peculiar role of weighting observations. Second, preparatory in nature, we briefly introduce multilinear algebra, the operations "join and meet", namely the Hodge star operator. Third, we go "from A to B": Given the columns space R ( A) = G m , n ( A) , identified as the Grassmann space G m , n R n of the matrix A R n× m , n > m, rk A = m , we construct the column space R (B) = R A ( A) = G n m , n R n of the matrix B which agrees to the orthogonal column space R A ( A) of the matrix A. R A ( A) is identified as Grassmann space G n m , n R n and is covered by Grassmann coordinates, also called Plücker coordinates pij. The matrix B, alternatively the Grassmann coordinates (Plücker coordinates), constitute the latent restrictions, also called latent condition equations, which control parameter adjustment and lead to a proper choice of observational weights. Fourth, we reverse our path: we go “from B to A”: Given the column space R (B) of the matrix of restrictions B RA × n , A < n , rk B = A we construct the column space R A (B) = R ( A) R n , the orthogonal column space of the matrix B which is apex to the column space R (A) of the matrix A. The matrix A, alternatively the Grassmann coordinates (Plücker coordinates) of the matrix B constitute the latent parametric equations which are “behind a conditional adjustment”. Fifth, we break-up the linear model into pieces, and introduce the notion of break points and their determination. The present analysis of partial redundancies and latent restrictions has been pioneered by G. Kampmann (1992), R. Jurisch, G. Kampmann and J. Linke (1999a, b) as well as R. Jurisch and G. Kampmann (2002 a, b). Additional useful references are D. W. Behmken and N. R. Draper (1972), S. Chatterjee and A. S. Hadi (1988), R. D. Cook and S. Weisberg (1982). Multilinear algebra, the operations “join and meet” and the Hodge star operator are reviewed in W. Hodge and D. Pedoe (1968), C. Macinnes (1999), S. Morgera (1992), W. Neutsch (1995), B. F. Doolin and C. F. Martin (1990). A sample reference for break point synthesis is C. H. Mueller (1998), N. M. Neykov and C. H. Mueller (2003) and D. Tasche (2003). 3-31 Canonical analysis of the hat matrix, partial redundancies, high leverage points A beautiful example for the power of eigenspace synthesis is the least squares fit of a straight line to a set of observation: Let us assume that we have observed a dynamical system y(t) which is represented by a polynomial of degree one with respect to time t.
144
3 The second problem of algebraic regression
y (ti ) = 1i x1 + ti x2 i {1," , n} . Due to y • (t ) = x2 it is a dynamical system with constant velocity or constant first derivative with result to time t0. The unknown polynomial coefficients are collected in the column array x = [ x1 , x2 ]c, x X = R 2 , dim X = 2 and constitute the coordinates of the two-dimensional parameter space X . For this example we choose n = 4 observations, namely y = [ y (t1 ), y (t2 ), y (t3 ), y (t4 )]c , y Y = R 4 , dim Y = 4 . The samples of the polynomial are taken at t1 = 1, t2 = 2, t3 = 3 and t4 = a. With such a choice of t4 we aim at modeling the behavior of high leverage points, e.g. a >> (t1 , t2 , t3 ) or a o f , illustrated by Figure 3.7. y4
*
y3
*
y (t ) y2 y1
* t1 = 1
* t2 = 2
t3 = 3
t4 = a
t Figure 3.7: Graph of the function y(t), high leverage point t4=a Box 3.15 summarizes the right eigenspace analysis of the hat matrix H y : =A(AcA)- Ac . First, we have computed the spectrum of A cA and ( A cA) 1 for the given matrix A R 4× 2 , namely the eigenvalues squared 2 O1,2 = 59 ± 3261 . Note the leverage point t4 = a = 10. Second, we computed the right eigencolumns v1 and v2 which constitute the orthonormal matrix V SO(2) . The angular representation of the orthonormal matrix V SO(2) follows: Third, we take advantage of the sine-cosine representation (3.85) V SO(2) , the special orthonormal group over R2. Indeed, we find the angular parameter J = 81o53ƍ25.4Ǝ. Fourth, we are going to represent the hat matrix Hy in terms of the angular parameter namely (3.86) – (3.89). In this way, the general representation (3.90) is obtained, illustrated by four cases. (3.86) is a special case of the general angular representation (3.90) of the hat matrix Hy. Five, we sum up the canonical representation AV cȁ 2 V cA c (3.91), of the hat matrix Hy, also called right eigenspace synthesis. Note the rank of the hat matrix, namely rk H y = rk A = m = 2 , as well as the peculiar fourth adjusted observation 1 yˆ 4 = y4 (I LESS) = ( 11 y1 + y2 + 13 y3 + 97 y4 ) , 100 which highlights the weight of the leverage point t4: This analysis will be more pronounced if we go through the same type of right eigenspace synthesis for the leverage point t4 = a, ao f , outlined in Box 3.18.
145
3-3 Case study
Box 3.15 Right eigenspace analysis of a linear model of an univariate polynomial of degree one - high leverage point a =10 “Hat matrix H y = A( A cA) 1 A = AVȁ 2 V cAc ” ª A cAV = Vȁ 2 « right eigenspace analysis: «subject to « VV c = I 2 ¬ ª1 1 º «1 2 » » , A cA = ª 4 16 º , ( AA) 1 = 1 ª 57 8º A := « « » «1 3 » 100 ¬« 8 2 ¼» ¬16 114¼ « » ¬1 10 ¼ spec( A cA) = {O12 , O 22 } : A cA O 2j I 2 = 0, j {1, 2} 4 O2 16 = 0 O 4 118O 2 + 200 = 0 2 16 114 O 2 O1,2 = 59 ± 3281 = 59 ± 57.26 = 0
spec( A cA ) = {O12 , O 22 } = {116.28, 1.72} versus spec( A cA) 1 = {
1 1 , } = {8.60 *103 , 0.58} O12 O 22
ª( A cA O 2j I 2 )V = 0 « right eigencolumn analysis: «subject to « VV c = I 2 ¬ ªv º 2 =1 (1st) ( A cA O12 I ) « 11 » = 0 subject to v112 + v21 v ¬ 21 ¼ (4 O12 )v11 + 16v21 = 0 º » 2 v112 + v21 =1 »¼
146
3 The second problem of algebraic regression
16
v11 = + v112 =
2 v21 = + v21 =
256 + (4 O12 ) 2 4 O12 256 + (4 O12 ) 2
= 0.141
= 0.990
ªv º 2 (2nd) ( A cA O 22 I 2 ) « 12 » = 0 subject to v122 + v22 =1 v ¬ 22 ¼ (4 O 22 )v12 + 16v22 = 0 º » 2 v122 + v22 =1 ¼ v12 = + v122 =
2 v22 = + v22 =
16 256 + (4 O 22 ) 2 4 O 22 256 + (4 O 22 ) 2
= 0.990
= 0.141
spec( A cA) = {116.28, 1.72} right eigenspace: spec( A cA) 1 = {8.60 *103 , 0.58} ªv V = « 11 ¬ v21
v12 º ª 0.141 0.990 º = SO(2) v22 »¼ «¬ 0.990 0.141»¼
V SO(2) := {V R 2×2 VV c = I 2 , V = 1} “Angular representation of V SO(2) ” ª cos J sin J º ª 0.141 0.990º V=« »=« » ¬ sin J cos J ¼ ¬ 0.990 0.141¼
(3.85)
sin J = 0.990, cos J = 0.141, tan J = 7.021 J=81o.890,386 = 81o53’25.4” hat matrix H y = A( A cA) 1 Ac = AVȁ 2 V cAc 1 1 1 ª 1 º 2 2 « O 2 cos J + O 2 sin J ( O 2 + O 2 ) sin J cos J » 2 1 2 » (3.86) ( A cA) 1 = V/ 2 V = « 1 « 1 » 1 1 1 2 2 sin J + 2 cos J » «( 2 + 2 ) sin J cos J 2 O1 O2 ¬ O1 O 2 ¼
147
3-3 Case study
( A cA) j 1j = 1 2
m=2
1
¦O j3 =1
cos J j j cos J j
2 j3
1 3
(3.87)
2 j3
subject to m=2
VV c = I 2 ~
¦ cos J j3 =1
j1 j3
cos J j
2 j3
= Gj j
(3.88)
1 2
case 1: j1=1, j2=1:
case 2: j1=1, j2=2:
cos 2 J11 + cos 2 J12 = 1
cos J11 cos J 21 + cos J12 cos J 22 = 0
(cos 2 J + sin 2 J = 1)
( cos J sin J + sin J cos J = 0)
case 3: j1=2, j2=1:
case 4: j1=2, j2=2:
cos J 21 cos J11 + cos J 22 cos J12 = 0
cos 2 J 21 + cos 2 J 22 = 1
( sin J cos J + cos J sin J = 0)
(sin 2 J + cos 2 J = 1)
( A cA) 1 = ª O12 cos 2 J 11 + O22 cos 2 J 12 « 2 2 ¬O1 cos J 21 cos J 11 + O2 cos J 22 cos J 12 H y = AVȁ 2 V cA c ~ hi i = 12
O12 cos J 11 cos J 21 + O22 cos J 12 cos J 22 º » O12 cos 2 J 21 + O22 cos 2 J 22 ¼ (3.89)
m=2
¦
j1 , j2 , j3 =1
ai j ai 1 1
2 j2
1 cos J j j cos J j O j2 1 3
2 j3
H y = A( A cA) 1 Ac = AVȁ 2 V cAc ª 0.849 « 1.839 A ~ := AV = « « 2.829 « ¬ 9.759
1.131 º 1.272 »» 2 , ȁ = Diag(8.60 × 103 , 0.58) 1.413 » » 2.400 ¼
ª 43 37 31 11º « » 1 « 37 33 29 1 » H y = A ~ ȁ 2 ( A ~ )c = 100 « 31 29 27 13 » « » ¬« 11 1 13 97 ¼» rk H y = rk A = m = 2 yˆ 4 = y4 (I -LESS) =
(3.90)
3
1 ( 11 y1 + y2 + 13 y3 + 97 y4 ) . 100
(3.91)
148
3 The second problem of algebraic regression
By means of Box 3.16 we repeat the right eigenspace analysis for one leverage point t4 = a, lateron a o f , for both the hat matrix H x : = ( A cA) 1 A c and H y : = A( A cA) 1 Ac . First, Hx is the linear operator producing xˆ = x A (I -LESS) . Second, Hy as linear operator generates yˆ = y A (I -LESS) . Third, the complementary operator I 4 H y =: R as the matrix of partial redundancies leads us to the inconsistency vector ˆi = i A (I -LESS) . The structure of the redundancy matrix R, rk R = n – m, is most remarkable. Its diagonal elements will be interpreted soonest. Fourth, we have computed the length of the inconsistency vector || ˆi ||2 , the quadratic form y cRy . The highlight of the analysis of hat matrices is set by computing 1st : H x (a o f) versus 2nd : H y (a o f) 3rd : R (a o f) versus 4th : || ˆi ( a o f) ||2 for “highest leverage point” a o f , in detail reviewed Box 3.17. Please, notice the two unknowns xˆ1 and xˆ2 as best approximations of type I-LESS. xˆ1 resulted in the arithmetic mean of the first three measurements. The point y4 , t4 = a o f , had no influence at all. Here, xˆ2 = 0 was found. The hat matrix H y (a o f) has produced partial hats h11 = h22 = h33 = 1/ 3 , but h44 = 1 if a o f . The best approximation of the I LESS observations were yˆ1 = yˆ 2 = yˆ 3 as the arithmetic mean of the first three observations but yˆ 4 = y4 has been a reproduction of the fourth observation. Similarly the redundancy matrix R (a o f) produced the weighted means iˆ1 , iˆ2 and iˆ3 . The partial redundancies r11 = r22 = r33 = 2 / 3, r44 = 0 , sum up to r11 + r22 + r33 + r44 = n m = 2 . Notice the value iˆ4 = 4 : The observation indexed four is left uncontrolled. Box 3.16 The linear model of a univariate polynomial of degree one - one high leverage point ª y1 º ª1 « y » «1 y = Ax + i ~ « 2 » = « « y3 » «1 « » « ¬« y4 ¼» ¬«1
1º ª i1 º » 2 » ª x1 º ««i2 »» + 3 » «¬ x2 »¼ « i3 » « » » a ¼» ¬«i4 ¼»
x R 2 , y R 4 , A R 4× 2 , rk A = m = 2 dim X = m = 2 versus dim Y = n = 4 (1st) xˆ = xA (I -LESS) = ( A cA) 1 A cy = H x y
(3.92)
149
3-3 Case study
Hx =
1 18 12a + 3a 2
ª8 a + a « ¬ 2 a
2
2 2a + a 2a
2
4 3a + a 6a
2
ª y1 º « » 14 6a º « y2 » » 6 + 3a ¼ « y3 » « » «¬ y4 »¼
(2nd ) yˆ = y A (I -LESS) = A c( A cA) 1 A cy = H y y
(3.93)
“hat matrix”: H y = A c( A cA) 1 A c, rk H y = m = 2 ª 6 2a + a 2 « 2 1 « 4 3a + a Hy = 18 18a + 3a 2 « 2 4a + a 2 « ¬« 8 3a
º 4 3a + a 2 2 4a + a 2 8 3a » 2 2 6 4a + a 8 5a + a 2 » 6 5a + a 2 14 6a + a 2 4 + 3a » » 2 4 + 3a 14 12a + 3a 2 ¼»
(3rd) ˆi = i A (I -LESS) = (I 4 A( A cA) 1 A c) y = Ry “redundancy matrix”: R = I 4 A( AcA) 1 Ac, rk R = n m = 2 “redundancy”: n – rk A = n – m = 2 ª12 10a + 2a 2 4 + 3a a 2 « 2 12 6a + 2a 2 1 « 4 + 3a a R= 18 12a + 3a 2 « 2 + 4a a 2 8 + 5a a 2 « 2 «¬ 8 + 3a
2 + 4 a a 2 8 + 5a a 2 4 6a + 2a 2 4 3a
8 + 3a º » 2 » 4 3a » » 4 »¼
(4th) || ˆi ||2 =|| i A (I -LESS) ||2 = y cRy . At this end we shall compute the LESS fit lim || iˆ(a ) ||2 ,
a of
which turns out to be independent of the fourth observation. Box 3.17 The linear model of a univariate polynomial of degree one - extreme leverage point a o f (1st ) H x (a o f) 2 2 4 3 14 6 º ª8 1 « a2 a + 1 + a2 a + 1 a2 a + 1 a2 a » 1 Hx = « » 18 12 2 1 6 1 6 3» « 2 1 + 3 + 2 + 2 2+ «¬ a 2 a a2 a a a a »¼ a a a
150
3 The second problem of algebraic regression
1 ª1 1 1 0 º lim H x = « aof 3 ¬0 0 0 0 »¼ 1 xˆ1 = ( y1 + y2 + y3 ), xˆ2 = 0 3 (2nd ) H y (a o f) 4 3 ª6 2 « a2 a + 1 a2 a + 1 « « 4 3 +1 6 4 +1 « a2 a 1 a2 a Hy = « 18 12 2 4 8 5 + 3 « 2 +1 2 +1 a2 a «a a a a « 8 3 2 « 2 a a2 ¬ a ª1 « 1 1 lim H y = « a of 3 «1 « ¬«0
1 1 1 0
1 1 1 0
2 4 8 3 º +1 2 a a a2 a » » 8 5 2 » +1 2 2 » a a a » 14 6 4 3 » +1 2 + a » a2 a a 4 3 14 12 » 2+ + 3» a a2 a a ¼
0º 0 »» , lim h44 = 1 0 » a of » 3 ¼»
1 yˆ1 = yˆ 2 = yˆ 3 = ( y1 + y2 + y3 ), yˆ 4 = y4 3 (3rd ) R (a o f) 4 3 2 4 8 3º ª 10 10 « a2 a + 2 a2 + a 1 a2 + a 1 a2 + a » « » 2 » « 4 + 3 1 12 8 + 2 8 + 5 1 2 « a2 a 1 a2 a a2 a a » R= « » 18 12 2 4 8 5 4 6 4 3 » « + 3 + 1 + 1 + 2 a2 a « a2 a a2 a a2 a a2 a » « 8 3 2 4 3 4 » « 2+ » 2 a a a2 a a2 ¼ ¬ a ª 2 1 1 « 1 1 2 1 lim R (a ) = « a of 3 « 1 1 2 « ¬0 0 0
0º 0 »» . 0» » 0¼
151
3-3 Case study
1 1 1 iˆ1 = (2 y1 y2 y3 ), iˆ2 = ( y1 + 2 y2 y3 ), iˆ3 = ( y1 y2 + 2 y3 ), iˆ4 = 0 3 3 3 (4th ) LESS fit : || iˆ ||2 ª 2 1 1 « 1 2 1 1 lim || iˆ(a ) ||2 = y c « a of 3 « 1 1 2 « «¬ 0 0 0
0º 0 »» y 0» » 0 »¼
1 lim || iˆ(a ) ||2 = (2 y12 + 2 y22 + 2 y32 2 y1 y2 2 y2 y3 2 y3 y1 ) . 3
aof
A fascinating result is achieved upon analyzing (the right eigenspace of the hat matrix H y (a o f) . First, we computed the spectrum of the matrices A cA and ( A cA) 1 . Second, we proved O1 (a o f) = f , O2 (a o f) = 3 or O11 (a o f) = 0 , O21 (a o f) = 1/ 3 . Box 3.18 Right eigenspace analysis of a linear model of a univariate polynomial of degree one - extreme leverage point a o f “Hat matrix H y = A c( A cA) 1 A c ” ª A cAV = Vȁ 2 « right eigenspace analysis: «subject to « VV c = I mc ¬ spec( A cA) = {O12 , O22 } : A cA O j2 I = 0 j {1, 2} 4 O2 6+a = 0 O 4 O 2 (18 + a 2 ) + 20 12a + 3a 2 = 0 2 2 6 + a 14 + a O 2 O1,2 =
1 tr ( A cA) ± (tr A cA) 2 4 det A cA 2
tr A cA = 18 + a 2 , det A cA = 20 12a + 3a 3 (tr A cA) 2 4 det AcA = 244 + 46a + 25a 2 + a 4
152
3 The second problem of algebraic regression
a2 a4 ± 61 + 12a + 6a 2 + 2 4 2 2 spec( A cA) = {O1 , O2 } =
2 O1,2 = 9+
° a 2 a4 a2 a4 = ®9 + + 61 + 12a + 6a 2 + , 9 + 61 + 12a + 6a 2 + 2 4 2 4 °¯ “inverse spectrum ” 1 1 spec( A cA) = {O12 , O22 } spec( A cA) 1 = { 2 , 2 } O1 O2 1 = O22
1 = O12
9+
9+
½° ¾ °¿
9 1 61 12 6 1 a2 a4 + 4+ 3+ 2+ 61 + 12a + 6a 2 + 2 2 a a a 4 2 4 =a 20 12 20 12a + 3a 2 +3 a2 a 1 lim =0 a of O 2 1 9 1 61 12 6 1 a2 a4 + + + + + + 61 + 12a + 6a 2 + 2 2 a 4 a3 a 2 4 2 4 = a 20 12 20 12a + 3a 2 +3 a2 a 1 1 lim = a of O 2 3 2
1 lim spec( A cA)(a) = {f,3} lim spec( A cA) 1 = {0, } aof 3 2 A cAV = Vȁ º 1 2 2 » A cA = Vȁ V c ( A cA ) = Vȁ V c VV c = I m ¼ aof
“Hat matrix H y = AVȁ 2 V cA c ”. 3-32 Multilinear algebra, “join” and ”meet”, the Hodge star operator Before we can analyze the matrices “hat Hy” and “red R” in more detail, we have to listen to an “intermezzo” entitled multilinear algebra, “join” and “meet” as well as the Hodge star operator. The Hodge star operator will lay down the foundation of “latent restrictions” within our linear model and of Grassmann coordinates, also referred to as Plücker coordinates. Box 3.19 summarizes the definitions of multilinear algebra, the relations “join and meet”, denoted by “ ” and “*”, respectively. In terms of orthonormal base vectors ei , " , ei , we introduce by (3.94) the exterior product ei " ei also known as “join”, “skew product” or 1st Grassmann relation. Indeed, such an exterior product is antisymmetric as defined by (3.95), (3.96), (3.97) and (3.98). 1
k
1
m
153
3-3 Case study
The examples show e1 e 2 = - e 2 e1 and e1 e1 = 0 , e 2 e 2 = 0 . Though the operations “join”, namely the exterior product, can be digested without too much of an effort, the operation ”meet”, namely the Hodge star operator, needs much more attention. Loosely speaking the Hodge star operator or 2nd Grassmann relation is a generalization of the conventional “cross product” symbolized by “ × ”. Let there be given an exterior form of degree k as an element of /k(Rn) over the field of real numbers Rn . Then the “Hodge *” transforms the input exterior form of degree m to the output exterior form of degree n – m, namely an element of /n-k(R n). Input: X/ /m(R n) o Output: *X/ /n-m. Applying the summation convention over repeated indices, (3.100) introduces the input operation “join”, while (3.101) provides the output operation “meet”. We say that X , (3.101) is a representation of the adjoint form based on the original form X , (3.100). The Hodge dualizer is a complicated exterior form (3.101) which is based upon Levi-Civita’s symbol of antisymmetry (3.102) which is illustrated by 3 examples. H k1"kA is also known as the permutation operator. Unfortunately, we have no space and time to go deeper into “join and meet“. Instead we refer to those excellent textbooks on exterior algebra and exterior analysis, differential topology, in short exterior calculus. Box 3.19 “join and meet” Hodge star operator “ , ” I := {i1 ," , ik , ik +1 ," , in } {1," , n} “join”: exterior product, skew product, 1st Grassmann relation ei "i := ei " e j e j +1 " ei
(3.94)
“antisymmetry”: ei ...ij ...i = ei ... ji...i i z j
(3.95)
1
m
m
1
1
m
1
m
ei ... e j e j ... ei = ei ... e j e j ... ei
(3.96)
ei "i i "i = 0 i = j
(3.97)
ei " ei e j " ei = 0 i = j
(3.98)
1
k +1
k
m
1
1
i j
k +1
1
k
m
m
m
Example: e1 e 2 = e 2 e1 or e i e j = e j e i i z j Example: e1 e1 = 0, e 2 e 2 = 0 or e i e j = 0 “meet”: Hodge star operator, Hodge dualizer 2nd Grassmann relation
i = j
154
3 The second problem of algebraic regression
: ȁ m ( R n ) o n m ȁ ( R n )
(3.99)
“a m degree exterior form X ȁ m ( R n ) over R n is related to a n-m degree exterior form *X called the adjoint form” :summation convention: “sum up over repeated indices” input: “join” X=
1 e i " e i X i "i m! 1
(3.100)
m
m
1
output: “meet” 1 g e j " e j H i "i j " j Xi "i m !(n m)! antisymmetry operator ( “Eddington’s epsilons” ):
*X :=
H k "k 1
A
1
nm
1
1
m 1
(3.101)
m
nm
ª +1 for an even permutation of the indices k1 " kA := «« 1 for an oded permutation of the indices k1 " kA «¬ 0 otherwise (for a repetition of the indices).
(3.102)
Example: H123 = H 231 = H 312 = +1 Example: H 213 = H 321 = H132 = 1 Example: H112 = H 223 = H 331 = 0. For our purposes two examples on “Hodge’s star” will be sufficient for the following analysis of latent restrictions in our linear model. In all detail, Box 3.20 illustrates “join and meet” for
: ȁ 2 ( R 3 ) o ȁ 1 ( R 3 ) . Given the exterior product a b of two vectors a and b in R 3 with ai 1 = col1 A, ai 2 = col 2 A 1
2
as their coordinates, the columns of the matrix A with respect to the orthonormal frame of reference {e1 , e 2 , e 3 |0} at the origin 0. ab =
n =3
¦e
i1 ,i2 =1
i1
ei ai 1ai 2 ȁ 2 (R 3 ) 2
1
2
is the representation of the exterior form a b =: X in the multibasis ei i = ei ei . By cyclic ordering, (3.105) is an explicit write-up of a b R ( A) . Please, notice that there are 12
1
2
155
3-3 Case study
§n · §3· ¨ ¸=¨ ¸=3 © m¹ © 2¹ subdeterminants of A . If the determinant of the matrix G = I 4 , g = 1 , then according to (3.106), (3.107)
det G = 1
(a b) R ( A) A = G1,3 represent the exterior form *X , which is an element of R ( A) called Grassmann space G1,3 . Notice that (a b) is a vector whose Grassmann coordinate (Plücker coordinate) are §n · §3· ¨ ¸=¨ ¸=3 © m¹ © 2¹ subdeterminants of the matrix A, namely a21a32 a31a22 , a31a12 a11a32 , a11a23 a21a12 . Finally, (3.108) (e 2 e 3 ) = e 2 × e 3 = e1 for instance demonstrates the relation between " , " called “join, meet” and the “cross product”. Box 3.20 The first example: “join and meet”
: ȁ 2 (R 3 ) o ȁ1 (R 3 ) Input: “join” n =3
n =3
a = ¦ ei ai 1 , i =1
1
b =¦ ei ai
1
2
i =1
2
2
(3.103)
ai 1 = col1 A; ai 2 = col 2 A 1
ab =
2
1 n =3 ¦ ei ei 2! i ,i =1 1
2
ai 1ai 2 ȁ 2 (R 3 ) 1
2
(3.104)
1 2
“cyclic order ab =
1 e 2 e3 (a21a32 a31a22 ) + 2! 1 + e3 e1 (a31a12 a11a32 ) + 2! 1 + e1 e 2 (a11a23 a21a12 ) R ( A ) = G 2,3 . 2!
(3.105)
156
3 The second problem of algebraic regression
Output: “meet” ( g = 1, G y = I 3 , m = 2, n = 3, n m = 1)
(a b) =
n=2
1 e j H i ,i , j ai 1ai i ,i , j =1 2!
¦
1 2
1
2
2
(3.106)
1 2
1 e1 ( a21a32 a31a22 ) + 2! 1 + e 2 (a31a12 a11a32 ) + 2! 1 + e3 ( a11a23 a21a12 ) R A ( A ) = G1,3 2!
*(a b) =
(3.107)
§n · §3· ¨ ¸ = ¨ ¸ subdeterminant of A © m¹ © 2¹ Grassmann coordinates (Plücker coordinates)
(e 2 e3 ) = e1 , (e3 e1 ) = e 2 , (e1 e 2 ) = e3 .
(3.108)
Alternatively, Box 3.21 illustrates “join and meet” for selfduality
: ȁ 2 ( R 4 ) o ȁ 2 ( R 4 ) . Given the exterior product a b of two vectors a R 4 and b R 4 , namely the two column vectors of the matrix A R 4× 2 , ai 1 = col1 A, ai 2 = col 2 A 1
2
as their coordinates with respect to the orthonormal frame of reference {e1 , e 2 , e 3 , e 4 | 0 } at the origin 0. ab =
n=4
¦e
i1 ,i2 =1
i1
ei
2
ai 1ai 2 ȁ 2 (R 4 ) 1
2
is the representation of the exterior form a b := X in the multibasis ei i = ei ei . By lexicographic ordering, (3.111) is an explicit write-up of a b ( R ( A)) . Notice that these are 12
1
2
§ n · § 4· ¨ ¸=¨ ¸=6 © m¹ © 2¹ subdeterminants of A . If the determinant of the matrix G of the metric is one G = I 4 , det G = g = 1 , then according to (3.112), (3.113)
(a b) R ( A) A =: G 2,4
157
3-3 Case study
represents the exterior form X , an element of R ( A) A , called Grassmann space G 2,4 . Notice that (a b) is an exterior 2-form which has been generated by an exterior 2-form, too. Such a relation is called “selfdual”. Its Grassmann coordinates (Plücker coordinates) are
§ n · § 4· ¨ ¸=¨ ¸=6 © m¹ © 2¹ subdeterminants of the matrix A, namely a11a12 a21a12 , a11a32 a31a22 , a11a42 a41a12 , a21a32 a31a22 , a21a42 a41a22 , a31a41 a41a32 . Finally, (3.113), for instance (e1 e 2 ) = e3 e 4 , demonstrates the operation " , " called “join and meet”, indeed quite a generalization of the “cross product”. Box 3.21 The second example “join and meet”
: / 2 ( R 4 ) o / 2 ( R 4 ) “selfdual” Input : “join” n=4
n=4
a = ¦ ei ai 1 , b = ¦ ei ai i1 =1
1
1
i2 =1
2
2
2
(3.109)
(ai 1 = col1 ( A), ai 2 = col 2 ( A)) 1
ab =
2
1 n=4 ¦ ei ei ai 1ai 2 ȁ 2 (R 4 ) 2! i ,i =1 1
2
1
2
(3.110)
1 2
“lexicographical order” 1 e1 e 2 ( a11a22 a21a12 ) + 2! 1 + e1 e 3 ( a11a32 a31a22 ) + 2! 1 + e1 e 4 (a11a42 a41a12 ) + 2!
ab =
(3.111)
158
3 The second problem of algebraic regression
1 e 2 e3 (a21a32 a31a22 ) + 2! 1 + e 2 e 4 (a21a42 a41a22 ) + 2! 1 + e3 e 4 (a31a42 a41a32 ) R ( A) A = G 2,4 2! +
§ n · § 4· ¨ ¸ = ¨ ¸ subdeterminants of A: © m¹ © 2¹ Grassmann coordinates ( Plücker coordinates). Output: “meet” g = 1, G y = I 4 , m = 2, n = 4, n m = 2
(a b) =
1 n=4 ¦ 2! i ,i , j , j 1 2
1
2
1 e j e j Hi i =1 2! 1
2
1 2 j1 j2
ai 1ai 1
2
2
1 e3 e 4 (a11a22 a21a12 ) + 4 1 + e 2 e 4 (a11a32 a31a22 ) + 4 1 + e3 e 2 (a11a42 a41a12 ) + 4 1 + e 4 e1 (a21a32 a31a22 ) + 4 1 + e3 e1 (a21a22 a41a22 ) + 4 1 + e1 e 2 (a31a42 a41a32 ) R ( A) A = G 2,4 4 =
(3.112)
§ n · § 4· ¨ ¸ = ¨ ¸ subdeterminants of A : © m¹ © 2¹ Grassmann coordinates (Plücker coordinates).
(e1 e 2 ) = e3 e 4 , (e1 e3 ) = e 2 e 4 , (e1 e 4 ) = e3 e 2 ,
(e 2 e3 ) = e 4 e1 , (e 2 e 4 ) = e3 e1 ,
(3.113)
(e3 e 4 ) = e1 e 2 . 3-33
From A to B: latent restrictions, Grassmann coordinates, Plücker coordinates.
Before we return to the matrix A R 4× 2 of our case study, let us analyze the matrix A R 2×3 of Box 3.22 for simplicity. In the perspective of the example of
159
3-3 Case study
our case study we may say that we have eliminated the third observation, but kept the leverage point. First, let us go through the routine to compute the hat matrices H x = ( A c A) 1 A c and H y = A( A c A) 1 A c , to be identified by (3.115) and (3.116). The corresponding estimations xˆ = x A (I -LESS) , (3.116), and y = y A (I -LESS) , (3.118), prove the different weights of the observations ( y1 , y2 , y3 ) influencing xˆ1 and xˆ2 as well as ( yˆ1 , yˆ 2 , yˆ3 ) . Notice the great weight of the leverage point t3 = 10 on yˆ 3 . Second, let us interpret the redundancy matrix R = I 3 A( AcA) 1 Ac , in particular the diagonal elements. r11 =
A cA (1)
=
A cA (2) A cA (3) 64 81 1 = = , r22 = , r33 = , 146 det AcA 146 det AcA 146
det A cA n =3 1 tr R = ¦ (AcA)(i ) = n rk A = n m = 1, det A cA i =1
the degrees of freedom of the I 3 -LESS problem. There, for the first time, we meet the subdeterminants ( A cA )( i ) which are generated in a two step procedure. “First step” eliminate the ith row from A as well as the ith column of A.
“Second step” compute the determinant A c( i ) A ( i ) .
Example : ( A cA)1 1 1 1 2 1 10
A c(1) A (1) 1 1
1
1 2 10
2
( A cA )(1) = det A c(1) A (1) = 64 det A cA = 146 12
12 104 Example: ( AcA) 2
A c( 2) A ( 2)
1 1 1 2 1 10
1 1
2
1
1 2 10
11
11 101
( A cA )(2) = det A c(2) A (2) = 81 det A cA = 146
160
3 The second problem of algebraic regression
Example: ( AcA)3
A c(3) A (3) 1 1
1 1 1 2 1 10
1
2 3
1 2 10
3 5
( A cA )(3) = det A c(3) A (3) = 1 det A cA = 146
Obviously, the partial redundancies (r11 , r22 , r33 ) are associated with the influence of the observation y1, y2 or y3 on the total degree of freedom. Here the observation y1 and y2 had the greatest contribution, the observation y3 at a leverage point a very small influence. The redundancy matrix R, properly analyzed, will lead us to the latent restrictions or “from A to B”. Third, we introduce the rank partitioning R = [ B, C] , rk R = rk B = n m = 1, (3.120), of the matrix R of spatial redundancies. Here, b R 3×1 , (3.121), is normalized to generate b = b / || b || 2 , (3.122). Note, C R 3× 2 is a dimension identity. We already introduced the orthogonality condition bcA = 0 or bcAxA = bcyˆ = 0 (b )cA = 0
or
(b )cAxA = (b )cyˆ = 0,
which establishes the latent restrictions (3.127) 8 yˆ1 9 yˆ 2 + yˆ 3 = 0. We shall geometrically interpret this essential result as soon as possible. Fourth, we aim at identifying R ( A) and R ( A) A for the linear model {Ax + i = y, A R n ×m , rk A = m = 2} ª1º wy y y y « » t1 := = [e1 , e 2 , e3 ] «1» , wx1 «¬1»¼ ª1 º wy y y y « t2 := = [e1 , e 2 , e3 ] « 2 »» , wx 2 «¬10 »¼ as derivatives of the observation functional y = f (x1 , x 2 ) establish the tangent vectors which span a linear manifold called Grassmann space. G 2,3 = span{t1 , t2 } R 3 ,
161
3-3 Case study
in short GRASSMANN (A). Such a notation becomes more obvious if we compute ª a11 x1 + a12 x2 º n =3 m = 2 y y y « y = [e1 , e 2 , e3 ] « a21 x1 + a22 x2 »» = ¦ ¦ eiy aij x j , «¬ a31 x1 + a32 x2 »¼ i =1 j =1 ª a11 º n =3 wy y y y « (x1 , x 2 ) = [e1 , e 2 , e3 ] « a21 »» = ¦ eiy ai1 wx1 «¬ a31 »¼ i =1 ª a12 º n =3 wy y y y « (x1 , x 2 ) = [e1 , e 2 , e3 ] « a22 »» = ¦ eiy ai2 . wx 2 «¬ a32 »¼ i =1 Indeed, the columns of the matrix A lay the foundation of GRASSMANN (A). Five, let us turn to GRASSMANN (B) which is based on the normal space R ( A) A . The normal vector n = t1 × t 2 = (t1 t 2 ) which spans GRASSMANN (B) is defined by the “cross product” identified by " , " , the skew product symbol as well as the Hodge star symbol. Alternatively, we are able to represent the normal vector n, (3.130), (3.132), (3.133), constituted by the columns {col1A, col2A} of the matrix, in terms of the Grassmann coordinates (Plücker coordinates). a a22 a a32 a a12 p23 = 21 = 8, p31 = 31 = 9, p12 = 11 = 1, a31 a32 a11 a12 a21 a22 identified as the subdeterminants of the matrix A, generated by n =3
¦ (e
i1 ,i2 =1
i1
ei )ai 1ai 2 . 2
1
2
If we normalize the vector b to b = b / || b ||2 and the vector n to n = n / || n ||2 , we are led to the first corollary b = n . The space spanned by the normal vector n, namely the linear manifold G1,3 R 3 defines GRASSMANN (B). In exterior calculus, the vector built on Grassmann coordinates (Plücker coordinates) is called Grassmann vector g or normalized Grassmann vector g*, here ª p23 º ª 8 º ª 8º g 1 « » « »
« » g := p31 = 9 , g := = 9 . « » « » & g & 2 146 « » «¬ p12 »¼ «¬ 1 »¼ «¬ 1 »¼ The second corollary identifies b = n = g .
162
3 The second problem of algebraic regression
“The vector b which constitutes the latent restriction (latent condition equation) coincides with the normalized normal vector n R ( A) A , an element of the space R ( A) A , which is normal to the column space R ( A) of the matrix A. The vector b is built on the Grassmann coordinates (Plücker coordinates), [ p23 , p31 , p12 ]c , subdeterminant of vector g in agreement with b .” Box 3.22 Latent restrictions Grassmann coordinates (Plücker coordinates) the second example ª y1 º ª1 1 º ª1 1 º « y » = «1 2 » ª x1 º A = «1 2 » , rk A = 2 « 2» « » «x » « » «¬ y3 »¼ «¬1 10 »¼ ¬ 2 ¼ «¬1 10 »¼ (1st) H x = ( A cA ) 1 A c 1 ª 92 79 25º 146 «¬ 10 7 17 »¼
(3.115)
1 ª 92 y1 + 79 y2 25 y3 º 146 «¬ 10 y1 7 y2 + 17 y3 »¼
(3.116)
H x = ( AcA) 1 Ac = xˆ = x A (I LESS) =
(2nd) H y = A( A cA) 1 A c ª 82 72 8 º 1 « H y = ( A cA) Ac = 72 65 9 »» , rk H y = rk A = 2 146 « «¬ 8 9 145»¼
(3.117)
ª82 y1 + 72 y2 8 y3 º 1 « yˆ = y A (I LESS) = 72 y1 + 65 y2 + 3 y3 »» 146 « «¬ 8 y1 + 9 y2 + 145 y3 »¼
(3.118)
1
yˆ 3 =
1 (8 y1 + 9 y2 + 145 y3 ) 146
(3rd) R = I 3 A( A cA ) 1 Ac
(3.119)
163
3-3 Case study
R = I 3 A( A cA) 1 Ac =
r11 =
ª 64 72 8 º 1 « 72 81 9 »» « 146 «¬ 8 9 1 »¼
(3.120)
A cA (1) A cA (2) A cA (3) 64 81 1 = , r22 = = , r33 = = 146 det A cA 146 det A cA 146 det A cA tr R =
n =3 1 ( A cA)(i ) = n rk A = n m = 1 ¦ det A cA i =1
latent restriction 8º ª 64 72 1 « R = [B, C] = 72 81 9 » , rk R = 1 « » 146 «¬ 8 9 1»¼ b :=
ª 64 º ª 0.438 º 1 « 72 »» = «« 0.493»» « 146 «¬ 8 »¼ «¬ 0.053 »¼
(3.120)
(3.121)
ª 8º ª 0.662 º b 1 « » « b := = 9 » = « 0.745 »» « &b& 146 «¬ 1 »¼ «¬ 0.083 »¼
(3.122)
(3.123)
bcA = 0 ( b )cA = 0
(3.124)
(3.125)
bcyˆ = 0 (b )cyˆ = 0
(3.126)
8 yˆ1 9 yˆ 2 + yˆ 3 = 0
(3.127)
" R (A) and R ( A) A : tangent space Tx M 2 versus normalspace N x M 2 , 3 Grassmann manifold G m2,3 R 3 versus Grassmann manifold G1,3 nm R "
ª1º wy y y y « » = [e1 , e 2 , e 3 ] 1 “the first tangent vector”: t1 := « » wx1 «¬1»¼
(3.128)
164
3 The second problem of algebraic regression
“the second tangent vector”: t 2 :=
ª1 º wy = [e1y , e 2y , e 3y ] « 2 » « » wx2 «¬10»¼
(3.129)
“ Gm,n ” G 2,3 = span{t1 , t 2 } R 3 : Grassmann ( A ) “the normal vector” n := t1 × t 2 = ( t1 t 2 ) n =3
n =3
t1 = ¦ ei ai 1 i =1
1
¦ee
i1 ,i2 =1
i1 i2
i =1
ai 1ai 2 = 1
t1 = ¦ ei ai
and
1
n =3
n=
(3.130)
2
2
2
(3.131)
2
n =3
¦ (e
i1 ,i2 =1
i1
ei )ai 1ai 2
1
2
2
(3.132)
i, i1 , i2 {1," , n = 3}
versus
n= (3.133)
n=
= e 2 × e3 (a21a32 a31a22 )
= (e 2 × e3 )( a21a32 a31a22 ) +
+e3 × e1 (a31a12 a11a32 )
+ (e3 × e1 )(a31a12 a11a32 ) + (3.134)
+e1 × e 2 (a11a22 a21a12 )
+ (e1 × e 2 )( a11a22 a21a12 )
Hodge star operator :
ª (e 2 e 3 ) = e 2 × e 3 = e1 « (e e ) = e × e = e 3 1 2 « 3 1 «¬ (e1 e 2 ) = e1 × e 2 = e 3
(3.135)
ª8 º n = t1 × t 2 = ( t1 × t 2 ) = [e , e , e ] « 9 » « » «¬1 »¼
(3.136)
ª8 º n 1 « » y y y n := = [e1 , e 2 , e3 ] 9 || n || 146 « » «¬1 »¼
(3.137)
y 1
y 2
y 3
Corollary: b = n “Grassmann manifold G n m ,n “
165
3-3 Case study
G1,3 = span n R 3 : Grassmann(B) Grassmann coordinates (Plücker coordinates) ª1 1º a a22 a31 a32 a11 a12 A = « 1 2 » , g ( A ) := { 21 , , }= « » a31 a32 a11 a12 a21 a22 «¬10 10»¼ 1 2 1 10 1 1 ={ , , } = {8, 9,1} 1 10 1 1 1 2
(3.138)
(cyclic order) g ( A) = { p23 , p31 , p12 } p23 = 8, p31 = 9, p12 = 1 ª p23 º ª8 º Grassmann vector : g := «« p31 »» = «« 9 »» «¬ p12 »¼ ¬«1 ¼»
(3.139)
ª8 º g 1 « » = 9 normalized Grassmann vector: g := || g || 146 « » «¬1 »¼
(3.140)
Corollary : b = n = g .
(3.141)
Now we are prepared to analyze the matrix A R 2× 4 of our case study. Box 3.23 outlines first the redundancy matrix R R 2× 4 (3.142) used for computing the inconsistency coordinates iˆ4 = i4 (I LESS) , in particular. Again it is proven that the leverage point t4=10 has little influence on this fourth coordinate of the inconsistency vector. The diagonal elements (r11, r22, r33, r44) of the redundancy matrix are of focal interest. As partial redundancy numbers (3.148), (3.149), (3.150) and (3.151) r11 =
AA (1) AA ( 2) AA (3) AA ( 4) 57 67 73 3 = , r22 = = , r33 = = , r44 = = , det A cA 100 det A cA 100 det A cA 100 det A cA 100 they sum up to tr R =
n=4 1 ¦ (AcA)(i ) = n rk A = n m = 2 , det A cA i =1
the degree of freedom of the I 4 -LESS problem. Here for the second time we meet the subdeterminants ( A cA )( i ) which are generated in a two-step procedure.
166
3 The second problem of algebraic regression
“First step”
“Second step”
eliminate the ith row from A as well as the ith column of Ac .
compute the determinant of A c( i ) A ( i )
Box 3.23 Redundancy matrix of a linear model of a uninvariant polynomial of degree one - light leverage point a=10 “Redundancy matrix R = (I 4 A( A cA) 1 A c) ” ª 57 37 31 11 º « » 1 « 37 67 29 1 » I 4 A( AcA) 1 Ac = 100 « 31 29 73 13» « » 1 13 3 »¼ «¬ 11
(3.142)
iˆ4 = i4 (I -LESS) = Ry
(3.143)
1 iˆ4 = i4 (I -LESS) = (11 y1 y2 13 y3 + 3 y4 ) 100
(3.144)
r11 =
57 67 73 3 , r22 = , r33 = , r44 = 100 100 100 100 “rank partitioning”
(3.145)
R R 4×4 , rk R = n rk A = n m = 2, B R 4×2 , C R 4×2 R = I 4 A( A cA) 1 A c = [B, C]
(3.146)
ª 57 37 º « 67 » 1 « 37 » , then BcA = 0 ” “ if B := 100 « 31 29 » « » ¬ 11 1 ¼
(3.147)
A cA (1) A cA ( 2 ) , r22 = det A cA det A cA c A A (3) A cA ( 4 ) r33 = (3.150) , r44 = det A cA det A cA n =4 1 tr R = ¦ (AcA)(i ) = n rk A = n m = 2 det A cA i =1
(3.148)
r11 =
(3.149) (3.151) (3.152)
167
3-3 Case study
Example: ( A cA )(1) 1 1
A c(1) A (1)
1 2
1 3 1 10 1 1 1
1
1 2 3 10
3
( A cA )(1) =det ( A c(1) A (1) ) =114 det A cA = 200
15
15 113
Example: ( A cA)( 2) 1 1
A c( 2) A ( 2)
1 2
1 3 1 10 1 1 1
1
1 2 3 10
3
( A cA)( 2) =det ( A c( 2) A ( 2) ) =134 det A cA = 200
14
14 110
Example: ( A cA)(3) 1 1
A c(3) A (3)
1 2
1 3 1 10 1 1 1
1
1 2 3 10
3
( A cA)(3) =det ( A c(3) A (3) ) =146 det A cA = 200
13
13 105
Example: ( A cA)( 4) 1 1
A c( 2) A ( 2)
1 2
1 3 1 10 1 1 1
1
1 2 3 10
3
6
6 10
( A cA)( 4) =det ( A c( 4) A ( 4) ) =6 det A cA = 200
168
3 The second problem of algebraic regression
Again, the partial redundancies (r11 ," , r44 ) are associated with the influence of the observation y1, y2, y3 or y4 on the total degree of freedom. Here the observations y1, y2 and y3 had the greatest influence, in contrast the observation y4 at the leverage point a very small impact. The redundancy matrix R will be properly analyzed in order to supply us with the latent restrictions or the details of “from A to B”. The rank partitioning R = [B, C], rk R = rk B = n m = 2 , leads us to (3.22) of the matrix R of partial redundancies. Here, B R 4× 2 , with two column vectors is established. Note C R 4×2 is a dimension identity. We already introduced the orthogonality conditions in (3.22) BcA = 0 or BcAxA = Bcy A = 0 , which establish the two latent conditions 57 37 31 11 yˆ1 yˆ 2 yˆ 3 + yˆ 4 = 0 100 100 100 100 37 67 29 1 yˆ1 + yˆ 2 yˆ 3 yˆ 4 = 0. 100 100 100 100 Let us identify in the context of this paragraph R( A) and R ( A) A for the linear model {Ax + i := y , A R n× m , rk A = m = 2} . The derivatives ª1º ª1 º «1» «2» wy wy t1 := = [e1y , e 2y , e 3y , e 4y ] « » , t 2 := = [e1y , e 2y , e 3y , e 4y ] « » , «1» «3» wx1 wx 2 « » « » ¬1¼ ¬10¼ of the observational functional y = f (x1 , x 2 ) generate the tangent vectors which span a linear manifold called Grassmann space G 2,4 = span{t1 , t 2 } R 4 , in short GRASSMANN (A). An illustration of such a linear manifold is ª a11 x1 + a12 x2 º « a x + a x » n=4 m=2 y = [e1y , e 2y , e3y , e 4y ] « 21 1 22 2 » = ¦ ¦ eiy aij x j , « a31 x1 + a32 x2 » i =1 j =1 « » ¬« a41 x1 + a42 x2 ¼»
169
3-3 Case study
ª a11 º «a » n=4 wy y y y y « 21 » ( x1 , x2 ) = [e1 , e 2 , e3 , e 4 ] = ¦ eiy ai1 , « » a31 wx1 i =1 « » ¬« a41 ¼» ª a12 º «a » n=4 wy 22 ( x1 , x2 ) = [e1y , e 2y , e3y , e 4y ] « » = ¦ eiy ai 2 . « a32 » i =1 wx2 « » «¬ a42 »¼ Box 3.24 Latent restrictions Grassmann coordinates (Plücker coordinates) the first example (3.153)
BcA = 0 Bcy = 0
(3.154)
(3.155)
ª1 1 º ª 57 37 º «1 2 » « » » B = 1 « 37 67 » A=« «1 3 » 100 « 31 29 » « » « » 1 »¼ «¬1 10 »¼ «¬ 11
(3.156)
“ latent restriction” 57 yˆ1 37 yˆ 2 31yˆ 3 + 11yˆ 4 = 0
(3.157)
37 yˆ1 + 67 yˆ 2 29 yˆ 3 yˆ 4 = 0
(3.158)
“ R( A) : the tangent space Tx M 2 the Grassmann manifold G 2,4 ” ª1º «» wy y y y y «1» [e1 , e 2 , e3 , e 4 ] “the first tangent vector”: t1 := «1» wx1 «» «¬1»¼
(3.159)
ª1 º « » wy y y y y « 2 » [e1 , e 2 , e3 , e 4 ] “the second tangent vector”: t 2 := «3 » wx 2 « » ¬«10 ¼»
(3.160)
170
3 The second problem of algebraic regression
G 2,4 = span{t1 , t 2 } R 4 : Grassmann ( A ) “the first normal vector”: n1 :=
b1 || b1 ||
(3.161)
|| b1 ||2 = 104 (572 + 372 + 312 + 112 ) = 57 102
(3.162)
ª 0.755 º « 0.490» » n1 = [e1y , e 2y , e 3y , e 4y ] « « 0.411» « » ¬ 0.146¼
(3.163)
“the second normal vector”: n 2 :=
b2 || b 2 ||
(3.164)
|| b 2 ||2 = 104 (37 2 + 67 2 + 292 + 12 ) = 67 102
(3.165)
ª 0.452 º « »
y y y y « 0.819 » n 2 = [e1 , e 2 , e3 , e 4 ] « 0.354 » « » ¬« 0.012 ¼»
(3.166)
Grassmann coordinates (Plücker coordinates) ª1 1 º «1 2 » » g ( A) := °® 1 1 , 1 1 , 1 1 , 1 2 , 1 2 , 1 3 °½¾ = A=« «1 3 » °¯ 1 2 1 3 1 10 1 3 1 10 1 10 °¿ « » ¬1 10 ¼ = { p12 , p13 , p14 , p23 , p24 , p34 } (3.167) p12 = 1, p13 = 2, p14 = 9, p23 = 1, p24 = 8, p34 = 7. Again, the columns of the matrix A lay the foundation of GRASSMANN (A). Next we turn to GRASSMANN (B) to be identified as the normal space R ( A) A . The normal vectors ªb11 º ªb21 º «b » «b » n1 = [e1y , e 2y , e3y , e 4y ] « 21 » ÷ || col1 B ||, n 2 = [e1y , e 2y , e3y , e 4y ] « 22 » ÷ || col2 B || «b31 » «b32 » « » « » ¬«b41 ¼» ¬«b42 ¼» are computed from the normalized column vectors of the matrix B = [b1 , b 2 ] .
171
3-3 Case study
The normal vectors {n1 , n 2 } span the normal space R ( A) A , also called GRASSMANN(B). Alternatively, we may substitute the normal vectors n1 and n 2 by the Grassmann coordinates (Plücker coordinates) of the matrix A, namely by the Grassmann column vector. p12 =
1 1 1 1 1 1 = 1, p13 = = 2, p14 = =9 1 2 1 3 1 10
p23 =
1 2 1 2 1 3 = 1, p24 = = 8, p34 = =7 1 3 1 10 1 10 n = 4, m = 2, n–m = 2
n=4
¦
i1 ,i2 =1
1 n=4 ¦ e j e j H i ,i , j , j ai 1ai 2! i ,i , j , j =1
(ei ei )ai 1ai 2 = 1
2
1
2
1
1 2
1
2
1 2
1
2
1
2
2
2
ª p12 º ª1 º «p » « » « 13 » « 2 » « p14 » «9 » g := « » = « » R 6×1 . « p23 » «1 » « p » «8 » « 24 » « » «¬ p34 »¼ ¬«7 ¼» ?How do the vectors {b1, b2},{n1, n2} and g relate to each other? Earlier we already normalized, {b1 , b 2 } to {b1 , b 2 }, when we constructed {n1 , n 2 } . Then we are left with the question how to relate {b1 , b 2 } and {n1 , n 2 } to the Grassmann column vector g. The elements of the Grassmann column vector g(A) associated with matrix A are the Grassmann coordinates (Plücker coordinates){ p12 , p13 , p14 , p23 , p24 , p34 } in lexicographical order. They originate from the dual exterior form D m = E n m where D m is the original m-exterior form associated with the matrix A. n = 4, n–m = 2
D 2 :=
1 n=4 ¦ ei ei ai ai = 2! i i =1 1
2
1
2
1, 2
1 1 e1 e 2 (a11a22 a21a12 ) + e1 e3 ( a11a32 a31a22 ) + 2! 2! 1 1 + e1 e 4 (a11a42 a41a12 ) + e 2 e3 ( a21a32 a31a22 ) + 2! 2! 1 1 + e 2 e 4 (a21a42 a41a22 ) + e3 e 4 (a31a42 a41a32 ) 2! 2!
=
172
3 The second problem of algebraic regression
E := D 2 (R 4 ) =
1 n=4 ¦ e j e j Hi i 4 i i , j , j =1 1
1, 2
1
2
1 2 j1 j2
ai 1ai 2 = 1
1
2
1 1 1 e3 e 4 p12 + e 2 e 4 p13 + e3 e 2 p14 + 4 4 4 1 1 1 + e 4 e1 p23 + e3 e1 p24 + e1 e 2 p34 . 4 4 4 =
The Grassmann coordinates (Plücker coordinates) { p12 , p13 , p14 , p23 , p24 , p34 } refer to the basis {e3 e 4 , e 2 e 4 , e3 e 2 , e 4 e1 , e3 e1 , e1 e 2 } . Indeed the Grassmann space G 2,4 spanned by such a basis can be alternatively covered by the chart generated by the column vectors of the matrix B,
J 2 :=
n=4
¦e
j1
e j b j b j GRASSMANN(Ǻ), 2
1
2
j1 , j2
a result which is independent of the normalisation of {b j 1 , b j 2 } . 1
2
As a summary of the result of the two examples (i) A \ 3× 2 and (ii) A \ 4× 2 for a general rectangular matrix A \ n × m , n > m, rkA = m is needed. “The matrix B constitutes the latent restrictions also called latent condition equations. The column space R (B) of the matrix B coincides with complementary column space R ( A) A orthogonal to column space R ( A) of the matrix A. The elements of the matrix B are the Grassmann coordinates, also called Plücker coordinates, special sub determinants of the matrix A = [a i1 ," , a im ] p j j := 1 2
n
¦
i1 ," , im =1
Hi "i 1
m j1 " jn-m
ai 1 "ai 1
mm
.
The latent restrictions control the parameter adjustment in the sense of identifying outliers or blunders in observational data.” 3-34 From B to A: latent parametric equations, dual Grassmann coordinates, dual Plücker coordinates While in the previous paragraph we started from a given matrix A \ n ×m , n > m, rk A = m representing a special inconsistent systems of linear equations y=Ax+i, namely in order to construct the orthogonal complement R ( A) A of R ( A) , we now reverse the problem. Let us assume that a matrix B \ A× n , A < n , rk B = A is given which represents a special inconsistent system of linear homogeneous condition equations Bcy = Bci . How can we construct the orthogonal complement R ( A) A of R (B) and how can we relate the elements of R (B) A to the matrix A of parametric adjustment?
173
3-3 Case study
First, let us depart from the orthogonality condition BcA = 0 or A cB = 0 we already introduced and discussed at length. Such an orthogonality condition had been the result of the orthogonality of the vectors y A = yˆ (LESS) and i A = ˆi (LESS) . We recall the general condition of the homogeneous matrix equation. BcA = 0 A = [I A B(BcB) 1 Bc]Z , which is, of course, not unique since the matrix Z \ A× A is left undetermined. Such a result is typical for an orthogonality conditions. Second, let us construct the Grassmann space G A ,n , in short GRASSMANN (B) as well as the Grassmann space G n A , n , in short GRASSMANN (A) representing R (B) and R (B) A , respectively. 1 n JA = (3.168) ¦ e j " e j b j 1"b j A A ! j " j =1 A
1
G n A := J A =
A
1
A
1
1 (n A)! i ,", i
nA
1
n
¦
1 ei " ei H i "i =1 A ! nA
1
, j1 ," , jA
1
n A j1 " jA
b j 1 "b j A . A
1
The exterior form J A which is built on the column vectors {b j 1 ," , b j A } of the matrix B \ A× n is an element of the column space R (B) . Its dual exterior form
J = G nA , in contrast, is an element of the orthogonal complement R (B) A . A
1
q i "i 1
nA
:= Hi "i 1
n A j1 " jA
b j 1"b j A
(3.169)
A
1
denote the Grassmann coordinates (Plücker coordinates) which are dual to the Grassmann coordinates (Plücker coordinates) p j " j . q := [ q i … q n A ] is constituted by subdeterminants of the matrix B, while p := [ p j … p n m ] by subdeterminants of the matrix A. 1
nm
1
1
The (D, E, J, G) -diagram of Figure 3.8 is commutative. If R (B) = R ( A) A , then R (B) A = R ( A) . Identify A = n m in order to convince yourself about the (D, E, J, G) - diagram to be commutative.
G n A = J A
id ( A = n m )
Dm
JA
id ( A = n m )
E n m = D m
Figure 3.8: Commutative diagram D m o D m = En-m = J n-m o J n-m = En-m =
D m = (1) m ( n-m ) D m
174
3 The second problem of algebraic regression
Third, let us specialize R ( A) = R (B) A and R ( A) A = R (B) by A = n - m . D m o D m = En-m = J n-m o J n-m = En-m =
D m = (1) m ( n-m ) D m
(3.170)
The first and second example will be our candidates for test computations of the diagram of Figure 3.8 to be commutative. Box 3.25 reviews direct and inverse Grassmann coordinates (Plücker coordinates) for A \ 3× 2 , B \ 3×1 , Box 3.26 for A \ 4× 2 , B \ 4× 2 . Box 3.25 Direct and inverse Grassmann coordinates (Plücker coordinates) first example The forward computation ª1 1 º n =3 n =3 A = ««1 2 »» \ 3×2 : a1 = ¦ ei ai 1 and a 2 = ¦ ei ai i =1 i =1 «¬1 10 »¼ n =3 1 D 2 := ¦ ei ei ai 1ai 2 ȁ 2 (\ 3 ) ȁ m ( \ n ) 2! i ,i =1 1
1
2
1
1
2
1
2
2
2
2
1 2
E1 := D 2 :=
n =3
1 e j Hi i j ai 1ai 2 ȁ 2 (\ 3 ) ȁ m ( \ n ) 2! i ,i , j =1
¦
1 2
1
12 1
1
2
1
Grassmann coordinates (Plücker coordinates) 1 1 1 E1 = e1 p23 + e 2 p31 + e3 p12 2 2 2 p23 = a21 a32 a31 a22 , p31 = a31 a12 a11 a32 , p12 = a11 a22 a21 a12 , p23 = 8, p31 = 9, p12 = 1 The backward computation J1 :=
n =3
¦
1 e j Hi i j ai 1 ai 2 = e1 p23 + e 2 p31 + e3 p12 ȁ1 (\ 3 ) 1! 1
i1 ,i2 , j1 =1
12 1
G 2 := J1 := G2 = G2 =
1 2!
1 2
1 2!
1
2
n =3
¦
ei ei H i i j H j 1
2
12 1
2 j3 j1
a j 1a j 2
3
2
i1 ,i2 , j1 , j2 , j3 =1
n =3
¦
e i e i (G i j G i 1
2
1 2
2 j3
Gi j Gi j ) a j 1 a j 1 3
2 2
2
3
2
i1 ,i2 , j1 , j2 , j3 =1
n =3
¦e
i1
ei ai 1 ai 2 = D 2 ȁ 2 (\ 3 ) ȁ m ( \ n ) 2
1
2
i1 ,i2 =1
inverse Grassmann coordinates (dual Grassmann coordinates, dual Plücker coordinates)
175
3-3 Case study G2 = D 2 =
1 1 e 2 e 3 ( a 21 a 32 a 31 a 22 ) + e 3 e1 ( a 31 a12 a11 a 32 ) + 2 2 1 + e1 e 2 ( a11 a 22 a 21 a12 ) 2
G2 = D 2 =
1 1 1 e 2 e3 q23 + e 2 e3 q31 + e 2 e3 q12 ȁ 2 (\ 3 ) . 2 2 2 Box 3.26
Direct and inverse Grassmann coordinates (Plücker coordinates) second example The forward computation ª1 1 º A = «1 2 » \ 4× 2 : a1 = «1 3 » «¬1 10 »¼
n=4
e i ai 1 ¦ i =1 1
1
and a 2 =
1
n=4
e i ai 2 ¦ i =1 2
2
2
n=4
D 2 :=
1 ei ei ai 1ai 2 ȁ 2 (\ 4 ) ȁ m (\ n ) 2! =1
¦
i1 ,i2
1
2
1
2
1 n=4 1 ¦ e j e j Hi i j j ai 1ai 2 ȁ 2 (\ 4 ) ȁ n-m (\ n ) 2! i ,i , j , j =1 2! 1 1 1 E2 = e3 e 4 p12 + e 2 e 4 p13 + e3 e 2 p14 + 4 4 4 1 1 1 + e 4 e1 p23 + e3 e1 p24 + e1 e 2 p34 4 4 4 p12 = 1, p13 = 2, p14 = 9, p23 = 1, p34 = 7
E2 := D 2 :=
1
1 2
1
2
12 1 2
1
2
2
The backward computation J 2 :=
1 n=4 ¦ e j e j Hi i 2! i ,i , j , j =1 1
1 2
1
G 2 := J 2 :=
1 2 j1 j2
2
ai 1ai 2 ȁ 2 (\ 4 ) ȁ n-m (\ n ) 1
2
2
n=4 1 ¦ 2! i ,i , j , j , j , j 1 2
1
2
3
1 ei ei H i i =1 2! 1
4
1 2 j1 j2
2
Hj j
1 2 j3 j4
a j 1a j 2 = 3
4
= D 2 ȁ 2 (\ 4 ) ȁ m (\ n ) G2 = D 2 =
1 n=4 ¦ ei ei ai 1ai 4 i ,i =1 1
2
1
2
2
1 2
1 1 1 e3 e 4 q12 + e 2 e 4 q13 + e3 e 2 q14 + 4 4 4 1 1 1 + e 4 e1q23 + e3 e1q24 + e1 e 2 q34 4 4 4 q12 = p12 ,q13 = p13 ,q14 = p14 ,q23 = p23 ,q24 = p24 ,q34 = p34 . G2 = D 2 =
176 3-35
3 The second problem of algebraic regression
Break points
Throughout the analysis of high leverage points and outliers within the observational data we did assume a fixed linear model. In reality such an assumption does not apply. The functional model may change with time as Figure 3.9 indicates. Indeed we have to break-up the linear model into pieces. Break points have to be introduced as those points when the linear model changes. Of course, a hypothesis test has to decide whether the break point exists with a certain probability. Here we only highlight the notion of break points in the context of leverage points. For localizing break points we apply the Gauss-Jacobi Combinatorial Algorithm following J. L. Awange (2002), A. T. Hornoch (1950), S. Wellisch (1910). Figure 3.9:
Figure 3.10:
Graph of the function y(t), two
Gauss-Jacobi Combinatorial
break points
Algorithm, piecewise linear model, 1st cluster : ( ti , t j )
Figure 3.11:
Figure 3.12:
Gauss-Jacobi Combinatorial Algorithm, 2nd cluster : ( ti , t j )
Gauss-Jacobi Combinatorial Algorithm, 3rd cluster : ( ti , t j ).
177
3-3 Case study
Table 3.1: Test “ break points” observations for a piecewise linear model y1 y2 y3 y4 y5 y6 y7 y8 y9 y10
y 1 2 2 3 2 1 0.5 2 4 4.5
t 1 2 3 4 5 6 7 8 9 10
t1 t2 t3 t4 t5 t6 t7 t8 t9 t10
Table 3.1 summarises a set of observations yi with n=10 elements. Those measurements have been taken at time instants {t1 ," , t10 } . Figure 3.9 illustrates the graph of the corresponding function y(t). By means of the celebrated Gauss-Jacobi Combinatorial Algorithm we aim at localizing break points. First, outlined in Box 3.27 we determine all the combinations of two points which allow the fit of a straight line without any approximation error. As a determined linear model y = Ax , A \ 2× 2 , r k A = 2 namely x = A 1 y we calculate (3.172) x1 and (1.173) x 2 in a closed form. For instance, the pair of observations ( y1 , y2 ) , in short (1, 2) at (t1 , t2 ) = (1, 2) determines ( x1 , x2 ) = (0,1) . Alternatively, the pair of observations ( y1 , y3 ) , in short (1, 3), at (t1, t3) = (1, 3) leads us to (x1, x2) = (0.5, 0.5). Table 3.2 contains the possible 45 combinations which determine ( x1 , x2 ) from ( y1 ," , y10 ) . Those solutions are plotted in Figure 3.10, 3.11 and 3.12. Box 3.27 Piecewise linear model Gauss-Jacobi combinatorial algorithm 1st step ª y (ti ) º ª1 ti º ª x1 º y=« »=« » « » = Ax ¬ y (t j ) ¼ ¬1 t j ¼ ¬ x2 ¼
i < j {1," , n} (3.171)
y R 2 , A R 2× 2 , rk A = 2, x R 2 ªx º 1 x = A 1y « 1 » = x t ¬ 2¼ j ti x1 =
t j y1 ti y2 t j ti
and
ª t j ti º ª y (ti ) º « 1 1 » « y (t ) » ¬ ¼¬ j ¼ x2 =
y j yi t j ti
.
(3.172)
(3.173)
178
3 The second problem of algebraic regression
Example:
ti = t1 = 1, t j = t2 = 2 y (t1 ) = y1 = 1, y (t2 ) = y2 = 2 x1 = 0, x2 = 1.
Example:
ti = t1 = 1, t j = t3 = 3 y (t1 ) = y1 = 1, y (t3 ) = y3 = 2 x1 = 0.5 and Table 3.2
x2 = 0.5 .
179
3-3 Case study
Second, we introduce the pullback operation G y o G x . The matrix of the metric G y of the observation space Y is pulled back to generate by (3.174) the matrix of the metric G x of the parameter space X for the “determined linear model” y = Ax , A R 2× 2 , rk A = 2 , namely G x = A cG y A . If the observation space Y = span{e1y ,e2y} is spanned by two orthonormal vectors e1y ,e 2y relating to a pair of observations (yi, yj), i<j, i, j {1," ,10} , then the matrix of the metric G y = I 2 of the observation space is the unit matrix. In such an experimental situation (3.175) G x = A cA is derived. For the first example (ti, tj)=(1, 2) we are led to vech G x = [2,3,5]c . “Vech half” shortens the matrix of the metric G x R 2× 2 of the parameter space X ( x1 , x2 ) by stacking the columns of the lower triangle of the symmetric matrix G x . Similarly, for the second example (ti, tj)=(1,3) we produce vech G x = [2, 4,10]c . For all the 45 combinations of observations (yi, yj). In the last column Table 3.2 contains the necessary information of the matrix of the metric G x of the parameter space X formed by (vech G x )c . Indeed, such a notation is quite economical. Box 3.28 Piecewise linear model: Gauss-Jacobi combinatorial algorithm 2nd step pullback of the matrix G X the metric from G y G x = A cG y A
(3.174)
“if G y = I 2 , then G x = A cA ” ª 2 G x = A cA = « ¬ ti + t j
ti + t j º ti2 + t 2j »¼
i < j {1," , n}.
(3.175)
Example: ti = t1 = 1 , t j = t2 = 2 ª 2 3º Gx = « » , vech G x = [2,3,5]c. ¬ 3 5¼ Example: ti = t1 = 1 , t j = t3 = 3 ª2 4 º Gx = « » , vech G x = [2, 4,10]c . ¬ 4 10¼ Third, we are left the problem to identify the break points. C.F. Gauss (1828) and C.G.J. Jacobi (1841) have proposed to take the weighted arithmetic mean of the combinatorial solutions (x1,x2)(1,2), (x1,x2)(1,3), in general (x1,x2)(i,j), i<j, are considered as Pseudo-observations.
180
3 The second problem of algebraic regression
Box 3.29 Piecewise linear model: Gauss-Jacobi combinatorial algorithm 3rd step pseudo-observations Example ª x1(1,2) º ª1 « (1,2) » « « x2 » = «0 « x1(1,3) » «1 « (1,3) » « ¬« x2 ¼» ¬«0
0º ª i1 º » 1 » ª x1 º ««i2 »» + 0 » «¬ x2 »¼ «i3 » « » » 1 ¼» ¬«i4 ¼»
\ 4×1
(3.176)
G x -LESS ª x1(1,2) º « (1,2) » ª xˆ1 º (1,2) (1,3) 1 (1,2) (1,3) « x2 x A := xˆ = « » = ¬ªG x + G x º¼ ª¬G x , G x º¼ (1,3) » « x1 » ¬ xˆ2 ¼ « (1,3) » «¬ x2 »¼
\ 2×1 (3.177)
) vech G (1,2 = [2, 3,5]c , vech G(1,3) = [2, 4,10]c x x
ª 2 3º ) G (1,2 =« x », ¬ 3 5¼
ª2 4 º G(1,3) =« x » ¬ 4 10¼
ª x1(1,2) º ª0º « (1,2) » = « » , ¬ x2 ¼ ¬ 1 ¼
ª x1(1,3) º ª0.5º « (1,3) » = « » ¬ x2 ¼ ¬0.5¼
1
ª4 7 º 1 ª 15 7 º (1,2) (1,3) 1 G =« = « » » = [G x + G x ] 7 15 7 4 11 ¬ ¼ ¬ ¼ 1 x
ª xˆ º 1 ª6º 6 x A := « 1 » = « » xˆ1 = xˆ2 = = 0.545, 454 . ˆ 11 ¬ x2 ¼ 11 ¬6¼ Box 3.29 provides us with an example. For establishing the third step of the Gauss-Jacobi Combinatorial Algorithm. We outline G X LESS for the set of pseudo-observations (3.176) ( x1 , x2 )(1,2) and ( x1 , x2 )(1,3) solved by (3.177) and G (1,3) representing the metric of the paxA = ( xˆ1 , xˆ2 ) . The matrices G(1,2) x x rameter space X derived from ( x1 , x2 )(1,2 ) and ( x1 , x2 )(1,3) are additively composed and inverted, a result which is motivated by the special design matrix A = [I 2 , I 2 ]c of “direct” pseudo-observations. As soon as we implement the ) weight matrices G (1,2 and G (1,3) from Table 3.2 as well as ( x1 , x2 )(1,2 ) and x x (1,3) we are led to the weighted arithmetic mean xˆ1 = xˆ2 = 6 /11 . Such a ( x1 , x2 ) result has to be compared with the componentwise median x1 ( median) = 1/4 and x2 ( median) = 3/4.
181
3-3 Case study
(1, 2), (1,3) ª (1, 2), (1,3) « combination combination « median « G x LESS « xˆ1 = 0.545, 454 x1 (median) = 0.250 « «¬ xˆ2 = 0.545, 454 x2 (median) = 0.750. Here, the arithmetic mean of x1(1,2) , x1(1,3) and x2(1,2) , x2(1,3) coincides with the median neglecting the weight of the pseudo-observations. Box 3.30 Piecewise linear models and two break points “Example”
ª y1 º ª1n « y2 » = « 0 « » « ¬ y3 ¼ «¬ 0
0 1n 0
tn 0 0
1
1
0 tn 0
2
ª x1 º « » 0 º « x2 » ª i y » x « 0 » « 3 » + «i y « x4 » t n »¼ « x » «¬i y 5 «x » ¬« 6 ¼»
0 0 1n
2
1
2
3
3
3
º » » »¼
(3.178)
I n -LESS, I n -LESS, I n -LESS, 1
2
3
ª(t cn t n )(1cn y n ) (1cn t n )(t cn y n ª x1 º 1 «x » = c 2 « ¬ 2 ¼ A n1t n t n (1cn t n ) «¬ (1cn t n )(1cn y1 ) + n1t cn y1 1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
)º » »¼
(3.179)
ª(t cn t n )(1cn y n ) (1cn t n )(t cn y n ) º ª x3 º 1 » «x » = 2 « ¬ 4 ¼ A n2 t cn t n (1cn t n ) «¬ (1cn t n )(1cn y 2 ) + n2 t cn y 2 ¼»
(3.180)
ª(t cn t n )(1cn y n ) (1cn t n )(t cn y n ) º ª x5 º 1 = « ». « » 2 ¬ x6 ¼ A n3t cn t n (1cn t n ) «¬ (1cn t n )(1cn y 3 ) + n3t cn y 3 ¼»
(3.181)
2
2
2
2
2
2
2
3
3
3
3
2
3
2
3
3
3
2
2
2
2
3
3
2
2
2
3
3
3
3
3
3
Box 3.31 Piecewise linear models and two break points “ Example” 1st interval: n = 4, m = 2, t {t1 , t2 , t3 , t4 } ª1 «1 y1 = « «1 « ¬1
t1 º t2 » ª x1 º » + i y = 1n x1 + t n x2 + i y t3 » «¬ x2 »¼ » t4 ¼ 1
1
(3.182)
182
3 The second problem of algebraic regression
2nd interval: n = 4, m = 2, t {t4 , t5 , t6 , t7 } ª1 «1 y2 = « «1 « ¬1
t4 º t5 » ª x 3 º » + i = 1n x3 + t n x4 + i y t6 » «¬ x4 »¼ y » t7 ¼ 2
2
(3.183)
3rd interval: n = 4, m = 2, t {t7 , t8 , t9 , t10 } ª1 «1 y3 = « «1 « ¬1
t7 º t8 » ª x5 º » + i = 1n x5 + t n x6 + i y . t9 » «¬ x6 »¼ y » t10 ¼ 3
3
(3.184)
Figure 3.10, 3.11 and 3.12 have illustrated the three clusters of combinatorial solutions referring to the first, second and third straight line. Outlined in Box 3.30 and Box 3.31, namely by (3.178) to (3.181), n1 = n2 = n3 = 4 , we have computed ( x1 , x2 ) A for the first segment, ( x3 , x4 ) A for the second segment and ( x5 , x6 ) A for the third segment of the least squares fit of the straight line. Table 3.3 contains the results explicitly. Similarly, by means of the Gauss-Jacobi Combinatorial Algorithm of Table 3.4 we have computed the identical solution ( x1 , x2 ) A , ( x3 , x4 ) A and ( x5 , x6 ) A as “weighted arithmetic means” numerically presented only for the first segment. Table 3.3 I n -LESS solutions for those segments of a straight line, two break points ªx º ª 0.5º I 4 -LESS : « 1 » = « » : y (t ) = 0.5 + 0.6t ¬ x2 ¼ A ¬0.6 ¼ ªx º 1 ª126 º I 4 -LESS : « 3 » = « » : y (t ) = 6.30 0.85t ¬ x4 ¼ A 20 ¬ 17 ¼ ªx º 1 ª 183º I 4 -LESS : « 5 » = « » : y (t ) = 9.15 + 1.40t . ¬ x6 ¼ A 20 ¬ 28 ¼
Table 3.4 Gauss-Jacobi Combinatorial Algorithm for the first segment of a straight line,
183
3-3 Case study
ª xˆ1 º (1,2) (1,3) (1,4) (2,3) (2,4) (3,4) 1 « xˆ » = ª¬G x + G x + G x + G x + G x + G x º¼ ¬ 2¼ ª x1(1,2) º « (1,2) » « x2 » « x1(1,3) » « (1,3) » « x2 » « x (1,4) » « 1(1,4) » x ª¬G (1,2) º¼ «« 2(2,3) »» , G (1,3) , G (1,4) , G (2,3) , G(2,4) , G(3,4) x x x x x x x « 1(2,3) » « x2 » « x (2,4) » « 1 » « x2(2,4) » « (3,4) » « x1 » «¬« x2(3,4) »¼» 1 ª 15 5º ) ) ) 1 ª¬G (1,2 º¼ = + G (1,3) + G (1,4 + G (x2,3) + G (x2,4 ) + G (3,4 x x x x 30 «¬ 5 2 »¼ ª x1(1,2) º ª 2 3º ª 0º ª 3º G (1,2) « (1,2) » = « x »« » = « » ¬ x2 ¼ ¬ 3 5¼ ¬1¼ ¬5¼ ª x1(1,3) º ª 2 4 º ª 0.5º ª 3º G (1,3) « (1,3) » = « x »« » = « » ¬ x2 ¼ ¬ 4 10¼ ¬ 0.5¼ ¬ 7¼ ª x1(1,4) º ª 2 5 º ª 0.333º ª 4 º G (1,4) « (1,4) » = « x »« »=« » ¬ x2 ¼ ¬ 5 17 ¼ ¬ 0.666¼ ¬13¼ ª x1(2,3) º ª 2 5 º ª 2 º ª 4 º G (2,3) « (2,3) » = « x »« » = « » ¬ x2 ¼ ¬ 5 13¼ ¬ 0 ¼ ¬10¼ ª x1(2,4) º ª 2 6 º ª 1 º ª 5 º G (2,4) « (2,4) » = « x »« » = « » ¬ x2 ¼ ¬ 6 20¼ ¬ 0.5¼ ¬16¼ ª x1(3,4) º ª 2 7 º ª 1º ª 5 º G (3,4) « (3,4) » = « x »« » = « » ¬ x2 ¼ ¬ 7 25¼ ¬ 1 ¼ ¬18¼ (3,4) ª x1(1,2) º º ª 24 º (3,4) ª x1 " G (1,2) + + G « (1,2) » « (3,4) » = « » x x ¬ x2 ¼ ¬ x2 ¼ ¬ 69 ¼
ª xˆ1 º 1 ª15 5º ª 24 º ª 0.5º « xˆ » = 30 « 5 2 » « 69 » = « 0.6» . ¬ ¼¬ ¼ ¬ ¼ ¬ 2¼
(3.185)
184
3-4 Special linear and nonlinear models
3-4 Special linear and nonlinear models: A family of means for direct observations In case of direct observations, LESS of the inconsistent linear model y = 1n x + i has led to x A = arg{y = 1n x + i || i ||2 = min} , x A = (1c1) 11cy =
1 ( y1 + " + yn ) . n
Such a mean has been the starting point of many alternatives we present to you by Table 3.5 based upon S.R. Wassel´s (2002) review. Table 3.5 A family of means Name
arithmetic mean
Formula xA =
1 ( y1 + " + yn ) n
1 ( y1 + y2 ) 2 x A = (1cG y 1) 11cG y y
n = 2 : xA =
if G y = Diag( g1 ," , g1 ) weighted arithmetic mean
geometric mean
then xA =
g1 y1 + g n yn g1 + " + g n
xg =
n
§ n · y1 " yn = ¨ yi ¸ © i =1 ¹
n = 2 : xg = logarithmic mean
1
n
y1 y2
1 (ln y1 + " + ln yn ) n = ln xg
xlog = xlog
y(1) < " < y( n ) ordered set of observations
median
y( k +1) if n = 2k + 1 ª « " add " med y = « «[ y( k ) + y( k +1) ] / 2 if n = 2k « " even " ¬
3-5 A historical note
Wassel´s family of means
185 xp =
( y1 ) p +1 + " + ( yn ) p +1 ( y1 ) p + " + ( yn ) p n
x p = ¦ ( yi ) p +1 i =1
n
¦(y )
p
i
i =1
Case p=0: x p = xA Case p= 1/ 2 , n=2: x p = xA Hellenic mean
Case p= –1: n=2: 1
§ y 1 + y21 · 2 y1 y2 H = H ( y1 , y2 ) = ¨ 1 . ¸ = 2 y1 + y2 © ¹
3-5 A historical note on C.F. Gauss, A.M. Legendre and the inventions of Least Squares and its generalization The historian S.M. Stigler (1999, pp 320, 330-331) made the following comments on the history of Least Squares. “The method of least squares is the automobile of modern statistical analysis: despite its limitations, occasional accidents, and incidental pollution, this method and its numerous variations, extensions, and related conveyances carry the bulk of statistical analyses, and are known and valued by nearly all. But there has been some dispute, historically, as to who is the Henry Ford of statistics. Adrian Marie Legendre published the method in 1805, an American, Robert Adrian, published the method in late 1808 or early 1809, and Carl Fiedrich Gauss published the method in 1809. Legendre appears to have discovered the method in early 1805, and Robert Adrain may have “discovered” it in Legendre’s 1805 book (Stigler 1977c, 1978c), but in 1809 Gauss had the temerity to claim that he had been using the method since 1795, and one of the most famous priority disputes in the history of science was off and running. It is unnecessary to repeat the details of the dispute – R.L. Plackett (1972) has done a masterly job of presenting and summarizing the evidence in the case. Let us grant, then, that Gauss’s later accurate were substantially accunate, and that he did device the method of least squares between 1794 and 1799, independently of Legendre or any other discoverer. There still remains the question, what importance did he attach to the discovery? Here the answer must be that while Gauss himself may have felt the method useful, he was unsuccessful in communicating its importance to other before 1805. He may indeed have mentioned the method to Olbers, Lindemau, or von Zach before 1805, but in the total lack of applications by others, despite ample opportunity, suggests the message was not understood. The fault may have been more in the listener than in the teller, but in this case its failure serves only to enhance our admiration for Legendre’s 1805 success. For Legendre’s description of the method had an immediate and
186
3 The second problem of algebraic regression widespread effect – as we have seen, it even caught the eye and understanding of at least one of those astronomers (Lindemau) who had been deaf to Gauss’s message, and perhaps it also had an influence upon the form and emphasis of Gauss’s exposition of the method. When Gauss did publish on least squares, he went far beyond Legendre in both conceptual and technical development, linking the method to probability and providing algorithms for the computation of estimates. His work has been discussed often, including by H.L. Seal (1967), L. Eisenhart (1968), H. Goldsteine (1977, §§ 4.9, 4.10), D.A. Sprott (1978),O.S. Sheynin (1979, 1993, 1994), S.M. Stigler (1986), J.L. Chabert (1989), W.C. Waterhouse (1990), G.W. Stewart (1995), and J. Dutka (1996). Gauss’s development had to wait a long time before finding an appreciative audience, and much was intertwined with other’s work, notably Laplace’s. Gauss was the first among mathematicians of the age, but it was Legendre who crystallized the idea in a form that caught the mathematical public’s eye. Just as the automobile was not the product of one man of genius, so too the method of least squares is due to many, including at least two independent discoverers. Gauss may well have been the first of these, but he was no Henry Ford of statistics. If these was any single scientist who first put the method within the reach of the common man, it was Legendre.”
Indeed, these is not much to be added. G.W. Stewart (1995) recently translated the original Gauss text “Theoria Combinationis Observationum Erroribus Minimis Obmaxial, Pars Prior. Pars Posterior “ from the Latin origin into English. F. Pukelsheim (1998) critically reviewed the sources, the reset Latin text and the quality of the translation. Since the English translation appeared in the SIAM series “ Classics in Applied Mathematics”, he concluded: “ Opera Gaussii contra SIAM defensa”. “Calculus probilitatis contra La Place defenses.” This is Gauss’s famous diary entry of 17 June 1798 that he later quoted to defend priority on the Method of Least Squares (Werke, Band X, 1, p.533). C.F. Gauss goes Internet With the Internet Address http://gallica.bnf.fr you may reach the catalogues of digital texts of Bibliotheque Nationale de France. Fill the window “Auteur” by “Carl Friedrich Gauss” and you reach “Types de documents”. Continue with “Touts les documents” and click “Rechercher” where you find 35 documents numbered 1 to 35. In total, 12732 “Gauss pages” are available. Only the GaussGerling correspondence is missing. The origin of all texts are the resources of the Library of the Ecole Polytechnique. Meanwhile Gauss’s Werke are also available under http://www.sub.unigoettingen.de/. A CD-Rom is available from “Niedersächsische Staats- und Universitätsbibliothek.” For the early impact of the Method of Least Squares on Geodesy, namely W. Jordan, we refer to the documentary by S. Nobre and M. Teixeira (2000).
4
The second problem of probabilistic regression – special Gauss-Markov model without datum defect – Setup of BLUUE for the moments of first order and of BIQUUE for the central moment of second order : Fast track reading : Read only Theorem 4.3 and Theorem 4.13.
Lemma 4.2 ȟˆ : Ȉ y -BLUUE of ȟ
Definition 4.1 ˆȟ : Ȉ -BLUUE of ȟ y
Theorem 4.3 ȟˆ : Ȉ y -BLUUE of ȟ
Lemma 4.4 E{yˆ }, Ȉ y -BLUUE of E{y} e y , D{e y }, D{y}
“The first guideline of chapter four: definition, lemmas and theorem” In 1823, supplemented in 1828, C. F. Gauss put forward a new substantial generalization of “least squares” pointing out that an integral measure of loss, more definitely the principle of minimum variance, was preferable to least squares and to maximum likelihood. He abandoned both his previous postulates and set high store by the formula Vˆ 2 which provided an unbiased estimate of variance V 2 . C. F. Gauss’s contributions to the treatment of erroneous observations, lateron extended by F. R. Helmert, defined the state of the classical theory of errors. To the analyst C. F. Gauss’s preference to estimators of type BLUUE (Best Linear Uniformly Unbiased Estimator) for the moments of first order as well as of type BIQUUE (Best Invariant Quadratic Uniformly Unbiased Estimator) for the moments of second order is completely unknown. Extended by A. A. Markov who added correlated observations to the Gauss unbiased minimum variance
188
4 The second problem of probabilistic regression
estimator we present to you BLUUE of fixed effects and Ȉ y -BIQUUE of the variance component. “The second guideline of chapter four: definitions, lemmas, corollaries and theorems”
Theorem 4.5 equivalence of Ȉ y -BLUUE and G y -LESS Corollary 4.6 multinomial inverse
Definition 4.7 invariant quadratic estimation Vˆ 2 of V 2 : IQE
Lemma 4.8 invariant quadratic estimation Vˆ 2 of V 2 : IQE
Definition 4.9 variance-covariance components model Vˆ k IQE of V k
Lemma 4.10 invariant quadratic estimation Vˆ k of V k : IQE eigenspace
Definition 4.11 invariant quadratic unformly unbiased estimation: IQUUE
Lemma 4.12 invariant quadratic unformly unbiased estimation: IQUUE Lemma 4.13 var-cov components: IQUUE
Corollary 4.14 translational invariance
189
4 The second problem of probabilistic regression
Corollary 4.15 IQUUE of Helmert type: HIQUUE Corollary 4.16 Helmert equation det H z 0 Corollary 4.17 Helmert equation det H = 0
Definition 4.18 best IQUUE
Corollary 4.19 Gauss normal IQE
Lemma 4.20 Best IQUUE
Theorem 4.21 2 Vˆ BIQUUE of V
In the third chapter we have solved a special algebraic regression problem, namely the inversion of a system of inconsistent linear equations of full column rank classified as “overdetermined”. By means of the postulate of a least squares solution || i ||2 =|| y Ax ||2 = min we were able to determine m unknowns from n observations ( n > m : more equations n than unknowns m). Though “LESS” generated a unique solution to the “overdetermined” system of linear equations with full column rank, we are unable to classify “LESS”. There are two key questions we were not able to answer so far: In view of “MINOS” versus “LUMBE” we want to know whether “LESS” produces an unbiased estimation or not. How can we attach to an objective accuracy measure “LESS”?
190
4 The second problem of probabilistic regression
The key for evaluating “LESS” is handed over to us by treating the special algebraic regression problem by means of a special probabilistic regression problem, namely a special Gauss-Markov model without datum defect. We shall prove that uniformly unbiased estimations of the unknown parameters of type “fixed effects” exist. “LESS” is replaced by “BLUUE” (Best Linear Uniformly Unbiased Estimation). The fixed effects constitute the moments of first order of the underlying probability distributions of the observations to be specified. In contrast, its central moments of second order, known as the variance-covariance matrix or dispersion matrix, open the door to associate to the estimated fixed effects an objective accuracy measure. ? What is a probabilistic problem ? By means of certain statistical objective function, here of type “best linear uniformly unbiased estimation” (BLUUE) for moments of first order
“best quadratic invariant uniformly unbiased estimation” (BIQUUE) for the central moments of second order
we solve the inverse problem of linear, lateron nonlinear equations with fixed effects which relates stochastic observations to parameters. According to the Measurement Axiom, observations are elements of a probability space. In terms of second order statistics the observation space Y of integer dimension, dim Y = n , is characterized by the first moment E{y} , the expectation of y {Y, pdf }
and
the central second moment D{y} the dispersion matrix or variance-covariance matrix Ȉ y .
In the case of “fixed effects” we consider the parameter space Ȅ , dim Ȅ = m , to be metrical. Its metric is induced from the probabilistic measure of the metric, the variance-covariance matrix Ȉ y of the observations y {Y, pdf } . In particular, its variance-covariance matrix is pulled-back from the variancecovariance matrix Ȉ y . In the special probabilistic regression model with unknown “fixed effects” ȟ Ȅ (elements of the parameter space) are estimated while the random variables like y E{y} are predicted.
4-1 Introduction Our introduction has four targets. First, we want to introduce Pˆ , a linear estimation of the mean value of “direct” observations, and Vˆ 2 , a quadratic estimation of their variance component. For such a simple linear model we outline the postulates of uniform unbiasedness and of minimum variance. We shall pay special
4-1 Introduction
191
attention to the key role of the invariant quadratic estimation (“IQE”) Vˆ 2 of V 2 . Second, we intend to analyse two data sets, the second one containing an outlier, by comparing the arithmetic mean and the median as well as the “root mean square error” (r.m.s.) of type BIQUUE and the “median absolute deviation” (m.a.d.). By proper choice of the bias term we succeed to prove identity of the weighted arithmetic mean and the median for the data set corrupted by an obvious outlier. Third, we discuss the competitive estimator “MALE”, namely Maximum Likelihood Estimation which does not produce an unbiased estimation Vˆ 2 of V 2 , in general. Fourth, in order to develop the best quadratic uniformly unbiased estimation Vˆ 2 of V 2 , we have to highlight the need for fourth order statistic. “IQE” as well as “IQUUE” depend on the central moments of fourth order which are reduced to central moments of second order if we assume “quasi-normal distributed” observations. 4-11
The front page example
By means of Table 4.1 let us introduce a set of “direct” measurements yi , i {1, 2, 3, 4, 5} of length data. We shall outline how we can compute the arithmetic mean 13.0 as well as the standard deviation of 1.6. Table 4.1: “direct” observations, comparison of mean and median (mean y = 13, med y = 13, [n / 2] = 2, [n / 2]+1 = 3, med y = y (3) , mad y = med| y ( i ) med y | = 1, r.m.s. (I-BIQUUE) = 1.6) number of observation
observation
1 2 3 4 5
15 12 14 11 13
yi
difference of difference of observation observation and mean and median
+2 -1 +1 -2 0
+2 -1 +1 -2 0
ordered set ordered set of ordered set of of observa- | y ( i ) med y | tions y( i ) y( i ) mean y
11 12 13 14 15
0 1 1 2 2
+2 -1 +1 -1 0
In contrast, Table 4.2 presents an augmented observation vector: The observations six is an outlier. Again we have computed the new arithmetic mean 30.16 as well as the standard deviation 42.1. In addition, for both examples we have calculated the sample median and the sample absolute deviation for comparison. All definitions will be given in the context as well as a careful analysis of the two data sets. Table 4.2: “direct” observations, effect of one outlier (mean y = 30.16 , med y = (13+14) / 2 = 13.5, r.m.s. (I-BLUUE) = 42.1, med y ( i ) med y = mad y = 1.5)
192
4 The second problem of probabilistic regression
number of
observation
observation
yi
1 2 3 4 5 6 4-12
difference of difference of
ordered set ordered set of
observation
observation
of observa-
and mean
and median
tions
15.16 18.16 16.16 19.16 17.16 +85.83
+1.5 -1.5 +0.5 -2.5 -0.5 +102.5
15 12 14 11 13 116
ordered set
y( i ) med y
y( i )
11 12 13 14 15 116
of y( i ) mean y
0.5 0.5 1.5 1.5 2.5 +102.5
15.16 16.16 17.16 18.16 19.16 +85.83
Estimators of type BLUUE and BIQUUE of the front page example
In terms of a special Gauss-Markov model our data set can be described as following. The statistical moment of first order, namely the expectation E{y} = 1P of the observation vector y R n , here n = 5, and the central statistical moment of second order, namely the variance-covariance matrix Ȉ y , also called the dispersion matrix D{y} = I nV 2 , D{y} =: Ȉ y R n×n , rk Ȉ y = n, of the observation vector y R n , with the variance V 2 characterize the stochastic linear model. The mean P R of the “direct” observations and the variance factor V 2 are unknown. We shall estimate ( P , V 2 ) by means of three postulates:
•
first postulate: Pˆ : linear estimation, Vˆ 2 : quadratic estimation n
Pˆ = ¦ l p y p
or
Pˆ = l cy
or
Vˆ 2 = y cMy = (y c
y c)(vec M ) = (vecM )c(y
y )
p =1
Vˆ 2 =
n
¦m
p , q =1
•
pq
y p yq
the second postulate: uniform unbiasedness E{Pˆ } = P for all P R E{Vˆ 2 } = V 2 for all V 2 R +
•
the third postulate: minimum variance
D{Pˆ } = E{[ Pˆ E{Pˆ }]2 } = min A
and
D{Vˆ 2 } = E{[Vˆ 2 E{Vˆ 2 }]2 } = min
Pˆ = arg min D{Pˆ | Pˆ = A cy, E{Pˆ } = P} A
Vˆ 2 = arg min D{Vˆ 2 | Vˆ 2 = y cMy, E{Vˆ 2 } = V 2 } . M
M
4-1 Introduction
193
First, we begin with the postulate that the fixed unknown parameters ( P , V 2 ) are estimated by means of a certain linear form Pˆ = A cy + N = y cA + N and by means of a certain quadratic form Vˆ 2 = y cMy + xcy + Z = (vec M )c(y
y ) + + xcy + Z of the observation vector y, subject to the symmetry condition M SYM := {M R n× n | M = M c}, namely the space of symmetric matrices. Second we demand E{Pˆ } = P , E{Vˆ 2 } = V 2 , namely unbiasedness of the estimations ( Pˆ , Vˆ 2 ) . Since the estimators ( Pˆ , Vˆ 2 ) are special forms of the observation vector y R n , an intuitive understanding of the postulate of unbiasedness is the following: If the dimension of the observation space Y y , dim Y = n , is going to infinity, we expect information about the “two values” ( P , V 2 ) , namely lim Pˆ (n) = P , lim Vˆ 2 ( n) = V 2 .
nof
nof
Let us investigate how LUUE (Linear Uniformly Unbiased Estimation) of P as well as IQUUE (Invariant Quadratic Uniformly Unbiased Estimation) operate. LUUE E{Pˆ } = E{A cy + N } = A cE{y} + N º » E{y} = 1n P ¼ E{Pˆ } = A cE{y} + N = A c1n P + N E{Pˆ } = P N = 0, (A c1n 1) P = 0 N = 0, A c1n 1 = 0 for all P R. Indeed Pˆ is LUUE if and only if N = 0 and (A c1n 1) P = 0 for all P R. The zero identity (A c1n 1) P = 0 is fulfilled by means of A c1n 1 = 0, A c1n = 1, if we restrict the solution by the quantor “ for all P R ”. P = 0 is not an admissible solution. Such a situation is described as “uniformly unbiased”. We summarize that LUUE is constrained by the zero identity A c1n 1 = 0 . Next we shall prove that Vˆ 2 is IQUUE if and only if IQUUE E{Vˆ 2 } = E{y cMy + xcy + Z } = E{(vec M )c( y
y ) + xcy + Z} = (vec M )c E{y
y} + xcE{y} + Z
E{Vˆ 2 } = E{y cMy + xcy + Z } = E{(y c
y c)(vec M )c + y cx + Z} = E{y c
y c}(vec M )c + E{y c}x + Z .
Vˆ 2 is called translational invariant with respect to y 6 y E{y} if Vˆ 2 = y cMy + xcy + Z = (y E{y})cM (y E{y}) + xc( y E{y}) + Z and uniformly unbiased if
194
4 The second problem of probabilistic regression
E{Vˆ 2 } = (vec M )c E{y
y} + xcE{y} + Z = V 2 for all V 2 R + . Finally we have to discuss the postulate of a best estimator of type BLUUE of P and BIQUUE of V 2 . We proceed sequentially, first we determine Pˆ of type BLUUE and second Vˆ 2 of type BIQUUE. At the end we shall discuss simultaneous estimation of ( Pˆ , Vˆ 2 ) . The scalar Pˆ = A cy is BLUUE of P (Best Linear Uniformly Unbiased Estimation) with respect to the linear model E{y} = 1n P , D{y} = I nV 2 , if it is uniformly unbiased in the sense of E{Pˆ } = P for all P R and in comparison of all linear, uniformly unbiased estimations possesses the smallest variance in the sense of D{Pˆ } = E{[ Pˆ E{Pˆ }]2 } =
V 2 A cA = V 2 tr A cA = V 2 || A ||2 = min . The constrained Lagrangean L (A, O ) , namely
L (A, O ) := V 2 A cA + 2O (A c1n 1) = = V 2 A cA + 2(1n A 1)O = min, A ,O
produces by means of the first derivatives 1 wL ˆ ˆ (A, O ) =V 2 Aˆ +1n Oˆ = 0 2 wA 1 wL ˆ ˆ ˆ (A, O ) = A c1n 1= 0 2 wO the normal equations for the augmented unknown vector (A, O ) , also known as the necessary conditions for obtaining an optimum. Transpose the first normal equation, right multiply by 1n , the unit column and substitute the second normal equation in order to solve for the Lagrange multiplier Oˆ . If we substitute the solution Oˆ in the first normal equation, we directly find the linear operator Aˆ .
V 2 Aˆ c + 1cn Oˆ = 0c V 2 Aˆ c1 n + 1 cn 1 n Oˆ = V
2
+ 1 cn 1 n Oˆ = 0
V2 V2 Oˆ = = n 1cn1n 2
V = 0c V 2 Aˆ +1n Oˆ =V 2 lˆ 1n n
1 1 Aˆ = 1n and Pˆ = Aˆ cy = 1cn y . n n
4-1 Introduction
195
The second derivatives 1 w 2L ˆ ˆ ( A , O ) = V 2 I n > 0c 2 wAwA c constitute the sufficiency condition which is automatically satisfied. The theory of vector differentiation is presented in detail in Appendix B. Let us briefly summarize the first result Pˆ BLUUE of P . The scalar Pˆ = A cy is BLUUE of P with respect to the linear model E{y}= 1n P , D{y}= I nV 2 , if and only if 1 1 Aˆ c = 1cn and Pˆ = 1cn y n n is the arithmetic mean. The observation space y{Y, pdf } is decomposed into y (BLUUE):= 1n Pˆ 1 y (BLUUE) = 1n 1cn y n
versus versus
e y (BLUUE):= y y (BLUUE), 1 e y (BLUUE) =[I n (1n 1cn )]y, n
which are orthogonal in the sense of e y (BLUUE) y (BLUUE) = 0
or
1 1 [I n (1n1cn )] (1n1cn ) = 0. n n
Before we continue with the setup of the Lagrangean which guarantees BIQUUE, we study beforehand e y := y E{y} and e y (BLUUE):= y y (BLUUE) . Indeed the residual vector e y (BLUUE) is a linear form of residual vector e y . 1 e y (BLUUE) =[I n (1n1cn )] e y . n For the proof we depart from 1 e y (BLUUE):= y 1n Pˆ =[I n (1n1cn )]y n 1 =[I n (1n1cn )]( y E{y}) n 1 = I n (1n1cn ) , n where we have used the invariance property y 6 y E{y} based upon the idempotence of the matrix [I n (1n1cn ) / n] . Based upon the fundamental relation e y (BLUUE) = De y , where D:= I n (1n1cn ) / n is a projection operator onto the normal space R (1n ) A , we are able to derive an unbiased estimation of the variance component V 2 . Just compute
196
4 The second problem of probabilistic regression
E{ecy ( BLUUE )e y ( BLUUE )} = = tr E{e y (BLUUE)ecy (BLUUE)} = = tr D E{e y ecy }Dc = V 2 tr D Dc = V 2 tr D tr D = tr ( I n ) tr 1n (1n1cn ) = n 1 E{ecy (BLUUE)e y (BLUUE)} = V 2 ( n 1) . Let us define the quadratic estimator Vˆ 2 of V 2 by
Vˆ 2 =
1 ecy (BLUUE)e y (BLUUE) , n 1
which is unbiased according to E{Vˆ 2 } =
1 E{ecy (BLUUE)e y (BLUUE)} = V 2 . n 1
Let us briefly summarize the first result Vˆ 2 IQUUE of V 2 . The scalar Vˆ 2 = ecy (BLUUE)e y (BLUUE) /( n 1) is IQUUE of V 2 based upon the BLUUE-residual vector e y (BLUUE) = ª¬I n 1n (1n1cn ) º¼ y . Let us highlight Vˆ 2 BIQUUE of V 2 . A scalar Vˆ 2 is BIQUUE of V 2 (Best Invariant Quadratic Uniformly Unbiased Estimation) with respect to the linear model E{y} = 1n P , D{y} = I nV 2 , if it is (i)
uniformly unbiased in the sense of E{Vˆ 2 } = V 2 for all V 2 \ + ,
(ii) quadratic in the sense of Vˆ 2 = y cMy for all M = M c , (iii) translational invariant in the sense of y cMy = (y E{y})cM ( y E{y}) = ( y 1n P )cM ( y 1n P ) , (iv) best if it possesses the smallest variance in the sense of D{Vˆ 2 } = E{[Vˆ 2 E{Vˆ 2 }]2 } = min . M
First, let us consider the most influential postulate of translational invariance of the quadratic estimation
Vˆ 2 = y cMy = (vec M )c(y
y ) = (y c
y c)(vec M) to comply with Vˆ 2 = ecy Me y = (vec M )c(e y
e y ) = (ecy
ecy )(vec M )
4-1 Introduction
197 subject to M SYM := {M \ n× n| M = M c} .
Translational invariance is understood as the action of transformation group y = E{y} + e y = 1n P + e y with respect to the linear model of “direct” observations. Under the action of such a transformation group the quadratic estimation Vˆ 2 of V 2 is specialized to
Vˆ 2 = y cMy = ª¬ E{y} + e y º¼c M ª¬ E{y} + e y º¼ = (1cn P + ecy )M (1n P + e y ) Vˆ 2 = P 2 1cn M1n + P 1cn Me y + P ecy M1n + ecy Me y y cMy = ecy Me y 1cn M = 0c and 1cn M c = 0c . IQE, namely 1cn M = 0c and 1cn M c = 0c has a definite consequence. It is independent of P , the first moment of the probability distribution (“pdf”). Indeed, the estimation procedure of the central second moment V 2 is decoupled from the estimation of the first moment P . Here we find the key role of the invariance principle. Another aspect is the general solution of the homogeneous equation 1cn M = 0c subject to the symmetry postulate M = M c . ªM = ªI n 1cn (1cn1n ) 11cn º Z ¬ ¼ 1cM = 0c « «¬ M = (I n 1n 1n1cn )Z , where Z equation takes an Z \ n× n
is an arbitrary matrix. The general solution of the homogeneous matrix contains the left inverse (generalized inverse (1cn 1n ) 1 1cn = 1-n ) which exceptionally simple form, here. The general form of the matrix is in no agreement with the symmetry postulate M = M c . 1cn M = 0c M = D (I n 1n 1n1cn ). M = Mc
Indeed, we made the choice Z = D I n which reduces the unknown parameter space to one dimension. Now by means of the postulate “best” under the constraint generated by “uniform inbiasedness” Vˆ 2 of V 2 we shall determine the parameter D = 1/(n 1) . The postulate IQUUE is materialized by ª E{Vˆ 2 } = V 2 º E{ecy Me y } = mij E{eiy e jy } » « + 2 2 2 Vˆ 2 = ecy Me y ¼» ¬« = mij S ij = V mij G ij = V V \ E{Vˆ 2 | Ȉ y = I nV 2 } = V 2 tr M = 1 tr M 1 = 0 .
198
4 The second problem of probabilistic regression
For the simple case of “i.i.d.” observations, namely Ȉ y = I nV 2 , E{Vˆ 2 } = V 2 for an IQE, IQUUE is equivalent to tr M = 1 or (tr M ) 1 = 0 as a condition equation.
tr M = 1
D tr(I n 1n 1n1cn ) = D (n 1) = 1 1 D= . n 1
IQUUE of the simple case invariance : (i ) 1cM = 0c and M = M cº 1 M= (I n 1n 1n1cn ) » QUUE : (ii ) tr M 1 = 0 n 1 ¼ has already solved our problem of generating the symmetric matrix M .
Vˆ 2 = y cMy =
1 y c(I n 1n 1n1cn )y IQUUE n 1
? Is there still a need to apply “best” as an optimability condition for BIQUUE ? Yes, there is! The general solution of the homogeneous equations 1cn M = 0c and M c1n = 0 generated by the postulate of translational invariance of IQE did not produce a symmetric matrix. Here we present the simple symmetrization. An alternative approach worked depart from 1 2
(M + M c) = 12 {[I n 1n (1cn1n ) 11cn ]Z + Zc[I n 1n (1cn1n ) 11cn ]} ,
leaving the general matrix Z as an unknown to be determined. Let us therefore develop BIQUUE for the linear model E{y} = 1n P , D{y} = I nV 2 D{Vˆ 2 } = E{(Vˆ 2 E{Vˆ 2 }) 2 } = E{Vˆ 4 } E{Vˆ 2 }2 . Apply the summation convention over repeated indices i, j , k , A {1,..., n}. 1st : E{Vˆ 2 }2 E{Vˆ 2 }2 = mij E{eiy e jy }mk A E{eky eAy } = mij mklS ijS k A subject to
S ij := E{e e } = V G ij and S k A := E{eky eAy } = V 2G k A y y i j
2
E{Vˆ 2 }2 = V 4 mijG ij mk AG k A = V 4 (tr M ) 2 2nd : E{Vˆ 4 } E{Vˆ 4 } = mij mk A E{eiy e jy eky eAy } = mij mk AS ijk A
4-1 Introduction
199 subject to
S ijk A := E{eiy e jy eky eAy } i, j , k , A {1,..., n} . For a normal pdf, the fourth order moment S ijk A can be reduced to second order moments. For a more detailed presentation of “normal models” we refer to Appendix D.
S ijk A = S ijS k A + S ik S jA + S iAS jk = V 4 (G ijG k A + G ik G j A + G iAG jk ) E{Vˆ 4 } = V 4 mij mk A (G ijG k A + G ik G jA + G iAG jk ) E{Vˆ 4 } = V 4 [(tr M ) 2 + 2 tr M cM ]. Let us briefly summarize the representation of the variance D{Vˆ 2 } = E{(Vˆ 2 E{Vˆ 2 }) 2 } for normal models. Let the linear model of i.i.d. direct observations be defined by E{y | pdf } = 1n P , D{y | pdf } = I nV 2 . The variance of a normal IQE can be represented by D{Vˆ 2 } := E{(Vˆ 2 E{Vˆ 2 }) 2 } = = 2V 4 [(tr M ) 2 + tr(M 2 )]. In order to construct BIQUUE, we shall define a constrained Lagrangean which takes into account the conditions of translational invariance, uniform unbiasedness and symmetry.
L (M, O0 , O1 , O 2 ) := 2 tr M cM + 2O0 (tr M 1) + 2O11cn M1 n + 2O 2 1cn M c1 n = min . M , O0 , O1 , O2
Here we used the condition of translational invariance in the special form 1cn 12 (M + M c)1 n = 0 1cn M1 n = 0 and 1cn M c1 n = 0 , which accounts for the symmetry of the unknown matrix. We here conclude with the normal equations for BIQUUE generated from wL wL wL wL = 0, = 0, = 0, = 0. w (vec M ) wO0 wO1 wO2 ª 2(I n
I n ) vec I n I n
1 n 1 n
I n º ª vec M º ª0 º « (vec I )c 0 0 0 »» «« O0 »» ««1 »» n « = . « I n
1cn 0 0 0 » « O1 » «0 » « »« » « » 0 0 0 ¼ ¬ O2 ¼ ¬0¼ ¬ 1cn
I n
200
4 The second problem of probabilistic regression
These normal equations will be solved lateron. Indeed M = (I n 1n 1 n1cn ) / ( n 1) is a solution. 1 y c(I n 1n 1n1cn )y n 1 2 D{Vˆ 2 } = V4 n 1
Vˆ 2 = BIQUUE:
Such a result is based upon (tr M ) 2 (BIQUUE) =
1 1 , (tr M 2 )(BIQUUE) = , n 1 n 1
D{Vˆ 2 | BIQUUE} = D{Vˆ 2 } = 2V 4 [(tr M ) 2 + (tr M 2 )](BIQUUE), D{Vˆ 2 } =
2 V 4. n 1
Finally, we are going to outline the simultaneous estimation of {P , V 2 } for the linear model of direct observations.
•
first postulate: inhomogeneous, multilinear (bilinear) estimation
Pˆ = N 1 + A c1y + mc1 (y
y ) Vˆ 2 = N 2 + A c2 y + (vec M 2 )c( y
y ) mc1 º ª y º ª Pˆ º ªN 1 º ª A c1 «Vˆ 2 » = «N » + « A c (vec M )c» « y
y » ¬ ¼ ¬ 2¼ ¬ 2 ¼ 2 ¼¬ m1c º ªN A c1 ª Pˆ º x = XY x := « 2 » , X = « 1 » ¬Vˆ ¼ ¬N 2 A c2 (vec M 2 )c¼ ª 1 º Y := «« y »» «¬ y
y »¼
•
second postulate: uniform unbiasedness ª Pˆ º ª P º E {x} = E{« 2 »} = « 2 » ¬Vˆ ¼ ¬V ¼
•
third postulate: minimum variance D{x} := tr E{ª¬ x E {x}º¼ ª¬ x E {x}º¼ c } = min .
4-1 Introduction 4-13
201
BLUUE and BIQUUE of the front page example, sample median, median absolute deviation
According to Table 4.1 and Table 4.2 we presented you with two sets of observations yi Y, dim Y = n, i {1,..., n} , the second one qualifies to certain “one outlier”. We aim at a definition of the median and of the median absolute deviation which is compared to the definition of the mean (weighted mean) and of the root mean square error. First we order the observations according to y(1) < y( 2) < ... < y( n1) < y( n ) by means of the permutation matrix ª y(1) º ª y1 º « y » « y2 » « (2) » « » « ... » = P « ... » , « y( n 1) » « yn 1 » « » «¬ yn »¼ ¬« y( n ) ¼» namely data set one ª11º ª 0 «12 » « 0 « » « «13» = « 0 « » « «14 » « 0 «¬15»¼ «¬1 ª0 «0 « P5 = « 0 « «0 «¬1
0 0 1 0º ª15º 1 0 0 0 »» ««12»» 0 0 0 1 » «14» »« » 0 1 0 0 » «11» 0 0 0 0»¼ «¬13»¼ 0 0 1 0º 1 0 0 0»» 0 0 0 1» » 0 1 0 0» 0 0 0 0»¼
versus
versus
data set two 0 0 1 0 0º ª 15 º 1 0 0 0 0»» «« 12 »» 0 0 0 1 0» « 14 » »« » 0 1 0 0 0» « 11 » 0 0 0 0 0» « 13 » »« » 0 0 0 0 1 »¼ «¬116»¼ ª0 0 0 1 0 0º «0 1 0 0 0 0» « » «0 0 0 0 1 0» P6 = « ». «0 0 1 0 0 0» «1 0 0 0 0 0 » « » «¬0 0 0 0 0 1 »¼
ª 11 º ª 0 « 12 » « 0 « » « « 13 » « 0 « »=« « 14 » « 0 « 15 » «1 « » « «¬116 »¼ «¬ 0
Note PP c = I , P 1 = P c . Second, we define the sample median med y as well as the median absolute deviation mad y of y Y by means of y([ n / 2]+1) if n is an odd number ª med y := « 1 ¬ 2 ( y( n / 2) + y( n / 2+1) ) if n is an even number mad y := med | y( i ) med y | , where [n/2] denotes the largest integer (“natural number”) d n / 2 .
202
4 The second problem of probabilistic regression
Table 4.3: "direct" observations, comparison of the two data sets by means of med y, mad y (I-LESS, G_y-LESS), r.m.s. (I-BIQUUE)

                              data set one              data set two
                              n = 5 ("odd")             n = 6 ("even")
                              n/2 = 2.5, [n/2] = 2,     n/2 = 3, n/2 + 1 = 4
                              [n/2] + 1 = 3
med y                         med y = y_(3) = 13        med y = 13.5
mad y                         1                         1.5
mean y (I-LESS)               13                        30.16
weighted mean y (G_y-LESS)    —                         13.5, G_y = Diag(1,1,1,1,1, 24/1000)
μ̂ (I-BLUUE)                   13                        30.16
σ̂² (I-BIQUUE)                 2.5                       1770.1
r.m.s. (I-BIQUUE) = σ̂         1.6                       42.1
Third, we compute I-LESS, namely mean y = (1'1)⁻¹1'y = (1/n)1'y, listed in Table 4.3. Obviously, for the second observational data set the Euclidean metric of the observation space Y is not isotropic. Indeed, let us compute G_y-LESS, namely the weighted mean y = (1'G_y1)⁻¹1'G_yy. A particular choice of the matrix G_y ∈ R^{6×6} of the metric, also called "weight matrix", is G_y = Diag(1,1,1,1,1,x) such that
weighted mean y = (y₁ + y₂ + y₃ + y₄ + y₅ + y₆ x)/(5 + x),
where x is the unknown weight of the extreme value ("outlier") y₆. A special robust design of the weighted mean y is the median, namely weighted mean y = med y, such that
x = (y₁ + y₂ + y₃ + y₄ + y₅ − 5 med y)/(med y − y₆),
here x = 0.024390243… ≈ 24/1000.
Indeed, the weighted mean with respect to the weight matrix G_y = Diag(1,1,1,1,1, 24/1000) reproduces the median of the second data set. The extreme value has been down-weighted by a weight of approximately 24/1000. Fourth, with respect to the simple linear model E{y} = 1μ, D{y} = Iσ² we compute I-BLUUE of μ and I-BIQUUE of σ², namely
μ̂ = (1'1)⁻¹1'y = (1/n)1'y,
σ̂² = (1/(n−1)) y'[I − 1(1'1)⁻¹1']y = (1/(n−1)) y'[I − (1/n)11']y = (1/(n−1)) (y − 1μ̂)'(y − 1μ̂).
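The numbers of Table 4.3 can be reproduced with a few lines of Python (NumPy assumed; the helper name summary is illustrative). The weight x of the outlier is the one derived above.

    import numpy as np

    def summary(y, x_weight=None):
        y = np.asarray(y, dtype=float)
        n = y.size
        mean = y.mean()                                               # I-LESS = I-BLUUE of mu
        sigma2 = y @ (np.eye(n) - np.ones((n, n)) / n) @ y / (n - 1)  # I-BIQUUE of sigma^2
        print("mean =", mean, " sigma^2 =", sigma2, " sigma =", np.sqrt(sigma2))
        if x_weight is not None:                                      # G_y-LESS, G_y = Diag(1,...,1,x)
            g = np.ones(n); g[-1] = x_weight
            print("weighted mean =", (g @ y) / g.sum())

    y1 = [15, 12, 14, 11, 13]
    y2 = [15, 12, 14, 11, 13, 116]

    summary(y1)                                     # 13, 2.5, 1.58
    med = 13.5                                      # median of data set two
    x = (sum(y2[:5]) - 5 * med) / (med - y2[5])     # = 0.0244 ~ 24/1000
    print("x =", x)
    summary(y2, x_weight=x)                         # 30.17, 1770.2, 42.07; weighted mean = 13.5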
Obviously, the extreme value y₆ in the second data set has spoiled the specification of the simple linear model: the r.m.s. (I-BIQUUE) = 1.6 of the first data set is increased to the r.m.s. (I-BIQUUE) = 42.1 of the second data set. Fifth, we set up the alternative linear model for the second data set, namely
E{[ y₁, y₂, y₃, y₄, y₅, y₆ ]'} = [ μ₁, μ₁, μ₁, μ₁, μ₁, μ₂ ]' = [ μ, μ, μ, μ, μ, μ + ν ]' = 1μ + aν,
E{y} = Aξ with A := [1, a] ∈ R^{6×2}, 1 := [1,1,1,1,1,1]' ∈ R^{6×1}, a := [0,0,0,0,0,1]' ∈ R^{6×1}, ξ := [μ, ν]' ∈ R^{2×1},
D{y} = I₆σ², σ² ∈ R⁺,
adding to the observation y₆ the bias term ν. Still we assume the variance-covariance matrix D{y} of the observation vector y ∈ R^{6×1} to be isotropic, with one variance component as the only unknown. (μ̂, ν̂) is I₆-BLUUE if
[ μ̂, ν̂ ]' = (A'A)⁻¹A'y,   here   [ μ̂, ν̂ ]' = [ 13, 103 ]',
that is μ̂ = 13, ν̂ = 103, μ̂₁ = μ̂ = 13, μ̂₂ = μ̂ + ν̂ = 116,

D{[ μ̂, ν̂ ]'} = (A'A)⁻¹σ² = (σ²/5) [  1  −1 ]
                                   [ −1   6 ]

σ²_μ̂ = σ²/5,   σ²_ν̂ = (6/5)σ²,   σ_μ̂ν̂ = −(1/5)σ².
σ̂² is I₆-BIQUUE if
σ̂² = (1/(n − rk A)) y'[I₆ − A(A'A)⁻¹A']y,

                          [  4 −1 −1 −1 −1  0 ]
                          [ −1  4 −1 −1 −1  0 ]
I₆ − A(A'A)⁻¹A' = (1/5)·  [ −1 −1  4 −1 −1  0 ]
                          [ −1 −1 −1  4 −1  0 ]
                          [ −1 −1 −1 −1  4  0 ]
                          [  0  0  0  0  0  5 ]

r_i := [I₆ − A(A'A)⁻¹A']_ii = (4/5, 4/5, 4/5, 4/5, 4/5, 1),   i ∈ {1, ..., 6},
are the redundancy numbers.
y'(I₆ − A(A'A)⁻¹A')y = 13466,
σ̂² = 13466/4 = 3366.5,   σ̂ = 58.02,
σ²_μ̂(σ̂²) = 3366.5/5 = 673.3,   σ_μ̂(σ̂) = 26,
σ²_ν̂(σ̂²) = (6/5)·3366.5 = 4039.8,   σ_ν̂(σ̂) = 63.6.
Indeed the r.m.s. values of the partial mean μ̂ as well as of the estimated bias ν̂ have changed the results remarkably, namely from r.m.s. (simple linear model) 42.1 to r.m.s. (linear model) 26. An r.m.s. value of the bias ν̂ in the order of 63.6 has been documented. Finally, let us compute the empirical "error vector" ê_y and its variance-covariance matrix by means of
ê_y = [I₆ − A(A'A)⁻¹A']y,   D{ê_y} = [I₆ − A(A'A)⁻¹A']σ²,
ê_y = [ 2, −1, 1, −2, 0, 116 ]',

D{ê_y} = (σ²/5)·B   and   D{ê_y | σ̂²} = 673.3·B,

where

         [  4 −1 −1 −1 −1  0 ]
         [ −1  4 −1 −1 −1  0 ]
B :=     [ −1 −1  4 −1 −1  0 ]  = 5 [I₆ − A(A'A)⁻¹A'].
         [ −1 −1 −1  4 −1  0 ]
         [ −1 −1 −1 −1  4  0 ]
         [  0  0  0  0  0  5 ]
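The I₆-BLUUE of (μ, ν) in this two-parameter model is easy to verify; the Python sketch below (NumPy assumed) reproduces the estimate and its cofactor matrix only, leaving the dispersion numbers to the text above.

    import numpy as np

    y = np.array([15.0, 12.0, 14.0, 11.0, 13.0, 116.0])
    ones = np.ones((6, 1))
    a = np.zeros((6, 1)); a[5, 0] = 1.0
    A = np.hstack([ones, a])                     # first order design matrix [1, a]

    xi_hat = np.linalg.solve(A.T @ A, A.T @ y)   # I6-BLUUE of (mu, nu)
    print(xi_hat)                                # [ 13. 103.]

    Q = np.linalg.inv(A.T @ A)                   # cofactor matrix of (mu-hat, nu-hat)
    print(Q * 5)                                 # [[ 1. -1.] [-1.  6.]], i.e. Q = (1/5)[[1,-1],[-1,6]]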
4-14 Alternative estimation: Maximum Likelihood (MALE)
Maximum Likelihood Estimation ("MALE") is a competitor to BLUUE of the first moments E{y} and to BIQUUE of the second central moments D{y} of a random variable y ∈ {Y, pdf}, which we like to present at least by an example.

Maximum Likelihood Estimation:
linear model: E{y} = 1_nμ, D{y} = I_nσ²,
"independent, identically normally distributed observations" [y₁, ..., y_n]', "direct observations",
unknown parameters: {μ, σ²} ∈ R × R⁺ =: X,
"simultaneous estimation of {μ, σ²}".

Given is the above linear model of independent, identically normally distributed observations [y₁, ..., y_n]' = y ∈ {R^n, pdf}. The first moment μ as well as the central second moment σ² constitute the unknown parameters (μ, σ²) ∈ R × R⁺, where R × R⁺ is the admissible parameter space. The estimation of the unknown parameters (μ, σ²) is based on the following optimization problem: maximize the log-likelihood function
ln f(y₁, ..., y_n | μ, σ²) = Σ_{i=1}^n ln f(y_i | μ, σ²)
= ln{ (2π)^{−n/2} σ^{−n} exp( −(1/(2σ²)) Σ_{i=1}^n (y_i − μ)² ) }
= −(n/2) ln 2π − (n/2) ln σ² − (1/(2σ²)) Σ_{i=1}^n (y_i − μ)² = max over (μ, σ²)
of the independent, identically normally distributed random variables {y₁, …, y_n}. The log-likelihood function is simple if we introduce the first sample moment m₁ and the second sample moment m₂, namely
m₁ := (1/n) Σ_{i=1}^n y_i = (1/n) 1'y,   m₂ := (1/n) Σ_{i=1}^n y_i² = (1/n) y'y,
ln f(y₁, …, y_n | μ, σ²) = −(n/2) ln 2π − (n/2) ln σ² − (n/(2σ²))(m₂ − 2m₁μ + μ²).
Now we are able to define the optimization problem
ℓ(μ, σ²) := ln f(y₁, …, y_n | μ, σ²) = max over (μ, σ²)
more precisely.

Definition (Maximum Likelihood Estimation, linear model E{y} = 1_nμ, D{y} = I_nσ², independent, identically normally distributed observations {y₁, …, y_n}):
A 2×1 vector [μ_ℓ, σ²_ℓ]' is called MALE of [μ, σ²]' (Maximum Likelihood Estimation) with respect to the linear model above if its log-likelihood function
ℓ(μ, σ²) := ln f(y₁, …, y_n | μ, σ²) = −(n/2) ln 2π − (n/2) ln σ² − (n/(2σ²))(m₂ − 2m₁μ + μ²)
is maximal.

The simultaneous estimation of (μ, σ²) of type MALE can be characterized as follows.

Corollary (MALE with respect to the linear model E{y} = 1_nμ, D{y} = I_nσ², independent, identically normally distributed observations {y₁, …, y_n}):
The log-likelihood function ℓ(μ, σ²) with respect to the linear model E{y} = 1_nμ, D{y} = I_nσ², (μ, σ²) ∈ R × R⁺, of independent, identically normally distributed observations {y₁, …, y_n} is maximal if
μ_ℓ = m₁ = (1/n) 1'y,   σ²_ℓ = m₂ − m₁² = (1/n)(y − 1μ_ℓ)'(y − 1μ_ℓ)
is the simultaneous estimation of the mean value (first moment) μ_ℓ and of the variance (central second moment) σ²_ℓ.

:Proof:
The Lagrange function
L(μ, σ²) := −(n/2) ln σ² − (n/(2σ²))(m₂ − 2m₁μ + μ²) = max over (μ, σ²)
leads to the necessary conditions
∂L/∂μ (μ, σ²) = (n m₁)/σ² − (n μ)/σ² = 0,
∂L/∂σ² (μ, σ²) = −n/(2σ²) + (n/(2σ⁴))(m₂ − 2μm₁ + μ²) = 0,
also called the likelihood normal equations. Their solution is
[ μ_ℓ, σ²_ℓ ]' = [ m₁, m₂ − m₁² ]' = (1/n) [ 1'y, y'y − (1/n)(1'y)² ]'.
The matrix of second derivatives constitutes, as a negative definite matrix, the sufficiency conditions:
−∂²L/(∂(μ, σ²)∂(μ, σ²)') (μ_ℓ, σ²_ℓ) = n [ 1/σ²_ℓ   0         ]
                                          [ 0         1/(2σ⁴_ℓ) ]  > 0. ∎
Finally we can immediately check that ℓ(μ, σ²) → −∞ as (μ, σ²) approaches the boundary of the parameter space. If the log-likelihood function is sufficiently regular, we can expand it as
ℓ(μ, σ²) = ℓ(μ_ℓ, σ²_ℓ) + Dℓ(μ_ℓ, σ²_ℓ)[μ − μ_ℓ, σ² − σ²_ℓ]' + (1/2)[μ − μ_ℓ, σ² − σ²_ℓ] D²ℓ(μ_ℓ, σ²_ℓ) [μ − μ_ℓ, σ² − σ²_ℓ]' + O₃.
Due to the likelihood normal equations, Dℓ(μ_ℓ, σ²_ℓ) vanishes. Therefore the behavior of ℓ(μ, σ²) near (μ_ℓ, σ²_ℓ) is largely determined by D²ℓ(μ_ℓ, σ²_ℓ), which is a measure of the local curvature of the log-likelihood function ℓ(μ, σ²). The negative Hesse matrix of second derivatives
I(μ_ℓ, σ²_ℓ) = −∂²ℓ/(∂(μ, σ²)∂(μ, σ²)') (μ_ℓ, σ²_ℓ) > 0
is called the observed Fisher information. It can be regarded as an index of the steepness of the log-likelihood function moving away from (μ_ℓ, σ²_ℓ), and as an indicator of the strength of preference for the MLE point with respect to the other points of the parameter space.
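A small Python sketch of the MALE solution μ_ℓ = m₁, σ²_ℓ = m₂ − m₁² (NumPy assumed; the helper name male is illustrative). For the second data set the values agree with Table 4.4 up to rounding.

    import numpy as np

    def male(y):
        """Maximum likelihood estimates for the model E{y} = 1*mu, D{y} = I*sigma^2."""
        y = np.asarray(y, dtype=float)
        m1 = y.mean()                      # first sample moment
        m2 = np.mean(y**2)                 # second sample moment
        mu_l = m1
        sigma2_l = m2 - m1**2              # = (1/n) * sum (y_i - mu_l)^2
        return mu_l, sigma2_l

    print(male([15, 12, 14, 11, 13]))          # (13.0, 2.0)
    print(male([15, 12, 14, 11, 13, 116]))     # approx (30.17, 1475), cf. Table 4.4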
Finally, compare by means of Table 4.4 (μ_ℓ, σ²_ℓ) MALE of (μ, σ²) for the front page example of Table 4.1 and Table 4.2.

Table 4.4: (μ_ℓ, σ²_ℓ) MALE of (μ, σ²) ∈ {R, R⁺}: the front page examples

                           μ_ℓ       σ²_ℓ        |σ_ℓ|
1st example (n = 5)        13        2           √2
2nd example (n = 6)        30.16     1474.65     36.40
4-2 Setup of the best linear uniformly unbiased estimator of type BLUUE for the moments of first order

Let us introduce the special Gauss-Markov model y = Aξ + e specified in Box 4.1, which is given for the first order moments in the form of an inconsistent system of linear equations relating the first non-stochastic ("fixed"), real-valued vector ξ of unknowns to the expectation E{y} of the stochastic, real-valued vector y of observations, Aξ = E{y}, since E{y} ∈ R(A) is an element of the column space R(A) of the real-valued, non-stochastic ("fixed") "first order design matrix" A ∈ R^{n×m}. The rank of the fixed matrix A, rk A, equals the number m of unknowns, ξ ∈ R^m. In addition, the second order central moments, the regular variance-covariance matrix Σ_y, also called dispersion matrix D{y}, constitute the second matrix Σ_y ∈ R^{n×n} of unknowns, to be specified as a linear model further on.

Box 4.1: Special Gauss-Markov model y = Aξ + e
1st moments:
Aξ = E{y}, A ∈ R^{n×m}, E{y} ∈ R(A), rk A = m    (4.1)
2nd moments:
Σ_y = D{y} ∈ R^{n×n}, Σ_y positive definite, rk Σ_y = n    (4.2)
ξ, E{y}, y − E{y} = e unknown; Σ_y unknown.

4-21 The best linear uniformly unbiased estimation ξ̂ of ξ: Σ_y-BLUUE
Since we are dealing with a linear model, it is "a natural choice" to set up a linear form to estimate the parameters ξ of fixed effects, namely
ξ̂ = Ly + κ,    (4.3)
where L ∈ R^{m×n} and κ ∈ R^m are fixed unknowns. In order to determine the real-valued m×n matrix L and the real-valued m×1 vector κ, independent of the variance-covariance matrix Σ_y, the inhomogeneous linear estimation ξ̂ of the vector ξ of fixed effects has to fulfil certain optimality conditions:
(1st) ξ̂ is an inhomogeneous linear unbiased estimation of ξ,
E{ξ̂} = E{Ly + κ} = ξ for all ξ ∈ R^m,    (4.4)
and
(2nd) in comparison to all other linear uniformly unbiased estimations ξ̂ has minimum variance,
tr D{ξ̂} := E{(ξ̂ − ξ)'(ξ̂ − ξ)} = tr L Σ_y L' = ||L'||²_{Σ_y} = min over L.    (4.5)
First, the condition of a linear uniformly unbiased estimation E{ξ̂} = E{Ly + κ} = ξ for all ξ ∈ R^m with respect to the special Gauss-Markov model (4.1), (4.2) has to be considered in more detail. As soon as we substitute the linear model (4.1) into the postulate of uniform unbiasedness (4.4) we are led to
E{ξ̂} = E{Ly + κ} = L E{y} + κ = ξ for all ξ ∈ R^m    (4.6)
and
LAξ + κ = ξ for all ξ ∈ R^m.    (4.7)
Beside κ = 0, the postulate of a linear uniformly unbiased estimation with respect to the special Gauss-Markov model (4.1), (4.2) leaves us with one condition, namely
(LA − I_m)ξ = 0 for all ξ ∈ R^m    (4.8)
or
LA − I_m = 0.    (4.9)
Note that there are locally unbiased estimations such that (LA − I_m)ξ₀ = 0 for LA − I_m ≠ 0. Alternatively, B. Schaffrin (2000) has softened the constraint of unbiasedness (4.9) by replacing it by the stochastic matrix constraint A'L' = I_m + E₀ subject to E{vec E₀} = 0, D{vec E₀} = I_m ⊗ Σ₀, Σ₀ a positive definite matrix. For Σ₀ → 0, uniform unbiasedness is restored. Estimators which fulfill the stochastic matrix constraint A'L' = I_m + E₀ for finite Σ₀ are called "softly unbiased" or "unbiased in the mean". Second, the choice of norm for "best" of type minimum variance has to be discussed more specifically. Under the condition of a linear uniformly unbiased estimation let us derive the specific representation of the weighted Frobenius matrix norm of L'. Indeed, let us define the dispersion matrix
D{ξ̂} := E{(ξ̂ − E{ξ̂})(ξ̂ − E{ξ̂})'} = E{(ξ̂ − ξ)(ξ̂ − ξ)'},    (4.10)
which by means of the inhomogeneous linear form ξ̂ = Ly + κ is specified to
D{ξ̂} = L D{y} L',    (4.11)
and
Definition 4.1 (ξ̂ Σ_y-BLUUE of ξ):
An m×1 vector ξ̂ = Ly + κ is called Σ_y-BLUUE of ξ (Best Linear Uniformly Unbiased Estimation) with respect to the Σ_y-norm in (4.1) if (1st) ξ̂ is uniformly unbiased in the sense of (4.4) and (2nd) ξ̂ is of minimum variance in the sense of
tr D{ξ̂} := tr L D{y} L' = ||L'||²_{Σ_y} = min.    (4.12)
Now we are prepared for
Lemma 4.2 (ξ̂ Σ_y-BLUUE of ξ):
An m×1 vector ξ̂ = Ly + κ is Σ_y-BLUUE of ξ in (4.1) if and only if
κ = 0    (4.13)
holds and the matrix L fulfils the system of normal equations
[ Σ_y  A ] [ L' ]   [ 0   ]
[ A'   0 ] [ Λ  ] = [ I_m ]    (4.14)
or
Σ_y L' + AΛ = 0    (4.15)
and
A'L' = I_m    (4.16)
with the m×m matrix Λ of "Lagrange multipliers".
:Proof:
Due to the postulate of an inhomogeneous linear uniformly unbiased estimation with respect to the parameters ξ ∈ R^m of the special Gauss-Markov model we were led to κ = 0 and one conditional constraint, which makes it plausible to minimize the constrained Lagrangean
L(L, Λ) := tr L Σ_y L' + 2 tr Λ(A'L' − I_m) = min over {L, Λ}    (4.17)
for Σ_y-BLUUE. The necessary conditions for the minimum of the quadratic constrained Lagrangean L(L, Λ) are
∂L/∂L (L̂, Λ̂) := 2(Σ_y L̂' + AΛ̂)' = 0,    (4.18)
∂L/∂Λ (L̂, Λ̂) := 2(A'L̂' − I_m) = 0,    (4.19)
which agree with the normal equations (4.14). The theory of matrix derivatives is reviewed in Appendix B, namely (d3) and (d4). The second derivatives
∂²L/(∂(vec L)∂(vec L)') (L̂, Λ̂) = 2(Σ_y ⊗ I_m) > 0    (4.20)
constitute the sufficiency conditions, due to the positive-definiteness of the matrix Σ_y, for L(L, Λ) = min. (The Kronecker-Zehfuss product A ⊗ B of two arbitrary matrices A and B is explained in Appendix A.) ∎
Obviously, a homogeneous linear form ξ̂ = Ly is sufficient to generate Σ_y-BLUUE for the special Gauss-Markov model (4.1), (4.2). Explicit representations of Σ_y-BLUUE of type ξ̂ as well as of its dispersion matrix D{ξ̂ | ξ̂ Σ_y-BLUUE}, generated by solving the normal equations (4.14), are collected in
Theorem 4.3 (ξ̂ Σ_y-BLUUE of ξ):
Let ξ̂ = Ly be Σ_y-BLUUE of ξ in the special linear Gauss-Markov model (4.1), (4.2). Then
ξ̂ = (A'Σ_y⁻¹A)⁻¹A'Σ_y⁻¹y    (4.21)
or
ξ̂ = Σ_ξ̂ A'Σ_y⁻¹y    (4.22)
are equivalent representations of the solution of the normal equations (4.14), together with the related dispersion matrix
D{ξ̂} := Σ_ξ̂ = (A'Σ_y⁻¹A)⁻¹.
:Proof:
We shall present two proofs of the above theorem: the first one is based upon Gauss elimination in solving the normal equations (4.14), the second one uses the power of the IPM method (Inverse Partitioned Matrix, C. R. Rao's Pandora Box).
(i) forward step (Gauss elimination):
Multiply the first normal equation by Σ_y⁻¹, multiply the reduced equation by A' and subtract the result from the second normal equation. Solve for Λ:
Σ_y L̂' + AΛ̂ = 0   (first equation: multiply by −A'Σ_y⁻¹),
A'L̂' = I_m          (second equation);
−A'L̂' − A'Σ_y⁻¹AΛ̂ = 0,   A'L̂' = I_m
⇒ −A'Σ_y⁻¹AΛ̂ = I_m   ⇒   Λ̂ = −(A'Σ_y⁻¹A)⁻¹.    (4.23)
(ii) backward step (Gauss elimination):
Substitute Λ̂ in the modified first normal equation and solve for L̂:
L̂' + Σ_y⁻¹AΛ̂ = 0  ⇒  L̂ = −Λ̂'A'Σ_y⁻¹,   Λ̂ = −(A'Σ_y⁻¹A)⁻¹
⇒ L̂ = (A'Σ_y⁻¹A)⁻¹A'Σ_y⁻¹.    (4.24)
(iii) IPM (Inverse Partitioned Matrix):
Let us partition the symmetric matrix of the normal equations (4.14),
[ Σ_y  A ]   [ A₁₁   A₁₂ ]
[ A'   0 ] = [ A₁₂'  0   ].
According to Appendix A (Fact on Inverse Partitioned Matrix: IPM) its Cayley inverse is partitioned as well:
[ Σ_y  A ]⁻¹   [ A₁₁   A₁₂ ]⁻¹   [ B₁₁   B₁₂ ]
[ A'   0 ]   = [ A₁₂'  0   ]   = [ B₁₂'  B₂₂ ],
B₁₁ = Σ_y⁻¹[I_n − A(A'Σ_y⁻¹A)⁻¹A'Σ_y⁻¹],
B₁₂' = (A'Σ_y⁻¹A)⁻¹A'Σ_y⁻¹,
B₂₂ = −(A'Σ_y⁻¹A)⁻¹.
The normal equations are now solved by
[ L̂' ]   [ A₁₁   A₁₂ ]⁻¹ [ 0   ]   [ B₁₁   B₁₂ ] [ 0   ]
[ Λ̂  ] = [ A₁₂'  0   ]   [ I_m ] = [ B₁₂'  B₂₂ ] [ I_m ],
L̂ = B₁₂' = (A'Σ_y⁻¹A)⁻¹A'Σ_y⁻¹,
Λ̂ = B₂₂ = −(A'Σ_y⁻¹A)⁻¹.    (4.25)
(iv) dispersion matrix:
The related dispersion matrix is computed by means of the "Error Propagation Law",
D{ξ̂} = D{L̂y} = L̂ D{y} L̂'   with   L̂ = (A'Σ_y⁻¹A)⁻¹A'Σ_y⁻¹,
D{ξ̂} = (A'Σ_y⁻¹A)⁻¹A'Σ_y⁻¹ Σ_y Σ_y⁻¹A(A'Σ_y⁻¹A)⁻¹,
D{ξ̂} = (A'Σ_y⁻¹A)⁻¹.    (4.26)
Here is my proof's end. ∎
By means of Theorem 4.3 we succeeded to produce ξ̂, Σ_y-BLUUE of ξ. In consequence, we have to estimate Ê{y} as Σ_y-BLUUE of E{y} as well as the "error vector"
e_y := y − E{y},    (4.27)
ê_y := y − Ê{y} = y − Aξ̂ = (I_n − AL̂)y    (4.28)
out of
Lemma 4.4 (Ê{y} Σ_y-BLUUE of E{y}, ê_y, D{ê_y}, D{y}):
(i) Let Ê{y} be Σ_y-BLUUE of E{y} = Aξ with respect to the special Gauss-Markov model (4.1), (4.2). Then
Ê{y} = Aξ̂ = A(A'Σ_y⁻¹A)⁻¹A'Σ_y⁻¹y    (4.29)
leads to the singular variance-covariance matrix (dispersion matrix)
D{Aξ̂} = A(A'Σ_y⁻¹A)⁻¹A'.    (4.30)
(ii) If the error vector e_y is empirically determined, we receive for the residual vector
ê_y = [I_n − A(A'Σ_y⁻¹A)⁻¹A'Σ_y⁻¹]y    (4.31)
and its singular variance-covariance matrix (dispersion matrix)
D{ê_y} = Σ_y − A(A'Σ_y⁻¹A)⁻¹A',   rk D{ê_y} = n − m.    (4.32)
(iii) The dispersion matrices of the special Gauss-Markov model (4.1), (4.2) are related by
D{y} = D{Aξ̂ + ê_y} = D{Aξ̂} + D{ê_y} = D{e_y − ê_y} + D{ê_y},    (4.33)
C{ê_y, Aξ̂} = 0,   C{ê_y, e_y − ê_y} = 0;    (4.34)
ê_y and Aξ̂ are uncorrelated.
:Proof:
(i) Ê{y} = Aξ̂ = A(A'Σ_y⁻¹A)⁻¹A'Σ_y⁻¹y:
As soon as we implement ξ̂ Σ_y-BLUUE of ξ, namely (4.21), into Aξ̂ we are directly led to the desired result.
(ii) D{Aξ̂} = A(A'Σ_y⁻¹A)⁻¹A':
ξ̂ Σ_y-BLUUE of ξ, namely (4.21), implemented in
D{Aξ̂} := E{A(ξ̂ − E{ξ̂})(ξ̂ − E{ξ̂})'A'} = A E{(ξ̂ − E{ξ̂})(ξ̂ − E{ξ̂})'}A'
gives
D{Aξ̂} = A(A'Σ_y⁻¹A)⁻¹A'Σ_y⁻¹ E{(y − E{y})(y − E{y})'} Σ_y⁻¹A(A'Σ_y⁻¹A)⁻¹A'
       = A(A'Σ_y⁻¹A)⁻¹A'Σ_y⁻¹A(A'Σ_y⁻¹A)⁻¹A'
       = A(A'Σ_y⁻¹A)⁻¹A',
which leads to the proclaimed result.
(iii) ê_y = [I_n − A(A'Σ_y⁻¹A)⁻¹A'Σ_y⁻¹]y:
Similarly, if we substitute Σ_y-BLUUE of ξ, namely (4.21), in
ê_y = y − Ê{y} = y − Aξ̂ = [I_n − A(A'Σ_y⁻¹A)⁻¹A'Σ_y⁻¹]y,
we gain what we wanted.
(iv) D{ê_y} = Σ_y − A(A'Σ_y⁻¹A)⁻¹A':
D{ê_y} := E{(ê_y − E{ê_y})(ê_y − E{ê_y})'}. As soon as we substitute E{ê_y} = [I_n − A(A'Σ_y⁻¹A)⁻¹A'Σ_y⁻¹]E{y} in the definition of the dispersion matrix D{ê_y}, we are led to
D{ê_y} = [I_n − A(A'Σ_y⁻¹A)⁻¹A'Σ_y⁻¹] Σ_y [I_n − Σ_y⁻¹A(A'Σ_y⁻¹A)⁻¹A']
       = [Σ_y − A(A'Σ_y⁻¹A)⁻¹A'][I_n − Σ_y⁻¹A(A'Σ_y⁻¹A)⁻¹A']
       = Σ_y − A(A'Σ_y⁻¹A)⁻¹A' − A(A'Σ_y⁻¹A)⁻¹A' + A(A'Σ_y⁻¹A)⁻¹A'
       = Σ_y − A(A'Σ_y⁻¹A)⁻¹A'.
rk D{ê_y} = rk D{y} − rk A(A'Σ_y⁻¹A)⁻¹A' = n − m.
(v) D{y} = D{Aξ̂ + ê_y} = D{Aξ̂} + D{ê_y} = D{e_y − ê_y} + D{ê_y}:
y − E{y} = y − Aξ = y − Aξ̂ + A(ξ̂ − ξ),
y − E{y} = A(ξ̂ − ξ) + ê_y.
The additive decomposition of the residual vector y − E{y} left us with two terms, namely the predicted residual vector ê_y and a term which is a linear functional of ξ̂ − ξ. The corresponding product decomposition reads
[y − E{y}][y − E{y}]' = A(ξ̂ − ξ)(ξ̂ − ξ)'A' + A(ξ̂ − ξ)ê_y' + ê_y(ξ̂ − ξ)'A' + ê_yê_y',
for ξ̂ Σ_y-BLUUE of ξ, in particular E{ξ̂} = ξ, and
[y − E{y}][y − E{y}]' = A(ξ̂ − E{ξ̂})(ξ̂ − E{ξ̂})'A' + A(ξ̂ − E{ξ̂})ê_y' + ê_y(ξ̂ − E{ξ̂})'A' + ê_yê_y',
D{y} = E{[y − E{y}][y − E{y}]'} = D{Aξ̂} + D{ê_y} = D{e_y − ê_y} + D{ê_y}
due to
E{A(ξ̂ − E{ξ̂})ê_y'} = E{A(ξ̂ − E{ξ̂})(y − Aξ̂)'} = 0,
E{ê_y(ξ̂ − E{ξ̂})'A'} = 0,
or C{Aξ̂, ê_y} = 0, C{ê_y, Aξ̂} = 0. These covariance identities will be proven next:
C{Aξ̂, ê_y} = A(A'Σ_y⁻¹A)⁻¹A'Σ_y⁻¹ E{(y − E{y})(y − E{y})'} [I_n − A(A'Σ_y⁻¹A)⁻¹A'Σ_y⁻¹]'
           = A(A'Σ_y⁻¹A)⁻¹A'Σ_y⁻¹ Σ_y [I_n − Σ_y⁻¹A(A'Σ_y⁻¹A)⁻¹A']
           = A(A'Σ_y⁻¹A)⁻¹A' − A(A'Σ_y⁻¹A)⁻¹A' = 0.
Here is my proof's end. ∎
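The closed forms (4.21), (4.26) and (4.29)-(4.32) translate directly into code. The following Python sketch (NumPy assumed; the function name bluue and the test matrices are illustrative assumptions) implements Σ_y-BLUUE for any full column rank A and positive definite Σ_y.

    import numpy as np

    def bluue(A, y, Sigma_y):
        """Sigma_y-BLUUE of xi, its dispersion, E{y}-hat and the residual vector."""
        Si = np.linalg.inv(Sigma_y)
        N = A.T @ Si @ A                       # normal equation matrix A' Sigma^-1 A
        D_xi = np.linalg.inv(N)                # D{xi-hat} = (A' Sigma^-1 A)^-1
        xi = D_xi @ A.T @ Si @ y               # xi-hat = (A' Sigma^-1 A)^-1 A' Sigma^-1 y
        Ey = A @ xi                            # E{y}-hat = A xi-hat
        e = y - Ey                             # residual vector e_y-hat
        D_e = Sigma_y - A @ D_xi @ A.T         # D{e_y-hat}, rank n - m
        return xi, D_xi, Ey, e, D_e

    # example: straight line fit with correlated observations
    A = np.column_stack([np.ones(4), np.array([0.0, 1.0, 2.0, 3.0])])
    Sigma_y = 0.2 * np.eye(4) + 0.05 * np.ones((4, 4))        # positive definite
    y = np.array([1.1, 2.9, 5.2, 6.8])
    xi, D_xi, Ey, e, D_e = bluue(A, y, Sigma_y)
    print(xi)                                                 # estimated intercept and slope
    print(np.allclose(A.T @ np.linalg.inv(Sigma_y) @ e, 0))   # orthogonality A' Sigma^-1 e = 0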
We recommend to consider the exercises as follows.
Exercise 4.1 (translation invariance: y ↦ y − E{y}):
Prove that the error prediction of type ξ̂ Σ_y-BLUUE of ξ, namely
ê_y = [I_n − A(A'Σ_y⁻¹A)⁻¹A'Σ_y⁻¹]y,
is translation invariant in the sense of y ↦ y − E{y}, that is
ê_y = [I_n − A(A'Σ_y⁻¹A)⁻¹A'Σ_y⁻¹]e_y subject to e_y := y − E{y}.
Exercise 4.2 (idempotence):
Is the matrix I_n − A(A'Σ_y⁻¹A)⁻¹A'Σ_y⁻¹ idempotent?
Exercise 4.3 (projection matrices):
Are the matrices A(A'Σ_y⁻¹A)⁻¹A'Σ_y⁻¹ and I_n − A(A'Σ_y⁻¹A)⁻¹A'Σ_y⁻¹ projection matrices?
4-22 The Equivalence Theorem of G_y-LESS and Σ_y-BLUUE
We have included the fourth chapter on Σ_y-BLUUE in order to interpret G_y-LESS of the third chapter. The key question is open:
? When are Σ_y-BLUUE and G_y-LESS equivalent ?
The answer will be given by
Theorem 4.5 (equivalence of Σ_y-BLUUE and G_y-LESS):
With respect to the special linear Gauss-Markov model of full column rank (4.1), (4.2), ξ̂ = Ly is Σ_y-BLUUE if ξ_ℓ = Ly is G_y-LESS of (3.1) for
G_y = Σ_y⁻¹  ⇔  G_y⁻¹ = Σ_y.    (4.35)
In such a case, ξ̂ = ξ_ℓ is the unique solution of the system of normal equations
(A'Σ_y⁻¹A)ξ̂ = A'Σ_y⁻¹y    (4.36)
attached with the regular dispersion matrix
D{ξ̂} = (A'Σ_y⁻¹A)⁻¹.    (4.37)
The proof is straightforward if we compare the solution (3.11) of G_y-LESS and (4.21) of Σ_y-BLUUE. Obviously the inverse dispersion matrix of y ∈ {Y, pdf} is equivalent to the matrix of the metric G_y of the observation space Y. Or, conversely, the inverse matrix of the metric of the observation space Y determines the variance-covariance matrix D{y} = Σ_y of the random variable y ∈ {Y, pdf}.
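Theorem 4.5 is also easy to check numerically: with G_y = Σ_y⁻¹ the G_y-LESS and Σ_y-BLUUE solutions coincide, while an arbitrary metric (e.g. G_y = I) in general does not reproduce Σ_y-BLUUE. A Python sketch under these assumptions (NumPy, random test matrices):

    import numpy as np

    rng = np.random.default_rng(0)
    n, m = 6, 2
    A = rng.normal(size=(n, m))                      # full column rank (almost surely)
    y = rng.normal(size=n)
    B = rng.normal(size=(n, n))
    Sigma_y = B @ B.T + n * np.eye(n)                # positive definite dispersion matrix

    Si = np.linalg.inv(Sigma_y)
    xi_bluue = np.linalg.solve(A.T @ Si @ A, A.T @ Si @ y)       # Sigma_y-BLUUE
    G_y = Si                                                     # choice of Theorem 4.5
    xi_less = np.linalg.solve(A.T @ G_y @ A, A.T @ G_y @ y)      # G_y-LESS
    print(np.allclose(xi_less, xi_bluue))            # True

    xi_iless = np.linalg.solve(A.T @ A, A.T @ y)     # I-LESS, i.e. G_y = I
    print(np.allclose(xi_iless, xi_bluue))           # generally False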
4-3 Setup of the best invariant quadratic uniformly unbiased estimator of type BIQUUE for the central moments of second order

The subject of variance-covariance component estimation within Mathematical Statistics has been one of the central research topics in the nineteen eighties. In a remarkable bibliography, up-to-date to the year 1977, H. Sahai listed more than 1000 papers on variance-covariance component estimation, where his basic source was "Statistical Theory and Method" abstracts (published for the International Statistical Institute by Longman Group Limited), "Mathematical Reviews" and "Abstract Service of Quality Control and Applied Statistics". Excellent review papers and books exist on the topic of variance-covariance estimation, such as C. R. Rao and J. Kleffe, R. S. Rao (1977), S. B. Searle (1978), L. R. Verdooren (1980), J. Kleffe (1980), and R. Thompson (1980). The PhD thesis of B. Schaffrin (1983) offers a critical review of the state-of-the-art of variance-covariance component estimation. In Geodetic Sciences variance component estimation originates from F. R. Helmert (1924), who used least squares residuals to estimate heterogeneous variance components. R. Kelm (1974) and E. Grafarend, A. Kleusberg and B. Schaffrin (1980) proved the relation of Σ₀-Helmert type IQUUE, called Σ₀-HIQUUE, to BIQUUE and MINQUUE invented by C. R. Rao. Most notable is the Ph.D. thesis of M. Serbetci (1968) whose gravimetric measurements were analyzed by Σ₀-HIQUUE. Geodetic extensions of the Helmert method to compute variance components originate from H. Ebner (1972, 1977), W. Förstner (1979, 1980), W. Welsch (1977, 1978, 1979, 1980), K. R. Koch (1978, 1981), C. G. Persson (1981), L. Sjoeberg (1978), E. Grafarend and A. d'Hone (1978), E. Grafarend (1984), B. Schaffrin (1979, 1980, 1981). W. Förstner (1979), H. Fröhlich (1980), and K. R. Koch (1981) used the estimation of variance components for the adjustment of geodetic networks and the estimation of a length dependent variance of distances. A special field of geodetic application has been oscillation analysis based upon a fundamental paper by H. Wolf (1975), namely M. Junasevic (1977) for the estimation of the signal-to-noise ratio in gyroscopic azimuth observations. The Helmert method of variance component estimation was used by E. Grafarend and A. Kleusberg (1980) and A. Kleusberg and E. Grafarend (1981) to estimate variances of signal and noise in gyrocompass observations. Alternatively K. Kubik (1967a, b, c, 1970) pioneered the method of Maximum Likelihood (MALE) for estimating weight ratios in a hybrid distance-direction network. "MALE" and "FEMALE" extensions were proposed by B. Schaffrin (1983), K. R. Koch (1986), and Z. C. Yu (1996). A typical problem with Σ₀-Helmert type IQUUE is that it does not produce positive variances in general. The problem of generating a positive-definite variance-covariance matrix from variance-covariance component estimation has
already been highlighted by J. R. Brook and T. Moore (1980), K. G. Brown (1977, 1978), O. Bemk and H. Wandl (1980), V. Chew (1970), Han Chien-Pai (1978), R. R. Corbeil and S. R. Searle (REML, 1976), F. J. H. Don and J. R. Magnus (1980), H. Drygas (1980), S. Gnot, W. Klonecki and R. Zmyslony (1977), H. O. Hartley and J. N. K. Rao (ML, 1967), in particular J. Hartung (1979, 1980), J. L. Hess (1979), S. D. Horn and R. A. Horn (1975), S. D. Horn, R. A. Horn and D. B. Duncan (1975), C. G. Khatri (1979), J. Kleffe (1978, 1980), J. Kleffe and J. Zöllner (1978), in particular L. R. Lamotte (1973, 1980), S. K. Mitra (1971), R. Pincus (1977), in particular F. Pukelsheim (1976, 1977, 1979, 1981a, b), F. Pukelsheim and G. P. Styan (1979), C. R. Rao (1970, 1978), S. R. Searle (1979), S. R. Searle and H. V. Henderson (1979), J. S. Seely (1972, 1977), in particular W. A. Thompson (1962, 1980), L. R. Verdooren (1979), and H. White (1980). In view of available textbooks, review papers and basic contributions in scientific journals we are only able to give a short introduction. First, we outline the general model of variance-covariance components leading to a linear structure for the central second order moment, known as the variance-covariance matrix. Second, for the example of one variance component we discuss the key role of the postulates of (i) symmetry, (ii) invariance, (iii) uniform unbiasedness, and (iv) minimum variance. Third, we review variance-covariance component estimations of Helmert type.
4-31 Block partitioning of the dispersion matrix and linear space generated by variance-covariance components
The variance-covariance component model is defined by the block partitioning (4.33) of a variance-covariance matrix Σ_y, also called dispersion matrix D{y}, which follows from a corresponding rank partitioning of the observation vector y = [y₁', …, y_ℓ']'. The integer number ℓ is the number of blocks. For instance, the variance-covariance matrix Σ ∈ R^{n×n} in (4.41) is partitioned into ℓ = 2 blocks. The various blocks are consequently factorized by variances σ_j² and by covariances σ_jk = ρ_jk σ_j σ_k, where ρ_jk ∈ [−1, +1] denotes the correlation coefficient between the blocks. For instance, D{y₁} = V₁₁σ₁² is a variance factorization, while D{y₁, y₂} = V₁₂σ₁₂ = V₁₂ρ₁₂σ₁σ₂ is a covariance factorization. The matrix blocks V_jj are built into the matrix C_jj, while the off-diagonal blocks V_jk, V_jk' go into the matrix C_jk of the same dimensions, dim Σ = dim C_jj = dim C_jk = n×n. The collective matrices C_jj and C_jk enable us to develop an additive decomposition (4.36), (4.43) of the block partitioned variance-covariance matrix Σ_y. As soon as we collect all variance-covariance components in a particular order, namely σ := [σ₁², σ₁₂, σ₂², σ₁₃, σ₂₃, σ₃², ..., σ_{ℓ−1,ℓ}, σ_ℓ²]', we are led to a linear form of the dispersion matrix (4.37), (4.43) as well as of the
dispersion vector (4.39), (4.44). Indeed the dispersion vector d(y ) = Xı builds up a linear form where the second order design matrix X, namely X := [vec C1 ," , vec CA ( A +1) ] R n
2
× A ( A +1) / 2
,
reflects the block structure. There are A(A+1)/2 matrices C j , j{1," , A(A +1) / 2} . For instance, for A = 2 we are left with 3 block matrices {C1 , C2 , C3 } . Before we analyze the variance-covariance component model in more detail, we briefly mention the multinominal inverse Ȉ 1 of the block partitioned matrix Ȉ . For instance by “JPM” and “SCHUR” we gain the block partitioned inverse matrix Ȉ 1 with elements {U11 , U12 , U 22 } (4.51) – (4.54) derived from the block partitioned matrix Ȉ with elements {V11 , V12 , V22 } (4.47). “Sequential JPM” solves the block inverse problems for any block partitioned matrix. With reference to Box 4.2 and Box 4.3 Ȉ = C1V 1 + C2V 2 + C3V 3 Ȉ 1 = E1 (V ) + E2 (V ) + E3 (V ) is an example. Box 4.2 Partitioning of variance-covariance matrix ª V11V 12 « Vc V « 12 12 Ȉ=« # « V1cA 1V 1A 1 «¬ V1cAV 1A
V12V 12 V22V 22 # V2cA 1V 2A 1 V2cAV 2A
V1A 1V 1A 1 V2A 1V 2A 1 # " VA 1A 1V A21 " VAc1AV A 1A " "
V1AV 1A º V2AV 2A » » # »>0 VA 1AV A 1A » VAAV A2 »¼
(4.38)
"A second moments V 2 of type variance, A (A 1) / 2 second moment V jk of type covariance matrix blocks of second order design ª0 " 0º C jj := « # V jj # » « » ¬« 0 " 0 ¼» 0º ª0 «" 0 V jk " » » C jk := « Vkj «" " » «0 0 »¼ ¬
j {1," , A }
ª subject to j < k « and j , k {1," , A} ¬
A
A 1, A
j =1
j =1, k = 2, j < k
Ȉ = ¦ C jjV 2j + Ȉ=
A ( A +1) / 2
¦ j =1
¦
C jk V jk
C jV j R n× m
(4.39)
(4.40)
(4.41) (4.42)
[V 12 , V 12 , V 22 , V 13 , V 23 , V 32 ,..., V A 1A , , V A2 ]' =: V
(4.43)
"dispersion vector" D{y} := Ȉ y d {y} = vec D{y} = vec Ȉ d (y ) =
A ( A +1) / 2
¦
(vec C j )V j = XV
(4.44)
j =1
" X is called second order design matrix" X := [vec C1 ," , vec CA ( A +1) / 2 ]
(4.45)
"dimension identities" d (y ) R n ×1 , V R, X R n ×A ( A +1) / 2 . 2
2
Box 4.3 Multinomial inverse :Input: Ȉ12 º ª V11V 12 V12V 12 º ªȈ Ȉ = « 11 »= »=« ' ¬ Ȉ12 Ȉ 22 ¼ ¬ V12c V 12 V22V 22 ¼ ª0 0 º 2 ª V 0 º 2 ª 0 V12 º n× m = « 11 V1 + « V 12 + « » »V 2 R » c V 0 0 V 0 0 ¬ ¼ ¬ 12 ¼ ¬ 22 ¼ ªV C11 := C1 := « 11 ¬ 0
0º ª 0 , C12 := C2 := « » 0¼ ¬ V12c
V12 º ª0 0 º , C22 := C3 := « » » 0 ¼ ¬ 0 V22 ¼
(4.46)
(4.47)
3
Ȉ = C11V 12 + C12V 12 + C22V 22 = C1V 1 + C2V 2 + C3V 3 = ¦ C jV j
(4.48)
ªV 1 º vec 6 = ¦ (vec C j )V j =[vec C1 , vec C2 , vec C3 ] ««V 2 »» = XV j =1 «¬V 3 »¼
(4.49)
j =1
3
vec C j R n
2
×1
j {1,..., A(A + 1) / 2}
" X is called second order design matrix" X := [vec C1 ," , vec CA (A +1) / 2 ] R n
2
×A ( A +1) / 2
here: A=2 ªU Ȉ 1 = « 11 ¬ 0
:output: 0 º 2 ª 0 U12 º 1 ª 0 0 º 2 V1 + « V 12 + « »V 2 c 0 »¼ 0 »¼ ¬ U12 ¬ 0 U 22 ¼
(4.50)
subject to (4.51)
U11 = V111 + qV111 V12 SV12c V111 , U12 = Uc21 = qV111 V12 S
(4.53)
U 22 = S = (V22 qV12c V111 V12 ) 1 ; q := Ȉ 1 = E1 + E2 + E3 =
V 122 V 12V 22
(4.52) (4.54)
A ( A +1) / 2 = 3
¦
Ej
(4.55)
j =1
ªU E1 (V ) := « 11 ¬ 0
0 º 2 ª 0 V 1 , E 2 (V ) := « » c 0¼ ¬ U12
U12 º 1 ª 0 0 º 2 V 12 , E3 (V ) := « » »V 2 . 0 ¼ ¬ 0 U 22 ¼
(4.56)
The general result that inversion of a block partitioned symmetric matrix conserves the block structure is presented in Corollary 4.6 (multinomial inverse): Ȉ=
A ( A +1) / 2
¦
C j V j Ȉ 1 =
j =1
A ( A +1) / 2
¦
Ǽ j (V ) .
(4.57)
j =1
We shall take advantage of the block structured multinominal inverse when we are reviewing HIQUUE or variance-covariance estimations of Helmert type. The variance component model as well as the variance-covariance model are defined next. A variance component model is a linear model of type ª V11V 12 « 0 « Ȉ=« # « 0 « ¬ 0
0 V22V 22 # 0 0
" 0 " 0 % # " VA 1A 1V A21 " 0
0 º 0 »» # » 0 » » VAAV A2 ¼
ªV 12 º d {y} = vec Ȉ = [vec C11 ,… , vec C jj ] « " » «V 2 » ¬ A¼ + d {y} = XV V R .
(4.58)
(4.59) (4.60)
In contrast, the general model (4.49) is the variance-covariance model with a linear structure of type ª V 12 º «V » 12 (4.61) d {y} = vec Ȉ = [vec C11 , vec C12 , vec C12 ,… , vec CAA ] « V 22 » « » " « 2» «¬ V A »¼
d {y} = Xı V 2j R + , Ȉ positive definite.
(4.62)
The most popular cases of variance-covariance components are collected in the examples. Example 4.1 (one variance components, i.i.d. observations) D{y} : Ȉ y = , nV 2 subject to Ȉ y SYM (R n×n ), V 2 R + . Example 4.2 (one variance component, correlated observations) D{y} : Ȉ y = 9nV 2 subject to Ȉ y SYM (R n×n ), V 2 R + . Example 4.3. (two variance components, two sets of totally uncorrected observations "heterogeneous observations") ª n = n1 + n2 ª I n V 12 0 º « D{y} : Ȉ y = « subject to « Ȉ y SYM (R n×n ) (4.63) 2» 0 I V » 2¼ n ¬« « 2 + + 2 ¬V 1 R , V 2 R . 1
2
Example 4.4 (two variance components, one covariance components, two sets of correlated observations "heterogeneous observations") n = n1 + n2 ª V12V 12 º n ×n n ×n « » subject to « V11 R , V22 R V11V 22 ¼ n ×n V12 R «¬
ªV V 2 D{y} : Ȉ y = « 11 1 ¬ V12c V 12
1
1
2
1
2
2
Ȉ y SYM (R n×n ), V 12 R + , V 22 R + , Ȉ y positive definite.
(4.64)
Special case: V11 = I n , V22 = I n . 1
2
Example 4.5 (elementary error model, random effect model)
(4.66)
A
A
j =1
j =1
e y = y E{y z} = ¦ A j (z j E{z j }) = ¦ A j e zj
(4.65)
E{e zj } = 0, E{e zj , eczk } = G jk I q
(4.67)
A
D{y} : Ȉ y = ¦ A j A cjV 2j + j =1
A
¦
j , k =1, j < k
( A j A ck + A k A cj )V jk .
(4.68)
At this point, we should emphasize that a linear space of variance-covariance components can be built up independently of the block partitioning of the dispersion matrix D{y}. For further details and explicit examples let us refer to B. Schaffrin (1983).
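For ℓ = 2 blocks the linear structure d{y} = Xσ of (4.44)/(4.49) can be written down explicitly. The Python sketch below (NumPy assumed; the block sizes, the cross cofactor block and the helper name embed are illustrative assumptions) builds C₁, C₂, C₃, stacks their vec's into the second order design matrix X, and checks vec Σ = Xσ.

    import numpy as np

    n1, n2 = 2, 3
    n = n1 + n2
    V11, V22 = np.eye(n1), np.eye(n2)
    V12 = np.full((n1, n2), 0.5)              # assumed cross cofactor block

    def embed(block, rows, cols):
        C = np.zeros((n, n))
        C[np.ix_(rows, cols)] = block
        return C

    r1, r2 = np.arange(n1), np.arange(n1, n)
    C1 = embed(V11, r1, r1)                               # variance block of sigma_1^2
    C2 = embed(V12, r1, r2) + embed(V12.T, r2, r1)        # covariance block of sigma_12
    C3 = embed(V22, r2, r2)                               # variance block of sigma_2^2

    X = np.column_stack([C.reshape(-1, order="F") for C in (C1, C2, C3)])  # [vec C1, vec C2, vec C3]
    sigma = np.array([2.0, 0.3, 1.5])                     # [sigma_1^2, sigma_12, sigma_2^2]
    Sigma = sigma[0] * C1 + sigma[1] * C2 + sigma[2] * C3
    print(X.shape)                                        # (25, 3) = (n^2, l(l+1)/2) for l = 2
    print(np.allclose(X @ sigma, Sigma.reshape(-1, order="F")))   # vec Sigma = X sigma -> True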
4-32
Invariant quadratic estimation of variance-covariance components of type IQE
By means of Definition 4.7 (one variance component) and Definition 4.9 (variance-covariance components) we introduce
Vˆ 2 IQE of V 2 and
Vˆ k IQE of V k .
Those conditions of IQE, represented in Lemma 4.7 and Lemma 4.9 enable us to separate the estimation process of first moments ȟ j (like BLUUE) from the estimation process of central second moments V k (like BIQUUE). Finally we provide you with the general solution (4.75) of the in homogeneous matrix equations M1/k 2 A = 0 (orthogonality conditions) for all k {1, " ,A(A+1)/2} where A(A+1)/2 is the number of variance-covariance components, restricted to the special Gauss–Markov model E {y} = Aȟ , d {y} = XV of "full column rank", A R n× m , rk A = m . Definition 4.7 (invariant quadratic estimation Vˆ 2 of V 2 : IQE ): The scalar Vˆ 2 is called IQE (Invariant Quadratic Estimation) of V 2 R + with respect to the special Gauss-Markov model of full column rank. E{y} = Aȟ, A R n×m , rk A = m (4.69) D{y} = VV 2 , V R n×n , rk V = n, V 2 R + , if the “variance component V 2 is V ” (i) a quadratic estimation
Vˆ 2 = y cMy = (vec M )c(y
y ) = (y c
y c)(vec M)
(4.70)
subject to M SYM := {M R n× n | M c = M}
(4.71)
(ii) transformational invariant : y o y E{y} =: ey in the sense of
Vˆ 2 = y cMy = e ycMe y or 2 Vˆ = (vec M )c(y
y ) = (vec M )c(e y
e y ) or 2 Vˆ = tr(Myy c) = tr(Me e c ) . y y
(4.72) (4.73) (4.74)
Already in the introductory paragraph we emphasized the key of "IQE". Indeed by the postulate "IQE" the estimation of the first moments E{y} = Aȟ is
supported by the estimation of the central second moments D{y} = VV 2 or d {y} = XV . Let us present to you the fundamental result of " Vˆ 2 IQE OF V 2 ". Lemma 4.8 (invariant quadratic estimation Vˆ 2 of Vˆ 2 :IQE) : Let M = (M1/ 2 )cM1/ 2 be a multiplicative decomposition of the symmetric matrix M . The scalar Vˆ 2 is IQE of V 2 , if and only if M1/ 2 = 0
(4.75) 1/ 2
for all M
R
n× n
or A c(M1/ 2 )c = 0
(4.76)
. :Proof:
First, we substitute the transformation y = E{y} + e y subject to expectation identity E{y} = Aȟ, A R n× m , rk A = m, into y cMy. y ' My = ȟ cA cMAȟ + ȟ cA cMe y + e ycMAȟ + e ycMe y . Second, we take advantage of the multiplicative decomposition of the matrix M , namely M = (M1/ 2 )cM1/ 2 ,
(4.77)
which generates the symmetry of the matrix M SYM := {M R m×n | M c = M} y cMy = ȟ cA c(M1/ 2 )cM1/ 2 Aȟ + ȟ cA c(M1/ 2 )cM1/ 2e y + e yc (M1/ 2 )cM1/ 2 Aȟ + e ycMe y . Third, we postulate "IQE". y cMy = e ycMe y M1/ 2 A = 0 A c(M1/ 2 )c = 0. For the proof, here is my journey's end.
h
Let us extend " IQE " from a " one variance component model " to a " variancecovariance components model ". First, we define " IQE " ( 4.83 ) for variancecovariance components, second we give necessary and sufficient conditions identifying " IQE " . Definition 4.9 (variance-covariance components model Vˆ k IQE of V k ) : The dispersion vector dˆ (y ) is called IQE ("Invariant Quadratic Estimation") with respect to the special Gauss-Markov model of full column rank. ª E{y} = Aȟ, A {R n×m }; rk A = m « «¬ d {y} = Xı, D{y} ~ Ȉ y positive definite, rk Ȉ y = n,
(4.78)
if the variance-covariance components ı := [V 12 , V 12 , V 22 , V 13 , V 23 ," , V A2 ]c
(4.79)
(i) bilinear estimations
Vˆ k = y cMy = (vec M )c(y
y ) = tr M k yy c M k R n× n× A ( A +1) / 2 ,
(4.80)
subject to M k SYM := {M k R n× n× A ( A +1) / 2 | M k = M k c },
(4.81)
(ii) translational invariant y o y E{y} =: e y
Vˆ k = y cM k y = e ycM k e y
(4.82)
Vˆ k = (vec M k )c(y
y ) = (vec M k )c(e y
e y ).
(4.83)
Note the fundamental lemma " Vˆ k IQE of V k " whose proof follows the same line as the proof of Lemma 4.7. Lemma 4.10 (invariant quadratic estimation Vˆ k of V k : IQE): Let M k = (M1/k 2 )cM1/k 2 be a multiplicative decomposition of the symmetric matrix M k . The dispersion vector Vˆ k is IQE of V k , if and only if (4.84)
M1/k 2 A = 0 or A c(M1/k 2 )c = 0
(4.85)
for all M1/k 2 R n× n× A ( A +1) / 2 . ? How can we characterize " Vˆ 2 IQE of V 2 " or " Vˆ k IQE of V k " ? The problem is left with the orthogonality conditions (4.75), (4.76) and (4.84), (4.85). Box 4.4 reviews the general solutions of the homogeneous equations (4.86) and (4.88) for our " full column rank linear model ". Box 4.4 General solutions of homogeneous matrix equations M1/k 2 = 0
M k = Z k (I n - AA )
" for all A G := {A R n× m | AA A = A} " : rk A = m
(4.86)
A = A L = ( A cG y A) 1 A cG y
(4.87)
" for all left inverses A L {A R m× n | ( A A)c = A A} " M1/k 2 = 0 º 1 1/ 2 » M k = Z k [I n A( A cG y A) A cG y ] rk A d m ¼
(4.88)
"unknown matrices : Z k and G y . First, (4.86) is a representation of the general solutions of the inhomogeneous matrix equations (4.84) where Z k , k {1," , A(A + 1) / 2}, are arbitrary matrices. Note that k = 1, M1 describes the " one variance component model ", otherwise the general variance-covariance components model. Here we are dealing with a special Gauss-Markov model of " full column rank ", rk A = m . In this case, the generalized inverse A is specified as the " weighted left inverse " A L of type (4.71) whose weight G y is unknown. In summarizing, representations of two matrices Z k and G y to be unknown, given H1/k 2 , M k is computed by M k = (M1/k 2 )cM1/k 2 = [I n G y A(A cG y A)1 A ']Zck Z k [I n A( A cG y A) 1 A cG y ] (4.89) definitely as a symmetric matrix. 4-33
Invariant quadratic uniformly unbiased estimations of variancecovariance components of type IQUUE
Unbiased estimations have already been introduced for the first moments E{y} = Aȟ, A R n× m , rk A = m . Similarly we like to develop the theory of the one variance component V 2 and the variance-covariance unbiased estimations for the central second moments, namely components Vk , k{1,…,A(A +1)/2}, where A is the number of blocks. Definition 4.11 tells us when we use the terminology " invariant quadratic uniformly unbiased estimation " Vˆ 2 of V 2 or Vˆ k of V k , in short " IQUUE ". Lemma 4.12 identifies Vˆ 2 IQUUE of V 2 by the additional tr VM = 1 . In contrast, Lemma 4.12 focuses on Vˆ k IQUUE of V k by means of the additional conditions tr C j M k = į jk . Examples are given in the following paragraphs. Definition 4.11 (invariant quadratic uniformly unbiased estimation Vˆ 2 of V 2 and Vˆ k of V k : IQUUE) : The vector of variance-covariance components Vˆ k is called IQUUE (Invariant Quadratic Uniformly Unbiased Estimation ) of V k with respect to the special Gauss-Markov model of full column rank.
ª E{y}= Aȟ, AR n×m , rk A = m « d {y}= Xı, XR n ×A ( A+1) / 2 , D{y}~ Ȉ positive definite, rk Ȉ y y « «¬ rk Ȉ y = n, vech D{y}= d{y}, 2
(4.90) if the variance-covariance components ı := [V 12 , V 12 , V 22 , V 13 , V 23 ," , V A2 ]
(4.91)
are (i) a bilinear estimation
Vˆ k = y cM k y = (vec M k )c(y
y ) = tr M k yy c M k R n× n× A ( A +1) / 2
(4.92)
subject to M k = (M1/k 2 )c(M1/k 2 ) SYM := {M k R n× m× A ( A +1) / 2 | M k = M k c }
(4.93)
(ii) translational invariant in the sense of y o y E{y} =: ey
Vˆ k = y cM k y = e ycM k e y
(4.94)
or Vˆ k = (vec M k )c(y
y ) = (vec M k )c(e y
e y ) or
(4.95)
Vˆ k = tr Ȃ k yy c = tr M k e y e yc ,
(4.96)
(iii) uniformly unbiased in the sense of k = 1 (one variance component) : E{Vˆ 2 } = V 2 , V 2 R + ,
(4.97)
k t 1 (variance-covariance components): E{Vˆ k } = V k , V k {R A ( A +1) / 2 | Ȉ y positive definite},
(4.98)
with A variance components and A(A-1)/2 covariance components. Note the quantor “for all V 2 R + ” within the definition of uniform unbiasedness (4.81) for one variance component. Indeed, weakly unbiased estimators exist without the quantor (B. Schaffrin 2000). A similar comment applies to the quantor “for all V k {R A ( A +1) / 2 | Ȉ y positive definite} ” within the definition of uniform unbiasedness (4.82) for variance-covariance components. Let us characterize “ Vˆ 2 IQUUE of V 2 ”.
Lemma 4.12 ( Vˆ 2 IQUUE of V 2 ): The scalar Vˆ 2 is IQUUE of V 2 with respect to the special GaussMarkov model of full column rank. ª"first moment " : E{y} = Aȟ, A R n×m , rk A = m « + 2 2 n× n «¬"centralsecond moment " : D{y} = V , V R , rk V = n, V R , if and only if (4.99)
(i) M1/ 2 A = 0
and
(ii) tr VM = 1 .
(4.100)
:Proof: First, we compute E{Vˆ 2 } .
Vˆ 2 = tr Me y e yc E{Vˆ 2 } = tr MȈ y = tr Ȉ y M. Second, we substitute the “one variance component model” Ȉ y = VV 2 . E{Vˆ 2 } := V 2 V 2 R
tr VM = 1.
Third, we adopt the first condition of type “IQE”.
h
The conditions for “ Vˆ k IQUUE of V k ” are only slightly more complicated. Lemma 4.13 ( Vˆ k IQUUE of V 2 ): The vector Vˆ k , k {1," , A(A + 1) / 2} is IQUUE of V k with respect to the block partitioned special Gauss-Markov model of full column rank. " first moment" ª y1 º ª A1 º «y » «A » « 2 » « 2 » E{« # »} = « # » ȟ = Aȟ, A \ n n "n « » « » « y A 1 » « A A 1 » «¬ y A »¼ «¬ A A »¼ 1 2
A 1 , nA × m
n1 + n2 + " + nA 1 + nA = n " central second moment"
, rk A = m
(4.101)
V12V 12 V22V 22 # V2A 1V 2A 1 V1AV 1l
V1A 1V 1A 1 V2A 1V 2A 1 # " VA 1,A 1V A21 " VA 1,AV A 1,A " "
A
A ( A +1) / 2
j =1
j , k =1 j
D{y} = ¦ C jjV 2j +
D{y} =
¦
V1AV 1l º » V2AV 2A » » (4.102) # » VA 1,AV A 1,A » VA ,AV A2 »¼
C jk V jk
(4.103)
A ( A +1) / 2
¦
C jV j
(4.104)
j =1
ı := [V 12 , V 12 , V 22 , V 13 , V 23 , V 32 ," , V A2 ] \ A ( A +1) / 2 +1 C j \ n×n×A ( A +1) / 2
( 3d array) A 1 × nl
V11 \ n ×n , V12 \ n ×n ," , VA 1,A \ n 1
1
1
(4.105)
2
(4.106)
, VAA \ n ×n A
A
(4.107)
D{y} Ȉ y \ n×n = \ ( n +...+ n )×( n +...+ n )
(4.108)
rk Ȉ y = n, Ȉ y positive definite
(4.109)
l
1
1
l
if and only if (4.110)
(i) M1/k 2 A = 0
and
(ii)
tr C j M k = į jk .
(4.111)
Before we continue with the proof we have to comment on our setup of the variance-covariance components model. For a more easy access of an analyst we have demonstrated the blocks partitioning of the observation vector ª y1 º , dim y1 = n1 «#» # , « » «¬ y A »¼ , dim y A = nA
and
the variance-covariance ª V11V 12 " V1AV 1A º « » # ». Ȉy = « # « V1AV 1A " VAAV A2 » ¬ ¼
n1 observations build up the observation vector y1 as well as the variance factor V11. Similarly, n2 observations build up the variance factor V22. Both observations collected in the observations vectors y1 and y2, constitute the covariance factor V12. This scheme is to be continued for the other observations and their corresponding variance and covariance factors. The matrices C jj and C jk which
map variance components V jk (k>j) to the variance-covariance matrix Ȉ y contain the variance factors V jj at {colj, rowj} while the covariance factors contain {V jkc , V jk } at {colk, rowj} and {colj, rowk}, respectively. The following proof of Lemma 4.12 is based upon the linear structure (4.88). :Proof: First, we compute E{Vˆ k } . E{Vˆ k } = tr M k Ȉ y = tr Ȉ y M k . Second, we substitute the block partitioning of the variance-covariance matrix Ȉy . A ( A +1) / 2
º A ( A +1) / 2 C jV j » tr Ȉ M = tr C j M kV j j =1 ¦ y k » j =1 » E{Vˆ k } = tr Ȉ y M k ¼
Ȉy =
¦
E{Vˆ k } = V k
A ( A +1) / 2
¦
(tr C j M k )V j = V k , V i R A ( A +1) / 2
(4.112)
j =1
tr C j M k G jk = 0 . Third, we adopt the first conditions of the type “IQE”. 4-34
Invariant quadratic uniformly unbiased estimations of one variance component (IQUUE) from Ȉ y BLUUE: HIQUUE
Here is our first example of “how to use IQUUE“. Let us adopt the residual vector e y as predicted by Ȉ y -BLUUE for a “one variance component“ dispersion model, namely D{y} = VV 2 , rk V = m . First, we prove that M1/ 2 generated by V-BLUUE fulfils both the conditions of IQUUE namely M1/ 2 A = 0 and tr VM = tr V (M1/ 2 )cM1/ 2 = 1 . As outlined in Box 4.5, the one condition of uniform unbiasedness leads to the solutions for one unknown D within the “ansatz” Z cZ = D V 1 , namely the number n-m of “degrees of freedom” or the “surjectivity defect”. Second, we follow “Helmert’s” ansatz to setup IQUUE of Helmert type, in Short “HIQUUE”. Box 4.5 IQUUE : one variance component 1st variations {E{y} = Ax, A R n× m , rk A = m, D{y} = VV 2 , rk V = m, V 2 R + } e y = [I n A ( A ' V 1A ) 1 A ' V 1 ]y
(4.31)
1st test: IQE M1/ 2 A = 0 "if M1/ 2 = Z[I n A( A ' V 1 A ) 1 A ' V 1 ] , then M1/ 2 A = 0 " 2nd test : IQUUE "if tr VM = 1 , then tr{V[I n V 1 A( A ' V 1 A) 1 A ']Z cZ[I n A( A ' V 1 A) 1 A ' V 1 ]} = 1 ansatz : ZcZ = D V 1
(4.113)
tr VM = D tr{V[V V A( A cV A) ][I n A( AcV A) AcV ]} = 1 1
1
1
1
1
1
1
tr VM = D tr[I n A( A cV 1 A) A cV 1 ] = 1 tr I n = 0
º » tr[ A( A ' V A) A ' V ] = tr A( A ' V A) A ' V A = tr I m = m ¼ 1
1
1
1
1
tr VM = D (n m) = 1 D =
1
1 . nm
(4.114)
Let us make a statement about the translational invariance of e y predicted by Ȉ y - BLUUE and specified by the “one variance component” model Ȉ y = VV 2 . e y = e y ( Ȉ y - BLUUE) = [I n A( A ' Ȉ y1A) 1 A ' Ȉ y1 ]y .
(4.115)
Corollary 4.14 (translational invariance): e y = [I A( A ' Ȉ y1A) 1 A ' Ȉ y1 ]e y = Pe y
(4.116)
subject to P := I n A ( A ' Ȉ y1A) 1 A ' Ȉ y1 .
(4.117)
The proof is “a nice exercise”: Use e y = Py and replace y = E{y} + e y = A[ + e y . The result is our statement, which is based upon the “orthogonality condition” PA = 0 . Note that P is idempotent in the sense of P = P 2 . In order to generate “ Vˆ 2 IQUUE of V 2 ” we start from “Helmert’s ansatz”. Box 4.6 Helmert’s ansatz one variance component e cy Ȉ y1e y = ecy P ' Ȉ y1Pe y = tr PȈ y1Pe y ecy
(4.118)
E{e cy Ȉ y1e y } = tr(P ' Ȉ y1P E{e y ecy }) = tr(P ' Ȉ y1PȈ y )
(4.119)
”one variance component“ Ȉ y = VV 2 = C1V 2 E{e cy V 1e y }= (tr P ' V 1PV )V 2
V 2 \ 2
(4.120)
tr P ' V 1 PV = tr[I n V 1 A ( A ' V 1 A ) A '] = n m
(4.121)
E{e cy V 1e y }= (n m)V 2
(4.122)
1 e cy V 1e y E{Vˆ 2 }=V 2 . nm Let us finally collect the result of “Helmert’s ansatz” in
Vˆ 2 :=
Corollary 4.15
(4.123)
( Vˆ 2 of HIQUUE of V 2 ): Helmert’s ansatz
Vˆ 2 =
1 e y ' V 1e y nm
(4.124)
is IQUUE, also called HIQUUE. 4-35
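Corollary 4.15 in code: a Python sketch (NumPy assumed; the function name hiquue_sigma2 and the simulated test data are illustrative assumptions) that forms ê_y from the V-weighted BLUUE and returns σ̂² = ê'V⁻¹ê/(n−m); averaging over repeated simulations with a known σ² illustrates the unbiasedness.

    import numpy as np

    def hiquue_sigma2(A, y, V):
        """sigma^2-hat = e'V^-1 e/(n - m) with e from the Sigma_y = V*sigma^2 BLUUE."""
        n, m = A.shape
        Vi = np.linalg.inv(V)
        P = np.eye(n) - A @ np.linalg.solve(A.T @ Vi @ A, A.T @ Vi)   # I - A(A'V^-1 A)^-1 A'V^-1
        e = P @ y
        return (e @ Vi @ e) / (n - m)

    rng = np.random.default_rng(1)
    n, m, sigma2 = 50, 3, 4.0
    A = rng.normal(size=(n, m))
    V = np.diag(rng.uniform(0.5, 2.0, size=n))        # known cofactor matrix
    L = np.linalg.cholesky(V * sigma2)
    estimates = [hiquue_sigma2(A, A @ np.ones(m) + L @ rng.normal(size=n), V)
                 for _ in range(2000)]
    print(np.mean(estimates))                         # close to sigma^2 = 4.0 (unbiasedness)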
Invariant quadratic uniformly unbiased estimators of variance covariance components of Helmert type: HIQUUE versus HIQE
In the previous paragraphs we succeeded to prove that first M 1/ 2 generated by e y = e y ( Ȉ y - BLUUE) with respect to “one variance component” leads to IQUUE and second Helmert’s ansatz generated “ Vˆ 2 IQUUE of V 2 ”. Here we reverse the order. First, we prove that Helmert’s ansatz for estimating variancecovariance components may lead (or may, in general, not) lead to “ Vˆ k IQUUE of V k ”. Second, we discuss the proper choice of M1/k 2 and test whether (i) M1/k 2 A = 0 and (ii) tr H j M k = G jk is fulfilled by HIQUUE of whether M1/k 2 A = 0 is fulfilled by HIQE. Box 4.7 Helmert's ansatz variance-covariance components step one: make a sub order device of variance-covariance components:
V 0 := [V 12 , V 12 , V 2 2 , V 13 , V 12 ,..., V A 2 ]0c step two: compute Ȉ 0 := ( Ȉ y )0 = Ȉ
A ( A +1) / 2
¦ j =1
C jV j (V 0 )
(4.125)
step three: compute e y = e y ( Ȉ 0 - BLUUE), namely 1
P (V 0 ) := (I A( A cȈ 0 A) 1 A cȈ 0 e y = P0 y = P0 e y step four: Helmert's ansatz
1
(4.126) (4.127)
e cy Ȉ 01e y = ecy P0cȈ 0-1P0e y = tr(P0 Ȉ 01P0ce y ecy )
(4.128)
E{eˆ cy Ȉ e } = tr (P0 Ȉ P c Ȉ)
(4.129)
1 0 0
-1 0 y
''variance-covariance components'' Ȉy = Ȉ
A ( A +1) / 2
¦ k =1
CkV k
(4.130)
E{e cy Ȉ -10 e cy } = tr(P0 Ȉ 01P0cCk )V k step five: multinomial inverse Ȉ=
A ( A +1) / 2
Ck V k Ȉ 1 =
¦ k =1
(4.131)
A ( A +1) / 2
¦ k =1
Ek (V j )
(4.132)
input: V 0 , Ȉ 0 , output: Ek (V 0 ). step six: Helmert's equation i, j {1," , A(A + 1) / 2} E{e cy Ei (V 0 )e y } =
A ( A +1) / 2
¦ k =1
(tr P(V 0 )Ei (V 0 )P c(V 0 )C j )V j
(4.133)
"Helmert's choice'' ecy Ei (V 0 ) e y =
A ( A +1) / 2
¦
(tr P(V 0 )Ei (V 0 )P c(V 0 )C j )V j
(4.134)
j =1
ª q := ey cEi (V 0 )ey « q = Hıˆ « H := tr P (V 0 )Ei (V 0 )P '(V 0 )C j (" Helmert ' s process ") (4.135) « 2 2 2 2 ¬ıˆ := [Vˆ1 , Vˆ12 , Vˆ 2 , Vˆ13 , Vˆ 23 , Vˆ 3 ,..., Vˆ A ] . Box 4.7 summarizes the essential steps which lead to “ Vˆ k HIQUUE of V k ” if det H = 0 , where H is the Helmert matrix. For the first step, we use some prior information V 0 = Vˆ 0 for the unknown variance-covariance components. For instance, ( Ȉ y )0 = Ȉ 0 = Diag[(V 12 ) 0 ,..., (V A2 ) 0 ] may be the available information on variance components, but leaving the covariance components with zero. Step two enforces the block partitioning of the variance-covariance matrix generating the linear space of variance-covariance components. e y = D0 e y in step three is the local generator of the Helmert ansatz in step four. Here we derive the key equation E{e y ' Ȉ -10 e y } = tr (D0 Ȉ 01D0c Ȉ) V k . Step five focuses on the multinormal inverse of the block partitioned matrix Ȉ , also called “multiple IPM”. Step six is
taken if we replace 6 01 by the block partitioned inverse matrix, on the “Helmert’s ansatz”. The fundamental expectation equation which maps the variancecovariance components V j by means of the “Helmert traces” H to the quadratic terms q (V 0 ) . Shipping the expectation operator on the left side, we replace V j by their estimates Vˆ j . As a result we have found the aborted Helmert equation q = Hıˆ which has to be inverted. Note E{q} = Hı reproducing unbiasedness. Let us classify the solution of the Helmert equation q = Hı with respect to bias. First let us assume that the Helmert matrix is of full rank, vk H = A(A + 1) / 2 the number of unknown variance-covariance components. The inverse solution, Box 4.8, produces an update ıˆ 1 = H 1 (ıˆ 0 ) ' q(ıˆ 0 ) out of the zero order information Vˆ 0 we have implemented. For the next step, we iterate ıˆ 2 = H 1 (ıˆ 1 )q(ıˆ 1 ) up to the reproducing point Vˆ w = Vˆ w1 with in computer arithmetic when iteration ends. Indeed, we assume “Helmert is contracting”. Box 4.8 Solving Helmert's equation the fast case : rk H = A ( A + 1) / 2, det H z 0 :"iterated Helmert equation": Vˆ1 = H 1 (Vˆ 0 )q (Vˆ 0 ),..., VˆZ = HZ1 (VˆZ 1 ) q(VˆZ 1 )
(4.136)
"reproducing point" start: V 0 = Vˆ 0 Vˆ1 = H 01 q0 Vˆ 2 = H11 q1 subject to H1 := H (Vˆ1 ), q1 := q(Vˆ1 ) ... VˆZ = VˆZ 1 (computer arithmetic): end. ?Is the special Helmert variance-covariance estimator ıˆ x = H 1 q " JQUUE "? Corollary 4.16 gives a positive answer. Corollary 4.16 (Helmert equation, det H z 0); In case the Helmert matrix H is a full rank matrix, namely rk H = A ( A + 1) / 2 ıˆ = H 1q (4.137) is Ȉ f -HIQUUE at reproducing point. : Proof: q := e cy Ei e y E{ıˆ } = H E{q} = H 1 Hı = ı . 1
h
For the second case of our classification, let us assume that Helmert matrix is no longer of full rank, rk H < A(A + 1) / 2 , det H=0. Now we are left with the central question.
? Is the special Helmert variance-covariance estimator ı = H l q = H + q of type “ MINOLESS” “ IQUUE”? n 1
Unfortunately, the MINOLESS of the rank factorized Helmert equation q = JKıˆ outlined in Box 4.9 by the weighted Moore-Penrose solution, indicates a negative answer. Instead, Corollary 4 proves Vˆ is only HIQE, but resumes also in establishing estimable variance-covariance components as “ Helmert linear combinations” of them. Box 4.9 Solving Helmert´s equation the second case: rk H < A(A + 1) / 2 , det H=0 " rank factorization" " MINOLESS" H = JK , rkH = rkF = rkG =: v
(4.138)
" dimension identities" H \ A ( A +1) / 2× A ( A +1) / 2 , J \ A ( A +1) / 2× v , G \ v × A ( A +1) / 2 H lm = H + ( weighted ) = K R ( weighted ) = J L ( weighted )
(4.139)
ıˆ lm = G ı-1K c(KG V-1K 1 )(J cG q J ) 1 G q q = HV+ , q q .
(4.140)
In case “ detH=0” Helmert´s variance-covariance components estimation is no longer unbiased, but estimable functions like Hıˆ exist: Corollary 4.17 (Helmert equation, det H=0): In case the Helmert matrix H, rkH< A(A + 1) / 2 , det H=0, is rank deficient, the Helmert equation in longer generates an unbiased IQE. An estimable parameter set is H Vˆ : Hıˆ = HH + q is Ȉ 0 HIQUUE (i) (4.141) (ii)
Vˆ is IQE . :Proof: (i) E{ıˆ } = H + E{q} = H + Hı z ı , ıˆ IQE
(ii) E{Hıˆ } = HH + E{q} = HH + Hı = Hı , Hıˆ HIQUUE.
h In summary, we lost a bit of our illusion that ı y ( Ȉ y BLUUE) now always produces IQUUE.
“ The illusion of progress is short, but exciting” “ Solving the Helmert equations” IQUUE versus IQE
det H z 0
det H=0
ıˆ k is Ȉ 0 HIQUUE of V k
ıˆ k is only HIQE of ı k Hıˆ k is Ȉ 0 -IQUUE .
Figure 4.1 : Solving the Helmert equation for estimating variance-covariancecomponents Figure 4.1 illustrates the result of Corollary 4 and Corollary 5. Another drawback is that we have no guarantee that HIQE or HIQUUE ˆ . Such a postulate can generates a positive definite variance-covariance matrix Ȉ be enforced by means of an inequality constraint on the Helmert equation Hıˆ = q of type “ ıˆ > 0 ” or “ ıˆ > ı ” in symbolic writing. Then consult the text books on “ positive variance-covariance component estimation”. At this end, we have to give credit to B. Schaffrin (1.83, p.62) who classified Helmert´s variance-covariance components estimation for the first time correctly. 4-36
Best quadratic uniformly unbiased estimations of one variance component: BIQUUE
First, we give a definition of “best” Vˆ 2 IQUUE of V 2 within Definition 4.18 namely for a Gauss normal random variable y Y = {\ n , pdf} . Definition 4.19 presents a basic result representing “Gauss normal” BIQUUE. In particular we outline the reduction of fourth order moments to second order moments if the random variable y is Gauss normal or, more generally, quasi-normal. At same length we discuss the suitable choice of the proper constrained Lagrangean generating Vˆ 2 BIQUUE of V 2 . The highlighted is Lemma 4 where we resume the normal equations typical for BIQUUE and Theorem 4 with explicit representations of Vˆ 2 , D{Vˆ 2 } and Dˆ {Vˆ 2 } of type BIQUUE with respect to the special Gauss-Markov model with full column rank. ? What is the " best" Vˆ 2 IQUUE of V 2 ? First, let us define what is "best" IQUUE. Definition 4.18 ( Vˆ 2 best invariant quadratic uniformly unbiased estimation of V 2 : BIQUUE) Let y {\ n , pdf } be a Gauss normal random variable representing the stochastic observation vector. Its central moments up to order four
E{eiy } = 0 , E{eiy e yj } = S ij = vijV 2
(4.142)
E{e e e } = S ijk = 0, (obliquity)
(4.143)
E{eiy e yj eky ely } = S ijkl = S ijS kl + S ik S jl + S ilS jk = = (vij vkl + vik v jl + vil v jk )V 4
(4.144)
y y y i j k
relate to the "centralized random variable" (4.145) e y := y E{y} = [eiy ] . The moment arrays are taken over the index set i, j, k, l {1000, n} when the natural number n is identified as the number of observations. n is the dimension of the observation space Y = {\ n , pdf } . The scalar Vˆ 2 is called BIQUUE of V 2 ( Best Invariant Quadratic Uniformly Unbiased Estimation) of the special Gauss-Markov model of full column rank. "first moments" : E{y} = Aȟ, A \ n× m , ȟ \ m , rk A = m (4.146) "central second moments": D{y} å y = VV 2 , V \ n×m , V 2 \ + , rk V = n m
(4.147) 2
+
where ȟ \ is the first unknown vector and V \ the second unknown " one variance component", if it is. (i)
a quadratic estimation (IQE): Vˆ 2 = y cMy = (vec M )cy
y = tr Myy c
(4.148)
subject to 1 2
1 2
M = (M )cM SYM := {M \ n×m M = M c} (ii)
translational invariant, in the sense of y o y E{y} =: e y ˆ V 2 = y cMy = ecy Mey or equivalently Vˆ 2 = (vec M )c y
y = (vec M )ce y
e y Vˆ 2 = tr Myy c = tr Me y ecy
(iii)
(4.149) (4.150) (4.151) (4.152) (4.153)
uniformly unbiased in the sense of
E{Vˆ 2 } = V 2 , V 2 \ + and (iv) of minimal variance in the sense
(4.154)
D{Vˆ 2 } := E{[Vˆ 2 E{Vˆ 2 }]2 } = min . M
(4.155)
238
4 The second problem of probabilistic regression
In order to produce "best" IQUUE we have to analyze the variance E{[Vˆ 2 E{Vˆ 2 }]1 } of the invariant quadratic estimation Vˆ 2 the "one variance component", of V 2 . In short, we present to you the result in Corollary 4.19 (the variance of Vˆ with respect to a Gauss normal IQE): If Vˆ 2 is IQE of V 2 , then for a Gauss normal observation space Y = {\ n , pdf } the variance of V 2 of type IQE is represented by E{[Vˆ 2 E{Vˆ 2 }]2 } = 2 tr M cVMV .
(4.156)
: Proof: ansatz: IQE
Vˆ = tr Me y ecy E{Vˆ 2 } = (tr MV )V 2 E{[Vˆ 2 E{Vˆ 2 }]} = E{[tr Me y ecy (tr MV)V 2 ][tr Me y ecy (tr MV)V 2 ]} = = E{(tr Me y ecy )(tr Me y ecy )} (tr MV) 2 V 4 (4.156). 2
h 2
2
With the “ansatz” Vˆ IQE of V we have achieved the first decomposition of var {Vˆ 2 } . The second decomposition of the first term will lead us to central moments of fourth order which will be decomposed into central moments of second order for a Gauss normal random variable y. The computation is easiest in “Ricci calculus“. An alternative computation of the reduction “fourth moments to second moments” in “Cayley calculus” which is a bit more advanced, is gives in Appendix D. E{(tr Me y ecy )(tr Me y ecy )} = = = =
n
n
¦
m ij m kl E{eiy e yj e ky ely } =
¦
m ij m kl ( ʌij ʌ kl + ʌik ʌ jl + ʌil ʌ jk ) =
¦
m ij m kl ( v ij v kl + v ik v jl + v il v jk )ı 4
i , j , k ,l =1 n i , j , k ,l =1 n i , j , k ,l =1
¦
i , j , k ,l =1
m ij m kl ʌijkl =
Ǽ{(tr Me y ecy )(tr 0 e y ecy )} = V 4 (tr MV ) 2 + 2V 4 tr(MV ) 2 .
(4.157)
A combination of the first and second decomposition leads to the final result. E{[Vˆ 2 E{Vˆ 2 }]} = E{(tr Me y ecy )(tr Me y ecy )} V 4 (tr MV) = = 2V 4 (tr MVMV ).
h A first choice of a constrained Lagrangean for the optimization problems “BIQUUE”, namely (4.158) of Box 4.10, is based upon the variance E{[Vˆ 2 E{Vˆ 2 }] IQE}
239
4-3 Setup of BIQUUE
constrained to “IQE” and the condition of uniform unbiasedness ( tr VM ) -1 = 0 as well as (ii) the condition of the invariant quadratic estimation A c(M1/ 2 ) = 0 . (i)
A second choice of a constrained Lagrangean generating Vˆ 2 BIQUUE of V 2 , namely (4.163) of Box 4.10, takes advantage of the general solution of the homogeneous matrix equation M1/ 2 A = 0 which we already obtained for “IQE”. (4.73) is the matrix container for M. In consequence, building into the Lagrangean the structure of the matrix M, desired by the condition of the invariance quadratic estimation Vˆ 2 IQE of V 2 reduces the first Lagrangean by the second condition. Accordingly, the second choice of the Lagrangean (4.163) includes only one condition, in particular the condition for an uniformly unbiased estimation ( tr VM )-1=0 . Still we are left with the problem to make a proper choice for the matrices ZcZ and G y . The first "ansatz" ZcZ = ĮG y produces a specific matrix M, while the second "ansatz" G y = V 1 couples the matrix of the metric of the observation space to the inverse variance factor V 1 . Those " natural specifications" reduce the second Lagrangean to a specific form (4.164), a third Lagrangean which only depends on two unknowns, D and O0 . Now we are prepared to present the basic result for Vˆ 2 BIQUUE of V 2 . Box 4.10 Choices of constrained Lagrangeans generating Vˆ 2 BIQUUE of V 2 "a first choice" L(M1/ 2 , O0 , A1 ) := 2 tr(MVMV ) + 2O0 [(tr VM ) 1] + 2 tr A1 Ac(M1/ 2 )c (4.158) M = (M
1/ 2
)cM
1/ 2
"a second choice" = [I n - G y A(A cG y A) 1 A c]Z cZ[I n A(A cG y A)-1 A cG y ] (4.159) ansatz : ZcZ = ĮG y M = ĮG y [I n A( A cG y A) 1 AcG y ]
(4.160)
VM = ĮVG y [I n A( A cG y A) 1 AcG y ]
(4.161)
ansatz : G y = V 1 VM = Į[I n A( AcV 1 A) 1 A cV 1 ]
(4.162)
L(Į, O0 ) = tr MVMV + 2O0 [( VM 1)]
(4.163)
tr MVMV = Į tr[I n A( AcV A) A cV ] = Į ( n m) 2
1
1
1
2
tr VM = Į tr[I n A( A cV 1 A) 1 A cV 1 ] = Į ( n m) L(Į, O0 ) = Į 2 (n m) + 2O0 [D (n m) 1] = min . Į , O0
(4.164)
240
4 The second problem of probabilistic regression
Lemma 4.20 ( Vˆ 2 BIQUUE of V 2 ): The scalar Vˆ 2 = y cMy is BIQUUE of V 2 with respect to special GaussMarkov model of full column rank, if and only if the matrix D together with the "Lagrange multiplier" fulfills the system of normal equations 1 º ª Dˆ º ª 0 º ª 1 «¬ n m 0 »¼ «Oˆ » = «¬1 »¼ ¬ 0¼
(4.165)
solved by 1 1 Dˆ = , O0 = . nm nm
(4.166)
: Proof: Minimizing the constrained Lagrangean L(D , O0 ) = D 2 (n m) + 2O0 [D ( n m) 1] = min D , O0
leads us to the necessary conditions 1 wL (Dˆ , Oˆ0 ) = Dˆ (n m) + Oˆ0 ( n m) = 0 2 wD 1 wL (Dˆ , Oˆ0 ) = Dˆ (n m) 1 = 0 2 wO0 or 1 º ª Dˆ º ª 0 º ª 1 «¬ n m 0 »¼ « Oˆ » = «¬1 »¼ ¬ 0¼ solved by Dˆ = Oˆ0 =
1 . nm
1 w2 L (Dˆ , Oˆ0 ) = n m × 0 2 wD 2 constitutes the necessary condition, automatically fulfilled. Such a solution for the parameter D leads us to the " BIQUUE" representation of the matrix M. M=
1 V 1 [I n A( AcV 1 A) 1 A cV 1 ] . nm
(4.167)
h 2
2
2
Explicit representations Vˆ BIQUUE of V , of the variance D{Vˆ } and its estimate D{ıˆ 2 } are highlighted by
241
4-3 Setup of BIQUUE
Theorem 4.21 ( Vˆ BIQUUE of V 2 ): Let Vˆ 2 = y cMy = (vec M )c(y
y ) = tr Myy c be BIQUUE of V 2 with reseat to the special Gauss-Markov model of full column rank. (i) Vˆ 2 BIQUUE of V 2 Explicit representations of Vˆ 2 BIQUUE of V 2
Vˆ 2 = (n m) 1 y c[V 1 V 1 A( A cV 1 A) 1 A cV 1 ]y
(4.168)
Vˆ 2 = (n m) 1 e cV 1e
(4.169)
subject to e = e ( BLUUE). (ii) D{ Vˆ 2
BIQUUE}
BIQUUE´s variance is explicitly represented by D{Vˆ 2 | BIQUUE} = E{[Vˆ 2 E{Vˆ 2 }]2 BIQUUE} = 2(n m) 1 (V 2 ) 2 . (4.170) (iii) D {Vˆ 2 } An estimate of BIQUUE´s variance is Dˆ {Vˆ 2 } = 2(n m) 1 (Vˆ 2 )
(4.171)
Dˆ {Vˆ } = 2(n m) (e cV e ) . 2
3
1
2
(4.172)
: Proof: We have already prepared the proof for (i). Therefore we continue to prove (ii) and (iii) (i) D{ıˆ 2 BIQUUE} D{Vˆ 2 } = E{[Vˆ 2 E{Vˆ 2 }]2 } = 2V 2 tr MVMV, 1 MV = [I n A( AcV 1 A) 1 AcV 1 ], nm 1 [I n A( A cV 1A) 1 A cV 1 ], MVMV = ( n m) 2 1 nm D{Vˆ 2 } = 2(n m) 1 (V 2 ) 2 . tr MVMV =
(iii) D{Vˆ 2 } Just replace within D{Vˆ 2 } the variance V 2 by the estimate Vˆ 2 . Dˆ {Vˆ 2 } = 2(n m) 1 (Vˆ 2 ) 2 .
h
242
4 The second problem of probabilistic regression
Upon writing the chapter on variance-covariance component estimation I learnt about the untimely death of J.F. Seely, Professor of Statistics at Oregon State University, on 23 February 2002. J.F. Seely, born on 11 February 1941 in the small town of Mt. Pleasant, Utah, who made various influential contributions to the theory of Gauss-Markov linear model, namely the quadratic statistics for estimation of variance components. His Ph.D. adviser G. Zyskind had elegantly characterized the situation where ordinary least squares approximation of fixed effects remains optimal for mixed models: the regression space should be invariant under multiplication by the variancecovariance matrix. J.F. Seely extended this idea to variance-covariance component estimation, introducing the notion of invariant quadratic subspaces and their relation to completeness. By characterizing the class of admissible embiased estimators of variance-covariance components. In particular, the usual ANOVA estimator in 2-variance component models is inadmissible. Among other contributions to the theory of mixed models, he succeeded in generalizing and improving on several existing procedures for tests and confidence intervals on variance-covariance components. Additional Reading Seely. J. and Lee, Y. (confidence interval for a variance: 1994), Azzam, A., Birkes, A.D. and Seely, J. (admissibility in linear models, polyhydral covariance structure: 1988), Seely, J. and Rady, E. (random effects – fixed effects, linear hypothesis: 1988), Seely, J. and Hogg, R.V. (unbiased estimation in linear models: 1982), Seely, J. (confidence intervals for positive linear combinations of variance components, 1980), Seely, J. (minimal sufficient statistics and completeness, 1977), Olsen, A., Seely, J. and Birkes, D. (invariant quadratic embiased estimators for two variance components, 1975), Seely, J. (quadratic subspaces and completeness, 1971) and Seely, J. (linear spaces and unbiased estimation, 1970).
5
The third problem of algebraic regression - inconsistent system of linear observational equations with datum defect: overdetermined- undertermined system of linear equations: {Ax + i = y | A \ n×m , y R ( A ) rk A < min{m, n}} :Fast track reading: Read only Lemma 5 (MINOS) and Lemma 5.9 (HAPS)
Lemma 5.2 G x -minimum norm, G y -least squares solution Lemma 5.3 G x -minimum norm, G y -least squares solution
Definition 5.1 G x -minimum norm, G y -least squares solution
Lemma 5.4 MINOLESS, rank factorization
Lemma 5.5 MINOLESS additive rank partitioning
Lemma 5.6 characterization of G x , G y -MINOS Lemma 5.7 eigenspace analysis versus eigenspace synthesis
244
5 The third problem of algebraic regression
Lemma 5.9 D -HAPS
Definition 5.8 D -HAPS
Lemma 5.10 D -HAPS
We shall outline three aspects of the general inverse problem given in discrete form (i) set-theoretic (fibering), (ii) algebraic (rank partitioning; “IPM”, the Implicit Function Theorem) and (iii) geometrical (slicing). Here we treat the third problem of algebraic regression, also called the general linear inverse problem: An inconsistent system of linear observational equations
{Ax + i = y | A \ n× m , rk A < min {n, m}} also called “under determined - over determined system of linear equations” is solved by means of an optimization problem. The introduction presents us with the front page example of inhomogeneous equations with unknowns. In terms of boxes and figures we review the minimum norm, least squares solution (“MINOLESS”) of such an inconsistent, rank deficient system of linear equations which is based upon the trinity
5-1 Introduction
245
5-1 Introduction With the introductory paragraph we explain the fundamental concepts and basic notions of this section. For you, the analyst, who has the difficult task to deal with measurements, observational data, modeling and modeling equations we present numerical examples and graphical illustrations of all abstract notions. The elementary introduction is written not for a mathematician, but for you, the analyst, with limited remote control of the notions given hereafter. May we gain your interest? Assume an n-dimensional observation space, here a linear space parameterized by n observations (finite, discrete) as coordinates y = [ y1 ," , yn ]c R n in which an m-dimensional model manifold is embedded (immersed). The model manifold is described as the range of a linear operator f from an m-dimensional parameter space X into the observation space Y. As a mapping f is established by the mathematical equations which relate all observables to the unknown parameters. Here the parameter space X , the domain of the linear operator f, will be also restricted to a linear space which is parameterized by coordinates x = [ x1 ," , xm ]c R m . In this way the linear operator f can be understood as a coordinate mapping A : x 6 y = Ax. The linear mapping f : X o Y is geometrically characterized by its range R(f), namely R(A), defined by R(f):= {y R n | y = f(x) for some x X} which in general is a linear subspace of Y and its kernel N(f), namely N(A), defined by N ( f ) := {x X | f (x) = 0}. Here the range R(f), namely the range space R(A), does not coincide with the ndimensional observation space Y such that y R (f ) , namely y R (A) . In addition, we shall assume here that the kernel N(f), namely null space N(A) is not trivial: Or we may write N(f) z {0}. First, Example 1.3 confronts us with an inconsistent system of linear equations with a datum defect. Second, such a system of equations is formulated as a special linear model in terms of matrix algebra. In particular we are aiming at an explanation of the terms “inconsistent” and “datum defect”. The rank of the matrix A is introduced as the index of the linear operator A. The left complementary index n – rk A is responsible for surjectivity defect, which its right complementary index m – rk A for the injectivity (datum defect). As a linear mapping f is neither “onto”, nor “one-to-one” or neither surjective, nor injective. Third, we are going to open the toolbox of partitioning. By means of additive rank partitioning (horizontal and vertical rank partitioning) we construct the minimum norm – least squares solution (MINOLESS) of the inconsistent system of linear equations with datum defect Ax + i = y , rk A d min{n, m }. Box 5.3 is an explicit solution of the MINOLESS of our front page example. Fourth, we present an alternative solution of type “MINOLESS” of the front page example by multiplicative rank partitioning. Fifth, we succeed to identify
246
5 The third problem of algebraic regression
the range space R(A) and the null space N(A) using the door opener “rank partitioning”. 5-11
The front page example Example 5.1 (inconsistent system of linear equations with datum defect: Ax + i = y, x X = R m , y Y R n A R n× m , r = rk A d min{n, m} ):
Firstly, the introductory example solves the front page inconsistent system of linear equations with datum defect, x1 + x2 1
x1 + x2 + i1 = 1
x2 + x3 1
x2 + x3 + i2 = 1
or
+ x1 x3 3
+ x1 x3 + i3 = 3
obviously in general dealing with the linear space X = R m x, dim X = m, here m=3, called the parameter space, and the linear space Y = R n y , dim Y = n, here n = 3 , called the observation space. 5-12
The front page example in matrix algebra
Secondly, by means of Box 5 and according to A. Cayley’s doctrine let us specify the inconsistent system of linear equations with datum defect in terms of matrix algebra. Box 5.1: Special linear model: three observations, three unknowns, rk A =2 ª y1 º ª a11 y = «« y2 »» = «« a21 ¬« y3 ¼» ¬« a31
a12 a22 a32
a13 º ª x1 º ª i1 º a23 »» «« x2 »» + ««i2 »» a33 ¼» ¬« x3 ¼» ¬« i3 ¼»
ª 1 º ª 1 1 0 º ª x1 º ª i1 º y = Ax + i : «« 1 »» = «« 0 1 1 »» «« x2 »» + ««i2 »» «¬ 3»¼ «¬ 1 0 1»¼ «¬ x3 »¼ «¬ i3 »¼ x c = [ x1 , x2 , x3 ], y c = [ y1 , y2 , y3 ] = [1, 1, 3], i c = [i1 , i2 , i3 ] x R 3×1 , y Z 3×1 R 3×1 ª 1 1 0 º A := «« 0 1 1 »» Z 3×3 R 3×3 «¬ 1 0 1»¼ r = rk A = 2 .
247
5-1 Introduction
The matrix A R n× m , here A R 3×3 , is an element of R n× m generating a linear mapping f : x 6 Ax. A mapping f is called linear if f (O x1 + x2 ) = O f ( x1 ) + f ( x2 ) holds. The range R(f), in geometry called “the range space R(A)”, and the kernel N(f), in geometry called “the null space N(A)” characterized the linear mapping as we shall see. ? Why is the front page system of linear equations called inconsistent ? For instance, let us solve the first two equations, namely -x1 + x3 = 2 or x1 – x3 = -2, in order to solve for x1 and x3. As soon as we compare this result to the third equation we are led to the inconsistency 2 = 3. Obviously such a system of linear equations needs general inconsistency parameters (i1 , i2 , i3 ) in order to avoid contradiction. Since the right-hand side of the equations, namely the in homogeneity of the system of linear equations, has been measured as well as the linear model (the model equations) has been fixed, we have no alternative but inconsistency. Within matrix algebra the index of the linear operator A is the rank r = rk A , here r = 2, which coincides neither with dim X = m, (“parameter space”) nor with dim Y = n (“observation space”). Indeed r = rk A < min {n, m}, here r = rk A < min{3, 3}. In the terminology of the linear mapping f, f is neither onto (“surjective”), nor one-to-one (“injective”). The left complementary index of the linear mapping f, namely the linear operator A, which accounts for the surjectivity defect, is given by d s = n rkA, also called “degree of freedom” (here d s = n rkA = 1 ). In contrast, the right complementary index of the linear mapping f, namely the linear operator A, which accounts for the injectivity defect is given by d = m rkA (here d = m rkA = 1 ). While “surjectivity” relates to the range R(f) or “the range space R(A)” and “injectivity” to the kernel N(f) or “the null space N(A)” we shall constructively introduce the notion of range R ( f ) range space R (A)
versus
kernel N ( f ) null space N ( f )
by consequently solving the inconsistent system of linear equations with datum defect. But beforehand let us ask: ? Why is the inconsistent system of linear equations called deficient with respect to the datum ? At this point we have to go back to the measurement process. Our front page numerical example has been generated from measurements with a leveling instrument: Three height differences ( yDE , yEJ , yJD ) in a triangular network have been observed. They are related to absolute height x1 = hD , x2 = hE , x3 = hJ by means of hDE = hE hD , hEJ = hJ hE , hJD = hD hJ at points {PD , PE , PJ } , outlined in more detail in Box 5.1.
248
5 The third problem of algebraic regression
Box 5.2: The measurement process of leveling and its relation to the linear model y1 = yDE = hDE + iDE = hD + hE + iDE y2 = yEJ = hEJ + iEJ = hE + hJ + iEJ y3 = yJD = hJD + iJD = hJ + hD + iJD ª y1 º ª hD + hE + iDE º ª x1 + x2 + i1 º ª 1 1 0 º ª x1 º ª i1 º « y » = « h + h + i » = « x + x + i » = « 0 1 1 » « x » + « i » . J EJ » « 2» « E « 2 3 2» « »« 2» « 2» «¬ y3 »¼ «¬ hJ + hD + iJD »¼ «¬ x3 + x1 + i3 »¼ «¬ 1 0 1»¼ «¬ x3 »¼ «¬ i3 »¼ Thirdly, let us begin with a more detailed analysis of the linear mapping f : Ax y or Ax + i = y , namely of the linear operator A R n× m , r = rk A d min{n, m}. We shall pay special attention to the three fundamental partitioning, namely
5-13
(i)
algebraic partitioning called additive and multiplicative rank partitioning of the matrix A,
(ii)
geometric partitioning called slicing of the linear space X (parameter space) as well as of the linear space Y (observation space),
(iii)
set-theoretical partitioning called fibering of the set X of parameter and the set Y of observations.
Minimum norm - least squares solution of the front page example by means of additive rank partitioning
Box 5.3 is a setup of the minimum norm – least squares solution of the inconsistent system of inhomogeneous linear equations with datum defect following the first principle “additive rank partitioning”. The term “additive” is taken from the additive decomposition y1 = A11x1 + A12 x 2 and y 2 = A 21x1 + A 22 x 2 of the observational equations subject to A11 R r × r , rk A11 d min{ n, m}. Box 5.3: Minimum norm-least squares solution of the inconsistent system of inhomogeneous linear equations with datum defect , “additive rank partitioning”. The solution of the hierarchical optimization problem (1st)
|| i ||2I = min : x
xl = arg{|| y Ax || I = min | Ax + i = y, A R n×m , rk A d min{ n, m }} 2
249
5-1 Introduction
(2nd)
|| x l ||2I = min : xl
xlm = arg{|| xl ||2I = min | AcAxl = Acy, AcA R m×m , rk AcA d m} is based upon the simultaneous horizontal and vertical rank partitioning of the matrix A, namely ªA A = « 11 ¬ A 21
A12 º , A R r × r , rk A11 = rk A =: r A 22 »¼ 11 with respect to the linear model y = Ax + i
y1 R r ×1 , x1 R r ×1 ª y1 º ª A11 A12 º ª x1 º ª i1 º + « », «y » = « A » « » y 2 R ( n r )×1 , x 2 R ( m r )×1 . ¬ 2 ¼ ¬ 21 A 22 ¼ ¬ x 2 ¼ ¬ i 2 ¼ First, as shown before, we compute the least-squares solution || i ||2I = min or ||y Ax ||2I = min which generates standard normal x x equations A cAxl = A cy c A11 + Ac21 A 21 ª A11 « Ac A + Ac A ¬ 12 11 22 21
or c A12 + Ac21 A 22 º ª x1 º ª A11 c A11 =« » « » c A12 + A c22 A 22 ¼ ¬ x 2 ¼ ¬ A12 c A12
Ac21 º ª y1 º A c22 »¼ ¬« y 2 ¼»
or ª N11 «N ¬ 21
N12 º ª x1l º ª m1 º = N 22 »¼ «¬ x 2 l »¼ «¬m 2 »¼ subject to
c A11 + A c21 A 21 , N12 := A11 c A12 + Ac21 A 22 , m1 = A11 c y1 + A c21y 2 N11 := A11 c A11 + A c22 A 21 , N 22 := A12 c A12 + A c22 A 22 , m 2 = A12 c y1 + A c22 y 2 , N 21 := A12 which are consistent linear equations with an (injectivity) defect d = m rkA . The front page example leads us to ªA A = « 11 ¬ A 21
ª 1 1 0 º A12 º « = 0 1 1 »» A 22 »¼ « «¬ 1 0 1»¼ or
ª 1 1 º ª0º A11 = « , A12 = « » » ¬ 0 1¼ ¬1 ¼ A 21 = [1 0] , A 22 = 1
250
5 The third problem of algebraic regression
ª 2 1 1º A cA = «« 1 2 1»» «¬ 1 1 2 »¼ ª 2 1º ª 1º N11 = « , N12 = « » , | N11 |= 3 z 0, » ¬ 1 2 ¼ ¬ 1¼ N 21 = [ 1 1] , N 22 = 2 ª1º ª 4 º c y1 + A c21y 2 = « » y1 = « » , m1 = A11 ¬1¼ ¬0¼ c y1 + A c22 y 2 = 4 . y 2 = 3, m 2 = A12 Second, we compute as shown before the minimum norm solution || x l ||2I = min or x1cx1 + x c2 x 2 which generates the standard normal x equations in the following way. l
L (x1 , x 2 ) = x1cx1 + xc2 x 2 = 1 1 1 1 c N11 = (xc2l N12 m1cN11 )(N11 N12 x 2l N11 m1 ) + xc2l x 2l = min
x2
“additive decomposition of the Lagrangean” L = L 0 + L1 + L 2 2 2 c N11 L 0 := m1cN11 m1 , L1:= 2xc2l N12 m1 2 c N11 L 2 := xc2l N12 N12 x 2l + xc2l x 2l
wL 1 wL1 1 wL2 (x 2lm ) = 0 (x 2lm ) + (x 2lm ) = 0 2 wx 2 2 wx 2 wx 2 2 2 c N11 c N11 N12 m1 + (I + N12 N12 )x 2lm = 0 2 2 c N11 c N11 x 2lm = (I + N12 N12 ) 1 N12 m1 ,
which constitute the necessary conditions. The theory of vector derivatives is presented in Appendix B. Following Appendix A Facts: Cayley inverse: sum of two matrices, formula (s9), (s10), namely (I + BC1 A c) 1 BC1 = B( AB + C) 1 for appropriate dimensions of the involved matrices, such that the identities holds 2 2 2 1 c N11 c N11 c ( N12 N12 c + N11 ( I + N12 N12 ) 1 N12 = N12 ) we finally find c (N12 N12 c + N112 ) 1 m1 . x 2 lm = N12 The second derivatives 1 w2L c N112 N12 + I ) > 0 (x 2 lm ) = (N12 2 wx 2 wxc2
251
5-1 Introduction 2 c N11 due to positive-definiteness of the matrix I + N12 N12 generate the sufficiency condition for obtaining the minimum of the unconstrained Lagrangean. Finally let us backward transform 1 1 x 2 l 6 x1m = N11 N12 x 2 l + N11 m1 , 1 2 1 c (N12 N12 c + N11 ) m1 + N11 x1lm = N111 N12 N12 m1 .
Let us right multiply the identity c = N11N11 c + N12 N12 c + N11N11 c N12 N12 c + N11 N11 c ) 1 such that by (N12 N12 c (N12 N12 c + N11 N11 c ) 1 = N11 N11 c (N12 N12 c + N11N11 c ) 1 + I N12 N12 holds, and left multiply by N111 , namely 1 c (N12 N12 c + N11 N11 c ) 1 = N11 c (N12 N12 c + N11 N11 c ) 1 + N11 N111 N12 N12 .
Obviously we have generated the linear form c (N12 N12 c + N11N11 c ) 1 m1 ª x1lm = N11 « c (N12 N12 c + N11N11 c ) 1 m1 ¬ x 2lm = N12 or ª x º ª Nc º c + N11N11 c ) 1 m1 xlm = « 1lm » = « 11 » (N12 N12 c ¼ ¬ x 2lm ¼ ¬ N12 or ª A c A + A c21A 21 º x lm = « 11 11
c A11 + A c22 A 21 »¼ ¬ A12 c A12 + A c21A 22 )( A12 c A11 + A c22 A 21 ) + ( A11 c A11 + A c21A 21 ) 2 ]1
[( A11 c y1 + A c21y 2 ].
[( A11 Let us compute numerically xlm for the front page example. ª 5 4 º ª1 1º c =« c =« N11N11 , N12 N12 » » ¬ 4 5 ¼ ¬1 1¼ ª 6 3 º 1 ª6 3º c + N11N11 c =« c + N11N11 c ]1 = N12 N12 , [N12 N12 » 27 «¬ 3 6 »¼ ¬ 3 6 ¼ ª 4 º 1 ª 4 º 4 4 2 m1 = « » x1lm = « » , x 2lm = , || xlm ||2I = 0 0 3 3 3 ¬ ¼ ¬ ¼
252
5 The third problem of algebraic regression
4 4 x1lm = hˆD = , x2lm = hˆE = 0, x3lm = hˆJ = 3 3 4 || xlm ||2I = 2 3 x + x + x = 0 ~ hˆ + hˆ + hˆ = 0. 1lm
2lm
D
3lm
E
J
The vector i lm of inconsistencies has to be finally computed by means of i lm = y Axlm ª1º 1 1 i lm = ««1»» , Aci l = 0, || i lm ||2I = 3. 3 3 «¬1»¼ The technique of horizontal and vertical rank partitioning has been pioneered by H. Wolf (1972,1973). h 5-14
Minimum norm - least squares solution of the front page example by means of multiplicative rank partitioning:
Box 5.4 is a setup of the minimum norm-least squares solution of the inconsistent system of inhomogeneous linear equations with datum defect following the first principle “multiplicative rank partitioning”. The term “multiplicative” is taken from the multiplicative decomposition y = Ax + i = DEy + i of the observational equations subject to A = DE, D R n×r , E R r × m , rk A = rk D = rk E d min{n, m} . Box 5.4: Minimum norm-least squares solution of the inconsistent system of inhomogeneous linear equations with datum defect multiplicative rank partitioning The solution of the hierarchical optimization problem (1st) ||i ||2I = min : x
xl = arg{|| y Ax ||2I = min | Ax + i = y , A R n×m , rk A d min{ n, m }} (2nd)
||x l ||2I = min : xl
x lm = arg{|| x l || I = min | A cAx l = A cy, AcA R m×m , rk AcA d m} 2
is based upon the rank factorization A = DE of the matrix A R n× m subject to simultaneous horizontal and vertical rank partitioning of the matrix A, namely
253
5-1 Introduction
ª D R n×r , rk D = rk A =: r d min{n, m} A = DE = « r ×m ¬E R , rk E = rk A =: r d min{n, m} with respect to the linear model y = Ax + i y = Ax + i = DEx + i
ª Ex =: z « DEx = Dz y = Dz + i . ¬
First, as shown before, we compute the least-squares solution || i ||2I = min or ||y Ax ||2I = min which generates standard normal x x equations DcDz l = Dcy z l = (DcD) 1 Dcy = Dcl y , which are consistent linear equations of rank rk D = rk DcD = rk A = r. The front page example leads us to ªA A = DE = « 11 ¬ A 21
ª 1 1 0 º ª 1 1 º A12 º « » = « 0 1 1 » , D = «« 0 1»» R 3×2 » A 22 ¼ «¬ 1 0 1»¼ «¬ 1 0 »¼ or
DcDE = DcA E = (DcD) 1 DcA ª 2 1º ª1 0 1º 1 ª2 1º 2×3 DcD = « , (DcD) 1 = « E=« » » »R 3 ¬1 2¼ ¬ 1 2 ¼ ¬ 0 1 1¼ 1 ª1 0 1º z l = (DcD) 1 Dc = « y 3 ¬0 1 1»¼ ª1º 4 ª2º y = «« 1 »» z l = « » 3 ¬1 ¼ «¬ 3»¼ 1 ª1 0 1º z l = (DcD) 1 Dc = « y 3 ¬ 0 1 1»¼ ª1º 4 ª2º y = «« 1 »» z l = « » . 3 ¬1 ¼ «¬ 3»¼ Second, as shown before, we compute the minimum norm solution || x A ||2I = min of the consistent system of linear equations with x datum defect, namely A
254
5 The third problem of algebraic regression
xlm = arg{|| xl ||2I = min | Exl = ( DcD) 1 Dcy }. xl
As outlined in Box1.3 the minimum norm solution of consistent equations with datum defect namely Exl = (DcD) 1 Dcy, rk E = rk A = r is xlm = Ec(EEc) 1 (DcD) 1 Dcy xlm = Em Dl y = A lm y = A+y ,
which is limit on the minimum norm generalized inverse. In summary, the minimum norm-least squares solution generalized inverse (MINOLESS g-inverse) also called pseudo-inverse A + or Moore-Penrose inverse is the product of the MINOS g-inverse Em (right inverse) and the LESS g-inverse Dl (left inverse). For the front page example we are led to compute ª1 0 1º ª2 1º E=« , EEc = « » » ¬1 2¼ ¬0 1 1¼ ª2 1 ª 2 1º 1« 1 (EEc) = « , Ec(EEc) = « 1 3 ¬ 1 2 »¼ 3 «¬ 1 ª 1 0 1« 1 1 xlm = Ec(EEc) (DcD) Dcy = « 1 1 3 «¬ 0 1 1
1º 2 »» 1 »¼ 1º 0 »» y 1»¼
ª1º ª 4 º ª 1º 1« » 4« » 4 « » y = « 1 » xlm = « 0 » = « 0 » , || xlm ||= 2 3 3 3 «¬ 3»¼ «¬ +4 »¼ «¬ +1»¼ ˆ ª x1lm º ª« hD º» ª 1º 4« » « » ˆ xlm = « x2lm » = « hE » = « 0 » 3 «¬ x3lm »¼ «« hˆ »» «¬ +1»¼ J ¬ ¼ 4 || xlm ||= 2 3 x1lm + x2lm + x3lm = 0 ~ hˆD + hˆE + hˆJ = 0. The vector i lm of inconsistencies has to be finally computed by means of i lm := y Axlm = [I n AAlm ]y, i lm = [I n Ec(EEc) 1 (DcD) 1 Dc]y;
255
5-1 Introduction
ª1º 1 1 3. i lm = ««1»» , Aci l = 0, || i lm ||= 3 3 «¬1»¼ h Box 5.5 summarizes the algorithmic steps for the diagnosis of the simultaneous horizontal and vertical rank partitioning to generate ( Fm Gy )-MINOS. 1
Box 5.5: algorithm The diagnostic algorithm for solving a general rank deficient system of linear equations y = Ax, A \ n× m , rk A < min{n, m} by means of simultaneous horizontal and vertical rank partioning Determine the rank of the matrix A rk A < min{n, m} .
Compute “the simultaneous horizontal and vertical rank partioning” r ×( m r ) r ×r ª A11 A12 º A11 \ , A12 \ A=« », n r ×r n r × m r ¬ A 21 A 22 ¼ A 21 \ ( ) , A 22 \ ( ) ( ) “n-r is called the left complementary index, m-r the right complementary index” “A as a linear operator is neither injective ( m r z 0 ) , nor surjective ( n r = 0 ) . ” Compute the range space R(A) and the null space N(A) of the linear operator A R(A) = span {wl1 ( A )," , wlr ( A )} N(A) = {x \ n | N11x1A + N12 x 2 A = 0} or 1 x1A = N11 N12 x 2 A .
256
5 The third problem of algebraic regression
Compute (Tm , Gy ) -MINOS ª x º ª Nc º ªy º c + N11N11 c ]1 [ A11 c G11y , A c21G12y ] « 1 » x Am = « 1 » = « 11 » = [ N12 N12 c ¼ ¬ x 2 ¼ ¬ N12 ¬y2 ¼ y y y y c G11A11 + A c21G 22 A 21 , N12 := A11 c G11A12 + A c21G 22 A 22 N11 := A11 y c , N 22 := A12 c G11y A12 + A c21G 22 N 21 := N12 A 22 .
5-15
The range R(f) and the kernel N(f) interpretation of “MINOLESS” by three partitionings (i) algebraic (rank partitioning) (ii) geometric (slicing) (iii) set-theoretical (fibering)
Here we will outline by means of Box 5.6 the range space as well as the null space of the general inconsistent system of linear equations. Box 5.6: The range space and the null space of the general inconsistent system of linear equations Ax + i = y , A \
n ×m
, rk A d min{n, m}
“additive rank partitioning”. The matrix A is called a simultaneous horizontal and vertical rank partitioning, if A12 º ªA r ×r A = « 11 » , A11 = \ , rk A11 = rk A =: r A A 22 ¼ ¬ 21 with respect to the linear model y = Ax + i, A \
n ×m
, rk A d min{n, m}
identification of the range space n
R(A) = span {¦ e i aij | j {1," , r}} i =1
“front page example”
257
5-1 Introduction
ª 1 1 0 º ª1º ª y1 º « » 3× 3 « » «¬ y2 »¼ = « 1 » , A = « 0 1 1 » \ , rk A =: r = 2 ¬ 3 ¼ ¬ 1 0 1¼ R(A)=span {e1 a11 + e 2 a21 + e3 a31 , e1 a12 + e 2 a22 + e3 a32 } \ 3 or R(A) = span {e1 + e3 , e1 e 2 } \ 3 = Y c1 = [1, 0,1], c 2 = [1, 1, 0], \ 3 = span{ e1 , e 2 , e3 }
ec1
ec2 O e3 y e1
e2
Figure 5.1 Range R (f ), range space R ( A) , (y R ( A))
identification of the null space c y1 + A c21 y 2 N11x1A + N12 x 2 A = A11 c y1 + A c22 y 2 N12 x1A + N 22 x 2 A = A12 N ( A ):= {x \ n | N11x1A + N12 x 2 A = 0} or 1 N11x1A + N12 x 2 A = 0 x1A = N11 N12 x 2 A “front page example” ª x1 º ªx3 º 1 ª 2 1 º ª 1º « x » = 3 « 1 2 » « 1» x 3A = « x » ¬ ¼¬ ¼ ¬ 3 ¼A ¬ 3 ¼A ª 2 1º ª 1º 1 ª2 1º 1 N11 = « , N11 = « , N12 = « » » » 3 ¬1 2¼ ¬ 1 2 ¼ ¬ 1¼ x1A = u, x 2A = u, x 3A = u N(A)= H 01 = G1,3 .
258
5 The third problem of algebraic regression
N(A)= L 0 1
N(A)=G1,3 \
3
x2
x1 Figure 5.2 : Kernel N( f ), null space N(A), “the null space N(A) as 1 the linear manifold L 0 (Grassmann space G1,3) slices the parameter space X = \ 3 ”, x3 is not displayed . Box 5.7 is a summary of MINOLESS of a general inconsistent system of linear equations y = Ax + i. Based on the notion of the rank r = rk A < min{n, m}, we designed the generalized inverse of MINOS type or A Am or A1,2,3,4 . Box 5.7 MINOLESS of a general inconsistent system of linear equations : f : x o y = Ax + i, x X = \ m (parameter space), y Y = \ n (observation space) r := rk A < min{n, m} A- generalized inverse of MINOS type: A1,2,3,4 or A Am Condition # 1 f(x)=f(g(y)) f = f D gD f
Condition # 1 Ax =AA-Ax AA-A=A
Condition # 2 g(y)=g(f(x)) g = gD f Dg
Condition # 2 A y=A-Ax=A-AA-y A-AA-=A-
Condition # 3 f(g(y))=yR(A)
Condition # 3 A-Ay= yR(A)
-
259
5-1 Introduction
f D g = PR ( A )
A A = PR ( A
Condition # 4 g(f(x))= y R ( A )
)
Condition # 4 AA = PR ( A )
A
g D f = PR (g) A A = PR ( A ) .
A
R(A-)
R(A ) D(A)
R(A) D(A-) D(A-) R(A) R(A)
-
A-
D(A) P R(A-)
A AAA = PR ( A ) f D g = PR ( f )
Figure 5.3 : Least squares, minimum norm generalized inverse A Am ( A1,2,3,4 or A + ) , the Moore-Penrose-inverse (Tseng inverse) A similar construction of the generalized inverse of a matrix applies to the diagrams of the mappings: (1) under the mapping A: D(A) o R(A) AA = PR ( A ) f D g = PR ( f )
(2) under the mapping A-: R (A ) o PR ( A
)
A A = PR ( A )
g D f = PR ( g ) .
In addition, we follow Figure 5.4 and 5.5 for the characteristic diagrams describing:
260
5 The third problem of algebraic regression
(i) orthogonal inverses and adjoints in reflexive dual Hilbert spaces Figure 5.4 A
X
Y
A =A
(A Gy A)
A G y A
Y
X +
A =A
( G y ) 1 = G y Gx G x = (G x ) 1
A G y A ( A G y A )
*
Gy
*
*
X
X
Y
A
A
Y
y 6 y = G y y y Y, y Y
(ii) Venn diagrams, trivial fiberings
Figure 5.5 : Venn diagram, trivial fibering of the domain D(f): Trivial fibers N ( f ) A , trivial fibering of the range R( f ): trivial fibers R ( f ) and R ( f ) A , f : \m = X o Y = \n , X set system of the parameter space, Y set system of the observation space In particularly, if Gy is rank defect we proceed as follows. Gy synthesis
ª / 0º G y = « y » ¬ 0 0¼ analysis
261
5-1 Introduction
ª Uc º G*y = UcG y U = « 1 » G y [U1 , U 2 ] ¬ Uc2 ¼ ª U1cG y U1 U1cG y U 2 º =« » ¬ Uc2G y U1c Uc2G y U 2 ¼
G y = UG *y U c = U1ȁ y U1c
ȁ y = U1cG y U1 U1ȁ y = G y U1 0 = G y Uc2 and U1cG y U 2 = 0 || y Ax ||G2 = || i ||2 = i cG y i º » G y = U1c ȁ y U1 »¼ (y Ax)cU1c ȁ y U1 (y Axc) = min y
x
U1 ( y Ax ) = U1i = i If we use simultaneous horizontal and vertical rank partitioning A12 º ª x1 º ª i1 º ª y º ªi º ªA y = « 1 » + « 1 » = « 11 » « » + «i » y i A A 22 ¼ ¬ x 2 ¼ ¬ 2 ¼ ¬ 2 ¼ ¬ 21 ¬ 2¼ subject to special dimension identities y1 \ r ×1 , y 2 \ ( n r )×1 A11 \ r × r , A12 \ r × ( m r ) A 21 \ ( n r )× r , A 22 \ ( n r )× ( m r ) , we arrive at Lemma 5.0. Lemma 5.0 ((Gx, Gy) –MINOLESS, simultaneous horizontal and vertical rank partitioning): ªy º ªi º ªA y = « 1 » + « 1 » = « 11 ¬ y 2 ¼ ¬i 2 ¼ ¬ A 21
A12 º ª x1 º ª i1 º + A 22 »¼ «¬ x 2 »¼ «¬ i 2 »¼
subject to the dimension identities y1 \ r ×1 , y 2 \ ( n r )×1 , x1 \ r ×1 , x 2 \ ( m r )×1 A11 \ r × r , A12 \ r ×( m r ) A 21 \ ( n r )× r , A 22 \ ( n r )×( m r ) is a simultaneous horizontal and vertical rank partitioning of the linear model (5.1)
(5.1)
262
5 The third problem of algebraic regression
{y = Ax + i, A \ n× m , r := rk A < min{n, m}}
(5.2)
r is the index of the linear operator A, n-r is the left complementary index and m-r is the right complementary index. x A is Gy-LESS if it fulfils the rank x Am is MINOS of A cG y Ax A = A cG y y , if x 1 c N111G11x 2G 21 (x1 )Am = N111 N12 [N12 N11 N12 + G x22 ]1 1 c N111G11x N111 2G x21 N11
(N12 )m1 + N111 m1 x c N111G11x N111 N12 2G 21 (x 2 )Am = [N12 N111 N12 + G x22 ]1 1 1 c N111G11x N11
(N12 2G x21 N11 )m1 .
(5.3)
(5.4)
The symmetric matrices (Gx, Gy) of the metric of the parameter space X as well as of the observation space Y are consequently partitioned as y ª G y G12 º G y = « 11 y y » ¬G 21 G 22 ¼
and
x x ª G11 º G12 = Gx « x x » ¬ G 21 G 22 ¼
(5.5)
subject to the dimension identities y y G11 \ r×r , G12 \ r×( n r ) y y G 21 \ ( n r )×r , G 22 \ ( n r )×( n r )
versus
x x G11 \ r×r , G12 \ r×( m r )
G x21 \ ( m r )×r , G x22 \ ( m r )×( m r )
deficient normal equations A cG y Ax A = A cG y y
(5.6)
or ª N11 «N ¬ 21
N12 º ª x1 º ª M11 = N 22 »¼ «¬ x 2 »¼ A «¬ M 21
M12 º ª y1 º ª m1 º = M 22 »¼ «¬ y 2 »¼ «¬ m2 »¼
(5.7)
subject to y y y c G11y A11 + A c21G 21 c G12 N11 := A11 A11 + A11 A 21 + A c21G 22 A 21
(5.8)
y y y c G11y A12 + A c21G 21 c G12 N12 := A11 A12 + A11 A 22 + A c21G 22 A 22 ,
(5.9)
c , N 21 = N12
(5.10)
y y y y c G11 c G12 N 22 := A12 A12 + A c22 G 21 A12 + A12 A 22 + Ac22 G 22 A 22 ,
(5.11)
y y y y c G11 c G12 M11 := A11 + A c21G 21 , M12 := A11 + A c21G 22 ,
(5.12)
263
5-2 MINOLESS and related solutions y c G11y + Ac22 G y21 , M 22 := A12 c G12 M 21 := A12 + A c22 G y22 ,
(5.13)
m1 := M11y1 + M12 y 2 , m2 := M 21y1 + M 22 y 2 .
(5.14)
5-2 MINOLESS and related solutions like weighted minimum norm-weighted least squares solutions 5-21
The minimum norm-least squares solution: "MINOLESS"
The system of the inconsistent, rank deficient linear equations Ax + i = y subject to A \ n× m , rk A < min{n, m} allows certain solutions which we introduce by means of Definition 5.1 as a solution of a certain hierarchical optimization problem. Lemma 5.2 contains the normal equations of the hierarchical optimization problems. The solution of such a system of the normal equations is presented in Lemma 5.3 for the special case (i) | G x |z 0 and case (ii) | G x + A cG y A |z 0, but | G x |= 0 . For the analyst: Lemma 5.4
Lemma 5.5
presents the toolbox of MINOLESS for multiplicative rank partitioning, known as rank factorization.
presents the toolbox of MINOLESS for additive rank partitioning.
and
Definition 5.1 ( G x -minimum norm- G y -least squares solution): A vector x Am X = \ m is called G x , G y -MINOLESS (MInimum NOrm with respect to the G x -seminorm-Least Squares Solution with respect to the G y -seminorm) of the inconsistent system of linear equations with datum defect ª rk A d min {n, m} A\ « Ax + i = y « y R ( A), N ( A) z {0} (5.15) «x X = \n , y Y = \n , «¬ if and only if first (5.16) x A = arg{|| i ||G = min | Ax + i = y, rk A d min{n, m}} , n× m
y
x
second x Am = arg{|| x ||G = min | A cG y Ax A = AcG y y} x
x
(5.17)
is G y -MINOS of the system of normal equations A cG y Ax A = A cG y y which are G x -LESS. The solutions of type G x , G y -MINOLESS can be characterized as following.
264
5 The third problem of algebraic regression
Lemma 5.2 ( G x -minimum norm, G y least squares solution): A vector x Am X = \ m is called G x , G y -MINOLESS of (5.1), if and only if the system of normal equations A cG y A º ª x Am º ª 0 º ª Gx = « A cG A 0 »¼ «¬ OAm »¼ «¬ A cG y y »¼ y ¬
(5.18)
with respect to the vector OAm of “Lagrange multipliers” is fulfilled. x Am always exists and is uniquely determined, if the augmented matrix [G x , A cG y A ] agrees to the rank identity rk[G x , A cG y A ] = m
(5.19)
or, equivalently, if the matrix G x + A cG y A is regular. :Proof: G y -MINOS of the system of normal equations A cG y Ax A = A cG y is constructed by means of the constrained Lagrangean L( x A , OA ) := xcA G x x A + 2OAc( A cG y Ax A A cG y y ) = min , x ,O
such that the first derivatives 1 wL º (x Am , OAm ) = G x x Am + A cG y AOAm = 0 » 2 wx » 1 wL (x Am , OAm ) = AcG y AxAm AcG y y = 0 » »¼ 2 wO A cG y A º ª x Am º ª 0 º ª Gx « = 0 »¼ «¬ OAm »¼ «¬ AcG y y »¼ ¬ A cG y A constitute the necessary conditions. The second derivatives 1 w 2L ( x Am , OAm ) = G x t 0 2 wxwx c
(5.20)
due to the positive semidefiniteness of the matrix G x generate the sufficiency condition for obtaining the minimum of the constrained Lagrangean. Due to the assumption A cG y y R( A cG x A ) the existence of G y -MINOS x Am is granted. In order to prove uniqueness of G y -MINOS x Am we have to consider case (i)
and
G x positive definite
case (ii) G x positive semidefinite .
case (i): G x positive definite
5-2 MINOLESS and related solutions
265
Gx A cG y A = G x A cG y AG x1A cG y A = 0. 0 A cG y A
(5.21)
First, we solve the system of normal equations which characterize x Am G x , G y MINOLESS of x for the case of a full rank matrix of the metric G x of the parametric space X, rk G x = m in particular. The system of normal equations is solved for
A cG y A º ª 0 º ª C1 C2 º ª 0 º ª x Am º ª G x = « O » = « A cG A 0 »¼ «¬ A cG y y »¼ «¬ C3 C4 »¼ «¬ A cG y y »¼ y ¬ Am ¼ ¬
(5.22)
subject to A cG y A º ª C1 C2 º ª G x A cG y A º ª G x A cG y A º ª Gx =« « A cG A » « » « » 0 ¼ ¬ C3 C4 ¼ ¬ A cG y A 0 ¼ ¬ A cG y A 0 »¼ y ¬ (5.23) as a postulate for the g-inverse of the partitioned matrix. Cayley multiplication of the three partitioned matrices leads us to four matrix identities. G x C1G x + G x C2 A cG y A + A cG y AC3G x + A cG y AC4 A cG y A = G x
(5.24)
G x C1A cG y A + A cG y AC3 A cG y A = A cG y A
(5.25)
A cG y AC1G x + A cG y AC2 A cG y A = AcG y A
(5.26)
A cG y AC1 A cG y A = 0.
(5.27)
Multiply the third identity by G x1A cG y A from the right side and substitute the fourth identity in order to solve for C2. A cG y AC2 A cG y AG x1A cG y A = A cG y AG x1A cG y A (5.28) C2 = G x1A cG y A ( A cG y AG x1A cG y A ) solves the fifth equation A cG y AG x1A cG y A ( A cG y AG x1A cG y A ) A cG y AG x1A cG y A = = A cG y AG x1A cG y A
(5.29)
by the axiom of a generalized inverse x Am = C2 A cG y y
(5.30)
x Am = G y1A cG y A ( A cG y AG x1A cG y A ) A cG y y . We leave the proof for “ G x1A cG y A ( A cG y AG x1A cG y A ) A cG y y is the weighted pseudo-inverse or Moore Penrose inverse A G+ G ” as an exercise. y
x
(5.31)
266
5 The third problem of algebraic regression
case (ii): G x positive semidefinite Second, we relax the condition rk G x = m by the alternative rk[G x , A cG y A ] = m G x positive semidefinite. Add the second normal equation to the first one in order to receive the modified system of normal equations ªG x + A cG y A A cG y A º ª x Am º ª A cG y y º = « A cG A 0 »¼ «¬ OAm »¼ «¬ A cG y y »¼ y ¬
(5.32)
rk(G x + A cG y A ) = rk[G x , A cG y A ] = m .
(5.33)
The condition rk[G x , A cG y A ] = m follows from the identity ªG G x + A cG y A = [G x , A cG y A ] « x ¬ 0
º ª Gx º »« », ( A cG y A ) ¼ ¬ A cG y A ¼ 0
(5.34)
namely G x + AcG y A z 0. The modified system of normal equations is solved for
ª x Am º ªG x + A cG y A AcG y A º ª A cG y y º = « O » = « A cG A 0 »¼ «¬ A cG y y »¼ y ¬ Am ¼ ¬ ª C C2 º ª A cG y y º ª C1A cG y y + C2 A cG y y º =« 1 »=« » »« ¬C3 C4 ¼ ¬ A cG y y ¼ ¬ C3A cG y y + C4 A cG y y ¼
(5.35)
subject to ªG x + A cG y A A cG y A º ª C1 C2 º ªG x + A cG y A A cG y A º = « A cG A 0 »¼ «¬C3 C4 »¼ «¬ A cG y A 0 »¼ y ¬ ªG x + A cG y A A cG y A º =« 0 »¼ ¬ A cG y A
(5.36)
as a postulate for the g-inverse of the partitioned matrix. Cayley multiplication of the three partitioned matrices leads us to the four matrix identities “element (1,1)” (G x + A cG y A)C1 (G x + A cG y A) + A cG y AC3 (G x + A cG y A) + +(G x + A cG y A)C2 A cG y A + A cG y AC4 A cG y A = G x + A cG y A
(5.37)
“element (1,2)” (G x + A cG y A)C1 A cG y A + A cG y AC3 = A cG y A
(5.38)
5-2 MINOLESS and related solutions
267
“element (2,1)” A cG y AC1 (G x + AcG y A) + AcG y AC2 AcG y A = AcG y A
(5.39)
“element (2,2)” A cG y AC1 A cG y A = 0.
(5.40)
First, we realize that the right sides of the matrix identities are symmetric matrices. Accordingly the left sides have to constitute symmetric matrices, too. (1,1):
(G x + A cG y A)C1c (G x + A cG y A) + (G x + A cG y A)Cc3 A cG y A + + A cG y ACc2 (G x + A cG y A) + A cG y ACc4 AcG y A = G x + A cG y A
(1,2): A cG y AC1c (G x + AcG y A) + Cc3 A cG y A = A cG y A (2,1): (G x + A cG y A )C1cA cG y A + A cG y ACc2 A cG y A = A cG y A (2,2): A cG y AC1cA cG y A = A cG y AC1A cG y A = 0 . We conclude C1 = C1c , C2 = Cc3 , C3 = Cc2 , C4 = Cc4 .
(5.41)
Second, we are going to solve for C1, C2, C3= C2 and C4. C1 = (G x + A cG y A) 1{I m A cG y A[ AcG y A(G x + A cG y A) 1 A cG y A]
A cG y A(G x + A cG y A ) 1}
(5.42)
C2 = (G x + A cG y A) 1 A cG y A[ A cG y A(G x + A cG y A) 1 A cG y A]
(5.43)
C3 = [ A cG y A(G x + A cG y A) 1 A cG y A] AcG y A(G x + A cG y A) 1
(5.44)
C4 = [ A cG y A (G x + A cG y A ) 1 A cG y A ] .
(5.45)
For the proof, we depart from (1,2) to be multiplied by A cG y A(G x + A cG y A) 1 from the left and implement (2,2) A cG y AC2 AcG y A(G x + A cG y A ) 1 A cG y A = A cG y A (G x + A cG y A ) 1 A cG y A . Obviously, C2 solves the fifth equation on the basis of the g-inverse [ A cG y A(G x + A cG y A) 1 A cG y A] or A cG y A(G x + A cG y A) 1 A cG y A[ A cG y A(G x + A cG y A) 1 A cG y A]
A cG y A(G x + A cG y A ) 1 A cG y A = A cG y A(G x + A cG y A) 1 A cG y A .
(5.46)
268
5 The third problem of algebraic regression
We leave the proof for “ (G x + A cG y A) 1 AcG y A[ A cG y A(G x + A cG y A ) 1 A cG y A] A cG y is the weighted pseudo-inverse or Moore-Penrose inverse A G+
y
( G x + AcG y A )
”
as an exercise. Similarly, C1 = (I m C2 A cG y A)(G x + A cG y A) 1
(5.47)
solves (2,2) where we again take advantage of the axiom of the g-inverse, namely A cG y AC1 AcG y A = 0
(5.48)
A cG y A(G x + A cG y A) 1 A cG y A(G x + A cG y A) 1 A cG y A A cG y A(G x + A cG y A) 1 A cG y A[ A cG y A(G x + A cG y A) 1 A cG y A]
A cG y A(G x + A cG y A) 1 A cG y A = 0 A cG y A(G x + A cG y A ) 1 A cG y A A cG y A(G x + A cG y A ) 1 A cG y A( A cG y A(G x + A cG y A ) 1 A cG y A)
A cG y A(G x + A cG y A ) 1 A cG y A = 0. For solving the system of modified normal equations, we have to compute C1 A cG y = 0 C1 = A cG y A = 0 A cG y AC1 AcG y A = 0 , a zone identity due to (2,2). In consequence, x Am = C2 A cG y y
(5.49)
has been proven. The element (1,1) holds the key to solve for C4 . As soon as we substitute C1 , C2 = Cc3 , C3 = Cc2 into (1,1) and multiply and
left by A cG y A(G x + AcG y A) 1
right by (G x + AcG y A) 1 AcG y A,
we receive 2AcGy A(Gx + AcGy A)1 AcGy A[AcGy A(Gx + AcGy A)1 AcGy A] AcGy A
(Gx + AcGy A)1 + AcGy A(Gx + AcGy A)1 AcGy AC4 AcGy A(Gx + AcGy A)1 AcGy A = = AcGy A(Gx + AcGy A)1 AcGy A. Finally, substitute C4 = [ A cG y A(G x + A cG y A) 1 A cG y A]
(5.50)
5-2 MINOLESS and related solutions
269
to conclude A cG y A(G x + A cG y A) 1 A cG y A[ A cG y A(G x + A cG y A) 1 A cG y A] A cG y A
(G x + A cG y A) 1 AcG y A = AcG y A(G x + AcG y A) 1 AcG y A , namely the axiom of the g-inverse. Obviously, C4 is a symmetric matrix such that C4 = Cc4 . Here ends my elaborate proof. The results of the constructive proof of Lemma 5.2 are collected in Lemma 5.3. Lemma
( G x -minimum
5.3
norm, G y -least
squares
solution:
MINOLESS): ˆ is G -minimum norm, G -least squares solution of (5.1) x Am = Ly y x subject to r := rk A = rk( A cG y A ) < min{n, m} rk(G x + A cG y A ) = m if and only if Lˆ = A G+
y Gx
= ( A Am ) G
y Gx
Lˆ = (G x + A cG y A ) 1 A cG y A[ A cG y A(G x + A cG y A ) 1 A cG y A ] AcG y
(5.51) (5.52)
xAm = (G x + AcG y A)1 AcG y A[ AcG y A(G x + AcG y A)1 AcG y A] AcG y y , (5.53) where A G+ G = A1,2,3,4 G G is the G y , G x -weighted Moore-Penrose inverse. If y
x
y
x
rk G x = m , then Lˆ = G x1A cG y A( A cG y AG x1A cG y A ) A cG y
(5.54)
x Am = G x1A cG y A ( A cG y AG x1A cG y A ) A cG y y
(5.55)
is an alternative unique solution of type MINOLLES. Perhaps the lengthy formulae which represent G y , G x - MINOLLES in terms of a g-inverse motivate to implement explicit representations for the analyst of the G x -minimum norm (seminorm), G y -least squares solution, if multiplication rank partitioning, also known as rank factorization, or additive rank partitioning of the first order design matrix A is available. Here, we highlight both representations of A + = A Am .
270
5 The third problem of algebraic regression
Lemma 5.4 ( G x -minimum norm, G y -least squares solution: MINOLESS, rank factorization) ˆ is G -minimum norm, G -least squares solution (MINOLLES) x Am = Ly y x of (5.1) {Ax + i = y | A \ n×m , r := rk A = rk( A cG y A ) < min{n, m}} , if it is represented by multiplicative rank partitioning or rank factorization A = DE, D \ n r , E \ r ×m as case (i): G y = I n , G x = I m
(5.56)
Lˆ = A Am = Ec( EEc) 1 ( DcD) 1 Dc
(5.57)
1 right inverse ˆ = E D ª«ER = Em = Ec(EEc) L R L 1 ¬ DL = DA = (DcD) Dc left inverse
x Am = A Am y = A + y = Ec(EEc) 1 (DcD) 1 Dcy .
(5.58) (5.59)
The unknown vector x Am has the minimum Euclidean length || x Am ||2 = xcAm x Am = y c( A + )cA + y = y c(DA )c(EEc) 1 DA y .
(5.60)
(5.61) y = y Am + i Am is an orthogonal decomposition of the observation vector y Y = \ n into Ax Am = y Am R ( A) and y AxAm = i Am R ( A) A ,
(5.62)
the vector of inconsistency. y Am = Ax Am = AA + y = = D( DcD) 1 Dcy = DDA y
and
i Am = y y Am = ( I n AA + ) y = = [I n D( DcD) 1 Dc]y = ( I n DD A ) y
AA + y = D( DcD) 1 Dcy = DDA y = y Am is the projection PR ( A ) and ( I n AA + ) y = [I n D( DcD) 1 Dc]y = ( I n DD A ) y is the projection PR ( A ) . A
i Am and y Am are orthogonal in the sense of ¢ i Am | y Am ² = 0 or ( I n AA + )cA = [I n D( DcD) 1 Dc]cD = 0 . The “goodness of fit” of MINOLESS is || y Ax Am ||2 =|| i Am ||2 = y c(I n AA + )y = = y c[I n D(DcD) 1 Dc]y = y c(I n DDA1 )y .
(5.63)
5-2 MINOLESS and related solutions
271
case (ii): G x and G y positive definite Lˆ = ( A m A ) (weighted) = G x Ec( EG x1Ec) 1 ( DcG y D) 1 DcG y
(5.64)
ª E = E weighted right inverse Lˆ = ER (weighted) D L (weighted) « m ¬ E L = EA weighted left inverse
(5.65)
R
x Am = ( A Am )G G y d A G+ G y = y
x
y
x
(5.66)
= G Ec(EG Ec) ( DcG y D) 1 DG y y. 1 x
1 x
1
The unknown vector x Am has the weighted minimum Euclidean length || x Am ||G2 = xcAm G x x Am = y c( A + )cG x A + y = x
= y cG y D(DcG y D) 1 (EG x1Ec) 1 EEc(EG x1Ec) 1 (DcG y D) 1 DcG y y c. y = y Am + i Am
(5.67) (5.68)
is an orthogonal decomposition of the observation vector y Y = \ n into Ax Am = y Am R ( A) and y AxAm =: i Am R ( A) A
(5.69)
of inconsistency. y Am = AA G+ G y y
AA G+
yGx
x
i A = ( I n AA G+ G ) y
and
y
I n AA G+
= PR ( A )
yGx
(5.70)
x
= PR ( A )
A
are G y -orthogonal ¢ i Am | y Am ² G = 0 or (I n AA + ( weighted ))cG y A = 0 . y
(5.71)
The “goodness of fit” of G x , G y -MINOLESS is || y Ax Am ||G2 =|| i Am ||G2 = y
= y c[I n AA
+ Gy Gx
y
]cG y [I n AA G+ G ]y = y
x
= y c[I n D(DcG y D) DcG y ]cG y [I n D(DcG y D) 1 DcG y ]y = 1
(5.72)
= y c[G y G y D(DcG y D) 1 DcG y ]y.
While Lemma 5.4 took advantage of rank factorization, Lemma 5.5 will alternatively focus on additive rank partitioning.
272
5 The third problem of algebraic regression
5.5 ( G x -minimum norm, G y -least MINOLESS, additive rank partitioning)
Lemma
ˆ is x Am = Ly G x -minimum (MINOLESS) of (5.1)
norm, G y -least
squares
solution:
squares
solution
{Ax + i = y | A \ n×m , r := rk A = rk( A cG y A ) < min{n, m}} , if it is represented by additive rank partitioning ªA A = « 11 ¬ A 21
A12 º A11 \ r × r , A12 \ r ×( m r ) , A 22 »¼ A 21 \ ( n r )× r , A 22 \ ( n r )×( m r )
(5.73)
subject to the rank identity rk A = rk A11 = r as
(5.74)
case (i): G y = I n , G x = I m ª Nc º c + N11 N11 c ) 1 [ A11 c , Ac21 ] Lˆ = A Am = « 11 » (N12 N12 c N ¬ 12 ¼
(5.75)
subject to c A11 + Ac21A 21 , N12 := A11 c A12 + Ac21A 22 N11 := A11 c c c A12 + A c22 A 22 N 21 := A12 A11 + A 22 A 21 , N 22 := A12 or ª Nc º ªy º c + N11 N11 c ) 1 [ A11 c , Ac21 ] « 1 » . x Am = « 11 » (N12 N12 c ¼ ¬ N12 ¬y 2 ¼
(5.76) (5.77)
(5.78)
The unknown vector xAm has the minimum Euclidean length || x Am ||2 = x cAm x Am = ªA º ªy º c + N11N11 c ) 1[ A11 c , A c21 ] « 1 » . = [ y1c , y c2 ] « 11 » ( N12 N12 ¬ A 21 ¼ ¬y2 ¼
(5.79)
y = y Am + i Am is an orthogonal decomposition of the observation vector y Y = \ n into Ax Am = y Am R ( A) and y AxAm =: i Am R ( A) A ,
(5.80)
5-2 MINOLESS and related solutions
273
the vector of inconsistency. y Am = Ax Am = AA Am y
i Am = y Ax Am =
and
= ( I n AA Am ) y are projections onto R(A) and R ( A) A , respectively. i Am and y Am are orthogonal in the sense of ¢ i Am | y Am ² = 0 or (I n AA Am )cA = 0 . The “goodness of fit” of MINOLESS is || y Ax Am ||2 =|| i Am ||2 = y c(I n AA Am )y . I n AA Am , rk( I n AA Am ) = n rk A = n r , is the rank deficient a posteriori weight matrix (G y )Am . case (ii): G x and G y positive definite )G Lˆ = ( A Am
5-22
yGx
.
(G x , G y ) -MINOS and its generalized inverse
A more formal version of the generalized inverse which is characteristic for G x MINOS, G y -LESS or (G x , G y ) -MINOS is presented by Lemma 5.6 (characterization of G x , G y -MINOS): (5.81)
rk( A cG y A) = rk A ~ R ( A cG y ) = R ( A c)
(5.82)
is assumed. x Am = L y is (G x , G y ) -MINOLESS of (5.1) for all y \ n if and only if the matrix L \ m ×n fulfils the four conditions G y ALA = G y A
(5.83)
G x LAL = G x L
(5.84)
G y AL = (G y AL)c
(5.85)
G x LA = (G x LA )c .
(5.86)
In this case G x x Am = G x L y is always unique. L, fulfilling the four conditions, is called the weighted MINOS inverse or weighted Moore-Penrose inverse. :Proof: The equivalence of (5.81) and (5.82) follows from R( A cG y ) = R( A cG y A ) .
274
5 The third problem of algebraic regression
(i) G y ALA = G y A and G y AL = (G y AL)c . Condition (i) G y ALA = G y A and (iii) G y AL = (G y AL)c are a consequence of G y -LESS. || i ||G2 =|| y Ax ||G2 = min AcG y AxA = AcG y y. y
y
x
If G x is positive definite, we can represent the four conditions (i)-(iv) of L by (G x , G y ) -MINOS inverse of A by two alternative solutions L1 and L2, namely AL1 = A( A cG y A ) A cG y AL1 = A( A cG y A ) A cL1cG y = = A ( A cG y A ) A cG y = = A ( A cG y A ) A cLc2 A cG y = A( A cG y A ) A cG y AL 2 = AL 2 and L 2 A = G x1 ( A cLc2 G x ) = G x1 ( A cLc2 A cLc2 G x ) = G x1 ( A cL1cA cLc2 G x ) = = G x1 ( A cL1cG x L 2 A ) = G x1 (G x L1AL 2 A ) = = G x1 (G x L1AL1 A ) = L1 A, L1 = G x1 (G x L1 AL1 ) = G x1 (G x L2 AL2 ) = L2 concludes our proof. The inequality || x Am ||G2 =|| L y ||G2 d|| L y ||G2 +2 y cLcG x ( I n LA) z + x
x
y
+ || ( I m LA ) z ||G2 y \ n
(5.87)
x
is fulfilled if and only if the “condition of G x -orthogonality” LcG x ( I m LA ) = 0
(5.88)
applies. An equivalence is LcG x = LcG x LA or LcG x L = LcG x LAL , which is produced by left multiplying with L. The left side of this equation is a symmetric matrix. Consequently, the right side has to be a symmetric matrix, too. G x LA = (G x LA )c . Such an identity agrees to condition (iv). As soon as we substitute in the “condition of G x -orthogonality” we are led to LcG x = LcG x LA G x L = (G x LA )cL = G x LAL , a result which agrees to condition (ii). ? How to prove uniqueness of A1,2,3,4 = A Am = A + ?
5-2 MINOLESS and related solutions
275
Uniqueness of G x x Am can be taken from Lemma 1.4 (characterization of G x MINOS). Substitute x A = Ly and multiply the left side by L. A cG y ALy = A cG y y AcG y AL = AcG y LcA cG y AL = LcA cG y G y AL = (G y AL)c = LcA cG y . The left side of the equation LcA cG y AL = LcA cG y is a symmetric matrix. Consequently the right side has to be symmetric, too. Indeed we have proven condition (iii) (G y AL)c = G y AL . Let us transplant the symmetric condition (iii) into the original normal equations in order to benefit from A cG y AL = A cG y or G y A = LcA cG y A = (G y AL)cA = G y ALA . Indeed, we have succeeded to have proven condition (i), in condition (ii) G x LAL = G x L and G x LA = (G x LA)c. Condition (ii) G y LAL = G x L and (iv) G x LA = (G x LA )c are a consequence of G x -MINOS. The general solution of the normal equations A cG y Ax A = A cG y y is x A = x Am + [I m ( A cG y A ) ( A cG y A )]z for an arbitrary vector z \ m . A cG y ALA = A cG y A implies x A = x Am + [I m ( A cG y A ) A cG y ALA ]z = = x Am + [I m LA ]z. Note 1: The following conditions are equivalent:
(1st)
ª (1) AA A = A « (2) A AA = A « « (3) ( AA )cG y = G y AA « «¬ (4) ( A A )cG x = G x A A
ª A #G y AA = A cG y « ¬ ( A )cG x A A = ( A )cG x “if G x and G y are positive definite matrices, then (2nd)
A #G y = G x A # or A # = G x1A cG y
(5.89)
276
5 The third problem of algebraic regression
are representations for the adjoint matrix” “if G x and G y are positive definite matrices, then ( A cG y A ) AA = A cG y ( A )cG x A A = ( A )cG x ” ª AA = PR ( A ) « «¬ A A = PR ( A ) .
(3rd)
The concept of a generalized inverse of an arbitrary matrix is originally due to E.H. Moore (1920) who used the 3rd definition. R. Penrose (1955), unaware of E.H. Moore´s work, defined a generalized inverse by the 1st definition to G x = I m , G y = I n of unit matrices which is the same as the Moore inverse. Y. Tseng (1949, a, b, 1956) defined a generalized inverse of a linear operator between function spaces by means of AA = PR ( A ) , A A = P
R ( A )
,
where R( A ) , R( A ) , respectively are the closure of R ( A ) , R( A ) , respectively. The Tseng inverse has been reviewed by B. Schaffrin, E. Heidenreich and E. Grafarend (1977). A. Bjerhammar (1951, 1957, 1956) initiated the notion of the least-squares generalized inverse. C.R. Rao (1967) presented the first classification of g-inverses. Note 2: Let || y ||G = ( y cG y y )1 2 and || x ||G = ( x cG x x )1 2 , where G y and G x are positive semidefinite. If there exists a matrix A which satisfies the definitions of Note 1, then it is necessary, but not sufficient that y
x
(1) G y AA A = G y A (2) G x A AA = G x A (3) ( AA )cG y = G y AA (4) ( A A )cG x = G x A A . Note 3: A g-inverse which satisfies the conditions of Note 1 is denoted by A G+ G and referred to as G y , G x -MINOLESS g-inverse of A. y
A G+ G is unique if G x is positive definite. When both G x and G y are general positive semi definite matrices, A G+ G may not be unique . If | G x + A cG y A |z 0 holds, A G+ G is unique. y
x
y
y
x
x
x
5-2 MINOLESS and related solutions
277
Note 4: If the matrices of the metric are positive definite, G x z 0, G y z 0 , then (i)
( A G+ G )G+ G = A , y
x
x
y
(ii) ( A G+ G ) # = ( A c)G+ y
5-23
x
1 1 x Gy
.
Eigenvalue decomposition of (G x , G y ) -MINOLESS
For the system analysis of an inverse problem the eigenspace analysis and eigenspace synthesis of x Am (G x , G y ) -MINOLESS of x is very useful and give some peculiar insight into a dynamical system. Accordingly we are confronted with the problem to develop “canonical MINOLESS”, also called the eigenvalue decomposition of (G x , G y ) -MINOLESS. First we refer again to the canonical representation of the parameter space X as well as the observation space Y introduced to you in the first chapter, Box 1.6 and Box 1.9. But we add here by means of Box 5.8 the forward and backward transformation of the general bases versus the orthogonal bases spanning the parameter space X as well as the observation space Y. In addition, we refer to Definition 1.5 and Lemma 1.6 where the adjoint operator A has been introduced and represented. Box 5.8 General bases versus orthogonal bases spanning the parameter space X as well as the observation space Y.
(5.90)
“left”
“right”
“parameter space”
“observation space”
“general left base”
“general right base”
span {a1 ,… , am } = X
Y=span {b1 ,… , bn }
:matrix of the metric:
:matrix of the metric:
aac = G x
bbc = G y
“orthogonal left base”
(5.92)
(5.94)
“orthogonal right base”
span {e ,… , e } = X
Y=span {e1y ,… , e ny }
:matrix of the metric:
:matrix of the metric:
e x ecx = I m
e y ecy = I n
“base transformation”
“base transformation”
x 1
x m
a = ȁ1x 2 9e x
(5.91)
b = ȁ1y 2 Ue y
(5.93)
(5.95)
278
5 The third problem of algebraic regression
versus
versus e y = Ucȁ y1 2 b
e x = V cȁ x1 2 a
(5.96)
span {e1x ,… , e xm } = X
(5.97)
Y=span {e1y ,… , e ny } .
Second, we are solving the general system of linear equations {y = Ax | A \ n ×m , rk A < min{n, m}} by introducing
•
the eigenspace of the rank deficient, rectangular matrix of rank r := rk A < min{n, m}: A 6 A
•
the left and right canonical coordinates: x 6 x , y 6 y
as supported by Box 5.9. The transformations x 6 x (5.97), y 6 y (5.98) from the original coordinates ( x1 ,… , x m ) to the canonical coordinates ( x1 ,… , x m ) , the left star coordinates, as well as from the original coordinates ( y1 ,… , y n ) to the canonical coordinates ( y1 ,… , y n ), the right star coordinates, are polar decompositions: a rotation {U, V}is followed by a general stretch {G1y 2 , G1x 2 } . Those root matrices are generated by product decompositions of type G y = (G1y 2 )cG1y 2 as well as G x = (G1x 2 )cG1x 2 . Let us substitute the inverse transformations (5.99) x 6 x = G x1 2 Vx and (5.100) y 6 y = G y1 2 Uy into the system of linear equations (5.1), (5.101) y = Ax + i, rk A < min{n, m} or its dual (5.102) y = A x + i . Such an operation leads us to (5.103) y = f( x ) as well as (5.104) y = f (x). Subject to the orthonormality condition (5.105) UcU = I n and (5.106) V cV = I m we have generated the left-right eigenspace analysis (5.107) ªȁ ȁ = « ¬ O2
O1 º O3 »¼
subject to the rank partitioning of the matrices U = [U1 , U 2 ] and V = [ V1 , V2 ] . Alternatively, the left-right eigenspace synthesis (5.118) ªȁ A = G y1 2 [U1 , U 2 ] « ¬O2
O1 º ª V1c º 1 2 G O3 »¼ «¬ V2c »¼ x
is based upon the left matrix (5.109) L := G y1 2 U decomposed into (5.111) L1 := G y1 2 U1 and L 2 := G y1 2 U 2 and the right matrix (5.100) R := G x1 2 V decomposed into R1 := G x1 2 V1 and R 2 := G x1 2 V2 . Indeed the left matrix L by means of (5.113) LLc = G y1 reconstructs the inverse matrix of the metric of the observation space Y. Similarly, the right matrix R by means of (5.114) RR c = G x1 generates
5-2 MINOLESS and related solutions
279
the inverse matrix of the metric of the parameter space X. In terms of “L, R” we have summarized the eigenvalue decompositions (5.117)-(5.122). Such an eigenvalue decomposition helps us to canonically invert y = A x + i by means of (5.123), namely the “full rank partitioning” of the system of canonical linear equations y = A x + i . The observation vector y \ n is decomposed into y1 \ r ×1 and y 2 \ ( n r )×1 while the vector x \ m of unknown parameters is decomposed into x1 \ r ×1 and x 2 \ ( m r )×1 . (x1 ) Am = ȁ 1 y1 is canonical MINOLESS leaving y 2 ”unrecognized” and x 2 = 0 as a “fixed datum”. Box 5.9: Canonical representation, the general case: overdetermined and unterdetermined system without full rank “parameter space X”
versus
x = V cG1x 2 x and
(5.98)
x = G x1 2 Vx
(5.100)
“observation space” y = UcG1y 2 y and
(5.99)
y = G y1 2 Uy
(5.101)
“overdetermined and unterdetermined system without full rank” {y = Ax + i | A \ n× m , rk A < min{n, m}} y = Ax + i
(5.102)
versus
+ UG1y 2 i
+ G y1 2 Ui versus
y = (G y1 2 UA V cG x1 2 )x + i (5.105)
subject to (5.106)
UcU = UUc = I n
(5.103)
UcG1y 2 y = A V cG x1 2 x +
G y1 2 Uy = AG x1 2 x +
(5.104) y = ( UcG1y 2 AG x1 2 V )x
y = A x + i
subject to versus
V cV = VV c = I m
(5.107)
“left and right eigenspace” “left-right eigenspace “left-right eigenspace analysis” synthesis” ª Uc º A = « 1 » G1y 2 AG x1 2 [ V1 , V2 ] = ¬ Uc2 ¼ (5.108) ªȁ G y1 2 [U1 , U 2 ] « ª ȁ O1 º =« ¬O2 » ¬ O2 O3 ¼
A= O1 º ª V1c º 1 2 (5.109) Gx O3 »¼ «¬ V2c »¼
280
5 The third problem of algebraic regression
“dimension identities” ȁ \ r × r , O1 \ r ×( m r ) , U1 \ n × r , V1 \ m × r O2 \ ( n r )× r , O3 \ ( n r )×( m r ) , U 2 \ n ×( n r ) , V2 \ m ×( m r ) “left eigenspace”
“right eigenspace”
(5.110) L := G y1 2 U L1 = UcG1y 2
R := G x1 2 V R 1 = V cG1x 2
(5.111)
(5.112) L1 := G y1 2 U1 , L 2 := G y1 2 U 2
R1 := G x1 2 V1 , R 2 := G x1 2 V2
(5.113)
(5.114) LLc = G y1 ( L1 )cL1 = G y
RR c = G x1 (R 1 )cR 1 = G x
(5.115)
ª L º ª Uc º (5.116) L1 = « 1 » G1y 2 =: « 1 » ¬ Uc2 ¼ ¬L2 ¼
ªR º ª Vcº R 1 = « 1 » G1x 2 =: « 1 » ¬ V2c ¼ ¬R 2 ¼
(5.117)
(5.118)
A = LA R 1
A = L1AR
versus
1 2
(5.120)
ªR º A = [L1 , L 2 ]A « » ¬R ¼
versus
(5.122)
AA # L1 = L1ȁ 2 º » AA # L 2 = 0 ¼
versus
ªȁ A = « ¬O2
O1 º = O3 »¼
ª L º = « 1 » A[R1 , R 2 ] ¬L2 ¼ ª A # AR1 = R1ȁ 2 « # ¬ A AR 2 = 0
(5.119)
(5.121)
(5.123)
“inconsistent system of linear equations without full rank” (5.124)
ªȁ y = A x + i = « ¬ O2
O1 º ª x1 º ª i1 º ª y1 º « »+« » = « » O3 »¼ ¬ x 2 ¼ ¬ i 2 ¼ ¬ y 2 ¼
y1 \ r ×1 , y *2 \ ( n r )×1 , i1* \ r ×1 , i*2 \ ( n r )×1 x1* \ r ×1 , x*2 \ ( m r )×1 “if ( x* , i* ) is MINOLESS, then x*2 = 0, i* = 0 : (x1* )Am = ȁ 1 y1* . ” Consult the commutative diagram of Figure 5.6 for a shortened summary of the newly introduced transformation of coordinates, both of the parameter space X as well as the observation space Y.
5-2 MINOLESS and related solutions
281 A
X x
R ( A) Y
V cG1x 2
UcG1y 2
y R ( A ) Y
X x
Figure 5.6 : Commutative diagram of coordinate transformations Third, we prepare ourselves for MINOLESS of the general system of linear equations {y = Ax + i | A \ n × m , rk A < min{n, m} , || i ||G2 = min subject to || x ||G2 = min} y
x
by introducing Lemma 5.4-5.5, namely the eigenvalue-eigencolumn equations of the matrices A#A and AA#, respectively, as well as Lemma 5.6, our basic result of “canonical MINOLESS”, subsequently completed by proofs. Throughout we refer to the adjoint operator which has been introduced by Definition 1.5 and Lemma 1.6. Lemma 5.7 (eigenspace analysis versus eigenspace synthesis A \ n × m , r := rk A < min{n, m} ) The pair of matrices {L, R} for the eigenspace analysis and the eigenspace synthesis of the rectangular matrix A \ n ×m of rank r := rk A < min{n, m} , namely A = L1 AR or ªȁ A = « ¬O2
O1 º = O3 »¼
ª L º = « 1 » A[R1 , R 2 ] ¬L2 ¼
versus
A = LA R 1 or A=
versus
ªR º = [L1 , L 2 ]A* « 1 » ¬R 2 ¼
are determined by the eigenvalue-eigencolumn equations (eigenspace equations) A # AR1 = R1ȁ 2 A # AR 2 = 0
versus
AA # L1 = L1ȁ 2 AA # L 2 = 0
282
5 The third problem of algebraic regression
subject to ªO12 " 0 º « » 2 2 « # % # » , ȁ = Diag(+ O1 ,… , + Or ) . « 0 " Or2 » ¬ ¼ 5-24
Notes
The algebra of eigensystems is treated in varying degrees by most books on linear algebra, in particular tensor algebra. Special mention should be made of R. Bellman’s classic “Introduction to matrix analysis” (1970) and Horn’s and Johnson’s two books (1985, 1991) on introductory and advanced matrix analysis. More or less systematic treatments of eigensystem are found in books on matrix computations. The classics of the field are Householder’s “Theory of matrices in numerical analysis” (1964) and Wilkinson’s “The algebraic eigenvalue problem” (1965) . G. Golub’s and Van Loan’s “Matrix computations” (1996) is the currently definite survey of the field. Trefethen’s and Bau’s “Numerical linear algebra” (1997) is a high-level, insightful treatment with a welcomed stress on geometry. G.W. Stewart’s “Matrix algorithm: eigensystems” (2001) is becoming a classic as well. The term “eigenvalue” derives from the German Eigenwert, which was introduced by D. Hilbert (1904) to denote for integral equations the reciprocal of the matrix eigenvalue. At some point Hilbert’s Eigenwert inverted themselves and became attached to matrices. Eigenvalues have been called many things in their day. The “characteristic value” is a reasonable translation of Eigenwert. However, “characteristic” has an inconveniently large number of syllables and survives only in the terms “characteristic equation” and “characteristic polynomial”. For symmetric matrices the characteristic equation and its equivalent are also called the secular equation owing to its connection with the secular perturbations in the orbits of planets. Other terms are “latent value” and “proper value” from the French “valeur propre”. Indeed the day when purists and pedants could legitimately object to “eigenvalue” as a hybrid of German and English has long since passed. The German “eigen” has become thoroughly materialized English prefix meaning having to do with eigenvalues and eigenvectors. Thus we can use “eigensystem”, “eigenspace” or “eigenexpansion” without fear of being misunderstood. The term “eigenpair” used to denote an eigenvalue and eigenvector is a recent innovation.
5-3 The hybrid approximation solution: D-HAPS and TykhonovPhillips regularization G x ,G y MINOLESS has been built on sequential approximations. First, the surjectivity defect was secured by G y LESS . The corresponding normal
5-3 The hybrid approximation solution
283
equations suffered from the effect of the injectivity defect. Accordingly, second G x LESS generated a unique solution the rank deficient normal equations. Alternatively, we may constitute a unique solution of the system of inconsistent, rank deficient equations {Ax + i = y | AG \ n× m , r := rk A < min{n, m}} by the D -weighted hybrid norm of type “LESS” and “MINOS”. Such a solution of a general algebraic regression problem is also called • Tykhonov- Phillips regularization
• •
ridge estimator D HAPS.
Indeed, D HAPS is the most popular inversion operation, namely to regularize improperly posed problems. An example is the discretized version of an integral equation of the first kind. Definition 5.8 (D-HAPS): An m × 1vector x is called weighted D HAPS (Hybrid AP proximative Solution) with respect to an D -weighted G x , G y -seminorm of (5.1), if x h = arg{|| y - Ax ||G2 +D || x ||G2 = min | Ax + i = y , y
A\
n× m
x
(5.125)
; rk A d min{n, m}}.
Note that we may apply weighted D HAPS even for the case of rank identity rkA d min{n, m} . The factor D \ + balances the least squares norm and the minimum norm of the unknown vector which is illustrated by Figure 5.7.
Figure 5.7. Balance of LESS and MINOS to general MORE Lemma 5.9 (D HAPS ) : x h is weighted D HAPS of x of the general system of inconsistent, possibly of inconsistent, possibly rank deficient system of linear equations (5.1) if and only if the system of normal equations 1 1 (5.126) (D G x + A cG y A )x h = AcG y y or (G x + A cG y A )x h = A cG y y (5.127) D D is fulfilled. xh always exists and is uniquely determined if the matrix (5.128) D G x + A'G y A is regular or rk[G x , A cG y A] = m.
284
5 The third problem of algebraic regression
: Proof : D HAPS is constructed by means of the Lagrangean L( x ) :=|| y - Ax ||G2 +D || x ||G2 = ( y - Ax )cG y ( y - Ax) + D ( xcG y x) = min , y
y
x
such that the first derivates dL ( x h ) = 2(D G x + A cG y A )x h 2A cG y y = 0 dx constitute the necessary conditions. Let us refer to the theory of vector derivatives in Appendix B. The second derivatives w2L ( x h ) = 2(D G x + A cG y A ) t 0 wxwx c generate the sufficiency conditions for obtaining the minimum of the unconstrained Lagrangean. If D G x + A ' G y A is regular of rk[G y , A cG y A ] = m , there exists a unique solution. h Lemma 5.10 (D HAPS ) : If x h is D HAPS of x of the general system of inconsistent, possibly of inconsistent, possibly rank deficient system of linear equations (5.1) fulfilling the rank identity rk[G y , A cG y A ] = m or det(D G x + A cG y A ) z 0 then x h = (D G x + A cG y A ) 1 A cG y y or 1 1 x h = (G x + A cG y A ) 1 A cG y y D D or x h = (D I + G x1A cG y A ) 1 G x1A cG y y or 1 1 1 G x A cG y A ) 1 G x1A cG y y D D are four representations of the unique solution. x h = (I +
6
The third problem of probabilistic regression – special Gauss - Markov model with datum problem – Setup of BLUMBE and BLE for the moments of first order and of BIQUUE and BIQE for the central moment of second order {y = Aȟ + c y , A \ n×m , rk A < min{n, m}} :Fast track reading: Read only Definition 6.1, Theorem 6.3, Definition 6.4-6.6, Theorem 6.8-6.11
Lemma 6.2 ȟˆ hom Ȉ y , S-BLUMBE of ȟ
Theorem 6.3 hom Ȉ y , S-BLUMBE of ȟ
Definition 6.1 ȟˆ hom Ȉ y , S-BLUMBE
Lemma 6.4 n E {y}, D{Aȟˆ}, D{e y }
Theorem 6.5 Vˆ 2 BIQUUE of Vˆ 2
Theorem 6.6 Vˆ 2 BIQE of V 2
286
6 The third problem of probabilistic regression
Definition 6.7 ȟˆ hom BLE of ȟ
Lemma 6.10 hom BLE, hom S-BLE, hom D -BLE
Theorem 6.11 ȟˆ hom BLE
Definition 6.8 ȟˆ S-BLE of ȟ
Definition 6.9 ȟˆ hom hybrid var-min bias
6
Theorem 6.12 ȟˆ hom S-BLE
Theorem 6.13 ȟˆ hom D -BLE
Definition 6.7 and Lemma 6.2, Theorem 6.3, Lemma 6.4, Theorem 6.5 and 6.6 review ȟˆ of type hom Ȉ y , S-BLUMBE, BIQE, followed by the first example. Alternatively, estimators of type best linear, namely hom BLE, hom S-BLE and hom D -BLE are presented. Definitions 6.7, 6.8 and 6.9 relate to various estimators followed by Lemma 6.10, Theorem 6.11, 6.12 and 6.13. In the fifth chapter we have solved a special algebraic regression problem, namely the inversion of a system of inconsistent linear equations with a datum defect. By means of a hierarchic postulate of a minimum norm || x ||2 = min , least squares solution || y Ax ||2 = min (“MINOLESS”) we were able to determine m unknowns from n observations through the rank of the linear operator, rk A = r < min{n, m} , was less than the number of observations or less the number of unknowns. Though “MINOLESS” generates a rigorous solution, we were left with the problem to interpret our results. The key for an evolution of “MINOLESS” is handed over to us by treating the special algebraic problem by means of a special probabilistic regression problem, namely as a special Gauss-Markov model with datum defect. The bias
6-1 Setup of the best linear minimum bias estimator of type BLUMBE
287
generated by any solution of a rank deficient system of linear equations will again be introduced as a decisive criterion for evaluating “MINOLESS”, now in the context of a probabilistic regression problem. In particular, a special form of “LUMBE” the linear uniformly minimum bias estimator || LA I ||= min , leads us to a solution which is equivalent to “MINOS”. “Best” of LUMBE in the sense of the average variance || D{ȟˆ} ||2 = tr D{ȟˆ} = min also called BLUMBE, will give us a unique solution of ȟˆ as a linear estimation of the observation vector y {Y, pdf} with respect to the linear model E{y} = Aȟ , D{y} = Ȉ y of “fixed effects” ȟ ; . Alternatively, in the fifth chapter we had solved the ill-posed problem y = Ax+i, A\ n×m , rk A < min{n,m} , by means of D -HAPS. Here with respect to a special probabilistic regression problem we succeed to compute D -BLE ( D weighted, S modified Best Linear Estimation) as an equivalence to D - HAPS of a special algebraic regression problem. Most welcomed is the analytical optimization problem to determine the regularization parameter D by means of a special form of || MSE{D} ||2 = min , the weighted Mean Square Estimation Error. Such an optimal design of the regulator D is not possible in the Tykhonov-Phillips regularization in the context of D -HAPS, but definitely in the context of D -BLE.
6-1 Setup of the best linear minimum bias estimator of type BLUMBE Box 6.1 is a definition of our special linear Gauss-Markov model with datum defect. We assume (6.1) E{y} = Aȟ, rk A < min{n, m} (1st moments) and (6.2) D{y} = Ȉ y , Ȉ y positive definite, rk Ȉ y = n (2nd moments). Box 6.2 reviews the bias vector as well as the bias matrix including the related norms. Box 6.1 Special linear Gauss-Markov model with datum defect {y = Aȟ + c y , A \ n×m , rk A < min{n, m}} 1st moments E{y} = Aȟ , rk A < min{n, m}
(6.1)
2nd moments D{y} =: Ȉ y , Ȉ y positive definite, rk Ȉ y = n, ȟ \ m , vector of “fixed effects”, unknown, Ȉ y unknown or known from prior information.
(6.2)
288
6 The third problem of probabilistic regression
Box 6.2 Bias vector, bias matrix Vector and matrix bias norms Special linear Gauss-Markov model of fixed effects subject to datum defect A \ n× m , rk A < min{n, m} E{y} = Aȟ, D{y} = Ȉ y
(6.3)
“ansatz” ȟˆ = Ly
(6.4)
bias vector ȕ := E{ȟˆ ȟ} = E{ȟˆ} ȟ z 0
(6.5)
ȕ = LE{y} ȟ = [I m LA ]ȟ z 0
(6.6)
bias matrix B := I n LA
(6.7)
“bias norms” || ȕ ||2 = ȕcȕ = ȟ c[I m LA]c[I m LA]ȟ
(6.8)
2 || ȕ ||2 = tr ȕȕc = tr[I m LA]ȟȟ c[I m LA ]c =|| B ||ȟȟ c
(6.9)
|| ȕ ||S2 := tr[I m LA]S[I m LA ]c =|| B ||S2
(6.10)
“dispersion matrix” D{ȟˆ} = LD{y}Lc = L6 y Lc
(6.11)
“dispersion norm, average variance” || D{ȟˆ} ||2 := tr LD{y}Lc = tr L6 y Lc =:|| Lc ||Ȉ
y
(6.12)
“decomposition” ȟˆ ȟ = (ȟˆ E{ȟˆ}) + ( E{ȟˆ} ȟˆ )
(6.13)
ȟˆ ȟ = L(y E{y}) [I m LA]ȟ
(6.14)
“Mean Square Estimation Error” MSE{ȟˆ} := E{(ȟˆ ȟ )(ȟˆ ȟ )c}
(6.15)
289
6-1 Setup of the best linear minimum bias estimator of type BLUMBE
MSE{ȟˆ} = LD{y}Lc + [I m LA ]ȟȟ c[I m LA ]c
(6.16)
“modified Mean Square Estimation Error” MSES {ȟˆ} := LD{y}Lc + [I m LA ]S[I m LA ]c
(6.17)
“MSE norms, average MSE” || MSE{ȟˆ} ||2 := tr E{(ȟˆ ȟ )(ȟˆ ȟ )c} || MSE{ȟˆ} ||2 =
(6.18)
= tr LD{y}Lc + tr[I m LA]ȟȟ c[I m LA ]c =
(6.19)
= || Lc ||
2 Ȉy
+ || (I m LA)c ||
2 ȟȟ c
|| MSES {ȟˆ} ||2 := := tr LD{y}Lc + tr[I m LA]S[I m LA]c = =|| Lc ||
2 Ȉy
(6.20)
+ || (I m LA)c || . 2 ȟȟ c
Definition 6.1 defines (1st) ȟˆ as a linear homogenous form, (2nd) of type “minimum bias” and (3rd) of type “smallest average variance”. Chapter 6-11 is a collection of definitions and lemmas, theorems basic for the developments in the future. 6-11
Definitions, lemmas and theorems Definition 6.1 (ȟˆ hom Ȉ , S-BLUMBE) : y
An m × 1 vector ȟˆ = Ly is called homogeneous Ȉ y , S- BLUMBE (homogeneous Best Linear Uniformly Minimum Bias Estimation) of ȟ in the special inconsistent linear Gauss Markov model of fixed effects of Box 6.1, if ȟˆ is a homogeneous linear form (1st) (6.21) ȟˆ = Ly ˆ (2nd) in comparison to all other linear estimations ȟ has the minimum bias in the sense of || B ||S2 :=|| ( I m LA )c ||S2 = min,
(6.22)
L
(3rd)
in comparison to all other minimum bias estimations ȟˆ has the smallest average variance in the sense of || D{ȟˆ} ||2 = tr LȈ y Lc =|| L ' ||2Ȉ = min . y
L
(6.23)
290
6 The third problem of probabilistic regression
The estimation ȟˆ of type hom Ȉ y , S-BLUMBE can be characterized by Lemma 6.2 (ȟˆ hom Ȉ y , S-BLUMBE of ȟ ) : An m × 1vector ȟˆ = Ly is hom Ȉ y , S-BLUMBE of ȟ in the special inconsistent linear Gauss- Markov model with fixed effects of Box 6.1, if and only if the matrix L fulfils the normal equations ASA 'º ªL 'º ª º ª Ȉy = « ASA ' 0 »¼ «¬ ȁ »¼ «¬ AS »¼ ¬
(6.24)
with the n × n matrix ȁ of “Lagrange multipliers”. : Proof : First, we minimize the S-modified bias matrix norm, second the MSE( ȟˆ ) matrix norm. All matrix norms have been chosen “Frobenius”. (i) || (I m LA) ' ||S2 = min . L
The S -weighted Frobenius matrix norm || (I m LA ) ' ||S2 establishes the Lagrangean
L (L) := tr(I m LA)S(I m LA) ' = min L
for S-BLUMBE . ª ASA ' Lˆ ' AS = 0 L (L) = min « L ¬ ( ASA ')
I m > 0, according to Theorem 2.3. ASA cº ª C1 ª Ȉy « ASA c 0 »¼ «¬C3 ¬
C2 º ª Ȉ y ASA cº ª Ȉ y ASA cº =« » « » C4 ¼ ¬ ASA c 0 ¼ ¬ ASA c 0 »¼
(6.25)
Ȉ y C1 Ȉ y + Ȉ y C2 ASA c + ASAcC3 Ȉ y + ASAcC4 ASAc = Ȉ y
(6.26)
Ȉ y C1 ASAc + ASA cC3 ASAc = ASAc
(6.27)
ASA cC1 Ȉ y + ASA cC2 ASA c = ASA c
(6.28)
ASA cC1ASA c = 0.
(6.29)
Let us multiply the third identity by Ȉ -1y ASAc = 0 from the right side and substitute the fourth identity in order to solve for C2 . ASAcC2 ASAcȈ y1 ASAc = ASAcȈ y1 ASAc
(6.30)
291
6-1 Setup of the best linear minimum bias estimator of type BLUMBE
C2 = Ȉ -1y ASA c( ASA cȈ y1 ASAc)
(6.31)
solves the fifth equation A cSAȈ-1y ASA c( ASA cȈ-y1ASA c) ASA cȈ-y1ASA c = = ASA cȈ-y1ASA c
(6.32)
by the axiom of a generalized inverse. (ii) || L ' ||2Ȉ = min . y
L
The Ȉ y -weighted Frobenius matrix norm of L subject to the condition of LUMBE generates the constrained Lagrangean
L (L, ȁ) = tr LȈ y L '+ 2 tr ȁ '( ASA ' L ' AS) = min . L,ȁ
According to the theory of matrix derivatives outlined in Appendix B wL ˆ ˆ ˆ ) ' = 0, (L, ȁ ) = 2( Ȉ y Lˆ '+ ASA ' ȁ wL wL ˆ ˆ (L, ȁ ) = 2( ASA ' L ' AS) = 0 , wȁ ˆ ) constitute the necessary conditions, while at the “point” (Lˆ , ȁ w2L ˆ ) = 2( Ȉ
I ) > 0 , (Lˆ , ȁ y m w (vec L) w (vec L ') to be a positive definite matrix, the sufficiency conditions. Indeed, the first matrix derivations have been identified as the normal equations of the sequential optimization problem.
h For an explicit representation of ȟˆ hom Ȉ y , S-BLUMBE of ȟ we solve the normal equations for Lˆ = arg{|| D(ȟˆ ) ||= min | ASA ' L ' AS = 0} . L
In addition, we compute the dispersion matrix D{ȟˆ | hom BLUMBE} as well as the mean square estimation error MSE{ȟˆ | hom BLUMBE}. Theorem 6.3 ( hom Ȉ y , S-BLUMBE of ȟ ): Let ȟˆ = Ly be hom Ȉ y , S-BLUMBE in the special GaussMarkov model of Box 6.1. Then independent of the choice of the generalized inverse ( ASA ' Ȉ y ASA ') the unique solution of the normal equations (6.24) is
292
6 The third problem of probabilistic regression
ȟˆ = SA '( ASA ' Ȉ -1y ASA ') ASA ' Ȉ-1y y ,
(6.33)
completed by the dispersion matrix D(ȟˆ ) = SA '( ASA ' Ȉ-1y ASA ') AS ,
(6.34)
the bias vector ȕ = [I m SA '( ASA ' Ȉ -1y ASA ') ASA ' Ȉ -1y A] ȟ ,
(6.35)
and the matrix MSE {ȟˆ} of mean estimation errors E{(ȟˆ ȟ )(ȟˆ ȟ ) '} = D{ȟˆ} + ȕȕ '
(6.36)
modified by E{(ȟˆ ȟ )(ȟˆ ȟ ) '} = D{ȟˆ} + [I m LA]S[I m LA]' = = D{ȟˆ} + [S SA '( ASA ' Ȉ -1 ASA ') ASA ' Ȉ -1 AS ], y
(6.37)
y
based upon the solution of ȟȟ ' by S. rk MSE{ȟˆ} = rk S
(6.38)
is the corresponding rank identity. :Proof: (i) ȟˆ hom Ȉ y , S-BLUMBE of ȟ . First, we prove that the matrix of the normal equations ASA cº ª Ȉy , « ASA c 0 »¼ ¬
ASAcº ª Ȉy =0 « ASA c 0 »¼ ¬
is singular. Ȉy ASAc =| Ȉ y | | ASAcȈ y 1 ASAc |= 0 , c 0 ASA due to rk[ ASAcȈ y1ASAc] = rk A < min{n, m} assuming rk S = m , rk Ȉ y = n . Note A11 A 21
A12 ª A \ m ×m =| A11 | | A 22 A 21 A111 A12 | if « 11 A 22 ¬ rk A11 = m1 1
1
293
6-1 Setup of the best linear minimum bias estimator of type BLUMBE
with reference to Appendix A. Thanks to the rank deficiency of the partitioned normal equation matrix, we are forced to compute secondly its generalized inverse. The system of normal equations is solved for ªLˆ cº ª Ȉ y ASA cº ª 0 º ª C1 = « »=« ˆ 0 »¼ «¬ AS »¼ «¬C3 «¬ ȁ »¼ ¬ ASA c
C2 º ª 0 º C4 »¼ «¬ AS »¼
Lˆ c = C2 AS
(6.39) (6.40)
Lˆ = SA cCc2
(6.41)
Lˆ = SA( ASA cȈ y1ASA c) ASA cȈ y1
(6.42)
such that ˆ = SA c( ASA cȈ 1ASA c) ASAcȈ 1y. ȟˆ = Ly y y
(6.43)
We leave the proof for “ SA c( ASA cȈ y1ASA c) ASA cȈ y1 is a weighted pseudo-inverse or Moore-Penrose inverse” as an exercise. (ii) Dispersion matrix D{ȟˆ} . The residual vector ȟˆ E{y} = Lˆ (y E{y})
(6.44)
leads to the variance-covariance matrix ˆ Lˆ c = D{ȟˆ} = LȈ y = SA c( ASA cȈ y1ASA c) ASAcȈ y1 ASA c( ASA cȈ y1 ASA c) AS =
(6.45)
= SAc( ASAcȈ ASAc) AS . 1 y
(iii) Bias vector E ˆ )ȟ = ȕ := E{ȟˆ ȟ} = (I m LA = [I m SA c( ASA cȈ y1ASA c) ASA cȈ y1A]ȟ .
(6.46)
Such a bias vector is not accessible to observations since ȟ is unknown. Instead it is common practice to replace ȟ by ȟˆ (BLUMBE), the estimation ȟˆ of ȟ of type BLUMBE. (iv) Mean Square Estimation Error MSE{ȟˆ}
294
6 The third problem of probabilistic regression
MSE{ȟˆ} := E{(ȟˆ ȟ )(ȟˆ ȟ )c} = D{ȟˆ} + ȕȕ c = ˆ Lˆ c + (I LA ˆ )ȟȟ c(I LA ˆ )c . = LȈ m
y
(6.47)
m
Neither D{ȟˆ | Ȉ y } , nor ȕȕ c are accessible to measurements. ȟȟ c is replaced by K.R. Rao’s substitute matrix S, Ȉ y = 9V 2 by a one variance component model V 2 by Vˆ 2 (BIQUUE) or Vˆ 2 (BIQE), for instance.
h
n Lemma 6.4 ( E {y} , D{Aȟˆ} , e y , D{ey } for ȟˆ hom Ȉ y , S of ȟ ): (i)
With respect to the model (1st) Aȟ = E{y} , E{y} \( A ), rk A =: r d m and VV 2 = D{y}, V positive definite, rkV = n under the condition dim R(SA c) = rk(SA c) = rk A = r , namely V, S-BLUMBE, is given by n E {y} = Aȟˆ = A( AcV 1 A) AcV 1 y
(6.48)
with the related singular dispersion matrix D{Aȟˆ} = V 2 A( A cV 1A) A c
(6.49)
for any choice of the generalized inverse ( AcV 1 A) . (ii)
The empirical error vector e y = y E{y} results in the residual error vector e y = y Aȟˆ of type e y = [I n A( A cV 1A) A cV 1By ]
(6.50)
with the related singular dispersion matrices D{e y } = V 2 [ V A( A cV 1A ) A c]
(6.51)
for any choice of the generalized inverse ( AcV 1 A) . (iii)
The various dispersion matrices are related by D{y} = D{Aȟˆ + e y } = D{Aȟˆ} + D{e y } = = D{e y e y } + D{e y },
(6.52)
where the dispersion matrices e y
and
Aȟˆ
(6.53)
are uncorrected, in particularly, C{e y , Aȟˆ} = C{e y , e y e y } = 0 .
(6.54).
295
6-1 Setup of the best linear minimum bias estimator of type BLUMBE
When we compute the solution by Vˆ of type BIQUUE and of type BIQE we arrive at Theorem 6.5 and Theorem 6.6. Theorem 6.5
( Vˆ 2 BIQUUE of V 2 , special Gauss-Markov model: E{y} = Aȟ , D{y} = VV 2 , A \ n× m , rk A = r d m , V \ n× m , rk V = n ):
Let Vˆ 2 = y cMy = (vec M )cy
y = y c
y c(vec M ) be BIQUUE with respect to the special Gauss-Markov model 6.1. Then
Vˆ 2 = (n - r )-1 y c[V 1 - V 1 A( A cV 1 A) A cV 1 ]y
(6.55)
Vˆ 2 = (n - r )-1 y c[V 1 - V 1 ASA c( ASA cV 1 ASA c) ASA cV 1 ]y
(6.56)
Vˆ 2 = (n - r )-1 y cV 1e y = (n - r )-1 e cy V 1e y
(6.57)
are equivalent representations of the BIQUUE variance component Vˆ 2 which are independent of the generalized inverses ( A cV 1 A)
or
( ASAcV 1 AcSA) .
The residual vector e y , namely e y (BLUMBE) = [I n A ( A cV 1A ) 1 A cV 1 ]y ,
(6.58)
is of type BLUMBE. The variance of Vˆ 2 BIQUUE of V 2 D{Vˆ 2 } = 2(n r ) 1 V 4 = 2( n r ) 1 (V 2 ) 2
(6.59)
can be substituted by the estimation D{Vˆ 2 } = 2(n r ) 1 (Vˆ 2 ) 2 = 2(n r ) 1 (e cy V 1e y ) 2 .
(6.60)
( Vˆ 2 BIQE of V 2 , special Gauss-Markov model: E{y} = Aȟ , D{y} = VV 2 , A \ n× m , rk A = r d m , V \ n× m , rk V = n ): Let Vˆ 2 = y cMy = (vec M )cy
y = y c
y c(vec M ) be BIQE with respect to the special Gauss-Markov model 6.1. Independent of the matrix S and of the generalized inverses ( A cV 1 A ) or ( ASAcV 1 AcSA) ,
Theorem 6.6
Vˆ 2 = (n r + 2) 1 y c[V 1 V 1 A( A cV 1 A) 1 A cV 1 ]y
(6.61)
Vˆ 2 = (n r + 2) 1 y c[V 1 V 1 ASA c( ASAcV 1 ASAc) 1 ASAcV 1 ]y (6.62)
296
6 The third problem of probabilistic regression
Vˆ 2 = ( n r + 2) 1 y cV 1e y = ( n r + 2) 1 e cy V 1e y
(6.63)
are equivalent representations of the BIQE variance component Vˆ 2 . The residual vector e y , namely e y (BLUMBE) = [I m A ( A cV 1A ) 1 A cV 1 ]y ,
(6.64)
is of type BLUMBE. The variance of Vˆ 2 BIQE of V2 D{Vˆ 2 } = 2(n r )(n r + 2) 2 V 4 = 2(n r )[( n r + 2) 1 V 2 ]2
(6.65)
can be substituted by the estimation Dˆ {Vˆ 2 } = 2(n r )( n r + 2) 4 (e cy V 1e y ) 2 .
(6.66)
The special bias ȕV := E{Vˆ 2 V 2 } = 2( n r + 2) 1V 2 2
(6.67)
can be substituted by the estimation ȕˆ V = Eˆ {Vˆ 2 V 2 } = 2( n r + 2) 2 e cy V 1e y . 2
(6.68)
Its MSE (Vˆ 2 ) (Mean Square Estimation Error) MSE{Vˆ 2 }:= Eˆ {(Vˆ 2 V 2 ) 2 } = D{Vˆ 2 } + (V 2 E{Vˆ 2 }) 2 = = 2( n r + 2) 1 (V 2 ) 2
(6.69)
can be substituted by the estimation n{Vˆ 2 } = Eˆ {(Vˆ 2 V 2 ) 2 } = Dˆ {Vˆ 2 } + ( Eˆ {V 2 }) 2 = MSE = 2(n r + 2) 3 (e cy V 1e y ). 6-12
(6.70)
The first example: BLUMBE versus BLE, BIQUUE versus BIQE, triangular leveling network
The first example for the special Gauss-Markov model with datum defect {E{y} = Aȟ, A \ n×m , rk A < min{n, m}, D{y} = VV 2 , V \ n×m , V 2 \ + , rk V = n} is taken from a triangular leveling network. 3 modal points are connected, by leveling measurements [ hĮȕ , hȕȖ , hȖĮ ]c , also called potential differences of absolute potential heights [hĮ , hȕ , hȖ ]c of “fixed effects”. Alternative estimations of type (i)
I, I-BLUMBE of ȟ \ m
6-1 Setup of the best linear minimum bias estimator of type BLUMBE
(ii)
V, S-BLUMBE of ȟ \ m
(iii)
I, I-BLE of ȟ \ m
(iv)
V, S-BLE of ȟ \ m
(v)
BIQUUE of V 2 \ +
(vi)
BIQE of V 2 \ +
297
will be considered. In particular, we use consecutive results of Appendix A, namely from solving linear system of equations based upon generalized inverse, in short g-inverses. For the analyst, the special Gauss-Markov model with datum defect constituted by the problem of estimating absolute heights [hĮ , hȕ , hȖ ] of points {PĮ , Pȕ , PȖ } from height differences is formulated in Box 6.3. Box 6.3 The first example ª hĮȕ º ª 1 +1 0 º ª hĮ º « » « » E{« hȕȖ »} = «« 0 1 +1»» « hȕ » « » « » ¬ hȖĮ ¼ ¬« +1 0 1¼» ¬ hȖ ¼ ª hĮȕ º « » y := « hȕȖ » , « hȖĮ » ¬ ¼
ª 1 + 1 0 º A := «« 0 1 +1»» \ 3×3 , «¬ +1 0 1»¼
ª hĮ º « » ȟ := « hȕ » « hȖ » ¬ ¼
ª hĮȕ º « » D{« hȕȖ »} = D{y} = VV 2 , V 2 \ + « hȖĮ » ¬ ¼ :dimensions: ȟ \ 3 , dim ȟ = 3, y \ 3 , dim{Y, pdf } = 3 m = 3, n = 3, rk A = 2, rk V = 3. 6-121 The first example: I3, I3-BLUMBE In the first case, we assume a dispersion matrix D{y} = I 3V 2 of i.i.d. observations [y1 , y 2 , y 3 ]c
and
a unity substitute matrix S=,3, in short u.s. .
Under such a specification ȟˆ is I3, I3-BLUMBE of ȟ in the special GaussMarkov model with datum defect.
298
6 The third problem of probabilistic regression
ȟˆ = A c( AA cAA c) AA cy ª 2 1 1º c c AA AA = 3 «« 1 2 1»» , «¬ 1 1 2 »¼
( AA cAA c) =
ª2 1 0º 1« 1 2 0 »» . « 9 «¬ 0 0 0 »¼
?How did we compute the g-inverse ( AA cAAc) ? The computation of the g-inverse ( AAcAAc) has been based upon bordering.
ª ª 6 3º 1 0 º ª 6 3 3 º ª 2 1 0º 1 «« » « » « » ( AA cAAc) = « 3 6 3» = « ¬ 3 6 ¼» 0 » = «1 2 0 » . 9 « 0 0 «¬ 3 3 6 »¼ «¬ 0 0 0 »¼ 0 »¼ ¬ Please, check by yourself the axiom of a g-inverse:
ª +6 3 3º ª +6 3 3º ª +6 3 3º ª +6 3 3º « »« » « » « » « 3 +6 3» « 3 +6 3» « 3 +6 3» = « 3 +6 3» ¬« 3 3 +6 ¼» ¬« 3 3 +6 ¼» ¬« 3 3 +6¼» ¬« 3 3 +6¼» or
ª +6 3 3º ª 2 1 0 º ª +6 3 3º ª +6 3 3º « »1« » « » « » « 3 +6 3» 9 « 1 2 0 » « 3 +6 3» = « 3 +6 3» «¬ 3 3 +6 »¼ «¬ 0 0 0 »¼ «¬ 3 3 +6 »¼ «¬ 3 3 +6»¼ ª hĮ º ª y1 + y3 º 1 « » ȟˆ = « hȕ » (I 3 , I 3 -BLUMBE) = «« y1 y2 »» 3 « hȖ » «¬ y2 y3 »¼ ¬ ¼
[ˆ1 + [ˆ2 + [ˆ3 = 0 . Dispersion matrix D{ȟˆ} of the unknown vector of “fixed effects” D{ȟˆ} = V 2 A c( AA cAAc) A ª +2 1 1º V2 « ˆ D{ȟ} = 1 + 2 1 » » 9 « ¬« 1 1 +2 ¼» “replace V 2 by Vˆ 2 (BIQUUE):
6-1 Setup of the best linear minimum bias estimator of type BLUMBE
299
Vˆ 2 = (n rk A) 1 e cy e y ” e y (I 3 , I 3 -BLUMBE) = [I 3 A ( AA c) A c]y ª1 1 1º ª1º 1 1 e y = «1 1 1» y = ( y1 + y2 + y3 ) «1» » «» 3« 3 «¬1 1 1»¼ «¬1»¼ ª1 1 1º ª1 1 1º 1 e cy e y = y c ««1 1 1»» ««1 1 1»» y 9 «¬1 1 1»¼ «¬1 1 1»¼ 1 e cy e y = ( y12 + y22 + y32 + 2 y1 y2 + 2 y2 y3 + 2 y3 y1 ) 3 1 Vˆ 2 (BIQUUE) = ( y12 + y22 + y32 + 2 y1 y2 + 2 y2 y3 + 2 y3 y1 ) 3 ª +2 1 1º 1« ˆ D{ȟ} = « 1 +2 1»» Vˆ 2 (BIQUUE) 9 «¬ 1 1 +2 »¼ “replace V 2 by Vˆ 2 (BIQE):
Vˆ 2 = ( n + 2 rk A ) 1 e cy e y ” e y (I 3 , I 3 -BLUMBE) = [I 3 A ( AA c) A c]y ª1 1 1º ª1º 1 1 e y = ««1 1 1»» y = ( y1 + y2 + y3 ) ««1»» 3 3 «¬1 1 1»¼ «¬1»¼ 1 Vˆ 2 ( BIQE ) = ( y12 + y22 + y32 + 2 y1 y2 + 2 y2 y3 + 2 y3 y1 ) 9 ª +2 1 1º 1 D{ȟˆ} = «« 1 +2 1»» Vˆ 2 (BIQE) . 9 «¬ 1 1 +2 »¼ For practice, we recommend D{ȟˆ (BLUMBE), Vˆ 2 (BIQE)} , since the dispersion matrix D{ȟˆ} is remarkably smaller when compared to D{ȟˆ (BLUMBE), Vˆ 2 (B IQUUE)} , a result which seems to be unknown!
300
6 The third problem of probabilistic regression
Bias vector ȕ(BLUMBE) of the unknown vector of “fixed effects” ȕ = [I 3 A c( AA cAA c) AA cA]ȟ , ª1 1 1º 1« ȕ = «1 1 1»» ȟ , 3 «¬1 1 1»¼ “replace ȟ which is inaccessible by ȟˆ (I 3 , I 3 -BLUMBE) ” ª1 1 1º 1« ȕ = «1 1 1»» ȟˆ (I 3 , I 3 -BLUMBE) , 3 ¬«1 1 1»¼ ȕ=0 (due to [ˆ1 + [ˆ2 + [ˆ3 = 0 ). Mean Square Estimation Error MSE{ȟˆ (I 3 , I 3 -BLUMBE)} MSE{ȟˆ} = D{ȟˆ} + [I 3 A c( AA cAA c) AA cA]V 2 , ª5 2 2º V2 « ˆ MSE{ȟ} = 2 5 2 »» . 9 « «¬ 2 2 5 »¼ “replace V 2 by Vˆ 2 (BIQUUE): Vˆ 2 = (n rk A) 1 ecy e y ” 1 Vˆ 2 (BIQUUE) = ( y12 + y22 + y32 + 2 y1 y2 + 2 y2 y3 + 2 y3 y1 ) , 3 ª5 2 2º 1« ˆ MSE{ȟ} = « 2 5 2 »» Vˆ 2 (BIQUUE) . 9 «¬ 2 2 5 »¼ “replace V 2 by Vˆ 2 (BIQE):
Vˆ 2 = ( n + 2 rk A ) 1 ecy e y ” 1 Vˆ 2 (BIQE) = ( y12 + y22 + y32 + 2 y1 y2 + 2 y2 y3 + 2 y3 y1 ) 9
6-1 Setup of the best linear minimum bias estimator of type BLUMBE
301
ª5 2 2º 1 MSE{ȟˆ} = «« 2 5 2 »» Vˆ 2 (BIQE) . 9 «¬ 2 2 5 »¼ Residual vector e y and dispersion matrix D{e y } of the “random effect” e y e y (I 3 , I 3 -BLUMBE) = [I 3 A ( A cA ) A c]y ª1 1 1º ª1º 1« 1 » e y = «1 1 1» y = ( y1 + y2 + y3 ) ««1»» 3 3 «¬1 1 1»¼ «¬1»¼ D{e y } = V 2 [I 3 A( A cA) A c] ª1 1 1º V2 « D{e y } = 1 1 1»» . 3 « «¬1 1 1»¼ “replace V 2 by Vˆ 2 (BIQUUE) or Vˆ 2 (BIQE)”: ª1 1 1º 1« D{e y } = «1 1 1»» Vˆ 2 (BIQUUE) 3 «¬1 1 1»¼ or ª1 1 1º 1« D{e y } = «1 1 1»» Vˆ 2 (BIQE) . 3 «¬1 1 1»¼ Finally note that ȟˆ (I 3 , I 3 -BLUMBE) corresponds x lm (I 3 , I 3 -MINOLESS) discussed in Chapter 5. In addition, D{e y | I 3 , I 3 -BLUUE} = D{e y | I 3 , I 3 -BLUMBE} . 6-122 The first example: V, S-BLUMBE In the second case, we assume a dispersion matrix D{y} = VV 2 of weighted observations [ y1 , y2 , y3 ]
and
a weighted substitute matrix S, in short w.s. .
302
6 The third problem of probabilistic regression
Under such a specification ȟˆ is V, S-BLUMBE of ȟ in the special GaussMarkov model with datum defect. ȟˆ = SA c( ASAcV 1ASAc) 1 ASAcV 1y . As dispersion matrix D{y} = VV 2 we choose ª2 1 1º 1« V = «1 2 1 »» , rk V = 3 = n 2 «¬1 1 2 »¼ ª 3 1 1º 1« V = « 1 3 1»» , but 2 «¬ 1 1 3 »¼ 1
S = Diag(0,1,1), rk S = 2 as the bias semi-norm. The matrix S fulfils the condition rk(SA c) = rk A = r = 2 . ?How did we compute the g-inverse ( ASAcV 1 ASA c) ? The computation of the g-inverse ( ASAcV 1 ASA c) has been based upon bordering. ª +3 1 1º 1« V = « 1 +3 1»» , S = Diag(0,1,1), rk S = 2 2 «¬ 1 1 +3»¼ 1
ȟˆ = SA c( ASA cV 1ASA c) ASA cV 1 ª 2 3 1 º ASAcV ASAc = 2 «« 3 6 3»» «¬ 1 3 2 »¼ 1
ª 2 0 1º 1« ( ASAcV ASAc) = « 0 0 3»» 6 «¬ 1 0 2 »¼ ª hĮ º 0 ª º 1« ˆȟ = « h » = « 2 y1 y2 y3 »» . « ȕ» 3 « hȖ » ¬« y1 + y2 2 y3 ¼» ¬ ¼ V ,S BLUMBE 1
6-1 Setup of the best linear minimum bias estimator of type BLUMBE
Dispersion matrix D{ȟˆ} of the unknown vector of “fixed effects” D{ȟˆ} = V 2SA c( ASA cV 1ASA c) AS ª0 0 0 º V2 « ˆ D{ȟ} = 0 2 1 »» « 6 «¬0 1 2 »¼ “replace V 2 by Vˆ 2 (BIQUUE): Vˆ 2 = (n rk A) 1 e cy e y ” e y = (V, S-BLUMBE) = [I 3 A( A cV 1A) A cV 1 ]y ª1 1 1º y + y2 + y3 1« e y = «1 1 1»» y = 1 3 3 «¬1 1 1»¼
ª1º «1» «» «¬1»¼
ª1 1 1º ª1 1 1º 1 « e cy e y = y c «1 1 1»» ««1 1 1»» y 9 «¬1 1 1»¼ «¬1 1 1»¼ 1 e cy e y = ( y12 + y22 + y32 + 2 y1 y2 + 2 y2 y3 + 2 y3 y1 ) 3 1 Vˆ 2 (BIQUUE) = ( y12 + y22 + y32 + 2 y1 y2 + 2 y2 y3 + 2 y3 y1 ) 3 D{ȟˆ} = [V + A( A cV 1A) A c]Vˆ 2 (BIQUUE) ª1 1 1º 2« ˆ D{ȟ} = «1 1 1»» Vˆ 2 (BIQUUE) . 3 «¬1 1 1»¼ “replace V 2 by Vˆ 2 (BIQE):
Vˆ 2 = (n + 2 rk A) 1 e cy e y ” e y (V , S-BLUMBE) = [I 3 A ( A cV 1A ) A cV 1 ]y
303
304
6 The third problem of probabilistic regression
ª1 1 1º y + y2 + y3 1 e y = ««1 1 1»» y = 1 3 3 «¬1 1 1»¼
ª1º «1» «» «¬1»¼
1 Vˆ 2 (BIQE) = ( y12 + y22 + y32 + 2 y1 y2 + 2 y2 y3 + 2 y3 y1 ) 9 ª +2 1 1º 1 D{ȟˆ} = «« 1 +2 1»» Vˆ 2 (BIQE) . 9 «¬ 1 1 +2 »¼ We repeat the statement that we recommend the use of D{ȟˆ (BLUMBE), Vˆ (BIQE)} since the dispersion matrix D{ȟˆ} is remarkably smaller when compared to D{ȟˆ (BLUMBE), Vˆ 2 (BIQUUE)} ! Bias vector ȕ(BLUMBE) of the unknown vector of “fixed effects” ȕ = [I 3 SA c( ASA cV 1 ASA c) ASA cV 1 A ]ȟ ª1 0 0 º ª[1 º « » ȕ = «1 0 0 » ȟ = ««[1 »» , ¬«1 0 0 »¼ ¬«[1 ¼» “replace ȟ which is inaccessible by ȟˆ (V,S-BLUMBE)” ª1º ȕ = ««1»» ȟˆ , (V , S-BLUMBE) z 0 . ¬«1¼» Mean Square Estimation Error MSE{ȟˆ (V , S-BLUMBE)} MSE{ȟˆ} = = D{ȟˆ} + [S SA c( ASA cV 1ASA c) ASA cV 1AS]V 2 ª0 0 0 º V2 « ˆ MSE{ȟ} = 0 2 1 »» = D{ȟˆ} . 6 « «¬0 1 2 »¼ “replace V 2 by Vˆ 2 (BIQUUE): Vˆ 2 = (n rk A) 1 ecy e y ”
Vˆ 2 (BIQUUE)=3V 2
6-1 Setup of the best linear minimum bias estimator of type BLUMBE
305
MSE{ȟˆ} = D{ȟˆ} . Residual vector e y and dispersion matrix D{e y } of the “random effect” e y e y (V , S-BLUMBE) = [I 3 A ( A cV 1A ) A cV 1 ]y ª1 1 1º y + y2 + y3 1« e y = «1 1 1»» y = 1 3 3 «¬1 1 1»¼
ª1º «1» «» «¬1»¼
D{e y } = V 2 [V A( A cV 1A) A c] ª1 2 2« D{e y } = V «1 3 «¬1 2 “replace V by Vˆ 2
1 1º 1 1»» . 1 1»¼ (BIQE):
Vˆ 2 = (n + 2 rk A) 1 ecy e y ” Vˆ 2 (BIQE) versus ª0 0 0 º 1« ˆ MSE{ȟ} = «0 2 1 »» V 2 ( BIQE ) . 6 «¬0 1 2 »¼ Residual vector e y and dispersion matrix D{e y } of the “random effects” e y e y (V , S-BLUMBE) = [I 3 A ( A cV 1A ) A cV 1 ]y ª1 1 1º y + y2 + y3 1« e y = «1 1 1»» y = 1 3 3 «¬1 1 1»¼
ª1º «1» «» «¬1»¼
D{e y } = V 2 [V A( A cV 1A) A c] ª1 1 1º 2 2« D{e y } = V «1 1 1»» . 3 «¬1 1 1»¼ “replace V 2 by Vˆ 2 (BIQUUE) or Vˆ 2 (BIQE)”:
306
6 The third problem of probabilistic regression
D{e y } =
ª1 1 1º 2« 1 1 1»» Vˆ 2 (BIQUUE) « 3 «¬1 1 1»¼ or
ª1 1 1º 2« D{e y } = «1 1 1»» Vˆ 2 (BIQE) . 3 «¬1 1 1»¼ 6-123 The first example: I3 , I3-BLE In the third case, we assume a dispersion matrix D{y} = I 3V 2 of i.i.d. observations [ y1 , y2 , y3 ]
and
a unity substitute matrix S=I3, in short u.s. .
Under such a specification ȟˆ is I3, I3-BLE of ȟ in the special Gauss-Markov model with datum defect. ȟˆ (BLE) = (I 3 + A cA ) 1 A cy ª +3 1 1º I 3 + A cA = «« 1 +3 1»» , «¬ 1 1 +3»¼
ª2 1 1º 1« (I 3 + AcA) = «1 2 1 »» 4 «¬1 1 2 »¼ 1º ª 1 0 ª y1 + y3 º ˆȟ (BLE) = 1 « 1 1 0 » y = 1 « + y y » 1 2» » 4« 4« «¬ 0 1 1»¼ «¬ + y2 y3 »¼ 1
[ˆ1 + [ˆ2 + [ˆ3 = 0 . Dispersion matrix D{ȟˆ | BLE} of the unknown vector of “fixed effects” D{ȟˆ | BLE} = V 2 A cA (I 3 + AcA) 2 ª +2 1 1º 2 V « 1 +2 1» . D{ȟˆ | BLE} = » 16 « «¬ 1 1 +2 »¼ Bias vector ȕ(BLE) of the unknown vector of “fixed effects”
6-1 Setup of the best linear minimum bias estimator of type BLUMBE
307
ȕ = [I 3 + A cA]1 ȟ ª2 1 1º ª 2[1 + [ 2 + [3 º 1« 1« » ȕ = « 1 2 1 » ȟ = «[1 + 2[ 2 + [3 »» . 4 4 «¬ 1 1 2 »¼ «¬[1 + [ 2 + 2[3 »¼ Mean Square Estimation Error MSE{ȟ (BLE)} MSE{ȟˆ (BLE)} = V 2 [I 3 + A cA]1 ª2 1 1º V2 « ˆ MSE{ȟ (BLE)} = 1 2 1 »» . 4 « «¬1 1 2 »¼ Residual vector e y and dispersion matrix D{e y } of the “random effect” e y e y (BLE) = [ AA c + I 3 ]1 y e y (BLE) =
ª2 1 1º ª 2 y1 + y2 + y3 º 1« 1« » 1 2 1 » y = « y1 + 2 y2 + y3 »» « 4 4 «¬1 1 2 »¼ «¬ y1 + y2 + 2 y3 »¼
D{e y (BLE)} = V 2 [I 3 + AA c]2 D{e y (BLE)} =
ª6 5 5º V2 « 5 6 5 »» . « 16 «¬5 5 6 »¼
Correlations C{e y , Aȟˆ} = V 2 [I 3 + AA c]2 AA c ª +2 1 1º V2 « ˆ C{e y , Aȟ} = 1 +2 1» . « » 16 ¬« 1 1 +2 ¼» Comparisons BLUMBE-BLE (i) ȟˆ BLUMBE ȟˆ BLE ȟˆ BLUMBE ȟˆ BLE = A c( AA cAA c) AAc( AAc + I 3 ) 1 y
308
6 The third problem of probabilistic regression
ª 1 0 1 º ª y1 + y3 º 1 1 ȟˆ BLUMBE ȟˆ BLE = «« 1 1 0 »» y = «« + y1 y2 »» . 12 12 «¬ 0 1 1»¼ «¬ + y2 y3 »¼ (ii) D{ȟˆ BLUMBE } D{ȟˆ BLE } D{ȟˆ BLUMBE } D{ȟˆ BLE } = = V 2 A c( AA cAAc) AAc( AAc + I 3 ) 1 AAc( AAcAAc) A + +V 2 A c( AA cAAc) AAc( AAc + I 3 ) 1 AAc( AAc + I 3 ) 1 AAc( AAcAAc) A ª +2 1 1º 7 2 D{ȟˆ BLUMBE } D{ȟˆ BLE } = V « 1 +2 1»» positive semidefinite. 144 « «¬ 1 1 +2 »¼ (iii) MSE{ȟˆ BLUMBE } MSE{ȟˆ BLE } MSE{ȟˆ BLUMBE } MSE{ȟˆ BLE } = = ı 2 A c( AA cAAc) AAc( AAc + I 3 ) 1 AAc( AAcAAc) A ª +2 1 1º V2 « ˆ ˆ MSE{ȟ BLUMBE } MSE{ȟ BLE } = 1 +2 1»» positive semidefinite. 36 « «¬ 1 1 +2 »¼ 6-124 The first example: V, S-BLE In the fourth case, we assume a dispersion matrix D{y} = VV 2 of weighted observations [ y1 , y2 , y3 ]
and
a weighted substitute matrix S, in short w.s. .
We choose ª2 1 1º 1« V = «1 2 1 »» positive definite, rk V = 3 = n , 2 «¬1 1 2 »¼ ª +3 1 1º 1« V = « 1 +3 1»» , 2 «¬ 1 1 +3»¼ 1
and S = Diag(0,1,1), rk S = 2 ,
6-1 Setup of the best linear minimum bias estimator of type BLUMBE
309
ȟˆ = (I 3 + SA cV 1A) 1 SA cV 1y , ª 1 0 0º ª 21 0 0 º 1 « « » 1 1 1 c c I 3 + SA V A = « 2 5 2 » , (I 3 + SA V A) = «14 5 2 »» , 21 «¬ 2 2 5 »¼ «¬14 2 5 »¼ ª hĮ º ˆȟ (V, S-BLE) = « h » = « ȕ» « hȖ » ¬ ¼ V ,S -BLE 0 0º 0 ª0 ª º 1 « 1 « » = «14 6 4 » y = «10 y1 6 y2 4 y3 »» . 21 21 «¬ 4 «¬ 4 y1 + 6 y2 10 y3 »¼ 6 10 »¼ Dispersion matrix D{ȟˆ | V, S-BLE} of the unknown vector of “fixed effects” D{ȟˆ | V, S-BLE} = V 2SA cV 1A[I 3 + SA cV 1A]1 S , ª0 0 0 º V2 « ˆ D{ȟ | V, S-BLE} = 0 76 22 »» . « 441 «¬0 22 76 »¼ Bias vector ȕ(V, S-BLE) of the unknown vector of “fixed effects” ȕ = [I 3 + SA cV 1 A]1 ȟ 21[1 ª 21 0 0 º ª º 1 « 1 « » ȕ = «14 5 2 » ȟ = «14[1 + 5[ 2 + 2[ 3 »» . 21 21 «¬14 2 5 »¼ «¬14[1 + 2[ 2 + 5[ 3 »¼ Mean Square Estimation Error MSE{ȟ | V , S-BLE} MSE{ȟ | V, S-BLE} = V 2 [I 3 + SA cVA]1 S ª0 0 0 º V2 « MSE{ȟ | V, S-BLE} = 0 5 2 »» . 21 « «¬0 2 5 »¼ Residual vector e y and dispersion matrix D{e y } of the “random effect” e y e y (V , S-BLE) = [I 3 + ASA cV 1 ]1 y
310
6 The third problem of probabilistic regression
e y {V , S-BLE} =
ª11 6 4 º ª11 y1 + 6 y2 + 4 y3 º 1 « » y = 1 « 6y + 9y + 6y » 6 9 6 1 2 3 » » 21 « 21 « «¬ 4 6 11»¼ «¬ 4 y1 + 6 y2 + 11y3 »¼
D{e y (V, S-BLE)} = V 2 [I 3 + ASA cV 1 ]2 V ª614 585 565 º V2 « D{e y (V, S-BLE)} = 585 594 585 »» . 882 « «¬565 585 614 »¼ Correlations C{e y , Aȟˆ} = V 2 (I 3 + ASAcV 1 ) 2 ASAc ª 29 9 20 º 2 V « 9 18 9 » . C{e y , Aȟˆ} = » 441 « «¬ 20 9 29 »¼ Comparisons BLUMBE-BLE (i) ȟˆ BLUMBE ȟˆ BLE ȟˆ V ,S BLUMBE ȟˆ V ,S -BLE = SA c( ASA cV 1ASA c) ASA c( ASA c + V ) 1 y ȟˆ V ,S BLUMBE ȟˆ V ,S -BLE
0º 0 ª0 0 ª º 1 « 1 « » = « 4 1 3 » y = « 4 y1 y2 3 y3 »» . 21 21 «¬ 3 1 4 »¼ «¬3 y1 + y2 4 y3 »¼
(ii) D{ȟˆ BLUMBE } D{ȟˆ BLE } D{ȟˆ V ,S -BLUMBE } D{ȟˆ V ,S -BLE } = = V SA c( ASA cV ASA c) ASA c( ASA c + V ) 1 ASA c(ASA cV 1ASA c) AV + 2
1
V 2 SA c( ASA cV 1 ASA c) ASA c( ASA c + V ) 1 ASA c(ASA c + V ) 1 ASA c(ASA cV 1ASA c) AS
0 0º ª0 2 V « D{ȟˆ V ,S -BLUMBE } D{ȟˆ V ,S -BLE } = 0 142 103»» positive semidefinite. 882 « «¬ 0 103 142 »¼ (iii) MSE{ȟˆ BLUMBE } MSE{ȟˆ BLE } MSE{ȟˆ V ,S -BLUMBE } MSE{ȟˆ V ,S -BLE } = = V 2SA c( ASAcV 1ASAc) ASAc( ASAc + V ) 1 ASAc( ASAcV 1 ASAc) AS
6-1 Setup of the best linear minimum bias estimator of type BLUMBE
MSE{ȟˆ V ,S BLUMBE } MSE{ȟˆ V ,S BLE } =
311
ª0 0 0 º V2 « 0 4 3 »» positive semidefinite. « 42 «¬0 3 4 »¼
Summarizing, let us compare I,I-BLUMBE versus I,I-BLE and V,S-BLUMBE versus V,S-BLE! ȟˆ BLUMBE ȟˆ BLE , D{ȟˆ BLUMBE } D{ȟˆ BLE } and MSE{ȟˆ BLUMBE } MSE{ȟˆ BLE } as well as ȟˆ V ,S -BLUMBE ȟˆ V,S -BLE , D{ȟˆ V ,S -BLUMBE } D{ȟˆ V ,S -BLE } and MSE{ȟˆ V ,S -BLUMBE } MSE{ȟˆ V ,S -BLE } result positive semidefinite: In consequence, for three different measures of distorsions BLE is in favor of BLIMBE: BLE produces smaller errors in comparing with BLIMBE! Finally let us compare weighted BIQUUE and weighted BIQE: (i)
Weighted BIQUUE Vˆ 2 and weighted BIQE Vˆ 2
Vˆ 2 = (n r ) 1 y cV 1e y =
versus
= (n r ) 1 e cy V 1e y
Vˆ 2 = (n r + 2)y cV 1e y = = (n r + 2)e cy V 1e y
(e y ) V ,S -BLUMBE
ª4 1 1º 1« = «1 4 1 »» y 6 «¬1 1 4 »¼
ª +3 1 1º 1« r = rk A = 2, n = 3, V = « 1 +3 1»» 2 «¬ 1 1 +3»¼ 1
(ii)
1 Vˆ 2 = ( y12 + y22 + y32 ) 2
versus
1 Vˆ 2 = ( y12 + y22 + y32 ) 6
D{Vˆ 2 | BIQUUE}
versus
D{Vˆ 2 | BIQE}
D{Vˆ 2 } = 2(n r ) 1 V 4
versus
D{Vˆ 2 } = 2(n r )(n r + 2) 1 V 4
D{Vˆ 2 } = 2V 4
versus
2 D{Vˆ 2 } = V 4 9
312
6 The third problem of probabilistic regression
Dˆ {Vˆ 2 } = 2(n r ) 1 (Vˆ 2 ) 2
versus
Eˆ {Vˆ 2 V 2 } = = 2(n r + 2) 1 e cy V 1e y
1 Dˆ {Vˆ 2 } = ( y12 + y22 + y32 ) 2
versus
1 Eˆ {Vˆ 2 V 2 } = ( y12 + y22 + y32 ) 9 Eˆ {Vˆ 2 V 2 } = 2(n r + 2) 1 (e cy V 1e y ) 2 1 Eˆ {(Vˆ 2 V 2 )} = ( y12 + y22 + y32 ) . 54
(iii)
(e y ) BLUMBE = [I n A( A cV 1A) AV 1 ](e y ) BLE (Vˆ 2 ) BIQUUE = ( n r )(e cy ) BLE [ V 1 V 1A( A cV 1A) AV 1 ](e y ) BLE 1 Vˆ 2BIQUUE Vˆ 2BIQE = ( y12 + y22 + y32 ) positive. 3
2 We repeat that the difference Vˆ 2BIQUUE Vˆ BIQE is in favor of Vˆ 2BIQE < Vˆ 2BIQUUE .
6-2 Setup of the best linear estimators of type hom BLE, hom SBLE and hom Į-BLE for fixed effects Numerical tests have documented that ȟˆ of type Ȉ - BLUUE of ȟ is not robust against outliers in the stochastic vector y observations. It is for this reason that we give up the postulate of unbiasedness, but keeping the set up of a linear estimation ȟˆ = Ly of homogeneous type. The matrix L is uniquely determined by the D weighted hybrid norm of type minimum variance || D{ȟˆ} ||2 and minimum bias || I LA ||2 . For such a homogeneous linear estimation (2.21) by means of Box 6.4 let us specify the real-valued, nonstochastic bias vector ȕ:= E{ȟˆ ȟ} = E{ȟˆ}ȟ of type (6.11), (6.12), (6.13) and the real-valued, nonstochastic bias matrix I m LA (6.74) in more detail. First, let us discuss why a setup of an inhomogeneous linear estimation is not suited to solve our problem. In the case of an unbiased estimator, the setup of an inhomogeneous linear estimation ȟˆ = Ly + ț led us to E{ȟˆ} = ȟ the postulate of unbiasedness if and only if E{ȟˆ} ȟ := LE{y} ȟ + ț = (I m LA)ȟ + ț = 0 for all ȟ R m or LA = I m and ț = 0. Indeed the postulate of unbiasedness restricted the linear operator L to be the (non-unique) left inverse L = A L as well as the vector ț of inhomogeneity to zero. In contrast the bias vector ȕ := E{ȟˆ ȟ} = E{ȟˆ} ȟ = LE{y} ȟ = (I m LA)ȟ + ț for a setup of an inhomogeneous linear estimation should approach zero if ȟ = 0 is chosen as a special case. In order to include this case in the linear biased estimation procedure we set ț = 0 .
6-2 Setup of the best linear estimators fixed effects
313
Second, we focus on the norm (2.79) namely || ȕ ||2 := E{(ȟˆ ȟ )c(ȟˆ ȟ )} of the bias vector ȕ , also called Mean Square Error MSE{ȟˆ} of ȟˆ . In terms of a setup of a homogeneous linear estimation, ȟˆ = Ly , the norm of the bias vector is represented by (I m LA)cȟȟ c(I m LA) or by the weighted Frobenius matrix norm 2 || (I m LA)c ||ȟȟ c where the weight matrix ȟȟ c, rk ȟȟ c = 1, has rank one. Obviously 2 || (I m LA)c ||ȟȟ c is only a semi-norm. In addition, ȟȟ c is not accessible since ȟ is unknown. In this problematic case we replace the matrix ȟȟ c by a fixed, positive definite m×m matrix S and define the S-weighted Frobenius matrix norm || (I m LA)c ||S2 of type (2.82) of the bias matrix I m LA . Indeed by means of the rank identity, rk S=m we have chosen a weight matrix of maximal rank. Now we are prepared to understand intuitively the following. Here we focus on best linear estimators of type hom BLE, hom S-BLE and hom Į-BLE of fixed effects ȟ, which turn out to be better than the best linear uniformly unbiased estimator of type hom BLUUE, but suffer from the effect to be biased. At first let us begin with a discussion about the bias vector and the bias matrix as well as the Mean Square Estimation Error MSE{ȟˆ} with respect to a homogeneous linear estimation ȟˆ = Ly of fixed effects ȟ based upon Box 6.4. Box 6.4 Bias vector, bias matrix Mean Square Estimation Error in the special Gauss–Markov model with fixed effects E{y} = Aȟ
(6.71)
D{y} = Ȉ y
(6.72)
“ansatz” ȟˆ = Ly
(6.73)
bias vector ȕ := E{ȟˆ ȟ} = E{ȟˆ} ȟ
(6.74)
ȕ = LE{y} ȟ = [I m LA] ȟ
(6.75)
bias matrix B := I m LA
(6.76)
decomposition ȟˆ ȟ = (ȟˆ E{ȟˆ}) + ( E{ȟˆ} ȟ )
(6.77)
ȟˆ ȟ = L(y E{y}) [I m LA] ȟ
(6.78)
314
6 The third problem of probabilistic regression
Mean Square Estimation Error MSE{ȟˆ} := E{(ȟˆ ȟ )(ȟˆ ȟ )c}
(6.79)
MSE{ȟˆ} = LD{y}Lc + [I m LA ] ȟȟ c [I m LA ]c
(6.80)
( E{ȟˆ E{ȟˆ}} = 0) modified Mean Square Estimation Error MSES {ȟˆ} := LD{y}Lc + [I m LA] S [I m LA]c
(6.81)
Frobenius matrix norms || MSE{ȟˆ} ||2 := tr E{(ȟˆ ȟ )(ȟˆ ȟ )c}
(6.82)
|| MSE{ȟˆ} ||2 = = tr LD{y}Lc + tr[I m LA] ȟȟ c [I m LA]c
(6.83)
= || Lc ||
2 6y
+ || (I m LA)c ||
2 ȟȟ '
|| MSES {ȟˆ} ||2 := := tr LD{y}Lc + tr[I m LA]S[I m LA]c
(6.84)
= || Lc ||6y + || (I m LA)c ||S 2
2
hybrid minimum variance – minimum bias norm Į-weighted L(L) := || Lc ||62 y + 1 || (I m LA)c ||S2 D
(6.85)
special model dim R (SAc) = rk SAc = rk A = m .
(6.86)
The bias vector ȕ is conventionally defined by E{ȟˆ} ȟ subject to the homogeneous estimation form ȟˆ = Ly . Accordingly the bias vector can be represented by (6.75) ȕ = [I m LA] ȟ . Since the vector ȟ of fixed effects is unknown, there has been made the proposal to use instead the matrix I m LA as a matrix-valued measure of bias. A measure of the estimation error is the Mean Square estimation error MSE{ȟˆ} of type (6.79). MSE{ȟˆ} can be decomposed into two basic parts: • the dispersion matrix D{ȟˆ} = LD{y}Lc •
the bias product ȕȕc.
Indeed the vector ȟˆ ȟ can be decomposed as well into two parts of type (6.77), (6.78), namely (i) ȟˆ E{ȟˆ} and (ii) E{ȟˆ} ȟ which may be called estimation
315
6-2 Setup of the best linear estimators fixed effects
error and bias, respectively. The double decomposition of the vector ȟˆ ȟ leads straightforward to the double representation of the matrix MSE{ȟˆ} of type (6.80). Such a representation suffers from two effects: Firstly the vector ȟ of fixed effects is unknown, secondly the matrix ȟȟ c has only rank 1. In consequence, the matrix [I m LA] ȟȟ c [I m LA]c has only rank 1, too. In this situation there has been made a proposal to modify MSE{ȟˆ} with respect to ȟȟ c by the regular matrix S. MSES {ȟˆ} has been defined by (6.81). A scalar measure of MSE{ȟˆ} as well as MSES {ȟˆ} are the Frobenius norms (6.82), (6.83), (6.84). Those scalars constitute the optimal risk in Definition 6.7 (hom BLE) and Definition 6.8 (hom S-BLE). Alternatively a homogeneous Į-weighted hybrid minimum varianceminimum bias estimation (hom Į-BLE) is presented in Definition 6.9 (hom ĮBLE) which is based upon the weighted sum of two norms of type (6.85), namely •
average variance || Lc ||62 y = tr L6 y Lc
•
average bias || (I m LA)c ||S2 = tr[I m LA] S [I m LA]c.
The very important estimator Į-BLE is balancing variance and bias by the weight factor Į which is illustrated by Figure 6.1.
min bias
balance between variance and bias
min variance
Figure 6.1 Balance of variance and bias by the weight factor Į Definition 6.7 ( ȟˆ hom BLE of ȟ ): An m×1 vector ȟˆ is called homogeneous BLE of ȟ in the special linear Gauss-Markov model with fixed effects of Box 6-3, if (1st) ȟˆ is a homogeneous linear form ȟˆ = Ly ,
(6.87)
(2nd) in comparison to all other linear estimations ȟˆ has the minimum Mean Square Estimation Error in the sense of || MSE{ȟˆ} ||2 = = tr LD{y}Lc + tr[I m LA] ȟȟ c [I m LA]c = || Lc ||6y + || (I m LA)c || 2
2 ȟȟ c
.
(6.88)
316
6 The third problem of probabilistic regression
Definition 6.8
( ȟˆ S-BLE of ȟ ):
An m×1 vector ȟˆ is called homogeneous S-BLE of ȟ in the special linear Gauss-Markov model with fixed effects of Box 6.3, if (1st) ȟˆ is a homogeneous linear form ȟˆ = Ly ,
(6.89)
(2nd) in comparison to all other linear estimations ȟˆ has the minimum S-modified Mean Square Estimation Error in the sense of || MSES {ȟˆ} ||2 := := tr LD{y}Lc + tr[I m LA]S[I m LA]c
(6.90)
= || Lc ||62 y + || (I m LA)c ||S2 = min . L
Definition 6.9 ( ȟˆ hom hybrid min var-min bias solution, Į-weighted or hom Į-BLE): An m×1 vector ȟˆ is called homogeneous Į-weighted hybrid minimum variance- minimum bias estimation (hom Į-BLE) of ȟ in the special linear Gauss-Markov model with fixed effects of Box 6.3, if (1st) ȟˆ is a homogeneous linear form ȟˆ = Ly ,
(6.91)
(2nd) in comparison to all other linear estimations ȟˆ has the minimum variance-minimum bias in the sense of the Į-weighted hybrid norm tr LD{y}Lc + 1 tr (I m LA ) S (I m LA )c D = || Lc ||62 + 1 || (I m LA)c ||S2 = min , L D
(6.92)
y
in particular with respect to the special model
D \ + , dim R (SA c) = rk SA c = rk A = m . The estimations ȟˆ hom BLE, hom S-BLE and hom Į-BLE can be characterized as following: Lemma 6.10 (hom BLE, hom S-BLE and hom Į-BLE): An m×1 vector ȟˆ is hom BLE, hom S-BLE or hom Į-BLE of ȟ in the special linear Gauss-Markov model with fixed effects of Box 6.3, if and only if the matrix Lˆ fulfils the normal equations
317
6-2 Setup of the best linear estimators fixed effects
(1st)
hom BLE: ( Ȉ y + Aȟȟ cA c)Lˆ c = Aȟȟ c
(2nd)
(3rd)
(6.93)
hom S-BLE: ˆ c = AS ( Ȉ y + ASAc)L
(6.94)
( Ȉ y + 1 ASAc)Lˆ c = 1 AS . D D
(6.95)
hom Į-BLE:
:Proof: (i) hom BLE: The hybrid norm || MSE{ȟˆ} ||2 establishes the Lagrangean
L (L) := tr L6 y Lc + tr (I m LA) ȟȟ c (I m LA)c = min L
for ȟˆ hom BLE of ȟ . The necessary conditions for the minimum of the quadratic Lagrangean L (L) are wL ˆ (L) := 2[6 y Lˆ c + Aȟȟ cA cLˆ c Aȟȟ c] = 0 wL which agree to the normal equations (6.93). (The theory of matrix derivatives is reviewed in Appendix B (Facts: derivative of a scalar-valued function of a matrix: trace). The second derivatives w2 L (Lˆ ) > 0 w (vecL)w (vecL)c at the “point” Lˆ constitute the sufficiency conditions. In order to compute such an mn×mn matrix of second derivatives we have to vectorize the matrix normal equation wL ˆ ( L ) := 2Lˆ (6 y + Aȟȟ cA c) 2ȟ ȟ cA c , wL wL ( Lˆ ) := vec[2 Lˆ (6 y + Aȟȟ cA c) 2ȟȟ cA c] . w (vecL )
(ii) hom S-BLE: The hybrid norm || MSEs {ȟˆ} ||2 establishes the Lagrangean
L (L) := tr L6 y Lc + tr (I m LA) S (I m LA)c = min L
318
6 The third problem of probabilistic regression
for ȟˆ hom S-BLE of ȟ . Following the first part of the proof we are led to the necessary conditions for the minimum of the quadratic Lagrangean L (L) wL ˆ (L) := 2[6 y Lˆ c + ASAcLˆ c AS]c = 0 wL as well as to the sufficiency conditions w2 L (Lˆ ) = 2[( Ȉ y + ASAc)
I m ] > 0 . w (vecL)w ( vecL)c The normal equations of hom S-BLE
wL wL (Lˆ ) = 0 agree to (6.92).
(iii) hom Į-BLE: The hybrid norm || Lc ||62 + 1 || ( I m - LA )c ||S2 establishes the Lagrangean D y
L (L) := tr L6 y Lc + 1 tr (I m - LA)S(I m - LA)c = min L D for ȟˆ hom Į-BLE of ȟ . Following the first part of the proof we are led to the necessary conditions for the minimum of the quadratic Lagrangean L (L) wL ˆ (L) = 2[( Ȉ y + Aȟȟ cA c)
I m ]vecLˆ 2vec(ȟȟ cA c) . wL The Kronecker-Zehfuss product A
B of two arbitrary matrices as well as ( A + B)
C = A
B + B
C of three arbitrary matrices subject to dim A = dim B is introduced in Appendix A, “Definition of Matrix Algebra: multiplication of matrices of the same dimension (internal relation) and Laws”. The vec operation (vectorization of an array) is reviewed in Appendix A, too, “Definition, Facts: vecAB = (Bc
I cA )vecA for suitable matrices A and B”. Now we are prepared to compute w2 L (Lˆ ) = 2[(6 y + Aȟȟ cA c)
I m ] > 0 w (vecL)w (vecL)c as a positive definite matrix. The theory of matrix derivatives is reviewed in Appendix B “Facts: Derive of a matrix-valued function of a matrix, namely w (vecX) w (vecX)c ”. wL ˆ ˆ c+ Ȉ L ˆ c 1 AS]cD ( L) = 2[ 1 ASA cL y D D wL as well as to the sufficiency conditions
319
6-2 Setup of the best linear estimators fixed effects
w2 L (Lˆ ) = 2[( 1 ASA c + Ȉ y )
I m ] > 0. D w (vecL)w (vecL)c The normal equations of hom Į-BLE wL wL (Lˆ ) = 0 agree to (6.93).
h For an explicit representation of ȟˆ as hom BLE, hom S-BLE and hom Į-BLE of ȟ in the special Gauss–Markov model with fixed effects of Box 6.3, we solve the normal equations (6.94), (6.95) and (6.96) for Lˆ = arg{L (L) = min} . L
Beside the explicit representation of ȟˆ of type hom BLE, hom S-BLE and hom Į-BLE we compute the related dispersion matrix D{ȟˆ} , the Mean Square Estimation Error MSE{ȟˆ}, the modified Mean Square Estimation Error MSES {ȟˆ} and MSED ,S {ȟˆ} in Theorem 6.11 ( ȟˆ hom BLE): Let ȟˆ = Ly be hom BLE of ȟ in the special linear Gauss-Markov model with fixed effects of Box 6.3. Then equivalent representations of the solutions of the normal equations (6.93) are ȟˆ = ȟȟ cA c[ Ȉ y + Aȟȟ cA c]1 y
(6.96)
(if [6 y + Aȟȟ cA c]1 exists) and completed by the dispersion matrix D{ȟˆ} = ȟȟ cA c[ Ȉ y + Aȟȟ cA c]1 Ȉ y × × [ Ȉ y + Aȟȟ cA c]1 Aȟȟ c ,
(6.97)
by the bias vector ȕ := E{ȟˆ} ȟ = [I m ȟȟ cA c( Aȟȟ cA c + Ȉ y ) 1 A] ȟ
(6.98)
and by the matrix of the Mean Square Estimation Error MSE{ȟˆ} :
MSE{ȟˆ}:= E{(ȟˆ ȟ)(ȟˆ ȟ)c} = D{ȟˆ} + ȕȕc
(6.99)
320
6 The third problem of probabilistic regression
MSE{ȟˆ} := D{ȟˆ} + [I m ȟȟ cA c( Aȟȟ cA c + Ȉ y ) 1 A] ×
(6.100)
×ȟȟ c [I m Ac( Aȟȟ cA c + Ȉ y ) Aȟȟ c]. 1
At this point we have to comment what Theorem 6.11 tells us. hom BLE has generated the estimation ȟˆ of type (6.96), the dispersion matrix D{ȟˆ} of type (6.97), the bias vector of type (6.98) and the Mean Square Estimation Error of type (6.100) which all depend on the vector ȟ and the matrix ȟȟ c , respectively. We already mentioned that ȟ and the matrix ȟȟ c are not accessible from measurements. The situation is similar to the one in hypothesis testing. As shown later in this section we can produce only an estimator ȟ and consequently can setup a hypothesis ȟ 0 of the "fixed effect" ȟ . Indeed, a similar argument applies to the second central moment D{y} ~ Ȉ y of the "random effect" y, the observation vector. Such a dispersion matrix has to be known in order to be able to compute ȟˆ , D{ȟˆ} , and MSE{ȟˆ} . Again we have to apply the argument that we are ˆ and to setup a hypothesis about only able to construct an estimate Ȉ y D{y} ~ 6 y . Theorem 6.12 ( ȟˆ hom S-BLE): Let ȟˆ = Ly be hom S-BLE of ȟ in the special linear GaussMarkov model with fixed effects of Box 6.3. Then equivalent representations of the solutions of the the normal equations (6.94) are ȟˆ = SA c( Ȉ y + ASA c) 1 y
(6.101)
ȟˆ = ( A cȈ y1A + S 1 ) 1 AcȈ y1y
(6.102)
ȟˆ = (I m + SA cȈ y1A) 1 SA c6 y1y
(6.103)
(if S 1 , Ȉ y1 exist) are completed by the dispersion matrices D{ȟˆ} = SA c( ASAc + Ȉ y ) 1 Ȉ y ( ASAc + Ȉ y ) 1 AS D{ȟˆ} = ( A cȈ A + S ) Ac6 A( A cȈ A + S ) 1 y
1 1
1 y
1 y
1 1
(6.104) (6.105)
(if S 1 , Ȉ y1 exist) by the bias vector ȕ := E{ȟˆ} ȟ = [I m SA c( ASA c + Ȉ y ) 1 A] ȟ ȕ = [I m ( A cȈ y1A + S 1 ) 1 A c6 y1A] ȟ
(6.106)
(if S 1 , Ȉ y1 exist) and by the matrix of the modified Mean Square Estimation Error MSE{ȟˆ} : MSES {ȟˆ} := E{(ȟˆ ȟ )(ȟˆ ȟ )c} = D{ȟˆ} + ȕȕc
(6.107)
MSES {ȟˆ} = SA c( ASA c + Ȉ y ) 1 Ȉ y ( ASA c + Ȉ y ) 1 AS + +[I m SA c( ASA c + Ȉ y ) 1 A] ȟȟ c [I m Ac( ASAc + Ȉ y ) 1 AS] =
(6.108)
= S SA c( ASA c + Ȉ y ) AS 1
MSES {ȟˆ} = ( A cȈ y1A + S 1 ) 1 A cȈ y1A( A cȈ y1A + S 1 )1 + + [I m ( A cȈ y1A + S 1 ) 1 A cȈ y1A] ȟȟ c × × [I m A cȈ y1A( A cȈ y1A + S 1 ) 1 ]
(6.109)
= ( A cȈ y1A + S 1 ) 1 (if S 1 , Ȉ y1 exist). The interpretation of hom S-BLE is even more complex. In extension of the comments to hom BLE we have to live with another matrix-valued degree of freedom, ȟˆ of type (6.101), (6.102), (6.103) and D{ȟˆ} of type (6.104), (6.105) do no longer depend on the inaccessible matrix ȟȟ c , rk(ȟȟ c) , but on the "bias weight matrix" S, rk S = m. Indeed we can associate any element of the bias matrix with a particular weight which can be "designed" by the analyst. Again the bias vector ȕ of type (6.106) as well as the Mean Square Estimation Error of type (6.107), (6.108), (6.109) depend on the vector ȟ which is inaccessible. Beside the "bias weight matrix S" ȟˆ , D{ȟˆ} , ȕ and MSEs {ȟˆ} are vector-valued or matrix-valued functions of the dispersion matrix D{y} ~ 6 y of the stochastic observation vector which is inaccessible. By hypothesis testing we may decide y . upon the construction of D{y} ~ 6 y from an estimate 6 Theorem 6.13 ( ȟˆ hom
D -BLE):
Let ȟˆ = /y be hom D -BLE of ȟ in the special linear GaussMarkov model with fixed effects Box 6.3. Then equivalent representations of the solutions of the normal equations (6.95) are ȟˆ = 1 SA c( Ȉ y + 1 ASA c) 1 y D D
(6.110)
ȟˆ = ( A cȈ y1A + D S 1 ) 1 A cȈ y1y
(6.111)
ȟˆ = (I m + 1 SA cȈ y1A) 1 1 SA cȈ y1y D D
(6.112)
(if S 1 , Ȉ y1 exist) are completed by the dispersion matrix D{ȟˆ} = 1 SA c( Ȉ y + 1 ASA c) 1 Ȉ y ( Ȉ y + 1 ASA c) 1 AS 1 D D D D
(6.113)
D{ȟˆ} = ( A cȈ y1A + D S 1 ) 1 A cȈ y1A( A cȈ y1A + D S 1 )1
(6.114)
(if S 1 , Ȉ y1 exist), by the bias vector ȕ := E{ȟˆ} ȟ = [I m 1 SA c( 1 ASAc + Ȉ y ) 1 A] ȟ D D ȕ = [I m ( AcȈ y1 A + D S 1 ) 1 AcȈ y1A] ȟ
(6.115)
(if S 1 , Ȉ y1 exist) and by the matrix of the Mean Square Estimation Error MSE{ȟˆ} : MSE{ȟˆ} := E{(ȟˆ ȟ )(ȟˆ ȟ )c} = D{ȟˆ} + ȕȕc
(6.116)
MSED , S {ȟˆ} = SCc( ASA c + Ȉ y ) 1 Ȉ y ( ASA c + Ȉ y ) 1 AS + + [I m - 1 SAc( 1 ASA c + Ȉ y ) 1 A] ȟȟ c × D D × [I m - A c( 1 ASA c + Ȉ y ) 1 AS 1 ] = D D 1 1 1 1 1 = S SAc( ASAc + Ȉ y ) AS D D D D
(6.117)
MSED , S {ȟˆ} = ( A cȈ y1A + D S 1 ) 1 A cȈ y1A( A cȈ y1A + D S 1 ) 1 + + [I m - ( A cȈ y1A + D S 1 ) 1 A cȈ y1A] ȟȟ c × × [I m - A cȈ y1A( A cȈ y1A + D S 1 ) 1 ]
(6.118)
= ( A cȈ y1A + D S 1 ) 1 (if S 1 , Ȉ y1 exist). The interpretation of the very important estimator hom Į-BLE ȟˆ of ȟ is as follows: ȟˆ of type (6.111), also called ridge estimator or Tykhonov-Phillips regulator, contains the Cayley inverse of the normal equation matrix which is additively decomposed into A cȈ y1A and D S 1 . The weight factor D balances the first
inverse dispersion part and the second inverse bias part. While the experiment informs us of the variance-covariance matrix $\boldsymbol{\Sigma}_y$, say $\hat{\boldsymbol{\Sigma}}_y$, the bias weight matrix S and the weight factor α are at the disposal of the analyst. For instance, by the choice $\mathbf{S} = \mathrm{Diag}(s_1, \dots, s_m)$ we may emphasize increase or decrease of certain bias matrix elements. The choice of an equally weighted bias matrix is $\mathbf{S} = \mathbf{I}_m$. In contrast, the weight factor α can be determined by the A-optimal design of type
• $\operatorname{tr} D\{\hat{\boldsymbol{\xi}}\} = \min_\alpha$
• $\boldsymbol{\beta}\boldsymbol{\beta}' = \min_\alpha$
• $\operatorname{tr} \mathrm{MSE}_{\alpha,S}\{\hat{\boldsymbol{\xi}}\} = \min_\alpha$.
In the first case we optimize the trace of the variance-covariance matrix D{ȟˆ} of type (6.113), (6.114). Alternatively by means of ȕȕ ' = min we optimize D the quadratic bias where the bias vector ȕ of type (6.115) is chosen, regardless of the dependence on ȟ . Finally for the third case – the most popular one – we minimize the trace of the Mean Square Estimation Error MSED , S {ȟˆ} of type (6.118), regardless of the dependence on ȟȟ c . But beforehand let us present the proof of Theorem 6.10, Theorem 6.11 and Theorem 6.8. Proof: (i) ȟˆ = ȟȟ cA c[ Ȉ y + Aȟȟ cA c]1 y If the matrix Ȉ y + Aȟȟ cA c of the normal equations of type hom BLE is of full rank, namely rk(Ȉ y + Aȟȟ cA c) = n, then a straightforward solution of (6.93) is Lˆ = ȟȟ cA c[ Ȉ y + Aȟȟ cA c]1. (ii) ȟˆ = SA c( Ȉ y + ASA c) 1 y If the matrix Ȉ y + ASAc of the normal equations of type hom S-BLE is of full rank, namely rk(Ȉ y + ASA c) = n, then a straightforward solution of (6.94) is Lˆ = SAc( Ȉ y + ASAc) 1. (iii) z = ( A cȈ y1A + S 1 ) 1 AcȈ y1y Let us apply by means of Appendix A (Fact: Cayley inverse: sum of two matrices, s(10), Duncan-Guttman matrix identity) the fundamental matrix identity SA c( Ȉ y + ASA c) 1 = ( A cȈ y1A + S 1 ) 1 A cȈ y1 , if S 1 and Ȉ y1 exist. Such a result concludes this part of the proof. (iv) ȟˆ = (I m + SA cȈ y1A) 1 SA cȈ y1y Let us apply by means of Appendix A (Fact: Cayley inverse: sum of two matrices, s(9)) the fundamental matrix identity
SA c( Ȉ y + ASAc) 1 = (I m + SAcȈ y1 A) 1 SAcȈ y1 , if Ȉ y1 exists. Such a result concludes this part of the proof. (v) ȟˆ = 1 SA c( Ȉ y + 1 ASA c) 1 y D D If the matrix Ȉ y + D1 ASA c of the normal equations of type hom Į-BLE is of full rank, namely rk(Ȉ y + D1 ASA c) = n, then a straightforward solution of (6.95) is Lˆ = 1 SA c[ Ȉ y + 1 ASAc]1 . D D (vi) ȟˆ = ( A cȈ y1A + D S 1 ) 1 A cȈ y1y Let us apply by means of Appendix A (Fact: Cayley inverse: sum of two matrices, s(10), Duncan-Guttman matrix identity) the fundamental matrix identity 1 SAc( Ȉ + ASAc) 1 = ( AcȈ 1 A + D S 1 ) 1 AcȈ 1 y y y D if S 1 and Ȉ y1 exist. Such a result concludes this part of the proof. (vii) ȟˆ = (I m + 1 SA cȈ y1A) 1 1 SA cȈ y1y D D Let us apply by means of Appendix A (Fact: Cayley inverse: sum of two matrices, s(9), Duncan-Guttman matrix identity) the fundamental matrix identity 1 SA c( Ȉ + ASA c) 1 = (I + 1 SA cȈ 1A ) 1 1 SA cȈ 1 m y y y D D D if Ȉ y1 exist. Such a result concludes this part of the proof. (viii) hom BLE: D{ȟˆ} D{ȟˆ} := E{[ȟˆ E{ȟˆ}][ȟˆ E{ȟˆ}]c} = = ȟȟ cA c[ Ȉ y + Aȟȟ cA c]1 Ȉ y [ Ȉ y + Aȟȟ cA c]1 Aȟȟ c. By means of the definition of the dispersion matrix D{ȟˆ} and the substitution of ȟˆ of type hom BLE the proof has been straightforward. (ix) hom S-BLE: D{ȟˆ} (1st representation) D{ȟˆ} := E{[ȟˆ E{ȟˆ}][ȟˆ E{ȟˆ}]c} = = SA c( ASA c + Ȉ y ) 1 Ȉ y ( ASA c + Ȉ y ) 1 AS. By means of the definition of the dispersion matrix D{ȟˆ} and the substitution of ȟˆ of type hom S-BLE the proof of the first representation has been straightforward.
(x) hom S-BLE: D{ȟˆ} (2nd representation) D{ȟˆ} := E{[ȟˆ E{ȟˆ}][ȟˆ E{ȟˆ}]c} = = ( A cȈ y1A + S 1 ) 1 Ac6 y1A( A cȈ y1A + S 1 )1 , if S 1 and Ȉ y1 exist. By means of the definition of the dispersion matrix D{ȟˆ} and the substitution of ȟˆ of type hom S-BLE the proof of the second representation has been straightforward. (xi) hom Į-BLE: D{ȟˆ} (1st representation) ˆ D{ȟ} := E{[ȟˆ E{ȟˆ}][ȟˆ E{ȟˆ}]c} = = 1 SA c( Ȉ y + 1 ASA c) 1 Ȉ y ( Ȉ y + 1 ASA c) 1 AS 1 . D D D D By means of the definition of the dispersion matrix D{ȟˆ} and the substitution of ȟˆ of type hom Į-BLE the proof of the first representation has been straightforward. (xii) hom Į-BLE: D{ȟˆ} (2nd representation) D{ȟˆ} := E{[ȟˆ E{ȟˆ}][ȟˆ E{ȟˆ}]c} = = ( A cȈ y1A + D S 1 ) 1 AcȈ y1A( AcȈ y1A + D S 1 )1 , if S 1 and Ȉ y1 exist. By means of the definition of the dispersion matrix and the D{ȟˆ} substitution of ȟˆ of type hom Į-BLE the proof of the second representation has been straightforward. (xiii) bias ȕ for hom BLE, hom S-BLE and hom Į-BLE As soon as we substitute into the bias ȕ := E{ȟˆ} ȟ = ȟ + E{ȟˆ} the various estimators ȟˆ of the type hom BLE, hom S-BLE and hom Į-BLE we are directly led to various bias representations ȕ of type hom BLE, hom S-BLE and hom ĮBLE. (xiv) MSE{ȟˆ} of type hom BLE, hom S-BLE and hom Į-BLE MSE{ȟˆ} := E{(ȟˆ ȟ )(ȟˆ ȟ )c} ȟˆ ȟ = ȟˆ E{ȟˆ} + ( E{ȟˆ} ȟ ) E{(ȟˆ ȟ )(ȟˆ ȟ )c} = E{(ȟˆ E{ȟˆ})((ȟˆ E{ȟˆ})c} +( E{ȟˆ} ȟ )( E{ȟˆ} ȟ )c MSE{ȟˆ} = D{ȟˆ} + ȕȕc .
At first we have defined the Mean Square Estimation Error $\mathrm{MSE}\{\hat{\boldsymbol{\xi}}\}$ of $\hat{\boldsymbol{\xi}}$. Secondly we have decomposed the difference $\hat{\boldsymbol{\xi}} - \boldsymbol{\xi}$ into the two terms
• $\hat{\boldsymbol{\xi}} - E\{\hat{\boldsymbol{\xi}}\}$
• $E\{\hat{\boldsymbol{\xi}}\} - \boldsymbol{\xi}$
in order to derive thirdly the decomposition of $\mathrm{MSE}\{\hat{\boldsymbol{\xi}}\}$, namely
• the dispersion matrix of $\hat{\boldsymbol{\xi}}$, namely $D\{\hat{\boldsymbol{\xi}}\}$,
• the quadratic bias $\boldsymbol{\beta}\boldsymbol{\beta}'$.
As soon as we substitute into $\mathrm{MSE}\{\hat{\boldsymbol{\xi}}\}$ the dispersion matrix $D\{\hat{\boldsymbol{\xi}}\}$ and the bias vector $\boldsymbol{\beta}$ of the various estimators $\hat{\boldsymbol{\xi}}$ of type hom BLE, hom S-BLE and hom α-BLE, we are directly led to the various representations of the Mean Square Estimation Error $\mathrm{MSE}\{\hat{\boldsymbol{\xi}}\}$. Here is my proof's end.
h
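The chain of matrix identities used in parts (iii), (iv), (vi) and (vii) of the proof is easy to check numerically. The following Python sketch is not part of the original text; it builds an arbitrary full-rank test configuration (all names and values are our own choices) and verifies that the three closed forms (6.110), (6.111) and (6.112) of the hom α-BLE estimator coincide.

```python
import numpy as np

# Numerical check (synthetic data only) that the three closed forms of the
# hom alpha-BLE estimator coincide, cf. (6.110)-(6.112) and the
# Duncan-Guttman identity invoked in the proof.
rng = np.random.default_rng(42)
n, m, alpha = 8, 3, 0.5

A = rng.normal(size=(n, m))                  # "first design matrix" A, rk A = m
Sigma_y = np.diag(rng.uniform(0.5, 2.0, n))  # positive definite dispersion matrix of y
S = np.diag(rng.uniform(0.5, 2.0, m))        # bias weight matrix, rk S = m
y = rng.normal(size=n)                       # observation vector

Si = np.linalg.inv(Sigma_y)

# (6.110): xi = (1/alpha) S A' [Sigma_y + (1/alpha) A S A']^{-1} y
xi_1 = (S @ A.T / alpha) @ np.linalg.solve(Sigma_y + A @ S @ A.T / alpha, y)

# (6.111): ridge (Tykhonov-Phillips) form,
#          xi = (A' Sigma_y^{-1} A + alpha S^{-1})^{-1} A' Sigma_y^{-1} y
xi_2 = np.linalg.solve(A.T @ Si @ A + alpha * np.linalg.inv(S), A.T @ Si @ y)

# (6.112): xi = (I_m + (1/alpha) S A' Sigma_y^{-1} A)^{-1} (1/alpha) S A' Sigma_y^{-1} y
xi_3 = np.linalg.solve(np.eye(m) + S @ A.T @ Si @ A / alpha, S @ A.T @ Si @ y / alpha)

print(np.allclose(xi_1, xi_2), np.allclose(xi_2, xi_3))  # expected: True True
```

Rerunning the sketch with a large or small α also illustrates how the weight factor shifts the estimate between the minimum-bias and the minimum-variance extremes.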
7
A spherical problem of algebraic representation - inconsistent system of directional observational equations - overdetermined system of nonlinear equations on curved manifolds
“Least squares regression is not appropriate when the response variable is circular, and can lead to erroneous results. The reason for this is that the squared difference is not an appropriate measure of distance on the circle.” U. Lund (1999) A typical example of a nonlinear model is the inconsistent system of nonlinear observational equations generated by directional measurements (angular observations, longitudinal data). Here the observation space Y as well as the parameter space X is the hypersphere S p R p +1 : the von Mises circle S1 , p = 2 the Fisher sphere S 2 , in general the Langevin sphere S p . For instance, assume repeated measurements of horizontal directions to one target which are distributed as polar coordinates on a unit circle clustered around a central direction. Alternatively, assume repeated measurements of horizontal and vertical directions to one target which are similarly distributed as spherical coordinates (longitude, latitude) on a unit sphere clustered around a central direction. By means of a properly chosen loss function we aim at a determination of the central direction. Let us connect all points on S1 , S 2 , or in general S p the measurement points, by a geodesic, here the great circle, to the point of the central direction. Indeed the loss function will be optimal at a point on S1 , S 2 , or in general S p , called the central point. The result for such a minimum geodesic distance mapping will be presented. Please pay attention to the guideline of Chapter 7.
Definition 7.1 (minimum geodesic distance: S¹), Lemma 7.2 (minimum geodesic distance: S¹), Lemma 7.3 (minimum geodesic distance: S¹)
Definition 7.4 (minimum geodesic distance: S²), Lemma 7.5 (minimum geodesic distance: S²), Lemma 7.6 (minimum geodesic distance: S²)
7-1 Introduction
Directional data, also called "longitudinal data" or "angular data", arise in several situations, notably geodesy, geophysics, geology, oceanography, atmospheric science, meteorology and others. The von Mises or circular normal distribution CN(μ, κ) with mean direction parameter μ (0 ≤ μ ≤ 2π) and concentration parameter κ (κ > 0), the reciprocal of a dispersion measure, plays the role in circular data parallel to that of the Gauss normal distribution in linear data. A natural extension of the CN distribution to the distribution on a p-dimensional sphere $\mathbb{S}^p \subset \mathbb{R}^{p+1}$ leads to the Fisher–von Mises or Langevin distribution L(μ, κ). For p = 2, namely for spherical data (spherical longitude, spherical latitude), this distribution has been studied by R. A. Fisher (1953), generalizing the result of R. von Mises (1918) for p = 1, and is often quoted as the Fisher distribution. Further details can be taken from K. V. Mardia (1972), K. V. Mardia and P. E. Jupp (2000), G. S. Watson (1986, 1998) and A. Sen Gupta and R. Maitra (1998).

Box 7.1: Fisher–von Mises or Langevin distribution

p = 1 (R. von Mises 1918):
$$f(\Lambda \mid \mu, \kappa) = [2\pi I_0(\kappa)]^{-1}\exp[\kappa\cos(\Lambda - \mu_\Lambda)] \qquad (7.1)$$
$$f(\Lambda \mid \mu, \kappa) = [2\pi I_0(\kappa)]^{-1}\exp(\kappa\,\langle\boldsymbol{\mu}\mid\mathbf{X}\rangle) \qquad (7.2)$$
$$\cos\Psi := \langle\boldsymbol{\mu}\mid\mathbf{X}\rangle = \mu_x X + \mu_y Y = \cos\mu_\Lambda\cos\Lambda + \sin\mu_\Lambda\sin\Lambda \qquad (7.3)$$
$$\cos\Psi = \cos(\Lambda - \mu_\Lambda) \qquad (7.4)$$
$$\boldsymbol{\mu} = \mathbf{e}_1\cos\mu_\Lambda + \mathbf{e}_2\sin\mu_\Lambda \in \mathbb{S}^1 \qquad (7.5)$$
$$\mathbf{X} = \mathbf{e}_1\cos\Lambda + \mathbf{e}_2\sin\Lambda \in \mathbb{S}^1 \qquad (7.6)$$

p = 2 (R. A. Fisher 1953):
$$f(\Lambda, \Phi \mid \mu_\Lambda, \mu_\Phi, \kappa) = \frac{\kappa}{4\pi\sinh\kappa}\exp\{\kappa[\cos\Phi\cos\mu_\Phi\cos(\Lambda - \mu_\Lambda) + \sin\Phi\sin\mu_\Phi]\} = \frac{\kappa}{4\pi\sinh\kappa}\exp(\kappa\,\langle\boldsymbol{\mu}\mid\mathbf{X}\rangle) \qquad (7.7)$$
$$\cos\Psi := \langle\boldsymbol{\mu}\mid\mathbf{X}\rangle = \mu_x X + \mu_y Y + \mu_z Z = \cos\mu_\Phi\cos\mu_\Lambda\cos\Phi\cos\Lambda + \cos\mu_\Phi\sin\mu_\Lambda\cos\Phi\sin\Lambda + \sin\mu_\Phi\sin\Phi = \cos\Phi\cos\mu_\Phi\cos(\Lambda - \mu_\Lambda) + \sin\Phi\sin\mu_\Phi \qquad (7.8)$$
$$\boldsymbol{\mu} = \mathbf{e}_1\mu_x + \mathbf{e}_2\mu_y + \mathbf{e}_3\mu_z = \mathbf{e}_1\cos\mu_\Phi\cos\mu_\Lambda + \mathbf{e}_2\cos\mu_\Phi\sin\mu_\Lambda + \mathbf{e}_3\sin\mu_\Phi \in \mathbb{S}^2 \qquad (7.9)$$
$$\mathbf{X} = \mathbf{e}_1 X + \mathbf{e}_2 Y + \mathbf{e}_3 Z = \mathbf{e}_1\cos\Phi\cos\Lambda + \mathbf{e}_2\cos\Phi\sin\Lambda + \mathbf{e}_3\sin\Phi \in \mathbb{S}^2. \qquad (7.10)$$
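As a small numerical companion to Box 7.1, the following Python sketch (our own illustration, not part of the text) evaluates the circular normal density (7.1) and the Fisher density (7.7) and confirms that both integrate to one; scipy.special.i0 supplies the modified Bessel function $I_0(\kappa)$.

```python
import numpy as np
from scipy.special import i0

def vonmises_pdf(lam, mu, kappa):
    """Circular normal (von Mises) density (7.1) on S^1, angles in radians."""
    return np.exp(kappa * np.cos(lam - mu)) / (2.0 * np.pi * i0(kappa))

def fisher_pdf(lam, phi, mu_lam, mu_phi, kappa):
    """Fisher (Langevin, p = 2) density (7.7) on S^2, longitude lam, latitude phi."""
    cos_psi = (np.cos(phi) * np.cos(mu_phi) * np.cos(lam - mu_lam)
               + np.sin(phi) * np.sin(mu_phi))           # spherical distance (7.8)
    return kappa * np.exp(kappa * cos_psi) / (4.0 * np.pi * np.sinh(kappa))

# crude midpoint-rule checks that both densities integrate to one
lam = np.linspace(0.0, 2.0 * np.pi, 1440, endpoint=False)
dlam = lam[1] - lam[0]
print(np.sum(vonmises_pdf(lam, 1.0, 3.0)) * dlam)        # ~ 1.0

dphi = np.pi / 720
phi = -np.pi / 2 + dphi / 2 + dphi * np.arange(720)      # latitude midpoints
L, P = np.meshgrid(lam, phi)
dens = fisher_pdf(L, P, 1.0, 0.3, 5.0) * np.cos(P)       # surface element cos(phi) dlam dphi
print(np.sum(dens) * dlam * dphi)                        # ~ 1.0
```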
Box 7.1 is a review of the Fisher- von Mises or Langevin distribution. First, we setup the circular normal distribution on S1 with longitude / as the stochastic variable and ( P/ , N ) the distributional parameters called “mean direction ȝ ” and “concentration measure”, the reciprocal of a dispersion measure. Due to the normalization of the circular probability density function (“pdf”) I 0 (N ) as the zero order modified Bessel function of the first kind of N appears. The circular distance between the circular mean vector ȝ S1 and the placement vector X S1 is measured by “ cos < ”, namely the inner product < ȝ | X > , both P and X represented in polar coordinates ( P / , / ) , respectively. In summary, (7.1) is the circular normal pdf, namely an element of the exponential class. Second, we refer to the spherical normal pdf on S 2 with spherical longitude / , spherical latitude ) as the stochastic variables and ( P / , P) , N ) the distributional parameters called “longitudinal mean direction, lateral mean direction ( P/ , P) ) ” and “concentration measure N ”, the reciprocal of a dispersion measure. Here the normalization factor of the spherical pdf is N /(4S sinh N ) . The spherical distance between the spherical mean vector ȝ S 2 and the placement vector X S 2 is measured by “ cos < ”, namely the inner product < ȝ | X > , both ȝ and X represented in polar coordinates – spherical coordinates ( P / , P) , /, ) ) , respectively. In summary, (7.7) is the spherical normal pdf, namely an element of the exponential class. Box 7.2: Loss function p=1: longitudinal data n
type1:
¦ cos < i =1
i
= max ~ 1c cos Ȍ = max
n
type 2 :
¦ (1 cos < i =1 n
type 3 :
¦ sin i =1
2
i
) = min ~ 1c(1 cos Ȍ) = min
< i / 2 = min ~ (sin
Ȍ Ȍ )c (sin ) = min 2 2
(7.11) (7.12) (7.13)
transformation 1 cos < = 2sin 2 < / 2 " geodetic distance" cos< i = cos(/ i x) = cos / i cos x + sin / i sin x
(7.14)
2sin < i / 2 = 1 cos < i = 1 cos / i cos x + sin / i sin x
(7.16)
2
(7.15)
ª cos <1 º ª y1 º ª cos /1 º cos Ȍ = «« # »» , cos y := cos ȁ := «« # »» = «« # »» «¬cos < n »¼ «¬ yn »¼ «¬ cos / n »¼
(7.17)
cos Ȍ = cos y cos x + sin y sin x .
(7.18)
? How to generate a loss function substituting least squares ?
Obviously the von Mises pdf (p = 1) has maximum likelihood if $\sum_{i=1}^{n}\cos\Psi_i = \sum_{i=1}^{n}\cos(\Lambda_i - x)$ is maximal. Equivalently, $\sum_{i=1}^{n}(1 - \cos\Psi_i)$ is minimal. By transforming $1 - \cos\Psi_i$ by (7.14) into $2\sin^2(\Psi_i/2)$, an equivalent loss function is $\sum_{i=1}^{n}\sin^2(\Psi_i/2)$, to be postulated minimal. According to Box 7.2 the geodetic distance is represented as a nonlinear function of the unknown mean direction $\mu_x$. (7.17) constitutes the observation vector $\mathbf{y} \in \mathbb{S}^1$. Similarly the Fisher pdf (p = 2) has maximum likelihood if $\sum_{i=1}^{n}\cos\Psi_i$ is maximal. Equivalent postulates are (7.20) $\sum_{i=1}^{n}(1 - \cos\Psi_i) = \min$ and (7.21) $\sum_{i=1}^{n}\sin^2(\Psi_i/2) = \min$. According to Box 7.3 the geodetic distance (7.23) is represented as a nonlinear function of the unknown mean direction $(\mu_\Lambda, \mu_\Phi) \sim (x_1, x_2)$. (7.24), (7.25) constitute the nonlinear observational equations for direct observations of type "longitude, latitude" $(\Lambda_i, \Phi_i) \in \mathbb{S}^2$, the observation space Y, and unknown parameters of type mean longitudinal, lateral direction $(\mu_\Lambda, \mu_\Phi) \in \mathbb{S}^2$, the parameter space X.
type1:
i
= max
~
1c cos Ȍ = max
(7.19)
i
) = min
~
1c(1 cos Ȍ) = min
(7.20)
¦ cos < i =1
n
type 2 :
¦ (1 cos < i =1
n
type 3 :
¦ sin i =1
2
< i / 2 = min
~
(sin
Ȍ Ȍ )c (sin ) = min 2 2
(7.21)
transformation 1 cos < = 2sin 2 < / 2 " geodetic distance" cos< i = cos ) i cos x2 cos(/ i x1 ) + sin ) i sin x2 = = cos ) i cos / i cos x1 cos x2 + cos ) i sin / i sin x1 cos x2 + sin ) i sin x2
(7.22)
(7.23)
ª cos <1 º ª cos /1 º cos Ȍ := «« # »» , cos y1 := cos ȁ := «« # »» «¬ cos < n »¼ «¬ cos / n »¼
(7.24)
ª cos )1 º cos y 2 := cos ĭ := «« # »» , sin y1 , sin y 2 correspondingly ¬« cos ) n ¼» ª cos )1 cos /1 º ª cos )1 sin /1 º ª sin )1 º « » « » « » cos Ȍ = « # # » cos x1 cos x2 + « » sin x1 cos x2 + « # » sin x2 . «¬ cos ) n cos / n »¼ «¬ cos ) n sin / n »¼ «¬sin ) n »¼
(7.25)
7-2 Minimal geodesic distance: MINGEODISC
By means of Definition 7.1 we define the minimal geodetic distance solution (MINGEODISC) on $\mathbb{S}^1$. Lemma 7.2 presents the corresponding nonlinear normal equation, whose closed-form solution is explicitly given by Lemma 7.3 in terms of Gauss brackets (special summation symbols). In contrast, Definition 7.4 confronts us with the definition of the minimal geodetic distance solution (MINGEODISC) on $\mathbb{S}^2$. Lemma 7.5 relates to the corresponding nonlinear normal equations, which are solved in closed form via Lemma 7.6, again taking advantage of the Gauss brackets.

Definition 7.1 (minimum geodesic distance: $\mathbb{S}^1$):
A point $\Lambda_g \in \mathbb{S}^1$ is called at minimum geodesic distance to other points $\Lambda_i \in \mathbb{S}^1$, $i \in \{1, \dots, n\}$, if the circular distance function
$$\mathcal{L}(\Lambda_g) := \sum_{i=1}^{n} 2(1 - \cos\Psi_i) = \sum_{i=1}^{n} 2[1 - \cos(\Lambda_i - \Lambda_g)] = \min_{\Lambda_g} \qquad (7.26)$$
is minimal, that is,
$$\Lambda_g = \arg\Big\{\sum_{i=1}^{n} 2(1 - \cos\Psi_i) = \min \;\Big|\; \cos\Psi_i = \cos(\Lambda_i - \Lambda_g)\Big\}. \qquad (7.27)$$

Lemma 7.2 (minimum geodesic distance, normal equation: $\mathbb{S}^1$):
A point $\Lambda_g \in \mathbb{S}^1$ is at minimum geodesic distance to other points $\Lambda_i \in \mathbb{S}^1$, $i \in \{1, \dots, n\}$, if $\Lambda_g = x$ fulfils the normal equation
$$-\sin x\,\Big(\sum_{i=1}^{n}\cos\Lambda_i\Big) + \cos x\,\Big(\sum_{i=1}^{n}\sin\Lambda_i\Big) = 0. \qquad (7.28)$$
Proof:
$\Lambda_g$ is generated by means of the Lagrangean (loss function)
$$\mathcal{L}(x) := \sum_{i=1}^{n} 2[1 - \cos(\Lambda_i - x)] = 2n - 2\cos x\sum_{i=1}^{n}\cos\Lambda_i - 2\sin x\sum_{i=1}^{n}\sin\Lambda_i = \min_x.$$
The first derivative
$$\frac{d\mathcal{L}(x)}{dx}(\Lambda_g) = 2\sin\Lambda_g\sum_{i=1}^{n}\cos\Lambda_i - 2\cos\Lambda_g\sum_{i=1}^{n}\sin\Lambda_i = 0$$
constitutes the necessary condition. The second derivative
$$\frac{d^2\mathcal{L}(x)}{dx^2}(\Lambda_g) = 2\cos\Lambda_g\sum_{i=1}^{n}\cos\Lambda_i + 2\sin\Lambda_g\sum_{i=1}^{n}\sin\Lambda_i > 0$$
builds up the sufficiency condition for the minimum at $\Lambda_g$.
Lemma 7.3 (minimum geodesic distance, solution of the normal equation: $\mathbb{S}^1$):
Let the point $\Lambda_g \in \mathbb{S}^1$ be at minimum geodesic distance to the other points $\Lambda_i \in \mathbb{S}^1$, $i \in \{1, \dots, n\}$. Then the corresponding normal equation (7.28) is uniquely solved by
$$\tan\Lambda_g = [\sin\Lambda]/[\cos\Lambda], \qquad (7.29)$$
such that the circular solution point is
$$\mathbf{X}_g = \mathbf{e}_1\cos\Lambda_g + \mathbf{e}_2\sin\Lambda_g = \frac{1}{\sqrt{[\sin\Lambda]^2 + [\cos\Lambda]^2}}\,\{\mathbf{e}_1[\cos\Lambda] + \mathbf{e}_2[\sin\Lambda]\} \qquad (7.30)$$
with respect to the Gauss brackets
$$[\sin\Lambda]^2 := \Big(\sum_{i=1}^{n}\sin\Lambda_i\Big)^2 \qquad (7.31)$$
$$[\cos\Lambda]^2 := \Big(\sum_{i=1}^{n}\cos\Lambda_i\Big)^2. \qquad (7.32)$$
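Lemma 7.3 translates directly into code. The short Python sketch below (function name and toy data are our own, not taken from the text) evaluates the Gauss brackets and resolves the quadrant of $\Lambda_g$ with atan2, which is equivalent to (7.29) but avoids the ambiguity of the plain tangent.

```python
import numpy as np

def mingeodisc_s1(lam):
    """Minimum geodesic distance point on S^1 (Lemma 7.3):
    tan(lam_g) = [sin lam] / [cos lam], quadrant resolved by atan2."""
    lam = np.asarray(lam, dtype=float)
    s = np.sum(np.sin(lam))              # Gauss bracket [sin Lam]
    c = np.sum(np.cos(lam))              # Gauss bracket [cos Lam]
    return np.arctan2(s, c) % (2.0 * np.pi)

# toy data clustered around 1.2 rad
rng = np.random.default_rng(0)
lam = 1.2 + 0.05 * rng.standard_normal(50)
print(mingeodisc_s1(lam))                # close to 1.2
```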
Next we generalize MINGEODISC (p = 1) on $\mathbb{S}^1$ to MINGEODISC (p = 2) on $\mathbb{S}^2$.

Definition 7.4 (minimum geodesic distance: $\mathbb{S}^2$):
A point $(\Lambda_g, \Phi_g) \in \mathbb{S}^2$ is called at minimum geodesic distance to other points $(\Lambda_i, \Phi_i) \in \mathbb{S}^2$, $i \in \{1, \dots, n\}$, if the spherical distance function
$$\mathcal{L}(\Lambda_g, \Phi_g) := \sum_{i=1}^{n} 2(1 - \cos\Psi_i) = \sum_{i=1}^{n} 2[1 - \cos\Phi_i\cos\Phi_g\cos(\Lambda_i - \Lambda_g) - \sin\Phi_i\sin\Phi_g] = \min_{\Lambda_g, \Phi_g} \qquad (7.33)$$
is minimal, that is,
$$(\Lambda_g, \Phi_g) = \arg\Big\{\sum_{i=1}^{n} 2(1 - \cos\Psi_i) = \min \;\Big|\; \cos\Psi_i = \cos\Phi_i\cos\Phi_g\cos(\Lambda_i - \Lambda_g) + \sin\Phi_i\sin\Phi_g\Big\}. \qquad (7.34)$$

Lemma 7.5 (minimum geodesic distance, normal equation: $\mathbb{S}^2$):
A point $(\Lambda_g, \Phi_g) \in \mathbb{S}^2$ is at minimum geodesic distance to other points $(\Lambda_i, \Phi_i) \in \mathbb{S}^2$, $i \in \{1, \dots, n\}$, if $\Lambda_g = x_1$, $\Phi_g = x_2$ fulfils the normal equations
$$-\sin x_2\cos x_1\sum_{i=1}^{n}\cos\Phi_i\cos\Lambda_i - \sin x_2\sin x_1\sum_{i=1}^{n}\cos\Phi_i\sin\Lambda_i + \cos x_2\sum_{i=1}^{n}\sin\Phi_i = 0 \qquad (7.35)$$
$$\cos x_2\cos x_1\sum_{i=1}^{n}\cos\Phi_i\sin\Lambda_i - \cos x_2\sin x_1\sum_{i=1}^{n}\cos\Phi_i\cos\Lambda_i = 0. \qquad (7.36)$$

Proof:
$(\Lambda_g, \Phi_g)$ is generated by means of the Lagrangean (loss function)
$$\mathcal{L}(x_1, x_2) := \sum_{i=1}^{n} 2[1 - \cos\Phi_i\cos\Lambda_i\cos x_1\cos x_2 - \cos\Phi_i\sin\Lambda_i\sin x_1\cos x_2 - \sin\Phi_i\sin x_2] =$$
$$= 2n - 2\cos x_1\cos x_2\sum_{i=1}^{n}\cos\Phi_i\cos\Lambda_i - 2\sin x_1\cos x_2\sum_{i=1}^{n}\cos\Phi_i\sin\Lambda_i - 2\sin x_2\sum_{i=1}^{n}\sin\Phi_i.$$
The first derivatives
$$\frac{\partial\mathcal{L}(x)}{\partial x_1}(\Lambda_g, \Phi_g) = 2\sin\Lambda_g\cos\Phi_g\sum_{i=1}^{n}\cos\Phi_i\cos\Lambda_i - 2\cos\Lambda_g\cos\Phi_g\sum_{i=1}^{n}\cos\Phi_i\sin\Lambda_i = 0$$
$$\frac{\partial\mathcal{L}(x)}{\partial x_2}(\Lambda_g, \Phi_g) = 2\cos\Lambda_g\sin\Phi_g\sum_{i=1}^{n}\cos\Phi_i\cos\Lambda_i + 2\sin\Lambda_g\sin\Phi_g\sum_{i=1}^{n}\cos\Phi_i\sin\Lambda_i - 2\cos\Phi_g\sum_{i=1}^{n}\sin\Phi_i = 0$$
constitute the necessary conditions. The matrix of second derivatives
$$\frac{\partial^2\mathcal{L}(x)}{\partial\mathbf{x}\,\partial\mathbf{x}'}(\Lambda_g, \Phi_g) \geq 0$$
builds up the sufficiency condition for the minimum at $(\Lambda_g, \Phi_g)$, with
$$\frac{\partial^2\mathcal{L}(x)}{\partial x_1^2}(\Lambda_g, \Phi_g) = 2\cos\Lambda_g\cos\Phi_g[\cos\Phi\cos\Lambda] + 2\sin\Lambda_g\cos\Phi_g[\cos\Phi\sin\Lambda]$$
$$\frac{\partial^2\mathcal{L}(x)}{\partial x_1\,\partial x_2}(\Lambda_g, \Phi_g) = -2\sin\Lambda_g\sin\Phi_g[\cos\Phi\cos\Lambda] + 2\cos\Lambda_g\sin\Phi_g[\cos\Phi\sin\Lambda]$$
$$\frac{\partial^2\mathcal{L}(x)}{\partial x_2^2}(\Lambda_g, \Phi_g) = 2\cos\Lambda_g\cos\Phi_g[\cos\Phi\cos\Lambda] + 2\sin\Lambda_g\cos\Phi_g[\cos\Phi\sin\Lambda] + 2\sin\Phi_g[\sin\Phi].$$
h

Lemma 7.6 (minimum geodesic distance, solution of the normal equation: $\mathbb{S}^2$):
Let the point $(\Lambda_g, \Phi_g) \in \mathbb{S}^2$ be at minimum geodesic distance to the other points $(\Lambda_i, \Phi_i) \in \mathbb{S}^2$, $i \in \{1, \dots, n\}$. Then the corresponding normal equations ((7.35), (7.36)) are uniquely solved by
$$\tan\Lambda_g = [\cos\Phi\sin\Lambda]/[\cos\Phi\cos\Lambda] \qquad (7.37)$$
$$\tan\Phi_g = \frac{[\sin\Phi]}{\sqrt{[\cos\Phi\cos\Lambda]^2 + [\cos\Phi\sin\Lambda]^2}} \qquad (7.38)$$
such that the spherical solution point is
$$\mathbf{X}_g = \mathbf{e}_1\cos\Phi_g\cos\Lambda_g + \mathbf{e}_2\cos\Phi_g\sin\Lambda_g + \mathbf{e}_3\sin\Phi_g = \frac{\mathbf{e}_1[\cos\Phi\cos\Lambda] + \mathbf{e}_2[\cos\Phi\sin\Lambda] + \mathbf{e}_3[\sin\Phi]}{\sqrt{[\cos\Phi\cos\Lambda]^2 + [\cos\Phi\sin\Lambda]^2 + [\sin\Phi]^2}}$$
subject to
$$[\cos\Phi\cos\Lambda] := \sum_{i=1}^{n}\cos\Phi_i\cos\Lambda_i \qquad (7.39)$$
$$[\cos\Phi\sin\Lambda] := \sum_{i=1}^{n}\cos\Phi_i\sin\Lambda_i \qquad (7.40)$$
$$[\sin\Phi] := \sum_{i=1}^{n}\sin\Phi_i.$$
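Analogously, Lemma 7.6 gives the spherical central direction in closed form. The Python sketch below (again our own illustration, with hypothetical toy data) evaluates the three Gauss brackets and returns $(\Lambda_g, \Phi_g)$ via (7.37) and (7.38).

```python
import numpy as np

def mingeodisc_s2(lam, phi):
    """Minimum geodesic distance point on S^2 (Lemma 7.6);
    lam = longitudes, phi = latitudes, both in radians."""
    lam = np.asarray(lam, dtype=float)
    phi = np.asarray(phi, dtype=float)
    ccl = np.sum(np.cos(phi) * np.cos(lam))     # [cos Phi cos Lam]
    csl = np.sum(np.cos(phi) * np.sin(lam))     # [cos Phi sin Lam]
    sp = np.sum(np.sin(phi))                    # [sin Phi]
    lam_g = np.arctan2(csl, ccl)                # (7.37)
    phi_g = np.arctan2(sp, np.hypot(ccl, csl))  # (7.38)
    return lam_g, phi_g

# toy data clustered around (0.8, 0.3) rad
rng = np.random.default_rng(1)
lam = 0.8 + 0.05 * rng.standard_normal(40)
phi = 0.3 + 0.05 * rng.standard_normal(40)
print(mingeodisc_s2(lam, phi))                  # close to (0.8, 0.3)
```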
7-3 Special models: from the circular normal distribution to the oblique normal distribution First, we present a historical note about the von Mises distribution on the circle. Second, we aim at constructing a twodimensional generalization of the Fisher circular normal distribution to its elliptic counterpart. We present 5 lemmas of different type. Third, we intend to prove that an angular metric fulfils the four axioms of a metric. 7-31
A historical note of the von Mises distribution
Let us begin with a historical note: the von Mises distribution on the circle. In the early part of the last century, Richard von Mises (1918) considered the table of the atomic weights of elements, seven entries of which are as follows:

Table 7.1
Element:            Al      Sb       Ar      As      Ba       Be     Bi
Atomic weight W:    26.98   121.76   39.93   74.91   137.36   9.01   209.00
He asked the question "Does a typical element in some sense have integer atomic weight?" A natural interpretation of the question is "Do the fractional parts of the weights cluster near 0 and 1?" The atomic weight W can be identified in a natural way with points on the unit circle, in such a way that equal fractional parts correspond to identical points. This can be done under the mapping
$$W \mapsto \mathbf{x} = \begin{bmatrix}\cos\theta_1 \\ \sin\theta_1\end{bmatrix}, \qquad \theta_1 = 2\pi(W - [W]),$$
where $[u]$ is the largest integer not greater than $u$. Von Mises' question can now be seen to be equivalent to asking "Do these points on the circle cluster near $\mathbf{e}_1 = [1\ \ 0]'$?". Incidentally, the mapping $W \mapsto \mathbf{x}$ can be made in another way:
$$W \mapsto \mathbf{x} = \begin{bmatrix}\cos\theta_2 \\ \sin\theta_2\end{bmatrix}, \qquad \theta_2 = 2\pi\big(W - [W + \tfrac{1}{2}]\big).$$
The two sets of angles for the two mappings are then as follows:

Table 7.2
Element:     Al      Sb      Ar      As      Ba     Be     Bi     Average
θ₁/2π:       0.98    0.76    0.93    0.91    0.36   0.01   0.00   0.566
θ₂/2π:       -0.02   -0.24   -0.06   -0.09   0.36   0.01   0.00   -0.006
We note from the discrepancy between the averages in the final column that our usual ways of describing data, e.g., means and standard deviations, are likely to fail us when it comes to measurements of direction. If the points do cluster near $\mathbf{e}_1$, then the resultant vector $\sum_{j=1}^{N}\mathbf{x}_j$ (here $N = 7$) should point in that direction, i.e., we should have approximately $\bar{\mathbf{x}}/\|\bar{\mathbf{x}}\| = \mathbf{e}_1$, where $\bar{\mathbf{x}} = \sum_j\mathbf{x}_j/N$ and $\|\mathbf{x}\| = (\mathbf{x}'\mathbf{x})^{1/2}$ is the length of $\mathbf{x}$. For the seven elements whose weights are considered here we find
$$\bar{\mathbf{x}}/\|\bar{\mathbf{x}}\| = \begin{bmatrix}0.9617 \\ -0.2741\end{bmatrix} = \begin{bmatrix}\cos 344^\circ.09 \\ \sin 344^\circ.09\end{bmatrix} = \begin{bmatrix}\cos(-15^\circ.91) \\ \sin(-15^\circ.91)\end{bmatrix},$$
a direction not far removed from $\mathbf{e}_1$. Von Mises then asked "For what distribution on the unit circle is the unit vector $\hat{\boldsymbol{\mu}} = [\cos\hat{\theta}_0\ \ \sin\hat{\theta}_0]' = \bar{\mathbf{x}}/\|\bar{\mathbf{x}}\|$ a maximum likelihood estimator (MLE) of a direction $\theta_0$ of clustering or concentration?" The answer is the distribution now known as the von Mises or circular normal distribution. It has density, expressed in terms of the random angle $\theta$,
$$\frac{\exp\{k\cos(\theta - \theta_0)\}}{I_0(k)}\,\frac{d\theta}{2\pi},$$
where $\theta_0$ is the direction of concentration and the normalizing constant $I_0(k)$ is a Bessel function. An alternative expression is
$$\frac{\exp(k\,\mathbf{x}'\boldsymbol{\mu})}{I_0(k)}\,\frac{dS}{2\pi}, \qquad \boldsymbol{\mu} = \begin{bmatrix}\cos\theta_0 \\ \sin\theta_0\end{bmatrix}, \quad \|\mathbf{x}\| = 1.$$
Von Mises' question clearly has to do with the truth of the hypothesis
$$H_0: \theta_0 = 0 \quad \text{or} \quad \boldsymbol{\mu} = \mathbf{e}_1 = \begin{bmatrix}1 \\ 0\end{bmatrix}.$$
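Von Mises' little computation is easy to repeat. The Python sketch below is our own, not part of the text; it maps the seven tabulated atomic weights of Table 7.1 onto the circle and forms the resultant direction $\bar{\mathbf{x}}/\|\bar{\mathbf{x}}\|$. It lands within about a degree of the roughly 344° quoted above; the small deviation comes from the two-decimal rounding of the tabulated weights.

```python
import numpy as np

# Atomic weights of the seven elements in Table 7.1 (Al, Sb, Ar, As, Ba, Be, Bi)
W = np.array([26.98, 121.76, 39.93, 74.91, 137.36, 9.01, 209.00])

theta = 2.0 * np.pi * (W - np.floor(W))       # mapping W -> theta_1 on the unit circle
x = np.column_stack([np.cos(theta), np.sin(theta)])

x_bar = x.mean(axis=0)
mu_hat = x_bar / np.linalg.norm(x_bar)        # resultant direction x_bar / ||x_bar||
print(np.degrees(np.arctan2(mu_hat[1], mu_hat[0])) % 360.0)   # roughly 343-344 degrees
```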
It is worth mentioning that Fisher found the same distribution in another context (Fisher, 1956, SMSI, pp. 133-138) as the conditional distribution of x , given || x ||= 1, when x is N 2 (ȝ, k 1 I 2 ) . 7-32 Oblique map projection A special way to derive the general representation of the twodimensional generalized Fisher sphere is by forming the general map of S 2 onto \ 2 . In order to follow a systematic approach, let us denote by { A, <} the “real” metalongitude / meta-colatitude representing a point on the sphere given by {P/ , P) }
versus
by {D , r} resp. {x, y} , the meta-longitude / metalatitude representing the mapped values on the tangential plane given by {P / , P) } , alternatively by its polar coordinates {x, y} .
At first, we want to derive the equations generating an oblique map projection of S 2R onto T0 S 2R of equiareal type.
D = A, r = 2 R sin < / 2 versus x = 2 R sin < / 2 cos A y = 2 R sin < / 2sin A. Second, we intend to derive the transformation from the local surface element dAd< sin < to the alternate local surface element | J | dD dr sin < (D , r ) by means of the inverse Jacobian determinant ª dD « dA « « 0 ¬«
º ª dA 0 » « dD » = | J 1 | ~ J = « dr » « 0 «¬ d < ¼» ª wD « wA =« « wr «¬ wA
wD º w< » ª DAD »= wr » «¬ 0 w< »¼
ª1 Dr A º « = Dr < ¼» «0 ¬«
0
J 1
ªD A J=« D ¬ DD <
º 0 » » d< » dr ¼»
0 º D< r »¼
º 1 », J = . 1 » R cos < / 2 R cos < / 2 ¼»
Third, we read the inverse equations of an oblique map projection of S 2\ of equiareal type. A = D , sin sin < = 2sin
r r2 < < , cos = 1 2 = 2 2R 2 4R
< < < < r r2 cos = 2sin 1 sin 2 = 1 2 . 2 2 2 2 R 4R
We collect our basic results in a few lemmas. Lemma 7.7: dAd < sin < =
rdD dr . R2
Lemma 7.8 (oblique azimuthal map projection of S 2\ of equiareal type): direct equations D = A, r = 2 R sin < / 2 inverse equations r r A = D , sin < = 1 ( )2 . R 2R Lemma 7.9: dAd < sin < =
1 dD rdr dxdy = . R2 R2
Lemma 7.10 (oblique azimuthal map projection of $\mathbb{S}^2_R$ of equiareal type):
direct equations
$$x = 2R\sin\frac{\Psi}{2}\cos A, \qquad y = 2R\sin\frac{\Psi}{2}\sin A$$
inverse equations
$$\tan A = \frac{y}{x}, \qquad \sin\frac{\Psi}{2} = \frac{1}{2R}\sqrt{x^2 + y^2},$$
$$\sin\Psi = \frac{\sqrt{x^2 + y^2}}{R}\sqrt{1 - \frac{x^2 + y^2}{4R^2}} = \frac{1}{2R^2}\sqrt{x^2 + y^2}\,\sqrt{4R^2 - (x^2 + y^2)}.$$
ª cot A = [ cos ) tan ) * + sin ) cos( / * / )] / sin( / * / ) « * * * ¬ cos < = cos ) cos ) cos( / / ) + sin ) sin ) ) with respect to a meta-North pole {/ , )} . In contrast, the inverse equations of the transformation of spherical meta-longitude and spherical meta-colatitude { A, <} into spherical longitude and spherical latitude {/* , )* } read ªcot(/* / ) = [ sin ) cos A + cos ) cot < ] / sin A « * «¬sin ) = sin ) cos < + cos ) sin < cos A . We report two key problems.
Tangential plane Figure 7.1
Tangential plane Figure 7.2
First, in the plane located at ( P/ , P) ) we place the circular normal distribution x = 2(1 cos < ) cos A = r (< ) cos D y = 2(1 cos < ) sin A = r (< ) sin D or A = D , r = 2(1 cos < ) as an alternative. A natural generalization towards an oblique normal distribution will be given by
x* = x cos H + y sin H
x = x* cos H y* sin H or
y = x sin H + y cos H
y = x* sin H + y * cos H
*
and ( x ) F x + ( y ) F y = ( x cos H + y sin H ) 2 F x + ( x sin H + y cos H ) 2 F y = * 2
* 2
= x 2 cos 2 H F x + x 2 sin 2 H F y + y 2 sin 2 H F x + y 2 cos 2 H F y + + xy (sin H cos H F x sin H cos H F y ) = = x 2 (cos 2 H F x + sin 2 H F y ) + y 2 (sin 2 H F x + cos 2 H F y ) + + xy sin H cos H ( F x F y ). The parameters ( F x , F y ) determine the initial values of the elliptic curve representing the canonical data set, namely (1/ F x , 1/ F y ) . The circular normal distribution is achieved for the data set F x = F y = 1. Second, we intend to transform the representation of coordinates of the oblique normal distribution from Cartesian coordinates in the oblique equatorial plane to curvilinear coordinates in the spherical reference frame: x 2 (cos 2 H F x + sin 2 H F y ) + y 2 (sin 2 H F x + cos 2 H F y ) + xy sin H cos H ( F x F y ) = = r 2 cos 2 D (cos 2 H F x + sin 2 H F y ) + r 2 sin 2 D (sin 2 H F x + cos 2 H F y ) + + r 2 sin D cos D sin H cos H ( F x F y ) = r 2 (< , A) cos 2 A(cos 2 H F x + sin 2 H F y ) + r 2 (< , A) sin 2 A(sin 2 H F x + cos 2 H F y ) + + r 2 (< , A) sin A cos A sin H cos H ( F x F y ). Characteristically, the radical component r (< , A) be a function of the colatitude < and of the azimuth A. The angular coordinate is preserved, namely D = A . Here, our comments on the topic of the oblique normal distribution are finished. 7-33
A note on the angular metric
We intend to prove that an angular metric fulfils all the four axioms of a metric. Let us begin with these axioms of a metric, namely < x|y > < x|y > cos D = , 0 d D d S D = arccos 1 || x |||| y || || x |||| y || based on Euclidean metric forms. Let us introduce the distance function d(
x y < x |y > , ) = arccos || x || || y || || x |||| y ||
to fulfill M 1: d ( x, y ) t 0 M 2 : d ( x, y ) = 0 x = y M 3 : d ( x, y ) = d ( y, x) M 4 : d ( x, z ) d d ( x, y ) + ( y, z) : "Triangular Inequality". Assume || x ||2 =|| y ||2 =|| z ||2 = 1 : Axioms M1 and M3 are easy to prove, Axiom M 2 is not complicated, but the Triangular Inequality requires work. Let x, y , z X, D = d (x, y ), E = d ( y, z ) and J = d ( x, z ) , i.e.
D , E , J [0, S ], cos D =< x, y >, cos E =< y, z >, cos J =< x, z > . We wish to prove J d D + E . This result is trivial in the case cos t S , so we may assume D + E [0, S ] . The third desired inequality is equivalent to cos J t cos(D + E ) . The proof of the basic formulas relies heavily on the properties of the inverse product: < u + uc, v >=< u, v > + < uc, v > º < u, v + v c >=< u, v > + < u, v c > »» for all u,uc, v, v c \ 3 , and O \. < O u, v >= O < u, v >=< u, O v > »¼ Define xc, z c \ 3 by x = (cos D )y + xc, z = (cos E )y z c, then < xc, z c >=< x (cos D )y, z + (cos E ) y >= = < x, z > + (cos D ) < y, z > + (cos E ) < x, y > (cos D )(cos E ) < y, y >= = cos J + cos D cos E + cos D cos E cos D cos E = cos J + cos D cos E . In the same way || xc ||2 =< x, xc >= 1 cos 2 D = sin 2 D so that, since 0 d D d S , || xc ||= sin D . Similarly, || z c ||= sin E . But by Schwarz’ Inequality, < xc, z c > d || xc |||| z c || . It follows that cos J t cos D cos E sin D sin E = = cos(D + E ) and we are done!
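For readers who prefer an empirical cross-check of axiom M4 before working through the algebra, the following Monte Carlo sketch (our own, with arbitrary random test vectors) confirms the triangle inequality for the angular distance.

```python
import numpy as np

def angular_distance(x, y):
    """d(x, y) = arccos(<x|y> / (||x|| ||y||)), the angular metric discussed above."""
    c = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))
    return np.arccos(np.clip(c, -1.0, 1.0))

# Monte Carlo check of axiom M4: d(x, z) <= d(x, y) + d(y, z)
rng = np.random.default_rng(7)
ok = True
for _ in range(10000):
    x, y, z = rng.standard_normal((3, 3))
    ok &= angular_distance(x, z) <= angular_distance(x, y) + angular_distance(y, z) + 1e-12
print(ok)   # True
```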
7-4 Case study
Table 7.3 collects 30 angular observations with two different theodolites. The first column contains the number of the directional observations $\Lambda_i$, $i \in \{1, \dots, 30\}$, $n = 30$. The second column lists in fractions of seconds the directional data, while the third column / fourth column is a printout of $\cos\Lambda_i$ / $\sin\Lambda_i$. Table 7.4 is a comparison of $\Lambda_g$ and $\hat{\Lambda}$ as the arithmetic mean. Obviously, on the level of concentration of the data, $\Lambda_g$ and $\hat{\Lambda}$ are nearly the same.
Table 7.3 The directional observation data using two theodolites and its calculation Theodolite 1 No. 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
¦
Value of observation ( /i ) 76 42 c 17.2 cc 19.5 19.2 16.5 19.6 16.4 15.5 19.9 19.2 16.8 15.0 16.9 16.6 20.4 16.3 16.7 16.0 15.5 19.1 18.8 18.7 19.2 17.5 16.7 19.0 16.8 19.3 20.0 17.4 16.2 ˆ = / D
Theodolite 2
cos / i
sin / i
0.229969
0.973198
0.229958 0.229959 0.229972 0.229957 0.229972 0.229977 0.229956 0.229959 0.229970 0.229979 0.229970 0.229971 0.229953 0.229973 0.229971 0.229974 0.229977 0.229960 0.229961 0.229962 0.229959 0.229967 0.229971 0.229960 0.229970 0.229959 0.229955 0.229968 0.229973
0.973201 0.973200 0.973197 0.973201 0.973197 0.973196 0.973201 0.973200 0.973198 0.973196 0.973198 0.973197 0.973202 0.973197 0.973197 0.973197 0.973196 0.973200 0.973200 0.973200 0.973200 0.973198 0.973197 0.973200 0.973198 0.973200 0.973201 0.973198 0.973197
6.898982
29.195958
L
D
76 42 c17.73cc sˆ = ±1.55cc
Value of observation ( /i ) D
76 42 c19.5cc 19.0 18.8 16.9 18.6 19.1 18.2 17.7 17.5 18.6 16.0 17.3 17.2 16.8 18.8 17.7 18.6 18.8 17.7 17.1 16.9 17.6 17.0 17.5 18.2 18.3 19.8 18.6 16.9 16.7 ˆ = /
cos / i
sin / i
0.229958
0.973201
0.229960 0.229961 0.229970 0.229962 0.229960 0.229964 0.229966 0.229967 0.229962 0.229974 0.229968 0.229969 0.229970 0.229961 0.229966 0.229962 0.229961 0.229966 0.229969 0.229970 0.229967 0.229970 0.229967 0.229964 0.229963 0.229956 0.229962 0.229970 0.229971
0.973200 0.973200 0.973198 0.973200 0.973200 0.973199 0.973199 0.973198 0.973200 0.973197 0.973198 0.973198 0.973198 0.973200 0.973199 0.973200 0.973200 0.973199 0.973198 0.973198 0.973198 0.973198 0.973198 0.973199 0.973199 0.973201 0.973200 0.973198 0.973197
6.898956
29.195968
L
D
76 42c17.91cc sˆ = ±0.94 cc
Table 7.4: Computation of theodolite data, comparison of $\Lambda_g$ and $\hat{\Lambda}$ (left data set versus right data set)
Theodolite 1: $\hat{\Lambda} = 76^\circ 42' 17.73''$, $\hat{s} = \pm 1.55''$
Theodolite 2: $\hat{\Lambda} = 76^\circ 42' 17.91''$, $\hat{s} = \pm 0.94''$
"The precision of theodolite two is higher compared to theodolite one."
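Since Table 7.3 already provides the Gauss brackets [cos Λ] and [sin Λ] as column sums, the central directions of Table 7.4 follow from one call to atan2. The Python sketch below (our own check) reproduces the quoted values to within a few hundredths of an arc second.

```python
import numpy as np

# Column sums [cos Lam], [sin Lam] taken from Table 7.3
sums = {"theodolite 1": (6.898982, 29.195958),
        "theodolite 2": (6.898956, 29.195968)}

for name, (c, s) in sums.items():
    lam_g = np.degrees(np.arctan2(s, c))       # tan Lam_g = [sin Lam]/[cos Lam] (Lemma 7.3)
    d = int(lam_g)
    m = int((lam_g - d) * 60.0)
    sec = ((lam_g - d) * 60.0 - m) * 60.0
    print(name, d, m, round(sec, 2))           # about 76 deg 42' 17.7'' and 17.9''
```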
Alternatively, let us present a second example. Let there be given observed azimuths $\Lambda_i$ and vertical directions $\Phi_i$ by Table 7.5 in detail. First, we compute the solution of the optimization problem
$$\sum_{i=1}^{n} 2(1 - \cos\Psi_i) = \sum_{i=1}^{n} 2[1 - \cos\Phi_i\cos\Phi_\mu\cos(\Lambda_i - \Lambda_\mu) - \sin\Phi_i\sin\Phi_\mu] = \min_{\Lambda_\mu, \Phi_\mu}$$
subject to the values of the central direction
$$\tan\hat{\Lambda} = \frac{\sum_{i=1}^{n}\cos\Phi_i\sin\Lambda_i}{\sum_{i=1}^{n}\cos\Phi_i\cos\Lambda_i}, \qquad \tan\hat{\Phi} = \frac{\sum_{i=1}^{n}\sin\Phi_i}{\sqrt{\Big(\sum_{i=1}^{n}\cos\Phi_i\cos\Lambda_i\Big)^2 + \Big(\sum_{i=1}^{n}\cos\Phi_i\sin\Lambda_i\Big)^2}}.$$
)i
/i
)i
124D 9
88D1
125D 0
88D 0
125D 2
88D 3
124D 9
88D 2
126D1
88D 2
124D8
88D1
125D 7
88D1
125D1
88D 0
This accounts for measurements of data on the horizontal circle and the vertical circle being Fisher normal distributed. We want to tackle two problems:
Problem 1: Compare $(\hat{\Lambda}, \hat{\Phi})$ with the arithmetic mean $(\bar{\Lambda}, \bar{\Phi})$ of the data set. Why do the results not coincide?
Problem 2: In which case do $(\hat{\Lambda}, \hat{\Phi})$ and $(\bar{\Lambda}, \bar{\Phi})$ coincide?
Solving Problem 1
Let us compute
$$(\bar{\Lambda}, \bar{\Phi}) = (125^\circ.212{,}5,\ 88^\circ.125) \quad \text{and} \quad (\hat{\Lambda}, \hat{\Phi}) = (125^\circ.206{,}664{,}5,\ 88^\circ.125{,}050{,}77),$$
$$\Delta\Lambda = 0^\circ.005{,}835{,}5 = 21''.007{,}8, \qquad \Delta\Phi = 0^\circ.000{,}050{,}7 = 0''.18.$$
The results do not coincide due to the fact that the arithmetic means are obtained by adjusting direct observations with least-squares technology.
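The numbers just quoted can be reproduced directly from Table 7.5. The Python sketch below (our own check, not part of the text) evaluates the central-direction formulas given above and the arithmetic means side by side.

```python
import numpy as np

# the eight (azimuth, vertical direction) pairs of Table 7.5, in degrees
lam = np.radians([124.9, 125.0, 125.2, 124.9, 126.1, 124.8, 125.7, 125.1])
phi = np.radians([88.1, 88.0, 88.3, 88.2, 88.2, 88.1, 88.1, 88.0])

ccl = np.sum(np.cos(phi) * np.cos(lam))       # [cos Phi cos Lam]
csl = np.sum(np.cos(phi) * np.sin(lam))       # [cos Phi sin Lam]
sp = np.sum(np.sin(phi))                      # [sin Phi]

lam_hat = np.degrees(np.arctan2(csl, ccl)) % 360.0
phi_hat = np.degrees(np.arctan2(sp, np.hypot(ccl, csl)))

print(lam_hat, phi_hat)                       # about 125.2067, 88.1251 (central direction)
print(np.degrees(lam).mean(), np.degrees(phi).mean())   # 125.2125, 88.125 (arithmetic means)
```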
Solving Problem 2
The results do coincide if the following conditions are met:
• all vertical directions are zero,
• $\hat{\Lambda} = \bar{\Lambda}$ if the observations $\Lambda_i, \Phi_i$ fluctuate only "a little" around the constant values $\Lambda_0, \Phi_0$,
• $\hat{\Lambda} = \bar{\Lambda}$ if $\Phi_i = \text{const.}$,
• $\hat{\Phi} = \bar{\Phi}$ if the fluctuation of $\Lambda_i$ around $\Lambda_0$ is considerably smaller than the fluctuation of $\Phi_i$ around $\Phi_0$.
Note the values 8
¦ cos ) sin / i
i =1
8
¦ cos ) i =1
8
i
= 0.213,866, 2; (¦ cos ) i sin / i ) 2 = 0.045,378,8 i =1
8
i
cos / i = 0.750,903, 27; (¦ cos ) i cos / i ) 2 = 0.022, 771,8 i =1
8
¦ cos )
i
= 7.995, 705,3
i=1
and / i = / 0 + G/ i versus ) i = ) 0 + G) i
G/ =
1 n 1 n G/ i versus G) = ¦ G) i ¦ n i =1 n i =1
n
n
n
i =1
i =1
i =1
¦ cos )i sin /i = n cos ) 0 sin / 0 + cos ) 0 cos / 0 ¦ G/i sin ) 0 sin / 0 ¦ G) i = = n cos ) 0 (sin / 0 + G/ cos / 0 G) tan / 0 sin / 0 ) n
¦ cos ) i =1
i
n
n
i =1
i =1
cos / i = n cos ) 0 cos / 0 cos ) 0 sin / 0 ¦ G/ i sin ) 0 sin / 0 ¦ G) i = = n cos ) 0 (cos / 0 G/ sin / 0 G) tan / 0 cos / 0 ) ˆ = sin / 0 + G/ cos / 0 G) tan ) 0 sin / 0 tan / cos / 0 G/ sin / 0 G) tan ) 0 cos / 0 tan / =
sin / 0 + G/ cos / 0 cos / 0 G/ sin / 0
7-4 Case study
345 n
n
(¦ cos ) i sin / i ) 2 + (¦ cos ) i cos / i ) 2 = i =1
i =1
2
= n 2 (cos 2 ) 0 + cos 2 G/ + sin 2 G) 2sin ) 0 cos G)) n
¦ sin ) i =1
ˆ = tan )
i
= n sin ) 0 + G) cos ) 0 n sin ) 0 + G) cos ) 0
n cos ) 0 + G/ cos 2 ) 0 + G) 2 sin 2 ) 0 2G) sin ) 0 cos ) 0 2
2
tan ) =
sin ) 0 + G) cos ) 0 . cos ) 0 G) sin ) 0
ˆ z /, ) ˆ z ) holds in general. In consequence, / At the end we will summarize to additional references like E. Batschelet (1965), T.D. Downs and A.L. Gould (1967), E.W. Grafarend (1970), E.J. Gumbel et al (1953), P. Hartmann et al (1974), and M.A. Stephens (1969). References Anderson, T.W. and M.A. Stephens (1972), Arnold, K.J. (1941), BarndorffNielsen, O. (1978), Batschelet, E. (1965), Batschelet, E. (1971), Batschelet, E. (1981), Beran, R.J. (1968), Beran, R.J. (1979), Blingham, C. (1964), Blingham, C. (1974), Chang, T. (1986) Downs, T. D. and Gould, A. L. (1967), Durand, D. and J.A. Greenwood (1957), Enkin, R. and Watson, G. S. (1996), Fisher, R. A. (1953), Fisher, N.J. (1985), Fisher, N.I. (1993), Fisher, N.I. and Lee A. J. (1983), Fisher, N.I. and Lee A. J. (1986), Fujikoshi, Y. (1980), Girko, V.L. (1985), Goldmann, J. (1976), Gordon, L. and M. Hudson (1977), Grafarend, E. W. (1970), Greenwood, J.A. and D. Durand (1955), Gumbel, E. J., Greenwood, J. A. and Durand, D. (1953), Hammersley, J.M. (1950), Hartman, P. and G. S. Watson (1974), Hetherington, T.J. (1981), Jensen, J.L. (1981), Jupp, P.E. and… (1980), Jupp, P. E. and Mardia, K. V. , Kariya, T. (1989), (1989), Kendall, D.G. (1974), Kent, J. (1976), Kent, J.T. (1982), Kent, J.T. (1983), Krumbein, W.C. (1939), Langevin, P. (1905), Laycock, P.J. (1975), Lenmitz, C. (1995), Lenth, R.V. (1981), Lord, R.D. (1948), Lund, U. (1999), Mardia, K.V. (1972), Mardia, K.V. (1975), Mardia, K. V. (1988), Mardia, K. V. and Jupp, P. E. (1999), Mardia, K.V. et al. (2000), Mhaskar, H.N., Narcowich, F.J. and J.D. Ward (2001), Muller, C. (1966), Neudecker, H. (1968), Okamoto, M. (1973), Parker, R.L. et al (1979), Pearson, K. (1905), Pearson, K. (1906), Pitman, J. and M. Yor (1981), Presnell, B., Morrison, S.P. and R.C. Littell (1998), Rayleigh, L. (1880), Rayleigh, L. (1905), Rayleigh, R. (1919), Rivest, L.P. (1982), Rivest, L.P.
(1988), Roberts, P.H. and H.D. Ursell (1960), Sander, B. (1930), Saw, J.G. (1978), Saw, J.G. (1981), Scheidegger, A.E. (1965), Schmidt-Koenig, K. (1972), Selby, B. (1964), Sen Gupta, A. and R. Maitra (1998), Sibuya, M. (1962), Stam, A.J. (1982), Stephens, M.A. (1963), Stephens, M.A. (1964), Stephens, M. A. (1969), Stephens, M.A. (1979), Tashiro, Y. (1977), Teicher, H. (1961), Von Mises, R. (1918), Watson, G.S. (1956a, 1956b), Watson, G.S. (1960), Watson, G.S.(1961), Watson, G.S. (1962), Watson, G.S. (1965), Watson, G.S. (1966), Watson, G.S. (1967a, 1967b), Watson, G.S. (1968), Watson, G.S. (1969), Watson, G.S. (1970), Watson, G.S. (1974), Watson, G.S. (1981a, 1981b), Watson, G.S. (1982a, 1982b, 1982c, 1982d), Watson, G.S. (1983), Watson, G.S. (1986), Watson, G.S. (1988), Watson, G.S. (1998), Watson, G.S. and E.J. Williams (1956), Watson, G.S. and E..Irving (1957), Watson, G.S. and M.R. Leadbetter (1963), Watson, G.S. and S. Wheeler (1964), Watson, G.S. and R.J. Beran (1967), Watson, G.S., R. Epp and J.W. Tukey (1971), Wellner, J. (1979), Wood, A. (1982), Xu, P.L. (1999), Xu, P.L. (2001), Xu, P. (2002), Xu, P.L. et al. (1996a, 1996b), Xu, P.L., and Shimada, S.(1997).
8
The fourth problem of probabilistic regression – special Gauss-Markov model with random effects – Setup of BLIP and VIP for the central moments of first order
: Fast track reading : Read only Theorem 8.5, 8.6 and 8.7.
Definition 8.1 (ẑ: hom BLIP of z) and Theorem 8.5 (ẑ: hom BLIP of z)
Definition 8.2 (ẑ: hom S-BLIP of z) and Theorem 8.6 (ẑ: hom S-BLIP of z)
Definition 8.3 (ẑ: hom α-VIP of z) and Theorem 8.7 (ẑ: hom α-VIP of z)
Lemma 8.4 (hom BLIP, hom S-BLIP and hom α-VIP)
The general model of type “fixed effects”, “random effects” and “error-invariables” will be presented in our final chapter: Here we focus on “random effects”.
Figure 8.1: Magic triangle
8-1 The random effect model Let us introduce the special Gauss-Markov model with random effects y = Cz + e y Ce z specified in Box 8.1. Such a model is governed by two identities, namely the first identity CE{z} = E{y} of moments of first order and the second identity D{y Cz} + CD{z}Cc = D{y} of central moments of second order. The first order moment identity CE{z} = E{y} relates the expectation E{z} of the stochastic, real-valued vector z of unknown random effects ( “Zufallseffekte”) to the expectation E{y} of the stochastic, real-valued vector y of observations by means of the non-stochastic (“fixed”) real-valued matrix C \ n×l of rank rk C = l. n = dim Y is the dimension of the observation space Y, l=dim Z the dimension of the parameter space Z of random effects z. The second order central moment identity Ȉ y -Cz + CȈ z Cc = Ȉ y relates the variancecovariance matrix Ȉ y -Cz of the random vector y Cz , also called dispersion matrix D{y Cz} and the variance-covariance matrix Ȉ z of the random vector z, also called dispersion matrix D{z} , to the variance-covariance matrix Ȉ y of the random vector y of the observations, also called dispersion matrix D{y} . In the simple random effect model we shall assume (i) rk Ȉ y = n and (ii) C{y, z} = 0 , namely zero correlation between the random vector y of observations and the vector z of random effects. (In the random effect model of type KolmogorovWiener we shall give up such a zero correlation.) There are three types of un-
knowns within the simple special Gauss-Markov model with random effects: (i) the vector z of random effects is unknown, (ii) the fixed vectors E{y}, E{z} of expectations of the vector y of observations and of the vector z of random effects (first moments) are unknown and (iii) the fixed matrices Ȉ y , Ȉ z of dispersion matrices D{y}, D{z} (second central moments) are unknown. Box 8.1: Special Gauss-Markov model with random effects y = Cz + e y Ce z E{y} = CE{z} \ n D{y} = D{y Cz} + CD{z}Cc \ n×n C{y , z} = 0 z, E{z}, E{y}, Ȉ y-Cz , Ȉ z unknown dim R (Cc) = rk C = l. Here we focus on best linear predictors of type hom BLIP, hom S-BLIP and hom Į-VIP of random effects z, which turn out to be better than the best linear uniformly unbiased predictor of type hom BLUUP. At first let us begin with a discussion of the bias vector and the bias matrix as well as of the Mean Square Prediction Error MSPE{z} with respect to a homogeneous linear prediction z = Ly of random effects z based upon Box 8.2. Box 8.2: Bias vector, bias matrix Mean Square Prediction Error in the special Gauss–Markov model with random effects E{y} = CE{z} D{y} = D{y Cz} + CD{z}Cc “ansatz”
(8.1) (8.2)
z = Ly bias vector
(8.3)
ȕ := E{z z} = E{z } E{z}
(8.4)
ȕ = LE{y} E{z} = [I A LC]E{z}
(8.5)
bias matrix B := I A LC decomposition
(8.6)
z z = z E{z} (z E{z}) + ( E{z} E{z}) z z = L(y E{y}) (z E{z}) [I A LC]E{z}
(8.7)
(8.8)
Mean Square Prediction Error MSPE{z} := E{(z z )(z z )c}
(8.9)
MSPE{z} = LD{y}Lc + D{z} + [I A LC]E{z}E{z}c [I A LC]c
(8.10)
(C{y , z} = 0, E{z E{z}} = 0, E{z E{z}} = 0) modified Mean Square Prediction Error MSPE S {z} := LD{y}Lc + D{z} + [I A LC] S [I A LC]c
(8.11)
Frobenius matrix norms || MSPE{z} ||2 := tr E{(z z )(z z )c}
(8.12)
|| MSPE{z} || = = tr LD{y}Lc + tr D{z} + tr [I A LC]E{z}E{z}c [I A LC]c
(8.13)
2
= || Lc ||62 + || (I A LC)c ||2E{( z ) E ( z ) ' + tr E{(z E{z})(z E{z})c} y
|| MSPE S {z} ||2 := := tr LD{y}Lc + tr [I A LC]S[I A LC]c + tr D{z}
(8.14)
= || Lc ||6y + || (I A LC)c ||S + tr E{( z E{z})(z E{z})c} 2
2
hybrid minimum variance – minimum bias norm Į-weighted L(L) := || Lc ||62 y + 1 || (I A LC)c ||S2 D
(8.15)
special model dim R (SCc) = rk SCc = rk C = l .
(8.16)
The bias vector ȕ is conventionally defined by E{z} E{z} subject to the homogeneous prediction form z = Ly . Accordingly the bias vector can be represented by (8.5) ȕ = [I A LC]E{z} . Since the expectation E{z} of the vector z of random effects is unknown, there has been made the proposal to use instead the matrix I A LC as a matrix-valued measure of bias. A measure of the prediction error is the Mean Square prediction error MSPE{z} of type (8.9). MSPE{z} can be decomposed into three basic parts:
•
the dispersion matrix D{z} = LD{y}Lc
•
the dispersion matrix D{z}
•
the bias product ȕȕc .
Indeed the vector z z can also be decomposed into three parts of type (8.7), (8.8) namely (i) z E{z} , (ii) z E{z} and (iii) E{z} E{z} which may be called prediction error, random effect error and bias, respectively. The triple decomposition of the vector z z leads straightforward to the triple representation of the matrix MSPE{z} of type (8.10). Such a representation suffers from two effects: Firstly the expectation E{z} of the vector z of random effects is unknown, secondly the matrix E{z}E{z c} has only rank 1. In consequence, the matrix [I A LC]E{z}E{z}c [I A LC]c has only rank 1, too. In this situation there has made the proposal to modify MSPE{z} by the matrix E{z}E{z c} and by the regular matrix S. MSPE{z} has been defined by (8.11). A scalar measure of MSPE{z } as well as MSPE{z} are the Frobenius norms (8.12), (8.13), (8.14). Those scalars constitute the optimal risk in Definition 8.1 (hom BLIP) and Definition 8.2 (hom S-BLIP). Alternatively a homogeneous Į-weighted hybrid minimum variance- minimum bias prediction (hom VIP) is presented in Definition 8.3 (hom Į-VIP) which is based upon the weighted sum of two norms of type (8.15), namely •
average variance || Lc ||62 y = tr L6 y Lc
•
average bias || (I A LC)c ||S2 = tr[I A LC] S [I A LC]c .
The very important predictor Į-VIP is balancing variance and bias by the weight factor Į which is illustrated by Figure 8.1.
min bias
balance between variance and bias
min variance
Figure 8.1. Balance of variance and bias by the weight factor Į Definition 8.1 ( z hom BLIP of z): An l×1 vector z is called homogeneous BLIP of z in the special linear Gauss-Markov model with random effects of Box 8.1, if (1st) z is a homogeneous linear form z = Ly , (2nd) in comparison to all other linear predictions z has the
(8.17)
minimum Mean Square Prediction Error in the sense of || MSPE{z} ||2 = = tr LD{y}Lc + tr D{z} + tr[I A LC]E{z}E{z}c [I A LC]c
(8.18)
= || Lc ||62 y + || (I A LC)c ||2E{( z ) E ( z ) ' + tr E{( z E{z})( z E{z})c}. Definition 8.2 ( z S-hom BLIP of z): An l×1 vector z is called homogeneous S-BLIP of z in the special linear Gauss-Markov model with random effects of Box 8.1, if (1st) z is a homogeneous linear form z = Ly ,
(8.19)
(2nd) in comparison to all other linear predictions z has the minimum S-modified Mean Square Prediction Error in the sense of || MSPE S {z} ||2 := := tr LD{y}Lc + tr[I A LC]S[I A LC]c + tr E{( z E{z})( z E{z})c} = || Lc ||62 y + || (I A LC)c ||S2 + tr E{( z E{z})( z E{z})c} = min .
(8.20)
L
Definition 8.3 ( z hom hybrid min var-min bias solution, Į-weighted or hom Į-VIP): An l×1 vector z is called homogeneous Į-weighted hybrid minimum variance- minimum bias prediction (hom Į-VIP) of z in the special linear Gauss-Markov model with random effects of Box 8.1, if (1st) z is a homogeneous linear form z = Ly ,
(8.21)
(2nd) in comparison to all other linear predictions z has the minimum variance-minimum bias in the sense of the Į-weighted hybrid norm tr LD{y}Lc + 1 tr (I A LC) S (I A LC)c D 2 1 = || Lc ||6 + || (I A LC)c ||S2 = min L D y
in particular with respect to the special model
D \ + , dim R (SCc) = rk SCc = rk C = l .
(8.22)
The predictions z hom BLIP, hom S-BLIP and hom Į-VIP can be characterized as follows: Lemma 8.4 (hom BLIP, hom S-BLIP and hom Į-VIP): An l×1 vector z is hom BLIP, hom S-BLIP or hom Į-VIP of z in the special linear Gauss-Markov model with random effects of Box 8.1, if and only if the matrix Lˆ fulfils the normal equations (1st)
hom BLIP: (6 y + CE{z}E{z}cCc)Lˆ c = CE{z}E{z}c
(2nd)
(3rd)
(8.23)
hom S-BLIP: (6 y + CSCc)Lˆ c = CS
(8.24)
(6 y + 1 CSCc)Lˆ c = 1 CS . D D
(8.25)
hom Į-VIP:
:Proof: (i) hom BLIP: 2
The hybrid norm || MSPE{z} || establishes the Lagrangean
L (L) := tr L6 y Lc + tr (I l LC) E{z}E{z}c (I l LC)c + tr 6 z = min L
for z hom BLIP of z. The necessary conditions for the minimum of the quadratic Lagrangean L (L) are wL ˆ (L) := 2[6 y Lˆ c + CE{z}E{z}cCcLˆ c CE{z}E{z}c ] = 0 , wL which agree to the normal equations (8.23). (The theory of matrix derivatives is reviewed in Appendix B (Facts: derivative of a scalar-valued function of a matrix: trace).) The second derivatives w2 L (Lˆ ) > 0 w (vecL)w (vecL)c at the “point” Lˆ constitute the sufficiency conditions. In order to compute such an ln×ln matrix of second derivatives we have to vectorize the matrix normal equation
wL ˆ (L) := 2Lˆ (6 y + CE{z}E{z}cCc) 2 E{z}E{z}cCc wL wL (Lˆ ) := vec[2Lˆ (6 y + CE{z}E{z}cCc) 2 E{z}E{z}cCc]. w (vecL) (ii) hom S-BLIP: 2
The hybrid norm || MSPEs {z} || establishes the Lagrangean
L (L) := tr L6 y Lc + tr (I A LC)S(I A LC)c + tr 6 z = min L
for z hom S-BLIP of z. Following the first part of the proof we are led to the necessary conditions for the minimum of the quadratic Lagrangean L (L) wL ˆ (L) := 2[6 y Lˆ c + CSCcLˆ c CS]c = 0 wL as well as to the sufficiency conditions w2 L (Lˆ ) = 2[(6 y + CSCc)
I A ] > 0. w (vecL)w (vecL)c The normal equations of hom S-BLIP wL wL (Lˆ ) = 0 agree to (8.24). (iii) hom Į-VIP: The hybrid norm || Lc ||62 + 1 || (I A - LC)c ||S2 establishes the Lagrangean D L (L) := tr L6 y Lc + 1 tr (I A - LC)S(I A - LC)c = min L D y
for z hom Į-VIP of z. Following the first part of the proof we are led to the necessary conditions for the minimum of the quadratic Lagrangean L (L) wL ˆ (L) = 2[(6 y + CE{z}E{z}cCc)
I A ]vecLˆ 2vec(E{z}E{z}cCc). wL The Kronecker-Zehfuss Product A
B of two arbitrary matrices as well as ( A + B)
C = A
B + B
C of three arbitrary matrices subject to dim A=dim B is introduced in Appendix A. (Definition of Matrix Algebra: multiplication matrices of the same dimension (internal relation) and multiplication of matrices (internal relation) and Laws). The vec operation (vectorization of an array) is reviewed in Appendix A, too. (Definition, Facts: vecAB = (Bc
I cA )vecA for suitable matrices A and B.) Now we are prepared to compute w2 L (Lˆ ) = 2[(6 y + CE{z}E{z}Cc)
I A ] > 0 w (vecL)w (vecL)c
as a positive definite matrix. (The theory of matrix derivatives is reviewed in Appendix B (Facts: derivative of a matrix-valued function of a matrix, namely w (vecX) w (vecX)c ).) wL ˆ (L) = 2[ 1 CSCcLˆ c + 6 y Lˆ c 1 CS]cD = 0 D D wL as well as to the sufficiency conditions w2 L (Lˆ ) = 2[( 1 CSCc + 6 y )
I A ] > 0 . D w (vecL)w ( vecL)c The normal equations of hom Į-VIP wL wL (Lˆ ) = 0 agree to (8.25).
h For an explicit representation of z as hom BLIP, hom S-BLIP and hom Į-VIP of z in the special Gauss–Markov model with random effects of Box 8.1, we solve the normal equations (8.23), (8.24) and (8.25) for Lˆ = arg{L (L) = min} . L
Beside the explicit representation of z of type hom BLIP, hom S-BLIP and hom Į-VIP we compute the related dispersion matrix D{z} , the Mean Square Prediction Error MSPE{z} , the modified the Mean Square Prediction Error MSPE S {z} and MSPED ,S {z} and the covariance matrices C{z, z z} in Theorem 8.5 ( z hom BLIP): Let z = Ly be hom BLIP of z in the special linear Gauss-Markov model with random effects of Box 8.1. Then equivalent representations of the solutions of the normal equations (8.23) are z = E{z}E{z}cCc[6 y + CE{z}E{z}cCc]1 y = E{z}E{z}cCc[6 y Cz + C6 z Cc + CE{z}E{z}cCc]1 y
(8.26)
(if [6 y + CE{z}E{z}cCc]1 exists) are completed by the dispersion matrix D{z} = E{z}E{z}cCc[6 y + CE{z}E{z}cCc]1 6 y × × [6 y + CE{z}E{z}cCc]1 CE{z}E{z}c by the bias vector (8.5)
(8.27)
ȕ := E{z } E{z} = [I A E{z}E{z}cCc(CE{z}E{z}cCc + 6 y ) 1 C]E{z}
(8.28)
and by the matrix of the Mean Square Prediction Error MSPE{z} : MSPE{z } := E{(z z )(z z )c} = D{z} + D{z} + ȕȕc
(8.29)
MSPE{z } := D{z} + D{z} + [I A E{z}E{z}cCc(CE{z}E{z}cCc + 6 y ) 1 C] ×
(8.30)
×E{z}E{z}c [I A Cc(CE{z}E{z}cCc + 6 y ) 1 CE{z}E{z}c ]. At this point we have to comment what Theorem 8.5 tells us. hom BLIP has generated the prediction z of type (8.26), the dispersion matrix D{z} of type (8.27), the bias vector of type (8.28) and the Mean Square Prediction Error of type (8.30) which all depend on the vector E{z} and the matrix E{z}E{z}c , respectively. We already mentioned that E{z} and E{z}E{z}c are not accessible from measurements. The situation is similar to the one in hypothesis theory. n As shown later in this section we can produce only an estimator E {z} and consequently can setup a hypothesis first moment E{z} of the "random effect" z. Indeed, a similar argument applies to the second central moment D{y} ~ 6 y of the "random effect" y, the observation vector. Such a dispersion matrix has to be known in order to be able to compute z , D{z} , and MSPE{z} . Again we have to apply the argument that we are only able to construct an estimate 6ˆ cy and to setup a hypothesis about D{y} ~ 6 y . Theorem 8.6 ( z hom S-BLIP): Let z = Ly be hom S-BLIP of z in the special linear Gauss-Markov model with random effects of Box 8.1. Then equivalent representations of the solutions of the normal equations (8.24) are z = SCc(6 y + CSCc) 1 y = = SCc(6 y Cz + C6 z Cc + CSCc) 1 y
(8.31)
z = (Cc6 y1C + S 1 ) 1 Cc6 y1y
(8.32)
z = (I A + SCc6 y1C) 1 SCc6 y1y
(8.33)
(if S 1 , 6 y1 exist) are completed by the dispersion matrices D{z} = SCc(CSCc + 6 y ) 1 6 y (CSCc + 6 y ) 1 CS
(8.34)
D{z} = (Cc6 y1C + S 1 ) 1 Cc6 y1C(Cc6 y1C + S 1 )1
(8.35)
(if $\mathbf{S}^{-1}$, $\boldsymbol{\Sigma}_y^{-1}$ exist), by the bias vector (8.5)
$$\boldsymbol{\beta} := E\{\hat{\mathbf{z}}\} - E\{\mathbf{z}\} = -[\mathbf{I}_l - \mathbf{S}\mathbf{C}'(\mathbf{C}\mathbf{S}\mathbf{C}' + \boldsymbol{\Sigma}_y)^{-1}\mathbf{C}]\,E\{\mathbf{z}\} = -[\mathbf{I}_l - (\mathbf{C}'\boldsymbol{\Sigma}_y^{-1}\mathbf{C} + \mathbf{S}^{-1})^{-1}\mathbf{C}'\boldsymbol{\Sigma}_y^{-1}\mathbf{C}]\,E\{\mathbf{z}\}$$
(8.36)
(if S 1 , 6 y1 exist) and by the matrix of the modified Mean Square Prediction Error MSPE{z} : MSPE S {z} := E{(z z )(z z )c} = D{z} + D{z} + ȕȕc
MSPES {z} = 6z + SCc(CSCc + 6y )1 6y (CSCc + 6y )1 CS + +[I A SCc(CSCc + 6y )1 C]E{z}E{z}c [Il Cc(CSCc + 6y )1 CS]
(8.37)
(8.38)
MSPE S {z } = 6 z + (Cc6 y1C + S 1 ) 1 Cc6 y1C(Cc6 y1C + S 1 )1 CS + + [I A (Cc6 y1C + S 1 ) 1 Cc6 y1C]E{z}E{z}c ×
(8.39)
× [I A Cc6 y1C(Cc6 y1C + S 1 ) 1 ] (if S 1 , 6 y1 exist). The interpretation of hom S-BLIP is even more complex. In extension of the comments to hom BLIP we have to live with another matrix-valued degree of freedom, z of type (8.31), (8.32), (8.33) and D{z} of type (8.34), (8.35) do no longer depend on the inaccessible matrix E{z}E{z}c , rk( E{z}E{z}c ) , but on the "bias weight matrix" S, rk S = l. Indeed we can associate any element of the bias matrix with a particular weight which can be "designed" by the analyst. Again the bias vector ȕ of type (8.36) as well as the Mean Square Prediction Error of type (8.37), (8.38), (8.39) depend on the vector E{z} which is inaccessible. Beside the "bias weight matrix S" z , D{z} , ȕ and MSPE{z} are vector-valued or matrix-valued functions of the dispersion matrix D{y} ~ 6 y of the stochastic observation vector which is inaccessible. By hypothesis testing we may decide y . upon the construction of D{y} ~ 6 y from an estimate 6 Theorem 8.7 ( z hom Į-VIP): Let z = Ly be hom Į-VIP of z in the special linear Gauss-Markov model with random effects Box 8.1. Then equivalent representations of the solutions of the normal equations (8.25) are
ẑ = (1/α) S C' (Σ_y + (1/α) C S C')^{-1} y = (1/α) S C' (Σ_{y−Cz} + C Σ_z C' + (1/α) C S C')^{-1} y   (8.40)
ẑ = (C' Σ_y^{-1} C + α S^{-1})^{-1} C' Σ_y^{-1} y   (8.41)
ẑ = (I_ℓ + (1/α) S C' Σ_y^{-1} C)^{-1} (1/α) S C' Σ_y^{-1} y   (8.42)
(if S^{-1}, Σ_y^{-1} exist) are completed by the dispersion matrices
D{ẑ} = (1/α) S C' (Σ_y + (1/α) C S C')^{-1} Σ_y (Σ_y + (1/α) C S C')^{-1} C S (1/α)   (8.43)
D{ẑ} = (C' Σ_y^{-1} C + α S^{-1})^{-1} C' Σ_y^{-1} C (C' Σ_y^{-1} C + α S^{-1})^{-1}   (8.44)
(if S^{-1}, Σ_y^{-1} exist), by the bias vector (8.5)
β := E{ẑ} − E{z} = −[I_ℓ − (1/α) S C' ((1/α) C S C' + Σ_y)^{-1} C] E{z}
                 = −[I_ℓ − (C' Σ_y^{-1} C + α S^{-1})^{-1} C' Σ_y^{-1} C] E{z}   (8.45)
(if S^{-1}, Σ_y^{-1} exist) and by the matrix of the Mean Square Prediction Error MSPE{ẑ}:
MSPE{ẑ} := E{(ẑ − z)(ẑ − z)'} = D{ẑ} + D{z} + ββ'   (8.46)
MSPE{ẑ} = Σ_z + (1/α) S C' ((1/α) C S C' + Σ_y)^{-1} Σ_y ((1/α) C S C' + Σ_y)^{-1} C S (1/α) +
          + [I_ℓ − (1/α) S C' ((1/α) C S C' + Σ_y)^{-1} C] E{z}E{z}' ×
          × [I_ℓ − C' ((1/α) C S C' + Σ_y)^{-1} C S (1/α)]   (8.47)
MSPE{ẑ} = Σ_z + (C' Σ_y^{-1} C + α S^{-1})^{-1} C' Σ_y^{-1} C (C' Σ_y^{-1} C + α S^{-1})^{-1} +
          + [I_ℓ − (C' Σ_y^{-1} C + α S^{-1})^{-1} C' Σ_y^{-1} C] E{z}E{z}' ×
          × [I_ℓ − C' Σ_y^{-1} C (C' Σ_y^{-1} C + α S^{-1})^{-1}]   (8.48)
(if S^{-1}, Σ_y^{-1} exist).
The interpretation of the very important predictor hom α-VIP ẑ of z is as follows: ẑ of type (8.41), also called ridge estimator or Tykhonov-Phillips regularizer, contains the Cayley inverse of the normal equation matrix, which is additively decomposed into C' Σ_y^{-1} C and α S^{-1}. The weight factor α balances the first, inverse dispersion part and the second, inverse bias part. While the experiment informs us of the variance-covariance matrix Σ_y, say Σ̂_y, the bias weight matrix S and the weight factor α are at the disposal of the analyst. For instance, by the choice S = Diag(s_1, ..., s_ℓ) we may emphasize, increase or decrease certain bias matrix elements. The choice of an equally weighted bias matrix is S = I_ℓ. In contrast, the weight factor α can be determined by an A-optimal design of type
• tr D{ẑ} = min over α,
• ββ' = min over α,
• tr MSPE{ẑ} = min over α.
In the first case we optimize the trace of the variance-covariance matrix D{ẑ} of type (8.43), (8.44). Alternatively, by means of ββ' = min over α we optimize the quadratic bias, where the bias vector β of type (8.45) is chosen regardless of the dependence on E{z}. Finally, for the third case, the most popular one, we minimize the trace of the Mean Square Prediction Error MSPE{ẑ} of type (8.48), regardless of the dependence on E{z}E{z}'. But beforehand let us present the proof of Theorem 8.5, Theorem 8.6 and Theorem 8.7.
Proof:
(i) ẑ = E{z}E{z}' C' [Σ_y + C E{z}E{z}' C']^{-1} y: If the matrix Σ_y + C E{z}E{z}' C' of the normal equations of type hom BLIP is of full rank, namely rk(Σ_y + C E{z}E{z}' C') = n, then a straightforward solution of (8.23) is L̂ = E{z}E{z}' C' [Σ_y + C E{z}E{z}' C']^{-1}.
(ii) ẑ = S C' (Σ_y + C S C')^{-1} y: If the matrix Σ_y + C S C' of the normal equations of type hom S-BLIP is of full rank, namely rk(Σ_y + C S C') = n, then a straightforward solution of (8.24) is L̂ = S C' [Σ_y + C S C']^{-1}.
(iii) ẑ = (C' Σ_y^{-1} C + S^{-1})^{-1} C' Σ_y^{-1} y: Let us apply by means of Appendix A (Fact: Cayley inverse: sum of two matrices, s(10), Duncan-Guttman matrix identity) the fundamental matrix identity
S C' (Σ_y + C S C')^{-1} = (C' Σ_y^{-1} C + S^{-1})^{-1} C' Σ_y^{-1},
if S^{-1} and Σ_y^{-1} exist. Such a result concludes this part of the proof.
(iv) ẑ = (I_ℓ + S C' Σ_y^{-1} C)^{-1} S C' Σ_y^{-1} y: Let us apply by means of Appendix A (Fact: Cayley inverse: sum of two matrices, s(9)) the fundamental matrix identity S C' (Σ_y + C S C')^{-1} = (I_ℓ + S C' Σ_y^{-1} C)^{-1} S C' Σ_y^{-1}, if Σ_y^{-1} exists. Such a result concludes this part of the proof.
(v) ẑ = (1/α) S C' (Σ_y + (1/α) C S C')^{-1} y: If the matrix Σ_y + (1/α) C S C' of the normal equations of type hom α-VIP is of full rank, namely rk(Σ_y + (1/α) C S C') = n, then a straightforward solution of (8.25) is L̂ = (1/α) S C' [Σ_y + (1/α) C S C']^{-1}.
(vi) ẑ = (C' Σ_y^{-1} C + α S^{-1})^{-1} C' Σ_y^{-1} y: Let us apply by means of Appendix A (Fact: Cayley inverse: sum of two matrices, s(10), Duncan-Guttman matrix identity) the fundamental matrix identity (1/α) S C' (Σ_y + (1/α) C S C')^{-1} = (C' Σ_y^{-1} C + α S^{-1})^{-1} C' Σ_y^{-1}, if S^{-1} and Σ_y^{-1} exist. Such a result concludes this part of the proof.
(vii) ẑ = (I_ℓ + (1/α) S C' Σ_y^{-1} C)^{-1} (1/α) S C' Σ_y^{-1} y: Let us apply by means of Appendix A (Fact: Cayley inverse: sum of two matrices, s(9)) the fundamental matrix identity (1/α) S C' (Σ_y + (1/α) C S C')^{-1} = (I_ℓ + (1/α) S C' Σ_y^{-1} C)^{-1} (1/α) S C' Σ_y^{-1}, if Σ_y^{-1} exists. Such a result concludes this part of the proof.
(viii) hom BLIP: D{ẑ}:
D{ẑ} := E{[ẑ − E{ẑ}][ẑ − E{ẑ}]'} = E{z}E{z}' C' [Σ_y + C E{z}E{z}' C']^{-1} Σ_y [Σ_y + C E{z}E{z}' C']^{-1} C E{z}E{z}'.
By means of the definition of the dispersion matrix D{ẑ} and the substitution of ẑ of type hom BLIP the proof has been straightforward.
(ix) hom S-BLIP: D{ẑ} (1st representation)
D{ẑ} := E{[ẑ − E{ẑ}][ẑ − E{ẑ}]'} = S C' (C S C' + Σ_y)^{-1} Σ_y (C S C' + Σ_y)^{-1} C S.
By means of the definition of the dispersion matrix D{ẑ} and the substitution of ẑ of type hom S-BLIP the proof of the first representation has been straightforward.
(x) hom S-BLIP: D{ẑ} (2nd representation)
D{ẑ} := E{[ẑ − E{ẑ}][ẑ − E{ẑ}]'} = (C' Σ_y^{-1} C + S^{-1})^{-1} C' Σ_y^{-1} C (C' Σ_y^{-1} C + S^{-1})^{-1},
if S^{-1} and Σ_y^{-1} exist. By means of the definition of the dispersion matrix D{ẑ} and the substitution of ẑ of type hom S-BLIP the proof of the second representation has been straightforward.
(xi) hom α-VIP: D{ẑ} (1st representation)
D{ẑ} := E{[ẑ − E{ẑ}][ẑ − E{ẑ}]'} = (1/α) S C' (Σ_y + (1/α) C S C')^{-1} Σ_y (Σ_y + (1/α) C S C')^{-1} C S (1/α).
By means of the definition of the dispersion matrix D{ẑ} and the substitution of ẑ of type hom α-VIP the proof of the first representation has been straightforward.
(xii) hom α-VIP: D{ẑ} (2nd representation)
D{ẑ} := E{[ẑ − E{ẑ}][ẑ − E{ẑ}]'} = (C' Σ_y^{-1} C + α S^{-1})^{-1} C' Σ_y^{-1} C (C' Σ_y^{-1} C + α S^{-1})^{-1},
if S^{-1} and Σ_y^{-1} exist. By means of the definition of the dispersion matrix D{ẑ} and the substitution of ẑ of type hom α-VIP the proof of the second representation has been straightforward.
(xiii) bias β for hom BLIP, hom S-BLIP and hom α-VIP: As soon as we substitute into the bias β := E{ẑ} − E{z} the various predictors ẑ of type hom BLIP, hom S-BLIP and hom α-VIP, we are directly led to the various bias representations β of type hom BLIP, hom S-BLIP and hom α-VIP.
(xiv) MSPE{ẑ} of type hom BLIP, hom S-BLIP and hom α-VIP:
MSPE{ẑ} := E{(ẑ − z)(ẑ − z)'}
ẑ − z = ẑ − E{z} − (z − E{z}) = ẑ − E{ẑ} − (z − E{z}) + (E{ẑ} − E{z})
E{(ẑ − z)(ẑ − z)'} = E{(ẑ − E{ẑ})(ẑ − E{ẑ})'} + E{(z − E{z})(z − E{z})'} + (E{ẑ} − E{z})(E{ẑ} − E{z})'
MSPE{ẑ} = D{ẑ} + D{z} + ββ'.
At first we have defined the Mean Square Prediction Error MSPE{ẑ} of ẑ. Secondly we have decomposed the difference ẑ − z into the three terms
• ẑ − E{ẑ},
• z − E{z},
• E{ẑ} − E{z},
in order to derive thirdly the decomposition of MSPE{ẑ}, namely
• the dispersion matrix of ẑ, namely D{ẑ},
• the dispersion matrix of z, namely D{z},
• the quadratic bias ββ'.
As soon as we substitute into MSPE{ẑ} the dispersion matrix D{ẑ} and the bias vector β of the various predictors ẑ of type hom BLIP, hom S-BLIP and hom α-VIP, we are directly led to the various representations of the Mean Square Prediction Error MSPE{ẑ}. Here is my proof's end. □
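To make the equivalent representations concrete, here is a minimal numerical sketch (Python/NumPy): it evaluates ẑ of type hom α-VIP through the three formulas (8.40), (8.41) and (8.42) and checks the Duncan-Guttman identity used in the proof. The sizes n = 6, ℓ = 2, the diagonal Σ_y, the choice S = I_ℓ and the weight factor α = 0.5 are purely illustrative assumptions, not data from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
n, l, alpha = 6, 2, 0.5                        # illustrative sizes and weight factor (assumptions)
C = rng.normal(size=(n, l))                    # "second design matrix" C
Sigma_y = np.diag(rng.uniform(0.5, 2.0, n))    # dispersion matrix of the observations
S = np.eye(l)                                  # equally weighted bias matrix S = I_l
y = rng.normal(size=n)

Si = np.linalg.inv(Sigma_y)

# hom alpha-VIP, first representation (8.40)
z1 = (S @ C.T / alpha) @ np.linalg.solve(Sigma_y + C @ S @ C.T / alpha, y)
# second representation (8.41), the "ridge" or Tykhonov-Phillips form
z2 = np.linalg.solve(C.T @ Si @ C + alpha * np.linalg.inv(S), C.T @ Si @ y)
# third representation (8.42)
z3 = np.linalg.solve(np.eye(l) + (S @ C.T @ Si @ C) / alpha, (S @ C.T @ Si / alpha) @ y)

print(np.allclose(z1, z2), np.allclose(z2, z3))   # all three representations coincide

# Duncan-Guttman identity used in the proof:
# S C'(Sigma_y + C S C')^{-1} = (C' Sigma_y^{-1} C + S^{-1})^{-1} C' Sigma_y^{-1}
lhs = S @ C.T @ np.linalg.inv(Sigma_y + C @ S @ C.T)
rhs = np.linalg.solve(C.T @ Si @ C + np.linalg.inv(S), C.T @ Si)
print(np.allclose(lhs, rhs))
```

The sketch is only meant to make the algebra tangible; in a real adjustment Σ_y would come from the stochastic model and α, S from the analyst's design decisions discussed above.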
8-2 Examples
Example 8.1: Nonlinear error propagation with random effect models
Consider a function y = f(z) where y is a scalar-valued observation and z a random effect. Three cases can be specified as follows:
Case 1 (μ_z assumed to be known): By Taylor series expansion we have
f(z) = f(μ_z) + (1/1!) f'(μ_z)(z − μ_z) + (1/2!) f''(μ_z)(z − μ_z)² + O(3)
E{y} = E{f(z)} = f(μ_z) + (1/2!) f''(μ_z) E{(z − μ_z)²} + O(3),
leading to (cf. E. Grafarend and B. Schaffrin 1983, p. 470)
E{y} = f(μ_z) + (1/2!) f''(μ_z) σ_z² + O(3),
E{(y − E{y})²} = E{[f'(μ_z)(z − μ_z) + (1/2!) f''(μ_z)(z − μ_z)² + O(3) − (1/2!) f''(μ_z) σ_z² − O(3)]²},
hence E{[y − E{y}][y − E{y}]} is given by
σ_y² = f'²(μ_z) σ_z² − (1/4) f''²(μ_z) σ_z⁴ + f'f''(μ_z) E{(z − μ_z)³} + (1/4) f''²(μ_z) E{(z − μ_z)⁴} + O(3).
Finally, if z is quasi-normally distributed, we have
σ_y² = f'²(μ_z) σ_z² + (1/2) f''²(μ_z) σ_z⁴ + O(3).
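A quick numerical cross-check of Case 1, as a sketch rather than part of the example: for a chosen nonlinear f and a normally distributed z, the Monte-Carlo variance of y = f(z) should approach f'²(μ_z)σ_z² + ½ f''²(μ_z)σ_z⁴. The function f(z) = z² with μ_z = 2 and σ_z = 0.1 is an illustrative assumption.

```python
import numpy as np

f   = lambda z: z**2          # illustrative nonlinear function (assumption)
df  = lambda z: 2 * z         # f'
ddf = lambda z: 2.0           # f''

mu_z, sigma_z = 2.0, 0.1      # known mean and standard deviation of the random effect z
rng = np.random.default_rng(1)
z = rng.normal(mu_z, sigma_z, size=1_000_000)

var_mc = f(z).var()                                   # Monte-Carlo variance of y = f(z)
var_taylor = df(mu_z)**2 * sigma_z**2 \
           + 0.5 * ddf(mu_z)**2 * sigma_z**4          # Case 1 formula for quasi-normal z
print(var_mc, var_taylor)                             # both close to 0.160
```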
Case 2 ( P z unknown, but [ 0 known as a fixed effect approximation (this model is implied in E. Grafarend and B. Schaffrin 1983, p.470, [ 0 z P z )): By Taylor series expansion we have f ( z ) = f ([ 0 ) +
1 1 f c([ 0 )( z [ 0 ) + f cc([ 0 )( z [ 0 ) 2 + O (3) 1! 2!
using
[ 0 = P z + ([ 0 P z ) z [ 0 = z P z + ( P z [ 0 ) we have 1 1 f c([ 0 )( z P z ) + f c([ 0 )( z [ 0 ) + 1! 1! 1 1 2 + f cc([ 0 )( z P z ) + f cc([ 0 )( z [ 0 ) 2 + 2! 2! + f cc([ 0 )( z P z )( z [ 0 ) + O (3)
f ( z ) = f ([ 0 ) +
and E{ y} = E{ f ( z )} = f ([ 0 ) + f c([ 0 )( P z [ 0 ) + +
1 f cc([ 0 )V z2 + 2
1 f cc([ 0 )( P z [ 0 ) 2 + O (3) 2
leading to E{[y − E{y}][y − E{y}]} as
σ_y² = f'²(ξ₀) σ_z² + f'f''(ξ₀) E{(z − μ_z)³} + 2 f'f''(ξ₀) σ_z² (μ_z − ξ₀) +
       + (1/4) f''²(ξ₀) E{(z − μ_z)⁴} + f''²(ξ₀) E{(z − μ_z)³}(μ_z − ξ₀) −
       − (1/4) f''²(ξ₀) σ_z⁴ + f''²(ξ₀) σ_z² (μ_z − ξ₀)² + O(3),
and with z being quasi-normally distributed, we have
σ_y² = f'²(ξ₀) σ_z² + 2 f'f''(ξ₀) σ_z² (μ_z − ξ₀) + (1/2) f''²(ξ₀) σ_z⁴ + f''²(ξ₀) σ_z² (μ_z − ξ₀)² + O(3),
with the first and third terms (on the right-hand side) being the right-hand side terms of Case 1 (cf. E. Grafarend and B. Schaffrin 1983, p. 470).
Case 3 ( P z unknown, but z0 known as a random effect approximation): By Taylor series expansion we have f ( z) = f (P z ) +
1 1 f c( P z )( z P z ) + f cc( P z )( z P z ) 2 + 1! 2! 1 + f ccc( P z )( z P z )3 + O (4) 3!
changing z P z = z0 P z = z0 E{z0 } ( P z E{z0 }) and the initial bias ( P z E{z0 }) = E{z0 } P z =: E 0 leads to z P z = z0 E{z0 } + E 0 . Consider ( z P z ) 2 = ( z0 E{z0 }) 2 + E 02 + 2( z0 E{z0 }) E 0 we have 1 1 f c( P z )( z0 E{z0 }) + f c( P z ) E 0 + 1! 1! 1 1 2 2 + f cc( P z )( z0 E{z0 }) + f cc( P z ) E 0 + f cc( P z )( z0 E{z0 }) E 0 + O (3) 2! 2! 1 1 E{ y} = f ( P z ) + f c( P z ) E 0 + f cc( P z )V z2 + f cc( P z ) E 02 + O (3) 2 2 f ( z) = f (P z ) +
0
leading to E{[ y E{ y}][[ y E{ y}]} as
V y2 = f c2 ( P z )V z2 + f fc cc( P z ) E{( z0 E{z0 })3 } + 1 2 f fc cc( P z )V z2 E 0 + f cc2 ( P z ) E{( z0 E{z0 }) 4 } + 4 f cc2 ( P z ) E{( z0 E{z0 })3 }E 0 + f cc2 ( P z )V z2 E 02 + 1 1 + f cc2 ( P z )V z4 f cc2 ( P z ) E{( z0 E{z0 }) 2 }V z2 + O (3) 4 2 0
0
0
0
0
and with z0 being quasi-normally distributed, we have
V y2 = f c2 ( P z )V z2 + 2 f fc cc( P z )V z2 E 0 + 0
0
1 2 f cc ( P z )V z4 + f cc2 ( P z )V z2 E 02 + O (3) 2 0
0
with the first and third terms (on the right-hand side) being the right-hand side terms of case 1.
Example 8.2
Nonlinear vector valued error propagation with random effect models
In a GeoInformation System we ask for the quality of a nearly rectangular planar surface element. Four points {P1, P2, P3, P4} of an element are assumed to have the coordinates (x1, y1), (x2, y2), (x3, y3), (x4, y4) and form an 8×8 full variance-covariance matrix (central moments of order two) and moments of higher order. The planar surface element will be computed according to the Gauss trapezoidal formula
F = Σ_{i=1}^{4} (y_i + y_{i+1})/2 · (x_i − x_{i+1})
with the side condition x5 = x1, y5 = y1. Note that within the error propagation law ∂²F/∂x∂y ≠ 0 holds.
Figure 8.2: Surface element of a building in the map (corner points P1, P2, P3, P4; local error axes e1, e2)
First question? What is the structure of the variance-covariance matrix of the four points if we assume statistical homogeneity and isotropy of the network (Taylor-Karman structure)?
Second question! Approach the criterion matrix in terms of absolute coordinates. Interpolate the correlation function linearly!
Table 8.1: Coordinates of a four-point simplex
Point    x          y
P1       100.00 m   100.00 m
P2       110.00 m   117.32 m
P3       101.34 m   122.32 m
P4       91.34 m    105.00 m

Table 8.2: Longitudinal and lateral correlation functions Σ_m and Σ_ℓ for a Taylor-Karman structured 4-point network
|x|     Σ_m(|x|)   Σ_ℓ(|x|)
10 m    0.700      0.450
20 m    0.450      0.400
30 m    0.415      0.238
Our example refers to the Taylor-Karman structure or the structure function introduced in Chapter 3-222.
:Solution:
The Gauss trapezoidal surface element has the size
F = (y1 + y2)/2 (x1 − x2) + (y2 + y3)/2 (x2 − x3) + (y3 + y4)/2 (x3 − x4) + (y4 + y1)/2 (x4 − x1).
Once we apply the "error propagation law" we have to use (E44),
σ_F² = J Σ J' + (1/2) H (vecΣ)(vecΣ)' H'.
In our case, n = 1 holds since we have only one function to be computed. In contrast, the variance-covariance matrix enjoys the format 8×8, while the Jacobi matrix of first derivatives is a 1×8 matrix and the Hesse matrix of second derivatives is a 1×64 matrix.
(i) The structure of the homogeneous and isotropic variance-covariance matrix is such that locally the 2×2 variance-covariance matrices appear as unit matrices, generating local error circles of identical radius.
(ii) The celebrated Taylor-Karman matrix for absolute coordinates is given by
Σ_ij(x_p, x_q) = Σ_m(|x_p − x_q|) δ_ij + [Σ_ℓ(|x_p − x_q|) − Σ_m(|x_p − x_q|)] Δx_i Δx_j / |x_p − x_q|²
subject to Δx_1 := Δx = x_p − x_q, Δx_2 := Δy = y_p − y_q, i, j ∈ {1, 2}; p, q ∈ {1, 2, 3, 4}.
By means of a linear interpolation we have derived the Taylor-Karman matrix by Table 8.3 and Table 8.4.

Table 8.3: Distances, correlation functions Σ_m and Σ_ℓ, and coordinate differences
p-q   |x_p − x_q|   |x_p − x_q|²   Σ_m    Σ_ℓ    x_p − x_q   y_p − y_q
1-2   20.000        399.982        0.45   0.40   −10         −17.32
1-3   22.360        499.978        0.44   0.36   −1.34       −22.32
1-4   10.000        100.000        0.70   0.45   8.66        −5
2-3   10.000        100.000        0.70   0.45   8.66        −5
2-4   22.360        499.978        0.44   0.36   18.66       12.32
3-4   20.000        399.982        0.45   0.40   10          17.32

Table 8.4: Distance function versus Σ_m(x), Σ_ℓ(x)
|x|       Σ_m(x)               Σ_ℓ(x)
10–20     0.95 − 0.025 |x|     0.5 − 0.005 |x|
20–30     0.52 − 0.0035 |x|    0.724 − 0.0162 |x|
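The following sketch (Python/NumPy; the function and variable names are ours, not the book's) evaluates the piecewise-linear correlation functions of Table 8.4 and the 2×2 cross-covariance block Σ_ij(x_p, x_q) of the Taylor-Karman formula above. For the pair P1-P2 it reproduces, for example, the entries 0.438, −0.022 and 0.412 that appear in the (x1, y1) × (x2, y2) block of Table 8.5 below.

```python
import numpy as np

def sigma_m(d):
    # correlation function Sigma_m, linear interpolation of Table 8.4
    return 0.95 - 0.025 * d if d <= 20.0 else 0.52 - 0.0035 * d

def sigma_l(d):
    # correlation function Sigma_l, linear interpolation of Table 8.4
    return 0.5 - 0.005 * d if d <= 20.0 else 0.724 - 0.0162 * d

def tk_block(xp, xq):
    """2x2 Taylor-Karman block Sigma_ij(x_p, x_q) for two points given as arrays (x, y)."""
    dx = xp - xq
    d = np.linalg.norm(dx)
    if d == 0.0:
        return np.eye(2)        # locally: error circles of identical radius
    return sigma_m(d) * np.eye(2) + (sigma_l(d) - sigma_m(d)) * np.outer(dx, dx) / d**2

P = {1: np.array([100.0, 100.0]), 2: np.array([110.0, 117.32]),
     3: np.array([101.34, 122.32]), 4: np.array([91.34, 105.0])}   # Table 8.1

print(np.round(tk_block(P[1], P[2]), 3))    # block for the pair P1-P2 (distance 20 m)
```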
Once we take care of Σ_m and Σ_ℓ as functions of the distance for the given tabulated distances, we arrive at the Taylor-Karman correlation values of Table 8.5.

Table 8.5: Taylor-Karman matrix for the case study (symmetric, upper triangle shown)
        x1      y1      x2      y2      x3      y3      x4      y4
x1      1       0       0.438  −0.022   0.441  −0.005   0.512   0.108
y1              1      −0.022   0.412  −0.005   0.361   0.108   0.638
x2                      1       0       0.512   0.108   0.381  −0.037
y2                              1       0.108   0.634  −0.037   0.417
x3                                      1       0       0.438  −0.022
y3                                              1      −0.022   0.412
x4                                                      1       0
y4                                                              1

Finally, we have computed the Jacobi matrix of first derivatives in Table 8.6 and the Hesse matrix of second derivatives in Table 8.7.
Table 8.6: Jacobi matrix
J = [∂F/∂x1, ∂F/∂y1, ..., ∂F/∂x4, ∂F/∂y4]
J = 1/2 [y2 − y4, x4 − x2, y3 − y1, x1 − x3, y4 − y2, x2 − x4, y1 − y3, x3 − x1]
J = 1/2 [12.32, −18.66, 22.32, −1.34, −12.32, 18.66, −22.32, 1.34]
:Note:
∂F/∂x_i = (y_{i+1} − y_{i−1})/2,   ∂F/∂y_i = (x_{i−1} − x_{i+1})/2,   with x0 = x4, y0 = y4, x5 = x1, y5 = y1.
Table 8.7: Hesse matrix
H = ∂/∂x' (∂F/∂x')' with ∂F/∂x' = [∂F/∂x1, ∂F/∂y1, ..., ∂F/∂x4, ∂F/∂y4],
i.e. the 1×64 row vector of all second derivatives
H = [∂²F/∂x1², ∂²F/∂x1∂y1, ..., ∂²F/∂x1∂y4, ∂²F/∂y1∂x1, ..., ∂²F/∂y4∂x4, ∂²F/∂y4²]
  = [∂/∂x1 (∂F/∂x1, ..., ∂F/∂y4), ..., ∂/∂y4 (∂F/∂x1, ..., ∂F/∂y4)].
Note the detailed computation in Table 8.8.
Table 8.8: Second derivatives {0, +1/2, −1/2} ("interim formulae" for the Hesse matrix):
∂²F/∂x_i∂x_j = ∂²F/∂y_i∂y_j = 0,   i, j = 1, 2, 3, 4
∂²F/∂x_i∂y_i = ∂²F/∂y_i∂x_i = 0,   i = 1, 2, 3, 4
∂²F/∂x_i∂y_{i−1} = ∂²F/∂y_i∂x_{i+1} = −1/2,   i = 1, 2, 3, 4
∂²F/∂y_i∂x_{i−1} = ∂²F/∂x_i∂y_{i+1} = +1/2,   i = 1, 2, 3, 4.
Results
At first, we list the distances {P1P2, P2P3, P3P4, P4P1} of the trapezoidal finite element: |P1P2| = 20 (for instance 20 m), |P2P3| = 10, |P3P4| = 20 and |P4P1| = 10.
Second, we compute σ_F²(first term) = J Σ J':
σ_F²(first term) = 1/2 [12.32, −18.66, 22.32, −1.34, −12.32, 18.66, −22.32, 1.34] × Σ × 1/2 [12.32, −18.66, 22.32, −1.34, −12.32, 18.66, −22.32, 1.34]' = 334.7117,
where Σ is the 8×8 Taylor-Karman matrix of Table 8.5.
Third, we need to compute σ_F²(second term) = (1/2) H (vecΣ)(vecΣ)' H' = 7.2222 × 10⁻³⁵ ≈ 0,
where H collects the second derivatives {0, ±1/2} of Table 8.8 and vecΣ stacks the columns of the Taylor-Karman matrix.
Finally, we get the variance of the planar surface element F:
σ_F² = 334.7117 + 7.2222 × 10⁻³⁵ = 334.7117, i.e. σ_F = ±18.2951 m².
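As a cross-check of the first-order term, a small sketch (Python/NumPy, not part of the example itself): it computes the Gauss trapezoidal area, the 1×8 Jacobi matrix of Table 8.6 and JΣJ' with the Taylor-Karman matrix of Table 8.5. The variance should come out near the 334.71 m⁴ above; small differences only reflect the rounding of the tabulated correlations.

```python
import numpy as np

x = np.array([100.0, 110.0, 101.34, 91.34])     # coordinates of Table 8.1
y = np.array([100.0, 117.32, 122.32, 105.0])

# Gauss trapezoidal surface element, with x5 = x1, y5 = y1
F = sum((y[i] + y[(i + 1) % 4]) / 2 * (x[i] - x[(i + 1) % 4]) for i in range(4))

# Jacobi matrix J = [dF/dx1, dF/dy1, ..., dF/dx4, dF/dy4]  (Table 8.6)
J = np.empty(8)
for i in range(4):
    J[2 * i]     = (y[(i + 1) % 4] - y[i - 1]) / 2     # dF/dx_i
    J[2 * i + 1] = (x[i - 1] - x[(i + 1) % 4]) / 2     # dF/dy_i

Sigma = np.array([                                     # Taylor-Karman matrix, Table 8.5
    [ 1.000, 0.000, 0.438,-0.022, 0.441,-0.005, 0.512, 0.108],
    [ 0.000, 1.000,-0.022, 0.412,-0.005, 0.361, 0.108, 0.638],
    [ 0.438,-0.022, 1.000, 0.000, 0.512, 0.108, 0.381,-0.037],
    [-0.022, 0.412, 0.000, 1.000, 0.108, 0.634,-0.037, 0.417],
    [ 0.441,-0.005, 0.512, 0.108, 1.000, 0.000, 0.438,-0.022],
    [-0.005, 0.361, 0.108, 0.634, 0.000, 1.000,-0.022, 0.412],
    [ 0.512, 0.108, 0.381,-0.037, 0.438,-0.022, 1.000, 0.000],
    [ 0.108, 0.638,-0.037, 0.417,-0.022, 0.412, 0.000, 1.000]])

var_F = J @ Sigma @ J                                  # first-order term J Sigma J'
print(F, var_F)                                        # area about 200 m^2, variance about 335 m^4
```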
Example 8.3: Nonlinear vector-valued error propagation with random effect models
The distance element between P1 and P2 has the size F = sqrt((x2 − x1)² + (y2 − y1)²). Once we apply the "error propagation law" we have to use (E44),
σ_F² = J Σ J' + (1/2) H (vecΣ)(vecΣ)' H'.
Table 8.6: Jacobi matrix
J = [∂F/∂x1, ∂F/∂y1, ..., ∂F/∂x4, ∂F/∂y4]
J = [−(x2 − x1)/F, −(y2 − y1)/F, (x2 − x1)/F, (y2 − y1)/F, 0, 0, 0, 0]
J = [−0.5, −0.866, 0.5, 0.866, 0, 0, 0, 0].
Table 8.7: Hesse matrix
H = ∂/∂x' (∂F/∂x')' with ∂F/∂x' = [∂F/∂x1, ∂F/∂y1, ..., ∂F/∂x4, ∂F/∂y4],
H = [∂²F/∂x1², ∂²F/∂x1∂y1, ..., ∂²F/∂x1∂y4, ∂²F/∂y1∂x1, ..., ∂²F/∂y4²]
  = [∂/∂x1 (∂F/∂x1, ..., ∂F/∂y4), ..., ∂/∂y4 (∂F/∂x1, ..., ∂F/∂y4)].
Note the detailed computation in Table 8.8.
Table 8.8: Second derivatives ("interim formulae" for the Hesse matrix):
∂/∂x1 (∂F/∂x1, ..., ∂F/∂y4) = [1/F − (x2 − x1)²/F³, −(x2 − x1)(y2 − y1)/F³, −1/F + (x2 − x1)²/F³, (x2 − x1)(y2 − y1)/F³, 0, 0, 0, 0],
∂/∂y1 (∂F/∂x1, ..., ∂F/∂y4) = [−(x2 − x1)(y2 − y1)/F³, 1/F − (y2 − y1)²/F³, (x2 − x1)(y2 − y1)/F³, −1/F + (y2 − y1)²/F³, 0, 0, 0, 0],
∂/∂x2 (∂F/∂x1, ..., ∂F/∂y4) = [−1/F + (x2 − x1)²/F³, (x2 − x1)(y2 − y1)/F³, 1/F − (x2 − x1)²/F³, −(x2 − x1)(y2 − y1)/F³, 0, 0, 0, 0],
∂/∂y2 (∂F/∂x1, ..., ∂F/∂y4) = [(x2 − x1)(y2 − y1)/F³, −1/F + (y2 − y1)²/F³, −(x2 − x1)(y2 − y1)/F³, 1/F − (y2 − y1)²/F³, 0, 0, 0, 0],
∂/∂x_i (∂F/∂x1, ..., ∂F/∂y4) = ∂/∂y_i (∂F/∂x1, ..., ∂F/∂y4) = [0, 0, 0, 0, 0, 0, 0, 0],   i = 3, 4.
Results
At first, we list the distance {P1P2} of the distance element: |P1P2| = 20 (for instance 20 m). Second, we compute σ_F²(first term) = J Σ J':
σ_F²(first term) = [−0.5, −0.866, 0.5, 0.866, 0, 0, 0, 0] × Σ × [−0.5, −0.866, 0.5, 0.866, 0, 0, 0, 0]' = 1.2000,
with Σ the 8×8 Taylor-Karman matrix of Table 8.5.
Third, we need to compute σ_F²(second term) = (1/2) H (vecΣ)(vecΣ)' H':
σ_F²(second term) = (1/2) H (vecΣ)(vecΣ)' H' = 0.0015,
where H is the vec of the 8×8 matrix of second derivatives
[  0.0375  −0.0217  −0.0375   0.0217   0   0   0   0
  −0.0217   0.0125   0.0217  −0.0125   0   0   0   0
  −0.0375   0.0217   0.0375  −0.0217   0   0   0   0
   0.0217  −0.0125  −0.0217   0.0125   0   0   0   0
   0        0        0        0        0   0   0   0
   0        0        0        0        0   0   0   0
   0        0        0        0        0   0   0   0
   0        0        0        0        0   0   0   0 ]
and vecΣ stacks the columns of the same 8×8 Taylor-Karman matrix (Table 8.5) used above.
Finally, we get the variance of the distance element F:
σ_F² = 1.2000 + 0.0015 = 1.2015, i.e. σ_F = ±1.0961 m.
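The first-order part of this distance example can be reproduced with the following sketch (Python/NumPy); the coordinates of Table 8.1 and the leading 4×4 block of the Taylor-Karman matrix of Table 8.5 are the only inputs, everything else is derived.

```python
import numpy as np

x1, y1 = 100.0, 100.0                              # P1 (Table 8.1)
x2, y2 = 110.0, 117.32                             # P2
F = np.hypot(x2 - x1, y2 - y1)                     # distance, 20 m

# Jacobi matrix with respect to (x1, y1, x2, y2); P3 and P4 do not enter
J = np.array([-(x2 - x1), -(y2 - y1), (x2 - x1), (y2 - y1)]) / F

Sigma = np.array([[ 1.000, 0.000, 0.438,-0.022],   # leading block of Table 8.5
                  [ 0.000, 1.000,-0.022, 0.412],
                  [ 0.438,-0.022, 1.000, 0.000],
                  [-0.022, 0.412, 0.000, 1.000]])

var_first = J @ Sigma @ J
print(F, var_first)     # 20.0 and about 1.20, the first term of sigma_F^2 = 1.2000 + 0.0015
```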
9 The fifth problem of algebraic regression – the system of conditional equations: homogeneous and inhomogeneous equations {By = Bi versus −c + By = Bi}
:Fast track reading: Read Lemma 9.2, Lemma 9.3 and Lemma 9.6.
Lemma 9.2 “inconsistent homogeneous conditions”
Definition 9.1 “inconsistent homogeneous conditions”
Lemma 9.3 G y -norm: least squares solution
Theorem 9.4 G y -seminorm: least squares solution
Definition 9.5 “inconsistent inhomogeneous conditions”
Lemma 9.6 “inconsistent inhomogeneous conditions”
Here we shall outline two systems of poor conditional equations, namely homogeneous and inhomogeneous inconsistent equations. First, Definition 9.1 gives us G y -LESS of a system of inconsistent homogeneous conditional equations which we characterize as the least squares solution with respect to the G y seminorm ( G y -norm) by means of Lemma 9.2, Lemma 9.3 ( G y -norm) and Lemma 9.4 ( G y -seminorm). Second, Definition 9.5 specifies G y -LESS of a system of
inconsistent inhomogeneous conditional equations, which we alternatively characterize as the corresponding least squares solution with respect to the G_y-seminorm by means of Lemma 9.6. Third, we come up with examples.
9-1
G_y-LESS of a system of inconsistent homogeneous conditional equations
Our point of departure is Definition 9.1 by which we define G_y-LESS of a system of inconsistent homogeneous condition equations.
Definition 9.1 (G_y-LESS of a system of inconsistent homogeneous condition equations):
An n×1 vector i_ℓ of inconsistency is called G_y-LESS (LEast Squares Solution with respect to the G_y-seminorm) of the inconsistent system of linear condition equations Bi = By if, in comparison to all other vectors i ∈ R^n, the inequality
||i_ℓ||²_{G_y} := i_ℓ' G_y i_ℓ ≤ i' G_y i =: ||i||²_{G_y}   (9.1), (9.2)
holds, in particular if the vector of inconsistency i_ℓ has the least G_y-seminorm.
Lemma 9.2 characterizes the normal equations for the least squares solution of the system of inconsistent homogeneous condition equations with respect to the G_y-seminorm.
Lemma 9.2 (least squares solution of the system of inconsistent homogeneous condition equations with respect to the G_y-seminorm):
An n×1 vector i_ℓ of the system of inconsistent homogeneous condition equations
Bi = By   (9.3)
is G_y-LESS if and only if the system of normal equations
[ G_y  B' ; B  0 ] [ i_ℓ ; λ_ℓ ] = [ 0 ; By ]   (9.4)
with the q×1 vector λ_ℓ of "Lagrange multipliers" is fulfilled.
:Proof:
G_y-LESS of Bi = By is constructed by means of the Lagrangean
L( i, O ) := icG y i + 2O c( Bi By ) = min . i, O
The first derivatives wL ( i A , OA ) = 2(G y i A + BcOA ) = 0 wi wL ( i A , OA ) = 2( Bi A By ) = 0 wO constitute the necessary conditions. (The theory of vector-valued derivatives is presented in Appendix B.) The second derivatives wL ( i A , OA ) = 2G y t 0 wiwic build up due to the positive semidefiniteness of the matrix G y the sufficiency condition for the minimum. The normal equations (9.4) are derived from the two equations of first derivatives, namely ªG y «B ¬
B cº ª i A º ª 0 º = . 0 »¼ «¬ OA »¼ «¬ By »¼
h Lemma 9.3 is a short review of the system of inconsistent homogeneous condition equations with respect to the G y -norm, Lemma 9.4 alternatively with respect to the G y -seminorm. Lemma 9.3 (least squares solution of the system of inconsistent homogeneous condition equations with respect to the G y -norm): An n × 1 vector i A of the system of inconsistent homogeneous condition equations Bi = By is the least squares solution with respect to the G y -norm if and only if it solves the normal equations G y i A = Bc(BG y1Bc) 1 By. (9.5) The solution i A = G y1Bc( BG y1Bc) 1 By
(9.6)
is unique. The “goodness of fit” of G y -LESS is || i A ||G2 = i cA G y i A = y cBc(BG y1Bc) 1 By. y
(9.7)
:Proof: A basis of the proof could be C. R. Rao’s Pandora Box, the theory of inverse partitioned matrices (Appendix A: Fact: Inverse Partitioned Matrix /IPM/ of a symmetric matrix). Due to the rank identity rkG y = n , the normal equations (9.4) can be faster solved by Gauss elimination.
G y i A + BcOA = 0 Bi A = By. Multiply the first normal equation by BG y1 and substitute the second normal equation for Bi A . BG y1G y i A = Bi A = BG y1BcOA º » Bi A = By ¼ BG y1BcOA = By
OA = (BG y1Bc) 1 By. Finally we substitute the “Lagrange multiplier” OA back to the first normal equation in order to prove G y i A + BcOA = G y i A Bc(BG y1Bc) 1 By = 0 i A = G y1Bc(BG y1Bc) 1 By. h We switch immediately to Lemma 9.4. Lemma 9.4 (least squares solution of the system of inconsistent homogeneous condition equations with respect to the G y -seminorm ): An n × 1 vector i A of the system of inconsistent homogeneous condition equations Bi = By is the least squares solution with respect to the G y -seminorm if the compatibility condition R (Bc) R (G y )
(9.8)
is fulfilled, and solves the system of normal equations G y i A = Bc(BG y1Bc) 1 By ,
(9.9)
which is independent of the choice of the g-inverse G y .
9-2
Solving a system of inconsistent inhomogeneous conditional equations
The text point of departure is Definition 9.5, a definition of G y -LESS of a system of inconsistent inhomogeneous condition equations.
Definition 9.5 (G_y-LESS of a system of inconsistent inhomogeneous condition equations):
An n×1 vector i_ℓ of inconsistency is called G_y-LESS (LEast Squares Solution with respect to the G_y-seminorm) of the inconsistent system of inhomogeneous condition equations
−c + By = Bi   (9.10)
(the minus sign is conventional), if in comparison to all other vectors i ∈ R^n the inequality
||i_ℓ||²_{G_y} := i_ℓ' G_y i_ℓ ≤ i' G_y i =: ||i||²_{G_y}   (9.11)
holds, in particular if the vector of inconsistency i_ℓ has the least G_y-seminorm.
Lemma 9.6 characterizes the normal equations for the least squares solution of the system of inconsistent inhomogeneous condition equations with respect to the G_y-seminorm.
Lemma 9.6 (least squares solution of the system of inconsistent inhomogeneous condition equations with respect to the G_y-seminorm):
An n×1 vector i_ℓ of the system of inconsistent inhomogeneous condition equations
Bi = By − c = B(y − d)   (9.12)
is G_y-LESS if and only if the system of normal equations
[ G_y  B' ; B  0 ] [ i_ℓ ; λ_ℓ ] = [ 0 ; By − c ]   (9.13)
with the q×1 vector λ of Lagrange multipliers is fulfilled. i_ℓ exists surely if
R(B') ⊂ R(G_y)   (9.14)
and it solves the normal equations
G_y i_ℓ = B'(B G_y⁻ B')^{-1} (By − c),   (9.15)
which is independent of the choice of the g-inverse G_y⁻. i_ℓ is unique if the matrix G_y is regular and in consequence positive definite.
9-3
Examples
Our two examples relate to the triangular condition, the so-called zero misclosure, within a triangular network, and the condition that the sum within a flat triangle accounts to 180 o .
(i) The first example: triplet of angular observations
We assume that three observations of height differences within the triangle P_α P_β P_γ sum up to zero. The condition of holonomic heights says h_αβ + h_βγ + h_γα = 0, namely
B := [1, 1, 1],   y := [h_αβ, h_βγ, h_γα]',   i := [i_αβ, i_βγ, i_γα]'.
The normal equations of the inconsistent condition read for the case G_y = I_3:
i_ℓ = B'(BB')^{-1} By,
B'(BB')^{-1} B = (1/3) [1 1 1; 1 1 1; 1 1 1],
(i_αβ)_ℓ = (i_βγ)_ℓ = (i_γα)_ℓ = (1/3)(h_αβ + h_βγ + h_γα).

(ii) The second example: sum of planar triangles
Alternatively, we assume three angles which form a planar triangle and sum to
α + β + γ = 180°, namely
B := [1, 1, 1],   y := [α, β, γ]',   i := [i_α, i_β, i_γ]',   c := 180°.
The normal equations of the inconsistent condition equation read in our case G_y = I_3:
i_ℓ = B'(BB')^{-1} (By − c),
B'(BB')^{-1} = (1/3) [1, 1, 1]',   By − c = α + β + γ − 180°,
[(i_α)_ℓ, (i_β)_ℓ, (i_γ)_ℓ]' = (1/3) [1, 1, 1]' (α + β + γ − 180°).
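Both examples reduce to one-line computations; the following sketch (Python/NumPy, with made-up observation values) evaluates i_ℓ = B'(BB')^{-1}By for the homogeneous misclosure condition and i_ℓ = B'(BB')^{-1}(By − c) for the planar triangle.

```python
import numpy as np

B = np.array([[1.0, 1.0, 1.0]])                      # one condition equation, q = 1

# (i) homogeneous condition: height differences around the triangle sum to zero
h = np.array([1.203, -0.482, -0.718])                # made-up observations, misclosure 0.003
i_hom = B.T @ np.linalg.solve(B @ B.T, B @ h)
print(i_hom)                                         # each component = misclosure / 3 = 0.001

# (ii) inhomogeneous condition: angles of a planar triangle sum to 180 degrees
ang = np.array([59.998, 60.004, 60.004])             # made-up angles in degrees
c = 180.0
i_inh = B.T @ np.linalg.solve(B @ B.T, B @ ang - c)
print(i_inh)                                         # each component = (sum - 180) / 3
```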
10 The fifth problem of probabilistic regression – general Gauss-Markov model with mixed effects– Setup of BLUUE for the moments of first order (Kolmogorov-Wiener prediction) “Prediction company’s chance of success is not zero, but close to it.” Eugene Fama “The best way to predict the future is to invent it.” Alan Kay : Fast track reading : Read only Theorem 10.3 and Theorem 10.5
Lemma 10.2 Ȉ y BLUUE of ȟ and E{z} :
Definition 10.1 Ȉ y BLUUE of ȟ and E{z} :
Theorem 10.3 n ȟˆ , E {z} : Ȉ y BLUUE of ȟ and E{z} : Lemma 10.4 n n E{y}: ȟˆ , E {z} Ȉ y BLUUE of ȟ and E{z}
The inhomogeneous general linear Gauss-Markov model with fixed effects and random effects will be presented first. We review the special KolmogorovWiener model and extend it by the proper stochastic model of type BIQUUE given by Theorem 10.5.
10.1.1.1 Theorem 10.5 Homogeneous quadratic setup of Vˆ 2
The extensive example for the general linear Gauss-Markov model with fixed effects and random effects concentrates on a height network observed at two epochs. At the first epoch we assume three measured height differences. In between the first and the second epoch we assume height differences which change linear in time, for instance as a result of an earthquake we have found the height difference model • hDE (W ) = hDE (0) + hDE W + O (W 2 ) .
Namely, W indicates the time interval from the first epoch to the second epoch • relative to the height difference hDE . Unknown are •
the fixed effects hDE and
•
the expected values of stochastic effects of type height difference velocities hDE
given the singular dispersion matrix of height differences. Alternative estimation and prediction producers of •
type (V + CZCc) -BLUUE for the unknown fixed parameter vector ȟ of height differences of initial epoch and
•
the expectation data E{z} of stochastic height difference velocities z, and
•
of type (V + CZCc) -BLUUE for the expectation data E{y} of height difference measurements y,
•
of type e y of the empirical error vector,
•
as well as of type (V + CZCc) -BLUUP of the stochastic vector z of height difference velocities.
For the unknown variance component V 2 of height difference observations we review estimates of type BIQUUE. At the end, we intend to generalize the concept of estimation and prediction of fixed and random effects by a short historical remark.
10-1 Inhomogeneous general linear Gauss-Markov model (fixed effects and random effects) Here we focus on the general inhomogeneous linear Gauss-Markov model including fixed effects and random effects. By means of Definition 10.1 we review Ȉ y -BLUUE of ȟ and E{z} followed by the related Lemma 10.2, Theorem 10.3 and Lemma 10.4. Box 10.1 Inhomogeneous general linear Gauss–Markov model (fixed effects and random effects)
Aȟ + CE{z} + Ȗ = E{y}
(10.1)
Ȉ z := D{z}, Ȉ y := D{y} C{y , z} = 0 ȟ, E{z}, E{y}, Ȉ z , Ȉ y
unknown
Ȗ known E{y} Ȗ ([ A, C]) .
(10.2)
The n×1 stochastic vector y of observations is transformed by means of y Ȗ =: y to the new n×1 stochastic vector y of reduced observations which is characterized by second order statistics, in particular by the first moments E{y} and by the central second moments D{y}. Definition 10.1 ( Ȉ y BLUUE of ȟ and E{z} ): The partitioned vector ȗ = Ly + ț , namely ª ȟˆ º ª ȟˆ º ª L1 º ª ț1 º « » = «n» = « »y + « » ¬ț 2 ¼ ¬ Șˆ ¼ «¬ E{z}»¼ ¬ L 2 ¼ is called Ȉ y BLUUE of ȗ (Best Linear Uniformly Unbiased Estimation with respect to Ȉ y - norm) in (10.1) if ȗˆ is uniformly unbiased in the sense of (1st)
(2nd)
(10.3) E{ȗˆ} = E{Ly + ț} = ȗ for all ȗ R m+l or ˆ E{ȟ} = E{L1y + ț1} = ȟ for all ȟ R m (10.4) n {z}} = E{L 2 y + ț 2 } = Ș = E{z} for all Ș R l E{Șˆ } = E{E and in comparison to all other linear uniformly unbiased estimation ȗˆ has minimum variance. tr D{ȗˆ} := E{(ȗˆ ȗ)c(ȗˆ ȗ)} = tr LȈ y Lc =|| Lc ||2Ȉ = min y
(10.5)
L
or tr D{ȟˆ} := E{(ȟˆ - ȟ )c(ȟˆ - ȟ )} = tr L1Ȉ y L1c =|| L1c ||2Ȉ = min y
L1
n tr D {Șˆ } := tr D{E {z}} := E {( Șˆ - Ș) (Șˆ - Ș)} = n n = E{( E {z} E{z})c( E {z} E{z}{z})} = tr L 2 Ȉ y Lc2 =|| Lc ||2Ȉ = min . y
L2
(10.6)
We shall specify Ȉ y -BLUUE of ȟ and E{z} by means of ț1 = 0 , ț 2 = 0 and writing the residual normal equations by means of “Lagrange multipliers”. Lemma 10.2 ( Ȉ y BLUUE of ȟ and E{z} ): n An (m+l)×1 vector [ȟˆ c, Șˆ c]c = [ȟˆ c, E {z}c]c = [L1c , Lc2 ]c y + [ț1c , ț c2 ]c is n c c c Ȉ y BLUUE of [ȟ , E{z} ] in (10.1), if and only if ț1 = 0, ț 2 = 0 hold and the matrices L1 and L 2 tions ª Ȉ y A Cº ª L1c « Ac 0 0 » « ȁ « » « 11 «¬ Cc 0 0 »¼ «¬ ȁ 21
fulfill the system of normal equaLc2 º ª 0 ȁ12 » = « I m » « ȁ 22 »¼ «¬ 0
0º 0» » I l »¼
(10.7)
or Ȉ y L1c Aȁ11 Cȁ 21 = 0, Ȉ y Lc2 Aȁ12 Cȁ 22
0
A cL1c = I m , A cLc2 = 0 CcL1c = 0, CcLc2 = I l
(10.8)
with suitable matrices ȁ11 , ȁ12 , ȁ 21 and ȁ 22 of “Lagrange multipliers”. Theorem 10.3 specifies the solution of the special normal equations by means of (10.9) relative to the specific “Schur complements” (10.10)-(10.13). n Theorem 10.3 ( ȟˆ , E {z} Ȉ y BLUUE of ȟ and E{z} ): n Let [ȟˆ c, E{z}c]c be Ȉ y BLUUE of the [ȟ c, E{z}c]c in the mixed Gauss-Markov model (10.1). Then the equivalent representations of the solution of the normal equations (10.7) ª ȟˆ º ª A cȈ-y1A A cȈ-1y Cº ª A cº ˆ ˆȗ := «ª ȟ »º := « Ȉ-y1y »=« -1 -1 » « » n c c c ¬ Șˆ ¼ «¬ E {z}»¼ ¬ C Ȉ y A C Ȉ y C ¼ ¬ C ¼
(10.9)
ȟˆ = {A cȈ-y1[I n C(CcȈ-y1C) -1 CcȈ-y1 ]A}-1 × A cȈ-y1[I n - C(CcȈ-y1C) -1 CcȈ-y1 ] y 1 n Șˆ = E {z} = {CcȈ-1y [I n - A( AcȈ-y1A )-1 AcȈ-y1 ]C} × CcȈ-y1[In - A( AcȈ-y1A)-1 AcȈ-y1 ] y
ȟˆ = S -A1sA n Șˆ := E {z} = SC-1sC n ȟˆ = ( A cȈ-y1A ) 1 A cȈ-1y ( y E {z}) n Șˆ := E {z} = (CcȈ-1C) 1 CcȈ-1 ( y - Aȟˆ ) y
y
are completed by the dispersion matrices and the covariance matrices. 1 -1 -1 ˆ ° ª ȟˆ º ½° ° ª ȟ º °½ ª A cȈ y A A cȈ y Cº ˆ D ȗ := D ® « » ¾ := D ® « =: Ȉȗˆ » = n ¾ « CcȈ-y1A CcȈ-y1C » ¼ ¯° ¬ Șˆ ¼ ¿° ¯° «¬ E {z}»¼ ¿° ¬
{}
{}
1 D ȟˆ = {A cȈ-y1 [I n - C(CcȈ-y1C) -1 CcȈ-y1 ]A} =: Ȉȟˆ
n ˆ Șˆ } = C{ȟˆ , E C{ȟ {z}}} =
= {A cȈ-y1[I n - C(CcȈ-y1C) -1 CcȈ-y1 ]A} A cȈ-y1C(CcȈ-y1C) 1 1
= ( A cȈ-y1A ) -1 A cȈ-y1C {CȈ-y1[I n - A( A cȈ-1y A ) -1 AcȈ-y1 ]C}
-1
n D {Șˆ } := D{E {z}} = {CcȈ-y1 [I n - A ( A cȈ-y1A ) -1 A cȈ-y1 ]C}1 =: ȈȘˆ n D{ȟˆ} = S 1 , D{Șˆ } = D{E {z}} = S 1 A
C
C{ȟˆ , z} = 0 n C{Șˆ , z}:= C{E {z}, z} = 0 , where the “Schur complements” are defined by S A := A cȈ-y1[I n - C(CcȈ-y1C) -1 CcȈ-y1 ]A,
(10.10)
s A := A cȈ-y1[I n - C(CcȈ-y1C) -1 CcȈ-1y ]y
(10.11)
SC := CcȈ-y1[I n - A( A cȈ-y1A ) -1 A cȈ-y1 ]C
(10.12)
sC := CcȈ-y1[I n - A( A cȈ-y1A ) -1 AcȈ-y1 ]y .
(10.13)
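A compact numerical sketch of Theorem 10.3 (Python/NumPy; the dimensions, design matrices and data below are invented for illustration): it evaluates ξ̂ = S_A^{-1} s_A and Ê{z} = S_C^{-1} s_C through the "Schur complements" (10.10)-(10.13) and checks them against the jointly solved normal equations (10.9).

```python
import numpy as np

rng = np.random.default_rng(2)
n, m, l = 8, 2, 2                                # illustrative dimensions, n >= m + l (assumption)
A = rng.normal(size=(n, m))                      # first design matrix (fixed effects)
C = rng.normal(size=(n, l))                      # second design matrix (random effects)
Sigma = np.diag(rng.uniform(0.5, 2.0, n))        # dispersion matrix of the reduced observations
y = rng.normal(size=n)

Si = np.linalg.inv(Sigma)
P_C = np.eye(n) - C @ np.linalg.solve(C.T @ Si @ C, C.T @ Si)   # I - C (C'Si C)^{-1} C'Si
P_A = np.eye(n) - A @ np.linalg.solve(A.T @ Si @ A, A.T @ Si)   # I - A (A'Si A)^{-1} A'Si

S_A, s_A = A.T @ Si @ P_C @ A, A.T @ Si @ P_C @ y               # (10.10), (10.11)
S_C, s_C = C.T @ Si @ P_A @ C, C.T @ Si @ P_A @ y               # (10.12), (10.13)

xi_hat  = np.linalg.solve(S_A, s_A)                             # Sigma_y-BLUUE of the fixed effects
eta_hat = np.linalg.solve(S_C, s_C)                             # Sigma_y-BLUUE of E{z}

# cross-check against the joint normal equations (10.9)
N = np.block([[A.T @ Si @ A, A.T @ Si @ C], [C.T @ Si @ A, C.T @ Si @ C]])
rhs = np.concatenate([A.T @ Si @ y, C.T @ Si @ y])
print(np.allclose(np.linalg.solve(N, rhs), np.concatenate([xi_hat, eta_hat])))   # True
```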
Our final result (10.14)-(10.23) summarizes (i) the two forms (10.14) and (10.15) n n {y} and D{E {y}} as derived covariance matrices, (ii) the empirical of estimating E error vector e y and the related variance-covariance matrices (10.19)-(10.21) and (iii) the dispersion matrices D{y} by means of (10.22)-(10.23). n n Lemma 10.4 ( E {y}: ȟˆ , E {z} Ȉ y BLUUE of ȟ and E{z} ): (i) With respect to the mixed Gauss-Markov model (10.1) Ȉ y BLUUE of the E{y} = Aȟ + CE{z} is given by n n E {y} = Aȟˆ + C E {z} = = AS -A1s A + C(CcȈy1C) 1 CcȈy1 ( y AS -A1s A ) or n n E{y} = Aȟˆ + C E {z} = = A ( A cȈy1A ) 1 A cȈy1 ( y AS C-1sC ) + CS-1CsC with the corresponding dispersion matrices
(10.14)
(10.15)
n n D{E {y}} = D{Aȟˆ + C E {z}} = n n n = AD{ȟˆ}A c + A cov{ȟˆ , E {z}}Cc + C cov{ȟˆ , E {z}}A c + CD{E {z}}Cc n n D{E {y}} = D{Aȟˆ + C E {z}} = = C(CcȈ-y1C) 1 Cc + [I n - C(CcȈ-y1C) -1 CcȈ-y1 ]AS A1A c[I n - Ȉ -y1C(CcȈ -y1C) -1 Cc] n n {y}} = D{Aȟˆ + C E {z}} = D{E = A( A cȈ-y1A) 1 Ac + [I n - A( AcȈ-y1A)-1 AcȈ-y1 ]CS C1Cc[I n - Ȉ-y1A( AcȈ-y1A) -1 Ac], where S A , s A , SC , sC are “Schur complements” (10.10), (10.11), (10.12) and (10.13). n The covariance matrix of E {y} and z amounts to n n cov{E {y}, z} = C{Aȟˆ + C E {z}, z} = 0.
(10.16)
(ii) If the “error vector” e y is empirically determined by means of n the residual vector e y = y E { y} we gain the various representations of type e y = [I n C(CcȈ y1C) 1 CcȈ y1 ]( y AS -A1s A ) or
(10.17)
e y = [I n A ( A cȈy1A ) 1 A cȈy1 ]( y CSC-1sC )
(10.18)
with the corresponding dispersion matrices D{e y } = Ȉ y C(CcȈ-y1C) 1 Cc [I n - C(CcȈ-y1C)-1 CcȈ-y1 ]AS A1A c[I n - Ȉ-y1C(CcȈ-y1C)-1 Cc]
(10.19)
or D{e y } = Ȉ y A( A cȈ -y1A) 1 Ac [I n - A( A cȈ -y1A) -1 A cȈ -y1 ]CS C1Cc[I n - Ȉ -y1A( A cȈ -y1A) -1 Ac],
(10.20)
where S A , s A , SC , sC are “Schur complements” (10.10), (10.11), (10.12) and (10.13). e y and z are uncorrelated because of C{e y , z} = 0.
(10.21)
(iii) The dispersion matrices of the observation vector is given by n n D{y} = D{Aȟˆ + C E {z} + e y } = D{Aȟˆ + C E {z}} + D{e y } (10.22) D{y} = D{e y e y } + D{e y }. n {y} are uncorrelated since e y and E n n C{e y , E {y}} = C{e y , Aȟˆ + C E {z}} = C{e y , e y e y } = 0 .
(10.23)
10-2 Explicit representations of errors in the general Gauss-Markov model with mixed effects A collection of explicit representations of errors in the general Gauss-Markov model with mixed effects will be presented: ȟ , E{z} , y Ȗ = y , Ȉ z , Ȉ y will be assumed to be unknown, Ȗ known. In addition, C{y, z} will be assumed to vanish. The prediction of random effects will be summarized here. Note our simple model Aȟ + CE{z} = E{y}, E{y} R ([ A, C]), rk[ A, C] = m + A < n , E{z} unknown, ZV 2 = D{z} , Z positive definite rk Z = s d A , VV 2 = D{y Cz} , V positive semidefinite rk V = t d n, rk[V, CZ] = n , C{z, y Cz} = 0 . A homogeneous-quadratic ansatz Vˆ 2 = y cMy will be specified now. Theorem 10.5 (homogeneous-quadratic setup of Vˆ 2 ): (i)
Let Vˆ 2 = y cMy = (vec M)c( y
y ) be BIQUUE of V 2 with respect to the model of the front desk. Then
Vˆ 2 = ( n m A) 1[ y c{I n ( V + CZCc) 1 A[ A c( V + CZCc) 1 A ]1 Ac} ( V + CZCc) 1 y scASC1sC ]
Vˆ 2 = ( n m A) 1 [ y cQ( V + CZCc) 1 y scA S A1sA ] Q := I n (V + CZCc) 1 C[Cc(V + CZCc) 1 C]1 Cc subject to [S A , s A ] := A c( V + CZCc) 1{I n C[Cc( V + CZCc) 1 C]1 Cc( V + CZCc) 1}[ A, y ] = = A cQ( V + CZCc) 1[ A, y ] and [SC , sC ] := Cc( V + CZCc) 1{I n A[ Ac( V + CZCc) 1 A]1 Ac( V + CZCc) 1}[C, y] , where SA and SC are “Schur complements”. Alternately, we receive the empirical data based upon
Vˆ 2 = ( n m A) 1 y c( V + CZCc) 1 e y = = ( n m A) 1 e cy ( V + CZCc) 1 e y and the related variances
D{Vˆ 2 } = 2( n m A) 1V 4 = 2( n m A) 1 (V 2 ) 2 or replacing by the estimations D{Vˆ 2 } = 2(n m A) 1 (Vˆ 2 ) 2 = 2(n m A) 1[e cy (V + CZCc) 1 e y ]2 . (ii)
If the cofactor matrix V is positive definite, we will find for the simple representations of type BIQUUE of V 2 the equivalent representations ª A cV 1A A cV 1Cº ª A cº 1 Vˆ 2 = ( n m A) 1 y c{V 1 V 1[ A, C] « » A }y 1 1 » « ¬ CcV A CcV C ¼ ¬ Cc ¼
Vˆ 2 = ( n m A) 1 ( y cQV 1 y scA S A1s A ) Vˆ 2 = ( n m A) 1 y cV 1 [I n A ( A cV 1A ) 1 A cV 1 ]( y CSC1sC ) subject to the projection matrix Q = I n V 1C(CcV 1C) 1 Cc and [S A , s A ] := A cV 1 [I n C(CcV 1C) 1 CcV 1 ][ A, y ] = A cQV 1 [ A, y ] [SC , sC ] := {I A + CcV 1[I n A( A cV 1A) 1 A cV 1 ]CZ}1 × ×CcV 1[I n A( AcV 1A) 1 AcV 1 ][C, y ]. Alternatively, we receive the empirical data based upon
Vˆ 2 = ( n m A) 1 y cV 1e y = ( n m A) 1 e cy V 1e y and the related variances D{Vˆ 2 } = 2( n m A) 1V 4 = 2( n m A) 1 (V 2 ) 2 Dˆ {Vˆ 2 } = 2( n m A) 1 (Vˆ 2 ) 2 = 2( n m A) 1 (e cy V 1e y ) 2 . The proofs are straight forward.
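The empirical-residual form of BIQUUE is a one-line computation once the adjustment has produced ê_y; here is a hedged sketch (the residual vector and the cofactor matrix V are placeholders, not values from the text; the redundancy n − m − ℓ = 2 matches the height-network example that follows):

```python
import numpy as np

n, m, l = 6, 2, 2                                 # redundancy n - m - l = 2
V = np.eye(n)                                     # positive-definite cofactor matrix (placeholder)
e_hat = np.array([0.004, -0.002, 0.003, -0.001, 0.002, -0.003])   # placeholder residuals

sigma2_hat = e_hat @ np.linalg.solve(V, e_hat) / (n - m - l)      # BIQUUE of sigma^2
var_sigma2 = 2.0 * sigma2_hat**2 / (n - m - l)                    # D{sigma^2_hat} = 2 sigma^4 / (n-m-l)
print(sigma2_hat, var_sigma2)
```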
10-3 An example for collocation Here we will focus on a special model with fixed effects and random effects, in particular with ȟ , E{z} , E{y} , Ȉ z , Ȉ y unknown, but Ȗ known. We depart in analyzing a height network observed at two epochs. At the initial epoch three height differences have been observed. From the first epochs to the second epoch we assume height differences which change linear in time, for instance caused by an Earthquake. There is a height varying model • hDE (W ) = hDE (0) + hDE (0)W + O (W 2 ),
where W notes the time difference from the first epoch to the second epoch, related to the height difference. Unknown are the fixed height differences hDE and • the expected values of the random height difference velocities hDE . Given is the singular dispersion matrix of height difference measurements. Alternative estimation and prediction data are of type (V + CZCc) -BLUUE for the unknown parameter ȟ of height difference at initial epoch and the expected data E{z} of stochastic height difference velocities z, of type (V + CZCc) -BLUUE of the expected data E{y} of height difference observations y, of type e y of the empirical error vector of observations and of type (V + CZCc) -BLUUP for the stochastic vector z of height difference velocities. For the unknown variance component V 2 of height difference observations we use estimates of type BIQUUE. In detail, our model assumptions are epoch 1 ª hDE º ª 1 0 º ªh º E{«« hEJ »»} = « 0 1 » « DE » « » hEJ «¬ hJD »¼ ¬ 1 1 ¼ ¬ ¼ epoch 2 ª hDE º ª hDE º ª 1 0 º ª W 0 º « hEJ » » « « « » » « » E{« hEJ »} = 0 1 0 W • « »« » « E{hDE }» «¬ hJD »¼ ¬ 1 1 ¼ ¬ W W ¼ « • » ¬« E{hEJ }¼» epoch 1 and 2 ª hDE º ª 1 0 º ª0 « h » « » «0 EJ 0 1 « » « » ªh º « h « » 1 1 » DE 0 E{« JD »} = « +« kDE 1 0 » «¬ hEJ »¼ « W « « » « » «0 « k EJ » « 0 1 » « W « k » ¬ 1 1 ¼ ¬ ¬ JD ¼ ª hDE º ª1 « h » «0 EJ « » « « hJD » 1 A := « y := « » , kDE «1 « » «0 « k EJ » « 1 « k » ¬ ¬ JD ¼ ªh º ȟ := « DE » ¬ hEJ ¼
0º 0» • » 0 » ª E{hDE }º • » 0 » «« E{hEJ }¼» ¬ W» W »¼ 0º 1» » 1» , 0» 1» 1 »¼
388
10 The fifth problem of probabilistic regression
ª 0 0º « 0 0» « » 0 0» C := « , « W 0» « 0 W» « W W » ¬ ¼
• ª E{hDE }º E{z} = « • » ¬« E{hEJ }¼»
rank identities rk A=2, rk C=2, rk [A,C]=m+l=4. The singular dispersion matrix D{y} = VV 2 of the observation vector y and the singular dispersion matrix D{z} = ZV 2 are determined in the following. We separate 3 cases. (i) rk V=6, rk Z=1 V = I6 , Z =
1 ª1 1º W 2 «¬1 1»¼
(ii) rk V=5, rk Z=2 V = Diag(1, 1, 1, 1, 1, 0) , Z =
1 I 2 , rk(V +CZCc)=6 W2
(iii) rk V=4, rk Z=2 V = Diag(1, 1, 1, 1, 0, 0) , Z =
1 I 2 , rk (V +CZCc)=6 . W2
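To make the three cases concrete, a sketch for Case (i) follows (Python/NumPy). We assume the time interval τ = 1, as the text does below, and we take the loop-closure rows of A and C as [−1, −1], consistent with h_γα = −h_αβ − h_βγ; with Σ_y ~ V + CZC', the σ²-free coefficient matrices of ξ̂ and Ê{z} can then be compared with the 1/3-scaled rows tabulated for the first case.

```python
import numpy as np

tau = 1.0
A = np.array([[ 1, 0], [0,  1], [-1, -1],
              [ 1, 0], [0,  1], [-1, -1]], dtype=float)   # fixed effects: initial height differences
C = np.array([[ 0, 0], [0,  0], [ 0,  0],
              [tau, 0], [0, tau], [-tau, -tau]])           # random effects: height-difference velocities

# Case (i): V = I_6, Z = (1/tau^2) * [[1, 1], [1, 1]]
V = np.eye(6)
Z = np.array([[1.0, 1.0], [1.0, 1.0]]) / tau**2
Sigma = V + C @ Z @ C.T                                    # cofactor matrix of the observations

Si = np.linalg.inv(Sigma)
N = np.block([[A.T @ Si @ A, A.T @ Si @ C],
              [C.T @ Si @ A, C.T @ Si @ C]])
L = np.linalg.solve(N, np.vstack([A.T @ Si, C.T @ Si]))    # xi_hat = L[:2] @ y, E{z}_hat = L[2:] @ y

print(np.round(3 * L[:2], 3))   # compare with the 1/3-scaled coefficient rows of the first case
print(np.round(3 * L[2:], 3))
```

Cases (ii) and (iii) follow by swapping in their V and Z; the tabulated inverses and "Schur complements" below can be checked the same way.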
In order to be as simple as possible we use the time interval W=1. With the numerical values of matrix inversion and of “Schur-complements”, e.g. Table 1: (V +CZCc)-1 Table 2: {I n -A[Ac(V +CZCc)-1 A]-1 Ac(V +CZCc)-1 } Table 3: {I n -C[Cc(V +CZCc)-1C]-1Cc(V +CZCc)-1 } Table 4: “Schur-complements” SA, SC Table 5: vectors sA, sC 1st case:
n n ȟˆ , D{ȟˆ} , E {z} , D{E {y}}
1 ª 2 1 1 0 0 0 º 1 ª 2 y + y2 y3 º , ȟˆ = « y= « 1 » 3 ¬1 2 1 0 0 0¼ 3 ¬ y1 + 2 y2 + y3 »¼
V 2 ª2 1º , D{ȟˆ} = 3 «¬1 2 »¼ ª 2 y1 y2 + y3 + 2 y4 + y5 y6 º ª 2 1 1 2 1 1º n , E {z} = « y=« » 1 2 1 1 2 1 ¬ ¼ ¬ y1 2 y2 y3 + y4 + 2 y5 + y6 »¼ 2
V n D{E {z}} = 3 2nd case:
ª7 5º «¬ 5 7 »¼ ,
n n ȟˆ , D{ȟˆ} , E {z} , D{E {z}}
1 ª 2 1 1 0 0 0 º 1 ª 2 y + y2 y3 º , ȟˆ = « y= « 1 » 3 ¬1 2 1 0 0 0¼ 3 ¬ y1 + 2 y2 + y3 »¼
V 2 ª2 1º , D{ȟˆ} = 3 «¬1 2 »¼ ª 4 2 2 3 3 3º n E {z} = « y, ¬ 2 4 2 3 3 3 »¼
V 2 ª13 5 º n , D{E {z}} = 6 «¬ 5 13»¼ 3rd case:
n n {z} , D{E {z}} ȟˆ , D{ȟˆ} , E
1 ª 2 1 1 0 0 0 º 1 ª 2 y + y2 y3 º , ȟˆ = « y= « 1 » 1 2 1 0 0 0 3¬ 3 ¬ y1 + 2 y2 + y3 »¼ ¼
V 2 ª2 1º , D{ȟˆ} = 3 «¬1 2 »¼ ª 2 1 1 3 0 0 º 1 ª 2 y1 y2 + y3 + 3 y4 º n , E {z} = « ¬ 1 2 1 0 3 0 »¼ 3 «¬ y1 2 y2 y3 + 3 y5 »¼ 2
V n D{E {z}} = 3
ª5 1 º «¬1 5»¼ .
Table 1: Matrix inverse (V +CZCc)-1 for a mixed Gauss-Markov model with fixed and random effects V +CZCc
(V +CZCc)-1
390
10 The fifth problem of probabilistic regression
1st case
ª1 «0 « «0 «0 «0 «0 ¬
0 1 0 0 0 0
0 0 1 0 0 0
0 0 0 2 1 0
0 0 0 1 2 0
0º 0» » 0» 0» 0» 1 »¼
ª3 «0 1 ««0 3 «0 «0 «0 ¬
0 3 0 0 0 0
0 0 0 0 3 0 0 2 0 1 0 0
0 0 0 1 2 0
0 º 0 » » 0 » 0» 0» 3»¼
0 4 0 0 0 0
0 0 0 0 4 0 0 3 0 1 0 2
0 0 0 1 3 2
0 0 0
2nd case
ª1 «0 « «0 «0 «0 «0 ¬
0 1 0 0 0 0
0 0 0 0 1 0 0 2 0 0 0 1
0 0 0 0 2 1
0 º 0 » » 0 » 1» 1» 2 »¼
ª4 «0 1 «« 0 4 «0 «0 «0 ¬
3rd case
ª1 «0 « «0 «0 «0 «0 ¬
0 1 0 0 0 0
0 0 0 0 1 0 0 1 0 0 0 1
0 0 0 0 1 1
0 º 0 » » 0 » 1» 1» 3 »¼
ª1 «0 « «0 «0 «0 «0 ¬
0 1 0 0 0 0
0 0 0 0 1 0 0 2 0 1 0 1
0 0 0 1 2 1
º » » » 2» 2 » 4 »¼
0 º 0 » » 0 » 1» 1» 1 »¼
Table 2: Matrices {I n +[ Ac( V +CZCc) -1 A]1 Ac(V +CZCc) -1} for a mixed Gauss-Markov model with fixed and random effects {I n -A[ A c(V +CZCc) -1 A]-1 Ac(V +CZCc) -1 } 1st case
ª 13 7 « 7 13 1 «« 4 4 24 « 11 7 « 7 11 « 4 4 ¬
4 4 16 4 4 8
5 1 4 19 1 4
1 5 4 1 19 4
4º 4 » » 8 » 4» 4 » 16 »¼
2nd case
6 ª 13 5 « 5 13 6 1 «« 6 6 12 6 24 « 11 5 « 5 11 6 « 6 6 12 ¬
4 4 0 20 4 0
4 4 0 4 20 0
3º 3 » » 6 » 3» 3 » 18 »¼
391
10-3 An example for collocation
3rd case
ª5 « 1 1 «« 2 8 « 3 « 1 «2 ¬
1 2 3 1 0 º 5 2 1 3 0 » » 2 4 2 2 0 » 1 2 5 1 0 » 3 2 1 5 0 » 2 4 2 2 8 »¼
Table 3: Matrices {I n C[Cc(V +CZCc)-1Cc]1 Cc(V +CZCc)-1 } for a mixed Gauss-Markov model with fixed and random effects {I n -C[Cc(V +CZCc)-1C]-1Cc(V +CZCc)-1 } 1st case
ª3 «0 1 «« 0 3 «0 «0 «0 ¬
2nd case
ª2 «0 « «0 «0 «0 «0 ¬
3rd case
ª1 «0 « «0 «0 «0 «0 ¬
0 3 0 0 0 0 0 2 0 0 0 0
0 0 0 0 0 0 3 0 0 0 1 1 0 1 1 0 0 0
0º 0» » 0» 1» 1» 0»¼
0 0 0 0º 0 0 0 0» » 2 0 0 0» 0 1 1 1 » 0 1 1 1» 0 0 0 0 »¼
0 1 0 0 0 0
0 0 1 0 0 0
0 0 0 0 0 0 0 0 0 0 1 1
0º 0» » 0» 0» 0» 1 »¼
Table 4: “Schur-complements” SA, SC , three cases, for a mixed Gauss-Markov model with fixed and random effects SA (10.10)
S A1
SC (10.12)
SC1
1st case
ª 2 1º «¬ 1 2 »¼
1 ª2 1º 3 «¬1 2 »¼
1 ª 7 5º 8 «¬ 5 7 »¼
1 ª7 5º 3 «¬ 5 7 »¼
2nd case
ª 2 1º «¬ 1 2 »¼
1 ª2 1º 3 «¬1 2 »¼
1 ª13 5º 24 «¬ 5 13 »¼
1 ª13 5 º 6 «¬ 5 13»¼
3rd case
ª 2 1º «¬ 1 2 »¼
1 ª2 1º 3 «¬1 2 »¼
1 ª 5 1º 8 «¬ 1 5 »¼
1 ª5 1 º 3 «¬1 5»¼
Table 5: Vectors sA and sC, three cases, for a mixed Gauss-Markov model with fixed and random effects sA sC (10.11) (10.13) 1st case
ª1 0 1 0 0 0 º «0 1 1 0 0 0 » y ¬ ¼
9 3 12 º 1 ª 9 3 12 y « 8 ¬ 3 9 12 3 9 12 »¼
2nd case ª1 0 1 0 0 0 º «0 1 1 0 0 0 » y ¬ ¼
1 ª 7 1 6 4 4 9 º y 24 «¬ 1 7 6 4 4 9 »¼
3rd case
1 ª 3 1 2 5 1 0 º y 8 «¬ 1 3 2 1 5 0 »¼
ª1 0 1 0 0 0 º «0 1 1 0 0 0 » y ¬ ¼
First case: n n ȟˆ , D{ȟˆ} , E {z} , D{E {z}} Second case: n n ȟˆ , D{ȟˆ} , E {z} , D{E {z}} Third case: n n ȟˆ , D{ȟˆ} , E {z} , D{E {z}}. Here are the results of computing n n E {y} , D{E {z}}, e y and D{e y } , ordered as case 1, case 2, and case 3.
Table 6: Numerical values, Case 1 n n {y} , D{E {y}} , e y , D{e y } 1st case: E 1
ª2 «1 « 1 n E {y} = « «0 «0 «0 ¬
1 1 0 2 1 0 1 2 0 0 0 2 0 0 1 0 0 1
ª2 «1 2 « V « 1 n D{E {y}} = 3 «0 «0 «0 ¬ ª 1 « 5 1 «« 8 e y = 12 « 11 «7 « 4 ¬
0 0º ª 2 y1 + y2 y3 º « y1 + 2 y2 + y3 » » 0 0 « » » y + 2 y2 + 2 y3 » 0 0» y=« 1 1 1» « 2 y4 + y5 y6 » « y4 + 2 y5 y6 » » 2 1 « y + y + 2y » » 1 2¼ 5 6 ¼ ¬ 4 1 1 0 2 1 0 1 2 0 0 0 5 0 0 4 0 0 1
0 0º 0 0» » 0 0» 4 1» 5 1» 1 2 »¼
5 8 5 1 4 º 1 8 1 5 4» » 8 4 4 4 8» 7 4 7 11 8 » 11 4 11 7 8» 4 8 8 8 4 »¼
ª 1 1 1 « 1 1 1 2 « V « 1 1 1 D{e y } = 3 «0 0 0 «0 0 0 «0 0 0 ¬
0 0 0 0 0 0 1 1 1 1 1 1
0º 0» » 0» . 1» 1» 1 »¼
Table 7 Numerical values, Case 2 n n {y} , D{E {y}} , e y , D{e y } 2nd case: E 1
ª4 «2 « 2 1 n E {y} = « 6« 0 «0 ¬0
2 2 4 2 2 4 0 0 0 0 0 0
0 0 0 3 3 0
0 0º 0 0» 0 0» 3 3» » 3 3» 0 6¼
ª2 «1 2 « 1 V n D{E {y}} = « 3 «0 «0 ¬0
1 1 0 2 1 0 1 2 0 0 0 5 0 0 4 0 0 1
0 0º 0 0» 0 0» 4 1» » 5 1» 1 2¼
ª 2 y1 2 y2 + 2 y3 º ª 2 2 2 0 0 0 º « 2 y1 + 2 y2 2 y3 » « 2 2 2 0 0 0 » « » 1 «« 2 2 2 0 0 0 »» 2 y 2 y2 + 2 y3 » e y = y=« 1 « 3 y4 3 y5 + 3 y6 » 6 « 0 0 0 3 3 3 » « 3 y4 3 y5 + 3 y6 » « 0 0 0 3 3 3» « » «0 0 0 0 0 0» ¬ ¼ ¬ 3 y4 + 3 y5 3 y6 ¼ ª 2 2 2 0 0 « 2 2 2 0 0 2 « V « 2 2 2 0 0 D{e y } = 6 « 0 0 0 3 3 « 0 0 0 3 3 «0 0 0 0 0 ¬
0º 0» » 0» 0» 0» 0 »¼
n n {y} , D{E {y}} , e y , D{e y } 1st case: E ª2 «1 « 1 n E {y} = « «0 «0 «0 ¬
1 1 0 2 1 0 1 2 0 0 0 2 0 0 1 0 0 1
ª2 «1 2 « V « 1 n D{E {y}} = 3 «0 «0 «0 ¬ ª 1 « 5 1 «8 e y = « 12 « 11 «7 « 4 ¬
0 0º ª 2 y1 + y2 y3 º « y1 + 2 y2 + y3 » 0 0» « » » y + 2 y2 + 2 y3 » 0 0» y=« 1 1 1» « 2 y4 + y5 y6 » « y4 + 2 y5 y6 » » 2 1 « y + y + 2y » » 1 2¼ 5 6 ¼ ¬ 4 1 1 0 2 1 0 1 2 0 0 0 5 0 0 4 0 0 1
0 0º 0 0» » 0 0» 4 1» 5 1» 1 2 »¼
5 8 5 1 4 º 1 8 1 5 4» » 8 4 4 4 8» 7 4 7 11 8 » 11 4 11 7 8» 4 8 8 8 4 »¼
ª 1 1 1 « 1 1 1 2 « V « 1 1 1 D{e y } = 3 «0 0 0 «0 0 0 «0 0 0 ¬
0 0 0 0 0 0 1 1 1 1 1 1
0º 0» » 0» . 1» 1» 1 »¼
Table 8 Numerical values, Case 3 n n {y} , D{E {y}} , e y , D{e y } 3rd case: E ª0 0 0 0 «0 0 0 0 1« 0 0 0 0 n E {y} = « 3 « 2 1 1 0 « 1 2 1 0 « 1 1 0 3 ¬ ª2 «1 2 « V « 1 n D{E {y}} = 3 «0 «0 «0 ¬ ª 1 1 1 « 1 1 1 1 « 1 1 1 e y = « 3« 0 0 0 «0 0 0 «0 0 0 ¬
0 0 0 0 0 0
1 1 0 2 1 0 1 2 0 0 0 3 0 0 0 0 0 3 0 0 0 0 0 0
0 0 0 0 3 3
0º 0» » 0» 0» 0» 0»¼ 0 0º 0 0» » 0 0» 0 3» 3 3» 3 6 »¼
0º ª y1 y2 + y3 º « y1 + 2 y2 y3 » 0» » « » 0» y y2 + y3 » y=« 1 0» 0 « » « » 0» 0 » « » 0¼ 0 ¬ ¼
ª 1 1 1 « 1 1 1 V 2 «« 1 1 1 D{e y } = 3 «0 0 0 «0 0 0 «0 0 0 ¬
0 0 0 0 0 0
0 0 0 0 0 0
0º 0» » 0» . 0» 0» 0 »¼
Table 9 Data of type z , D{z} for 3 cases 1st case: 1 ª 2 1 1 2 1 1º z = « y, 3 ¬ 1 2 1 1 2 2 »¼ D{z} =
V ª7 5 º , 3 «¬ 5 7 »¼
2nd case: 1 ª 4 2 2 3 3 3º z = « y, 3 ¬ 2 4 2 3 3 3 »¼ D{z} =
V ª13 5 º , 6 «¬ 5 13»¼
3rd case: 1 ª 2 1 1 3 0 0 º z = « y, 3 ¬ 1 2 1 0 3 0 »¼ D{z} =
V ª5 1 º , 3 «¬1 5»¼
Table 10 Data of type Vˆ 2 , D{Vˆ 2 }, Dˆ {Vˆ 2 } for 3 cases 1st case: n=6, m=2, A = 2 , n m A = 2 ª7 « 1 1 « 2 Vˆ 2 = y c « 12 « 5 « 1 «4 ¬
1 7 2 1 5 4
2 2 10 4 4 8
ª7 « 1 « 1 2 Dˆ {Vˆ 2 } = {y c « 144 « 5 « 1 «4 ¬
5 1 4 7 1 2 1 7 2 1 5 4
1 5 4 1 7 2 2 2 10 4 4 8
4º 4 » » 8 » y , D{Vˆ 2 } = V 4 , 2 » 2» 10 »¼ 5 1 4 7 1 2
1 5 4 1 7 2
2nd case: n=6, m=2, A = 2 , n m A = 2
4º 4 » » 8 » 2 y} , 2 » 2» 10 »¼
ª 2 2 2 0 0 0 º « 2 2 2 0 0 0 » 1 « 2 2 2 0 0 0 » 2 4 2 Vˆ = y c « » y , D{Vˆ } = V , 12 « 0 0 0 3 3 3 » « 0 0 0 3 3 3» ¬ 0 0 0 3 3 3 ¼ ª2 « 2 «2 1 2 ˆ D{Vˆ } = {y c « 144 « 0 «0 ¬0 3rd case: n=6, m=2,
2 2 0 0 0 º 2 2 0 0 0 » 2 2 0 0 0 » 2 y} , 0 0 3 3 3 » » 0 0 3 3 3» 0 0 3 3 3 ¼ A = 2, nmA = 2
ª 1 1 1 « 1 1 1 « 1 1 1 1 Vˆ 2 = y c « 6 «0 0 0 «0 0 0 ¬0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
0º 0» 0» y , D{Vˆ 2 } = V 4 , 0» » 0» 0¼
ª 1 1 1 « 1 1 1 « 1 1 1 1 2 Dˆ {Vˆ } = {y c « 144 « 0 0 0 «0 0 0 ¬0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
0º 0» 0» 2 y} . 0» » 0» 0¼
Here is my journey’s end.
10-4 Comments
(i) In their original contributions A. N. Kolmogorov (1941 a, b, c) and N. Wiener ("yellow devil", 1939) did not depart from our general setup of fixed effects and random effects. Instead they departed from the model
y_{Pα} = c_{Pα} + Σ_{β=1}^{q} c_{PαQβ} y_{Qβ} + Σ_{β,γ=1}^{q} c_{PαQβQγ} y_{Qβ} y_{Qγ} + O(y³)
1
1 1
2
1
2
q
1
2
1
2 1
2
2
2
q
2
p 1
p 1 1
1
p 1
2
q
q
p 1 q
q
2
yP = yQ cP Q + yQ cP Q + " + yQ cP Q . p
1
p
1
2
p
2
q
p
q
From given values of “random effects” ( yQ ," , yQ ) other values of “random effects” ( yP ," , yP ) have been predicted, namely under the assumption of “equal correlation” P of type p
1
p
1
P1
P2 = Q1
Q2“
n E { yP } = C(CȈ y1 C) 1 CȈ y1 yP P
P
n D{E { yP }} = C(CȈ y1 C) 1 Cc P
or for all yP { yP ," , yP } 1
P
|| yP yˆ P ||:= E{( yP yˆ P )}2 d min , Qq
E{( yP yˆ P ) 2 } = E{( y2i ¦ y1c j cij ) 2 } = min( KW ) Q1
for homogeneous linear prediction. “ansatz” E{ yP E{ yˆ P }} := 0 , E{( yP E{ yˆ P })( yP E{ yˆ P })} = cov( yP , yP ) 1
E{( yP yˆ P ) 2 } = cov( P, P )
Q = Qq
¦c
pj
Q = Q1
1
2
2
1
2
cov( P, Q j ) + ¦¦ c pj c pk cov(Q j , Qk ) = min Qj
Qk
“Kolmogorov-Wiener prediction” k =q
c pj ( KW ) = ¦ [ cov(Q j , Qk )]1 cov(Qk , P) k =1
q
q
E{( yP yˆ P ) 2 | KW } = cov( P, P) ¦¦ cov( P, Q j ) cov( P, Qk )[cov(Q j , Qk )]1 j =1 k =1
constrained to | Q j Qk | Qk
Qj cov(Q j , Qk ) = cov(Q j Qk )
“ yP E{ yP } is weakly translational invariant” KW prediction suffers from the effect that “apriori” we know only one realization of the random function y (Q1 )," , y (Qq ) , for instance.
cov(τ) = (1/N_G) Σ_{|Q_j − Q_k| = τ} (y_{Q_j} − E{y_{Q_j}})(y_{Q_k} − E{y_{Q_k}}).
¦c
PD QE
E{ yQE } = E{ yPD } CE{z} = E{ y}.
QE
(ii) The first model applies if we want to predict data of one type to predict data of the same type. Indeed, we have to generalize if we want to predict, for instance, vertical deflections, gravity gradients, gravity values,
from gravity disturbances.
The second model has to start from relating one set of heterogeneous data to another set of heterogeneous data. In the case we have to relate the various data sets to each other. An obvious alternative setup is
¦c
1 PD QE
QE
E{z1QE } + ¦ c2 PD RJ E{z2 RJ } + " = E{ yPD } . RJ
(iii) The level of collocation is reached if we include a trend model in addition to Kolmogorov-Wiener prediction, namely E{y} = Aȟ + CE{z} , the trend being represented by Aȟ . The decomposition of „trend“ and “signal” is well represented in E. Grafarend (1976), E. Groten (1970), E. Groten and H. Moritz (1964), S. Heitz (1967, 1968, 1969), S. Heitz and C.C. Tscherning (1972), R.A. Hirvonen (1956, 1962), S. K. Jordan (1972 a, b, c, 1973), W.M. Kaula (1959, 1963, 1966 a, b, c, 1971), K.R. Koch (1973 a, b), K.R. Koch and S. Lauer (1971), L. Kubackova (1973, 1974, 1975), S. Lauer (1971 a, b), S.L. Lauritzen (1972, 1973, 1975), D. Lelgemann (1972, 1974), P. Meissl (1970, 1971), in particular H. Moritz (1961, 1962, 1963 a, b, c, d, 1964, 1965, 1967, 1969, 1970 a, b, c, d, e, 1971, 1972, 1973 a, b, c, d, e, f, 1974 a, b, 1975), H. Moritz and K.P. Schwarz (1973), W. Mundt (1969), P. Naicho (1967, 1968), G. Obenson (1968, 1970), A.M. Obuchow (1947, 1954), I. Parzen (1960, 1963, 1972), L.P. Pellinen (1966, 1970), V.S. Pugachev (1962), C.R. Rao (1971, 1972, 1973 a, b), R. Rupp (1962, 1963, 1964 a, b, 1966 a, b, c, 1972, 1973 a, b, c, 1974, 1975), H. P. Robertson (1940), M. Rosenblatt (1959, 1966), R. Rummel (1975 a, b), U. Schatz (1970), I. Schoenberg (1942), W. Schwahn (1973, 1975), K.P. Schwarz (1972,
1974 a, b, c, 1975 a, b, c), G. Seeber (1972), H.S. Shapiro (1970), L. Shaw et al (1969), L. Sjoeberg (1975), G. N. Smith (1974), F. Sobel (1970), G.F. Taylor (1935, 1938), C. Tscherning (1972, a, b, 1973, 1974 a, b, 1975 a, b, c, d, e), C. Tscherning and R.H Rapp (1974), V. Vyskocil (1967, 1974 a, b, c), P. Whittle (1963 a, b), N. Wiener (1958, 1964), H. Wolf (1969, 1974), E. Wong (1969), E. Wong and J.B. Thoma (1961), A.M. Yaglom (1959, 1961). (iv) An interesting comparison is the various solutions of type •
ˆ yˆ 2 = lˆ + Ly 1
•
ˆ yˆ 2 = Ly 1
(best homogeneous linear prediction)
•
ˆ yˆ 2 = Ly 1
(best homogeneous linear unbiased prediction)
(best inhomogeneous linear prediction)
dispersion identities D3 d D2 d D4 . (v) In spite of the effect that “trend components” and “KW prediction” may serve well the needs of an analyst, generalizations are obvious. For instance, in Krige’s prediction concept it is postulated that only || y p y p ||2 = : E{( y p y p ( E{ y p } E{ y p })) 2 } 1
2
1
2
1
2
is a weakly relative translational invariant stochastic process. A.N. Kolmogorov has called the weakly relation translational invariant random function “structure function” Alternatively, higher order variance-covariance functions have been proposed: || y1 , y2 , y3 ||2 = : E{( y1 2 y2 + y3 ) ( E{ y1 } 2 E{ y2 } + E{ y3 }) 2 } || y1 , y2 , y3 , y4 ||2 = : E{( y1 3 y2 + 3 y3 y4 ) ( E{ y1} 3E{ y2 } + 3E{ y3 } E{ y 4 }) 2 } etc.. (vi) Another alternative has been the construction of higher order absolute variancecovariance functions of type || ( yQ E{ yQ }) ( yQ E{ yQ }) ( yQ E{ yQ }) || 1
1
2
2
3
3
|| ( yQ E{ yQ }) (...) ( yQ E{ yQ }) || 1
1
n
n
like in E. Grafarend (1984) derived from the characteristic functional, namely a series expression of higher order variance-covariance functions.
11 The sixth problem of probabilistic regression – the random effect model – “errors-in-variables” “In difference to classical regression error-in-variables models here measurements occurs in the regressors. The naive use of regression estimators leads to severe bias in this situation. There are consistent estimators like the total least squares estimator (TLS) and the moment estimator (MME). J. Polzehl and S. Zwanzig Department of Mathematics Uppsala University, 2003 A.D.” Read only Definition 11.1 and Lemma 11.2 Please, pay attention to the guideline of Chapter 11, namely to Figure 11.1, 11.2 and 11.3, and the Chapter 11.3: References
Definition 11.1 “random effect model: errors-in variables”
Lemma 11.2 ”error-in-variables model, normal equations”
Figure 11.1: Magic triangle
402
11 The sixth problem of probabilistic regression
By means of Figure 11.1 we review the mixed model (fixed effects plus random effects), total least squares (fixed effects plus “errors-in-variables”) and a special type of the mixed model which superimposes random effects and “errors-invariables”. Here we will concentrate on the model “errors-in-variables”. In the context of the general probabilistic regression problem E{y} = Aȟ + &E{]} + E{;}Ȗ , we specialize here to the model “errors in variables”, namely E{y} = E{X}Ȗ in which y as well as X (vector y and matrix X) are unknown. A simple example is the straight line fit, abbreviated by “y = ax + b1”. (x, y) is assumed to be measured, in detail E{y} = aE{x} + b1 = ªa º = [ E{x}, 1] « » ¬b ¼ 2 E{(x E{x}) } z 0, E{(y E{y})2 } z 0 but Cov{x, y} = 0. Note Ȗ1 := a, Ȗ 2 := b E{y} = y ey , E{x} = x ex Cov{ex , ey } = 0 and ªȖ º y e y = [x e x , 1] « 1 » ¬Ȗ2 ¼ y = xȖ1 + 1Ȗ 2 e x Ȗ1 + e y .
403
11 The sixth problem of probabilistic regression
constrained by the Lagrangean
L ( Ȗ1 , Ȗ 2 , e x , e y , Ȝ ) =: = ecx Px e x + ecy Py e y + 2Ȝ c( y xȖ1 1Ȗ 2 + e x Ȗ1 e y ) = =
min
Ȗ1 , Ȗ 2 , ex , e y , Ȝ
.
The first derivatives constitute the necessary conditions, namely 1 wL 2 wȖ1 1 wL 2 wȖ 2 1 wL 2 we x 1 wL 2 we y 1 wL 2 wȜ
= x cȜ = 0, = 1cȜ = 0, = Px e x + Ȗ1Ȝ = 0, = Py e y 2 = 0, = y xȖ1 1Ȗ 2 + e x Ȗ1 e y = 0,
while the second derivatives “ z 0 ” refer the sufficiency conditions. Figure 11.2 is a geometric interpretation of the nonlinear model of type “errorsin-variables”, namely the straight line fit of total least squares.
y
• • •
( E{x}, E{ y}) •
•
ey
• •
ex
P ( x, y )
•
x
Figure11.2. The straight line fit of total least squares ( E{y} = a E{x} + 1b, E{x} = x e x , E{y} = y e y )
404
11 The sixth problem of probabilistic regression
An alternative model for total least squares is the rewrite E{y} = y e y = Ȗ1 E{x} + 1Ȗ 2 y 1Ȗ 2 = Ȗ1 E{x} + e y E{x} = x e x
x = E{x} + e x , for instance
ªe y º ª y 1Ȗ 2 º ª Ȗ1, n º « x » = « I » E{x} + «e » , ¬ ¼ ¬ n ¼ ¬ x¼ we get for Ȉ - BLUUE of E{x} ª y 1Ȗ 2 º n E {x} = ( A cȈ 1A ) 1 A cȈ 1 « » ¬ x ¼ subject to ªȖ I º ªȈ A := « 1 n » , Ȉ1 = « yy ¬ 0 ¬ In ¼
1
0 º ªP =« y » Ȉxx ¼ ¬0
0º . Px »¼
We can further solve the optimality problem using the Frobenius Ȉ - seminorms: n n ª y 1Ȗ 2 Ȗ1 E {x}º 1 ª y 1Ȗ 2 Ȗ1 E {x}º « »Ȉ « » n n «¬ »¼ «¬ »¼ xE {x} xE {x} ªeˆ º ª¬eˆ cy , eˆ cx º¼ Ȉ 1 « y » = eˆ cx Px eˆ x + eˆ cy Py eˆ y = min . Ȗ ,Ȗ ¬eˆ x ¼ 1
2
11-1 Solving the nonlinear system of the model “errors-invariables” First, we define the random effect model of type “errors-in-variables” subject to the minimum condition i ycWy i y + tr I 'X WX I X = min . Second, we form the derivations, the partial derivations i ycWy i y + tr I 'X WX I X + 2Ȝ c{ y XJ + I XJ i y } , the neccessary conditions for obtaining the minimum. Definition 11.1
(the random effect model: errors-invariables)
The nonlinear model of type “errors-in-variables” is solved by “total least squares” based on the risk function (11.1)
y = E{y} + e y
(11.3)
X = E{X} + E X
~ ~ subject to
y = y0 + iy
(11.2)
X = X 0 + IX
(11.4)
11-1 Solving the nonlinear system of the model “errors-in-variables”
405
y0 \n
(11.6)
X0 \ nxm
(11.8)
(11.5)
E{y} \ n
(11.7)
E{X} \ nxm
~ rk E{X} = m ~ rk X0 = m
(11.9)
and n t m
(11.10)
L1 := ¦ wii i2 ii1 ii2
and
L1 := icy Wi y
and
L2 :=
i1 ,i2
¦
i1 , k1 , k2
wk1k2 ii1k1 ii1k2
L2 := tr I 'X WX I X
L1 + L2 = icy Wy i y + tr I 'X WX I X = min y0 ,X0
L =: || i y ||2 W + || I X ||2 W y
X
(11.11)
subject to y X Ȗ + I X Ȗ i y = 0.
(11.12)
The result of the minimization process is given by Lemma 11.2: Lemma 11.2 (error-in-variables model, normal equations): The risk function of the model “errors-in-variables” is minimal, if and only if 1 wL = X cȜ A + I'X Ȝ A = 0 2 wȖ
(11.13)
1 wL = Wy i y Ȝ A = 0 2 wi y
(11.14)
1 wL = WX I X + ȖȜ cA 2 wI X
(11.15)
1 wL = y X y + IX Ȗ i y = 0 2 wȜ
(11.16)
and det (
w2L ) t 0. wJ i wJ j :Proof:
First, we begin with the modified risk function
(11.17)
406
11 The sixth problem of probabilistic regression
|| i y ||2 W + || I X ||2 +2 Ȝ c( y XȖ + I X Ȗ i y ) = min , y
where the minimum condition is extended over y, X, i y , I X , Ȝ , when Ȝ denotes the Lagrange parameter. icy Wy i y + tr I'X WX I X + 2Ȝ '( y XȖ + I X Ȗ i y ) = min
(11.18)
if and only if 1 wL = X cȜ A + I'X Ȝ A = 0 2 wȖ 1 wL = Wy i y Ȝ A 2 wi y 1 wL = WX I X = 0 2 w (tr I'X WX I X ) 1 wL = y XȖ + I X Ȗ i y = 0 2 wȜ A and
(11.19) (11.20)
(11.21)
(11.22)
w2L positive semidefinite. wȖ wȖ ' The first derivatives guarantee the necessity of the solution, while the second derivatives being positive semidefinite assure the sufficient condition. (11.23)
Second, we have the nonlinear equations, namely (11.24)
( X c + I'X ) Ȝ A = 0
(11.25)
Wy i y Ȝ A = 0
(11.26)
WX I X = 0
(11.27)
y XȖ + I X Ȗ i y = 0 ,
(bilinear) (linear) (bilinear) (bilinear)
which is a problem outside our orbit-of-interest. An example is given in the next chapter. Consult the literature list at the end of this chapter.
11-2 Example: The straight line fit Our example is based upon the “straight line fit” " y = ax + b1 ",
407
11-2 Example: The straight line fit
where (x, y) has been measured, in detail E{y} = a E{x} + b 1 = ªa º = [ E{x}, 1] « » ¬ b¼ or ªJ º J 1 := a, J 2 := b, xJ 1 + 1J 2 = [ x, 1] « 1 » ¬J 2 ¼ and y xJ 1 1J 2 + ecx J 1 ecy = 0. ( J 1 , J 2 ) are the two unknowns in the parameter space. It has to be noted that the term exJ 1 includes two coupled unknowns, namely e x and J 1 . Second, we formulate the modified method of least squares.
L (J 1 , J 2 , e x , e y , Ȝ ) = = icWi + 2( y c x cJ 1 1J 2 + i 'x J 1 i ' y )Ȝ = = icWi + 2Ȝ c( y xJ 1 1J 2 + i 'x J 1 i y ) or i cy Wy i y + i cx Wx i x + +2(y c xcJ 1 1cJ 2 + i cxJ 1 i cy )Ȝ. Third, we present the necessary and sufficient conditions for obtaining the minimum of the modified method of least squares. (11.28) (11.29) (11.30)
1 wL = x cȜ A + icx Ȝ A = 0 2 wJ 1 1 wL = 1cȜ A = 0 2 wJ 2 1 wL = Wy i y Ȝ A = 0 2 wi y
(11.31)
1 wL = Wx i x + Ȝ lJ 1 = 0 2 wi x
(11.32)
1 wL = y xJ 1 1J 2 + i xJ i y = 0 2 wȜ and ª w 2 L / w 2J 1 det « 2 ¬w L / wJ 1wJ 2
w 2 L / wJ 1wJ 2 w 2 L / w 2J 2
º » t 0. ¼
(11.33)
408
11 The sixth problem of probabilistic regression
Indeed, these conditions are necessary and sufficient for obtaining the minimum of the modified method of least squares. By Gauss elimination we receive the results (11.34)
(-x c + icx ) Ȝ A = 0
(11.35)
Ȝ1 + ... + Ȝ n = 0
(11.36)
Wy i y = Ȝ A
(11.37)
Wx i x = Ȝ A Ȗ1
(11.38)
Wy y = Wy xJ 1 + Wy 1J 2 Wy i xJ 1 + Wy i y or Wy y = Wy xJ 1 Wy 1J 2 ( I x J 12 ) Ȝ A = 0
(11.39)
if Wy = Wx = W and
(11.40)
xcWy xcWxJ 1 xcW1J 2 xc(I x J 12 )Ȝ A = 0
(11.41)
y cWy y cWxJ 1 y cW1J 2 y c( I x J 12 ) Ȝ A = 0
(11.42)
+ x cȜ l = + i 'x Ȝ A
(11.43)
Ȝ , +... + Ȝ n = 0 ª x cº ª y cº ª x cº ª x cº 2 « y c» Wy « x c» WxJ 1 « y c» W1J 2 « y c» ( I x J 1 ) Ȝ A = 0 ¬ ¼ ¬ ¼ ¬ ¼ ¬ ¼
(11.44)
subject to Ȝ1 + ... + Ȝ n = 0,
(11.45)
x1c Ȝ l = i cx Ȝ l .
(11.46) Let us iterate the solution. ª0 «0 « «0 « «0 «¬ x n
0 0 0
0 0 Wx
0 0 0
0 1n
0 J 1I n
Wy In
(xcn i cx ) º 1cn »» J 1I n » » I n » 0 »¼
ªJ1 º ª 0 º «J » « 0 » « 2» « » « ix » = « 0 » . « » « » « iy » « 0 » «¬ Ȝ A »¼ «¬ y n »¼
We meet again the problem that the nonlinear terms J 1 and i x appear. Our iteration is based on the initial data
409
11-2 Example: The straight line fit
(i ) Wx = Wy = W = I n , (ii) (i x ) 0 = 0, (iii) (J 1 ) n = J 1 (y (i y ) 0 = x(J 1 ) 0 + 1n (J 2 ) 0 ) in general ª0 «0 « «0 « «0 «xn ¬
0 0 0 0 1n
0 0 Wx 0 J 1(i ) I n
0 0 0 Wy In
(xc i cx ) º 1 »» J 1(i ) I n » » I n » 0 »¼
ª J 1(i +1) º ª 0 º «J » « » « 2(i +1) » « 0 » « i x (i +1) » = « 0 » « » « » « i y (i +1) » « 0 » « Ȝ l (i +1) » « y » ¬ ¼ ¬ n¼
x1 = J 1 , x2 = J 1 , x3 = i x , x4 = i y , x5 = Ȝ A . The five unknowns have led us to the example within Figure 11.3.
(1, 0.5)
(7, 5.5) (10, 5.05) (11, 9)
Figure 11.3: General linear regression Our solutions are collected as follows:
J 1 : 0.752, 6 0.866,3 0.781, 6 J 2 : 0.152, 0 -1.201,6 -0.244, 6 case : case : our general V y (i ) = 0 V X (i ) = 0 results after V X (i ) z 0
V y (i ) z 0
iteration.
11-3 References Please, contact the following references. Abatzoglou, J.T. and Mendel, J.M. (1987),,Abatzoglou, J.T. and Mendel, J.M. and Harada, G.A. (1991), Bajen, M.T., Puchal, T., Gonzales, A., Gringo, J.M., Castelao, A ., Mora, J. and Comin, M. (1997), Berry, S.M., Carroll, R.J. and Ruppert, D. (2002), Björck, A. (1996), Björck, A. (1997), Björck, A., Elfving, T. and Strakos, Z. (1998), Björck, A., Heggernes, P. and Matstoms, P. (2000), Bojanczyk, A.W., Bront, R.P., Van Dooren, P., de Hoog, F.R. (1987), Bunch, J.R., Nielsen, C.P. and Sorensen, D.C. (1978), Bunke, H. and Bunke, O. (1989), Capderou, A., Douguet, D., Simislowski, T., Aurengo, A. and Zelter, M. (1997), Carroll, R.J., Ruppert, D. and Stefanski, L.A. (1996), Carroll, R.J., Küschenhoff, H., Lombard, F. and Stefanski, L.A. (1996), Carroll, R.J. and Stefanski, L.A. (1997), Chandrasekuran, S. and Sayed, A.H. (1996), Chun, J. Kailath, T. and Lev-Ari, H. (1987), Cook, J.R. and Stefanski, L.A. (1994), Dembo, R.S., Eisenstat, S.C. and Steihaug, T. (1982), De Moor, B. (1994), Fuller, W.A. (1987), Golub, G.H. and Van Loan, C.F. (1980), Hansen, P.C. (1998), Higham, N.J. (1996), Holcomb, J. P. (1996), Humak, K.M.S. (1983), Kamm, J. and Nagy, J.G. (1998), Kailath, T. Kung, S. and More, M. (1979), Kailath, T. and Chun, J. (1994), Kailath, T. and Sayed, A.H.(1995), Kleming, J.S. and Goddard, B.A. (1974), Kung, S.Y., Arun, K.S. and Braskar Rao, D.V. (1983), Lemmerling, P., Van Huffel, S. and de Moor, B. (1997), Lemmerling, P., Dologlou, I. and Van Huffel, S. (1998), Lin, X. and Carroll, R.J. (1999), Lin, X. and Carroll, R.J. (2000), Mackens, W. and Voss, H. (1997), Mastronardi, N, Van Dooren, P. and Van Huffel, S., Mastronardi, N., Lemmerling, P. and Van Huffel, S. (2000), Nagy, J.G. (1993), Park, H. and Elden, L. (1997), Pedroso, J.J. (1996), Polzehl, J. and Zwanzig, S. (2003), Rosen, J.B., Park, H. and Glick, J. (1996), Stefanski, L.A. and Cook, J.R. (1995), Stewart, M. and Van Dooren, P. (1997), Van Huffel, S. (1991), Van Huffel, S. and Vandewalle, J. (1991), Van Huffel, S., Decanniere, C., Chen, H. and Van Hecke, P.V. (1994), Van Huffel, S., Park, H. and Rosen, J.B. (1996), Van Huffel, S., Vandewalle, J., de Rao, M.C. and Willems, J.L.,Wang, N., Lin, X., Gutierrez, R.G. and Carroll, R.J. (1998), and Yang, T. (1996).
12 The sixth problem of generalized algebraic regression – the system of conditional equations with unknowns – (Gauss-Helmert model) C.F. Gauss and F.R. Helmert introduced the generalized algebraic regression problem which can be identified as a system of conditional equations with unknowns. :Fast track reading: Read only Lemma 12.2, Lemma 12.5, Lemma 12.8
Lemma 12.2 Normal equations: Ax + Bi = By Definition 12.1 W - LESS: Ax + Bi = By Lemma 12.3 Condition A B
“The guideline of chapter twelve: first definition and lemmas”
Lemma 12.5 R, W - MINOLEES: Ax + Bi = By Definition 12.4 R, W - MINOLESS: Ax + Bi = By Lemma 12.6 relation between A und B
“The guideline of chapter twelve: second definition and lemmas”
412
12 The sixth problem of generalized algebraic regression
Definition 12.7 R, W – HAPS: Ax + Bi = By
Lemma 12.8 R, W – HAPS: normal equations: Ax + Bi = By
“The guideline of chapter twelve: third definition and lemmas” The inconsistent linear system Ax + Bi = By called generalized algebraic regression with unknowns or homogeneous Gauß - Helmert model will be characterized by certain solutions which we present in Definition 12.1, Definition 12.4 and Definition 12.7 solving special optimization problems. Because of rk B = q there holds automatically R ( A ) R (B). Lemma 12.2, Lemma 12.5 and Lemma 12.8 contain the normal equations as special optimizational problems. Lemma 12.3 and Lemma 12.6 refer to special solutions as linear forms of the observation vector, in particular which are characterized by products of certain generalized inverses of the coefficient matrices A and B of conditional equations. In addition, we compare R, W - MINOLESS and R, W - HAPS by a special lemma. As examples we treat a height network which is characterized by absolute and relation height difference measurements called “leveling” of type I - LESS, I, I - MINOLESS, I, I - HAPS and R,WMINOLESS. Lemma 12.10 W - LESS: Ax + Bi = By c Definition 12.9 W - LESS: Ax + Bi = By c Lemma 12.11 Condition A B
“The guideline of Chapter twelve: fourth definition and lemmas”
12 The sixth problem of generalized algebraic regression
413
Lemma 12.13 R, W - MINOLESS: Ax + Bi = By c Definition 12.12 R, W - MINOLESS: Ax + Bi = By c Lemma 12.14 relation between A und B
“The guideline of chapter twelve: fifth definition and lemmas”
Definition 12.15 R, W – HAPS: Ax + Bi = By c
Lemma 12.16 R, W – HAPS: Ax + Bi = By c
“The guideline of chapter twelve: sixth definition and lemmas” The inconsistent linear system Ax + Bi = By - note the constant shift – called generalized algebraic regression with unknowns or inhomogeneous Gauß-Helmert model will be characterized by certain solutions which we present in Definition 12.9, Definition 12.12 and Definition 12.15, solving special optimization problems. Because of the rank identity rk B = q there holds automatically R ( A) R (B). Lemma 12.10, Lemma 12.13 and Lemma 12.16 contain the normal equations as special optimizational problems. Lemma 12.11 and Lemma 12.14 refer to special solutions as linear forms of the observation vector, in particular which can be characterized by products of certain generalized inverses of the coefficient matrices A and B of the conditional equations. In addition, we compare R, WMINOLESS and R, W - HAPS by a special lemma.
414
12 The sixth problem of generalized algebraic regression
At this point we have to mention that we were not dealing with a consistent system of homogeneous or inhomogeneous condition equations with unknowns of type Ax = By , By R ( A ) or Ax = By c, By c + R ( A ) . For further details we refer to E. Grafarend and B. Schaffrin (1993, pages 28-34 and 54-57). We conclude with Chapter 4 (conditional equations with unknowns, namely “bias estimation” within an equivalent stochastic model) and Chapter 2 (Examples for the generalized algebraic regression problem: W - LESS, R,W - MINOLESS and R, W - HAPS).
12-1 Solving the system of homogeneous condition equations with unknowns First, we solve the problem of homogeneous condition equations by the method of minimizing the W - seminorm of Least Squares. We review by Definition 12.1, Lemma 12.2 and Lemma 12.3 the characteristic normal equations and linear form which build up the solution of type W - LESS. Instead, secondly by Definition 12.4 and Lemma 12.5 and Lemma 12.6 R we present, W - MINOLESS as MInimum NOrm LEast Squares Solution (R - SemiNorm, W - SemiNorm of type Least Squares). Third, we alternatively concentrate by Definition 12.7 and Lemma 12.8 and Lemma 12.9 on R, W - HAPS (Hybrid APproximate Solution with respect to the combined R - and W - Seminorm). Fourth, we compare R, W - MINOLESS and R, W - HAPS x h xlm . 12-11
W - LESS
W - LESS is built on Definition 12.1, Lemma 12.2 and Lemma 12.3. Definition 12.1 (W - LESS, homogeneous conditions with unknowns): An m × 1 vector xl is called W - LESS (LEast Squares Solution with respect to the W -seminorm ) of the inconsistent system of linear equations (12.1) Ax + Bi = By with Bi A := By Ax A , if compared to alternative vectors x R m with Bi := Bi Ax the inequality || i A ||2 W := i cA Wi A d i cWi =:|| i ||2W holds, if in consequence i A has the smallest W - seminorm. The solutions of type W - LESS are computed as follows.
(12.2)
12-1 Solving the system of homogeneous condition equations with unknowns
415
Lemma 12.2 (W - LESS, homogeneous conditions with unknowns: normal equations): An m × 1 vector x A is W-LESS of (12.1) if and only if it solves the system of normal equations ª W Bc 0 º ª i A º ª 0 º « B 0 ǹ » « Ȝ A » = «B y » «¬ 0 A c 0 »¼ « x » « 0 » ¬ A¼ ¬ ¼
(12.3)
with the q × 1 vector OA of “Lagrange multipliers”. x A exists in the case of R (B) R ( W) and is solving the system of normal equations A c(BW Bc) 1 AxA = A c(BW Bc) 1 By .
(12.4) (12.5)
which is independent of the choice of the g - inverse W and unique. x A is unique if and only if the matrix A c( BW Bc) 1 A is regular, and equivalently, if
(12.6)
rk A = m
(12.7)
holds. :Proof: W - LESS is constructed by means of the “Lagrange function”
L (i, x, Ȝ ):= i cWi + 2Ȝ c( Ax + Bi By ) = min . i , x, Ȝ
The necessary conditions for obtaining the minimum are given by the first derivatives 1 wL (i A , x A , Ȝ A ) = Wi A + BcȜ A = 0 2 wi 1 wL (i A , x A , Ȝ A ) = A cȜ A = 0 2 wx 1 wL (i A , x A , Ȝ A ) = Ax A + Bi A By = 0. 2 wȜ Details for obtaining the derivatives of vectors are given in Appendix B. The second derivatives 1 w2L (i A , x A , Ȝ A ) = W t 0 2 wiwic are the sufficient conditions for the minimum due to the matrix W being positive semidefinite. Due to the condition
416
12 The sixth problem of generalized algebraic regression
R ( Bc ) R ( W ) we have WW Bc = Bc. As shown in Appendix A, BW 1B is invariant with respect to the choice of the generalized inverse W . In fact, the matrix BW - Bc is uniquely invertible. Elimination of the vector i A leads us to the system of reduced normal equations. ª BW Bc A º ª Ȝ A º ª By º « »« » = « » 0 ¼ ¬xA ¼ ¬ 0 ¼ ¬ Ac and finally eliminating Ȝ A to A c( BW Bc) 1 Ax A = A c( BW Bc) 1 By,
(12.8)
because of BW W = B there follows the existence of x A . Uniqueness is assured due to the regularity of the matrix A c(BW Bc) 1 A, which is equivalent to rk A = m .
The linear forms x A = Ly , which lead to W - LESS of arbitrary observation vectors y R n because of (12.4), can be characterized by Lemma 12.3. Lemma 12.3 (W - LESS, relation between A and B): Under the condition
R (Bc) R ( W)
is x A = L y W - LESS of (12.1) for all y R n if and only if the matrix L obeys the condition (12.9) L = AB subject to ( BW Bc) 1 AA = [( BW Bc) 1 AA 1 ]c.
(12.10)
In this case, the vector Ax A = AA By is always unique. 12-12
R, W – MINOLESS
R, W - MINOLESS is built on Definition 12.4, Lemma 12.5 and Lemma 12.6. Definition 12.4 (R, W - MINOLESS, homogeneous conditions with unknowns): An m × 1 vector xAm is called R, W - MINOLESS (Minimum NOrm with respect to the R – Seminorm, LEast Squares Solution with respect to the W – Seminorm) of the inconsistent system of linear equations if (12.3) is consistent and x Am is R- MINOS of (12.3).
417
12-1 Solving the system of homogeneous condition equations with unknowns
In case of R (Bc) R ( W) we can compute the solutions of type R, W – MINOLESS as follows. Lemma 12.5 (R, W – MINOLESS, homogeneous conditions with unknowns: normal equations): Under the assumption R (Bc) R ( W) is an m × 1 vector xAm R- MINOLESS of (12.1) if and only if it solves the normal equation ª R A c(BW Bc)-1 A º ª x Am º « A c(BW Bc)-1 A » «Ȝ » = 0 ¬ ¼ ¬ Am ¼ 0 ª º =« » -1 ¬ A c(BW Bc) By ¼
(12.11)
with the m × 1 vector Ȝ Am of “Lagrange – multipliers”. x Am exists always and is uniquely determined, if rk[R, Ac] = m
(12.12)
holds, or equivalently, if the matrix R + A c(BW Bc)-1 A
(12.13)
is regular. The proof of Lemma 12.5 is based on applying Lemma 1.2 on the normal equations (12.5). The rest is based on the identity ªR R + A c(BW Bc)-1 A = [ R , A c] « ¬ 0
º ªR º 0 -1 » « » . c (BW B ) ¼ ¬ A ¼
Obviously, the condition (12.12) is fulfilled if the matrix R is positive definite, or if R describes an R norm. The linear forms x Am = Ly , which lead to the R, W – MINOLESS solutions, are characterized as follows. Lemma 12.6 (R, W – MINOLESS, relation between A und B): Under the assumption R (B') = R ( W ) is x Am = Ly of type R, W – MINOLESS of (12.1) for all y R n if and only if the matrix A B follows the condition L = AB
(12.14)
subject to (BW Bc)-1 AA = [(BW Bc)-1 AA ']'
(12.15)
RA AA = RA
(12.16)
418
12 The sixth problem of generalized algebraic regression
RA A = ( RA A ) ' is fulfilled. In this case
(12.17)
(12.18) Rx Am = RLy is always unique. In the special case that R is positive definite, the matrix L is unique, fulfilling (12.14) - (12.17). :Proof: Earlier we have shown that the representation (BW Bc)-1 AA = [(BW Bc)-1 AA ]'
(12.19)
leads us to L = A B uniquely. The general solution of the system A c(BW Bc)-1 Ax A = A c(BW Bc)-1 By
(12.20)
x A = x Am + (I [ A c(BW B c)-1 A ] A c(BW B c)-1 A ]z
(12.21)
is given by
or x A = x Am + (I LBcA )z
(12.22)
for an arbitrary m × 1 vector z , such that the related R - seminorm follows the inequality || x Am ||2R =|| L y ||R2 d|| L y ||R2 +2y cLcR (I LB A )z + || (I LB A )z ||R2 .
(12.23)
For arbitrary y R n , we have the result that LcR (I LB A) = 0
(12.24)
must be zero! Or (12.25)
RLB R AL = RL
and
RLB R A = (RLB R A )c.
(12.26)
To prove these identities we must multiply from the right by L, namely LcR = LcRLB R A
:
LcRL = LcRLB R AL.
(12.27)
Due to the fact that the left hand side is a symmetric matrix, the right-hand side must have the same property, in detail RLB R A = (RLB R A)c q.e.d . Add L = A B and taking advantage of B R BA = A , we find RA AA = RLB R ALW Bc(BW Bc) 1 = = RLW Bc(BW Bc) 1 = RA and RA A = (RA A)c .
(12.28) (12.29)
419
12-1 Solving the system of homogeneous condition equations with unknowns
Uniqueness of xAm follows automatically. In case that the matrix R is positive definite and, of course, invertible, it is granted that the matrix L = A B is unique! 12-13
R, W – HAPS
R, W – HAPS is alternatively built on Definition 12.7 and Lemma 12.8. Definition 12.7 (R, W - HAPS, homogeneous conditions with unknowns): An m × 1 vector x h with Bi h = By Ax h is called R, W - HAPS (Hybrid APproximate Solution with respect to the combined Rand W- Seminorm if compared to all other vectors x R n of type Bi = By Ax the inequality || i h ||2W + || x h ||2R := i ch Wi h + xch Rx h d d i cWi + xcRx =:|| i ||2W + || x ||2R
(12.30)
holds, in other words if the hybrid risk function || i ||2W + || x ||2R is minimal. The solutions of type R, W – HAPS can be computed by Lemma 12.8
(R, W – HAPS homogeneous conditions with unknowns: normal equations):
An m × 1 vector x h is R, W – HAPS of the Gauß – Helmert model of conditional equations with unknowns if and only if the normal equations ª W B' 0 º ª i h º ª 0 º « B 0 A » « Ȝ h » = « By » «¬ 0 A' R »¼ « x » «¬ 0 »¼ ¬ h¼
(12.31)
with the q × 1 vector Ȝ h of “Lagrange – multpliers” are fulfilled. x A certainly exists in case of R (Bc) R ( W) and is solution of the system of normal equations (R +A c(BW Bc)-1 A)x h = A c(BW Bc)-1 By ,
(12.32) (12.33)
which is independent of the choice of the generalized inverse W uniquely defined. x h is uniquely defined if and only if the matrix (R +A c(BW Bc)-1 A) is regular, equivalently if rk[R, Ac] = m holds.
(12.34)
420
12 The sixth problem of generalized algebraic regression
:Proof: With the “Lagrange function” Ȝ R, W – HAPS is defined by L(i, x, Ȝ ) := i cWi + xcRx + 2Ȝ c(Ax + Bi - By ) = min . i,x,Ȝ
The first derivatives 1 wL (i h , x h , Ȝ h ) = Wi h + BcȜ h = 0 2 wi
(12.35)
1 wL (i h , x h , Ȝ h ) = A cȜ h + Rx h = 0 2 wx
(12.36)
1 wL ( i h , x h , Ȝ h ) = Ax h + Bi h By = 0 2 wȜ
(12.37)
establish the necessary conditions. The second derivatives 1 w 2L (i h , x h , Ȝ h ) = W t 0 2 wiwi c
(12.38)
1 w 2L (i h , x h , Ȝ h ) = R t 0 2 wxwxc
(12.39)
due to the positive definiteness of the matrices W and R a sufficient criterion for the minimum. If in addition R (Bc) R ( W) holds, we are able to reduce i h , namely to device the reduced system of normal equations ª(BW Bc) A º ª Ȝ h º ªBy º « »« » = « » R ¼ ¬ xh ¼ ¬ 0 ¼ ¬ A'
(12.40)
and by reducing Ȝ h , in addition, (R +A c(BW Bc)-1 A)x h = A c(BW B c)-1 By.
(12.41)
Because of the identity ªRR +A c(BW B c)-1 A = [R , A '] « ¬0
º ªR º 0 », -1 » « (BW Bc) ¼ ¬ A ¼
(12.42)
we can assure the existence of our solution x h and, in addition, the equivalence of the regularity of the matrix (R +A c(BW Bc)-1 A) with the condition rk[ R, A c] = m , the basis of the uniqueness of x h .
421
12-2 Examples
12-14
R, W - MINOLESS against R, W - HAPS
Obviously, R, W – HAPS with respect to the model (12.32) is unique if and only if R, W – MINOLESS is unique, because the representations (12.34) and (12.12) are identical. Let us replace (12.11) by the equivalent system (R +A c(BW Bc)-1 A)x Am + A c(BW Bc)-1 AȜ Am = A c(BW Bc)-1 By (12.43) and A c(BW Bc) Ax Am = A c(BW Bc)-1 By
-1
(12.44)
such that the difference x h x Am = (R +A c(BW B c)-1 A)1 A c(BW B c)-1 AȜ Am
(12.45)
follows automatically.
12-2 Examples for the generalized algebraic regression problem: homogeneous conditional equations with unknowns As an example of inconsistent linear equations Ax + Bi = By we treat a height network, consisting of four points whose relative and absolute heights are derived from height difference measurements according to the network graph in Chapter 9. We shall study various optimal criteria of type I-LESS I, I-MINOLESS I, I-HAPS, and R, W-MINOLESS: R positive semidefinite W positive semidefinite. We use constructive details of the theory of generalized inverses according to Appendix A. Throughout we take advantage of holonomic height difference measurements, also called “gravimetric leveling” { hDE := hE hD , hJD := hD hJ , hEG := hG hE , hGJ := hJ hG } within the triangles {PD , PE , PJ } and {PE , PG , PJ }. In each triangle we have the holonomity condition, namely {hDE + hEJ + hJD = 0, (hDE + hEJ = hJD )} {hJE + hEG + hGJ = 0, (hEG + hGJ = hJE )} .
422 12-21
12 The sixth problem of generalized algebraic regression
The first case: I - LESS
In the first example we order four height difference measurements to the system of linear equations ªiDE º ª hDE º « » « » ª 1º ª1 1 0 0º «iJD » ª1 1 0 0º « hJD » «1 » hEJ + « 0 0 1 1 » «i » = « 0 0 1 1 » « h » ¬ ¼ ¬ ¼ « EG » ¬ ¼ « EG » «iGJ » « hGJ » ¬ ¼ ¬ ¼ as an example of homogeneous inconsistent condition equations with unknowns:
ª 1º A := « » , ¬1 ¼
ª hDE º « » « hJD » 1 1 0 0 ª º x := hEJ , B := , y := « » ¬« 0 0 1 1 ¼» h « EG » « hGJ » ¬ ¼
n = 4 , m = 1, rkA = 1, rkB = q = 2 1 A c(BBc)-1 A = 1, A c(BBc)-1 B = [1, 1,1,1] 2 1 xA = (hEJ ) A = ( hDE hJD + hEG + hGJ ) . 2 12-22
The second case: I, I – MINOLESS
In the second example, we solve I, I – MINOLESS for the problem of four height difference measurements associated with the system of linear equations ªiDE º ª hDE º « » « » ª 1 1º ª hE º ª1 1 0 0 º «iJD » ª1 1 0 0 º « hJD » «¬ 1 1 »¼ « h » + « 0 0 1 1 » «i » = « 0 0 1 1 » « h » «¬ J »¼ ¬ ¼ « EG » ¬ ¼ « EG » «iGJ » « hGJ » ¬ ¼ ¬ ¼ as our second example of homogeneous inconsistent condition equations with unknowns: ª hDE º « » ª hE º « hJD » ª 1 1º ª1 1 0 0 º A := « , x := « » , B := « , y := « » h ¬ 1 1 »¼ ¬ 0 0 1 1 »¼ «¬ hJ »¼ « EG » « hGJ » ¬ ¼ n = 4 , m = 2 , rkA = 1, rkB = q = 2 . I, I – LESS solves the system of normal equations
423
12-2 Examples
A c(BBc)-1 Ax A = A c(BBc)-1 By ª 1 1º A c(BBc)-1 A = « » =: DE ¬ 1 1 ¼ ª1 º D = « » , E = [1, 1] . ¬ 1¼ For the matrix of the normal equations A c(BBc)-1 A = DE we did rank factorizing: O(D) = m × r , O(E) = r × m, rkD = rkE = r = 1 [ A c(BBc)-1 A ]' = Ec(EEc) 1 (DcD) 1 Dc = A '(BB ')-1 B =
1 1 ª 1 1º A c(BB c)-1 A = « , 4 4 ¬ 1 1 »¼
1 ª 1 1 1 1º . 2 «¬ 1 1 1 1 »¼
I, I – MINOLESS due to rk[R , A ' ] = 2 leads to the unique solution ª hE º 1 ª hDE + hJD hEG hGJ º x Am = « » = « », ¬« hJ ¼» Am 4 «¬ hDE hJD + hEG + hGJ ¼» which leads to the centric equation (hE )Am + (hJ )Am = 0. 12-23
The third case: I, I - HAPS
Relating to the second design we compute the solution vector x h of type I, I – HAPS by the normal equations, namely I + A c(BBc) 1 A =
1 ª 5 3º ª 5 3º , [I + A c(BBc) 1 A]1 = « 4 «¬3 5»¼ ¬ 3 5 »¼
ª hE º xh = « » = 4 «¬ hJ »¼ h 12-24
ª hDE + hJE hEG hGJ º « ». «¬ hDE hJE + hEG + hGJ »¼
The fourth case: R, W – MINOLESS, R positive semidefinite, W positive semidefinite
This time we refer the characteristic vectors x, y and the matrices A, B to the second design. The weight matrix of inconsistency parameters will be chosen to
424
12 The sixth problem of generalized algebraic regression
ª1 1 «1 W= « 2 0 «0 ¬
1 1 0 0
0 0 1 1
0º 0» 1» 1 »¼
and W = W , such that R (B ') R ( W) holds. The positive semidefinite matrix R = Diag(0,1) has been chosen in such a way that the rank partitioned unknown vector x = [x1c , xc2 ]', O(x1 ) = r × 1, O( x 2 ) = ( m 1) × 1, rkA =: r = 1 relating to the partial solution x 2 = xJ = 0, namely 1 ª 1 1 1 1º ª 1 1º A c(BW Bc)-1 A = « , A c(BW Bc)-1 B = « » 2 ¬ 1 1 1 1 »¼ ¬ 1 1 ¼ and 1 ª 1 1 1 1º ª 1 1º ª x E º «¬ 1 1 »¼ « x » = 2 «¬ 1 1 1 1 »¼ y , «¬ J »¼ Am 1 ( x E ) Am = ( hDE + hJD hEG hGJ ), ( xJ ) Am = 0. 2
12-3 Solving the system of inhomogeneous condition equations with unknowns First, we solve the problem of inhomogeneous condition equations by the method of minimizing the W – seminorm of Least Squares. We review by Definition 12.9 and Lemma 12.10 and Lemma 12.11 the characteristic normal equations and linear form which build up the solution of type W – LESS. Second, we extend the method of W – LESS by R, W – MINOLESS by means of Definition 12.12 and Lemma 12.13 and Lemma 12.14. R, W – MINOLESS stands for Minimum Norm East Squares Solution (R – Seminorm, W – Seminorm of type (LEast Squares). Third, we alternatively present by Definition 12.15 and Lemma 12.16 R, W – HAPS (Hybrid AProximate Solution with respect to the combined R- and W– Seminorm). Fourth, we again compare R, W – MINOLESS and R, W – HAPS by means of computing the difference vector x A x Am . 12-31
W – LESS
W – LESS of our system of inconsistent inhomogeneous condition equations with unknowns Ax + Bi = By - c, By c + R(A) is built on Definition 12.9, Lemma 12.10 and Lemma 12.11. Definition 12.9 (W - LESS , inhomogeneous conditions with unknowns): An m × 1 vector x A is called W - LESS (LEast Squares Solution with respect to the W- seminorm) of the inconsistent system of inhomogeneous linear equations
425
12-3 Solving the system of inhomogeneous condition equations with unknowns
Ax + Bi = By - c
(12.46)
with Bi A := By c Ax A , if compared to alternative vector x R m with Bi := By c Ax the inequality || i A ||2W := i 'A Wi A d i'Wi =:|| i ||2W
(12.47)
holds. As a consequence i A has the smallest W – seminorm. The solutions of the type W- LESS are computed as follows. Lemma 12.10
(W – LESS, inhomogeneous conditions with unknowns: normal equations):
An m × 1 vector x A is W – LESS of (12.46) if and only if it solves the system of normal equation ª W Bc 0 º ª i A º ª 0 º « B 0 A » « Ȝ A » = « By c » « »« » « » ¬ 0 Ac 0 ¼ ¬ xA ¼ ¬ 0 ¼
(12.48)
with the q × 1 vector Ȝ A of “Lagrange – multipliers”. x A exists indeed in case of R (B ') R ( W) and is solving the system of normal equations A c(BW Bc)-1 Ax A = A c(BW Bc)-1 (By c) ,
(12.49) (12.50)
which is independent of the choice of the g – inverse W and unique. x A is unique if and only if the matrix A c(BW Bc)-1 A is regular, or equivalently, if rkA = m
(12.51) (12.52)
holds. In this case the solution can be represented by x A = [ A c(BW Bc)-1 A]1 A c(BW Bc)-1 (By c).
(12.53)
The proof follows the same line-of-thought of (12.3) – (12.7). The linear form x A = L(y d) = Ly A follows the basic definitions and can be characterized by Lemma 12.11. Lemma 12.11 (W – LESS, relation between A and B): Under the condition R (Bc) R ( W)
(12.54)
426
12 The sixth problem of generalized algebraic regression
is x A = Ly 1 W – LESS of (12.46) for y R n if and only if the matrix L and m × 1 vector 1 obey the conditions (12.55)
L = AB
and
1 = Ac
(12.56)
subject to (BW Bc)-1 AA = [(BW B c)-1 AA ]'.
(12.57)
In this case, the vector Ax A = AA (By c) is always unique. The proof is obvious. 12-32
R, W – MINOLESS
R, W – MINOLESS of our system of inconsistent, inhomogeneous condition equations with unknowns Ax + Bi = By c , By c + R(A) is built on Definition 12.12, Lemma 12.13 and Lemma 12.14. (R, W - MINOLESS, inhomogeneous conditions with unknowns): An m × 1 vector xAm is called R, W - MINOLESS (Minimum NOrm with respect to the R – Seminorm LEast Squares Solution with respect to the W – Seminorm) if the inconsistent system of linear equations of (12.46) is inconsistent and x Am R- MINOS of (12.46). Definition 12.12
In case of R (B') R ( W ) we can compute the solutions of type R, W – MINOLESS of (12.46) as follows. Lemma 12.13
(R, W – MINOLESS, inhomogeneous conditions with unknowns: normal equations):
Under the assumption R (Bc) R ( W)
(12.58)
is an m × 1 vector xAm R, W - MINOLESS of (12.46) if and only if it solves the normal equation ª R A c(BW Bc)-1 A º ª x Am º « A c(BW Bc)-1 A » «Ȝ » = 0 ¬ ¼ ¬ Am ¼ 0 ª º =« » -1 ¬ A c(BW Bc) (By c) ¼
(12.59)
with the m × 1 vector Ȝ Am of “Lagrange – multipliers”. x Am exists always and is uniquely determined, if rk[R, A '] = m
(12.60)
427
12-3 Solving the system of inhomogeneous condition equations with unknowns
holds, or equivalently, if the matrix R + A c(BW Bc)-1 A is regular. In this case the solution can be represented by
(12.61)
x Am = [R + A c(BW Bc)-1 A c(BW Bc)-1 A ] × ×{A c(BW Bc)-1 A[R + A c(BW Bc)-1 A ]1 ×
(12.62)
×A c(BW Bc) A} A c(BW Bc) (By c) , which is independent of the choice of the generalized inverse.
-1
-1
The proof follows similar lines as in Chapter 12-5. Instead we present the linear forms x = Ly which lead to the R, W – MINOLESS solutions and which can be characterized as follows. Lemma 12.14 (R, W – MINOLESS, relation between A und B): Under the assumption R (Bc) R ( W) is x Am = Ly of type R, W – MINOLESS of (12.46) for all y R n if and only if the matrix A B follows the condition (12.63)
and 1 = Ac and (BW Bc)-1 AA = [(BW Bc)-1 AA ]c RA AA = RA RA A = (RA A)c are fulfilled. In this case is L = AB
Rx Am = R ( Ly 1)
(12.64) (12.65) (12.66) (12.67) (12.68)
always unique. In the special case that R is positive definite, the matrix L is unique, following (12.59) - (12.62). The proof is similar to Lemma 12.6 if we replace everywhere By by By c . 12-33
R, W – HAPS
R, W – HAPS is alternatively built on Definition 12.15 and Lemma 12.16 for the special case of inconsistent, inhomogeneous conditions equations with unknowns Ax + Bi = By c , By c + R(A) . Definition 12.15
(R, W - HAPS, inhomogeneous conditions with unknowns):
An m × 1 vector x h with Bi h = By c Ax h is called R, W HAPS (Hybrid APproximate Solution with respect to the combined R- and W- Seminorm if compared to all other vectors x R n of type Bi = By c Ax the inequality
428
12 The sixth problem of generalized algebraic regression
|| i h ||2W + || x h ||2R =: i ch Wi h + xch Rx h d d i cWi + xcRx =:|| i ||2W + || x ||2R
(12.69)
holds, in other words if the hybrid risk function || i || + || x || is minimal. 2 W
2 R
The solution of type R, W – HAPS can be computed by Lemma 12.16
(R, W – HAPS inhomogeneous conditions with unknowns: normal equations):
An m × 1 vector x h is R, W – HAPS of the Gauß – Helmert model of inconsistent, inhomogeneous condition equations with unknowns if and only if the normal equations ª W Bc 0 º ª i h º ª 0 º « B 0 A » « Ȝ h » = « By c » « »« » « » ¬ 0 Ac R ¼ ¬ x h ¼ ¬ 0 ¼
(12.70)
with the q × 1 vector Ȝ h of “Lagrange – multpliers” are fulfilled. x A exists certainly in case of R (Bc) R ( W)
(12.71)
and is solution of the system of normal equations [R +A c(BW Bc)-1 A ]x h = A c(BW B c)-1 (By c) , (12.72) which is independent of the choice of the generalized inverse W uniquely defined. x h is uniquely defined if and only if the matrix [R +A c(BW Bc)-1 A] is regular, equivalently if rk[R, A c] = m
(12.73)
holds. In this case the solution can be represented by x h = [R +A c(BW Bc)-1 A]1 A c(BW Bc)-1 (By c).
(12.74)
The proof of Lemma 12.16 follows the lines of Lemma 12.8. 12-34
R, W - MINOLESS against R, W - HAPS
Again we note the relations between R, W-MINOLESS and R, W-HAPS: R, WHAPS is unique because the representations (12.59) and (12.12) are identical. Let us replace (12.59) by the equivalent system (R +A c(BW Bc)-1 A)x Am + A c(BW B c)-1 AȜ Am = A c(BW B c)-1 (By c) (12.75)
-1
A c(BW Bc) x Am
and = A c(BW Bc)-1 (By c) ,
(12.76)
429
12-4 Conditional equations with unknowns
such that the difference x h x Am = [R +A'(BW - B')-1 A ]1 A'(BWB')-1 AȜ Am
(12.77)
follows automatically.
12-4 Conditional equations with unknowns: from the algebraic approach to the stochastic one Let us consider the stochastic portray of the model “condition equations with unknowns”, namely the stochastic Gauß-Helmert model. Consider the model equations AE{x} = BE{y} Ȗ or Aȟ = BȘ Ȗ subject to O( A) = q × m, O(B) = q × n Ȗ = Bį for some į R n rkA = m < rkB = q < n E{x} = ȟ, E{y} = Ș E{x} = x e x , E{y} = y e y ª E{x E{x}} = 0 « 2 ¬ E{(x E{x})( x E{x}) '} = ı x 4 x versus ª E{y E{y}} = 0 « 2 «¬ E{(y E{y})( y E{y}) '} = ı y 4 y . 12-41
Shift to the centre
From the identity Ȗ = Bį to the centre we gain another identity of type AE{x} = BE{y į} = B{y į} Be y = w Be y such that B(y į) =: w w = Aȟ + Be y . 12-42
The condition of unbiased estimators
The unknown ȟˆ = K1 By + A 1 is uniformly unbiased estimable, if and only if º ªȟ = K1BE{y} + l1 = K1 ( Aȟ + Ȗ ) + l1 » or « n for all ȟ R m . E{x} = E{E {x}}¼» ¬
ȟ E{ȟˆ}
430
12 The sixth problem of generalized algebraic regression
ȟˆ is unbiased estimable if and only if A1 = K1J or K1 A = I m . In consequence, K1 = A must be a left generalized inverse. L
n {y} = y IJ = y (K2 By A 2 ) is uniformly unbiased estimable if The unknown E and only if n B{y} = E{E {y}} = (I n K2 B) E{y} + A 2 = E{y} K2 ( Aȟ + Ȗ ) + l2 for all ȟ R m and for all E{y} = R n . n E {y} is unbiased estimable if and only if A 2 = K2 Ȗ or K2 A = 0. 12-43
n {ȟ} The first step: unbiased estimation of ȟˆ and E
n {ȟ} will be presented first. The key lemma of unbiased estimation of ȟˆ and E ȟˆ is unbiased estimable if and only if n {y} is unbiased estimable if and only if ȟˆ = L1 y + A 1 = K1By + A 1 E n E {y} = L 2 y + A 2 = (I n K2 B)y + A 2 or BL 2 = AL1 = AA B since Ȗ + R ( A ) R (B ) = R 9 . 12-44
The second step: unbiased estimation K1 and K2
The bias parameter for K1 and K2 are to be estimated by K1 = [ A '(BQ y B ') 1 A]A '(BQ y B ') 1 K2 = Q 'y B '(BQ y B ') 1 (I q AK1 ) A 1 = K1Ȗ , A 2 = +K2 Ȗ generating BLUUE of E{x} = ȟ and E{y} = Ș , respectively.
13 The nonlinear problem of the 3d datum transformation and the Procrustes Algorithm A special nonlinear problem is the three-dimensional datum transformation solved by the Procrustes Algorithm. A definition of the three-dimensional datum transformation with the coupled unknowns of type dilatation unknown, also called scale factor, translation and rotation unknown follows afterwards. :Fast track reading: Read Definition 13.1, Corollary 13.2-13.4, 6 and Lemma 13.5 and Lemma 13.7
Corollary 13.2: translation partial W - LESS for x 2 Corollary 13.3: scale partial W - LESS for x1 Definition 13.1: ^ 7(3) 3d datum transformation
Corollary 13.4: rotation partial W - LESS for X 3 Theorem 13.5: W – LESS of ( Y '1 = Y2 X '3 x1 + 1x '2 + E) Corollary 13.6: I – LESS of ( Y1 Y2 X '3 x1 + 1x '2 + E)
“The guideline of Chapter 13: definition, corollaries and lemma” Let us specify the parameter space X, namely x1 the dilatation parameter – the scale factor – x1 \
versus
x 2 the column vector of translation parameter x 2 \ 3×1
versus X 3 O + (3) =: {X 3 || \ 3×3 | X 3* X 3 = I 3 and |X 3 |= +1} X 3 is an orthonormal matrix, rotation matrix of three parameters
432
13 The nonlinear problem
which is built on the scalar x1 , the vector x 2 and the matrix X 3 . In addition, by the matrices ª x1 Y1 := «« y1 «¬ z1
x2 ... xn 1 y2 ... yn 1 z2
... zn 1
xn º yn »» \ 3×n zn »¼
and ª X1 Y2 := «« Y1 «¬ Z1
X 2 ... X n 1 Y2 ... Yn 1 Z2
... Z n 1
Xn º Yn »» \ 3×n Z n »¼
we define a left and right three-dimensional coordinate arrays as an ndimensional simplex of observed data. Our aim is to determine the parameters of the 3 - dimensional datum transformation {x1 , x 2 , X 3} out of a nonlinear transformation (conformal group ^ 7 (3) ). x1 \ stands for the dilatation parameter, also called scale factor, x 2 \ 3×1 denotes the column vector of translation parameters, and X 3 O + (3) =: {X 3 \ 3×3 | X '3 X 3 = I 3 , |X 3 |= +1} the orthonormal matrix, also called rotation matrix of three parameters. The key problem is how to determine the parameters for the unknowns of type {x1 , x 2 , X 3} , namely scalar dilatation x1 , vector of translation and matrix of rotation, for instance by weighted least squares. Example 1 (simplex of minimal dimension, n = 4 points tetrahedron): ª x1 Y1 := «« y1 «¬ z1 ª x1 «x « 2 « x3 « ¬« x4
y1 y2 y3 y4
x2 y2 z2
x3 y3 z3
x4 ºc ª X1 » y4 » «« Y1 «¬ Z1 z4 »¼
z1 º ª X 1 Y1 z2 »» «« X 2 Y2 = z3 » « X 3 Y3 » « z4 ¼» ¬« X 4 Y4
X2 Y2 Z2
X3 Y3 Z3
X 4 ºc Y4 »» =: Y Z 4 »¼
Z1 º ª e11 e12 «e » e Z2 » X3 x1 + 1x '2 + « 21 22 «e31 e32 Z3 » « » Z 4 ¼» ¬«e 41 e 42
e13 º e 23 »» . e33 » » e 43 ¼»
Example 2 (W – LESS) We depart from the setup of the pseudo-observation equation given in Example 1 (simplex of minimal dimension, n = 4 points, tetrahedron). For a diagonal
13-1 The 3d datum transformation and the Procrustes Algorithm
433
weight W = Diag( w1 ,..., w4 ) R 4× 4 we compute the Frobenius error matrix W – semi - norm ª w1 0 0 0 º ª e11 e12 e13 º ª e11 e 21 e31 e 41 º « 0 w2 0 0 »» ««e 21 e 22 e 23 »» × || E ||2W := tr(E ' WE) = tr( ««e12 e 22 e32 e 42 »» « )= « 0 0 w3 0 » «e31 e32 e33 » «¬e13 e 23 e33 e 43 »¼ « » » « ¬« 0 0 0 w4 ¼» ¬«e 41 e 42 e 43 ¼» ª e11 e12 e13 º ª e11w1 e 21w2 e31w3 e 41w4 º « e e e » = tr( ««e12 w1 e 22 w2 e32 w3 e 42 w4 »» × « 21 22 23 » ) = «e31 e32 e33 » » ¬«e13 w1 e 23 w2 e33 w3 e 43 w4 ¼» «e ¬« 41 e 42 e 43 ¼» 2 2 2 = w1e11 + w2e 221 + w3e31 + w4e 241 + w1e12 + w2e 222 + 2 2 2 + w3e32 + w4e 242 + w1e13 + w2e 223 + w3e33 + w4e 243 .
Obviously, the coordinate errors (e11 , e12 , e13 ) have the same weight w1 , (e 21 , e 22 , e 23 ) have the same weight w2 , (e31 , e32 , e33 ) have the same weight w3 , and finally (e 41 , e 42 , e 43 ) have the same weight w4 . We may also say that the error weight is pointwise isotropic, weight e11 = weight e12 = weight e13 = w1 etc. However, the error weight is not homogeneous since w1 = weight e11 z weight e 21 = w2 . Of course, an ideal homogeneous and isotropic weight distribution is guaranteed by the criterion w1 = w2 = w3 = w4 .
13-1 The 3d datum transformation and the Procrustes Algorithm First, we present W - LESS for our nonlinear adjustment problem for the unknowns of type scalar, vector and special orthonormal matrix. Second, we review the Procrustes Algorithm for the parameters {x1 , x 2 , X 3} . Definition 13.1: (nonlinear analysis for the three-dimensional datum transformation: the conformal group ^ 7 (3) ): The parameter array {x1A , x 2 A , X 3A } is called W – LESS (LEast Squares Solution with respect to the W – Seminorm) of the inconsistent linear system of equations Y2 Xc3 x1 + 1xc2 + E = Y1
(13.1)
subject to Xc3 X3 = I 3 , | X3 |= +1
(13.2)
434
13 The nonlinear problem
of the field of parameters in comparison with alternative parameter arrays {x1A , x 2 A , X 3A } fulfils the inequality equation || Y1 Y2 Xc3 A x1A 1xc2 A ||2W := := tr(( Y1 Y2 Xc3A x1A 1xc2 A )cW( Y1 Y2 Xc3 A x1A 1xc2 A )) d =: tr ((Y1 Y2 Xc3 x1 1xc2 )cW( Y1 Y2 Xc3 x1 1xc2 )) =: =:|| Y1 Y2 Xc3 x1 1xc2 ||2W
(13.3)
EA := Y1 Y2 Xc3 A x1A 1xc2 A
(13.4)
in other words if
has the least W – seminorm. ? How to compute the three unknowns {x1 , x 2 , X 3} by means of W – LESS ? Here we will outline the computation of the parameter vector by means of partial W – LESS: At first, by means of W – LESS we determine x 2 A , secondly by means of W – LESS x1A , followed by thirdly means of W – LESS X 3 . In total, we outline the Procrustes Algorithm. Step one: x 2 Corollary 13.2 (partial W – LESS for x 2 A ): A 3 × 1 vector x 2A is partial W – LESS of (13.1) subject to (13.2) if and only if x 2A fulfils the system of normal equations 1cW1x 2 A = ( Y1 Y2 Xc3 x1 )cW1.
(13.5)
x 2A always exists and is represented by x 2 A = (1cW1)-1 (Y1 - Y2 Xc3 x1 )cW1 .
(13.6)
For the special case W = I n the translated parameter vector x 2 A is given by 1 x 2A = (Y1 Y2 Xc3 x1 )c1. (13.7) n For the proof, we shall first minimize the risk function (Y1 Y2 Xc3 x1 1xc2 )c(Y1 Y2 Xc3 x1 1xc2 ) = min x2
with respect to x 2 ! :Detailed Proof of Corollary 13.2: W – LESS is constructed by the unconstrained Lagrangean
435
13-1 The 3d datum transformation and the Procrustes Algorithm
L( x1 , x 2 , X 3 ) := =
1 || E ||2W =|| Y1 Y2 X '3 x1 1x ' 2 ||2W = 2
1 tr( Y1 Y2 X '3 x1 1x '2 ) ' W( Y1 Y2 X '3 x1 1x '2 ) = 2 =
min
x1 t 0, x 2 R 3×1 , X 3 ' X 3 = I 3
wL ( x 2A ) = (1'W1)x 2 ( Y1 Y2 X '3 x1 ) ' W1 = 0 wx ,2 constitutes the first necessary condition. Basics of the vector-valued differentials are found in E. Grafarend and B. Schaffrin (1993, pp. 439-451). As soon as we backward substitute the translational parameter x 2A , we are led to the centralized Lagrangean L( x1 , X 3 ) =
1 tr{[ Y1 ( Y2 X '3 x1 + (1'W1) 111 ' W( Y1 Y2 X '3 x1 ))]' W * 2
*[Y1 ( Y2 X '3 x1 + (1'W1) 1 11 ' W( Y1 Y2 X '3 x1 ))]} L( x1 , X 3 ) =
1 tr{[(I (1'W1) 111 ') W( Y1 Y2 X '3 x1 )]' W * 2
*[( I (1'W1) 1 11 ') W( Y1 Y2 X '3 x1 )]} 1 C := I n 11' 2 being a definition of the centering matrix, namely for W = I n C := I n (1 ' W1) 1 11'W
(13.8)
being in general symmetric. Substituting the centering matrix into the reduced Lagrangean L( x1 , X 3 ) , we gain the centralized Lagrangean L( x1 , X3 ) =
1 tr{[ Y1 Y2 X '3 x1 ]' C'WC[ Y1 Y2 X '3 x1 ]}. 2
(13.9)
Step two: x1 Corollary 13.3 (partial W – LESS for x1A ): A scalar x1A is partial W – LESS of (13.1) subject to (13.3) if and only if x1A =
tr Y '1 C'WCY2 X '3 tr Y '2 C'WCY2
(13.10)
436
13 The nonlinear problem
holds. For the special case W = I n the real parameter is given by x1A =
tr Y '1 C'CY2 X '3 . tr Y '2 C'CY2
(13.11)
The general condition is subject to C := I n (1'W1) 1 11'W.
(13.12)
:Detailed Proof of Corollary 13.3: For the proof we shall newly minimize the risk function 1 L( x1 , X 3 ) = tr{[ Y1 Y2 X '3 x1 ]' C'WC[ Y1 Y2 X '3 x1 ]} = min x 2 subject to 1
X '3 X 3 = I 3 . wL ( x1A ) = x1A tr X 3Y '2 C ' WCY2 X '3 tr Y '1 C ' WCY2 X '3 = 0 wx1 constitutes the second necessary condition. Due to tr X 3Y '2 C ' WCY2 X '3 = tr Y '2 C ' WCY2 X '3 X 3 = Y ' 2 C ' WCY2 lead us to x1A . While the forward computation of (wL / wx1 )( x1A ) = 0 enjoyed a representation of the optimal scale parameter x1A , its backward substitution into the Lagrangean L( x1 , X 3 ) amounts to L( X 3 ) = tr{[ Y1 Y2 X '3
L( X 3 ) =
tr Y '1 C ' WCY2 X '3 tr Y '1 C ' WCY2 X '3 ]C ' WC *[ Y1 Y2 X ' 3 ]} tr Y '2 C ' WCY2 tr Y '2 C ' WCY2
tr Y '1 C ' WCY2 X '3 1 tr{( Y '1 C ' WCY1 ) tr( Y '1 C ' WCY2 X '3 ) * 2 tr Y '2 C ' WCY2 tr( X 3Y '2 C'WCY1 )
tr Y '1 C ' WCY2 X '3 + tr Y '2 C ' WCY2
+ tr( X 3Y '2 C'WCY2 X '3 )
[tr Y '1 C ' WCY2 X '3 ]2 [tr Y '2 C ' WCY2 ]2
L( X 3 ) =
[tr Y '1 C ' WCY2 X '3 ]2 1 [tr Y '1 C ' WCY2 X '3 ]2 1 tr( Y '1 C ' WCY1 ) + 2 [tr Y '2 C ' WCY2 ] 2 [tr Y '2 C ' WCY2 ]
L( X3 ) =
1 1 [tr Y '1 C ' WCY2 X '3 ]2 tr( Y '1 C ' WCY1 ) = min . X ' X =I 2 2 [tr Y '2 C ' WCY2 ] 3
3
3
Third, we are left with the proof for the Corollary 13.4, namely X 3 .
(13.13)
437
13-1 The 3d datum transformation and the Procrustes Algorithm
Step three: X 3 Corollary 13.4 (partial W – LESS for X 3A ): A 3 × 1 orthonormal matrix X 3 is partial W – LESS of (13.1) subject to (13.3) if and only if X 3A = UV '
(13.14)
holds where A := Y '1 C'WCY2 = UȈ s V ' is a singular value decomposition with respect to a left orthonormal matrix U, U'U = I 3 , a right orthonormal matrix V, VV' = I 3 , and Ȉ s = Diag(V 1 , V 2 , V 3 ) a diagonal matrix of singular values (V 1 , V 2 , V 3 ). The singular values are the canonical coordinates of the right eigenspace ( A'A Ȉ2s I) V = 0. The left eigenspace is based upon U = AVȈs 1 . :Detailed Proof of Corollary 13.4: The form L( X 3 ) subject to X '3 X 3 = I 3 is minimal if tr(Y '1 C ' WCY2 X '3 ) =
max
x1 t 0, X '3 X3 = I3
.
Let A := Y '1 C ' WCY2 = UȈV ' , a singular value decomposition with respect to a left orthonormal matrix U, U'U = I 3 , a right orthonormal matrix V, VV' = I 3 and Ȉ s = Diag (V 1 , V 2 , V 3 ) a diagonal matrix of singular values (V 1 , V 2 , V 3 ). Then 3
3
i =1
i =1
tr( AX '3 ) = tr( UȈ s V ' X '3 ) = tr( Ȉ s V ' X 3U ) = ¦ V i rii d ¦ V i holds, since R = V'X '3 U = [ rij ] R 3×3
(13.15) 3
is orthonormal with || rii ||d 1 . The identity tr ( AX '3 ) = ¦ V i applies, if i =1
V'X '3 U = I3 , i.e. X '3 = VU ', X 3 = UV ', namely, if tr( AX '3 ) is maximal 3
tr( AX '3 ) = max tr AX '3 = ¦V i R = V'X '3 U = I 3 . X '3 X 3 = I 3
(13.16)
i =1
An alternative proof of Corollary 13.4 based on formal differentiation of traces and determinants has been given by P.H. Schönemann (1966) and P.H. Schönemann and R.M. Carroll (1970). Finally, we collect our sequential results in Theorem 13.5 identifying the stationary point of W – LESS specialized for W = I in Corollary 13.5. The highlight is the Procrustes Algorithm we review in Table 13.1.
438
13 The nonlinear problem
Theorem 13.5 (W – LESS of Y '1 = Y2 X '3 x1 + 1x '2 + E ) (i) The parameter array {x1 , x 2 , X 3} is W – LESS if x1A =
tr Y '1 C ' WCY2 X '3A tr Y '2 C ' WCY2
(13.17)
x 2 A = (1'W1) 1 ( Y1 Y2 X '3A x1A ) ' W1
(13.18)
X 3 = UV '
(13.19)
subject to the singular value decomposition of the general 3 × 3 matrix (13.20) Y '1 C ' WCY2 = U Diag(V 1 , V 2 , V 3 )V ' namely [( Y '1 C ' WCY2 ) '( Y '1 C ' WCY2 ) V i I] v i = 0
(13.21)
V = [v1 , v 2 ,v 3 ], VV' = I 3
(13.22)
U = Y '1 C ' WCY2 V Diag(V 11 , V 21 , V 31 ),
(13.23) (13.24)
U'U = I3 and as well as the centering matrix C := I n (1'W1) 1 11'W.
(13.25)
(ii) The empirical error matrix of type W- LESS accounts for EA = [I n 11'W(1 ' W1) 1 ]( Y1 Y2 VU '
tr Y '1 C ' WCY2 VU ' ) tr Y '2 C ' WCY2
(13.26)
with the related Frobenius matrix W – seminorm || EA ||2W = tr( E 'A WEA ) = tr{( Y1 Y2 VU '
tr Y '1 C ' WCY2 VU ' )' * tr Y '2 C ' WCY2
*[I n 11'W(1'W1) 1 ]' W[I n 11'W(1'W1) 1 ]* *( Y1 Y2 VU '
tr Y '1 C ' WCY2 VU ' )} tr Y '2 C ' WCY2
(13.27)
and the representative scalar measure of the error of type W - LESS || EA ||W = tr(E 'A WEA ) / 3n .
(13.28)
A special result is obtained if we specialize Theorem 13.5 to the case W = I n :
439
13-1 The 3d datum transformation and the Procrustes Algorithm
Corollary 13.6 (I – LESS of Y '1 = Y2 X '3 x1 + 1x '2 + E ): (i)
The parameter array {x1 , x 2 , X 3} is Y '1 = Y2 X '3 x1 + 1x '2 + E if x1A =
I – LESS of
tr Y '1 CY2 X '3A tr Y '2 CY2
(13.29)
1 ( Y1 Y2 X '3A x1A ) ' 1 (13.30) n (13.31) X 3A = UV ' subject to the singular value decomposition of the general 3 × 3 matrix (13.32) Y1 ' CY2 = U Diag(V 1 , V 2 , V 3 )V ' namely x 2A =
[(Y '1 CY2 )'(Y '1 CY2 )-V i2 ]Iv i = 0, i {1,2,3}, V = [v1 , v 2 ,v 3 ], VV' = I 3 (13.33)
U = Y '1 CY2 V Diag(V 11 , V 21 , V 31 ), UU' = I 3 and as well as the centering matrix 1 C := I n 11'. n (ii) The empirical error matrix of type I- LESS accounts for tr Y '1 C ' Y2 VU ' 1 EA = [I n 11']( Y1 Y2 VU ' ) n tr Y '2 CY2
(13.34)
(13.35)
(13.36)
with the related Frobenius matrix W – seminorm tr Y '1 CY2 VU ' )' * tr Y '2 CY2 tr Y '1 CY2 VU ' 1 *[I n 11']( Y1 Y2 VU ' )} n tr Y '2 CY2
|| E ||I2 = tr( E 'A EA ) = tr{( Y1 Y2 VU '
(13.37)
and the representative scalar measure of the error of type I - LESS || EA ||I = tr(E 'A EA ) / 3n .
(13.38)
In the proof of Corollary 13.6 we only sketch the result that the matrix I n (1/ n)11' is idempotent: 1 1 2 1 (I n 11c)(I n 11c) = I n 11c + 2 (11') 2 n n n n 2 1 1 = I n 11c + 2 n11c = I n 11c. n n n
440
13 The nonlinear problem
As a summary of the various steps of Corollary 2-4, 5 and Theorem 5, Table 13.1 presents us the celebrated Procrustes Algorithm, which is followed by one short und interesting Citation about “Procrustes”. Following Table 13.1, we present the celebrated Procrustes Algorithm which is a summary of the various steps of Corollary 2-4,5 and Theorem 5. Table 13.1: Procrustes Algorithm ª x1 y1 z1 º ª X 1 Y1 Z1 º « » # # # # » = Y2 and « # Step 1: Read Y1 = # « » « » «¬ xn yn zn »¼ «¬ X n Yn Z n ¼» 1 Step 2: Compute: Y '1 CY2 subject to C := I n 11 ' n Step 3: Compute: SVD Y '1 CY2 = UDiag (V 1 , V 2 , V 3 ) V ' 3-1
| ( Y '1 CY2 ) '( Y '1 CY2 ) V i2 I |= 0 (V 1 , V 2 , V 3 )
(( Y '1 CY2 ) '( Y '1 CY2 ) V i2 I)v i = 0, i {1, 2, 3} V = [ v1 , v 2 ,v 3 ] right eigenvectors (right eigencolumns) 3-3 U = Y '1 CY2 VDiag (V 11 , V 21 , V 31 ) left eigenvectors (left eigencolumns) Step 4: Compute: X 3A = UV ' rotation 3-2
Step 5: Step 6: Step 7:
tr Y '1 CY2 X '3 (dilatation) tr Y '2 CY2 1 Compute: x 2 A = ( Y1 Y2 X '3 x1 ) ' 1 (translation) n tr Y '1 CY2 VU ' ) (error matrix) Compute: EA = C(Y1 Y2 VU ' tr Y '2 CY2
Compute: x1A =
‘optional control’ EA := Y1 ( Y2 X '3 x1A + 1x '2 A ) Step 8:
Compute: || EA ||I := tr( E 'A EA ) (error matrix)
Step 9:
Compute: || EA ||I := tr( EcA EA ) / 3n (mean error matrix)
Procrustes (the subduer), son of Poseidon, kept an inn benefiting from what he claimed to be a wonderful all-fitting bed. He lopped of excessive limbage from tall guests and either flattened short guests by hammering or stretched them by racking. The victim fitted the bed perfectly but, regrettably, died. To exclude the embarrassment of an initially exact-fitting guest, variants of the legend allow Procrustes two, different-sized beds. Ultimately, in a crackdown on robbers and monsters, the young Theseus fitted Procrustes to his own bed.
441
13-2 The variance - covariance matrix of the error matrix E
13-2 The variance - covariance matrix of the error matrix E By Lemma 13.7 we review the variance - covariance matrix, namely the vector valued form of the transposed error matrix, as a function of Ȉ vecY , Ȉ vecY and the covariance matrix Ȉ vecY ' , ( I
x X ) vecȈ . 1
2
n
1
1
3
Y '2
Lemma 13.7: (Variance – covariance “error propagation”): Let vecE ' be the vector valued form of the transposed error matrix E := Y1 Y2 X '3 x1 1x '2 . Then Ȉ vecE ' = Ȉ vecY ' + (I n
x1 X3 ) ȈvecY ' (I n
x1 X3 ) ' 2ȈvecY ' , ( I 1
2
1
n
x1 X 3 ) vec Y '2
(13.39)
is the exact representation of the dispersion matrix (variance – covariance matrix) ȈvecE' of vec E ' in terms of dispersion matrices (variance – covariance matrices) ȈvecY ' and ȈvecY ' of the two coordinates sets vec Y '1 and vec Y '2 as well as their covariance matrix ȈvecY ' , ( I
X ) vecY ' . 1
1
n
3
2
2
The proof follows directly from “error propagation”. Obviously the variance – covariance matrix of ȈvecE' can be decomposed in the variance – covariance matrix ȈvecY ' , the product ( I n
x1X 3 ) ȈvecY ' (I n
X 3 ) ' using prior information of x1 and X 3 and the covariance matrix ȈvecY ' , ( I
x X ) vecY ' again using prior information of x1 and X3 . 1
2
1
n
1
3
2
13-3 Case studies: The 3d datum transformation and the Procrustes Algorithm By Table 13.1 and Table 13.2 we present two sets of coordinates, first for the local system A, second for the global system B, also called “World Geodetic System 84”. The units are in meter. The results of I – LESS, Procrustes Algorithm are listed in Table 13.3, especially || EA ||I := tr( E 'A EA ), ||| EA |||I := tr( E ' A EA ) / 3n and W – LESS, Procrustes Algorithm in Table 13.4, specially || EA ||W := tr( E 'A WEA ), |||EA |||W := tr( E ' A WEA ) / 3n completed by Table 13.5 of residuals from the Linearized Least Squares and by Table 13.6 listing the weight matrix.
442
13 The nonlinear problem
Discussion By means of the Procrustes Algorithm which is based upon W – LESS with respect to Frobenius matrix W – norm we have succeeded to solve the normal equations of Corollary 13.2 and 13.5 (necessary conditions) of the matrix – valued “error equations” vec E ' = vec Y '1 (I n
x1X3 ) vec Y '2 vec x 2 1 ' subject to X '3 X 3 = I 3 , | X 3 |= +1 . The scalar – valued unknown x1 R represented dilatation (scale factor), the vector – valued unknown x 2 R 3×1 the translation vector, and the matrix valued unknown X 3 SO (3) the orthonormal matrix. The conditions of sufficiency, namely the Hesse matrix of second derivatives, of the Lagrangean L( x1 , x 2 , X 3 ) are not discussed here. They are given in the Procrustes references. In order to present you with a proper choice of the isotropic weight matrix W, we introduced the corresponding “random regression model” E{vec E '} = E{vec Y '1} (I n
x1X3 )E{vec Y '2 } vec x 2 1 ' = 0 first moment identity, D{vec E '} = D{vec Y '1 } (I n
x1 X3 ) D{vec Y '2 }(I n
x1 X3 ) ' 2C{vec Y '1 , (I n
x1 X3 ) vec Y '2 }, second central moment identity. Table 13.2. Coordinates for system A (local system) Station name Solitude Buoch Zeil Hohenneuffen Kuehlenberg Ex Mergelaec Ex Hof Asperg Ex Kaisersbach
X(m)
Y(m)
Z(m)
positional error sphere
4 157 222.543 4 149 043.336 4 172 803.511 4 177 148.376 4 137 012.190 4 146 292.729 4 138 759.902
664 789.307 688 836.443 690 340.078 642 997.635 671 808.029 666 952.887 702 670.738
4 774 952.099 4 778 632.188 4 758 129.701 4 760 764.800 4 791 128.215 4 783 859.856 4 785 552.196
0.1433 0.1551 0.1503 0.1400 0.1459 0.1469 0.1220
Table 13.3. Coordinates for system B (WGS 84) Station name Solitude Buoch Zeil Hohenneuffen Kuehlenberg Ex Mergelaec Ex Hof Asperg Ex Kaisersbach
X(m)
Y(m)
Z(m)
4 157 870.237 4 149 691.049 4 173 451.354 4 177 796.064 4 137 659.549 4 146 940.228 4 139 407.506
664 818.678 688 865.785 690 369.375 643 026.700 671 837.337 666 982.151 702 700.227
4 775 416.524 4 779 096.588 4 758 594.075 4 761 228.899 4 791 592.531 4 784 324.099 4 786 016.645
positional error sphere 0.0103 0.0038 0.0006 0.0114 0.0068 0.0002 0.0041
13-3 Case studies: The 3d datum transformation and the Procrustes Algorithm
443
Table 13.4. Results of the I-LESS Procrustes transformation Rotation matrix
X 3 \3 x 3 Translation
x 2 \ 3 x1 (m) Scale x1 \ Residual matrix
E(m)
Error matrix norm (m)
Values 0.999999999979023 -4.33275933098276e- 6 4.81462518486797e-6 -4.8146461589238e-6 0.999999999976693 -4.84085332591588e-6 4.33273602401529e-6 4.84087418647916e-6 0.999999999978896 641.8804 68.6553 416.3982 1.00000558251985 Site Solitud Buoch Zeil Hohenneuffen Kuelenberg Ex Mergelaec Ex Hof Asperg Ex Keisersbach
X(m)
Y(m)
Z(m)
0.0940 0.0588 -0.0399 0.0202 -0.0919 -0.0118 -0.0294
0.1351 -0.0497 -0.0879 -0.0220 0.0139 0.0065 0.0041
0.1402 0.0137 -0.0081 -0.0874 -0.0055 -0.0546 0.0017
0.2890
|| EA ||I := tr(EcA EA )
Mean error matrix norm (m)
0.0631
||| EA |||I := tr(EcA EA ) / 3n
Table 13.5. Results of the W-LESS Procrustes transformation Rotation matrix
X 3 \3 x 3 Translation
x 2 \ 3 x1 (m) Scale x1 \ Residual matrix E(m)
Values 0.999999999979141 4.77975830372179e-6 -4.34410139438235e-6 -4.77977931759299e-6 0.999999999976877 -4.83729276438971e-6 4.34407827309968e-6 4.83731352815542e-6 0.999999999978865 641.8377 68.4743 416.2159 1.00000561120732 Site Solitude Buoch Zeil Hohenneuffen Kuelenberg Ex Mergelaec Ex Hof Asperg Ex Keisersbach
Error matrix norm (m) ||| EA |||W := tr( E*A WEA )
Mean error matrix norm (m) ||| EA |||W := tr( E WEA ) / 3n * A
0.4268
0.0930
X(m) 0.0948 0.0608 -0.0388 0.0195 -0.0900 -0.0105 -0.0266
Y(m) 0.1352 -0.0500 -0.0891 -0.0219 0.0144 0.0067 0.0036
Z(m) 0.1407 0.0143 -0.0072 -0.0868 -0.0052 -0.0542 0.0022
444
13 The nonlinear problem
Table 13.6. Residuals from the linearized LS solution Site Solitude Buoch Zeil Hohenneuffen Kuelenberg Ex Mergelaec Ex Hof Asperg Ex Keisersbach
X(m) 0.0940 0.0588 -0.0399 0.0202 -0.0919 -0.0118 -0.0294
Y(m) 0.1351 -0.0497 -0.0879 -0.0220 0.0139 0.0065 0.0041
Z(m) 0.1402 0.0137 -0.0081 -0.0874 -0.0055 -0.0546 -0.0017
Table 13.7. Weight matrix 1.8110817 0 0 0 0 0 0
0 0 0 2.1843373 0 0 0 2.1145291 0 0 0 1.9918578 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 2.6288452 0 0 0 2.1642460 0 0 0 2.359370
13-4 References Here is a list of important references: Awange LJ (1999), Awange LJ (2002), Awange LJ, Grafarend E (2001 a, b, c), Awange LJ, Grafarend E (2002), Bernhardt T (2000), Bingham C, Chang T, Richards D (1992), Borg I, Groenen P (1997), Brokken FB (1983), Chang T, Ko DJ (1995), Chu MT, Driessel R (1990), Chu MT, Trendafilov NT (1998), Crosilla F (1983a, b), Dryden IL (1998), Francesco D, Mathien PP, Senechal D (1997), Golub GH (1987), Goodall C (1991), Gower JC (1975), Grafarend E and Awange LJ (2000, 2003), Grafarend E, Schaffrin B (1993), Grafarend E, Knickmeyer EH, Schaffrin B (1982), Green B (1952), GullikssonM(1995a, b), Kent JT, Mardia KV (1997), Koch KR (2001), Krarup T (1979), Lenzmann E, Lenzmann L (2001a, b), Mardia K (1978), Mathar R (1997), Mathias R (1993), Mooijaart A, Commandeur JJF (1990), Preparata FP, Shamos MI (1985), Reinking J (2001), Schönemann PH (1966), Schönemann PH, Carroll RM (1970), Schottenloher M (1997), Ten Berge JMF (1977), Teunissen PJG (1988), Trefethen LN, Bau D (1997) and Voigt C (1998).
14 The seventh problem of generalized algebraic regression revisited: The Grand Linear Model: The split level model of conditional equations with unknowns (general Gauss-Helmert model) The reaction of one man can be forecast by no known mathematics; the reaction of a billion is something else again. Isaac Asimov :Fast track reading: Read only Lemma 14.1, Lemma 14.2, Lemma 14.3
Lemma 14.10: W - LESS
Lemma 14.1: W - LESS
Lemma 14.13:R, W - MINOLESS
Lemma 14.2: R, W - MINOLESS
14-11
Lemma 14.16: R, W - Lemma 14.3: R, W - HAPS relation between A und B HAPS “The guideline of Chapter 14: three lemmas”
The inconsistent, inhomogeneous system of linear equations Ax + Bi = By c we treated before will be specialized for arbitrary condition equations between the observation vector y on the one side and the other side. We assume in addition that those condition equations which do not contain the observation vector y are consistent. The n × 1 vector i of inconsistency is specialized to B(y - i ) c + R ( A) . The first equation: B1i = B1y c1 The first condition equation is specialized to contain only conditions acting on the observation vector y, namely as an inconsistent equation B1i = B1y c1.
446
14 The seventh problem of generalized algebraic regression
There are many examples for such a model. As a holonomity condition there is said, for instance, that the “true observations” fulfill an equation of type 0 = B1E{y} c1 . Example Let there be a given two connected triangular networks of type height difference measurements which we already presented in Chapter 9-3, namely for c1 := 0 and {hDE + hEJ + hJD = 0} {hJE + hEG + hJE = 0} ª1 1 1 0 0 º B1 = « , y := ª¬ hDE , hEJ , hJD , hEJ , hGE º¼c . ¬ 0 1 0 1 1 »¼ The second equation: A 2 x + B 2 i = B 2 y c 2 , c 2 R (B 2 ) The second condition equation with unknowns is assumed to be the general model which is characterized by the inconsistent, inhomogeneous system of linear equations, namely A 2 x + B 2 i = B 2 y c 2 , c 2 R (B 2 ). Examples have been given earlier. The third equation: A 3 x = c3 , c3 R ( A 3 ) The third condition equation is specialized to contain only a restriction acting on the unknown vector x in the sense of a fixed constraint, namely A 3 x = c3 or A 3 x + c3 = 0, c3 R ( A 3 ). We refer to our old example of fixing a triangular network in the plane whose position coordinates are derived from distance measurements and fixed by a datum constraint. The other linear model of type Chapters 1, 3, 5, 9 and 12 can be considered as special cases. Lemma 14.1 refers to the solution of type W LESS, Lemma 14.2 of type R, W – MINOLESS and Lemma 14.3 of type R, W – HAPS.
14-1 Solutions of type W-LESS The solutions of our model equation Ax + Bi = By c of type W – LESS can be characterized by Lemma 14.1. Lemma 14.1 (The Grand Linear Model, W - LESS): The m × 1 vector xl is W – LESS of Ax + Bi = By c if and only if
447
14-1 Solutions of type W-LESS
ª W Bc 0 º ª i l º ª 0 º « B 0 A » « Ȝ l » = « By c » « 0 Ac 0 » « x » « 0 » ¬ ¼¬ l¼ ¬ ¼
(14.1)
with the q3 × 1 vector Ȝ l of “Lagrange multipliers”. x l exists in the case of R (Bc) R ( W) and is solution of the system of normal equations ª A c2 (B 2 W W1W Bc2 ) 1 A 2 A 3 º ª xl º = « 0 »¼ «¬ Ȝ 3 »¼ A2 ¬ ª A c (B W W1W Bc2 ) 1 (B 2 W W1y k 2 ) º =« 2 2 » c 3 ¬ ¼
(14.2)
with W1 := W B1c ( B1 W B1c ) 1 B1
(14.3)
k 2 := c 2 B 2 W B1c ( B1 W B1c ) c1
(14.4)
1
and the q3 × 1 vector Ȝ 3 of “Lagrange multipliers” which are independent of the choice of the g-inverse W uniquely determined if Ax l is uniquely determined. x l is unique if the matrix N := A c2 ( B 2 W W1 W Bc2 ) 1 A 2 + A c3 A 3 is regular or equivalently if
(14.5)
rk[ A c2 , A c3 ] = rk A = m.
(14.6)
In this case, x l has the representation xl = N 1 N 3 N 1 A c2 (B 2 W W1 k 2 ) N 1 Ac3 ( A 3 N 1 Ac3 ) c3
(14.7)
with N 3 := N A c3 ( A 3 N 1A c3 ) A 3
(14.8)
independent of the choice of the g-inverse ( A 3 N 1 A c3 ) . :Proof: W – LESS will be constructed by means of the “Lagrange function”
L ( i, x, Ȝ ) := icWi + 2Ȝ c( Ax + Bi By + c ) = min i, x, Ȝ
for which the first derivatives wL (i l , xl , Ȝ l ) = 2( Wi l + BcȜ l ) = 0 wi
448
14 The seventh problem of generalized algebraic regression
wL (i l , xl , Ȝ l ) = 2 A cȜ l = 0 wx wL (i l , xl , Ȝ l ) = 2( Axl + Bi l By + c) = 0 wȜ are necessary conditions. Note the theory of vector derivatives is summarized in Appendix B. The second derivatives w2L (i l , x l , Ȝ l ) = 2W t 0 wiwic constitute due to the positive-semidefiniteness of the matrix W the sufficiency condition. In addition, due to the identity WW Bc = Bc and the invariance of BW Bc with respect to the choice of the g-inverse such that with the matrices BW Bc and B1 W B1c the “Schur complements” B 2 W Bc2 B 2 W B1c (B1 W B1c ) 1 B1 W Bc2 = B 2 W W1 W Bc2 is uniquely invertible. Once if the vector ( q1 + q2 + q3 ) × 1 vector Ȝ l is partitioned with respect to Ȝ cl := [Ȝ1c , Ȝ c2 , Ȝ c3 ] with O(Ȝ i ) = q1 × 1 for all i = 1, 2, 3, then by eliminating of i l we arrive at the reduced system of normal equations ª B1W B1c B1W Bc2 0 0 º ª Ȝ1 º ª B1y c1 º « B W B c B W B c 0 A » « Ȝ » « B y c » 2 1 2 2 2» « 2 « 2»= « 2 » Ȝ c 0 0 0 A 3 « 3»« 3» « » 0 A c2 A c3 0 »¼ «¬ Ȝ l »¼ «¬ 0 »¼ ¬«
(14.9)
and by further eliminating Ȝ1 and Ȝ 2 1
ª B W Bc B1 W Bc2 º 1 1 » = A c(B 2 W W1 W Bc2 ) [B 2 W B1c (B1 W B1c ) , I] c B W B ¬ 2 1 2 2¼
[ 0, Ac2 ] «B1 W B1c
leads with c1 \ q = R (B1 ), c 2 \ q = R (B 2 ) and c3 R ( A 3 ) to the existence of x l . An equivalent system of equations is 1
2
ª N A c3 º ª xl º ª A c2 (B 2 W W1W Bc2 ) 1 (B 2 W W1y k 2 ) Ac3c3 º » «¬ A 3 0 »¼ «¬ Ȝ 3 »¼ = « c3 ¬ ¼ subject to N := A c2 ( B 2 W W1 W Bc2 ) 1 A 2 + A c3 A 3 ,
which we can solve for c3 = A 3 xl = A 3 N Nxl = A 3 N Ac2 (B 2 W W1 W Bc2 ) 1 (B 2 W W1 y k 2 ) A3 N Ac3 (c3 + Ȝ 3 ) for an arbitrary g-inverse N . In addition, we solve further for a g-inverse ( A 3 N A c3 )
449
14-2 Solutions of type R, W-MINOLESS
A c3 (c3 + Ȝ 3 ) = A c3 ( A 3 N A c3 ) A c3 N Ac3 (c3 Ȝ 3 ) = = A c3 ( A 3 N A c3 ) A 3 N Ac2 (B 2 W W1 W Bc2 ) 1 (B 2 W W1 y k 2 ) + Ac3 ( A 3 N Ac3 ) c3 subject to Nxl = A c2 (B 2 W W1 W Bc2 ) 1 (B 2 W W1 y k 2 ) Ac3 (c3 + Ȝ 3 ) = [N A c3 ( A 3 N A c3 ) A 3 ]N A c2 (B 2 W W1 W Bc2 ) 1 (B 2 W W1 y k 2 ) A c3 ( A 3 N A c3 ) c3 . With the identity ª(B W W1W Bc2 ) 1 0 º ª A 2 º N = [ A c2 , A c3 ] « 2 0 I »¼ «¬ A 3 »¼ ¬
(14.10)
we recognize that with Nx l also Ax l = AN Nx l is independent of the g-inverse N and is always uniquely determinable. x l is unique if and only if N is regular. We summarize our results by specializing the matrices A and B and the vector c and find x l of type (14.7) and (14.8).
14-2 Solutions of type R, W-MINOLESS The solutions of our model equation Ax + Bi = By c of type R, W – MINOLESS are characterized by Lemma 14.2. Lemma 14.2 (The Grand Linear Model, R, W – MINOLESS): Under the assumption R (Bc) R ( W)
(14.11)
is the vector x lm R, W – MINOLESS of Ax + Bi = By c if and only if the system of normal equations 0 ª º ª R N º ª xlm º « 1 » c c N N A ( B W W W B ) ( B W W y k ) 2 2 1 2 2 1 2 «¬ N 0 »¼ «¬ Ȝ lm »¼ « 3 » A c3 ( A 3 N A c3 ) c3 ¬« ¼»
(14.12)
with the m × 1 vector Ȝ lm of “Lagrange multipliers” subject to (14.13) W1 := W B1 (B1 W B1c )-1 B1 , k 2 := c 2 B 2 W B1c (B1 W B1c )-1 c1
(14.14)
(14.15) N := A c2 (B 2 W W1 W Bc2 ) A 2 + A c3 A 3 , N 3 := N A c3 ( A 3 N Ac3 ) A 3 . (14.16)
-1
All definitions are independent of the choice of the g – inverse N and ( A 3 N A c3 ) . xAm exists always if and only if the matrix (14.17) R+N is regular, or equivalently if (14.18) rk[R, A] = m holds.
450
14 The seventh problem of generalized algebraic regression
:Proof: The proof follows the line of Lemma 12.13 if we refer to the reduced system of normal equations (12.62). The rest is subject to the identity (12.59) ªR « 0 R + N = [R , A c] « « 0 « ¬« 0
0 0 0º » I 0 0» ªR º « ». 0 (B 2 W W1W Bc2 )1 0 » ¬ A ¼ » 0 0 I ¼»
(14.19)
It is obvious that the condition (14.18) is fulfilled if the matrix R is positive definite and consequently if it is describing a R - norm. In specifying the matrices A and B and the vector c, we receive a system of normal equations of type (14.12)-(14.18).
14-3 Solutions of type R, W-HAPS The solutions of our model equation Ax + Bi = By c of type R, W – HAPS will be characterized by Lemma 14.3. Lemma 14.3 (The Grand Linear Model, R, W - HAPS): An m × 1 vector x h is R, W – HAPS of Ax + Bi = By c if it solves the system of normal equations ª W Bc 0 º ª i h º ª 0 º « B 0 A » « Ȝ h » = « By - c » « 0 Ac R » « x » « 0 » ¬ ¼¬ h¼ ¬ ¼
(14.20)
with the q × 1 vector Ȝ A of “Lagrange multipliers”. x h exists if R (Bc) R ( W) and if it solves the system of normal equations ª R + A c2 (B 2 W W1W Bc2 ) 1 A 2 A c3 º ª x h º = « A3 0 »¼ «¬ Ȝ 3 »¼ ¬ ª A c (B W W1W Bc2 ) 1 (B 2 W W1y k 2 ) º =« 2 2 » c3 ¬ ¼
(14.21)
(14.22)
with (14.23) W1 := W B1c ( B1 W B1c ) 1 B1 , k 2 := c 2 B 2 W B1c ( B1 W B1c ) 1 c1 (14.24) and the q3 × 1 vector Ȝ 3 of “Lagrange multipliers” which are defined independent of the choice of the g – inverse W and in such a way that both Ax h and Rx A are uniquely determined. x h is unique if and only if the matrix
451
14-3 Solutions of type R, W-HAPS
R+N
(14.25)
with N := A c2 ( B 2 W W1 W Bc2 ) 1 A 2 + A c3 A 3
(14.26)
being regular or equivalently, if rk[R , A ] = m.
(14.27)
In this case, x h can be represented by x h = (R + N ) 1{R + N A c3 [ A 3 (R + N) 1 A c3 ] A 3 × ×(R + N ) 1 A c2 (B 2 W W1 W Bc2 ) 1 ×
(14.28)
×(B 2 W W1 y k 2 ) ( R + N) Ac3 [ A 3 ( R + N) A c3 ) c3 ,
1
1
which is independent of the choice of the g - inverse [ A 3 ( R + N) 1 A c3 ] . :Proof: R, W – HAPS will be constructed by means of the “Lagrange function”
L ( i, x, Ȝ ) := icWi + x cRx + 2 Ȝ c( Ax + Bi By + c) = min i, x, Ȝ
for which the first derivatives wL ( i h , x h , Ȝ h ) = 2( Wi h + BcȜ h ) = 0 wi wL ( i h , x h , Ȝ h ) = 2( A cȜ h + Rx h ) = 0 wx wL ( i h , x h , Ȝ h ) = 2( Ax h + Bi h By + c ) = 0 wȜ are necessary conditions. Note the theory of vector derivatives is summarized in Appendix B. The second derivatives w 2L (i h , x h , Ȝ h ) = 2 W t 0 wiwi c w 2L ( i h , x h , Ȝ h ) = 2R t 0 wxwx c constitute due to the positive-semidefiniteness of the matrices W and R a sufficiency condition for obtaining a minimum. Because of the condition (14.21) R (Bc) R ( W) we are able to reduce first the vector i A in order to be left with the system of normal equations.
452
14 The seventh problem of generalized algebraic regression
ª -B1W B1c -B1 W B c2 0 0 º ª Ȝ1 º ª B1y - c1 º «-B W Bc -B W B c 0 A » « Ȝ » «B y - c » 2 1 2 2 3» « 2 « 2»= « 2 » 0 0 0 A 3 » « Ȝ 3 » « -c3 » « 0 A c3 A c3 R ¼» «¬ x h »¼ «¬ 0 ¼» ¬«
(14.29)
is produced by partitioning the ( q1 + q2 + q3 ) × 1 vector due to Ȝ ch = [Ȝ1c , Ȝ c2 , Ȝ c3 ] and 0( Ȝ i ) = qi for i = 1, 2, 3 . Because of BW Bc and B1 W B1c with respect to the “Schur complement”, B 2 W Bc2 B 2 W B1c (B1 W B1c ) 1 B1 W Bc2 = B 2 W W1 W Bc2 is uniquely invertible leading to a further elimination of Ȝ1 and Ȝ 2 because of 1
ª B W Bc B1W Bc2 º 1 1 [0, A c2 ] « 1 1 » = A c2 (B 2 W W1 W Bc2 ) [ B 2 W B1c (B1W B1c ) , I ] c c B W B B W B ¬ 2 1 2 2¼ and ª R + N A c3 º ª x h º ª A c2 (B 2 W W1 W Bc2 )1 (B 2 W W1 y k 2 ) A c3 c3 º =« ». « A 0 »¼ «¬ Ȝ 3 »¼ ¬ c3 ¬ 3 ¼ For any g – inverse ( R + N ) there holds c3 = A 3 x h = A 3 (R + N) 1 (R + N)x h = A 3 (R + N)[ A c2 (B 2 W W1 W Bc2 )-1 (B 2 W W1 y k 2 ) A 3 (c3 + Ȝ 3 )] and for an arbitrary g – inverse [ A 3 ( R + N) A c3 ] (R + N)x h = A c2 (B 2 W W1W Bc2 )-1 (B 2 W W1y k 2 ) A c3 (c3 = Ȝ 3 ) = = {R + N A c3 [ A 3 (R + N) A c3 ] A c3 }(R + N) (B 2 W B1 W Bc2 )-1 × ×(B 2 W W1y k 2 ) A3c [ A3 ( R + N) Ac3 ] c3 A c3 (c3 = Ȝ 3 ) = A c3 [ A 3 (R + N) A 3 ](R + N) A c3 (c3 = Ȝ 3 ) = = A c3 [ A 3 ( R + N) A c3 ] A 3 (R + N) 1 A c2 (B 2 W W1 W Bc2 )-1 (B 2 W W1 y k 2 ) + + A c3 [ A 3 (R + N) A c3 ] c3 . (14.30) Thanks to the identity ªR « R + N = [R, A c2 , A c3 ] « 0 « 0 ¬ it is obvious that
0 0º ª R º » 1 (B 2 W W1W B c2 ) 0 » «« A 2 »» 0 I »¼ «¬ A 3 »¼
453
14-4 Review of the various models: the sixth problem
(i)
the solution x h exists always and
(ii)
is unique when the matrix ( R + N ) is regular which coincide with (14.28).
Under the condition R (Bc) R ( W) , R, W – HAPS is unique if R, W – MINOLESS is unique. Indeed the forms for R, W – HAPS and R, W – MINOLESS are identical in this case. A special form, namely (R + N)x Am + NȜ Am = = N 3 N A c2 (B 2 W W1 W Bc2 )-1 (B 2 W W1y k 2 ) A c3 ( A 3 N Ac3 ] c3
Nx Am = = N 3 N Ac2 (B 2 W W1W Bc2 ) (B 2 W W1y k 2 ) A c3 ( A 3 N A c3 ] c3 ,
1
(14.31)
(14.32)
leads us to the representation xh xAm = (R + N)1 NȜ Am + +(R + N)1{R + N Ac3 ( A3 (R + N)1 Ac3 ) A3 ](R + N)1 N3 N- }Ac2 (B2 W W1 W B'2 )-1 ×
(14.33)
×(B2 W W1y k 2 ) + (R + N)1 Ac3 {(A3 N1 Ac3 ) [A3 (R + N)1 Ac3 ] }c3 .
14-4 Review of the various models: the sixth problem Table 14.1 gives finally a review of the various models of type “split level”. Table 14.1 (Special cases of the general linear model of type conditional equations with unknowns (general Gauss-Helmert model)): B1i = B1y - c1 ª0 º ª B1 º ª B1 º ª c1 º A 2 x + B 2 i = B 2 y - c 2 « A 2 » x + «B 2 » i = « B 2 » y «c 2 » « » « » « » « » A 3 x = -c31 ¬ A3 ¼ ¬0¼ ¬0¼ ¬ c3 ¼ Ax = y Ax + i = y AX = By Ax + Bi = By Bi = By Ax = By - c Ax + Bi = By - c Bi = By - c y R(A) y R (A) By R(A) By R(A) By c + R(A) By c + R(A) A2 = A A3 = 0 B1 = 0 B2 = I (i = 0) c1 = 0 c2 = 0 c3 = 0
A2 = A A3 = 0 B1 = 0 B2 = I c1 = 0 c2 = 0 c3 = 0
A2 = A A3 = 0 B1 = 0 B2 = B (i = 0) c1 = 0 c2 = 0 c3 = 0
A2 = A A3 = 0 B1 = 0 B2 = B
A2 = 0 A3 = 0 B1 = 0 B2 = 0
c1 = 0 c2 = 0 c3 = 0
c1 = 0 c2 = 0 c3 = 0
A2 = A A3 = 0 B1 = 0 B2 = B (i = 0) c1 = 0 c2 = c c3 = 0
A2 = A A3 = 0 B1 = 0 B2 = B
A2 = 0 A3 = 0 B1 = B B2 = 0
c1 = 0 c2 = c c3 = 0
c1 = c c2 = 0 c3 = 0
454
14 The seventh problem of generalized algebraic regression
Example 14.1 As an example of a partitioned general linear system of equations, of type Ax + Bi = By c we treat a planar triangle whose coordinates consist of three distance measurements under a datum condition. As approximate coordinates for the three points we choose xD = 3 / 2, yD = 1/ 2, xE = 3, yE = 1, xJ = 3 / 2, yJ = 3 / 2 such that the linearized observation equation can be represented as A 2 x = y (B 2 = I, c 2 = 0, y R (A 2 )) ª 3 / 2 1/ 2 « A2 = « 0 1 « 0 0 ¬
0 º » 1 ». 3 / 2 1/ 2 3 / 2 1/ 2 »¼
3/2 0
1/ 2 0
0 0
The number of three degrees of freedom of the network rotation are fixed by three conditions: ª1 0 0 0 0 A 3x = c 3 (c 3 R(A 3 ) ), A 3 = «0 1 0 0 0 « «¬0 0 0 0 1
of type translation and 0º ª 0, 01º » 0 , c 3 = «0, 02 » , » « » «¬ 0, 01»¼ 0»¼
especially with the rank and order conditions rkA 3 = m rkA 2 = 3, O(A 2 ) = n × m=3 × 6, O(A 3 ) = ( m rkA 2 ) × m=3 × 6, and the 6 × 6 matrix [ A c2 , A c3 ]c is of full rank. Choose the observation vector ªA2 º ª y º y = [104 ,5 × 104 , 4 × 104 ]c and find the solution x = « » « » , in detail: ¬ A 3 ¼ ¬ c 3 ¼ x = [0.01, 0.02, 0.00968, 0.02075, 0.01, 0.02050]c.
15 Special problems of algebraic regression and stochastic estimation: multivariate Gauss-Markov model, the n-way classification model, dynamical systems Up to now, we have only considered an “univariate Gauss-Markov model”. Its generalization towards a multivariate Gauss-Markov model will be given in Chapter 15.1. At first, we define a multivariate linear model by Definition 15.1 by giving its first and second order moments. Its algebraic counterpart via multivariate LESS is subject of Definition 15.2. Lemma 15.3 characterizes the multivariate LESS solution. Its multivariate Gauss-Markov counterpart is given by Theorem 15.4. In case we have constraints in addition, we define by Definition 15.5 what we mean by “multivariate Gauss-Markov model with constraints”. The complete solution by means of “multivariate Gauss-Markov model with constraints” is given by Theorem 15.5. In contrast, by means of a MINOLESS solution we present the celebrated “n-way classification model”. Examples are given for a 1-way classification model, for a 2-way classification model without interaction, for a 2-way classification model with interaction with all numerical details for computing the reflexive, symmetric generalized inverse ( A cA ) rs . The higher classification with interaction is finally reviewed. We especially deal with the problem how to compute a basis of unbiased estimable quantities from biased solutions. Finally, we take account of the fact that in addition to observational models, we have dynamical system equations. Additionally, we therefore review the Kalman Filter (Kalman - Bucy Filter). Two examples from tracking a satellite orbit and from statistical quality control are given. In detail, we define the stochastic process of type ARMA and ARIMA. A short introduction on “dynamical system theory” is presented. By two examples we illustrate the notions of “a steerable state” and of “observability”. A careful review of the conditions “steerability” by Lemma 15.7 and “observability” by Lemma 15.8 is presented. Traditionally the state differential equation as well as the observational equation are solved by a typical Laplace transformation which we will review shortly. At the end, we focus on the modern theory of dynamic nonlinear models and comment on the theory of chaotic behaviour as its up-to date counterpart.
15-1 The multivariate Gauss-Markov model – a special problem of probabilistic regression – Let us introduce the multivariate Gauss-Markov model as a special problem of the probabilistic regression. If for one matrix A of dimension O( A) = n × m in a Gauss-Markov model instead of one vector of observations several observation vectors y i of dimension O ( y i ) = n × p with identical variance-covariance matrix Ȉij are given and the fixed array of parameters ȟ i has to be determined, the model is referred to as a
456
15 Special problems of algebraic regression and stochastic estimation
Multivariate Gauss-Markov model. The standard Gauss-Markov model is then called a univariate Gauss-Markov model. The analysis of variance-covariance is applied afterwards to a multivariate model if the effect of factors can be referred to not only by one characteristic of the phenomenon to be observed, but by several characteristics. Indeed this is the multivariate analysis of variance-covariance. For instance, the effects of different regions on the effect of a species of animals are to be investigated, the weight of the animals can serve as one characteristic and the height of the animals as a second one. Multivariate models can also be setup, if observations are repeated at different times, in order to record temporal changes of a phenomenon. If measurements in order to detect temporal changes of manmade constructions are repeated with identical variance-covariance matrices under the same observational program, the matrix A of coefficients in the Gauss-Markov model stays the same for each repetition and each repeated measurement corresponds to one characteristic. Definition 15.1 (multivariate Gauss-Markov model): Let the matrix A of the order n × m be given, called the first order design matrix, let ȟ i denote the matrix of the order m × p of fixed unknown parameters, and let y i be the matrix of the order n × p called the matrix of observations subject to p d n . Then we speak of a “multivariate Gauss-Markov model” if (15.1)
E{y i } = Aȟ i
(15.2)
D{y i , y j } = I nG ij
ª O{y i } = n × p subject to «« O{A} = n × m «¬O{ȟ i } = m × p subject to O{G ij } = p × p and p.d.
for all i, j {1,..., p} apply for a second order statistics. Equivalent vector and matrix forms are (15.3)
E{Y} = AȄ and E{vec Y } = (I p
A) vec Ȅ
(15.5)
D{vec Y} = Ȉ
I n and d {vec Y} = d ( Ȉ
I n ) (15.6)
(15.4)
subject to O{Ȉ} = p × p, O{Ȉ
I} = np × np, O{vec Y} = np × 1, O{Y} = n × p O{D{vec Y}} = np × np, O{d (vec Y)} = np( np + 1) / 2. The matrix D{vec Y} builds up the second order design matrix as the Kronecker-Zehfuss product Ȉ and I n .
457
15-1 The multivariate Gauss-Markov model
In the multivariate Gauss-Markov model both the matrices ȟ i and V ij or ; and Ȉ are unknown. An algebraic equivalent of the multivariate linear model would read as given by Definition 15.2. Definition 15.2 (multivariate linear model): Let the matrix A of the order n × m be given, called first order algebraic design matrix, let x i denote the matrix of the order of fixed unknown parameter, and y i be the matrix of order n × p called the matrix of observations subject to p d n . Then we speak of an algebraic multivariate linear model if p
¦ || y
i
Axi ||G2 = min ~ || vec Y (I p
A) vec X ||G2 y
i =1
xi
vec Y
= min
(15.7)
establishing a G y or G vecY -weighted least squares solution of type multivariate LESS. It is a standard solution of type multivariate LESS if ª O{X} = m × p X = ( A cG y A) A cG y Y « O{A} = n × m « ¬O{Y} = n × p ,
(15.8)
which nicely demonstrates that the multivariate LESS solution is built on a series of univariate LESS solutions. If the matrix A is regular in the sense of rk( A cG y A ) = rk A = m , our multivariate solution reads X = ( A cG y A ) 1 A cG y Y ,
(15.9)
excluding any rank deficiency caused by a datum problem. Such a result may be initiated by fixing a datum parameter of type translation (3 parameters at any epoch), rotation (3 parameters at any epoch) and scale (1 parameter at any epoch). These parameters make up the seven parameter conformal group C7 (3) at any epoch in a three-dimensional Euclidian space (pseudo-Euclidian space). Lemma 15.3 (general multivariate linear model): A general multivariate linear model is multivariate LESS if p, p
¦ (y
Ax i )Gij ( y j Ax j ) = min
i
(15.10)
x
i , j =1
or n m, m p , p
¦ ¦ ¦ ( yD D E J =1
,
i
aDE xE i )G ij ( yD j aDJ xJ j ) = min
(15.11)
x
i, j
or (vec Y (I p
A) vec X)c(I n
G y (vec Y (I p
A) vec X) = min . (15.12) x
An array X , dim X = m × p is multivariate LESS, if
458
15 Special problems of algebraic regression and stochastic estimation
vec X = [( I p
A )c( I n
G Y ) ( I p
A )]1 ( I p
A ) c( I n
G Y ) vec Y (15.13) and rk(I n
G Y ) = np.
(15.14)
Thanks to the weight matrix Gij the multivariate least squares solution (15.3) differs from the special univariate model (15.9). The analogue to the general LESS model (15.10)-(15.12) of type multivariate BLUUE is given next. Theorem 15.4 (multivariate Gauss-Markov model of type ȟ i , in particular ( Ȉ, I n ) - BLUUE): A multivariate Gauss-Markov model is ( Ȉ, I n ) - BLUUE if the vector vecȄ of an array Ȅ, dim Ȅ = n × p , dim(vec Ȅ) = np × 1 of unknowns is estimated by the matrix ȟ i , namely ˆ = [(I
A)c( Ȉ
I ) 1 (I
A)]1 (I
A)c( Ȉ
I ) 1 vec Y (15.15) vec Ȅ p n p p n subject to rk( Ȉ
I n ) 1 = np .
(15.16)
Ȉ ~ V ij denotes the variance-covariance matrix of multivariate effects yDi for all D = 1,..., n and i = 1,..., p . An unbiased estimator of the variance-covariance matrix of multivariate effects is 1 ª 2 ˆ ˆ « i = j : Vˆ i = n q (y i Aȟ i )c(y i Aȟ i ) « «i z j : Vˆ = 1 (y Aȟˆ )c(y Aȟˆ ) ij i i j i «¬ nq
(15.17)
because of E{(y i Aȟˆ i )c(y j Aȟˆ j )} = E{y ci ( I A( AcA) 1 A c) y j } = V ij ( n q) . (15.18) A nice example is given in K.R. Koch (1988 pp. 281-286). For practical applications we need the incomplete multivariate models which do not allow a full rank matrix V ij . For instance, in the standard multivariate model, it is assumed that the matrix A of coefficients has to be identical for p vectors y i and the vectors y i have to be completely given. If due to a change in the observational program in the case of repeated measurements or due to a loss of measurements, these assumptions are not fulfilled, an incomplete multivariate model results. If all the matrices of coefficients are different, but if p vectors y i of observations agree with their dimension, the variance-covariance matrix Ȉ and the vectors ȟ i of first order parameters can be iteratively estimated.
459
15-1 The multivariate Gauss-Markov model
For example, if the parameters of first order, namely ȟ i , and the parameters of second order, namely V ij , the elements of the variance-covariance matrix, are unknown, we may use the hybrid estimation of first and second order parameters of type {ȟ i , V ij } as outlined in Chapter 3, namely Helmert type simultaneous estimation of {ȟ i , V ij } (B. Schaffrin 1983, p.101). An important generalization of the standard multivariate Gauss-Markov model taking into account constraints, for instance caused by rank definitions, e.g. the datum problem at r epochs, is the multivariate Gauss-Markov model with constraints which we will treat at the end. Definition 15.5 (multivariate Gauss-Markov model with constraints): If in a multivariate model (15.1) and (15.2) the vectors ȟ i of parameters of first order are subject to constraints Hȟ i = w i , (15.19) where H denotes the r × m matrix of known coefficients with the restriction (15.20) H ( A cA) A cA = H, rk H = r d m and w i known r × 1 vectors, then E{y i } = Aȟ i ,
(15.21) (15.22)
D{y i , y j } = I nV ij subject to Hȟ i = w i (15.23) is called “the multivariate Gauss-Markov model with linear homogeneous constraints”. If the p vectors w i are collected in the r × p matrix W , dim W = r × p, the corresponding matrix model reads E{Y} = A;, D{vec Y} = Ȉ
I n , HȄ = W
(15.24)
subject to O{Ȉ} = p × p, O{Ȉ
I n } = np × np, O{vec Y} = np × 1, O{Y} = n × p O{D{vec Y}} = np × np, O{H} = r × m, O{Ȅ} = m × p, O{W} = r × p.
(15.25)
The vector forms E{vec Y} = (I p
A) vec Ȅ, '{vec Y} = Ȉ
I n , vec W = (I p
H) vec Ȅ are equivalent to the matrix forms.
460
15 Special problems of algebraic regression and stochastic estimation
A key result is Lemma 15.6 in which we solve for a given multivariate weight matrix G ij - being equivalent to ( Ȉ
I n ) 1 - a multivariate LESS problem. Theorem 15.6 (multivariate Gauss-Markov model with constraints): A multivariate Gauss-Markov model with linear homogeneous constraints is ( Ȉ, I n ) BLUUE if ˆ = (I
( AcA) Ac) vec Y + Y(I
( AcA) H c( H( AcA) H c) 1 ) vec Y vec Ȅ p p (I p
( A cA) H c) 1 H ( A cA) A c) vec Y (15.26) or ˆ = ( A cA) ( A cY + H c(H ( A cA) H c) 1 ( W H( A cA) A cY)) Ȅ (15.27) An unbiased estimation of the variance-covariance matrix Ȉ is 1 = ˆ Y)c( AȄ ˆ Y) + Ȉ {( AȄ (15.28) nm+r ˆ W)c( H( AcA) Ac) 1 ( HȄ ˆ W)}. + (HȄ
15-2 n-way classification models Another special model is called n-way classification model. We will define it and show how to solve its basic equations. Namely, we begin with the 1-way classification and to continue with the 2- and 3-way classification models. A specific feature of any classification model is the nature of the specific unknown vector which is either zero or one. The methods to solve the normal equation vary: In one approach, one assumption is that the unknown vector of zeros or ones is a fixed effect. The corresponding normal equations are solved by standard MINOLESS, weighted or not. Alternatively, one assumes that the parameter vector consists of random effects. Methods of variance-covariance component estimation are applied. Here we only follow a MINOLESS approach, weighted or not. The interested reader of the alternative technique of variance-covariance component estimation is referred to our Chapter 3 or to the literature, for instance H. Ahrens and J. Laeuter (1974) or S.R. Searle (1971), my favorite. 15-21
A first example: 1-way classification
A one-way classification model is defined by (15.30)
yij = E{ yij } + eij = P + Di + eij
(15.29)
y c := [y1c y c2 ...y cp 1 y cp ], xc := [ P D1 D 2 ...D p 1 D p ]
(15.31)
where the parameters P and Di are unknown. It is characteristic for the model that the coefficients of the unknowns are either one or zero. A MINOLESS
461
15-2 n-way classification models
(Minimum Norms LEast Squares Solution) for the unknown parameters { P , Di } is based on || y - Ax ||I2 = min and || x ||I2 = min x
x
we built around a numerical example. Numerical example: 1-way classification Here we will investigate data concerning the investment on consumer durables of people with different levels of education. Assuming that investment is measured by an index number, namely supposing that available data consist of values of this index for 7 people: Table 15.1 illustrates a very small example, but adequate for our purposes. Table 15.1 (investment indices of seven people): Level of education
number of people
Indices
Total
1 (High School incomplete)
3
74, 68, 77
219
2 (High School graduate) 3 (College graduate) Total
2 2 7
76, 80 85,93
156 178 553
A suitable model for these data is yij = P + Di + eij ,
(15.32)
where yij is investment index of the jth person in the ith education level, P is a general mean, Di is the effect on investment of the ith level of education and eij is the random error term peculiar to yij . For the data of Table 15.1 there are 3 educational levels and i takes the values j = 1, 2,..., ni 1, ni where ni is the number of observations in the ith educational level, in our case n1 = 3, n2 = 2 and n3 = 2 in Table 15.1. Our model is the model for the 1-way classification. In general, the groupings such as educational levels are called classes and in our model yij as the response and levels of education as the classes, this is a model we can apply to many situations. The normal equations arise from writing the data of Table 15.1 in terms of our model equation. ª 74 º ª y11 º ª P + D1 + e11 º « 68 » « y12 » « P + D1 + e12 » «77 » « y13 » « P + D1 + e13 » « 76 » = « y21 » = « P + D 2 + e21 » , O (y ) = 7 × 1 «80 » « y22 » « P + D 2 + e22 » «85 » « y31 » « P + D 3 + e31 » «¬ 93 »¼ « y » « P + D + e » 3 32 ¼ ¬ 32 ¼ ¬
462
15 Special problems of algebraic regression and stochastic estimation
or ª 74 º ª1 « 68 » «1 « » « «77 » «1 « 76 » = y = «1 «80 » «1 «85 » «1 « » « «¬ 93 »¼ «¬1 ª1 «1 «1 A = «1 «1 «1 «¬1
1 1 1 0 0 0 0
0 0 0 1 1 0 0
1 1 1 0 0 0 0
0 0 0 1 1 0 0
0º ª e11 º «e » 0» » ª P º « 12 » 0 » « » « e13 » D 0 » « 1 » + « e21 » = Ax + e y D 0 » « 2 » « e22 » «¬D 3 »¼ « » » 1» « e31 » 1 »¼ «¬ e32 »¼ and
0º 0» ªP º 0» «D » 0 » , x = «D 1 » , O ( A) = 7 × 4, O( x) = 4 ×1 2 » 0 «¬D 3 »¼ 1» 1 »¼
with y being the vector of observations and e y the vector of corresponding error terms. As an inconsistent linear equation y e y = Ax, O{y} = 7 × 1, O{A} = 7 × 4, O{x} = 4 × 1 we pose the key question: ?What is the rank of the design matrix A? Most notable, the first column is 1n and the sum of the other three columns is also one, namely c 2 + c 3 + c 4 = 1n ! Indeed, we have a proof for a linear dependence: c1 = c 2 + c 3 + c 4 . The rank rk A = 3 is only three which differs from O{A} = 7 × 4. We have to build in this rank deficiency. For example, we could postulate the condition x4 = D 3 = 0 eliminating one component of the unknown vector. A more reasonable approach would be based on the computation of the symmetric reflexive generalized inverse such that xlm = ( A cA) rs A cy ,
(15.33)
which would guarantee a least squares minimum norm solution or a V, SBLUMBE solution (Best Linear V-Norm Uniformly Minimum Bias S-Norm Estimation) for V=I, S=I and rk A = rk A cA = rk( A cA) rs = rk A + A cA is a symmetric matrix ( A cA ) rs is a symmetric matrix or called :rank preserving identity: !symmetry preserving identity!
(15.34)
463
15-2 n-way classification models
We intend to compute xlm for our example. Table 15.2: 1-way classification, example: normal equation ª1 1 0 0 º «1 1 0 0 » ª1 1 1 1 1 1 1 º «1 1 0 0 » ª7 3 2 2 º «1 1 1 0 0 0 0 » « «3 3 0 0» A cA = « 1 0 1 0» = « 0 0 0 1 1 0 0» « 2 0 2 0» » «0 0 0 0 0 1 1 » «1 0 1 0 » « 2 0 0 2 » ¬ ¼ 1 0 0 1 ¬ ¼ « » ¬«1 0 0 1 ¼» A cA = DE, O{D} = 4 × 3, O{E} = 3 × 4 ª7 «3 DcD = « 2 «2 ¬
3 3 0 0
2º 0» , E to be determined: 2» » 0¼
DcA cA = DcDE ( DcD) 1 DcAcA = E ª7 3 2 2º DcD = « 3 3 0 0 » «¬ 2 0 2 0 »¼
ª7 «3 «2 «2 ¬
3 3 0 0
2º 66 30 18º 0 » ª« = 30 18 6 » 2» « 18 6 8 »¼ » ¬ 0¼
:compute (DcD) 1 and (DcD) 1 Dc : ª1 0 0 1 º E = (DcD) 1 DcA cA = «0 1 0 1» «¬0 0 1 1 »¼ ( A cA) rs = Ec(EEc) 1 (DcD) 1 Dc = 0 ª 0.0833 « 0 0.2500 =« 0.0417 -0.1250 « ¬« 0.0417 -0.1250
0.0417 -0.1250 0.3333 -0.1667
0.0417 º -0.1250 » -0.1667 » » 0.3333 ¼»
( A cA) rs A c = ª 0.0833 « 0.2500 «-0.0833 « «¬-0.0833
0.0833 0.2500 -0.0833 -0.0833
0.0833 0.2500 -0.0833 -0.0833
0.1250 -0.1250 0.3750 -0.1250
0.1250 -0.1250 0.3750 -0.1250
ª 60.0 º «13.0 » xlm = ( AcA) Acy = « ». «18.0 » ¬« 29.0 ¼» rs
0.1250 -0.1250 -0.1250 0.3750
0.1250 º -0.1250 » -0.1250 » » 0.3750 »¼
464
15 Special problems of algebraic regression and stochastic estimation
Summary The general formulation of our 1-way classification problem is generated by identifying the vector of responses as well as the vector of parameters: Table 15.3: 1-way classification y c := [y11 , y12 ...y1( n 1) y1n | y 21y 22 ...y 2( n 1
1
2
1)
y 2 n | ... | y p1y p 2 ...y p ( n 2
p
1)
y ( pn ) ] p
xc := [ P D1 D 2 ...D p 1 D p ] ª1 «" «1 «1 A := «" 1 «" «1 «" ¬« 1
1 " 1 0 " 0 " 0 " 0
0
0º "» 0» 0 "»» , O( A) = n × ( p + 1) 0 "» 1» "» 1 ¼»
0 1 1 0 0
p
n = n1 + n2 + ... + n p = ¦ ni
(15.35)
i =1
experimental design: number of rank of the number of observations parameters: design matrix: n = n1 + n2 + … + n p
1+ p
1 + ( p 1) = p
(15.36)
:MINOLESS: (15.37)
15-22
|| y - Ax || = min and || x ||2 = min
(15.38)
xlm = ( A cA) rs A cy.
(15.39)
2
x
A second example: 2-way classification without interaction
A two-way classification model without interaction is defined by “MINOLESS” yijk = E{ yijk } + eijk = P + D i + E j + eijk
(15.40)
c , y c21 ,..., y cp 1 1 , y cp1 , y12 c , y c22 ,..., y cp q 1 , y cpq ] y c := [y11 xc = [ P ,D1 ,..., D p , E1 ,..., E q ] (15.42)
|| y - Ax ||I2 = min x
and
(15.41)
|| x ||2 = min . x
(15.43)
The factor A appears in p levels and the factor B in g levels. If nij denotes the number of observations under the influence of the ith level of the factor A and the jth level of the factor B, then the results of the experiment can be condensed
465
15-2 n-way classification models
in Table 15.4. If Di and E y denote the effects of the factors A and B, P the mean of all observations, we receive
P + D i + E j = E{y ijk } for all i {1,..., p}, j {1,..., q}, k {1,..., nij } (15.44) as our model equation. Table 15.4 (level of factors): level of the factor B
1
2
…
q
level of factor A
1
n11
n12
…
n1q
2 … p
n21 … np1
n22 … np2
n2q … npq
If nij = 0 for at least one pair{i, j} , then our experimental design is called incomplete. An experimental design for which nij is equal of all pairs {ij} , is said to be balanced. The data of Table 15.5 describe such a general model of y ijk observations in the ith row (brand of stove) and jth column (make of the pan), P is the mean, Di is the effect of the ith row, E j is the effect of the jth column, and eijk is the error term. Outside the context of rows and columns Di is equivalently the effect due to the ith level of the D factor and E j is the effect due to the jth level of the E factor. In general, we have p levels of the D factor with i = 1,..., p and q levels of the E factor with j = 1,..., q : in our example p = 4 and q = 3 . Table 15.5 (number of seconds beyond 3 minutes, taken to boil 2 quarts of water): Make of Pan number of A B C total mean observations Brand of Stove
X Y Z W
Total number of observations mean
18 — 3 6 27
12 — — 3 15
24 9 15 18 66
3
2
4
9
1 2
7
54 9 18 27 108
3 3 3 3
18 18 18 18
16 12
With balanced data every one of the pq cells in Table 15.5 would have one (or n) observations and n d 1 would be the only symbol needed to describe the number of observations in each cell. In our Table 15.5 some cells have zero observations and some have one. We therefore need nij as the number of observations in
466
15 Special problems of algebraic regression and stochastic estimation
the ith row and jth column. Then all nij = 0 or 1, and the number of observations are the values of q
p
p
q
ni = ¦ nij , n j = ¦ nij , n = ¦¦ nij . j =1
i =1
(15.45)
i =1 j =1
Corresponding totals and means of the observations are shown, too. For the observations in Table 15.5 the linear equations of the model are given as follows, ª18 º ª y11 º ª «12 » « y12 » « « 24 » « y13 » « « 9 » « y23 » « « 3 » = « y31 » = « «15 » « y33 » « « 6 » «y » « « 3 » « y 41 » « «18 » «« y42 »» « ¬ ¼ ¬ 43 ¼ ¬
1 1 1 1 1 1 1 1 1
1 1 1
1
1 1
1 1 1
1 1
1 1 1
º » 1» 1» » 1» » » 1 »¼
ªe º ª P º « e11 » «D1 » « e12 » «D 2 » « 13 » «D 3 » « e23 » «D 4 » + « e31 » , « E1 » « e33 » « E » « e41 » « E 2 » «e42 » ¬ 3 ¼ «e » ¬ 43 ¼
where dots represent zeros. In summary, ª ª18 º « «12 » « « 24 » « «9» «3»=y=« « «15 » « «6» « «3» « «18 » ¬ ¼ ¬ ª1 1 0 0 «1 1 0 0 «1 1 0 0 «1 0 1 0 A = «1 0 0 1 «1 0 0 1 «1 0 0 0 «1 0 0 0 «1 0 0 0 ¬
1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 1 1 1
1 1 1 0 0 0 0 0 0 1 0 0 0 1 0 1 0 0
0 0 0 1 0 0 0 0 0 0 1 1 0 0 0 0 1 0
0 0 0 0 1 1 0 0 0
0 0 0 0 0 0 1 1 1
1 0 0 0 1 0 1 0 0
0 1 0 0 0 0 0 1 0
0º 0» 1» 1» 0» 1» 0» 0» 1 »¼
ªe º ª P º « e11 » «D1 » « e12 » «D 2 » « 13 » «D 3 » « e23 » «D 4 » + « e31 » « E1 » « e33 » « E » « e41 » « E 2 » « e42 » ¬ 3 ¼ «e » ¬ 43 ¼
0º ªP º 0» « D1 » 1» «D 2 » 1» «D » 0 » , x = «D 3 » , O( A) = 9 × 8, O(x) = 8 × 1 4 1» « E1 » » 0 «E » 0» « E2 » ¬ 3¼ 1 »¼
with y being the vector of observations and e y the vector of corresponding error terms. As an inconsistent linear equation y - e y = Ax, O{y} = 9 × 1, O{A} = 9 × 8, O{x} = 8 × 1 we pose the key question: ? What is the rank of the design matrix A ? Most notable, the first column is 1n and the sum of the next 4 columns is also 1n as well as the sum of the remaining 3 columns is 1n , too, namely
467
15-2 n-way classification models
c2 + c3 + c4 + c5 = 1n and c6 + c7 + c8 = 1n. The rank rkA = 1 + ( p 1) + ( q 1) = 1 + 3 + 2 = 6 is only six which differs from O{A} = 9 × 8. We have to take advantage of this rank deficiency. For example, we could postulate the condition x5 = 0 and x8 = 0 eliminating two components of the unknown vector. A more reasonable approach would be based on the computation of the symmetric reflexive generalized inverse such that xlm = ( AcA) rs Acy ,
(15.46)
which would guarantee a least square minimum norm solution or a I, I – BLUMBE solution (Best Linear I – Norm Uniformly Minimum Bias I – Norm Estimation) and rk A = rk A cA = rk( A cA) rs = rkA c
(15.47)
rs
A cA is a symmetric matrix ( A cA) is a symmetric matrix or called :rank preserving identity: :symmetry preserving identity:
Table 15.6: 2-way classification without interaction, example: normal equation
ª1 «1 «0 «0 A cA = « «0 «1 «0 «¬ 0
1 1 0 0 0 0 1 0
1 1 0 0 0 0 0 1
1 0 1 0 0 0 0 1
ª9 «3 «1 « = «2 3 «3 «2 «4 ¬
1 0 0 1 0 1 0 0
1 0 0 1 0 0 0 1
3 3 0 0 0 1 1 1
1 0 0 0 1 1 0 0
1 0 0 0 1 0 1 0
1º 0» 0» 0» » 1» 0» 0» 1 »¼
1 0 1 0 0 0 0 1
2 0 0 2 0 1 0 1
3 0 0 0 3 1 1 1
P ª «1 «1 «1 « «1 «1 «1 «1 «1 «1 «¬ 3 1 0 1 1 3 0 0
D1 D 2 D 3 D 4 p 1 0 0 0 1 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 1 0 0 0 0 1 0 0 0 1 0 0 0 1 n 2 1 0 0 1 0 2 0
4º 1» 1» 1» 1» 0» 0» 4 »¼
E1 E 2 E 3 pº 1 0 0» 0 1 0» 0 0 1» » 0 0 1» 1 0 0» 0 0 1» 1 0 0» 0 1 0» 0 0 1» n »¼
468
15 Special problems of algebraic regression and stochastic estimation
A cA = DE, O{D} = 8 × 6, O{E} = 6 × 8 ª9 «3 «1 D = «2 «3 «3 «¬ 2
3 3 0 0 0 1 1
1 0 1 0 0 0 0
2 0 0 2 0 1 0
3 1 0 1 1 3 0
2º 1» 0» 0» , 1» 0» 2 »¼
E to be determined
DcA cA = DcDE (DcD) 1 DcAcA = E ª133 « 45 « DcD = « 14 29 « 44 «¬ 28
45 14 29 21 4 8 4 3 3 8 3 10 15 3 11 11 2 4
44 28º 15 11 » 3 2» 11 4 » 21 8 » 8 10 »¼
compute (DcD) 1 and (DcD) 1 Dc E = (DcD) 1 DcA cA = ª1.0000 « 0.0000 « 0.0000 =« « 0.0000 « 0.0000 «¬ 0.0000
0.0000 1.0000 0.0000 0.0000 0.0000 0.0000
0.0000 0.0000 1.0000 0.0000 0 0
ª 0.0665 -0.0360 « -0.0360 0.3112 « 0.1219 -0.2327 « 0.0166 -0.0923 =« « -0.0360 -0.0222 « 0.0222 -0.0120 « « 0.0748 -0.0822 «¬ -0.0305 0.0582 ª 0.0526 « 0.2632 « 0.0132 « 0.1535 =« «-0.0702 « 0.2675 « «-0.1491 ¬«-0.0658
0.1053 0.1930 0.0263 0.0263 -0.1404 -0.1316 0.3684 -0.1316
0.0000 0.0000 0.0000 1.0000 0.0000 0.0000
( AcA) rs 0.1219 -0.2327 0.8068 -0.2195 -0.2327 0.1240 0.1371 -0.1392
0.0000 0.3333 -0.2500 -0.0833 0.0000 -0.0833 -0.1667 0.2500
1.0000 -1.0000 -1.0000 -1.0000 0.0000 0.0000
0.0000 0.0000 0.0000 0.0000 1.0000 0.0000
= Ec(EEc)( DcD) 1 Dc = 0.0166 -0.0360 0.0222 -0.0923 -0.0222 -0.0120 -0.2195 -0.2327 0.1240 0.4208 -0.0923 -0.0778 -0.0923 0.3112 -0.0120 -0.0778 -0.0120 0.2574 0.1020 -0.0822 -0.1417 -0.0076 0.0582 -0.0935
( AcA) rs A = 0.1579 0.1053 -0.2105 -0.1404 0.7895 0.0263 -0.2105 0.3596 -0.2105 -0.1404 0.0526 0.2018 0.0526 0.0351 0.0526 -0.1316
0.0526 -0.0702 -0.2368 0.4298 -0.0702 -0.1491 0.0175 0.1842
0.0000 0.0000 0.0000 0.0000 0.0000 1.0000 0.0748 -0.0822 0.1371 0.1020 -0.0822 -0.1417 0.3758 -0.1593
0.0526 -0.0702 0.0132 -0.1535 0.2632 0.2675 -0.1491 -0.0658
1.0000 º 0.0000 » 0.0000 » » 0.0000 » -1.0000 » -1.0000 »¼
-0.0305 º 0.0582 » -0.1392 » » -0.0076 » 0.0582 » -0.0935 » » -0.1593 » 0.2223 »¼
0.1053 -0.1404 0.0263 0.0263 0.1930 -0.1316 0.3684 -0.1316
0.0000 º 0.0000» -0.2500 » » -0.0833 » 0.3333 » -0.0833 » » -0.1667 » 0.2500 ¼»
469
15-2 n-way classification models
ª « « « xlm = ( A cA) rs A cy = « « « « « ¬«
5.3684 º 10.8421» -6.1579 » » -1.1579 » . 1.8421 » -0.2105 » -4.2105 »» 9.7895 ¼»
Summary The general formulation of our 2-way classification problem without interaction is generated by identifying the vector of responses as well as the vector of parameters. Table 15.7: 2-way classification without interaction c , ..., y cp1 , y12 c ,..., y cpq 1 y cpq ] y c := [y11 xc := [ P , D1 ,..., D p , E1 ,..., E q ] A := [1n , c2 ,..., c p , c p +1 ,..., cq ] subject to c2 + ... + c p = 1, c p +1 + ... + cq = 1 q
p
ni = ¦ nij , n j = ¦ nij , n = j =1
i =1
experimental design: number of number of observations parameters: n=
p,q
¦n
i , j =1
ij
1+ p + q
p, q
¦
i =1, j =1
nij
(15.48)
rank of the design matrix:
1 + ( p 1) + (q 1) = p + q 1
(15.49)
:MINOLESS: (15.50)
|| y Ax || = min 2
x
and
|| x ||2 = min
(15.51)
xlm = ( A cA) rs A cy . 15-23
A third example: 2-way classification with interaction
A two-way classification model with interaction is defined by “MINOLESS” yijk = E{ yijk } + eijk = P + D i + E j + (DE )ij + eijk subject to i {1,… , p}, j {1,… , q}, k {1,… , nij } c ,… , y cp 1 , y cp , y12 c y c22 ,… y cnq 1 y cpq ] y c := [ y11
(15.52)
470
15 Special problems of algebraic regression and stochastic estimation
(15.54)
xc := [ P , D1 ,… , D p , E1 ,… , E q , (DE )11 ,… , (DE ) pq ]
(15.53)
|| y Ax ||2I = min and || x ||2I = min .
(15.55)
x
x
It was been in the second example on 2-way classification without interaction that the effects of different levels of the factors A and B were additive. An alternative model is a model in which the additivity does not hold: the observations are not independent of each other. Such a model is called a model with interaction between the factors whose effect (DE )ij has to be reflected by means of ªi {1,… ,p} P + D i + E j + (DE )ij = E{ yijk } for all « j {1,… ,q} (15.56) « k {1,… ,nij } ¬ like our model equation. As an example we consider by means of Table 15.8 a plant breeder carrying out a series of experiments with three fertilizer treatments on each of four varieties of grain. For each treatment-by-variety combination when he or she plants several 4c × 4c plots. At harvest time she or he finds that many of the plots have been lost due to being wrongly ploughed up and all he or she is left with are the data of Table 15.8. Table 15.8 (weight of grain form 4c × 4c trial plots): Variety Treatment 1 2 3 4 1 8 12 7 13 11 9 30 12 18 2
y11 (n11 ) 6 12 18
3
-
Totals
48
y31 (n13 )
y41 ( n14 )
12 14 26
-
-
9 7
14 16
16 42
30 42
Totals
60
44 10 14 11 13 48 66
94 198
With four of the treatment-in-variety combinations there are no data at all, and with the others there are varying numbers of plots, ranging from 1 to 4 with a total of 18 plots in all. Table 15.8 shows the yield of each plot, the total yields, the number of plots in each total and the corresponding mean, for each treatmentvariety combination having data. Totals, numbers of observations and means are also shown for the three treatments, the four varieties and for all 18 plots. The symbols for the entries in the table, are also shown in terms of the model.
471
15-2 n-way classification models
The equations of a suitable linear model for analyzing data of the nature of Table 15.8 is for yijk as the kth observation in the ith treatment and jth variety. In our top table, P is the mean, D i is the effect of the ith treatment, E j is the effect of the jth variety, (DE )ij is the interaction effect for the ith treatment and the jth variety and A ijk is the error term. With balanced data every one of pq cells of our table would have n observations. In addition there would be pq levels of the (DE ) factor, the interaction factor. However, with unbalanced data, when some cells have no observations they are only as many (DE )ij - levels in the data as there are non-empty cells. Let the number of such cells be s (s = 8 in Table 15.8). Then, if nij is the number of observations in the (i, j)th cell of type “treatment i and variety j”, s the number of cells in which nij z 0 , in all other cases nij > 0 . For these cells nij
yij = ¦ yijk , yij = yij / nij
(15.57)
k =1
is the total yield in the (i, j)th cell, and yij is the corresponding mean. Similarly, p
q
p ,q
i =1
j =1
i , j =1
y = ¦ yi = ¦ y j =
¦
p , q , nij
yij =
¦
yijk
(15.58)
i =1, j =1, k =1
is the total yield for all plots, the number of observations called “plots” therein being p
q
p,q
n = ¦ ni = ¦ n j = ¦ nij . i =1
j =1
(15.59)
i, j
We shall continue with the corresponding normal equations being derived from the observational equations. (DE )
P D1 D 2 D 3 E1 E 2 E 3 E 4 11 13 14 21 22 31 33 34 ª e111 º ª 8 º ª y111 º ª1 1 1 1 º y « » « e112 » P ª º «1 1 1 1 » «13» 112 « 9 » « y113 » «1 1 1 1 » « D1 » « e113 » «12 » « y131 » «1 1 1 » « D 2 » « e131 » « 7 » « y141 » «1 1 1 » « D 3 » « e141 » «11» « y142 » «1 1 1 » « E1 » « e142 » « 6 » « y211 » «1 1 1 1 » « E 2 » « e211 » «12 » « y212 » «1 1 1 1 » « E 3 » «e212 » «12 » « y » «1 1 1 1 » « E » « e » «14 » = « y 221 » = «1 1 1 1 » « (DE4) » + «e221 » « 9 » « y222 » «1 1 1 1 » « (DE )11 » « e222 » « 7 » « y321 » «1 1 1 1 » « (DE )13 » « e321 » 14 » « 322 » » « « » « 322 » « 14 1 1 1 1 DE ( ) y « » « e331 » 21 » 31 3 « « » « » «16 » « y332 » «1 1 1 1 » « (DE ) 22 » « e332 » «10 » « y341 » «1 1 1 1» « (DE )32 » « e341 » «14 » « y342 » «1 1 1 1» « (DE )33 » « e342 » «11» « y343 » «1 1 1 1» «¬ (DE )34 »¼ « e343 » ««¬ e344 ¼»» ¬«13¼» ««¬ y344 »»¼ ¬«1 1 1 1¼»
472
15 Special problems of algebraic regression and stochastic estimation
where the dots represent zeros. ª18 «6 «4 «8 «5 «4 «3 «6 «3 «1 «2 « «2 «2 «2 «2 ¬« 4
6 6 3 1 2 3 1 2
4 4 2 2 2 2
8 8 2 2 4 2 2 4
5 3 2 5 3 2
4 2 2 4 2 2
3 1 2 3 1 2
6 2 4 6 2 4
3 3 3 3
1 1 1 1
2 2 2 2
2 2 2 2
2 2 2 2
2 2 2 2
2 2 2 2
4 º ª P º ª y º ª198º » « D1 » « y1 » « 60 » » « D 2 » « y2 » « 44 » 4 » « D 3 » « y3 » « 94 » » « E1 » « y1 » « 48 » » « E 2 » « y2 » « 42 » » «« E 3 »» «« y3 »» « 42 » 4 » « E 4 » = « y4 » = « 66 » ~ AcAx = Acy. A » « (DE )11 » « y11 » « 30 » » « (DE )13 » « y13 » « 12 » »» « (DE )14 » « y14 » «« 18 »» » « (DE ) 21 » « y21 » « 18 » » «(DE ) 22 » « y22 » « 26 » » « (DE )32 » « y32 » « 16 » » « (DE )33 » « y33 » « 30 » 4 ¼» «¬ (DE )34 »¼ «¬ y34 »¼ ¬« 48 ¼»
Now we again pose the key question: ?What is the rank of the design matrix A? The first column is 1n and the sum of other columns is c2 + c3 + c4 = 1n and c5 + c6 + c7 + c8 = 1n . How to handle the remaining sum (DE )... of our incomplete model? Obviously, we experience rk[c9 ,… , c16 ] = 8 , namely rk[(DE )ij ] = 8 for (DE )ij {J 11 ,… , J 34 }.
(15.60)
As a summary, we have computed rk( AcA) = 8 , a surprise for our special case. A more reasonable approach would be based on the computation of the symmetric reflexive generalized inverse such that (15.61) x Am = ( A cA ) rs A cy , which would assure a minimum norm, least squares solution or a I, I – BLUMBE solution (Best Linear I – Norm Uniformly Minimum Bias I – Norm Estimation ) and rkA = rkA cA = rk(A cA ) rs = rkA +
(15.62)
rs
A cA is a symmetric matrix ( A cA ) is a symmetric matrix or called :rank preserving identity: !symmetry preserving identity! Table 15.9 summarizes all the details of 2-way classification with interaction. In general, for complete models our table lists the general number of parameters and the rank of the design matrix which differs from our incomplete design model.
473
15-2 n-way classification models
Table 15.9: 2-way classification with interaction c ,… , y cp1 , y12 c ,… , y cpq 1 , y cpq ] y c := [ y11 x c := [ P , D1 ,… , D p , E1 ,… , E q , (DE )11 ,… , (DE ) pq ] A := [1n , c1 ,… , c p , c p +1 ,… , cq , c11 ,… , c pq ] subject to c2 + … + c p = 1 , c p +1 + … + cq = 1,
p,q
¦c
i , j =1
p
q
p ,q
i =1
j =1
i, j
ij
= ( p 1)(q 1)
n = ¦ ni = ¦ n j = ¦ ni , j experimental design: number of observations p,q
n = ¦ ni , j
number of parameters:
1 + ( p 1) + (q 1) + (15.63) + ( p 1)(q 1)
1 + p + q + pq
i, j
(15.64)
rank of the design matrix:
|| y Ax ||2 = min and || x ||2 = min x
(15.65)
x
x Am = ( A cA ) rs A cy .
(15.66)
For our key example we get from the symmetric normal equation A cAx A = A cy the solution x Am = ( A cA ) rs A cy given A cA and A cy O{A cA} = 16 × 16, O{P ,… , (DE )31 } = 16 × 1, O{A cy} = 16 × 1 A cA = DE, O{D} = 18 × 12, O{E} = 12 × 16 ª3 «3 «0 «0 «3 «0 «0 « D = «0 3 «0 «0 «0 «0 «0 «0 «0 ¬«
1 1 0 0 0 0 1 0 0 1 0 0 0 0 0 0
2 2 0 0 0 0 0 2 0 0 2 0 0 0 0 0
2 0 2 0 2 0 0 0 0 0 0 2 0 0 0 0
2 0 2 0 0 2 0 0 0 0 0 0 2 0 0 0
2 0 0 2 0 2 0 0 0 0 0 0 0 2 0 0
2 0 0 2 0 0 2 0 0 0 0 0 0 0 2 0
4º 0» 0» 4» 0» 0» 0» 4» 0» 0» 0» 0» 0» 0» 0» 4 »¼»
474
15 Special problems of algebraic regression and stochastic estimation
ª1.0000 «1.0000 « «1.0000 «1.0000 «1.0000 «1.0000 « «1.0000 «¬1.0000
DcA cA = DcDE ( DcD) 1 DcA cA = E E = (DcD) 1 DcA cA = 1.0000 0.0000 0.0000 1.0000 0.0000 0.0000 º 1.0000 0.0000 0.0000 0.0000 0.0000 1.0000 » » 0.0000 » 1.0000 0.0000 0.0000 0.0000 0 0.0000 1.0000 0.0000 1.0000 0.0000 0.0000 » 0 1.0000 0.0000 0 1.0000 0.0000 » » 0 0 1.0000 0 1.0000 0 » 0.0000 0.0000 1.0000 0 0.0000 1.0000 » »¼ 0 0.0000 1.0000 0 0.0000 0
x Am = ( A cA) rs A cy = = [6.4602, 1.5543, 2.4425, 2.4634, 0.6943, 1.0579, 3.3540, 1.3540, 1.2912, 0.6315, -0.3685, -0.5969, 3.0394, -1.9815, 2.7224,1.72245]c . 15-24
Higher classifications with interaction
If we generalize 1-way and 2-way classifications with interactions we arrive at a higher classification of type P + Di + E j + J k + … + +(DE )ij + (DJ )ik + (DE ) jk + … + (DEJ )ijk + … = E{y ijk …A }
(15.67)
for all i {1,… , p}, j {1,… , q}, k {1,… , r},… , A {1,… , nijk ,…}. An alternative stochastic model assumes a fully occupied variance – covariance matrix of the observations, namely D{y} E{[ y E{y}][ y E{y}]c} Ȉij . Variance – covariance estimation techniques are to be applied. In addition, a mixed model for the effects are be applied, for instance of type E{y} = Aȟ + &E{]} D{y} = CD{z}Cc.
(15.68) (15.69)
Here we conclude with a discussion of what is unbiased estimable: Example: 1–way classification If we depart from the model E{y ij } = P + D i and note rkA = rkA cA = 1 + ( p 1) for i {1,… , p}, namely rkA = p, we realize that the 1 + p parameters are not unbiased estimable: The first column results from the sum of the other columns. It is obvious the difference Di D1 is unbiased estimable. This difference produces a column matrix with full rank. Summary Di D1 quantities are unbiased estimable
475
15-2 n-way classification models
Example: 2–way classification with interaction Our first statement relates to the unbiased estimability of the terms D1 ,… , D p and E1 ,… , E q : obviously, the differences Di D1 and E j E1 for i, j < 1 are unbiased estimable. The first column is the sum of the other terms. For instance, the second column can be eliminated which is equivalent to estimating Di D1 in order to obtain a design matrix of full column rank. The same effect can be seen with the other effect E j for the properly chosen design matrix: E j E1 for all j > 1 is unbiased estimable! If we add the pq effect (DE )ij of interactions, only those interactions increase the rank of the design matrix by one respectively, which refer to the differences Di D1 and E j E1 , altogether ( p 1)( q 1) interactions. To the effect (DE )ij of the interactions are estimable, pq ( p 1)( q 1) = p + q 1 constants may be added, that is to the interactions (DE )i1 , (DE )i 2 ,… , (DE )iq with
i {1,… , p}
the constants '(DE1 ), '(DE 2 ),…, '(DE q ) and to the interactions (DE ) 2 j , (DE )3 j ,… , (DE ) pj with j {1,… , q} the constants '(DE 2 ), '(DE 3 ),… , '(DE p ). The constants '(DE1 ) need not to be added which can be interpreted by '(DE1 ) = '(DE1 ). A numerical example is p = 2, q = 2 xc = [ P , D1 , D 2, E1 , E 2 , (DE )11 , (DE )12 , (DE ) 22 ]. Summary 'D = D 2 D1 , 'E = E 2 E1 for all i {1, 2}, j {1, 2} as well as '(DE1 ), '(DE 2 ), '(DE 2 ) are unbiased estimable.
(15.70) (15.71)
476
15 Special problems of algebraic regression and stochastic estimation
At the end we review the number of parameters and the rank of the design matrix for a 3–way classification with interactions according to the following example. 3–way classification with interactions experimental design: number of number of observations parameters:
n=
p,q,r
¦
i , j , k =1
nijk
rank of the design matrix:
1 + ( p 1) + (q 1) + (r 1) 1+ p + q + r + +( p 1)(q 1) + (15.72) + pq + pr + qr + +( p 1)(r 1) + (q 1)(r 1) + + pqr ( p 1)(q 1)(r 1) = pqr
15-3 Dynamical Systems There are two essential items in the analysis of dynamical systems: First, there exists a “linear or liniarized observational equation y (t ) = Cz(t ) ” connecting a vector of stochastic observations y to a stochastic vector z of so called “state variables”. Second, the other essential is the characteristic differential equation of type “ zc(t ) = F (t , z(t )) ”, especially linearizied “ zc(t ) = Az(t ) ”, which maps the first derivative of the “state variable” to the “state variable” its off. Both, y (t ) and z(t ) are functions of a parameter, called “time t”. The second equation describes the time development of the dynamical system. An alternative formulation of the dynamical system equation is “ z(t ) = Az(t 1) ”. Due to the random nature “of the two functions “ y (t ) = Cz(t ) ” and zc = Az ” the complete equations read (15.73) E{y (t )} = CE{z} (15.76) E{zc(t )} = AE{z(t )}
and
and
V{e y (t1 ), e y (t2 )} = Ȉ y (t1 , t2 )
(15.74)
D{e y (t )} = Ȉ y (t ),
(15.75)
D{e z (t )} = Ȉ z (t ), V{e z (t1 ), e z (t2 )} = Ȉz (t1 , t2 )
(15.77) (15.78)
Here we only introduce “the time invariant system equations” characterized by A (t ) = A. zc(t ) abbreviates the functional dz(t ) / dt. There may be the case that the variance-covariance functions Ȉ y (t ) and Ȉ z (t ) do not change in time: Ȉ y (t ) = Ȉ y , Ȉ z (t ) = Ȉ z equal a constant. Various models exist for the variancecovariance functions Ȉ y (t1 , t2 ) and Ȉ z (t1 , t2 ) , e.q. linear functions as in the case of a Gauss process or a Brown process Ȉ y (t2 t1 ) , Ȉz (t2 t1 ) . The analysis of dynamic system theory was initiated by R. E. Kalman (1960) and by R. E. Kalman and R. S. Bucy (1961): “KF” stand for “Kalman filtering”. Example 1 (tracking a satellite orbit):
477
15-3 Dynamical Systems
Tracking a satellite’s orbit around the Earth might be based on the unknown state vector z(t ) being a function of the position and the speed of the satellite at time t with respect to a spherical coordinate system with origin at the mass center of the Earth. Position and speed of a satellite can be measured by GPS, for instance. If distances and accompanying angles are measured, they establish the observation y (t ) . The principles of space-time geometry, namely mapping y (t ) into z(t ) , would be incorporated in the matrix C while e y (t ) would reflect the measurement errors at the time instant t. The matrix A reflects the situation how position and speed change in time according the physical lows governing orbiting bodies, while ez would allow for deviation from the lows owing to factors as nonuniformity of the Earth gravity field. Example 2 (statistical quality control): Here the observation vector y (t ) is a simple approximately normal transformation of the number of derivatives observed in a sample obtained at time t, while y1 (t ) and y2 (t ) represent respectively the refractive index of the process and the drift of the index. We have the observation equation and the system equations z (t ) = z2 (t ) + ez1 y (t ) = z1 (t ) + ey (t1 ) and 1 z2 (t ) = z2 (t 1) + ez 2 . In vector notation, this system of equation becomes z(t ) = Az(t 1) + e z namely ª z (t ) º ª1 1º ª ez º ª 0 1º z(t ) = « 1 » , e z = « « », A = « » » ¬ 0 1¼ «¬ ez »¼ ¬ 0 1¼ ¬ z2 ( t ) ¼ 1
2
do not change in time. If we examine y (t ) y (t 1) for this model, we observe that under the assumption of constant variance, namely e y (t ) = e y and e z (t ) = e z , the autocorrelation structure of the difference is identical to that of an ARIMA (0,1,1) process. Although such a correspondence is sometimes easily discernible, we should in general not consider the two approaches to be equivalent. A stochastic process is called an ARMA process of the order ( p, q ) if z (t ) = a1z (t 1) + a2 z (t 2) + … + a p z (t p ) = = b0u(t ) + b1u(t 1) + b2u(t 2) + … + bq u(t q ) for all t {1,… , T } also called a mixed autoregressive/moving - average process.
(15.79)
478
15 Special problems of algebraic regression and stochastic estimation
In practice, most time series are non-stationary. In order to fit a stationary model, it is necessary to remove non-stationary sources of variation. If the observed time series is non-stationary in the mean, then we can use the difference of the series. Differencing is widely used in all scientific disciplines. If z (t ), t {1,… , T } , is replaced by d z (t ) , then we have a model capable of describing certain types of non-stationary signals. Such a model is called an “integrated model” because the stationary model that fits to the difference data has to the summed or “integrated” to provide a model for the original non-stationary data. Writing W (t ) = d z (t ) = (1 B ) d z (t )
(15.80)
for all t {1,… , T } the general autoregressive integrated moving-average (ARIMA) process is of the form ARIMA W (t ) = D1W (t 1) + … + D pW (t p ) + b0 u(t ) + … + bq u(t q) (15.81) or ĭ( B )W(t ) = ī( B )
(15.82)
ĭ( B)(1 B) d z (t ) = ī( B)u (t ).
(15.83)
Thus we have an ARMA ( p, q ) model for W (t ) , t {1,… , T } , while the model for W (t ) describing the dth differences for z (t ) is said to be an ARIMA process of order ( p, d , q) . For our case, ARIMA (0,1,1) means a specific process p = 1, = 1, q = 1 . The model for z (t ), t {1,… , T } , is clearly nonstationary, as the AR operators ĭ( B )(1 q) d has d roots on the unit circle since patting B = 1 makes the AR operator equal to zero. In practice, first differencing is often found to the adequate to make a series stationary, and accordingly the value of d is often taken to be one. Note the random part could be considered as an ARIMA (0,1,0) process. It is a special problem of time-series analysis that the error variances are generally not known a priori. This can be dealt with by guessing, and then updating then is an appropriate way, or, alternatively, by estimating then forming a set of data over a suitable fit period. In the state space modeling, the prime objective is to predict the signal in the presence of noise. In other words, we want to estimate the m × 1 state vector E{z(t )} which cannot usually be observed directly. The Kalman filter provides a general method for doing this. It consists of a set of equations that allow us to update the estimate of E{z(t )} when a new observation becomes available. We will outline this updating procedure with two stages, called
479
15-3 Dynamical Systems
Ɣ Ɣ
the prediction stage and the updating stage.
Suppose we have observed a univariate time series up to time (t 1) , and that E{z (t 1)} is “the best estimator” E{z (t 1)} based on information up to this time. For instance, “best” is defined as an PLUUP estimator. Note that z (t ), z (t 1) etc is a random variable. Further suppose that we have evaluated the m × m variance-covariance matrix of E{zn (t 1)} which we denote by P{t 1}. The first stage called the prediction stage is concerned with forecasting E{z (t )} from data up to the time (t 1) , and we denote the resulting estimator in a obvious notation by E{zn (t ) | z (t 1)} . Considering the state equations where D{ez (t 1)} is still unknown at time (t 1), the obvious estimator for E{z (t )} is given by E{z(t ) | zˆ (t 1)} = G (t ) E{zn (t 1)}
(15.84)
and the variance-covariance matrix V{t | t 1} = G (t ) V{t 1}G c + W {t}
(15.85)
called prediction equations. The last equations follows from the standard results on variance-covariance matrices for random vector variables. When the new observation at time t, namely when y (t ) has been observed the estimator for E{z(t )} can be modified to take account of the extra information. At time (t 1) , the best forecast of y (t ) is given by hc E{z(t ) | zˆ (t 1)} so that the prediction error is given by (15.86) eˆy (t ) = y (t ) hc(t ) E{z (t ) | z (t 1)}. This quantity can be used to update the estimate of E{z (t )} and its variancecovariance matrix. E{zˆ (t )} = E{z(t ) | zˆ (t 1)} + K (t )eˆ y
(15.87)
V{t} = V{t | t 1} K (t )hc(t ) V{t | t 1}
(15.88)
V{t} = V{t , t 1}hc(t ) /[hc(t ) V{t | t 1}h + V n2 ] .
(15.89)
V{t} is called the gain matrix. In the univariate case, K (t ) is just a vector of size ( m 1) . The previous equations constitute the second updating stage of the Kalman filter, thus they are called the updating equations. A major practical advantage of the Kalman filter is that the calculations are recursive so that, although the current estimates are based on the whole past history of measurements, there is no need for an ever expanding memory. Rather the near estimate of the signal is based solely on the previous estimate and the latest observations. A second advantage of the Kalman filter is that it converges fairly
480
15 Special problems of algebraic regression and stochastic estimation
quickly when there is a constant underlying model, but can also follow the movement of a system where the underlying model is evolving through time. For special cases, there exist much simpler equations. An example is to consider the random walk plus noise model where the state vector z(t ) consist of one state variable, the current level ȝ(t ) . It can be shown that the Kalman filter for this model in the steady state case for t o f reduces the simple recurrence relation ˆ t) , ˆ t ) = ȝ( ˆ t 1) + D e( ȝ( where the smoothing constant D is a complicated function of the signal-to-noise ratio ı 2w ı n2 . Our equation is simple exponential smoothing. When ı 2w tends to zero, ȝ(t ) is a constant and we find that D o 0 would intuitively be expected, while as ı 2w ı n2 becomes large, then D approaches unity. For a multivariate time series approach we may start from the vector-valued equation of type E{y (t )} = CE{z} ,
(15.90)
where C is a known nonsingular m × m matrix. By LESS we are able to predict E{zn (t )} = C1E{n y (t )} .
(15.91)
Once a model has been put into the state-space form, the Kalman filter can be used to provide estimates of the signal, and they in turn lead to algorithms for various other calculations, such as making prediction and handling missing values. For instance, forecasts may be obtained from the state-space model using the latest estimates of the state vector. Given data to time N, the best estimate of the state vector is written E{zn ( N )} and the h-step-ahead forecast is given by E{yn ( N )} = hc( N + h) E{zn ( N + h)} = = h( N + h)G ( N + h)G{N + h 1}… G ( N + 1) E{zn ( N )}
(15.92)
where we assume h ( N + h) and future values of G (t ) are known. Of course, if G (t ) is a constant, say G, then we get E{yn ( N | h)} = hc( N + h)G h E{zn ( N )}.
(15.93)
If future values of h(t ) or G(t ) are not known, then they must themselves be forecasted or otherwise guessed. Up to this day a lot of research has been done on nonlinear models in prediction theory relating to state-vectors and observational equations. There are excellent reviews, for instance by P. H. Frances (1988), C. W. J. Granger and P. Newbold
481
15-3 Dynamical Systems
(1986), A. C. Harvey (1993), M. B. Priestley (1981, 1988) and H. Tong (1990). C. W. Granger and T. Teräsvirta (1993) is a more advanced text. In terms of the dynamical system theory we regularly meet the problem that the observational equation is not of full column rank. A state variable leads to a relation between the system input-output solution, especially a statement on how a system is developing in time. Very often it is reasonable to switch from a state variable, in one reference system to another one with special properties. Let T this time be a similarity transformation, namely described by a non-singular matrix of type z := Tz z = T -1z
(15.94)
d z = T 1ATz + T 1Bu(t ), z 0 = T 1z 0 dt
(15.95)
y (t ) = CTz (t ) + Du(t ).
(15.96)
The key question is now whether to the characteristic state equation there belongs a transformation matrix such that for a specific matrix A and B there exists an integer number r, 0 d r < n , of the form ª A A = « 11 ¬ 0
º A12 r × (n r ) º ª r×r , O{A } = «
» A 22 ¼ ¬(n 1) × r (n r ) × ( n r ) ¼»
ª B º ª q×r º B = « 1 » , O{B } = « ». ¬ q × (n r ) ¼ ¬0 ¼ In this case the state equation separates in two distinct parts. d
z1 (t ) = A11 z1 + A12 z 2 (t ) + B1 h(t ), z1 (0) = z10 dt d z 2 (t ) = A 22 z 2 (t ), z 2 (0) = z 20 . dt
(15.97) (15.98)
The last n r elements of z cannot be influenced in its time development. Influence is restricted to the initial conditions and to the eigen dynamics of the partial system 2 (characterized by the matrix A 22 ). The state of the whole system cannot be influenced completely by the artificially given point of the state space. Accordingly, the state differential equation in terms of the matrix pair (A, B) is not steerable. Example 3: (steerable state): If we apply the dynamic matrix A and the introductory matrix of a state model of type
482
15 Special problems of algebraic regression and stochastic estimation
ª 4 « A=« 3 1 « «¬ 3
2º 3», B = ª 1 º , « 0.5» 5» ¬ ¼ » 3 »¼
we are led to the alternative matrices after using the similarity transformation ª 1 A = « ¬ 0
1º ª1º , B =« ». 2 »¼ ¬ 0¼
If the initial state is located along the z1 -axis, for instance z20 = 0 , then the state vector remains all times along this axis. It is only possible to move this axis along a straight line “up and down”.
In case that there exists no similarity transformation we call the state matrices (A, B) steerable. Steerability of a state differential equation may be tested by Lemma 15.7 (Steerability): The pair (A, B) is steerable if and only if rk[BAB … A n 1B] = rkF( A, B) = n.
(15.99)
F( A, B) is called matrix of steerability. If its rank r < n , then there exists a transformation T such that A = T 1AT and B = T 1B has the form
ª A A12 º A = « 11 ,
» ¬ 0 A 22 ¼
and ( A11 , B1 ) is steerable.
(15.100)
ª B º B = « 1 » ¬0 ¼
(15.101)
Alternatively we could search for a transformation matrix T such that transforms the dynamic matrix and the exit matrix of a state model to the form ª A A = « 11
¬ A 21
r × (n r ) º 0 º ª r×r , O{A } = « »
» A 22 ¼ ¬( n 1) × r ( n r ) × ( n r ) ¼
C = [C1 , 0], O{C } = [r × p, (n r ) × p ]. In this case the state equation and the observational equations read d
z1 (t ) = A11 z1 (t ) + B1 u(t ), z1 (0) = z10 dt
(15.102)
d z 2 (t ) = A 21z1 (t ) + A 22 z 2 (t ) + B 2u( t ), z 2 (0) = z 20 dt
(15.103)
y (t ) = C 2 z1 (t ) + Du (t ).
(15.104)
483
15-3 Dynamical Systems
The last n r elements of the vector z are not used in the exit variable y. Since they do not have an effect to z1 , the vector g contains no information of the component of the state vector. This state moves in the n r dimensional subspace of \ n without any change in the exit variables. Our model (C, A) is in this case called non-observable. Example 4: (observability): If the exit matrix and the dynamic matrix of a state model can be characterized by the matrices 2º ª4 ª 0 1º C=« », A = « , 3¼ ¬ 2 3»¼ ¬5 an application of the transformation matrix T leads to the matrices 0º ª 1 C = [1, 0] , A = « ». ¬ 1 2 ¼ For an arbitrary motion of the state in the direction of the z 2 axis has no influence on the existing variable. If there does not exist a transformation T, we call the state vector observable. A rank study helps again! Lemma 15.8 (Observability test): The pair (C, A) is observable if and only if ªC º «CA » » = rkG(C, A ) = n. (15.105) rk « «# » « n 1 » «¬C »¼ G(C, A ) is called observability matrix. If its rank r < n , then there exists a transformation matrix T such that A = T 1AT and C = CT is of the form ª A A = « 11
¬ A 21
r × (n r ) º 0 º ª r×r , O{A } = « »
» A 22 ¼ ¬( n 1) × r ( n r ) × ( n r ) ¼
C = [C1 , 0], O{C } = [r × p, (n r ) × p ]
(15.106) (15.107)
and C1 , A11 is observable. With Lemma 15.7 and Lemma 15.8 we can only state whether a state model is steerable or observable or not, or which dimension has a partial system being classified as non-steerable and non-observable. In order to determine which part of a system is non-steerable or non-observable - which eigen motion is not ex-
484
15 Special problems of algebraic regression and stochastic estimation
cited or non-observable – we have to be able to construct proper transformation matrices T. A tool is the PBH – test we do not analyze here. Both the state differential equation as well as the initial equation we can Laplace transform easily. We only need the relations between the input, output and state variable via polynom matrices. If the initial conditions z0 vanish, we get the Laplace transformed characteristical equations ( sI n A ) z( s ) = Bu( s )
(15.108)
= Cz ( s ) + Du( s ).
y (s)
(15.109)
For details we recommend to check the reference list. We only refer to solving both the state differential equation as well as the initial equation: Eliminating the state vector z( s ) lead us to the algebraic relation between u( s ) and y ( s ) : G( s ) = C( sI n A ) 1 B + D
(15.110)
or (15.111)
§ ªI G( s ) = ¬ªC C ¼º ¨ s « r ¨ ¬0 ©
1
2
0 º ª A11 « I n r »¼ ¬ 0
1
º · ª B1 º A12 ¸ « »+D » A 22 ¼ ¸¹ ¬ 0 ¼
1 = C1 ( sI n A11 ) B1 + D.
Recently, the topic of chaos has attracted much attention. Chaotic behavior arises from certain types of nonlinear models, and a loose definition is apparently random behavior that is generated by a purely deterministic, nonlinear system. Refer to the contributions of K.S. Chan and H. Tong (2001), J. Gleick (1987), V. Isham (1983), H. Kants and I. Schreiber (1997).
Appendix A: Matrix Algebra As a two-dimensional array we define a quadratic and rectangular matrix. First, we review matrix algebra with respect to two inner and one external relation, namely multiplication of a matrix by a scalar, addition of matrices of the same order, matrix multiplication of type Cayley, Kronecker-Zehfuss, Khati-Rao and Hadamard. Second, we introduce special matrices of type symmetric, antisymmetric, diagonal, unity, null, idempotent, normal, orthogonal, orthonormal (special facts of representing a 2×2 orthonormal matrix, a general nxn orthonormal matrix, the Helmert representation of an orthonormal matrix with examples, special facts about the representation of a Hankel matrix with examples, the definition of a Vandermonde matrix), the permutation matrix, the commutation matrix. Third, scalar measures like rank, determinant, trace and norm. In detail, we review the Inverse Partitional Matrix /IPM/ and the Cayley inverse of the sum of two matrices. We summarize the notion of a division algebra. A special paragraph is devoted to vector-valued matrix forms like vec, vech and veck. Fifth, we introduce the notion of eigenvalue-eigenvector decomposition (analysis versus synthesis) and the singular value decomposition. Sixth, we give details of generalized inverse, namely g-inverse, reflexive g-inverse, reflexive symmetric ginverse, pseudo inverse, Zlobec formula, Bjerhammar formula, rank factorization, left and right inverse, projections, bordering, singular value representation and the theory solving linear equations.
A1 Matrix-Algebra A matrix is a rectangular or a quadratic array of numbers, ª a11 «a « 21 A := [aij ] = « ... « « an 11 «¬ an1
a12 a22 ...
... ... ...
a1m 1 a2 m 1 ...
an 12 ... an 1m1 an 2 ... anm 1
a1m a2 m ...
º » » » , aij \,[ aij ] \ n×m . » an 1m » anm »¼
The format or “order” of A is given by the number n of rows and the number of the columns, O( A) := n × m. Fact: Two matrices are identical if they have identical format and if at each place (i, j) are identical numbers, namely ª i {1,..., n} A = B aij = bij « ¬ j {1,..., m}.
486
Appendix A: Matrix Algebra
Beside the identity of two matrices the transpose of an m × n matrix A = [aij ] is the m × n matrix ǹc = [a ji ] whose ij element is a ji . Fact: ( Ac)c = A. A matrix algebra is defined by the following operations: • multiplication of a matrix by a scalar (external relation) • addition of two matrices of the same order (internal relation) • multiplication of two matrices (internal relation) Definition (matrix additions and multiplications): (1) Multiplication by a scalar ǹ = [aij ], D \ D A = AD = [D aij ] . (2) Addition of two matrices of the same order A = [aij ], B = [bij ] A + B := [aij + bij ] A + B = B + A (commutativity) (A + B) + C = A + (B + C) (associativity) A B = A + ( 1)B (inverse addition). Compatibility (D + E )A = D A + E A º distributivity D ( A + B) = D A + D B »¼ ( A + B)c = A c + Bc. (3) Multiplication of matrices 3(i) “Cayley-product” (“matrix-product”) ª A = [aij ], O( A) = n × l º « B = [b ], O(B) = l × m » ij ¬ ¼ l
C := A B = [cij ] := ¦ aik bkl , O(C) = n × m k =1
3(ii) “Kronecker-Zehfuss-product” A = [aij ], O( A) = n × m º B = [bij ], O(B) = k × l »¼
487
A1 Matrix-Algebra
C := B
A = [cij ], B
A := [bij A], O(C) = O(B
A) = kn × l 3(iii) “Khatri-Rao-product” (of two rectangular matrices of identical column number) A = [a1 ,..., am ], O ( A) = n × m º B = [b1 ,..., bm ], O (B) = k × m »¼ C := B : A := [b1
a1 ,… , bm
am ], O(C) = kn × m 3(iv) “Hadamard-product” (of two rectangular matrices of the same order; elementwise product) G = [ gij ], O(G ) = n × m º H =[hij ], O(H ) = n × m »¼ K := G H = [kij ], kij := gij hij , O(K ) = n × m . The existence of the product A B does not imply the existence of the product B A . If both products exist, they are in general not equal. Two quadratic matrices A and B, for which holds A B = B A , are called commutative. Laws (i)
(A B) C = A (B C) A ( B + C) = A B + A C ( A + B) C = A C + B C ( A B)c = ( Bc A c) .
(ii) ( A
B)
C = A
( B
C) = A
B
C ( A + B )
C = ( A
B ) + ( B
C) A
( B + C) = ( A
B) + ( A
C) ( A
B ) ( C
D ) = ( A C)
( B D ) ( A
B )c = A c
B c . (iii) ( A : B) : C = A : ( B : C) = A : B : C ( A + B ) : C = ( A : C ) + ( B : C) A : ( B + C) = ( A : B) + ( A : C) ( A C) : (B D) = ( A : B) (C : D) A : (B D) = ( A : B) D, if dij = 0 for i z j.
488
Appendix A: Matrix Algebra
The transported Khatri-Rao-product generates a row product which we do not follow here. (iv)
A B = B A ( A B ) C = A ( B C) = A B C ( A + B ) C = ( A C ) + ( B C) ( A1 B1 C1 ) ( A 2 B 2 C2 ) = ( A1 : A 2 )c ( B1
B 2 ) (C1 : C2 ) (D A) (B D) = D ( A B) D, if dij = 0 for i z j ( A B)c = Ac Bc.
A2 Special Matrices We will collect special matrices of symmetric, antisymmetric, diagonal, unity, zero, idempotent, normal, orthogonal, orthonormal, positive-definite and positive-semidefinite, special orthonormal matrices, for instance of type Helmert or of type Hankel. Definitions (special matrices): A quadratic matrix A = [aij ] of the order O( A) = n × n is called symmetric
aijc = a ji i, j {1,..., n} : A = A c
antisymmetric aij = a ji i, j {1,..., n} : A = A c aij = 0 i z j ,
diagonal
A = Diag[a11 ,..., ann ] ª aij = 0 i z j I n× n = « ¬ aij = 1 i z j
unity zero matrix
0 n× n : aij = 0 i, j {1,..., n}
upper º » triangular: lower »¼
ª aij = 0 i > j « a = 0 i < j ¬ ij
idempotent if and only if A A = 0 normal if and only if A A c = A c A . Definition (orthogonal matrix) : The matrix A is called orthogonal if AA c and A cA are diagonal matrices. (The rows and columns of A are orthogonal.)
A2 Special Matrices
489
Definition (orthonormal matrix) : The matrix A is called orthonormal if AA c = A cA = I . (The rows and columns of A are orthonormal.) Facts (representation of a 2×2 orthonormal matrix) X SO(2) : A 2×2 orthonormal matrix X SO(2) is an element of the special orthogonal group SO(2) defined by SO(2) := {X R 2×2 | XcX = I 2 and det X = +1} ªx {X = « 1 ¬ x3
(i)
x2 º R 2×2 x 4 »¼ ª cos I X=« ¬ sin I
x12 + x 22 = 1 x1 x3 + x 2 x 4 = 0 , x1 x 4 x 2 x3 = +1} x32 + x 42 = 1 sin I º R 2×2 , I [0, 2S ] cos I »¼
is a trigonometric representation of X SO(2) . (ii)
ª x X=« 2 ¬ 1 x
1 x 2 º R 2×2 , x [1, +1] » x ¼
is an algebraic representation of X SO(2) 2 2 ( x112 + x122 = 1, x11 x 21 + x12 x 22 = x 1 x 2 + x 1 x 2 = 0, x 21 + x 22 = 1) .
(iii)
ª 1 x2 2x º + « » 2 1 + x 2 » R 2×2 , x R X = « 1+ x 2 1 x » « 2 x «¬ 1 + x 2 1 + x 2 »¼ is called a stereographic projection of X (stereographic projection of SO(2) ~ S1 onto L1 ).
(iv)
ª 0 xº X = (I 2 + S)(I 2 S) 1 , S = « », ¬ x 0 ¼ where S = S c is a skew matrix (antisymmetric matrix), is called a Cayley-Lipschitz representation of X SO(2) .
(v)
X SO(2) is a commutative group (“Abel”) (Example: X1 SO(2) , X 2 SO(2) , then X1 X 2 = X 2 X1 ) ( SO( n) for n = 2 is the only commutative group, SO(n | n z 2) is not “Abel”).
490
Appendix A: Matrix Algebra
Facts (representation of an n×n orthonormal matrix) X SO(n) : An n×n orthonormal matrix X SO(n) is an element of the special orthogonal group SO(n) defined by SO(n) := {X R n×n | XcX = I n and det X = +1} . As a differentiable manifold SO(n) inherits a Riemann structure from the ambin 2 ent space R n with a Euclidean metric ( vec Xc \ , dim vec Xc = n ). Any atlas of the special orthogonal group SO(n) has at least four distinct charts and there is one with exactly four charts. (“minimal atlas”: Lusternik – Schnirelmann category) 2
2
(i)
X = (I n + S)(I n S) 1 , where S = Sc is a skew matrix (antisymmetric matrix), is called a Cayley-Lipschitz representation of X SO(n) . ( n! / 2(n 2)! is the number of independent parameters/coordinates of X)
(ii)
If each of the matrices R 1 ," , R k is an n×n orthonormal matrix, then their product R1R 2 " R k 1R k SO(n) is an n×n orthonormal matrix. Facts (orthonormal matrix: Helmert representation) :
Let ac = [a1 , ", a n ] represent any row vector such that a i z 0 (i {1, " , n}) is any row vector whose elements are all nonzero. Suppose that we require an n×n orthonormal matrix, one row which is proportional to ac . In what follows one such matrix R is derived. Let [r1c, " , rnc ] represent the rows of R and take the first row r1c to be the row of R that is proportional to ac . Take the second row r2c to be proportional to the ndimensional row vector [a1 , a12 / a 2 , 0, 0, " , 0],
(H2)
the third row r3c proportional to [a1 , a 2 , (a12 + a 22 ) / a 3 , 0, 0, ", 0]
(H3)
and more generally the first through nth rows r1c, " , rnc proportional to k 1
[a1 , a 2 , " , a k 1 , ¦ a i2 / a k , 0, 0, " , 0] i =1
for k {2,", n} ,
(Hn-1)
A2 Special Matrices
491
respectively confirm to yourself that the n-1 vectors ( H n1 ) are orthogonal to each other and to the vector ac . In order to obtain explicit expressions for r1c, ", rnc it remains to normalize ac and the vectors ( H n1 ). The Euclidean norm of the kth of the vectors ( H n1 ) is k 1
k 1
k 1
k
i =1
i =1
i =1
i =1
{¦ a i2 + (¦ a i2 ) 2 / a k2 }1 / 2 = {(¦ a i2 ) (¦ a i2 ) / a k2 }1 / 2 . Accordingly for the orthonormal vectors r1c, " , rnc we finally find n
r1c = [¦ a i2 ] 1 / 2 (a1 , ", a n )
(1st row)
i =1
(kth row) rkc = [
a 2k k
i =1
i =1
(¦ a i2 ) (¦ a i2 ). (nth row)
rnc = [
a i2 , 0, 0, ", 0) i =1 a k
k 1
k 1
] 1 / 2 (a1 , a 2 , ", a k 1 , ¦
a 2n
a i2 ] . i =1 a n
n 1
n 1
n
i =1
i =1
(¦ a i2 ) (¦ a i2 ).
] 1 / 2 [a1 , a 2 , ", a n1 , ¦
The recipy is complicated: When a c = [1, 1, ",1, 1] , the Helmert factors in the 1st row, …, kth row,…, nth row simplify to r1c = n 1 / 2 [1, 1, ",1, 1] R n rkc = [k (k 1)]1 / 2 [1, 1, " ,1, 1 k , 0, 0, " , 0, 0] R n rnc = [ n( n 1)]
1/ 2
n
[1, 1, " ,1, 1 n] R .
The orthonormal matrix ª r1c º « rc » « 2 » «"» « » «rkc1 » SO(n) « rkc » « » «"» «r c » « n 1 » «¬ rnc »¼ is known as the Helmert matrix of order n. (Alternatively the transposes of such a matrix are called the Helmert matrix.)
492
Appendix A: Matrix Algebra
Example (Helmert matrix of order 3): ª1/ 3 « «1/ 2 « «¬1/ 6
1/ 3 º » 0 » SO(3). » 2 / 6 »¼
1/ 3 1/ 2 1/ 6
Check that the rows are orthogonal and normalized. Example (Helmert matrix of order 4): ª 1/ 2 « « 1/ 2 « « 1/ 6 «1/ 12 ¬
1/ 2
1/ 2
1/ 2
0
1/ 6
2 / 6
1/ 12
1/ 12
1/ 2 º » 0 » » SO(4). 0 » 3 / 12 »¼
Check that the rows are orthogonal and normalized. Example (Helmert matrix of order n): ª 1/ n « 1/ 2 « « 1/ 6 « « " « 1 « « « (n 1)(n 2) « 1 « n(n 1) ¬«
1/ n
1/ n "
1/ n
1/ n
1/ 2
0
0
"
0
1/ 6
2/ 6
0
"
0
1
1
(n 1)(n 2)
(n 1)(n 2)
1
1
n(n 1)
n(n 1)
" "
"
"
"
1 (n 1) (n 1)(n 2) 1 n(n 1)
1/ n º » 0 » » 0 » » » SO(n). » 0 » » 1 n » » n(n 1) ¼»
Check that the rows are orthogonal and normalized. An example is the nth row 1 n ( n 1)
+"+
2
=
n n n ( n 1)
=
1 n( n 1)
n ( n 1) n( n 1)
+
(1 n )
2
n( n 1)
=
n 1 n( n 1)
+
1 2n + n n( n 1)
= 1,
where (n-1) terms 1/[n(n-1)] have to be summed. Definition (orthogonal matrix) : A rectangular matrix A = [aij ] \ n×m is called “a Hankel matrix” if the n+m-1 distinct elements of A ,
2
=
A2 Special Matrices
493 ª a11 «a « 21 « " « « an 11 «¬ an1
an 2
º » » » » » " anm »¼
only appear in the first column and last row. Example: Hankel matrix of power sums Let A R n× m be a n×m rectangular matrix ( n d m ) whose entries are power sums. ª n « ¦ D i xi « i =1 « n ¦D x2 A := «« i =1 i i « # « n « D xn i i «¬ ¦ i =1
n
¦D x
2 i i
i =1 n
¦D x
3 i i
i =1
# n
n +1 i i
¦D x i =1
n
º » i =1 » n m +1 » " ¦ D i xi » i =1 » » # # » n " ¦ D i xin + m1 » »¼ i =1 "
¦D x
m i i
A is a Hankel matrix. Definition (Vandermonde matrix): Vandermonde matrix: V R n× n ª 1 « x V := « #1 «¬ x1n 1
1 " 1 º x2 " xn » # # # », n 1 n 1 » " xn ¼ x2
n
det V = ( xi x j ). i, j i> j
Example: Vandermonde matrix V R 3×3 ª1 V := «« x1 «¬ x12
1 x2 x22
1º x3 »» , det V = ( x2 x1 )( x3 x2 )( x3 x1 ). x32 »¼
Example: Submatrix of a Hankel matrix of power sums Consider the submatrix P = [a1 , a2 ," , an ] of the Hankel matrix A R n× m (n d m) whose entries are power sums. The determinant of the power sums matrix P is
494
Appendix A: Matrix Algebra n
n
i =1
i =1
det P = ( D i )( xi )(det V ) 2 , where det V is the Vandermonde determinant. Example: Submatrix P R 3×3 of a 3×4 Hankel matrix of power sums (n=3,m=4) A= ª D1 x1 + D 2 x2 + D 3 x3 D1 x12 + D 2 x22 + D 3 x32 D1 x13 + D 2 x23 + D 3 x33 D1 x14 + D 2 x24 + D 3 x34 º « 2 2 2 3 3 3 4 4 4 5 5 5» «D1 x1 + D 2 x2 + D 3 x3 D1 x1 + D 2 x2 + D 3 x3 D1 x1 + D 2 x2 + D 3 x3 D1 x1 + D 2 x2 + D 3 x3 » «D1 x13 + D 2 x23 + D 3 x33 D1 x14 + D 2 x24 + D 3 x34 D1 x15 + D 2 x25 + D 3 x35 D1 x16 + D 2 x26 + D 3 x36 » ¬ ¼ P = [a1 , a2 , a3 ] ª D1 x1 + D 2 x2 + D 3 x3 D1 x12 + D 2 x22 + D 3 x32 D1 x13 + D 2 x23 + D 3 x33 º « 2 2 2 3 3 3 4 4 4» «D1 x1 + D 2 x2 + D 3 x3 D1 x1 + D 2 x2 + D 3 x3 D1 x1 + D 2 x2 + D 3 x3 » . «D1 x13 + D 2 x23 + D 3 x33 D1 x14 + D 2 x24 + D 3 x34 D1 x15 + D 2 x25 + D 3 x35 » ¬ ¼
Definitions (positive definite and positive semidefinite matrices) A matrix A is called positive definite, if and only if xcAx > 0 x \ n , x z 0 . A matrix A is called positive semidefinite, if and only if xcAx t 0 x \ n . An example follows. Example (idempotence): All idempotent matrices are positive semidefinite, at the time BcB and BBc for an arbitrary matrix B . What are “permutation matrices” or “commutation matrices”? After their definitions we will give some applications. Definitions (permutation matrix, commutation matrix) A matrix is called a permutation matrix if and only if each column of the matrix A and each row of A has only one element 1 . All other elements are zero. There holds AA c = I . A matrix is called a commutation matrix, if and only if for a matrix of the order n 2 × n 2 there holds K = K c and K 2 = I n . 2
The commutation matrix is symmetric and orthonormal.
A3 Scalar Measures and Inverse Matrices
495
Example (commutation matrix)
n=2
ª1 «0 K4 = « «0 « ¬0
0 0 1 0
0 1 0 0
0º 0 »» = K c4 . 0» » 1¼
A general definition of matrices K nm of the order nm × nm with n z m are to found in J.R. Magnus and H. Neudecker (1988 p.46-48). This definition does not lead to a symmetric matrix anymore. Nevertheless is the transpose commutation matrix again a commutation matrix since we have K cnm = K nm and K nm K mn = I nm . Example (commutation matrix)
n = 2º m = 3»¼
n = 3º m = 2 »¼
K 23
ª1 «0 «0 = «0 «0 «0 «¬0
0 0 0 1 0 0 0
0 1 0 0 0 0 0
0 0 0 0 1 0 0
0 0 1 0 0 0 0
0º 0» 0» 0» 0» 0» 1 »¼
K 32
ª1 «0 « = «0 0 «0 «¬0
0 0 1 0 0 0
0 0 0 0 1 0
0 1 0 0 0 0
0 0 0 1 0 0
0º 0» 0» 0» 0» 1 »¼
K 32 K 23 = I 6 = K 23 K 32 .
A3 Scalar Measures and Inverse Matrices We will refer to some scalar measures, also called scalar functions, of matrices. Beforehand we will introduce some classical definitions of type • • •
linear independence column and row rank rank identities.
Definitions (linear independence, column and row rank): A set of vectors x1 , ..., x n is called linear independent if for an arbitrary n linear combination 6 i=1D i xi = 0 only holds if all scalars D1 , ..., D n disappear, that is if D1 = D 2 = ... = D n 1 = D n = 0 holds.
496
Appendix A: Matrix Algebra
For all vectors which are characterized by x1 ,..., x n unequal from zero are called linear dependent. Let A be a rectangular matrix of the order O( ǹ) = n × m . The column rank of the matrix A is the largest number of linear independent columns, while the row rank is the largest number of linear independent rows. Actually the column rank of the matrix A is identical to its row rank. The rank of a matrix thus is called rk A . Obviously, rk A d min{n, m}. If rk A = n holds, we say that the matrix A has full row ranks. In contrast if the rank identity rk A = m holds, we say that the matrix A has full column rank. We list the following important rank identities. Facts (rank identities): (i)
rk A = rk A c = rk A cA = rk AA c
(ii)
rk( A + B) d rk A + rk B
(iii)
rk( A B) d min{rk A, rk B}
(iv)
rk( A B) = rk A if B has full row rank,
(v)
rk( A B) = rk B if A has full column rank.
(vi)
rk( A B C) + rk B t rk( A B) + rk(B C)
(vii)
rk( A
B) = (rk A) (rk B).
If a rectangular matrix of the order O( A) = n × m is fulfilled and, in addition, Ax = 0 holds for a certain vector x z 0 , then rk A d m 1 . Let us define what is a rank factorization, the column space, a singular matrix and, especially, what is division algebra. Facts (rank factorization) We call a rank factorization A = GF , if rk A = rk G = rk F holds for certain matrices G and F of the order O(G ) = n × rk A and O(F) = rk A × m.
497
A3 Scalar Measures and Inverse Matrices
Facts A matrix A has the column space R ( A) formed by the column vectors. The dimension of such a vector space is dim R ( A) = rk A . In particular, R ( A) = R ( AA c) holds. Definition (non-singular matrix versus singular matrix) Let a quadratic matrix A of the order O( A) be given. A is called nonsingular or regular if rk A = n holds. In case rk A < n, the matrix A is called singular. Definition (division algebra): Let the matrices A, B, C be quadratic and non-singular of the order O( A) = O(B) = O(C) = n × n . In terms of the Cayley-product an inner relation can be based on A = [aij ], B = [bij ], C = [cij ], O( A) = O(B) = O(C) = n × n (i)
( A B ) C = A ( B C)
(ii)
AI = A
(identity)
(iii)
A A 1 = I
(inverse).
(associativity)
The non-singular matrix A 1 = B is called Cayley-inverse. The conditions A B = In B A = In are equivalent. The Cayley-inverse A 1 is left and right identical. The Cayleyinverse is unique. Fact: ( A 1 ) c = ( A c) 1 : A is symmetric A 1 is symmetric. Facts: (Inverse Partitional Matrix /IPM/ of a symmetric matrix): Let the symmetric matrix A be partitioned as ªA A := « 11 c ¬ A 12
A 12 º c = A 11 , A c22 = A 22 . , A 11 A 22 »¼
498
Appendix A: Matrix Algebra
Then its Cayley inverse A 1 is symmetric and can be partitioned as well as ªA A 1 = « 11 c ¬ A 12
A 12 º A 22 »¼
1 1 1 ª[I + A 11 c A 11 c ]A 11 A 12 ( A 22 A 12 A 12 ) 1 A 12 « 1 1 c A 11 c A 11 ( A 22 A 12 A 12 ) 1 A 12 ¬
1
=
1 1 c A 11 A 11 A 12 ( A 22 A 12 A 12 ) 1 º », 1 c A 11 ( A 22 A 12 A 12 ) 1 ¼
1 if A 11 exists ,
A
1
ªA = « 11 c ¬ A 12
A 12 º A 22 »¼
1
=
ª º c A 221 A 12 ) 1 c A 221 A 12 ) 1 A 12 A 221 ( A 11 A 12 ( A 11 A 12 , « 1 1 1 1 1 1 1 » c ( A 11 A 12 c A 22 A 12 ) c ( A 11 A 12 A 22 A 12 c ) A 12 ]A 22 ¼ [I + A 22 A 12 ¬ A 22 A 12 if A 221 exists . 1 c A 11 c A 221 A 12 S 11 := A 22 A 12 A 12 and S 22 := A 11 A 12
are the minors determined by properly chosen rows and columns of the matrix A called “Schur complements” such that A
1
ªA = « 11 c ¬ A 12
1 1 1 ª(I + A 11 c ) A 11 A 12 S 11 A 12 « 1 1 c A 11 S 11 A 12 ¬
A 12 º A 22 »¼
1
=
1 1 º A 11 A 12 S 11 » 1 S 11 ¼
1 if A 11 exists ,
A
1
ªA = « 11 c ¬ A 12
A 12 º A 22 »¼
1
=
ª º S 221 S 221 A 12 A 221 « 1 1 1 1 1 » c S 22 [I + A 22 A 12 c S 22 A 12 ]A 22 ¼ ¬ A 22 A 12 if A 221 exists , are representations of the Cayley inverse partitioned matrix A 1 in terms of “Schur complements”.
499
A3 Scalar Measures and Inverse Matrices
The formulae S11 and S 22 were first used by J. Schur (1917). The term “Schur complements” was introduced by E. Haynsworth (1968). A. Albert (1969) replaced the Cayley inverse A 1 by the Moore-Penrose inverse A + . For a survey we recommend R. W. Cottle (1974), D.V. Oullette (1981) and D. Carlson (1986). :Proof: For the proof of the “inverse partitioned matrix” A 1 (Cayley inverse) of the partitioned matrix A of full rank we apply Gauss elimination (without pivoting). AA 1 = A 1 A = I ªA A = « 11 c ¬ A 12
A 12 º c = A 11 , A c22 = A 22 , A 11 A 22 »¼
ª A R m×m , A R m×l 12 « 11 l ×m l ×l c R , A 22 R «¬ A 12 ªB A 1 = « 11 c ¬B 12
B 12 º c = B 11 , B c22 = B 22 , B 11 B 22 »¼
ªB R m×m , B R m×l 12 « 11 l ×m l ×l c R , B 22 R «¬ B 12 AA 1 = A 1 A = I
c = B11A11 + B12 A12 c = Im ª A11B11 + A12 B12 «A B + A B = B A +B A = 0 12 22 11 12 12 22 « 11 12 c B11 + A 22 B12 c = B12 c A11 + B 22 A12 c =0 « A12 « c B12 + A 22 B 22 = B12 c A12 + B 22 A 22 = I l ¬ A12
(1) (2) (3) (4).
1 Case (i): A 11 exists
“forward step” c = I m (first left equation: A11B11 + A12 B12
º » cA ) » multiply by A12 c B11 + A 22 B12 c = 0 (second right equation) ¼» A12 1 11
1 1 º c B 11 A 12 c A 11 c = A 12 c A 11 A 12 A 12 B 12 » c B 11 + A 22 B 12 c =0 A 12 ¼
500
Appendix A: Matrix Algebra
c = Im ª A B + A 12 B 12 « 11 11 1 1 c A 11 A 12 )B 12 c = A 12 c A 11 ¬( A 22 A 12 1 1 c = ( A 22 A 12 c A 11 c A 11 B 12 A 12 ) 1 A 12 1 1 c = S 11 A 12 c A 11 B 12
or
ª Im « Ac A 1 ¬ 12 11
0 º ª A11
I l »¼ «¬ A12c
A12 º
ª A11 = A 22 »¼ «¬ 0
º . A 22 A12c A11 A12 »¼ A12
1
1 c A 11 Note the “Schur complement” S 11 := A 22 A 12 A 12 .
“backward step” c = Im A 11B 11 + A 12 B12 º 1 1 1 » c = ( A 22 A 12 c A 11 A 12 ) A 12 c A 11 ¼ B12 1 1 c ) = (I m B 12 A 12 c ) A 11 B 11 = A 11 (I m A 12 B 12 1 1 1 c A 11 c ]A 11 B 11 = [I m + A 11 A 12 ( A 22 A 12 A 12 ) 1 A 12 1 1 1 1 c A 11 B 11 = A 11 + A 11 A 12 S 11 A 12
A11B12 + A12 B 22 = 0 (second left equation) 1 1 1 c A 11 B 12 = A 11 A 12 B 22 = A 11 A 12 ( A 22 A 12 A 12 ) 1
1 c A11 B 22 = ( A 22 A12 A12 ) 1 1 B 22 = S11 .
Case (ii): A 221 exists “forward step” A11B12 + A12 B 22 = 0 (third right equation) º c B12 + A 22 B 22 = I l (fourth left equation: » A12 » multiply by A12 A 221 ) »¼
A 11B 12 + A 12 B 22 = 0 º 1 1 » c B 12 A 12 B 22 = A 12 A 22 ¼ A 12 A 22 A 12
ª A c B + A 22 B 22 = I l « 12 12 1 c )B 12 = A 12 A 221 ¬( A 11 A 12 A 22 A 12
501
A3 Scalar Measures and Inverse Matrices 1 1 c ) 1 A 12 c A 22 B 12 = ( A 11 A 12 A 22 A 12 1 1 B 12 = S 22 A 12 A 22
or ªI m « ¬0
A 12 A 221 º ª A 11 »« c Il ¼ ¬ A 12
A 12 º ª A 11 A 12 A 221 A 12 c =« » A 22 ¼ ¬ c A 12
0 º ». A 22 ¼
1 c . Note the “Schur complement” S 22 := A 11 A 12 A 22 A 12
“backward step” c B12 + A 22 B 22 = I l A 12 º 1 1 1 » c ) A 12 A 22 ¼ B12 = ( A 11 A 12 A 22 A 12 1 1 c B 12 c ) = (I l B 12 c A 12 ) A 22 B 22 = A 22 (I l A 12
c ( A 11 A 12 A 221 A 12 c ) 1 A 12 ]A 221 B 22 = [I l + A 221 A 12 1 1 1 1 c S 22 A 12 A 22 B 22 = A 22 + A 22 A 12 c B 11 + A 22 B 12 c = 0 ( third left equation ) A 12 c = A 221 A 12 c B 11 = A 221 A 12 c ( A 11 A 12 A 221 A 12 c ) 1 B 12
B 11 = ( A 11 A 12 A
1 22
A 1c 2 ) 1
B 1 1 = S 2 21 .
h c , B 22 } in terms of { A11 , A12 , A 21 = A12 c , The representations { B11 , B12 , B 21 = B12 A 22 } have been derived by T. Banachiewicz (1937). Generalizations are referred to T. Ando (1979), R. A. Brunaldi and H. Schneider (1963), F. Burns, D. Carlson, E. Haynsworth and T. Markham (1974), D. Carlson (1980), C. D. Meyer (1973) and S. K. Mitra (1982), C. K. Li and R. Mathias (2000). We leave the proof of the following fact as an exercise. Fact (Inverse Partitioned Matrix /IPM/ of a quadratic matrix): Let the quadratic matrix A be partitioned as ªA A := « 11 ¬ A 21
A 12 º . A 22 »¼
Then its Cayley inverse A 1 can be partitioned as well as
502
Appendix A: Matrix Algebra
ªA A 1 = « 11 ¬ A 21
A 12 º A 22 »¼
1 1 1 1 ª A 11 + A 11 A 12 S 11 A 21 A 11 « 1 1 S 11 A 21 A 11 ¬
1
=
1 1 º A 11 A 12 S 11 », 1 S 11 ¼
1 if A 11 exists
A
1
ª S 221 « 1 1 ¬ A 22 A 21S 22
ªA = « 11 ¬ A 21
A 221
A 12 º A 22 »¼
1
=
º S 221 A 12 A 221 , 1 1 1 » + A 22 A 21S 22 A 12 A 22 ¼
if A 221 exists and the “Schur complements” are definded by 1 S 11 := A 22 A 21 A 11 A 12 and S 22 := A 11 A 12 A 221 A 21 .
Facts: ( Cayley inverse: sum of two matrices): (s1)
( A + B ) 1 = A 1 A 1 ( A 1 + B 1 ) 1 A 1
(s2)
( A B) 1 = A 1 + A 1 ( A 1 B 1 ) 1 A 1
(s3)
( A + CBD) 1 = A 1 A 1 (I + CBDA 1 ) 1 CBDA 1
(s4)
( A + CBD) 1 = A 1 A 1 (I + BDA 1C) 1 BDA 1
(s5)
( A + CBD) 1 = A 1 A 1CB(I + DA 1CB) 1 DA 1
(s6)
( A + CBD) 1 = A 1 A 1CBD(I + A 1CBD) 1 A 1
(s7)
( A + CBD) 1 = A 1 A 1CBDA 1 (I + CBDA 1 ) 1
(s8)
( A + CBD) 1 = A 1 A 1C(B 1 + DA 1C) 1 DA 1 ( Sherman-Morrison-Woodbury matrix identity )
(s9)
B( AB + C) 1 = (I + BC1 A) 1 BC1
(s10)
BD( A + CBD) 1 = (B 1 + DA 1C) 1 DA 1 (Duncan-Guttman matrix identity).
W. J. Duncan (1944) calls (s8) the Sherman-Morrison-Woodbury matrix identity. If the matrix A is singular consult H. V. Henderson and G. S. Searle (1981), D. V. Ouellette (1981), W. M. Hager (1989), G. W. Stewart (1977) and K. S. Riedel
A3 Scalar Measures and Inverse Matrices
503
(1992). (s10) has been noted by W. J. Duncan (1944) and L. Guttman (1946): The result is directly derived from the identity ( A + CBD)( A + CBD) 1 = I A( A + CBD) 1 + CBD( A + CBD) 1 = I ( A + CBD) 1 = A 1 A 1CBD( A + CBD) 1 A 1 = ( A + CBD) 1 + A 1CBD( A + CBD) 1 DA 1 = D( A + CBD) 1 + DA 1CBD( A + CBD) 1 DA 1 = (I + DA 1CB)D( A + CBD) 1 DA 1 = (B 1 + DA 1C)BD( A + CBD) 1 (B 1 + DA 1C) 1 DA 1 = BD( A + CBD)1 .
h Certain results follow directly from their definitions. Facts (inverses): (i)
( A ¸ B)1 = B1 ¸ A1
(ii)
( A B)1 = B1 A1
(iii)
A positive definite A1 positive definite
(iv)
( A B)1 , ( A B)1 and (A1 B1 ) are positive definite, then (A1 B1 ) ( A B)1 is positive semidefinite as well as (A1 A ) I and I (A1 A)1 .
Facts (rank factorization): (i) If the n × n matrix is symmetric and positive semidefinite, then its rank factorization is ªG º A = « 1 » [G1c G c2 ] , ¬G 2 ¼ where G1 is a lower triangular matrix of the order O(G1 ) = rk A × rk A with rk G 2 = rk A , whereas G 2 has the format O(G 2 ) = (n rk A) × rk A. In this case we speak of a Choleski decomposition.
504
Appendix A: Matrix Algebra
(ii) In case that the matrix A is positive definite, the matrix block G 2 is not needed anymore: G1 is uniquely determined. There holds A 1 = (G11 )cG11 . Beside the rank of a quadratic matrix A of the order O( A) = n × n as the first scalar measure of a matrix, is its determinant A =
¦
(1)) ( j ,..., j
n
1
n )
a i =1
perm ( j1 ,..., jn )
iji
plays a similar role as a second scalar measure. Here the summation is extended as the summation perm ( j1 ,… , jn ) over all permutations ( j1 ,..., jn ) of the set of integer numbers (1,… , n) . ) ( j1 ,… , jn ) is the number of permutations which transform (1,… , n) into ( j1 ,… , jn ) . Laws (determinant) (i)
| D A | = D n | A | for an arbitrary scalar D R
(ii)
| A B |=| A | | B |
(iii)
| A
B |=| A |m | B |n for an arbitrary m × n matrix B
(iv)
(vi)
| A c |=| A | 1 | (A + A c) |d| A | if A + A c is positive definite 2 | A 1 |=| A |1 if A 1 exists
(vii)
| A |= 0 A is singular ( A 1 does not exist)
(viii)
| A |= 0 if A is idempotent, A z I
(ix)
| A |= aii if A is diagonal and a triangular matrix
(v)
n
i =1
n
(x)
0 d| A |d aii =| A I | if A is positive definite i =1
n
(xi)
| A | | B | d | A | bii d| A B | if A and B are posii =1
tive definite
(xii)
ª A11 «A ¬ 21
1 ª det A11 det( A 22 A 21A11 A12 ) « m ×m , rk A11 = m1 A12 º « A11 R =« » 1 A 22 ¼ « det A 21 det( A11 A12 A 22 A 21 ) « A R m ×m , rkA = m . 22 22 2 ¬ 1
2
1
2
505
A3 Scalar Measures and Inverse Matrices
A submatrix of a rectangular matrix A is the result of a canceling procedure of certain rows and columns of the matrix A. A minor is the determinant of a quadratic submatrix of the matrix A. If the matrix A is a quadratic matrix, to any element aij there exists a minor being the determinant of a submatrix of the matrix A which is the result of reducing the i-th row and the j-th column. By multiplying with (1)i + j we gain a new element cij of a matrix C = [cij ] . The transpose matrix Cc is called the adjoint matrix of the matrix A, written adjA . Its order is the same as of the matrix A. Laws (adjoint matrix) n
(i)
| A |= ¦ aij cij , i = 1,… , n j =1
n
(ii)
| A |= ¦ a jk c jk , k = 1,… , n j =1
(iii)
A (adj A) = (adj A) A = | A | I
(iv)
adj( A B) = (adj B) (adj A)
(v)
adj( A
B) = (adj A)
(adj B)
(vi)
adj A =| A | A 1 if A is nonsingular
(vii)
adjA positive definitive A positive definite.
As a third scalar measure of a quadratic matrix A of the order O( A) = n × n we introduce the trace tr A as the sum of diagonal elements, n
tr A = ¦ aii . i =1
Laws (trace of a matrix) (i)
tr(D A) = D tr A for an arbitrary scalar D R
(ii)
tr( A + B) = tr A + tr B for an arbitrary n × n matrix B
(iii)
tr( A
B) = (tr A) (tr B) for an arbitrary m × m matrix B
iv) (v)
tr A = tr(B C) for any factorization A = B C tr A c(B C) = tr( A c Bc)C for an arbitrary n × n matrix B and C tr A c = tr A trA = rkA if A is idempotent 0 < tr A = tr ( A I) if A is positive definite
(vi) (vii) (viii) (ix)
tr( A B) d (trA) (trB) if A und % are positive semidefinite.
506
Appendix A: Matrix Algebra
In correspondence to the W – weighted vector (semi) – norm. || x ||W = (xc W x)1/ 2 is the W – weighted matrix (semi) norm || A ||W = (trA cWA)1/ 2 for a given positive – (semi) definite matrix W of proper order. Laws (trace of matrices): (i) tr A cWA t 0 (ii) tr A cWA = 0 WA = 0 A = 0 if W is positive definite
A4 Vector-valued Matrix Forms If A is a rectangular matrix of the order O( A) = n × m , a j its j – th column, then vec A is an nm × 1 vector ª a1 º «a » « 2 » vec A = « … » . « » « an 1 » «¬ an »¼ In consequence, the operator “vec” of a matrix transforms a vector in such a way that the columns are stapled one after the other. Definitions ( vec, vech, veck ): ª a1 º «a » « 2 » (i) vec A = « … » . « » « an 1 » «¬ an »¼ (ii) Let A be a quadratic symmetric matrix, A = A c , of order O( A) = n × n . Then vechA (“vec - koef”) is the [n(n + 1) / 2] × 1 vector which is the result of row (column) stapels of those matrix elements which are upper and under of its diagonal.
A4 Vector-valued Matrix Forms
507
ª a11 º «… » « » « an1 » a A = [aij ] = [a ji ] = A c vechA := «« 22 »» . … «a » « n2 » «… » «¬ ann »¼ (iii) Let A be a quadratic, antisymmetric matrix, A = A c , of order O( A) = n × n . Then veckA (“vec - skew”) is the [n(n + 1) / 2] × 1 vector which is generated columnwise stapels of those matrix elements which are under its diagonal. ª a11 º « … » « » « an1 » « a » A = [aij ] = [a ji ] = Ac veckA := « 32 » . … « a » « n2 » « … » «¬ an, n 1 »¼ Examples (i)
(ii)
(iii)
ªa b A=« ¬d e ªa b A = «« b d ¬« c e
cº vecA = [a, d , b, e, c, f ]c f »¼ cº e »» = A c vechA = [ a, b, c, d , e, f ]c f »¼
ª 0 a b « a 0 d A=« «b d 0 « f «¬ c e
c º e »» = A c veckA = [a, b, c, d , e, f ]c . f» » 0 »¼
Useful identities, relating to scalar- and vector - valued measures of matrices will be reported finally. Facts (vec and trace forms): vec(A B Cc) = (C
A) vec B (i) (ii)
vec(A B) = (Bc
I n ) vec A = (Bc
A) vec I m = = (I1
A) vec B, A R n× m , B R m × q
508
Appendix A: Matrix Algebra
(iii)
A B c = (cc
A)vecB = ( A
cc)vecB c, c R q
(iv)
tr( A c B) = (vecA)cvecB = (vecA c)vecBc = tr( A Bc)
(v)
tr(A B Cc Dc) = (vec D)c(C
A) vec B = = (vec Dc)c( A
C) vec Bc
(vi)
K nm vecA = vecA c, A R n× m
(vii)
K qn (A
B) = (B
A)K pm
(viii)
K qn (A
B)K mp = (B
A)
(ix)
K qn (A
c) = c
A
(x)
K nq (c
A) = A
c, A R n×m , B R q× p , c R q
(xi)
vec(A
B) = (I m
K pn
I q )(vecA
vecB)
(xii)
A = (a1 ,… , a m ), B := Diagb, O(B) = m × m, m
Cc = [c1 ,… , c m ] vec(A B Cc) = vec[¦ (a j b j ccj )] = j =1
m
= ¦ (c j
a j )b j = [c1
a1 ,… , c m
a m )]b = (C : A)b j =1
(xiii)
A = [aij ], C = [cij ], B := Diagb, b = [b1 ,… ,b m ] R m tr(A B Cc B) = (vec B)c vec(C B A c) = = bc(I m : I m )c ( A : C)b = bc( A C)b
(xiv)
B := I m tr( A Cc) = rmc ( A C)rm ( rm is the m ×1 summation vector: rm := [1, …,1]c R m )
(xv)
vec DiagD := (I m D)rm = [I m ( A c B C)]rm = = (I m : I m )c = [I m : ( A c B C)] vec DiagI m = = (I m : I m )c vec( A c B C) = = (I m : I m )c (Cc
A c)vecB = (C : A)cvecB when D = A c B C is factorized.
Facts (Löwner partial ordering): For any quadratic matrix A R m×m there holds the uncertainty I m ( A c A) t I m A A = I m [( A : I m )c (I m : A)] in the Löwner partial ordering that is the difference matrix I m (Ac A) I m A A is at least positive semidefinite.
A5 Eigenvalues and Eigenvectors
509
A5 Eigenvalues and Eigenvectors To any quadratic matrix A of the order O( A) = m × m there exists an eigenvalue O as a scalar which makes the matrix A O I m singular. As an equivalent statement, we say that the characteristic equation O I m A = 0 has a zero value which could be multiple of degrees, if s is the dimension of the related null space N ( A O I ) . The non-vanishing element x of this null space for which Ax = O x, x z 0 holds, is called right eigenvector of A. Related vectors y for which y cA = Ȝy , y z 0 , holds, are called left eigenvectors of A and are representative of the right eigenvectors A’. Eigenvectors always belong to a certain eigenvalue and are usually normed in the sense of xcx = 1, y cy = 1 as long as they have real components. As the same time, the eigenvectors which belong to different eigenvalues are always linear independent: They obviously span a subspace of R ( A) . In general, the eigenvalues of a matrix A are complex! There is an important exception: the orthonormal matrices, also called rotation matrices whose eigenvalues are +1 or, –1 and idempotent matrices which can only be 0 or 1 as a multiple eigenvalue generally, we call a null eigenvalue a singular matrix. There is the special case of a symmetric matrix A = A c of order O( A) = m × m . It can be shown that all roots of the characteristic polynomial are real numbers and accordingly m - not necessary different - real eigenvalues exist. In addition, the different eigenvalues O and P and their corresponding eigenvectors x and y are orthogonal, that is (O P )xc y = ( xc Ac) y xc( A y ) = 0, O P z 0. In case that the eigenvalue O of degrees s appears s-times, the eigenspace N ( A O I m ) is s - dimensional: we can choose s orthonormal eigenvectors which are orthonormal to all other! In total, we can organize m orthonormal eigenvectors which span the entire R m . If we restrict ourselves to eigenvectors and to eigenvalues O , O z 0 , we receive the column space R ( A) . The rank of A coincides with the number of non-vanishing eigenvalues {O1 ,… , Or }. U := [U1 , U 2 ], O(U) = m × m, U U c = U cU = I m U1 := [u1 ,… , u r ], O(U1 ) = m × r , r = rkA U 2 := [u r +1 ,… , u m ], O(U 2 ) = m × (m r ), A U 2 = 0. With the definition of the r × r diagonal matrix O := Diag(O1 ,… Or ) of nonvanishing eigenvalues we gain ª/ 0º A U = A [U1 , U 2 ] = [U1/, 0] = [U1 , U 2 ] « ». ¬ 0 0¼
510
Appendix A: Matrix Algebra
Due to the orthonormality of the matrix U := [U1 , U 2 ] we achieve the results about eigenvalue – eigenvector analysis and eigenvalues – eigenvector synthesis. Lemma (eigenvalue – eigenvector analysis: decomposition): Let A = A c be a symmetric matrix of the order O( A) = m × m . Then there exists an orthonormal matrix U in such a way that U cAU = Diag(O1 ,… Or , 0,… , 0) holds. (O1 ,… Or ) denotes the set of non – vanishing eigenvalues of A with r = rkA ordered decreasingly. Lemma (eigenvalue – eigenvectorsynthesis: decomposition): Let A = A c be a symmetric matrix of the order O( A) = m × m . Then there exists a synthetic representation of eigenvalues and eigenvectors of type A = U Diag(O1 ,… Or , 0,… , 0)U c = U1/U1c . In the class of symmetric matrices the positive (semi)definite matrices play a special role. Actually, they are just the positive (nonnegative) eigenvalues squarerooted. /1/ 2 := Diag( O1 ,… , Or ) . The matrix A is positive semidefinite if and only if there exists a quadratic m × m matrix G such that A = GG c holds, for instance, G := [u1 /1/ 2 , 0] . The quadratic matrix is positive definite if and only if the m × m matrix G is not singular. Such a representation leads to the rank fatorization A = G1 G1c with G1 := U1 /1/ 2 . In general, we have Lemma (representation of the matrix U1 ): If A is a positive semidefinite matrix of the order O( A) with non – vanishing eigenvalues {O1 ,… , Or } , then there exists an m × r matrix U1 := G1 / 1 = U1 / 1/ 2 with U1c U1 = I r , R (U1 ) = R (U1 ) = R ( A), such that U1c A U1 = (/
1/ 2
U1c ) (U1 / U1c ) (U1 / 1/ 2 ) = I r .
A5 Eigenvalues and Eigenvectors
511
The synthetic relation of the matrix A is A = G1 G1c = U1 / 1 U1c . The pseudoinverse has a peculiar representation if we introduce the matrices U1 , U1 and / 1 . Definition (pseudoinverse): If we use the representation of the matrix A of type A = G1 G1c = U1 /U1c then A + := U1 U1 = U1 / 1 U1c is the representation of its pseudoinverse namely (i)
AA + A = (U1 /U1c )( U1 / 1 U1c )( U1 /U1c ) = U1 /U1c
(ii) A + AA + = (U1 / 1 U1c )( U1 /U1c )( U1 / 1 U1c ) = U1/ 1 U1c = A + (iii) AA + = (U1 /U1c )( U1 / 1 U1c ) = U1 U1c = ( AA + )c (iv) A + A = (U1 / 1 U1c )( U1 /U1c ) = U1 U1c = ( A + A)c . The pseudoinverse A + exists and is unique, even if A is singular. For a nonsingular matrix A, the matrix A + is identical with A 1 . Indeed, for the case of the pseudoinverse (or any other generalized inverse) the generalized inverse of a rectangular matrix exists. The singular value decomposition is an excellent tool which generalizes the classical eigenvalue – eigenvector decomposition of symmetric matrices. Lemma (Singular value decomposition): (i) Let A be an n × m matrix of rank r := rkA d min(n, m) . Then the matrices A cA and A cA are symmetric positive (semi) definite matrices whose nonvanishing eigenvalues {O1 ,… Or } are positive. Especially r = rk( A cA) = rk( AA c) holds. AcA contains 0 as a multiple eigenvalue of degree m r , and AAc has the multiple eigenvalue of degree n r . (ii) With the support of orthonormal eigenvalues of A cA and AA c we are able to introduce an m × m matrix V and an n × n matrix U such that UUc = UcU = I n , VV c = V cV = I m holds and U cAAcU = Diag(O12 ,… , Or 2 , 0,… , 0), V cA cAV = Diag(O12 ,… , Or 2 , 0,… , 0).
512
Appendix A: Matrix Algebra
The diagonal matrices on the right side have different formats m × m and m × n . (iii)
The original n × m matrix A can be decomposed according to ª/ 0º U cAV = « » , O(UAV c) = n × m ¬ 0 0¼
with the r × r diagonal matrix / := Diag(O1 ,… , Or ) of singular values representing the positive roots of nonvanishing eigenvalues of A cA and AA c . (iv)
A synthetic form of the n × m matrix A is ª / 0º A = Uc « » V c. ¬ 0 0¼
We note here that all transformed matrices of type T1AT of a quadratic matrix have the same eigenvalues as A = ( AT)T1 being used as often as an invariance property. ?what is the relation between eigenvalues and the trace, the determinant, the rank? The answer will be given now. Lemma (relation between eigenvalues and other scalar measures): Let A be a quadratic matrix of the order O( A) = m × m with eigenvalues in decreasing order. Then we have m
m
j =1
j =1
| A |= O j , trA = ¦ O j , rkA = trA , if A is idempotent. If A = A c is a symmetric matrix with real eigenvalues, then we gain O1 t max{a jj | j = 1,… , m},
Om d min{a jj | j = 1,… , m}. At the end we compute the eigenvalues and eigenvectors which relate the variation problem xcAx = extr subject to the condition xcx = 1 , namely xcAx + O (xcx) = extr . x, O
The eigenvalue O is the Lagrange multiplicator of the optimization problem.
513
A6 Generalized Inverses
A6 Generalized Inverses Because the inversion by Cayley inversion is only possible for quadratic nonsingular matrices, we introduce a slightly more general definition in order to invert arbitrary matrices A of the order O( A) = n × m by so – called generalized inverses or for short g – inverses. An m × n matrix G is called g – inverse of the matrix A if it fulfils the equation AGA = A in the sense of Cayley multiplication. Such g – inverses always exist and are unique if and only if A is a nonsingular quadratic matrix. In this case G = A 1 if A is invertible, in other cases we use the notation G = A if A 1 does not exist. For the rank of all g – inverses the inequality r := rk A d rk A d min{n, m} holds. In reverse, for any even number d in this interval there exists a g – inverse A such that d = rkA = dim R ( A ) holds. Especially even for a singular quadratic matrix A of the order O( A) = n × n there exist g-inverses A of full rank rk A = n . In particular, such g-inverses A r are of interest which have the same rank compared to the matrix A, namely rkA r = r = rkA . Those reflexive g-inverse A r are equivalent due to the additional condition A r AA r = A r but are not necessary symmetric for symmetric matrices A. In general, A = A c and A g-inverse of A ( A )c g-inverse of A A rs := A A( A )c is reflexive symmetric g inverse of A. For constructing of A rs we only need an arbitrary g-inverse of A. On the other side, A rs does not mean unique. There exist certain matrix functions which are independent of the choice of the g-inverse. For instance,
514
Appendix A: Matrix Algebra
A( A cA) A and A c( AA c) 1 A can be used to generate special g-inverses of AcA or AA c . For instance, A A := ( A cA) A c and A m := A c( AA c) have the special reproducing properties A( A cA) A cA = AA A A = A and AAc( AAc) A = AA m A = A , which can be generalized in case that W and S are positive semidefinite matrices to WA( A cWA) A cWA = WA ASAc( ASAc) AS = AS , where the matrices WA( A cWA) A cW and SAc( ASA c) AS are independent of the choice of the g-inverse ( A cWA) and ( ASA c) . A beautiful interpretation of the various g-inverses is based on the fact that the matrices ( AA )( AA ) = ( AA A) A = AA and ( A A)( A A) = A ( AA A) = A A are idempotent and can therefore be geometrically interpreted as projections. The image of AA , namely R ( AA ) = R ( A) = {Ax | x R m } R n , can be completed by the projections A A along the null space N ( A A) = N ( A) = {x | Ax = 0} R m . By the choice of the g – inverse we are able to choose the projected direction of AA and the image of the projections A A if we take advantage of the complementary spaces of the subspaces R ( A A) N ( A A) = R m and R ( AA ) N ( AA ) = R n by using the symbol " " as the sign of “direct sum” of linear spaces which only have the zero element in common. Finally we have use the corresponding dimensions dim R ( A A) = r = rkA = dim R ( AA ) ªdim N ( A A) = m rkA = m r « ¬ dim N ( AA ) = n rkA = n r
515
A6 Generalized Inverses
independent of the special rank of the g-inverses A which are determined by the subspaces R ( A A) and N ( AA ) , respectively.
N ( AA c)
R( A A )
N (A A)
R( AA ) in R n
in R m
Example (geodetic networks): In a geodetic network, the projections A A correspond to a S – transformations in the sense of W. Baarda (1973). Example ( A A and A m g-inverses): The projections AA A = A( A cA) A c guarantee that the subspaces R ( AA ) and N ( AA A ) are orthogonal to each other. The same holds for the subspaces R ( A m A) and N ( A m A) of the projections A m A = A c( AA c) A. In general, there exist more than one g-inverses which lead to identical projections AA and A A. For instance, following A. Ben – Israel, T. N. E. Greville (1974, p.59) we learn that the reflexive g-inverse which follows from A r = ( A A) A ( AA ) = A AA contains the class of all reflexive g-inverses. Therefore it is obvious that the reflexive g-inverses A r contain exact by one pair of projections AA and A A and conversely. In the special case of a symmetric matrix A , A = A c , and n = m we know due to R ( AA ) = R ( A) A N ( A c) = N ( A) = N ( A A) that the column spaces R ( AA ) are orthogonal to the null space N ( A A) illustrated by the sign ”A ”. If these complementary subspaces R ( A A) and N ( AA ) are orthogonal to each other, the postulate of a symmetric reflexive ginverse agrees to A rs := ( A A) A ( A A)c = A A( A )c , if A is a suited g-inverse.
516
Appendix A: Matrix Algebra
There is no insurance that the complementary subspaces R ( A A) and N ( A A) and R ( AA ) and N ( AA ) are orthogonal. If such a result should be reached, we should use the uniquely defined pseudoinverse A + , also called Moore-Penrose inverse for which holds R ( A + A) A N ( A + A), R ( AA + ) A N ( AA + ) or equivalent +
AA = ( AA + )c, A + A = ( A + A)c. If we depart from an arbitrary g-inverse ( AA A) , the pseudoinverse A + can be build on A + := Ac( AAcA) Ac (Zlobec formula) or +
A := Ac( AAc) A( AcA) Ac (Bjerhammar formula) ,
if both the g-inverses ( AA c) and ( A cA) exist. The Moore-Penrose inverse fulfils the Penrose equations: (i) AA + A = A (g-inverse) (ii) A + AA + = A + (reflexivity) (iii) AA + = ( AA + )cº » Symmetry due to orthogonal projection . (iv) A + A = ( A + A)c »¼
Lemma (Penrose equations) Let A be a rectangular matrix A of the order O( A) be given. A ggeneralized matrix inverse which is rank preserving rk( A) = rk( A + ) fulfils the axioms of the Penrose equations (i) - (iv). For the special case of a symmetric matrix A also the pseudoinverse A + is symmetric, fulfilling R ( A + A) = R ( AA + ) A N ( AA + ) = N ( A + A) , in addition A + = A( A 2 ) A = A( A 2 ) A( A 2 ) A.
517
A6 Generalized Inverses
Various formulas of computing certain g-inverses, for instance by the method of rank factorization, exist. Let A be an n × m matrix A of rank r := rkA such that A = GF, O(G ) = n × r , O(F) = r × m . Due to the inequality r d rk G d min{r , n} = r only G posesses reflexive ginverses G r , because of I r × r = [(G cG ) 1 G c]G = [(G cG ) 1 G c](GG cr G ) = G r G represented by left inverses in the sense of G L G = I. In a similar way, all ginverses of F are reflexive and right inverses subject to Fr := F c(FF c) 1 . The whole class of reflexive g-inverses of A can be represented by A r := Fr G r = Fr G L . In this case we also find the pseudoinverse, namely A + := F c(FF c) 1 (G cG ) 1 G c because of +
R ( A A) = R (F c) A N (F) = N ( A + A) = N ( A) R ( AA + ) = R (G ) A N (G c) = N ( AA + ) = N ( A c). If we want to give up the orthogonality conditions, in case of a quadratic matrix A = GF , we could take advantage of the projections A r A = AA r we could postulate R ( A p A) = R ( AA r ) = R (G ) , N ( A cA r ) = N ( A r A) = N (F) . In consequence, if FG is a nonsingular matrix, we enjoy the representation A r := G (FG ) 1 F , which reduces in case that A is a symmetric matrix to the pseudoinverse A + . Dual methods of computing g-inverses A are based on the basis of the null space, both for F and G, or for A and A c . On the first side we need the matrix EF by FEcF = 0, rkEF = m r versus G cEG c = 0, rkEG c = n r on the other side. The enlarged matrix of the order (n + r r ) × (n + m r ) is automatically nonsingular and has the Cayley inverse
518
Appendix A: Matrix Algebra
ªA «E ¬ F
1
EG c º ª A+ =« + » 0 ¼ ¬ EG c
EF+ º » 0¼
with the pseudoinverse A + on the upper left side. Details can be derived from A. Ben – Israel and T. N. E. Greville (1974 p. 228). If the null spaces are always normalized in the sense of < EF | EcF >= I m r , < EcG c | EG c >= I n r because of + F
E = EcF < EF | EcF > 1 = EcF and E
+ Gc
=< EcG c | EG c > 1 EcG c = EcG c
ªA «E ¬ F
1
EG c º ªA+ = « 0 »¼ ¬ EcF
EG c º » . 0 ¼
These formulas gain a special structure if the matrix A is symmetric to the order O( A) . In this case EG c = EcF =: Ec , O(E) = (m r ) × m , rk E = m r and 1
ª A+ E c < E | Ec > 1 º ª A Ec º » «E 0 » = « 1 0 ¬ ¼ ¬ < E | Ec > E ¼ on the basis of such a relation, namely EA + = 0 there follows I m = AA + + Ec < E | Ec > 1 E = = ( A + EcE)[ A + + Ec(EEcEEc) 1 E] and with the projection (S - transformation) A + A = I m Ec < E | Ec > 1 E = ( A + EcE) 1 A and A + = ( A + EcE) 1 Ec(EEcEEc) 1 E pseudoinverse of A R ( A + A) = R ( AA + ) = R ( A) A N ( A) = R (Ec) . In case of a symmetric, reflexive g-inverse A rs there holds the orthogonality or complementary
519
A6 Generalized Inverses
R ( A rs A) A N ( AA rs ) N ( AA rs ) complementary to R ( AA rs ) , which is guaranteed by a matrix K , rk K = m r , O(K ) = (m r ) × m such that KEc is a non-singular matrix. At the same time, we take advantage of the bordering of the matrix A by K and K c , by a non-singular matrix of the order (2m r ) × (2m r ) . 1
ª A rs K R º ª A K cº = « ». «K 0 » ¬ ¼ ¬ (K R ) c 0 ¼ K R := Ec(KEc) 1 is the right inverse of A . Obviously, we gain the symmetric reflexive g-inverse A rs whose columns are orthogonal to K c : R( A rs A) A R(K c) = N ( AA rs ) KA rs = 0
I m = AA + K c(EK c) 1 E = rs
= ( A + K cK )[ A rs + Ec(EK cEK c) 1 E] and projection (S - transformation) A A = I m Ec(KEc) 1 K = ( A + K cK ) 1 A c , rs
A rs = ( A + K cK ) 1 Ec(EK cEK c) 1 E. symmetric reflexive g-inverse For the special case of a symmetric and positive semidefinite m × m matrix A the matrix set U and V are reduced to one. Based on the various matrix decompositions ª- 0 º ª U1c º A = [ U1 , U 2 ] « » « » = U1AU1c , ¬ 0 0 ¼ ¬ U c2 ¼ we find the different g - inverses listed as following. ª-1 A = [ U1 , U 2 ] « ¬ L 21
L12 º ª U1c º »« ». L 21-L12 ¼ ¬ U c2 ¼
Lemma (g-inverses of symmetric and positive semidefinite matrices): (i)
ª-1 A = [ U1 , U 2 ] « ¬ L 21
L12 º ª U1c º » « », L 22 ¼ ¬ U c2 ¼
520
Appendix A: Matrix Algebra
(ii) reflexive g-inverse L12 º ª U1c º »« » L 21-L12 ¼ ¬ U c2 ¼
ª-1 A r = [ U1 , U 2 ] « ¬ L 21
(iii) reflexive and symmetric g-inverse ª-1 L12 º ª U1c º A rs = [ U1 , U 2 ] « »« » ¬ L12 L12 -L12 ¼ ¬ U c2 ¼ (iv) pseudoinverse ª-1 A + = [ U1 , U 2 ] « ¬ 0
0 º ª U1c º 1 » « c » = U1- U1 . U 0¼ ¬ 2 ¼
We look at a representation of the Moore-Penrose inverse in terms of U 2 , the basis of the null space N ( A A) . In these terms we find E := U1
ªA ¬« U c2
1
+ U2 º = ª« A 0 ¼» ¬ U c2
U2 º , 0 »¼
by means of the fundamental relation of A + A A + A = lim( A + G I m ) 1 A = AA + = I m U 2 U c2 = U1U1c , G o0
we generate the fundamental relation of the pseudo inverse A + = ( A + U 2 U c2 ) 1 U 2 U c2 . The main target of our discussion of various g-inverses is the easy handling of representations of solutions of arbitrary linear equations and their characterizations. We depart from the solution of a consistent system of linear equations, Ax = c, O( A) = n × m,
c R ( A) x = A c
for any g-inverse A .
x = A c is the general solution of such a linear system of equations. If we want to generate a special g - inverse, we can represent the general solution by x = A c + (I m A A ) z
for all z R m ,
since the subspaces N ( A) and R (I m A A ) are identical. We test the consistency of our system by means of the identity AA c = c . c is mapped by the projection AA to itself. Similary we solve the matrix equation AXB = C by the consistency test: the existence of the solution is granted by the identity
521
A6 Generalized Inverses
AA CB B = C for any g-inverse A and B . If this condition is fulfilled, we are able to generate the general solution by X = A CB + Z A AZBB , where Z is an arbitrary matrix of suitable order. We can use an arbitrary ginverse A and B , for instance the pseudoinverse A + and B + which would be for Z = 0 coincide with two-sided orthogonal projections. How can we reduce the matrix equation AXB = C to a vector equation? The vec-operator is the door opener. AXB = C
(Bc
A) vec X = vec C .
The general solution of our matrix equation reads vec X = (Bc
A) vec C + [I (Bc
A) (Bc
A)] vec Z . Here we can use the identity ( A
B) = B
A , generated by two g-inverses of the Kronecker-Zehfuss product. At this end we solve the more general equation Ax = By of consistent type R ( A) R (B) by Lemma (consistent system of homogenous equations Ax = By ): Given the homogenous system of linear equations Ax = By for y R A constraint by By R ( A ) . Then the solution x = Ly can be given under the condition R ( A ) R (B ) . In this case the matrix L may be decomposed by L = A B for a certain g-inverse A .
Appendix B: Matrix Analysis A short version on matrix analysis is presented. Arbitrary derivations of scalarvalued, vector-valued and matrix-valued vector – and matrix functions for functionally independent variables are defined. Extensions for differenting symmetric and antisymmetric matrices are given. Special examples for functionally dependent matrix variables are reviewed.
B1 Derivatives of Scalar valued and Vector valued Vector Functions Here we present the analysis of differentiating scalar-valued and vector-valued vector functions enriched by examples. Definition: (derivative of scalar valued vector function): Let a scalar valued function f (x) of a vector x of the order O(x) = 1× m (row vector) be given, then we call Df (x) = [D1 f (x),… , Dm f ( x)] :=
wf wxc
first derivative of f (x) with respect to xc . Vector differentiation is based on the following definition. Definition: (derivative of a matrix valued matrix function): Let a n × q matrix-valued function F(X) of a m × p matrix of functional independent variables X be given. Then the nq × mp Jacobi matrix of first derivates of F is defined by J F = DF(X) :=
wvecF(X) . w (vecX)c
The definition of first derivatives of matrix-functions can be motivated as following. The matrices F = [ f ij ] R n × q and X = [ xk A ] R m × p are based on twodimensional arrays. In contrast, the array of first derivatives ª wf ij º n× q× m× p « » = ª¬ J ijk A º¼ R w x ¬ kA ¼ is four-dimensional and automatic outside the usual frame of matrix algebra of two-dimensional arrays. By means of the operations vecF and vecX we will vectorize the matrices F and X. Accordingly we will take advantage of vecF(X) of the vector vecX derived with respect to the matrix J F , a two-dimensional array.
B2 Derivatives of Trace Forms
523
Examples (i) f (x) = xcAx = a11 x12 + (a12 + a21 ) x1 x2 + a22 x22 wf = wxc = [2a11 x1 + (a12 + a21 ) x2 | (a12 + a21 ) x1 + 2a22 x2 ] = xc( A + Ac) Df (x) = [D1 f (x), D 2 f (x)] =
ªa x + a x º (ii) f ( x) = Ax = « 11 1 12 2 » ¬ a21 x1 + a22 x2 ¼ J F = Df (x) =
wf ª a11 =« wxc ¬ a21
ª x2 + x x (iii) F(X) = X 2 = « 11 12 21 ¬ x21 x11 + x22 x21
a12 º =A a22 »¼
x11 x12 + x12 x22 º 2 » x21 x12 + x22 ¼
ª x112 + x12 x21 º « » x x +x x vecF(X) = « 21 11 22 21 » «x x + x x » « 11 12 12 2 22 » ¬« x21 x12 + x22 ¼» (vecX)c = [ x11 , x21 , x12 , x22 ] ª 2 x11 « wvecF(X) « x21 J F = DF(X) = = w (vecX)c « x12 « ¬ 0
x12 x11 + x22 0 x12
x21 0 x11 + x22 x21
0 º x21 »» x12 » » 2 x22 ¼
O(J F ) = 4 × 4 .
B2 Derivatives of Trace Forms Up to now we have assumed that the vector x or the matrix X are functionally idempotent. For instance, the matrix X cannot be a symmetric matrix X = [ xij ] = [ x ji ] = Xc or an antisymmetric matrix X = [ xij ] = [ x ji ] = Xc . In case of a functional dependent variables, for instance xij = x ji or xij = x ji we can take advantage of the chain rule in order to derive the differential procedure. ª A c, if X consists of functional independent elements; w « tr( AX) = « Ac + A - Diag[a11 ,… , ann ], if the n × n matrix X is symmetric; wX «¬ A c A, if the n × n matrix X is antisymmetric.
524
Appendix B: Matrix Analysis
ª[vecAc]c, if X consists of functional independent elements; «[vec(Ac + A - Diag[a ,…, a ])]c, if the n × n matrix X is w 11 nn tr( AX) = « « symmetric; w(vecX) « ¬[vec(Ac A)]c, if the n × n matrix X is antisymmetric. for instance ªa A = « 11 ¬ a21
a12 º ªx , X = « 11 » a22 ¼ ¬ x21
x12 º . x22 »¼
Case # 1: “the matrix X consists of functional independent elements” ª w « wx w = « 11 wX « w « wx ¬ 21
w º wx12 » », w » wx22 »¼
ªa w tr( AX) = « 11 wX ¬ a12
a21 º = Ac. a22 »¼
Case # 2: “the n × n matrix X is symmetric : X = Xc “ x12 = x21 tr( AX ) = a11 x11 + ( a12 + a21 ) x21 + a22 x22 ª w « wx w = « 11 wX « w « wx ¬ 21 ª a11 w tr( AX) = « wX ¬ a12 + a21
dx21 w º ª w dx12 wx21 » « wx11 »=« w » « w wx22 »¼ «¬ wx21
w º wx21 » » w » wx22 »¼
a12 + a21 º = A c + A Diag(a11 ,… , ann ) . a22 »¼
Case # 3: “the n × n matrix X is antisymmetric : X = X c ” x11 = x22 = 0, x12 = x21 tr( AX) = (a12 a21 ) x21 ª w « wx w = « 11 wX « w « ¬ wx21
dx21 w º ª w dx12 wx21 » « wx11 »=« » « w w » « wx22 ¼ ¬ wx21
ª 0 w tr( AX) = « wX ¬ a12 a21
w º wx21 » » w » wx22 »¼
a12 + a21 º » = Ac A . 0 ¼
525
B2 Derivations of Trace Forms
Let us now assume that the matrix X of variables xij is always consisting of functionally independent elements. We note some useful identities of first derivatives. Scalar valued functions of vectors w (acx) = ac w xc
(B1)
w (xcAx) = Xc( A + Ac). w xc
(B2)
Scalar-valued function of a matrix: trace w tr(AX) = Ac ; wX
(B3)
especially: w acXb w tr(bacX) = = b c
ac ; w (vecX)c w (vecX)c w tr(XcAX) = ( A + A c) X ; wX
(B4)
especially: w tr(XcX) = 2(vecX)c . w (vecX)c w tr(XAX) = XcA c + A cXc , wX
(B5)
especially: w trX 2 = 2(vecXc)c . w (vecX)c w tr(AX 1 ) = ( X 1AX 1 ), if X is nonsingular, wX especially: 1
w tr(X ) = [vec(X 2 )c]c ; w (vecX)c w acX 1b w tr(bacX 1 ) = = bc( X 1 )c
acX 1 . c c w (vecX) w (vecX)
(B6)
526
Appendix B: Matrix Analysis
w trXD = D ( Xc)D 1 , if X is quadratic ; wX
(B7)
especially: w trX = (vecI)c . w (vecX)c
B3 Derivatives of Determinantal Forms The scalarvalued forms of matrix determinantal form will be listed now. w | AXBc |= A c(adjAXBc)cB =| AXBc | A c(BXcA c) 1 B, wX if AXBc is nonsingular ;
(B8)
especially: wacxb = bc
ac, where adj(acXb)=1 . w (vecX)c w | AXBXcC |= C(adjAXBXcC) AXB + Ac(adjAXBXcC)cCXBc ; wX
(B9)
especially: w | XBXc |= (adjXBXc)XB + (adjXB cXc) XB c ; wX w | XSXc | = 2(vecX)c(S
adjXSXc), if S is symmetric; w (vecX)c w | XXc | = 2(vecX)c(I
adjXXc) . w (vecX)c w | AXcBXC |= BXC(adjAXcBXC) A + BcXA c(adjAXcBXC)cCc ; wX especially: w | XcBX |= BX(adjXcBX) + BcX(adjXcBcX) ; wX w | XcSX | = 2(vecX)c(adjXcSX
S), if S is symmetric; w (vecX)c w | XcX | = 2(vecX)c(adjXcX
I ) . w (vecX)c
(B10)
B4 Derivatives of a Vector/Matrix Function of a Vector/Matrix w | AXBXC |= BcXcA c(adjAXBXC)cCc + A c(adjAXBXC)cCcXcBc ; wX w | XBX |= BcXc(adjXBX)c + (adjXBX)cXB c ; wX
527 (B11)
especially: 2
w|X | = (vec[Xcadj(X 2 )c + adj(X 2 )cXc])c = w (vecX)c =| X |2 (vec[X c(X c) 2 + (X c) 2 X c])c = 2 | X |2 [vec(X 1 ) c]c, if X is non-singular . w | XD |= D | X |D ( X 1 )c, D N if X is non-singular , wX
(B12)
w|X| =| X | (X 1 )c if X is non-singular; wX especially: w|X| = [vec(adjXc)]c. w (vecX)c
B4 Derivatives of a Vector/Matrix Function of a Vector/Matrix If we differentiate the vector or matrix valued function of a vector or matrix, we will find the results of type (B13) – (B20). vector-valued function of a vector or a matrix w AX = A wxc
(B13)
w w (ac
A)vecX AXa = = ac
A w (vecX)c w (vecX)c
(B14)
matrix valued function of a matrix w (vecX) = I mp for all X R m × p w (vecX)c
(B15)
w (vecXc) = K m p for all X R m × p w (vecX)c
(B16)
where K m p is a suitable commutation matrix w (vecXX c) = ( I m +K m m )(X
I m ) for all X R m × p , w (vecX )c 2
where the matrix I m +K mm is symmetric and idempotent, 2
528
Appendix B: Matrix Analysis
w (vecXcX) = (I p +K p p )(I p
Xc) for all X R m × p w (vecX)c 2
w (vecX 1 ) = ( X 1 )c if X is non-singular w (vecX)c w (vecXD ) D = ¦ (Xc)D -j
X j 1 for all D N , if X is a square matrix. w (vecX)c j =1
B5 Derivatives of the Kronecker – Zehfuss product Act a matrix-valued function of two matrices X and Y as variables be given. In particular, we assume the function F(X, Y) = X
Y for all X R m × p , Y R n × q as the Kronecker – Zehfuss product of variables X and Y well defined. Then the identities of the first differential and the first derivative follow: dF(X, Y) = (dX)
Y + X
dY, dvecF(X, Y) = vec( dX
Y) + vec(X
dY), vec( dX
Y) = (I p
K qm
I n ) (vecdX
vecY) = = (I p
K qm
I n ) (I mp
vecY) d (vecX) = = (I p
[K qm
I n ) (I m
vecY)]) d (vecX), vec(X
dY) = (I p
K qm
I n ) (vecX
vecdY) = = (I p
K qm
I n ) (vecX
I nq ) d (vecY) = = ([( I p
K qm ) (vecX
I q )]
I n ) d (vecY), w vec(X
Y) = I p
[(K qm
I n) (I m
vecY)] , w (vecX)c w vec(X
Y) = (I p
K qm ) (vecX
I q )]
I n . w (vecY)c
B6 Matrix-valued Derivatives of Symmetric or Antisymmetric Matrix Functions Many matrix functions f ( X) or F(X) force us to pay attention to dependencies within the variables. As examples we treat here first derivatives of symmetric or antisymmetric matrix functions of X.
B6 Matrix-valued Derivatives of Symmetric or Antisymmetric Matrix Functions
Definition: (derivative of a matrix-valued symmetric matrix function): Let F(X) be an n × q matrix-valued function of an m × m symmetric matrix X = X c . The nq × m( m + 1) / 2 Jacobi matrix of first derivates of F is defined by wvecF(X) . w (vechX )c
J Fs = DF(X = X c) :=
Definition: (derivative of matrix valued antisymmetric matrix function): Let F(X) be an n × q matrix-valued function of an m × m antisymmetric matrix X = X c . The nq × m( m 1) / 2 Jacobi matrix of first derivates of F is defined by J aF = DF(X = X c) :=
wvecF(X) . w (veckX )c
Examples (i) Given is a scalar-valued matrixfunction tr(AX ) of a symmetric variable matrix X = X c , for instance a A = ª« 11 a ¬ 21
a12 º x , X = ª« 11 a22 »¼ x ¬ 21
ª x11 º x12 º , vechX = «« x22 »» x22 »¼ «¬ x33 »¼
tr(AX ) = a11 x11 + (a12 + a 21 )x 21 + a22 x22 w w w w =[ , , ] w (vechX )c wx11 wx21 wx22 w tr(AX) = [a11 , a12 + a21 , a22 ] w (vechX)c w tr(AX) w tr(AX) = [vech(A c + A Diag[a11 ,… , ann ])]c=[vech ]c. c w (vechX) wX (ii) Given is scalar-valued matrix function tr(AX) of an antisymmetric variable matrix X = Xc , for instance a A = ª« 11 a ¬ 21
a12 º 0 , X = ª« a22 »¼ x ¬ 21
x21 º , veckX = x21 , 0 »¼
tr(AX) = (a12 a 21 )x 21
529
530
Appendix B: Matrix Analysis
w w w tr(AX) = , = a12 a21 , w (veckX)c wx21 w (veckX)c w tr(AX) w tr(AX) = [veck(A c A)]c=[veck ]c . w (veckX)c wX
B7 Higher order derivatives Up to now we computed only first derivatives of scalar-valued, vector-valued and matrix-valued functions. Second derivatives is our target now which will be needed for the classification of optimization problems of type minimum or maximum. Definition: (second derivatives of a scalar valued vector function): Let f (x) a scalar-valued function of the n × 1 vector x . Then the m × m matrix w2 f DDcf (x) = D( Df ( x))c := wxwxc denotes the second derivatives of f ( x ) to x and xc . Correspondingly w w D2 f (x) :=
f (x) = (vecDDc) f ( x) wxc wx denotes the 1 × m 2 vector of second derivatives. and Definition: (second derivative of a vector valued vector function): Let f (x) be an n × 1 vector-valued function of the m × 1 vector x . Then the n × m 2 matrix of second derivatives H f = D2f (x) = D(Df (x)) =:
w w w 2 f ( x)
f ( x) = wxc wx wxcwx
is the Hesse matrix of the function f (x) . and Definition: (second derivatives of a matrix valued matrix function): Let F(X) be an n × q matrix valued function of an m × p matrix of functional independent variables X . The nq × m 2 p 2 Hesse matrix of second derivatives of F is defined by H F = D2 F(X) = D(DF(X)):=
w w w 2 vecF(X)
vecF(X) = . w (vecX)c w (vecX)c w (vecX)c
w(vecX)c
531
B7 Higher order derivatives
The definition of second derivatives of matrix functions can be motivated as follows. The matrices F = [ f ij ] R n×q and X = [ xk A ] R m × p are the elements of a two-dimensional array. In contrast, the array of second derivatives [
w 2 f ij wxk A wx pq
] = [kijk Apq ] R n × q × m × p × m × p
is six-dimensional and beyond the common matrix algebra of two-dimensional arrays. The following operations map a six-dimensional array of second derivatives to a two-dimensional array. (i) vecF(X) is the vectorized form of the matrix valued function (ii) vecX is the vectorized form of the variable matrix w w
w (vecX )c w (vecX )c vectorizes the matrix of second derivatives
(iii) the Kronecker – Zehfuss product
(iv) the formal product of the 1 × m 2 p 2 row vector of second derivatives with the nq ×1 column vector vecF(X) leads to an nq × m 2 p 2 Hesse matrix of second derivatives. Again we assume the vector of variables x and the matrix of variables X consists of functional independent elements. If this is not the case we according to the chain rule must apply an alternative differential calculus similary to the first deri-vative, case studies of symmetric and antisymmetric variable matrices. Examples: (i) f (x) = xcAx = a11 x12 + (a12 + a21 ) x1 x2 + a22 x22 Df (x) =
wf = [2a11 x1 + (a12 + a21 ) x2 | (a12 + a21 ) x1 + 2a22 x2 ] wxc
D2 f (x) = D(Df (x))c =
(ii)
ª 2a11 w2 f =« wxwxc ¬ a12 + a21
a12 + a21 º = A + Ac 2a22 »¼
ªa x + a x º f (x) = Ax = « 11 1 12 2 » ¬ a21 x1 + a22 x2 ¼ Df (x) =
DDcf (x) =
wf ª a11 =« wxc ¬ a21
a12 º =A a22 »¼
ª0 0º w 2f =« , O(DDcf (x)) = 2 × 2 c wxwx ¬0 0 »¼
D2 f (x) = [0 0 0 0], O(D2 f (x)) = 1 × 4
532
Appendix B: Matrix Analysis
(iii)
ª x2 + x x F(X) = X 2 = « 11 12 21 ¬ x21 x11 + x22 x21
x11 x12 + x12 x22 º 2 » x21 x12 + x22 ¼
ª x112 + x12 x21 º « » x21 x11 + x22 x21 » « vecF(X) = , O (F ) = O ( X) = 2 × 2 « x11 x12 + x12 x22 » « » 2 «¬ x21 x12 + x22 »¼ (vecX)c = [ x11 , x21 , x12 , x22 ] ª 2 x11 « w vecF(X) « x21 = JF = w (vecX)c « x12 « «¬ 0
x12 x11 + x22
x21 0
0 x12
x11 + x22 x21
0 º x21 »» x12 » » 2 x22 »¼
O(J F ) = 4 × 4 HF =
w w w w w w
vecF(X) = [ , , , ]
JF = w (vecX)c w (vecX)c wx11 wx21 wx12 wx22
ª2 « 0 =« «0 « ¬« 0
0 1 0 0
0 0 1 0
0 0 0 0
0 1 0 0
0 0 0 0
1 0 0 1
0 1 0 0
0 0 1 0
1 0 0 1
0 0 0 0
0 0 1 0
0 0 0 0
0 1 0 0
0 0 1 0
0º » 0» 0» » 2 ¼»
O(H F ) = 4 × 16 . At the end, we want to define the derivative of order l of a matrix-valued matrix function whose structure is derived from the postulate of a suitable array. Definition ( l-th derivative of a matrix-valued matrix function): Let F(X) be an n × q matrix valued function of an m × p matrix of functional independent variables X. The nq × ml p l matrix of l-th derivative is defined by Dl F(X) := =
w w
…
vecF(X) = w (vecX)c l -times w (vecX)c
wl vecF(X) for all l N . w (vecX)c
…
(vecX)c l -times
Appendix C: Lagrange Multipliers ?How can we find extrema with side conditions? We generate solutions of such external problems first on the basis of algebraic manipulations, namely by the lemma of implicit functions, and secondly by a geometric tool box, by means of interpreting a risk function and side conditions as level surfaces (specific normal images, Lagrange multipliers).
C1 A first way to solve the problem A first way to find extreme with side conditions will be based on a risk function f ( x1 ,..., xm ) = extr
(C1)
with unknowns ( x1 ,..., xm ) \ m , which are restricted by side conditions of type
[ F1 ( x1 ,..., xm ), F2 ( x1 ,..., xm ),..., Fr ( x1 ,..., xm ) ]c = 0 rk(
wFi ) = r < m. wxm
(C2) (C3)
The side conditions Fi ( x j ) (i = 1,..., r , j = 1,..., m) are reduced by the lemma of the implicit function: solve for xm r +1 = G1 ( x1 ,..., xm r ) xm r +2 = G2 ( x1 ,..., xmr ) ... xm1 = Gr 1 ( x1 ,..., xm r )
(C4)
xm = Gr ( x1 ,..., xmr ) and replace the result within the risk function f ( x1 , x2 ,..., xm r , G1 ( x1 ,..., xm r ),..., Gr ( x1 ,..., xm1 )) = extr .
(C5)
The “free” unknowns ( x1 , x2 ,..., xm r 1 , xm r ) \ m r can be found by taking the result of the implicit function theorem as follows. Lemma C1 (“implicit function theorem”): Let ȍ be an open set of \ m = \ m r × \ r and F : ȍ o \ r with vectors x1 \ m r and x 2 \ m r . The maps
534
Appendix C: Lagrange Multipliers
ª F1 ( x1 ,..., xm r ; xm r +1 ,..., xm ) º « F2 ( x1 ,..., xm r ; xm r +1 ,..., xm ) » « » (x1 , x 2 ) 6 F(x1 , x 2 ) = « (C6) ... » « Fr 1 ( x1 ,..., xm r ; xm r +1 ,..., xm ) » «¬ Fr ( x1 ,..., xm r ; xm r +1 ,..., xm ) »¼ transform a continuously differential function with F(x1 , x 2 ) = 0 . In case of a Jacobi determinant j not zero or a Jacobi matrix J of rank r, or w ( F1 ,..., Fr ) (C7) , w ( xm r +1 ,..., xm ) there exists a surrounding U := U(x1 ) \ m r and V := UG (x 2 ) \ r such that the equation F (x1 , x 2 ) = 0 for any x1 U in V c has only one solution j := det J z 0 or rk J = r , J :=
ª xm r +1 º ª G1 ( x1 ,..., xm r ) º « xm r + 2 » « G 2 ( x1 ,..., xm r ) » « » « » x 2 = G (x1 ) or « ... » = « ... ». x G ( x ,..., x ) « m 1 » « r 1 1 mr » «¬ xm »¼ «¬ G r ( x1 ,..., xm r ) »¼
(C8)
The function G : U o V is continuously differentiable. A sample reference is any literature treating analysis, e.g. C. Blotter . Lemma C1 is based on the Implicit Function Theorem whose result we insert within the risk function (C1) in order to gain (C5) in the free variables ( x1 , ..., xm r ) \ m r . Our example C1 explains the solution technique for finding extreme with side conditions within our first approach. Lemma C1 illustrates that there exists a local inverse of the side conditions towards r unknowns ( xm r +1 , xm r + 2 ,..., xm 1 , xm ) \ r which in the case of nonlinear side conditions towards r unknowns ( xm r +1 , xm r + 2 ,..., xm 1 , xm ) \ r which in case of nonlinear side conditions is not necessary unique. :Example C1: Search for the global extremum of the function f ( x1 , x2 , x3 ) = f ( x, y , z ) = x y z subject to the side conditions ª F1 ( x1 , x2 , x3 ) = Z ( x, y, z ) := x 2 + 2 y 2 1 = 0 « F ( x , x , x ) = E ( x, y , z ) := 3x 4 z = 0 ¬ 2 1 2 3
(elliptic cylinder) (plane)
C1 A first way to solve the problem
J=(
535
wFi ª2 x 4 y 0 º )=« , rk J ( x z 0 oder y z 0) = r = 2 wx j ¬ 3 0 4 »¼
1 ª 2 «1 y = + 2 2 1 x F1 ( x1 , x2 , x3 ) = Z ( x, y, z ) = 0 « «2 y = 1 2 1 x2 «¬ 2 3 F2 ( x1 , x2 , x3 ) = E ( x, y, z ) = 0 z = x 4 1 3 2 1 x2 , ) = 1 f ( x1 , x2 , x3 ) = 1 f ( x, y , z ) = f ( x, + 2 4 x 1 = 2 1 x2 4 2 1 3 2 1 x2 , ) 2 f ( x1 , x2 , x3 ) = 2 f ( x, y , z ) = f ( x, 2 4 x 1 = + 2 1 x2 4 2 x 1 1 1 c + = 0 1 x = 2 1 f ( x) = 0 4 2 3 1 x2 x 1 1 1 + = 0 2 x = + 2 2 4 2 3 1 x 1 3 1 3 (minimum), 2 f ( ) = + ( maximum). 1 f ( ) = 3 4 3 4 2
f ( x )c = 0
At the position x = 1/ 3, y = 2 / 3, z = 1/ 4 we find a global minimum, but at the position x = +1/ 3, y = 2 / 3, z = 1/ 4 a global maximum. An alternative path to find extreme with side conditions is based on the geometric interpretation of risk function and side conditions. First, we form the conditions F1 ( x1 ,… , xm ) = 0 º wFi F2 ( x1 ,… , xm ) = 0 » )=r » rk( … wx j » Fr ( x1 ,… , xm ) = 0 »¼ by continuously differentiable real functions on an open set ȍ \ m . Then we define r equations Fi ( x1 ,..., xm ) = 0 for all i = 1,..., r with the rank conditions rk(wFi / wx j ) = r , geometrically an (m-1) dimensional surface M F ȍ which can be seen as a level surface. See as an example our Example C1 which describe as side conditions
536
Appendix C: Lagrange Multipliers
F1 ( x1 , x2 , x3 ) = Z ( x, y , z ) = x 2 + 2 y 2 1 = 0 F2 ( x1 , x2 , x3 ) = E ( x, y , z ) = 3x 4 z = 0 representing an elliptical cylinder and a plane. In this case is the (m-r) dimensional surface M F the intersection manifold of the elliptic cylinder and of the plane as the m-r =1 dimensional manifold in \ 3 , namely as “spatial curve”. Secondly, the risk function f ( x1 ,..., xm ) = extr generates an (m-1) dimensional surface M f which is a special level surface. The level parameter of the (m-1) dimensional surface M f should be external. In our Example C1 one risk function can be interpreted as the plane f ( x1 , x2 , x3 ) = f ( x, y , z ) = x y z . We summarize our result within Lemma C2. Lemma C2 (extrema with side conditions) The side conditions Fi ( x1 ,… , xm ) = 0 for all i {1,… , r} are built on continuously differentiable functions on an open set ȍ \ m which are subject to the side conditions rk(wFi / wx j ) = r generating an (m-r) dimensional level surface M f . The function f ( x1 ,… , xm ) produces certain constants, namely an (m-1) dimensional level surface M f . f ( x1 ,… , xm ) is geometrically as a point p M F conditionally extremal (stationary) if and only if the (m-1) dimensional level surface M f is in contact to the (m-r) dimensional level surface in p. That is there exist numbers O1 ,… , Or , the Lagrange multipliers, by r
grad f ( p ) = ¦ i =1 Oi grad Fi ( p ). The unnormalized surface normal vector grad f ( p ) of the (m-1) dimensional level surface M f in the normal space `M F of the level surface M F is in the unnormalized surface normal vector grad Fi ( p ) in the point p . To this equation belongs the variational problem
L ( x1 ,… , xm ; O1 ,… , Or ) = r
f ( x1 ,… , xm ) ¦ i =1 Oi Fi ( x1 ,… , xm ) = extr . :proof: First, the side conditions Fi ( x j ) = 0, rk(wFi / wx j ) = r for all i = 1,… , r ; j = 1,… , m generate an (m-r) dimensional level surface M F whose normal vectors ni ( p ) := grad Fi ( p ) ` p M F
(i = 1,… , r )
span the r dimensional normal space `M of the level surface M F ȍ . The r dimensional normal space ` p M F of the (m-r) dimensional level surface M F
537
C1 A first way to solve the problem
is orthogonal complement Tp M p to the tangent space Tp M F \ m1 of M F in the point p spanned by the m-r dimensional tangent vectors t k ( p ) :=
wx wx k
Tp M F
(k = 1,..., m r ).
x= p
:Example C2: Let the m r = 2 dimensional level surface M F of the sphere S r2 \ 3 of radius r (“level parameter r 2 ”) be given by the side condition F ( x1 , x2 , x3 ) = x12 + x2 2 + x32 r 2 = 0. :Normal space: ª 2 x1 º wF wF wF + e2 + e3 = [e1 , e 2 , e 3 ] « 2 x2 » . «2 x » wx1 wx2 wx3 ¬ 3¼p 3 The orthogonal vectors [e1 , e 2 , e 3 ] span \ . The normal space will be generated locally by a normal vector n( p ) = grad F ( p ). n( p ) = grad F ( p ) = e1
:Tangent space: The implicit representation is the characteristic element of the level surface. In order to gain an explicit representation, we take advantage of the Implicit Function Theorem according to the following equations. F ( x1 , x2 , x3 ) = 0 º » wF rk( ) = r = 1 » x3 = G ( x1 , x2 ) wx j »¼ x12 + x22 + x32 r = 0 and (
wF wF ) = [2 x1 + 2 x2 + 2 x3 ], rk( ) =1 wx j wx j
x j = G ( x1 , x2 ) = + r 2 ( x12 + x2 2 ) . The negation root leads into another domain of the sphere: here holds the do2 2 main 0 < x1 < r , 0 < x2 < r , r 2 ( x1 + x2 ) > 0. The spherical position vector x( p ) allows the representation x( p ) = e1 x1 + e 2 x2 + e 3 r 2 ( x12 + x22 ) , which is the basis to produce
538
Appendix C: Lagrange Multipliers
ª ª « « x1 wx « t1 ( p ) = ( p ) = e1 e3 = [e1 , e 2 , e3 ] « 2 2 2 wx2 « « r ( x1 + x2 ) « «¬ « ª « « x2 wx « = [e1 , e 2 , e3 ] « «t1 ( p ) = wx ( p ) = e 2 e3 2 2 2 « r ( x1 + x2 ) 2 « «¬ «¬
1 0 x1 r 2 ( x12 + x2 2
0 1 x2 r 2 ( x12 + x2 2
º » » » )» ¼ º » », » )» ¼
which span the tangent space Tp M F = \ 2 at the point p. :The general case: In the general case of an ( m r ) dimensional level surface M F , implicitly produced by r side conditions of type F1 ( x1 ,..., xm ) = 0 º F2 ( x1 ,..., xm ) = 0 » wFi » ... » rk ( wx ) = r , j Fr j ( x1 ,..., xm ) = 0 » » Fr ( x1 ,..., xm ) = 0 ¼ the explicit surface representation, produced by the Implicit Function Theorem, reads x ( p ) = e1 x1 + e 2 x2 + ... + e m r xm r + e m r +1G1 ( x1 ,..., xmr ) + ... + e mGr ( x1 ,..., xmr ). The orthogonal vectors [e1 ,..., e m ] span \ m . Secondly, the at least once conditional differentiable risk function f ( x1 ,..., xm ) for special constants describes an (m 1) dimensional level surface M F whose normal vector n f := grad f ( p ) ` p M f spans an one-dimensional normal space ` p M f of the level surface M f ȍ in the point p . The level parameter of the level surface is chosen in the extremal case that it touches the level surface M f the other level surface M F in the point p . That means that the normal vector n f ( p ) in the point p is an element of the normal space ` p M f . Or we may say the normal vector grad f ( p ) is a linear combination of the normal vectors grad Fi ( p ) in the point p, r
grad f ( p ) = ¦ i =1 Oi grad Fi ( p ) for all i = 1,..., r , where the Lagrange multipliers Oi are the coordinates of the vector grad f ( p ) in the basis grad Fi ( p ).
539
C1 A first way to solve the problem
:Example C3: Let us assume that there will be given the point X \ 3 . Unknown is the point in the m r = 2 dimensional level surface M F of type sphere S r 2 = \ 3 which is from the point X \ 3 at extremal distance, either minimal or maximal. The distance function || X x ||2 for X \ 3 and X S r 2 describes the risk function f ( x1 , x2 , x3 ) = ( X 1 x1 ) 2 + ( X 2 x2 ) 2 + ( X 3 x3 ) 2 = R 2 = extr , x1 , x2 , x3
which represents an m 1 = 2 dimensional level surface M f of type sphere S r 2 \ 3 at the origin ( X 1 , X 2 , X 3 ) and level parameter R 2 . The conditional extremal problem is solved if the sphere S R 2 touches the other sphere S r 2 . This result is expressed in the language of the normal vector. n( p ) := grad f ( p ) = e1
wf wf wf + e2 + e3 = wx1 wx2 wx3
ª 2( X 1 x1 ) º = [e1 , e 2 , e 3 ] « 2( X 2 x2 ) » N p M f « 2( X x ) » 3 3 ¼p ¬ ª 2 x1 º n( p ) := grad F ( p ) = [e1 , e 2 , e 3 ] « 2 x2 » «2 x » ¬ 3¼ is an element of the normal space N p M f . The normal equation grad f ( p ) = O grad F ( p ) leads directly to three equations xi X 0 = O xi xi (1 O ) = X i
(i = 1, 2,3) ,
which are completed by the fourth equation F ( x1 , x2 , x3 ) = x12 + x2 2 + x3 2 r 2 = 0. Lateron we solve the 4 equations. Third, we interpret the differential equations r
grad f ( p ) = ¦ i =1 Oi grad Fi ( p ) by the variational problem, by direct differentiation namely
540
Appendix C: Lagrange Multipliers
L ( x1 ,..., xm ; O1 ,..., Or ) = r
= f ( x1 ,..., xm ) ¦ i =1 Oi Fi ( x1 ,..., xm ) =
extr
x1 ,..., xm ; O1 ,..., Or
wFi wf r ª wL « wx = wx ¦ i =1 Oi wx = 0 ( j = 1,..., m) j j « i « wL (i = 1,..., r ). « wx = Fi ( x j ) = 0 k ¬ :Example C4: We continue our third example by solving the alternative system of equations.
L ( x1 , x2 , x3 ; O ) = ( X 1 x1 ) 2 + ( X 2 x2 ) 2 + ( X 3 x3 ) O ( x12 + x22 + x32 r 2 ) = extr
x1 , x2 , x3 ; O
wL º = 2( X j x j ) 2O x j = 0 » wx j » wL » 2 2 2 2 = x1 + x2 + x3 r = 0 » wO ¼ X X º x1 = 1 ; x2 = 2 » 1 O 1 O » x12 + x22 + x32 r 2 = 0 ¼ X 12 + X 2 2 + X 32 r 2 = 0 (1 O ) 2 r 2 + X 12 + X 2 2 + X 32 = 0 (1 O ) 2 (1 O ) 2 =
X 12 + X 2 2 + X 32 1 X 12 + X 2 2 + X 32 1 O1, 2 = ± r r2
O1, 2 = 1 ±
r ± X 12 + X 2 2 + X 32 1 X 12 + X 2 2 + X 32 = r r rX 1 ( x1 )1, 2 = ± , X 12 + X 2 2 + X 32 ( x2 )1, 2 = ± ( x3 )1, 2 = ±
rX 2 X 12 + X 2 2 + X 32 rX 3 X + X 2 2 + X 32 2 1
, .
The matrix of second derivatives H decides upon whether at the point ( x1 , x2 , x3 , O )1, 2 we enjoy a maximum or minimum.
541
C1 A first way to solve the problem
H=(
w 2L ) = (G jk (1 O )) = (1 O )I 3 wx j xk
H (1 O > 0) > 0 ( minimum) º ( x1 , x2 , x3 ) is the point of minimum »¼
ª H(1 O < 0) < 0 ( maximum) «( x , x , x ) is the point of maximum . ¬ 1 2 3
Our example illustrates how we can find the global optimum under side conditions by means of the technique of Lagrange multipliers. :Example C5: Search for the global extremum of the function f ( x1 , x2 , x3 ) subject to two side conditions F1 ( x1 , x2 , x3 ) and F2 ( x1 , x2 , x3 ) , namely f ( x1 , x2 , x3 ) = f ( x, y , z ) = x y z (plane) ª F1 ( x1 , x2 , x3 ) = Z ( x, y, z ) := x 2 + 2 y 2 1 = 0 « F ( x , x , x ) = E ( x, y , z ) := 3x 4 z = 0 ¬ 2 1 2 3 J=(
(elliptic cylinder) (plane)
wFi ª2 x 4 y 0 º )=« , rk J ( x z 0 oder y z 0) = r = 2 . wx j ¬ 3 0 4 »¼ :Variational Problem:
L ( x1 , x2 , x3 ; O1 , O2 ) = L ( x, y, z; O , P ) = x y z O ( x + 2 y 2 1) P (3 x 4 z ) = 2
extr
x1 , x2 , x3 ; O , P
wL º = 1 2O x 3P = 0 » wx » wL 1 = 1 4O y = 0 O = » wy 4y » » wL 1 » = 1 4 P = 0 O = wz 4 » » wL » = x2 + 2 y 2 1 = 0 wO » » wL = 3 x 4 z = 0. » wP ¼ We multiply the first equation wL / wx by 4y, the second equation wL / wy by (2 x) and the third equation wL / wz by 3 and add ! 4 y 8O xy 12P y + 2 x + 8O xy 3 y + 12P y = y + 2 x = 0 .
542
Appendix C: Lagrange Multipliers
Replace in the cylinder equation (first side condition) Z(x, y, z)= x 2 + 2 y 2 1 = 0 , that is x1,2 = ±1/ 3. From the second condition of the plane (second side condition) E ( x, y, z ) = 3 x 4 z = 0 we gain z1,2 = ±1/ 4. As a result we find x1,2 , z1,2 and finally y1,2 = B 2 / 3. The matrix of second derivatives H decides upon whether at the point O 1,2 = B 3 / 8 we find a maximum or minimum. H=(
ª 2O w 2L )=« 0 wx j xk «¬ 0
0 0º 4O 0 » 0 0 »¼
ºª ª - 34 0 0 º » « H (O2 = 3 ) = « 0 - 3 0 » d 0 »« 8 « 0 02 0 » ¬ ¼ »« » «(maximum) »« 1 2 1 3 1 ( x, y, z; O , P )1 =( 13 ,- 32 , 14 ;- 83 , 14 ) » «( x, y, z; O , P ) 2 =(- 3 , 3 ,- 4 ; 8 , 4 ) is the restricted minmal solution point.»¼ «¬is the restricted maximal solution point. 3 3 ª 4 03 0 º H (O1 = ) = «0 2 0 » t 0 8 «0 0 0» ¬ ¼ (minimum)
The geometric interpretation of the Hesse matrix follows from E. Grafarend and P. Lohle (1991). The matrix of second derivatives H decides upon whether at the point ( x1 , x2 , x3 , O )1, 2 we enjoy a maximum or minimum.
Apendix D: Sampling distributions and their use: confidence intervals and confidence regions D1
A first vehichle: Transformation of random variables
If the probability density function (p.d.f.) of a random vector y = [ y1 ,… , yn ]c is known, but we want to derive the probability density function (p.d.f.) of a random vector x = [ x1 ,… , xn ]c (p.d.f.) which is generated by an injective mapping x =g(y) or xi = g i [ y1 ,… , yn ] for all i {1," , n} we need the results of Lemma D1. Lemma D1 (transformation of p.d.f.): Let the random vector y := [ y1 ,… , yn ]c be transformed into the random vector x = [ x1 ,… , xn ]c by an injective mapping x = g(y) or xi = gi [ y1 ,… , yn ] for all i {1,… , n} which is of continuity class C1 (first derivatives are continuous). Let the Jacobi matrix J x := (wg i / wyi ) be regular ( det J x z 0 ), then the inverse transformation y = g-1(x) or yi = gi1 [ x1 ,… , xn ] is unique. Let f x ( x1 ,… , xn ) be the unknown p.d.f., but f y ( y1 ,… , yn ) the given p.d.f., then f x ( x1 ,… , xn ) = f ( g11 ( x1 ,… , xn ),… , g11 ( x1 ,… , xn )) det J y with respect to the Jacobi matrix J y := [
wyi wg 1 ]=[ i ] wx j wx j
for all i, j {1,… , n} holds. Before we sketch the proof we shall present two examples in order to make you more familiar with the notation. Example D1 (“counter example”): The vector-valued random variable (y1, y2) is transformed into the vector-valued random variable (x1, x2) by means of x1 = y1 + y2 , J x := [
ª wx / wy1 wx ]= « 1 wy c ¬wx2 / wy1
x2 = y12 + y22 wx1 / wy2 º ª 1 = wx2 / wy2 »¼ «¬ 2 y1
1 º 2 y2 »¼
x12 = y12 + 2 y1 y2 + y22 , x2 + 2 y1 y2 = y12 + 2 y1 y2 + y22 x12 = x2 + 2 y1 y2 , y2 = ( x12 x2 ) /(2 y1 ) 1 x12 x2 1 , x1 y1 = y12 + ( x12 x2 ) 2 y1 2 1 y12 x1 y1 + ( x12 x2 ) = 0 2
x1 = y1 + y2 = y1 +
544
Appendix D: Sampling distributions and their use
x2 1 1 y1± = x1 ± 1 ( x12 x2 ) 2 4 2 y2± =
1 x1 B 2
x12 1 2 ( x1 x2 ) . 4 2
At first we have computed the Jacobi matrix J x, secondly we aimed at an inversion of the direct transformation ( y1 , y2 ) 6 ( x1 , x2 ) . As the detailed inversion step proves, namely the solution of a quadratic equation, the mapping x = g(y) is not injective. Example D2: Suppose (x1, x2) is a random variable having p.d.f. ªexp( x1 x2 ), x1 t 0, x2 t 0 f x ( x1 , x2 ) = « , otherwise. ¬0 We require to find the p.d.f. of the random variable (x1+ x2, x2 / x1). The transformation y1 = x1 + x2 , y2 =
x2 x1
has the inverse x1 =
y1 yy , x2 = 1 2 . 1 + y2 1 + y2
The transformation provides a one-to-one mapping between points in the first quadrant of the (x1, x2) - plane Px2 and in the first quadrant of the (y1, y2) - plane Py2 . The absolute value of the Jacobian of the transformation for all points in the first quadrant is wx1 wy1
w ( x1 , x2 ) = wx2 w ( y1 , y2 ) wy1
wx1 wy2 wx2 wy2
=
(1 + y2 ) 1 y2 (1 + y2 ) 1
y1 (1 + y2 ) 2 = y1 (1 + y2 ) 2 [(1 + y2 ) y2 ]
= y1 (1 + y2 ) 3 + y1 y2 (1 + y2 ) 3 = Hence we have found for the p.d.f. of (y1, y2)
y1 . (1 + y2 ) 2
D1
A first vehichle: Transformation of random variables
545
y1 ª exp( y1 ) , y1 > 0, y2 > 0 (1 + y2 ) 2 f y ( y1 , y2 ) = « « «¬0 , otherwise. Incidentally it should be noted that y1 and y2 are independent random variables, namely f y ( y1 , y2 ) = f1 ( y1 ) f ( y2 ) = y1 exp( y1 )(1 + y2 ) 2 .
h
Proof: The probability that the random variables y1 ,… , yn take on values in the region : y is given by
³" ³ f
y
( y1 ,… , yn )dy1 " dyn .
:y
If the random variables of this integral are transformed by the function xi = gi ( y1 ,… , yn ) for all i {1,… , n} which map the region :y onto the regions :x , we receive
³" ³ f
y
( y1 ,… , yn )dy1 " dyn =
:y
³" ³ f
y
( g11 ( x1 ,… , xn ),… , g n1 ( x1 ,… , xn )) det J y dx1 " dxn
:x
from the standard theory of transformation of hypervolume elements, namely dy1 " dyn = | det J y | dx1 " dxn or *(dy1 " dyn ) = | det J y | * (dx1 " dxn ). Here we have taken advantage of the oriented hypervolume element dy1 " dyn (Grassmann product, skew product, wedge product) and the Hodge star operator * applied to the n - differential form dy1 " dyn / n (the exterior algebra / n ). The star * : / p o / n p in R n maps a p - differential form onto a (n-p) - differential form, in general. Here p = n, n – p = 0 applies. Finally we define f x ( x1 ,… , xn ) := f ( g11 ( x1 ,… , xn ),… , g n1 ( x1 ,… , xn )) | det J y | as a function which is certainly non-negative and integrated over :x to one. h
546
Appendix D: Sampling distributions and their use
In applying the transformation theorems of p.d.f. we meet quite often the problem that the function xi = gi ( y1 ,… , yn ) for all i {1,… , n} is given but not the inverse function yi = gi1 ( x1 ,… , xn ) for all i {1,… , n} . Then the following results are helpful. Corollary D2 (Jacobian): If the inverse Jacobian | det J x | = | det(wgi / wy j ) | is given, we are able to compute. | det J y | = | det
wgi1 ( x1 ,… , xn ) | = | det J |1 wx j
= | det
wgi ( y1 ,… , yn ) 1 | . wy j
Example D3 (Jacobian): Let us continue Example D2. The inverse map ª g 1 ( y , y ) º ª x + x º 1 º wy ª 1 =« y = « 11 1 2 » = « 1 2 » » 2 wxc ¬ x2 / x1 1/ x1 ¼ ¬« g 2 ( y1 , y2 ) ¼» ¬ x2 / x1 ¼ jy = | J y | =
wy 1 x x +x = + 22 = 1 2 2 wx c x1 x1 x1
jx = | J x | = j y1 = | J y |1 =
wx x12 x2 = = 1 wy c x1 + x2 y1
allows us to compute the Jacobian Jx from Jy. The direct map ª g ( y , y ) º ª x º ª y /(1 + y2 ) º x=« 1 1 2 »=« 1»=« 1 » «¬ g 2 ( y1 , y2 ) »¼ ¬ x2 ¼ ¬ y1 y2 /(1 + y2 ) ¼ leads us to the final version of the Jacobian. jx = | J x | =
y1 . (1 + y2 ) 2
For the special case that the Jacobi matrix is given in a partitioned form, the results of Corollary D3 are useful. Corollary D3 (Jacobian): If the Jacobi matrix Jx is given in the partitioned form | J x |= ( then
wg i ªU º )=« », wy j ¬V ¼
D2
A second vehicle: Transformation of random variables
547
det J x = | det J x J xc | = det(UU c) det[VV c (VUc)(UU c) 1 UV c] if det(UU c) z 0 det J x = | det J x J xc | = det(VV c) det[UU c UV c(VV c) 1 VU c] , if det(VV c) z 0 | det J y | = | det J x |1 . Proof: The Proof is based upon the determinantal relations of a partitioned matrix of type ªA Uº 1 det « » = det A det(D VA U ) if det A z 0 V D ¬ ¼ ªA Uº 1 det « » = det D det( A UD U) if det D z 0 V D ¬ ¼ ªA Uº det « » = D det A V (adj A )U , ¬ V D¼ which have been introduced by G. Frobenius (1908): Über Matrizen aus positiven Elementen, Sitzungsberichte der Königlich Preussischen Akademie der Wissenschaften von Berlin, 471-476, Berlin 1908 and J. Schur (1917): Über Potenzreihen, die im Innern des Einheitskreises beschränkt sind, J. reine und angew. Math 147 (1917) 205-232.
D2
A second vehicle: Transformation of random variables
Previously we analyzed the transformation of the p.d.f. under an injective map of random variables y 6 g ( y ) = x . Here we study the transformation of polar coordinates [I1 , I2 ,… , In 1 , r ] Y as parameters of an Euclidian observation space to Cartesian coordinates [ y1 ,… , yn ] Y . In addition we introduce the hypervolume element of a sphere S n 1 Y, dim Y = n . First, we give three examples. Second, we summarize the general results in Lemma D4. Example D4 (polar coordinates: “2d”): Table D1 collects characteristic elements of the transformation of polar coordinates (I1 , r ) of type “longitude, radius” to Cartesian coordinates ( y1 , y2 ), their domain and range, the planar elements dy1 , dy2 as well as the circle S1 embedded into E 2 := {R 2 , G kl } , equipped with the canonical metric I 2 = [G kl ] and its total measure of arc Z1.
548
Appendix D: Sampling distributions and their use
Table D1 Cartesian and polar coordinates of a two-dimensional observation space, total measure of the arc of the circle (I1 , r ) [0, 2S ] × ]0, f[ , ( y1 , y2 ) R 2 dy1dy2 = rdrdI1 S1 := {y R 2 | y12 + y22 = 1} 2S
Z1 =
³ dI
1
= 2S .
0
Example D5 (polar coordinates: “3d”): Table D2 is a collectors’ item for characteristic elements of the transformation of polar coordinates (I1 , I2 , r ) of type “longitude, latitude, radius” to Cartesian coordinates ( y1 , y2 , y3 ), their domain and range, the volume element dy1 , dy2 , dy3 as well as of the sphere S2 embedded into E3 := {R 3 , G kl } equipped with the canonical metric I 3 = [G kl ] and its total measure of surface Z2. Table D2 Cartesian and polar coordinates of a three-dimensional observation space, total measure of the surface of the circle y1 = r cos I2 cos I1 , y2 = r cos I2 sin I1 , y3 = r sin I2 (I1 , I2 , r ) [0, 2S ] × ]
S S , [ × ]0, r[ , ( y1 , y2 ) R 2 2 2
( y1 , y2 , y3 ), R 3 dy1dy2 dy3 = r 2 dr cos I2 dI1dI2 S 2 := { y R 3 | y12 + y22 + y32 = 1} +S / 2
2S
Z2 =
³ dI ³ 1
0
dI2 cos I2 = 4S .
S / 2
Example D6 (polar coordinates: “4d”): Table D3 is a collection of characteristic elements of the transformation of polar coordinates (I1 , I2 , I3 , r ) to Cartesian coordinates ( y1 , y2 , y3 , y4 ), their domain and range, the hypervolume element dy1 , dy2 , dy3 , dy4 as well as of the 3 - sphere S3 embedded into E 4 := {R 4 , G kl } equipped with the canonical metric I 4 = [G kl ] and its total measure of hypersurface.
D2
A second vehicle: Transformation of random variables
549
Table D3 Cartesian and polar coordinates of a four-dimensional observation space total measure of the hypersurface of the 3-sphere y1 = r cos I3 cos I2 cos I1 , y2 = r cos I3 cos I2 sin I1 , y3 = r cos I3 sin I2 , y4 = r sin I3 (I1 , I2 , I3 , r ) [0, 2S ] × ]
S S S S , [ × ] , [ × ]0, 2S [ 2 2 2 2
dy1dy2 dy3dy4 = r 3 cos2 I3 cos I2 drdI3 dI2 dI1 J y :=
w ( y1 , y2 , y3 , y4 ) = w (I1 , I2 , I3 , r )
ª r cos I3 cos I2 sin I1 r cos I3 sin I2 cos I1 r sin I3 cos I2 cos I1 cos I3 cos I2 cos I1 º « » « + r cos I3 cos I2 cos I1 r cos I3 sin I2 sin I1 r sin I3 cos I2 sin I1 cos I3 cos I2 sin I1 » « 0 + r cos I3 cos I2 r sin I3 sin I2 cos I3 cos I2 » « » 0 0 r cos I3 sin I3 ¬ ¼ | det J y |= r 3 cos 2 I3 cos I2 S3 := { y R 4 | y12 + y22 + y32 + y42 = 1}
Z3 = 2S 2 . Lemma D4 (polar coordinates, hypervolume element, hypersurface element):
Let
ª cos I cos I cos I " cos I cos I º n 1 n2 n3 2 1» « ª y1 º « » « y » « cos In 1 cos In 2 cos In 3 " cos I 2 sin I1 » 2 « » « » « y3 » « cos In 1 cos In 2 cos In 3 " sin I2 » « » « » y « 4 » « cos In 1 cos In 2 " cos I3 » « " » = r« » « » « » " y « n 3 » « » « yn 2 » « cos In 1 cos In 2 sin In 3 » « » « » cos In 1 cos In 2 « yn 1 » « » «« y »» cos I sin I « » n 1 n 2 ¬ n ¼ ««sin I »» n 1 ¬ ¼
550
Appendix D: Sampling distributions and their use
be a transformation of polar coordinates (I1 , I2 ,… , In 2 , In 1 , r ) to Cartesian coordinates ( y1 , y2 ,… , yn 1 , yn ) , their domain and range given by
S S S S S S (I1 , I2 ,…, In2 , In1 , r ) [0, 2S ] × ] , + [ ×"× ] , + [ × ] , + [ × ]0, f[, 2 2 2 2 2 2 then the local hypervolume element dy1 ...dyn = r n 1dr cos n 2 In 1 cos n 3 In 2 ...cos 2 I3 cos I2 dIn 1dIn 2 ...dI3dI2 dI1 , as well as the global hypersurface element
Z n 1 =
+S / 2
+S / 2
2S
2 S ( n 1) / 2 := ³ cos Inn12 dIn 1 " ³ cos I2 dI2 ³ dI1 , n 1 S / 2 S / 2 0 *( ) 2
where J ( X ) is the gamma function. Before we care for the proof, let us define Euler’s gamma function. Definition D5 (Euler’s gamma function): f
*( x) = ³ e t t x 1 dt
( x > 0)
0
is Euler’s gamma function which enjoys the recurrence relation *( x + 1) = x*( x) subject to *(1) = 1
or
1 *( ) = S 2
*(2) = 1!
3 1 1 1 *( ) = *( ) = S 2 2 2 2
…
…
*(n + 1) = n !
p pq pq *( ) = *( ) q q q
if n is integer, n Z
+
if
p is a rational q
number, p / q Q+ . Example D7 (Euler’s gamma function):
D2
A second vehicle: Transformation of random variables
(i)
*(1) = 1
(i)
1 *( ) = S 2
(ii)
*(2) = 1
(ii)
3 1 1 1 *( ) = *( ) = S 2 2 2 2
(iii)
*(3) = 1 2 = 2
(iii)
5 3 3 3 *( ) = *( ) = S 2 2 2 4
(iv)
*(4) = 1 2 3 = 6
(iv)
7 5 5 15 S. *( ) = *( ) = 2 2 2 8
551
Proof: Our proof of Lemma D4 will be based upon computing the image of the tangent space Ty S n 1 E n of the hypersphere S n 1 E n . Let us embed the hypersphere S n 1 parameterized by (I1 , I2 ,… , In 2 , In 1 ) in E n parameterized by ( y1 ,… , yn ) , namely y E n , y = e1r cos In 1 cos In 2 " cos I2 cos I1 + +e 2 r cos In 1 cos In 2 " cos I2 sin I1 + " + +e n 1r cos In 1 sin In 2 + +e n r sin In 1. Note that I1 is a parameter of type longitude, 0 d I1 d 2S , while I2 ,… , In1 are parameters of type latitude, S / 2 < I2 < +S / 2,… , S / 2 < In1 < +S / 2 (open intervals). The images of the tangent vectors which span the local tangent space are given in the orthonormal n- leg {e1 , e 2 ,… , e n 1 , e n | 0} by g1 := DI y = e1r cos In 1 cos In 2 " cos I2 sin I1 + 1
+e 2 r cos In 1 cos In 2 " cos I2 cos I1 g 2 := DI y = e1r cos In 1 cos In 2 "sin I2 cos I1 2
e 2 r cos In 1 cos In 2 " sin I2 sin I1 + +e3 r cos In 1 cos In 2 " cos I2 ... g n 1 := DI y = e1r sin In 1 cos In 2 " cos I2 cos I1 " n 1
e n 1r sin In 1 sin In 2 + +e n r cos In 1
552
Appendix D: Sampling distributions and their use
g n := DI y = e1r cos In 1 cos In 2 " cos I2 cos I1 + n
+ e 2 r cos In 1 cos In 2 " sin I2 sin I1 + " + +e n 1r cos In 1 cos In 2 + e n r sin In 1 = y / r. {g1 ,… , g n 1} span the image of the tangent space in E n . gn is the hypersphere normal vector, || gn|| = 1. From the inner products < gi | gj > = gij, i, j {1,… , n} , we derive the Gauss matrix of the metric G:= [ gij]. < g1 | g1 > = r 2 cos 2 In 1 cos 2 In 2 " cos 2 I3 cos 2 I2 < g 2 | g 2 > = r 2 cos 2 In 1 cos 2 In 2 " cos 2 I3 " < g n 1 | g n 1 > = r 2 , < g n | g n > = 1. The off-diagonal elements of the Gauss matrix of the metric are zero. Accordingly det G n = det G n 1 = r n 1 (cos In 1 ) n 2 (cos In 2 ) n 3 " (cos I3 ) 2 cos I2 . The square root minant
det G n1 elegantly represents the Jacobian deter-
det G n , J y :=
w ( y1 , y2 ,… , yn ) = det G n . w (I1 , I2 ,… , In 1 , r )
Accordingly we have found the local hypervolume element det G n dr dIn 1 dIn 2 " dI3 dI2 dI1 . For the global hypersurface element Z n 1 , we integrate 2S
³ dI
1
+S / 2
³
= 2S
0
cos I2 dI2 = [sin I2 ]+SS // 22 = 2
S / 2 +S / 2
1 cos 2 I3 dI3 = [cos I3 sin I3 I3 ]+SS // 22 = S / 2 2 S / 2
³
+S / 2
1 4 cos3 I4 dI4 = [cos 2 I4 sin I4 2sin I4 ]+SS // 22 = 3 3 S / 2
³
...
D3 A first confidence interval of Gauss-Laplace normally distributed observations +S / 2
³
S / 2
(cos In 1 ) n 2 dIn 1 =
553
+S / 2
1 1 [(cos In 1 ) n 3 ]+SS // 22 + (cos In 1 ) n 4 dIn 1 n2 n 3 S³/ 2
recursively. As soon as we substitute the gamma function, we arrive at Zn-1. h
D3
A first confidence interval of Gauss-Laplace normally distributed observations P ,V 2 known, the Three Sigma Rule
The first confidence interval of Gauss-Laplace normally distributed observations constrained to ( P , V 2 ) known, will be computed as an introductory example. An application is the Three Sigma Rule. In the empirical sciences, estimates of certain quantities derived from observations are often given in the form of the estimate plus or minus a certain amount. For instance, the distance between a benchmark on the Earth’s surface and a satellite orbiting the Earth may be estimated to be (20, 101, 104.132 ± 0.023) m with the idea that the first factor is very unlikely to be outside the range 20, 101, 104.155 m to 20, 101, 104.109 m. A cost accountant for a publishing company in trying to allow for all factors which enter into the cost of producing a certain book, actual production costs, proportion of plant overhead, proportion of executive salaries, may estimate the cost to be 21 ± 1,1 Euro per volume with the implication that the correct cost very probably lies between 19.9 and 22.1 Euro per volume. The Bureau of Labor Statistics may estimate the number of unemployed in a certain area to be 2.4 ± .3 million at a given time though intuitively it should be between 2.1 and 2.7 million. What we are saying is that in practice we are quite accustomed to seeing estimates in the form of intervals. In order to give precision to these ideas we shall consider a particular example. Suppose that a random sample x {R, pdf } is taken from a Gauss-Laplace normal distribution with known mean P and known variance V 2 . We ask the key question. ?What is the probability J of the random interval ( P cV , P + cV ) to cover the mean P as a quantile c of the standard deviation V ? To put this question into a mathematical form we write the probabilistic twosided interval identity.
554
Appendix D: Sampling distributions and their use
P ( x1 < X < x2 ) = P ( P cV < X < P + cV ) = J , x2 = P + cV x1 =
§ 1 · exp ¨ 2 ( x P ) 2 ¸ dx = J © 2V ¹ cV V 2S
³ P
1
with a left boundary l = x1 and a right boundary r = x2 . The length of the interval is x2 x1 = r l . The center of the interval is ( x1 + x2 ) / 2 or P . Here we have taken advantage of the Gauss-Laplace pdf in generating the cumulative probability P( x1 < X < x2 ) = F( x2 ) F( x1 ) F( x2 ) F( x1 ) = F( P + cV ) F( P + cV ). Typical values for the confidence coefficient J are J = 0.95 ( J = 95% or 1 J = 5% negative confidence), J =0.99 ( J = 1% or 1 J = 1% negative confidence) or J = 0.999 ( J = 999% or 1 J = 1% negative confidence). O
O
f(x)
P-3V P-2V P-V
P
P+V
P+2V P+3V
x Figure D1: Probability mass in a two-sided confidence interval x1 < X< x2 or P cV < X< P + cV , three cases: (i) c = 1 , (ii) c = 2 and (iii) c = 3.
D3 A first confidence interval of Gauss-Laplace normally distributed observations
555
Consult Figure D1 for a geometric interpretation. The confidence coefficient $\gamma$ is a measure of the probability mass between $x_1 = \mu - c\sigma$ and $x_2 = \mu + c\sigma$. For a given confidence coefficient $\gamma$,
$$\int_{x_1}^{x_2} f(x\,|\,\mu,\sigma^2)\,dx = \gamma$$
establishes an integral equation. To make this point of view better understood, let us transform the integral equation to its standard form by means of $x \mapsto z = \sigma^{-1}(x-\mu)$, $x = \mu + \sigma z$:
$$\int_{x_1=\mu-c\sigma}^{x_2=\mu+c\sigma}\frac{1}{\sigma\sqrt{2\pi}}\exp\Big(-\frac{1}{2\sigma^2}(x-\mu)^2\Big)dx = \int_{-c}^{+c}\frac{1}{\sqrt{2\pi}}\exp\Big(-\frac12 z^2\Big)dz = \gamma,$$
$$\int_{x_1}^{x_2} f(x\,|\,\mu,\sigma^2)\,dx = \int_{-c}^{+c} f(z\,|\,0,1)\,dz = \gamma.$$
The special Helmert transformation maps $x$ to $z$, now being standard Gauss-Laplace normal: $\sigma^{-1}$ is the dilatation factor, also called scale variation, while $\mu$ is the translation parameter. The Gauss-Laplace pdf is symmetric about its mean, namely $f(\mu-u) = f(\mu+u)$ or, in standard form, $f(-z) = f(+z)$. Accordingly we can write the integral identities
$$\int_{x_1}^{x_2} f(x\,|\,\mu,\sigma^2)\,dx = 2\int_{\mu}^{x_2} f(x\,|\,\mu,\sigma^2)\,dx = \gamma,$$
$$\int_{-c}^{+c} f(z\,|\,0,1)\,dz = 2\int_{0}^{c} f(z\,|\,0,1)\,dz = \gamma.$$
The classification of integral equations tells us that
$$\gamma(z) = 2\int_0^z f(z^*)\,dz^*$$
is a linear Volterra integral equation of the first kind. In case of a Gauss-Laplace standard normal pdf, such an integral equation is solved by a table. In a forward computation
$$F(z) := \int_{-\infty}^{z} f(z^*\,|\,0,1)\,dz^* \quad\text{or}\quad \Phi(z) := \int_{-\infty}^{z}\frac{1}{\sqrt{2\pi}}\exp\Big(-\frac12 z^{*2}\Big)dz^*$$
are tabulated on a regular grid. For a given value $F(z_1)$ or $F(z_2)$, $z_1$ or $z_2$ are determined by interpolation. C. F. Gauss did not use such a procedure. He took
advantage of the Gauss inequality, which has been reviewed in this context by F. Pukelsheim (1994). There also the Vysochanskii-Petunin inequality has been discussed. We follow here a two-step procedure. First, we divide the domain $z \in [0,\infty)$ into the intervals $z \in [0,1]$ and $z \in [1,\infty)$. In the first interval $f(z)$ is monotonically decreasing, differentiable and concave, $f''(z) = f(z)(z^2-1) < 0$, while in the second interval it is monotonically decreasing, differentiable and convex, $f''(z) = f(z)(z^2-1) > 0$; $z = 1$ is the point of inflection. Second, we set up Taylor series of $f(z)$ in the interval $z \in [0,1]$ at the point $z = 0$, in the interval $z \in [1,2]$ at the point $z = 1$, and in the interval $z \in [2,3]$ at the point $z = 2$. Three examples of such a forward solution of the characteristic linear Volterra integral equation of the first kind will follow. They establish:
The One Sigma Rule
The Two Sigma Rule
The Three Sigma Rule
Box D1
Operational calculus applied to the Gauss-Laplace probability distribution

"generating differential equation"
$$f''(z) + z\,f'(z) + f(z) = 0 \quad\text{subject to}\quad \int_{-\infty}^{+\infty} f(z)\,dz = 1$$

"recursive differentiation"
$$f(z) = \frac{1}{\sqrt{2\pi}}\exp\Big(-\frac12 z^2\Big)$$
$$f'(z) = -z f(z) =: g(z)$$
$$f''(z) = g'(z) = -f(z) - z g(z) = (z^2-1)f(z)$$
$$f'''(z) = 2z f(z) + (z^2-1)g(z) = (-z^3+3z)f(z)$$
$$f^{(4)}(z) = (-3z^2+3)f(z) + (-z^3+3z)g(z) = (z^4-6z^2+3)f(z)$$
$$f^{(5)}(z) = (4z^3-12z)f(z) + (z^4-6z^2+3)g(z) = (-z^5+10z^3-15z)f(z)$$
$$f^{(6)}(z) = (-5z^4+30z^2-15)f(z) + (-z^5+10z^3-15z)g(z) = (z^6-15z^4+45z^2-15)f(z)$$
$$f^{(7)}(z) = (6z^5-60z^3+90z)f(z) + (z^6-15z^4+45z^2-15)g(z) = (-z^7+21z^5-105z^3+105z)f(z)$$
$$f^{(8)}(z) = (-7z^6+105z^4-315z^2+105)f(z) + (-z^7+21z^5-105z^3+105z)g(z) = (z^8-28z^6+210z^4-420z^2+105)f(z)$$
$$f^{(9)}(z) = (8z^7-168z^5+840z^3-840z)f(z) + (z^8-28z^6+210z^4-420z^2+105)g(z) = (-z^9+36z^7-378z^5+1260z^3-945z)f(z)$$
$$f^{(10)}(z) = (-9z^8+252z^6-1890z^4+3780z^2-945)f(z) + (-z^9+36z^7-378z^5+1260z^3-945z)g(z) = (z^{10}-45z^8+630z^6-3150z^4+4725z^2-945)f(z)$$

"triangular representation of the matrix transforming $f(z) \mapsto f^{(n)}(z)$"
$$\begin{bmatrix} f(z)\\ f'(z)\\ f''(z)\\ f'''(z)\\ f^{(4)}(z)\\ f^{(5)}(z)\\ f^{(6)}(z)\\ f^{(7)}(z)\\ f^{(8)}(z)\\ f^{(9)}(z)\\ f^{(10)}(z) \end{bmatrix}
= f(z)\begin{bmatrix}
1&&&&&&&&&&\\
0&-1&&&&&&&&&\\
-1&0&1&&&&&&&&\\
0&3&0&-1&&&&&&&\\
3&0&-6&0&1&&&&&&\\
0&-15&0&10&0&-1&&&&&\\
-15&0&45&0&-15&0&1&&&&\\
0&105&0&-105&0&21&0&-1&&&\\
105&0&-420&0&210&0&-28&0&1&&\\
0&-945&0&1260&0&-378&0&36&0&-1&\\
-945&0&4725&0&-3150&0&630&0&-45&0&1
\end{bmatrix}
\begin{bmatrix} 1\\ z\\ z^2\\ z^3\\ z^4\\ z^5\\ z^6\\ z^7\\ z^8\\ z^9\\ z^{10} \end{bmatrix}.$$
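The recursive differentiation of Box D1 is easy to cross-check with a computer algebra system. The following sketch (Python with SymPy; the tooling is an assumption and not part of the original text) reproduces the polynomial factors of $f^{(n)}(z)$ up to $n = 10$ and verifies the generating differential equation.

```python
import sympy as sp

z = sp.symbols('z')
f = sp.exp(-z**2 / 2) / sp.sqrt(2 * sp.pi)   # Gauss-Laplace standard normal pdf

# Each derivative f^(n)(z) equals a polynomial in z times f(z); print those
# polynomial factors, which should match the rows of the matrix in Box D1.
for n in range(1, 11):
    poly = sp.simplify(sp.diff(f, z, n) / f)
    print(n, sp.expand(poly))

# Generating differential equation of Box D1: f'' + z f' + f = 0
print(sp.simplify(sp.diff(f, z, 2) + z * sp.diff(f, z) + f))   # prints 0
```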
D31  The forward computation of a first confidence interval of Gauss-Laplace normally distributed observations: $\mu, \sigma^2$ known

We can avoid solving the linear Volterra integral equation of the first kind if we push forward the integration for a fixed value $z$.
Example D8 (Series expansion of the Gauss-Laplace integral, 1st interval): Let us solve the integral
$$\gamma(z=1) := 2\int_0^1 f(z)\,dz$$
in the first interval $0 \le z \le 1$ by Taylor expansion with respect to the successive differentiation of $f(z)$ outlined in Box D1 and the specific derivatives $f^{(n)}(0)$ given in Table D1. Based on those auxiliary results, Box D2 presents the detailed computation. First, we expand $\exp(-z^2/2)$ up to order $O(14)$. The specific Taylor series are uniformly convergent. Accordingly, in order to compute the integral, second we integrate termwise up to order $O(15)$. For the specific value $z=1$ we have computed the coefficient of confidence $\gamma(1) = 0.683$. The result $P(\mu-\sigma < X < \mu+\sigma) = 0.683$ is known as the One Sigma Rule: 68.3 per cent of the sample are in the interval $]\mu-\sigma, \mu+\sigma[$, 31.7 per cent outside. If we make 3 experiments, on average one experiment is outside the $1\sigma$ interval.

Box D2
A specific integral
"expansion of the exponential function"
$$\exp x = 1 + \frac{x}{1!} + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots + \frac{x^n}{n!} + O(n+1),\qquad |x| < \infty$$
"the specific integral"
$$\int_0^z f(z^*)\,dz^* = \frac{1}{\sqrt{2\pi}}\int_0^z\exp\Big(-\frac{z^{*2}}{2}\Big)dz^* = \frac{1}{\sqrt{2\pi}}\Big(z - \frac{1}{6}z^3 + \frac{1}{40}z^5 - \frac{1}{336}z^7 + \frac{1}{3456}z^9 - \frac{1}{42240}z^{11} + \frac{1}{599040}z^{13} + O(15)\Big)$$
"the specific value z=1"
$$\gamma(1) = 2\int_0^1 f(z)\,dz = \frac{2}{\sqrt{2\pi}}\Big(1 - \frac{1}{6} + \frac{1}{40} - \frac{1}{336} + \frac{1}{3456} - \frac{1}{42240} + \frac{1}{599040} + O(15)\Big)$$
$$= \frac{2}{\sqrt{2\pi}}(1 - 0.166{,}667 + 0.025{,}000 - 0.002{,}976 + 0.000{,}289 - 0.000{,}024 + 0.000{,}002) = \frac{2}{\sqrt{2\pi}}\,0.855{,}624 = 0.682{,}689 \approx 0.683$$
"coefficient of confidence"
$$1 - 0.683 = 317.311\cdot10^{-3} \approx \frac{1}{3}.$$

Table D1
Special values of the derivatives of the Gauss-Laplace probability distribution at $z = 0$:
$f(0) = \dfrac{1}{\sqrt{2\pi}} = 0.398{,}942$, $f'(0) = 0$,
$f''(0) = -\dfrac{1}{\sqrt{2\pi}}$, $\dfrac{1}{2!}f''(0) = -0.199{,}471$,
$f'''(0) = 0$, $f^{(4)}(0) = +\dfrac{3}{\sqrt{2\pi}}$, $\dfrac{1}{4!}f^{(4)}(0) = +0.049{,}868$,
$f^{(5)}(0) = 0$, $f^{(6)}(0) = -\dfrac{15}{\sqrt{2\pi}}$, $\dfrac{1}{6!}f^{(6)}(0) = -0.008{,}311$.
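As a quick numerical cross-check of the One/Two/Three Sigma Rules derived in this section, the following sketch (plain Python, standard library only; not part of the original text) evaluates $\gamma(c) = 2\Phi(c)-1$ with the error function and compares it with the termwise-integrated series of Box D2.

```python
import math

def gamma_erf(c):
    """Confidence coefficient gamma(c) = P(mu - c*sigma < X < mu + c*sigma)."""
    return math.erf(c / math.sqrt(2.0))

def gamma_series(z, terms=30):
    """Termwise-integrated Taylor series of Box D2: 2/sqrt(2*pi) * sum."""
    s = 0.0
    for k in range(terms):
        s += (-1) ** k * z ** (2 * k + 1) / (math.factorial(k) * 2 ** k * (2 * k + 1))
    return 2.0 / math.sqrt(2.0 * math.pi) * s

for c in (1, 2, 3):
    print(c, round(gamma_erf(c), 6), round(gamma_series(c), 6))
# both columns: 0.682689, 0.954500, 0.997300 (the One/Two/Three Sigma Rules)
```

The series around $z = 0$ converges slowly at $z = 2$ and $z = 3$; this is exactly why the text re-expands around $z = 1$ and $z = 2$ in Examples D9 and D10.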
Example D9 (Series expansion of the Gauss-Laplace integral, 2nd interval): Let us solve the integrals
$$\gamma(z=2) := 2\int_0^2 f(z)\,dz = 2\int_0^1 f(z)\,dz + 2\int_1^2 f(z)\,dz,$$
$$\gamma(z=2) = \gamma(z=1) + 2\int_1^2 f(z)\,dz,$$
namely in the 2nd interval $1 \le z \le 2$. First, we set up the Taylor series of $f(z)$ "around the point $z=1$". The derivatives of $f(z)$ "at the point $z=1$" are collected up to order 10 in Table D2. Second, we integrate the Taylor series termwise and receive the specific integral of Box D3. Note that termwise integration is permitted since the Taylor series are uniformly convergent. The detailed computation up to order $O(12)$ has led us to the coefficient of confidence $\gamma(2) = 0.954$. The result $P(\mu-2\sigma < X < \mu+2\sigma) = 0.954$ is known as the Two Sigma Rule: 95.4 per cent of the sample are in the interval $]\mu-2\sigma, \mu+2\sigma[$, 4.6 per cent outside. If we make 22 experiments, on average one experiment is outside the $2\sigma$ interval.

Box D3
A specific integral
"expansion of the exponential function"
$$f(z) = \frac{1}{\sqrt{2\pi}}\exp\Big(-\frac12 z^2\Big),$$
$$f(z) = f(1) + \frac{1}{1!}f'(1)(z-1) + \frac{1}{2!}f''(1)(z-1)^2 + \cdots + \frac{1}{10!}f^{(10)}(1)(z-1)^{10} + O(11)$$
"the specific integral"
$$\int_1^z f(z^*)\,dz^* = f(1)(z-1) + \frac12\frac{1}{1!}f'(1)(z-1)^2 + \frac13\frac{1}{2!}f''(1)(z-1)^3 + \frac14\frac{1}{3!}f'''(1)(z-1)^4 + \frac15\frac{1}{4!}f^{(4)}(1)(z-1)^5 + \frac16\frac{1}{5!}f^{(5)}(1)(z-1)^6 + \cdots + \frac{1}{11}\frac{1}{10!}f^{(10)}(1)(z-1)^{11} + O(12)$$
"case z=2"
$$\gamma(2) = \gamma(1) + 2\int_1^2 f(z)\,dz = \gamma(1) + 2(0.241{,}971 - 0.120{,}985 + 0.020{,}122 - 0.004{,}024 - 0.002{,}016 + 0.000{,}768 + 0.000{,}120 - 0.000{,}088 - 0.000{,}002 - 0.000{,}050 + O(12)) =$$
$$= 0.682{,}690 + 0.271{,}632 = 0.954$$
"coefficient of confidence"
$$0.954 = 1 - 45.678\cdot10^{-3} = 1 - \frac{1}{22}.$$
Table D2
Special values of the derivatives of the Gauss-Laplace probability distribution at $z = 1$: $\dfrac{1}{\sqrt{2\pi}} = 0.398{,}942$, $\exp(-\tfrac12) = 0.606{,}531$,
$f(1) = \dfrac{1}{\sqrt{2\pi}}\exp(-\tfrac12) = 0.241{,}971$,
$f'(1) = -f(1) = -0.241{,}971$, $f''(1) = 0$,
$f'''(1) = 2f(1) = 0.482{,}942$, $\dfrac{1}{3!}f'''(1) = +0.080{,}490$,
$f^{(4)}(1) = -2f(1) = -0.482{,}942$, $\dfrac{1}{4!}f^{(4)}(1) = -0.020{,}122$,
$f^{(5)}(1) = -6f(1) = -1.451{,}826$, $\dfrac{1}{5!}f^{(5)}(1) = -0.012{,}098$,
$f^{(6)}(1) = 16f(1) = 3.871{,}536$, $\dfrac{1}{6!}f^{(6)}(1) = +0.005{,}377$,
$f^{(7)}(1) = 20f(1) = 4.839{,}420$, $\dfrac{1}{7!}f^{(7)}(1) = +0.000{,}958$,
$f^{(8)}(1) = -132f(1) = -31.940{,}172$, $\dfrac{1}{8!}f^{(8)}(1) = -0.000{,}792$,
$f^{(9)}(1) = -28f(1) = -6.775{,}188$, $\dfrac{1}{9!}f^{(9)}(1) = -0.000{,}019$,
$f^{(10)}(1) = 8234\,f(1) = 1992.389$, $\dfrac{1}{10!}f^{(10)}(1) = 0.000{,}549$.

Example D10 (Series expansion of the Gauss-Laplace integral, 3rd interval): Let us solve the integrals
$$\gamma(z=3) := 2\int_0^3 f(z)\,dz = 2\int_0^1 f(z)\,dz + 2\int_1^2 f(z)\,dz + 2\int_2^3 f(z)\,dz,$$
$$\gamma(z=3) = \gamma(z=1) + 2\int_1^2 f(z)\,dz + 2\int_2^3 f(z)\,dz,$$
namely in the 3rd interval $2 \le z \le 3$. First, we set up the Taylor series of $f(z)$ "around the point $z=2$". The derivatives of $f(z)$ "at the point $z=2$" are collected up to order 10 in Table D3. Second, we integrate the Taylor series termwise and receive the specific integral of Box D4. Note that termwise integration is permitted since the Taylor series are uniformly convergent. The detailed computation up to order $O(12)$ leads us to the coefficient of confidence $\gamma(3) = 0.997$. The result $P(\mu-3\sigma < X < \mu+3\sigma) = 0.997$ is known as the Three Sigma Rule: 99.7 per cent of the sample are in the interval $]\mu-3\sigma, \mu+3\sigma[$, 0.3 per cent outside. If we make 377 experiments, on average one experiment is outside the $3\sigma$ interval.

Table D3
Special values of the derivatives of the Gauss-Laplace probability distribution at $z = 2$: $\dfrac{1}{\sqrt{2\pi}} = 0.398{,}942$, $\exp(-2) = 0.135{,}335$,
$f(2) = \dfrac{1}{\sqrt{2\pi}}\exp(-2) = 0.053{,}991$,
$f'(2) = -2f(2) = -0.107{,}982$,
$f''(2) = 3f(2)$, $\dfrac{1}{2!}f''(2) = +0.080{,}986$,
$f'''(2) = -2f(2)$, $\dfrac{1}{3!}f'''(2) = -0.017{,}997$,
$f^{(4)}(2) = -5f(2)$, $\dfrac{1}{4!}f^{(4)}(2) = -0.011{,}248$,
$f^{(5)}(2) = 18f(2)$, $\dfrac{1}{5!}f^{(5)}(2) = +0.008{,}099$,
$f^{(6)}(2) = -11f(2)$, $\dfrac{1}{6!}f^{(6)}(2) = -0.000{,}825$,
$f^{(7)}(2) = -86f(2)$, $\dfrac{1}{7!}f^{(7)}(2) = -0.000{,}921$,
$f^{(8)}(2) = +249f(2)$, $\dfrac{1}{8!}f^{(8)}(2) = +0.000{,}333$,
$f^{(9)}(2) = +190f(2)$, $\dfrac{1}{9!}f^{(9)}(2) = +0.000{,}028$,
$f^{(10)}(2) = -2621f(2)$, $\dfrac{1}{10!}f^{(10)}(2) = -0.000{,}039$.
Box D4
A specific integral
"expansion of the exponential function"
$$f(z) = \frac{1}{\sqrt{2\pi}}\exp\Big(-\frac12 z^2\Big),$$
$$f(z) = f(2) + \frac{1}{1!}f'(2)(z-2) + \frac{1}{2!}f''(2)(z-2)^2 + \cdots + \frac{1}{10!}f^{(10)}(2)(z-2)^{10} + O(11)$$
"the specific integral"
$$\int_2^z f(z^*)\,dz^* = f(2)(z-2) + \frac12\frac{1}{1!}f'(2)(z-2)^2 + \frac13\frac{1}{2!}f''(2)(z-2)^3 + \frac14\frac{1}{3!}f'''(2)(z-2)^4 + \frac15\frac{1}{4!}f^{(4)}(2)(z-2)^5 + \frac16\frac{1}{5!}f^{(5)}(2)(z-2)^6 + \cdots + \frac{1}{11}\frac{1}{10!}f^{(10)}(2)(z-2)^{11} + O(12)$$
"case z=3"
$$\gamma(3) = \gamma(1) + 2\int_1^2 f(z)\,dz + 2\int_2^3 f(z)\,dz =$$
$$= 0.682{,}690 + 0.271{,}632 + 2(0.053{,}991 - 0.053{,}991 + 0.026{,}995 - 0.004{,}499 - 0.002{,}250 + 0.001{,}350 - 0.000{,}118 + 0.000{,}037 + 0.000{,}003 - 0.000{,}004 + O(12)) =$$
$$= 0.682{,}690 + 0.271{,}632 + 0.043{,}028 = 0.997$$
"coefficient of confidence"
$$0.997 = 1 - 2.65\cdot10^{-3} = 1 - \frac{1}{377}.$$

D32  The backward computation of a first confidence interval of Gauss-Laplace normally distributed observations: $\mu, \sigma^2$ known
Finally we solve the Volterra integral equation of the first kind by the technique of series inversion, also called series reversion. Let us recognize that the interval integration of the Taylor-expanded Gauss-Laplace normal density led us to a univariate polynomial of arbitrary order without constant term. Such a polynomial $y = a_1x + a_2x^2 + \cdots + a_nx^n$ ("input") can be reversed into a polynomial $x = b_1y + b_2y^2 + \cdots + b_ny^n$ ("output") as outlined in Table D4. Consult M. Abramowitz and I. A. Stegun (1965, p. 16) for a review, and E. Grafarend, T. Krarup and R. Syffus (1996) for a derivation based upon computer algebra.

Table D4
Series inversion (E. Grafarend, T. Krarup, R. Syffus: Journal of Geodesy 70 (1996) 276-286)
"input: univariate polynomial"
$$y = a_1x + a_2x^2 + a_3x^3 + a_4x^4 + a_5x^5 + a_6x^6 + a_7x^7 + O(x^8)$$
"output: reverse univariate polynomial"
$$x = b_1y + b_2y^2 + b_3y^3 + b_4y^4 + b_5y^5 + b_6y^6 + b_7y^7 + O(y^8)$$
"coefficient relations"
(i) $a_1b_1 = 1$
(ii) $a_1^3b_2 = -a_2$
(iii) $a_1^5b_3 = 2a_2^2 - a_1a_3$
(iv) $a_1^7b_4 = 5a_1a_2a_3 - a_1^2a_4 - 5a_2^3$
(v) $a_1^9b_5 = 6a_1^2a_2a_4 + 3a_1^2a_3^2 + 14a_2^4 - a_1^3a_5 - 21a_1a_2^2a_3$
(vi) $a_1^{11}b_6 = 7a_1^3a_2a_5 + 7a_1^3a_3a_4 + 84a_1a_2^3a_3 - a_1^4a_6 - 28a_1^2a_2a_3^2 - 42a_2^5 - 28a_1^2a_2^2a_4$
(vii) $a_1^{13}b_7 = 8a_1^4a_2a_6 + 8a_1^4a_3a_5 + 4a_1^4a_4^2 + 120a_1^2a_2^3a_4 + 180a_1^2a_2^2a_3^2 + 132a_2^6 - a_1^5a_7 - 36a_1^3a_2^2a_5 - 72a_1^3a_2a_3a_4 - 12a_1^3a_3^3 - 330a_1a_2^4a_3$

Example D11 (Solving the Volterra integral equation of the first kind): Let us define the coefficient of confidence $\gamma = 0.90$, or ninety per cent. We want to know the quantile $c_{0.90}$ which determines the probability identity $P(\mu - c\sigma < X < \mu + c\sigma) = 0.90$. If you follow the detailed computation of Table D5, namely the input as well as the output data up to order $O(5)$, you find the quantile $c_{0.90} = 1.64$, as well as the confidence interval $P(\mu - 1.64\sigma < X < \mu + 1.64\sigma) = 0.90$: 90 per cent of the sample are in the interval $]\mu - 1.64\sigma, \mu + 1.64\sigma[$, 10 per cent outside. If we make 10 experiments, on average one experiment is outside the $1.64\sigma$ interval.

Table D5
Series inversion: quantile $c_{0.90}$
(i) input
$$\gamma(z) = 2\int_0^1 f(z^*)\,dz^* + 2\int_1^z f(z^*)\,dz^* = 0.682{,}689 + 2\Big[f(1)(z-1) + \frac12\frac{1}{1!}f'(1)(z-1)^2 + \cdots + \frac1n\frac{1}{(n-1)!}f^{(n-1)}(1)(z-1)^n + O(n+1)\Big]$$
$$\frac12[\gamma(z) - 0.682{,}689] = f(1)(z-1) + \frac12\frac{1}{1!}f'(1)(z-1)^2 + \cdots + \frac1n\frac{1}{(n-1)!}f^{(n-1)}(1)(z-1)^n + O(n+1)$$
$$y = a_1x + a_2x^2 + \cdots + a_nx^n + O(n+1)$$
$x := z-1$
$y := (0.900{,}000 - 0.682{,}689)/2 = 0.108{,}656$
$a_1 := f(1) = 0.241{,}971$
$a_2 := \frac12\frac{1}{1!}f'(1) = -0.120{,}985$
$a_3 := \frac13\frac{1}{2!}f''(1) = 0$
$a_4 := \frac14\frac{1}{3!}f'''(1) = 0.020{,}125$
$a_5 := \frac15\frac{1}{4!}f^{(4)}(1) = -0.004{,}024$
…
$a_n := \frac1n\frac{1}{(n-1)!}f^{(n-1)}(1)$

(ii) output
$b_1 = \dfrac{1}{a_1} = 4.132{,}726$
$b_2 = -\dfrac{a_2}{a_1^3} = 8.539{,}715$
$b_3 = \dfrac{1}{a_1^5}(2a_2^2 - a_1a_3) = 35.292{,}308$
$b_4 = \dfrac{1}{a_1^7}(5a_1a_2a_3 - a_1^2a_4 - 5a_2^3) = 158.070$
$b_5 = \dfrac{1}{a_1^9}(6a_1^2a_2a_4 + 3a_1^2a_3^2 + 14a_2^4 - a_1^3a_5 - 21a_1a_2^2a_3) = 475.452{,}152$
$b_1y = 0.449{,}045$, $b_2y^2 = 0.100{,}821$,
$b_3y^3 = 0.045{,}273$, $b_4y^4 = 0.022{,}032$, $b_5y^5 = 0.007{,}201$,
$$x = b_1y + b_2y^2 + b_3y^3 + b_4y^4 + b_5y^5 + O(6) = 0.624{,}372,$$
$$z = x + 1 = 1.624{,}372 = c_{0.90}.$$
At this end we would like to give some sample references on computing the "inverse error function"
$$y = F(x) := \frac{1}{\sqrt{2\pi}}\int_0^x\exp\Big(-\frac12 z^2\Big)dz =: \operatorname{erf} x \quad\text{versus}\quad x = F^{-1}(y) = \operatorname{inv\,erf} y,$$
namely L. Carlitz (1963) and A. J. Strecok (1968).
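As a cross-check of the backward computation, the following sketch (plain Python, standard library only; the helper names and the fixed-point reversion are illustrative assumptions, not the book's algorithm) builds the Taylor coefficients of the integral around $z = 1$ from the Hermite-type recursion of Box D1 and inverts the truncated series numerically. With enough terms it reproduces the well-known quantile $c_{0.90} \approx 1.645$, slightly above the five-term value 1.624 of Table D5.

```python
import math

def series_coefficients(n_terms):
    """Coefficients a_k of y = int_1^{1+x} f(z) dz = sum_k a_k x^k, using
    f^(k)(1) = (-1)^k He_k(1) f(1) with He_{k+1}(z) = z He_k(z) - k He_{k-1}(z)."""
    f1 = math.exp(-0.5) / math.sqrt(2.0 * math.pi)      # f(1) = 0.241971
    he = [1.0, 1.0]                                     # He_0(1), He_1(1)
    a = []
    for k in range(n_terms):                            # a_{k+1} = f^(k)(1)/(k+1)!
        if k >= 2:
            he.append(he[k - 1] - (k - 1) * he[k - 2])
        deriv = (-1.0) ** k * he[k] * f1
        a.append(deriv / math.factorial(k + 1))
    return a                                            # a[0]=a_1, a[1]=a_2, ...

def revert(a, n_terms):
    """Numerical series reversion by fixed-point iteration on
    x = (y - sum_{k>=2} a_k x^k) / a_1 (converges for small y)."""
    def x_of_y(y, iterations=50):
        x = y / a[0]
        for _ in range(iterations):
            x = (y - sum(a[k] * x ** (k + 1) for k in range(1, n_terms))) / a[0]
        return x
    return x_of_y

a = series_coefficients(12)
y = (0.90 - 0.682689) / 2.0        # right-hand side of the integral equation
x = revert(a, 12)(y)
print(round(x + 1.0, 3))           # quantile c_0.90, approx 1.645
```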
D4  Sampling from the Gauss-Laplace normal distribution: a second confidence interval for the mean, variance known

The second confidence interval of Gauss-Laplace i.i.d. observations will be constructed for the mean $\hat\mu$ BLUUE of $\mu$, when the variance $\sigma^2$ is known. "n" is the size of the sample, namely the number of observations. Before we present the general sampling distribution we shall work through two examples. Example D12 has been chosen for a sample size $n = 2$, while Example D13 treats $n = 3$ observations. Afterwards the general result is obvious and sufficiently motivated.

Figure D2: Special Helmert pdf $\chi^2_p$ of $x := \hat\sigma^2/\sigma^2$ for one degree of freedom, $p = 1$.
Figure D3: Special Gauss-Laplace normal pdf $f_1(z\,|\,0,1)$ of $z = (\hat\mu-\mu)/\sqrt{\sigma^2/2}$.

Example D12 (Gauss-Laplace i.i.d. observations, observation space $Y$, $\dim Y = 2$):
In order to derive the marginal distributions of $\hat\mu$ BLUUE of $\mu$ and $\hat\sigma^2$ BIQUUE of $\sigma^2$ for a two-dimensional Euclidean observation space we have to introduce various images. First, we define the probability function of two Gauss-Laplace i.i.d. observations and consequently implement the basic $(\hat\mu, \hat\sigma^2)$ decomposition into the pdf. Second, we separate the quadratic form $(y-1\mu)'(y-1\mu)$ into the quadratic form $(y-1\hat\mu)'(y-1\hat\mu)$, the vehicle to introduce $\hat\sigma^2$ BIQUUE of $\sigma^2$, and the quadratic form $(\hat\mu-\mu)^2$, the vehicle to bring in $\hat\mu$ BLUUE of $\mu$. Third, we aim at transforming the quadratic form $(y-1\hat\mu)'(y-1\hat\mu) = y'My$, $\operatorname{rk} M = 1$, into the special form $\tfrac12(y_1-y_2)^2/\sigma^2 =: x$. Fourth, we generate the marginal distributions $f_1(\hat\mu\,|\,\mu,\sigma^2/2)$ of the mean $\hat\mu$ BLUUE of $\mu$ and $f_2(x)$ of the sample variance $\hat\sigma^2$ BIQUUE of $\sigma^2$. The basic results of the example are collected in Corollary D6. The special Helmert pdf $\chi^2$ with one degree of freedom is plotted in Figure D2, the special Gauss-Laplace normal pdf of variance $\sigma^2/2$ in Figure D3.
The first action item
Let us assume an experiment of two Gauss-Laplace i.i.d. observations. Their pdf is given by
$$f(y_1,y_2) = f(y_1)f(y_2),$$
$$f(y_1,y_2) = \frac{1}{2\pi\sigma^2}\exp\Big(-\frac{1}{2\sigma^2}[(y_1-\mu)^2 + (y_2-\mu)^2]\Big),$$
$$f(y_1,y_2) = \frac{1}{2\pi\sigma^2}\exp\Big(-\frac{1}{2\sigma^2}(y-1\mu)'(y-1\mu)\Big).$$
The second action item
The coordinates of the observation vector have been denoted by $[y_1,y_2]' = y \in Y$, $\dim Y = 2$. The quadratic form $(y-1\mu)'(y-1\mu)$ allows the fundamental decomposition
$$y-1\mu = (y-1\hat\mu) + 1(\hat\mu-\mu),$$
$$(y-1\mu)'(y-1\mu) = (y-1\hat\mu)'(y-1\hat\mu) + 1'1(\hat\mu-\mu)^2,$$
$$(y-1\mu)'(y-1\mu) = \hat\sigma^2 + 2(\hat\mu-\mu)^2.$$
Here $\hat\mu$ is BLUUE of $\mu$ and $\hat\sigma^2$ BIQUUE of $\sigma^2$. The detailed computation proves our statement:
$$[(y_1-\hat\mu)+(\hat\mu-\mu)]^2 + [(y_2-\hat\mu)+(\hat\mu-\mu)]^2 = (y_1-\hat\mu)^2 + (y_2-\hat\mu)^2 + 2(\hat\mu-\mu)^2 + 2(\hat\mu-\mu)[(y_1-\hat\mu)+(y_2-\hat\mu)],$$
$$\hat\mu = \frac12(y_1+y_2),\qquad \hat\sigma^2 = (y_1-\hat\mu)^2 + (y_2-\hat\mu)^2.$$
As soon as we substitute $\hat\mu$ and $\hat\sigma^2$ we arrive at
$$(y_1-\mu)^2 + (y_2-\mu)^2 = [(y_1-\hat\mu)+(\hat\mu-\mu)]^2 + [(y_2-\hat\mu)+(\hat\mu-\mu)]^2 = \hat\sigma^2 + 2(\hat\mu-\mu)^2,$$
since the residual term vanishes: $2(\hat\mu-\mu)[(y_1-\hat\mu)+(y_2-\hat\mu)] = 2(\hat\mu-\mu)(y_1+y_2-2\hat\mu) = 0$.
The third action item
The cumulative pdf
$$dF = f(y_1,y_2)\,dy_1dy_2 = f_1(\hat\mu)f_2(x)\,d\hat\mu\,dx = f_1(\hat\mu)f_2\Big(\frac{\hat\sigma^2}{\sigma^2}\Big)\,d\hat\mu\,d\Big(\frac{\hat\sigma^2}{\sigma^2}\Big)$$
has to be decomposed into the first pdf $f_1(\hat\mu\,|\,\mu,\sigma^2/n)$, representing the pdf of the sample mean $\hat\mu$, and the second pdf $f_2(x)$ of the new variable $x := (y_1-y_2)^2/(2\sigma^2) = \hat\sigma^2/\sigma^2$, representing the sample variance $\hat\sigma^2$ normalized by $\sigma^2$. How can the decomposition $f_1f_2$ be understood? Let us replace the quadratic form $(y-1\mu)'(y-1\mu) = \hat\sigma^2 + 2(\hat\mu-\mu)^2$ in the cumulative pdf:
$$dF = f(y_1,y_2)\,dy_1dy_2 = \frac{1}{2\pi\sigma^2}\exp\Big(-\frac{1}{2\sigma^2}(y-1\mu)'(y-1\mu)\Big)dy_1dy_2 = \frac{1}{2\pi\sigma^2}\exp\Big(-\frac{1}{2\sigma^2}[\hat\sigma^2 + 2(\hat\mu-\mu)^2]\Big)dy_1dy_2,$$
$$dF = \frac{1}{\sqrt{2\pi}}\frac{1}{\sigma/\sqrt2}\exp\Big(-\frac12\frac{(\hat\mu-\mu)^2}{\sigma^2/2}\Big)\cdot\frac{1}{\sqrt{2\pi}}\frac{1}{\sigma\sqrt2}\exp\Big(-\frac{\hat\sigma^2}{2\sigma^2}\Big)\,dy_1dy_2.$$
The quadratic form $\hat\sigma^2$, conventionally given in terms of the residual vector $y-1\hat\mu$, will be rewritten in terms of the coordinates $[y_1,y_2]' = y$ of the observation vector:
$$\hat\sigma^2 = (y_1-\hat\mu)^2 + (y_2-\hat\mu)^2 = \Big[y_1-\frac12(y_1+y_2)\Big]^2 + \Big[y_2-\frac12(y_1+y_2)\Big]^2 = \frac14(y_1-y_2)^2 + \frac14(y_2-y_1)^2 = \frac12(y_1-y_2)^2.$$
The fourth action item
The new variable $x := (y_1-y_2)^2/(2\sigma^2)$ will be introduced in the cumulative pdf $dF = f(y_1,y_2)dy_1dy_2$. The new surface element $d\hat\mu\,dx$ will be related to the old surface element $dy_1dy_2$:
$$d\hat\mu\,dx = \Big|\det\begin{bmatrix} D_{y_1}\hat\mu & D_{y_2}\hat\mu\\ D_{y_1}x & D_{y_2}x\end{bmatrix}\Big|\,dy_1dy_2 = J\,dy_1dy_2,$$
$$D_{y_1}\hat\mu := \frac{\partial\hat\mu}{\partial y_1} = \frac12,\qquad D_{y_2}\hat\mu := \frac{\partial\hat\mu}{\partial y_2} = \frac12,$$
$$D_{y_1}x := \frac{\partial x}{\partial y_1} = \frac{y_1-y_2}{\sigma^2},\qquad D_{y_2}x := \frac{\partial x}{\partial y_2} = -\frac{y_1-y_2}{\sigma^2}.$$
The absolute value of the Jacobi determinant amounts to
$$|J| = \Big|\det\begin{bmatrix} \tfrac12 & \tfrac12\\ \tfrac{y_1-y_2}{\sigma^2} & -\tfrac{y_1-y_2}{\sigma^2}\end{bmatrix}\Big| = \frac{|y_1-y_2|}{\sigma^2},\qquad |J|^{-1} = \frac{\sigma^2}{|y_1-y_2|}.$$
In consequence, we have derived
$$dy_1dy_2 = \frac{\sigma^2}{y_1-y_2}\,d\hat\mu\,dx = \frac{\sigma}{\sqrt{2x}}\,d\hat\mu\,dx$$
based upon
$$x = \frac{1}{2\sigma^2}(y_1-y_2)^2 \quad\Longleftrightarrow\quad \sqrt{2x} = \frac{y_1-y_2}{\sigma}.$$
In collecting all detailed partial results we can formulate a corollary.
Corollary D6 (marginal probability distributions of $\hat\mu$, $\sigma^2$ given, and of $\hat\sigma^2$): The cumulative pdf of a set of two observations is represented by
$$dF = f(y_1,y_2)\,dy_1dy_2 = f_1(\hat\mu\,|\,\mu,\sigma^2/2)\,f_2(x)\,d\hat\mu\,dx$$
subject to
$$f_1(\hat\mu\,|\,\mu,\sigma^2/2) := \frac{1}{\sqrt{2\pi}}\frac{1}{\sigma/\sqrt2}\exp\Big(-\frac12\frac{(\hat\mu-\mu)^2}{\sigma^2/2}\Big),$$
$$f_2(x) = f_2\Big(\frac{\hat\sigma^2}{\sigma^2}\Big) := \frac{1}{\sqrt{2\pi}}\frac{1}{\sqrt x}\exp\Big(-\frac12 x\Big),$$
subject to
$$\int_{-\infty}^{+\infty} f_1(\hat\mu)\,d\hat\mu = 1 \quad\text{and}\quad \int_0^{+\infty} f_2(x)\,dx = 1.$$
$f_1(\hat\mu)$ is the pdf of the sample mean $\hat\mu = (y_1+y_2)/2$ and $f_2(x)$ the pdf of the sample variance $\hat\sigma^2 = (y_1-y_2)^2/2 = \sigma^2 x$. $f_1(\hat\mu)$ is a Gauss-Laplace pdf with mean $\mu$ and variance $\sigma^2/2$, while $f_2(x)$ is a Helmert $\chi^2$ with one degree of freedom.
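Corollary D6 can be checked by simulation. The sketch below (Python with NumPy; illustrative only, not part of the original text) draws many samples of size $n = 2$ and confirms the two marginal distributions as well as the lack of correlation between $\hat\mu$ and $\hat\sigma^2$ (they are in fact independent).

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, N = 1.0, 2.0, 200_000

y = rng.normal(mu, sigma, size=(N, 2))
mu_hat = y.mean(axis=1)                                  # sample mean, BLUUE of mu
sig2_hat = ((y - mu_hat[:, None]) ** 2).sum(axis=1)      # = (y1 - y2)^2 / 2 for n = 2
x = sig2_hat / sigma ** 2                                # should follow chi-square, 1 dof

print(mu_hat.var(), sigma ** 2 / 2)                      # variance of mu_hat ~ sigma^2/2
print(x.mean(), x.var())                                 # chi^2_1 has mean 1, variance 2
print(np.corrcoef(mu_hat, sig2_hat)[0, 1])               # ~ 0: uncorrelated
```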
Example D13 (Gauss-Laplace i.i.d. observations, observation space $Y$, $\dim Y = 3$):
In order to derive the marginal distributions of $\hat\mu$ BLUUE of $\mu$ and $\hat\sigma^2$ BIQUUE of $\sigma^2$ for a three-dimensional Euclidean observation space, we have to act in various scenes. First, we introduce the probability function of three Gauss-Laplace i.i.d. observations and consequently implement the $(\hat\mu, \hat\sigma^2)$ decomposition in the pdf. Second, we force the quadratic form $(y-1\mu)'(y-1\mu)$ to be decomposed into $\hat\sigma^2$ and $(\hat\mu-\mu)^2$, actually a way to introduce the sample variance $\hat\sigma^2$ BIQUUE of $\sigma^2$ and the sample mean $\hat\mu$ BLUUE of $\mu$. Third, we succeed in transforming the quadratic form $(y-1\hat\mu)'(y-1\hat\mu) = y'My$, $\operatorname{rk} M = 2$, into the canonical form $z_1^2+z_2^2$ by means of $[z_1,z_2]' = H[y_1,y_2,y_3]'$. Fourth, we produce the right inverse $H_R = H'$ in order to invert $H$, namely $[y_1,y_2,y_3]' = H'[z_1,z_2]'$. Fifth, in order to transform the original quadratic form $\sigma^{-2}(y-1\mu)'(y-1\mu)$ into the canonical form $z_1^2+z_2^2+z_3^2$, we review the general Helmert transformation $z = \sigma^{-1}H(y-1\mu)$ and its inverse $y-1\mu = \sigma H'z$ subject to $H \in SO(3)$ and identify its parameters: translation, rotation and dilatation (scale). Sixth, we summarize the marginal probability distributions $f_1(\hat\mu\,|\,\mu,\sigma^2/3)$ of the sample mean $\hat\mu$ BLUUE of $\mu$ and $f_2(2\hat\sigma^2/\sigma^2)$ of the sample variance $\hat\sigma^2$ BIQUUE of $\sigma^2$. The special Helmert pdf $\chi^2$ with two degrees of freedom is plotted in Figure D4, the special Gauss-Laplace normal pdf of variance $\sigma^2/3$ in Figure D5. The basic results of the example are collected in Corollary D7.

Figure D4: Special Helmert pdf $\chi^2_p$ of $x := 2\hat\sigma^2/\sigma^2$ for two degrees of freedom, $p = 2$.

Figure D5: Special Gauss-Laplace normal pdf $f_1(\hat\mu\,|\,\mu,\sigma^2/3)$ of $(\hat\mu-\mu)/\sqrt{\sigma^2/3}$.
The first action item Let us assume an experiment of three Gauss-Laplace i.i.d. observations. Their pdf is given by f ( y1 , y2 , y3 ) = f ( y1 ) f ( y2 ) f ( y3 ) , § 1 · f ( y1 , y2 , y3 ) = (2S ) 3 / 2 V 3 exp ¨ 2 [( y1 P ) 2 + ( y2 P ) 2 + ( y3 P ) 2 ] ¸ , © 2V ¹ 1 § · f ( y1 , y2 , y3 ) = (2S ) 3 / 2 V 3 exp ¨ 2 (y 1P )c( y 1P ) ¸ . © 2V ¹ The coordinates of the observation vector have been denoted by [ y1 , y2 , y3 ]c = y Y , dim Y = 2 . The second action item The quadratic form ( y1 1P )c( y2 1P ) allows the fundamental decomposition y 1P = ( y 1Pˆ ) + 1( Pˆ P ) , ( y 1P )c( y 1P ) = ( y 1Pˆ )c( y 1Pˆ ) + 1c1( Pˆ P ) 2 , ( y 1P )c( y 1P ) = 2Vˆ 2 + 3( Pˆ P ) 2 .
Here, Pˆ is BLUUE of P and Vˆ 2 BIQUUE of V 2 . The detailed computation proves our statement. [( y1 Pˆ ) + ( Pˆ P )]2 + [( y2 Pˆ ) + ( Pˆ P )]2 + [( y3 Pˆ ) + ( Pˆ P )]2 = = ( y1 Pˆ ) 2 + ( y2 Pˆ ) 2 + ( y3 Pˆ ) 2 + 3( Pˆ P ) 2 + 2 Pˆ ( y1 Pˆ ) + 2 Pˆ ( y2 Pˆ ) + +2 Pˆ ( y3 Pˆ ) 2( y1 Pˆ ) P 2( y2 Pˆ ) P 2( y3 Pˆ ) P 1 Pˆ BLUUE of P : Pˆ = ( y1 + y2 + y3 ) 3 1 Vˆ 2 BIQUUE of V 2 : Vˆ 2 = [( y1 Pˆ ) 2 + ( y2 Pˆ ) 2 + ( y3 Pˆ ) 2 ]. 2 As soon as we substitute Pˆ and Vˆ 2 we arrive at ( y1 P ) 2 + ( y2 P ) 2 + ( y3 P ) 2 = [( y1 Pˆ ) 2 + ( Pˆ P )]2 + +[( y2 Pˆ ) 2 + ( Pˆ P ) 2 ]2 + [( y3 Pˆ ) 2 + ( Pˆ P ) 2 ]2 = = 2Vˆ 2 + 3( Pˆ P ) 2 . The third action item Let us begin with transforming the cumulative probability function dF = f ( y1 , y2 , y3 )dy1 dy2 dy3 = § 1 · = (2S ) 3 / 2 V 3 exp ¨ 2 (y 1P )c(y 1P ) ¸ dy1 dy2 dy3 = 2 V © ¹ 1 § · = (2S ) 3 / 2 V 3 exp ¨ 2 [2Vˆ 2 + 3( Pˆ P ) 2 ] ¸ dy1 dy2 dy3 © 2V ¹ into its canonical form. In detail, we transform the quadratic form Vˆ 2 BIQUUE of V 2 canonically. 1 1 2 1 2 1 1 y1 Pˆ = y1 y1 ( y2 + y3 ) = y1 ( y2 + y3 ) = y1 y2 y3 3 3 3 3 3 3 3 y2 Pˆ = y2
1 1 1 2 1 y2 ( y1 + y3 ) = y1 + y2 y3 3 3 3 3 3
1 1 1 1 2 y3 Pˆ = y3 y3 ( y1 + y2 ) = y1 y2 + y3 3 3 3 3 3 as well as ( y1 Pˆ ) 2 =
4 2 1 2 1 2 4 2 4 y1 + y2 + y3 y1 y2 + y2 y3 y3 y1 9 9 9 9 9 9
( y2 Pˆ ) 2 =
1 2 4 2 1 2 4 4 2 y1 + y2 + y3 y1 y2 y2 y3 + y3 y1 9 9 9 9 9 9
( y3 Pˆ ) 2 =
1 2 1 2 4 2 2 4 4 y1 + y2 + y3 + y1 y2 y2 y3 y3 y1 9 9 9 9 9 9
and ( y1 Pˆ ) 2 + ( y2 Pˆ ) 2 + ( y3 Pˆ ) 2 =
2 2 ( y1 + y22 + y32 y1 y2 y2 y3 y3 y1 ). 3
We shall prove ( y 1Pˆ )c( y 1Pˆ ) = y cMy = z12 + z22 , rkM = 2, M \ 3×3 that the symmetric matrix M, ª 2 1 1º M = « 1 2 1» « » «¬ 1 1 2 »¼ has rank 2 or rank deficiency 1. Just compute M = 0 as a determinant identity as well as the subdeterminant 2 1 1
2
=3z0
ª 2 1 1º ª y1 º 1 c c ˆ ˆ ( y 1P ) ( y 1P ) = y My = [ y1 , y2 , y3 ] « 1 2 1» « y2 » . « »« » 3 «¬ 1 1 2 »¼ «¬ y3 »¼ ? How to transform a degenerate quadratic form to a canonical form? F.R. Helmert (1975, 1976 a, b) had the bright idea to implement what we call nowadays the forward Helmert transformation z1 = z2 =
1 ( y1 y2 ) 1 2
1 ( y1 + y2 2 y3 ) 23 or
z12 =
1 2 ( y1 2 y1 y2 + y22 ) 2
1 z22 = ( y12 + y22 + 4 y32 + 2 y1 y2 4 y2 y3 4 y3 y1 ) 6 z12 + z22 =
2 2 ( y1 + y22 + y32 y1 y2 y2 y3 y3 y1 ). 3
Indeed we found ( y 1Pˆ )c( y 1Pˆ ) = ( y1 Pˆ ) 2 + ( y2 Pˆ ) 2 + ( y3 Pˆ ) 2 = z12 + z22 . In algebraic terms, a representation of the rectangular Helmert matrix is ª ªz º « z = « 1» = « ¬ z2 ¼ « « ¬
1 1 2 1 23
ª « z = H 23 y , H 23 := « « « ¬
1
º » ª y1 º » « y2 » 2 »« » » «¬ y3 »¼ 23 ¼ 0
1 2 1 23
1 1 2 1
23
1 1 2 1 23
º » ». 2 » » 23 ¼ 0
The rectangular Helmert matrix is right orthogonal,
ª « H 23 H c23 = « « « ¬
1 1 2 1 23
1 1 2 1 23
ª « º 0 »« « » 2 » «« » 23 ¼ « « ¬
1 1 2 1 1 2 0
º » 23 » 1 » ª1 0 º »=« » = ǿ2. 2 3 » ¬0 1 ¼ 2 » » 23 ¼ 1
The fourth action item By means of the forward Helmert transformation we could prove that z12 + z22 represents the quadratic form ( y 1Pˆ )c( y 1Pˆ ) . Unfortunately, the forward Helmert transformation only allows indirectly by a canonical transformation. What would be needed is the inverse Helmert transformation z 6 y , also called backward Helmert transformation. The rectangular Helmert matrix has the disadvantage to have no Cayley inverse. Fortunately, its right inverse H R := H c23 (H 23 H c23 ) 1 = H c23 R 3× 2 solves our problem. The inverse Helmert transformation y = H R z = H c23 z
brings 2 ( y 1Pˆ )c( y 1Pˆ ) = ( y12 + y22 + y32 y1 y2 y2 y3 y3 y1 ) 3 via 1 1 ª « y1 = 1 2 z1 + 2 3 z2 « 1 1 « ' y = H 23z or « y2 = z1 + z2 1 2 23 « «y = 2 z , 2 «¬ 3 23 y12 + y22 + y32 =
1 2 1 2 1 z1 + z2 + z1 z2 + 2 6 3 1 1 1 + z22 + z22 z1 z2 + 2 6 3 2 + z22 , 3
1 1 1 1 1 1 y1 y2 + y2 y3 + y3 y1 = z12 + z22 + z1 z2 z22 z1 z2 z22 , 2 6 3 2 3 3 2 2 2 3 3 ( y1 + y22 + y32 y1 y2 y2 y3 y3 y1 ) = ( z12 + z22 ) = z12 + z22 , 3 3 2 2 into the canonical form. The fifth action item Let us go back to the partitioned pdf in order to inject the canonical representation of the deficient quadratic form y cMy, M \ 3×3 , rk M=2. Here we meet first the problem to transform § 1 · dF = (2S ) 3/ 2 V 3 exp ¨ 2 [2Vˆ 2 + 3( Pˆ P ) 2 ] ¸ dy1dy2 dy3 © 2V ¹ by an extended vector [ z1 , z2 , z3 ]c =: z into the canonical form § 1 · dF = (2S ) 3/ 2 exp ¨ ( z12 + z22 + z32 ) ¸ dz1dz2 dz3 , 2 © ¹ which is generated by the general forward Helmert transformation z = V 1 H ( y 1P )
ª « « ª z1 º «z » = 1 « « 2» V « « «¬ z3 »¼ « «¬
1 1 2 1 23 1 3
1 1 2 1 23 1 3
º » » ª y1 P º 2 »« y2 P » » 23» « » ¬« y3 P ¼» 1 » 3 »¼ 0
or its backward Helmert transformation, also called the general inverse Helmert transformation ª « « ª y1 P º « y P» = V « « « 2 » « «¬ y3 P »¼ « «¬
1 1 2 1 1 2 0
1 23 1 23 2 23
1 º 3 »» z ª 1º 1 »« » z2 . 3»« » » «z » 1 » ¬ 3¼ 3 »¼
y 1P = V Hcz thanks to the orthonormality of the quadratic Helmert matrix H3, namely H 3 H c3 = H c3 H 3 = I 3 or H 31 = H c3 . Secondly, notice the transformation of the volume element dy1 dy2 dy3 = d ( y1 P )d ( y2 P )d ( y3 P ) = V 3 dz1 dz2 dz3 , which is due to the Jacobi determinant J = V 3 Hc = V 3 H = V 3 . Let us prove that z3 := 3V 1 ( Pˆ P ) =
Pˆ P V 3
brings the first marginal density f1 = ( Pˆ | P ,
V2 1 1 § 1 · ) :=
exp ¨ 2 3( Pˆ P ) 2 ¸ V 3 2 V 2S © ¹ 3
into the canonical form § 1 · exp ¨ z32 ¸ . 2S © 2 ¹ Let us compute Pˆ P as well as 3( Pˆ P ) 2 which concludes the proof. f1 ( z3 0,1) =
1
ª y1 P º 1 1 1 « , , ] y2 P » » 3 3 3 « «¬ y3 P »¼ y + y2 + y3 3P º z3 = V 1 1 » 3 » y1 + y2 + y3 » = Pˆ y1 + y2 + y3 = 3Pˆ » 3 ¼ Pˆ P 2 1 z3 = 3V 1 , z3 = 2 3( Pˆ P ) 2 . V 3 z3 = V 1[
Indeed the extended Helmert matrix H3 is ingenious to decompose 1 ( y 1Pˆ )c( y 1Pˆ ) = z12 + z22 + z32 V2 into a canonical quadratic form relating z12 + z22 to Vˆ 2 and z32 to ( Pˆ P ) 2 . At this point, we have to interpret the general Helmert transformation z = V 1H( y P ) : Structure elements of the Helmert transformation scale or dilatation
V 1
rotation
H
translation
P
V 1 \ + produces a dilatation or a scale change, H SO(3) := {H \ 3×3 HcH = I 3 and H = +1} a rotation (3 parameters) and 1P \ 3 a translation. Please, prove for yourself that the quadratic Helmert matrix is orthonormal, that is HH c = H cH = I 3 and H = +1 . The sixth action item Finally we are left with the problem to split the cumulative pdf into one part f1 ( Pˆ ) which is a marginal distribution of the arithmetic mean Pˆ BLUUE of P and another part f 2 ( Pˆ ) which is a marginal distribution of the standard deviation Vˆ , Vˆ 2 BIQUUE of V 2 , Helmert’s F 22 with two degrees of freedom. First let us introduce polar coordinates (I1 , r ) which represent the Cartesian coordinates z1 = r cos I1 , z2 = r sin I1 . The index 1 is needed for later generalization to higher dimension. As a longitude, the domain of I1 is I1 [0, 2S ] or 0 d I1 d 2S . The new random variable z12 + z22 =|| z ||2 =: x or radius r relates to Helmert’s
F 2 = z12 + z22 =
2Vˆ 2 1 = ( y 1Pˆ )c( y 1Pˆ ) . V2 V2
Secondly, the marginal distribution of the arithmetic mean Pˆ , Pˆ BLUUE of P , f1 ( Pˆ P ,
V V )d Pˆ = f1 ( z3 0,1)dz3 N ( P , ) 3 3
is a Gauss-Laplace normal distribution with mean P and variance V 2 / 3 generated by
V )d Pˆ = f1 ( z3 0,1) dz3 = 3 +f +f § 1 · § 1 · = (2S ) 1/ 2 exp ¨ z32 ¸ dz3 ³ ³ (2S ) 1 exp ¨ ( z12 + z22 ) ¸ dz1dz2 © 2 ¹ f f © 2 ¹
dF1 = f1 ( Pˆ P ,
or f1 ( Pˆ P ,
V 3 § 3 · )d Pˆ = (2S ) 1/ 2 exp ¨ 2 ( Pˆ P ) 2 ¸ d Pˆ . V 3 © 2V ¹
Third, the marginal distribution of the sample variance 2Vˆ 2 / V 2 = z12 + z22 =: x , Helmert’s F 2 distribution for p=n-1=2 degrees of freedom, p 1 1 § x· f 2 (2Vˆ 2 / V 2 ) = f 2 ( x) = x 2 exp ¨ ¸ p © 2¹ 2 p / 2 *( ) 2 is generated by dF2 =
+f
2S
§ 1 2· § 1 · 1/ 2 1 1 ³f (2S ) exp ¨© 2 z3 ¸¹ dz3 ³0 dI1 (2S ) 2 exp ¨© 2 x ¸¹ dx
dF2 = (2S ) 1Z2
1 § 1 · exp ¨ x ¸ dx subject to Z 2 = 2 © 2 ¹ dF2 =
2S
³ dI
1
= 2S
0
1 § 1 · exp ¨ x ¸ dx 2 © 2 ¹ and
x := z12 + z22 = r 2 dx = 2rdr, dr = dz1 dz2 = rdrdI1 =
dx 2r
1 dxdI1 2
is the transformation of the surface element dz1dz2 . In collecting all detailed results let us formulate a corollary.
Corollary D7 (marginal probability distributions of $\hat\mu$, $\sigma^2$ given, and of $\hat\sigma^2$): The cumulative pdf of a set of three observations is represented by
$$dF = f(y_1,y_2,y_3)\,dy_1dy_2dy_3 = f_1(\hat\mu\,|\,\mu,\sigma^2/3)\,f_2(x)\,d\hat\mu\,dx$$
subject to
$$f_1(\hat\mu\,|\,\mu,\sigma^2/3) := \frac{1}{\sqrt{2\pi}}\frac{1}{\sigma/\sqrt3}\exp\Big(-\frac12\frac{(\hat\mu-\mu)^2}{\sigma^2/3}\Big),$$
$$f_2(x) = \frac12\exp\Big(-\frac12 x\Big),$$
subject to $x := z_1^2 + z_2^2 = 2\hat\sigma^2/\sigma^2$, $dx = \dfrac{2}{\sigma^2}d\hat\sigma^2$, and
$$\int_{-\infty}^{+\infty} f_1(\hat\mu)\,d\hat\mu = 1 \quad\text{versus}\quad \int_0^{+\infty} f_2(x)\,dx = 1.$$
$f_1(\hat\mu)$ is the pdf of the sample mean $\hat\mu = (y_1+y_2+y_3)/3$ and $f_2(x)$ the pdf of the sample variance $\hat\sigma^2 = (y-1\hat\mu)'(y-1\hat\mu)/2$, normalized by $\sigma^2$. $f_1(\hat\mu)$ is a Gauss-Laplace pdf with mean $\mu$ and variance $\sigma^2/3$, while $f_2(x)$ is a Helmert $\chi^2$ with two degrees of freedom. In summary, an experiment with three Gauss-Laplace i.i.d. observations is characterized by two marginal probability densities, one for the mean $\hat\mu$ BLUUE of $\mu$ and another one for $\hat\sigma^2$ BIQUUE of $\sigma^2$:

Marginal probability densities, $n = 3$, Gauss-Laplace i.i.d. observations
$$\hat\mu:\quad f_1(\hat\mu\,|\,\mu,\sigma/\sqrt3)\,d\hat\mu = (2\pi)^{-1/2}\frac{1}{\sigma/\sqrt3}\exp\Big(-\frac12\frac{(\hat\mu-\mu)^2}{\sigma^2/3}\Big)d\hat\mu
\quad\text{or}\quad f_1(z_3\,|\,0,1)\,dz_3 = (2\pi)^{-1/2}\exp\Big(-\frac12 z_3^2\Big)dz_3,\qquad z_3 := \frac{\hat\mu-\mu}{\sigma/\sqrt3};$$
$$\hat\sigma^2:\quad dF_2 = f_2\Big(\frac{2\hat\sigma^2}{\sigma^2}\Big)\,d\Big(\frac{2\hat\sigma^2}{\sigma^2}\Big) = \frac{1}{\sigma^2}\exp\Big(-\frac{\hat\sigma^2}{\sigma^2}\Big)d\hat\sigma^2
\quad\text{or}\quad f_2(x)\,dx = \frac12\exp\Big(-\frac12 x\Big)dx,\qquad x := \frac{2\hat\sigma^2}{\sigma^2}.$$
D41  Sampling distributions of the sample mean $\hat\mu$, $\sigma^2$ known, and of the sample variance $\hat\sigma^2$

The two examples have prepared us for the general sampling distribution of the sample mean $\hat\mu$, $\sigma^2$ known, and of the sample variance $\hat\sigma^2$ for Gauss-Laplace i.i.d. observations, namely samples of size $n$. By means of Lemma D8 on the rectangular Helmert transformation and Lemma D9 on the quadratic Helmert transformation we prepare for Theorem D10, which summarizes the pdfs for $\hat\mu$ BLUUE of $\mu$, $\sigma^2$ known, for the standard deviation $\hat\sigma$ and for $\hat\sigma^2$ BIQUUE of $\sigma^2$. Corollary D11 focusses on the pdf of $\bar\sigma = q\hat\sigma$ where $\bar\sigma$ is an unbiased estimate of the standard deviation $\sigma$, namely $E\{\bar\sigma\} = \sigma$.

Lemma D8 (rectangular Helmert transformation): The rectangular Helmert matrix $H_{n-1,n} \in \mathbb{R}^{(n-1)\times n}$ transforms the degenerate quadratic form
$$(n-1)\hat\sigma^2 := (y-1\hat\mu)'(y-1\hat\mu) = y'My,\quad \operatorname{rk}M = n-1,\quad\text{subject to}\quad \hat\mu = \frac1n 1'y,$$
into the canonical form $(n-1)\hat\sigma^2 = z'_{n-1}z_{n-1} = z_1^2 + \cdots + z_{n-1}^2$. The special Helmert transformation $y_n \mapsto H_{n-1,n}\,y_n = z_{n-1}$ is represented by the matrix whose $k$-th row has the entries $1/\sqrt{k(k+1)}$ in the first $k$ columns, $-k/\sqrt{k(k+1)}$ in column $k+1$, and zeros elsewhere:
$$H_{n-1,n} := \begin{bmatrix}
\frac{1}{\sqrt{1\cdot2}} & -\frac{1}{\sqrt{1\cdot2}} & 0 & \cdots & 0 & 0\\[2pt]
\frac{1}{\sqrt{2\cdot3}} & \frac{1}{\sqrt{2\cdot3}} & -\frac{2}{\sqrt{2\cdot3}} & \cdots & 0 & 0\\
\vdots & & & \ddots & & \vdots\\[2pt]
\frac{1}{\sqrt{(n-1)(n-2)}} & \frac{1}{\sqrt{(n-1)(n-2)}} & \cdots & -\frac{n-2}{\sqrt{(n-1)(n-2)}} & 0\\[2pt]
\frac{1}{\sqrt{n(n-1)}} & \frac{1}{\sqrt{n(n-1)}} & \cdots & \frac{1}{\sqrt{n(n-1)}} & -\frac{n-1}{\sqrt{n(n-1)}}
\end{bmatrix}.$$
The inverse Helmert transformation $z \mapsto y = H_R z = H'z$, or $y_n = H'_{n-1,n}z_{n-1}$, is based on the right inverse which, thanks to the right orthogonality $H_{n-1,n}H'_{n-1,n} = I_{n-1}$, coincides with the transpose: $H_R = H'(HH')^{-1} = H' \in \mathbb{R}^{n\times(n-1)}$.

Lemma D9 (quadratic Helmert transformation): The quadratic Helmert matrix $H \in \mathbb{R}^{n\times n}$, also called extended Helmert matrix or augmented Helmert matrix, transforms the quadratic form
$$\frac{1}{\sigma^2}(y-1\mu)'(y-1\mu) = \frac{1}{\sigma^2}[(y-1\hat\mu)+1(\hat\mu-\mu)]'[(y-1\hat\mu)+1(\hat\mu-\mu)],\quad\text{subject to}\quad \hat\mu = \frac1n 1'y,$$
by means of $z = \sigma^{-1}H(y-1\mu)$ or $y-1\mu = \sigma H'z$ into the canonical form
$$\frac{1}{\sigma^2}(y-1\mu)'(y-1\mu) = z_1^2 + \cdots + z_{n-1}^2 + z_n^2 = \sum_{j=1}^{n-1}z_j^2 + z_n^2,\qquad z_n^2 = \frac{n(\hat\mu-\mu)^2}{\sigma^2}.$$
Such a Helmert transformation $y \mapsto z = \sigma^{-1}H(y-1\mu)$ is represented by the rectangular Helmert matrix $H_{n-1,n}$ of Lemma D8 augmented by a last row with constant entries $1/\sqrt n$:
$$H := \begin{bmatrix}
\frac{1}{\sqrt{1\cdot2}} & -\frac{1}{\sqrt{1\cdot2}} & 0 & \cdots & 0 & 0\\[2pt]
\frac{1}{\sqrt{2\cdot3}} & \frac{1}{\sqrt{2\cdot3}} & -\frac{2}{\sqrt{2\cdot3}} & \cdots & 0 & 0\\
\vdots & & & \ddots & & \vdots\\[2pt]
\frac{1}{\sqrt{n(n-1)}} & \frac{1}{\sqrt{n(n-1)}} & \cdots & \frac{1}{\sqrt{n(n-1)}} & -\frac{n-1}{\sqrt{n(n-1)}}\\[2pt]
\frac{1}{\sqrt n} & \frac{1}{\sqrt n} & \cdots & \frac{1}{\sqrt n} & \frac{1}{\sqrt n}
\end{bmatrix}.$$
Since the quadratic Helmert matrix is orthonormal, the inverse Helmert transformation is generated by $z \mapsto y - 1\mu = \sigma H'z$.
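A minimal numerical sketch of Lemma D9 (Python with NumPy; the tooling and function names are assumptions, not part of the book) builds the quadratic Helmert matrix for an arbitrary sample size, verifies its orthonormality and checks the canonical decomposition into $(n-1)\hat\sigma^2/\sigma^2$ and $n(\hat\mu-\mu)^2/\sigma^2$.

```python
import numpy as np

def helmert(n):
    """Quadratic (augmented) Helmert matrix H in R^{n x n}; its first n-1 rows
    form the rectangular Helmert matrix H_{n-1,n} of Lemma D8."""
    H = np.zeros((n, n))
    for k in range(1, n):                       # rows 1 .. n-1
        H[k - 1, :k] = 1.0 / np.sqrt(k * (k + 1))
        H[k - 1, k] = -k / np.sqrt(k * (k + 1))
    H[n - 1, :] = 1.0 / np.sqrt(n)              # last row: 1/sqrt(n)
    return H

n, mu, sigma = 5, 2.0, 3.0
H = helmert(n)
assert np.allclose(H @ H.T, np.eye(n))          # orthonormality (Lemma D9)

y = np.random.default_rng(1).normal(mu, sigma, size=n)
z = H @ (y - mu) / sigma
mu_hat = y.mean()
sigma2_hat = ((y - mu_hat) ** 2).sum() / (n - 1)
# canonical decomposition of Lemma D9:
assert np.isclose((z[:-1] ** 2).sum(), (n - 1) * sigma2_hat / sigma ** 2)
assert np.isclose(z[-1] ** 2, n * (mu_hat - mu) ** 2 / sigma ** 2)
print("Helmert decomposition verified for n =", n)
```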
The proofs of Lemma D8 and Lemma D9 are based on generalizations of the special cases for $n=2$, Example D12, and for $n=3$, Example D13. Any proof will be omitted here. The highlight of this paragraph is the following theorem.

Theorem D10 (marginal probability distributions of $(\hat\mu, \sigma^2)$ and $\hat\sigma^2$): The cumulative pdf of a set of $n$ observations is represented by
$$dF = f(y_1,\ldots,y_n)\,dy_1\cdots dy_n = f_1(\hat\mu)f_4(\hat\sigma)\,d\hat\mu\,d\hat\sigma = f_1(\hat\mu)f_2(\hat\sigma^2)\,d\hat\mu\,d\hat\sigma^2$$
as the product of the marginal pdf $f_1(\hat\mu)$ of the sample mean $\hat\mu = n^{-1}1'y$ and the marginal pdf $f_4(\hat\sigma)$ of the sample standard deviation $\hat\sigma = \sqrt{(y-1\hat\mu)'(y-1\hat\mu)/(n-1)}$, also called r.m.s. (root mean square error), or the marginal pdf $f_2(\hat\sigma^2)$ of the sample variance $\hat\sigma^2$. Those marginal pdfs are represented by $dF_1 = f_1(\hat\mu)d\hat\mu$, $dF_4 = f_4(\hat\sigma)d\hat\sigma$ and $dF_2 = f_2(\hat\sigma^2)d\hat\sigma^2$:
(i) sample mean $\hat\mu$:
$$f_1(\hat\mu) = f_1(\hat\mu\,|\,\mu,\sigma^2/n) := \frac{1}{\sqrt{2\pi}}\frac{\sqrt n}{\sigma}\exp\Big[-\frac12\frac{(\hat\mu-\mu)^2}{\sigma^2/n}\Big],\qquad
z := \frac{\sqrt n}{\sigma}(\hat\mu-\mu):\quad f_1(z)\,dz = \frac{1}{\sqrt{2\pi}}\exp\Big(-\frac12 z^2\Big)dz;$$
(ii) sample r.m.s. $\hat\sigma$, $p := n-1$:
$$dF_4 = f_4(\hat\sigma)\,d\hat\sigma,\qquad f_4(\hat\sigma) = \frac{2\,p^{p/2}}{\sigma^p\,2^{p/2}\,\Gamma(\tfrac p2)}\,\hat\sigma^{\,p-1}\exp\Big(-\frac p2\frac{\hat\sigma^2}{\sigma^2}\Big),$$
$$x := \sqrt{n-1}\,\frac{\hat\sigma}{\sigma} = \sqrt p\,\frac{\hat\sigma}{\sigma}:\qquad f_4(x)\,dx = \frac{2}{2^{p/2}\Gamma(\tfrac p2)}\,x^{p-1}\exp\Big(-\frac12 x^2\Big)dx,\qquad dF_4 = f_4(x)\,dx;$$
(iii) sample variance $\hat\sigma^2$, $p := n-1$:
$$f_2(\hat\sigma^2) = \frac{p^{p/2}}{\sigma^p\,2^{p/2}\,\Gamma(\tfrac p2)}\,\hat\sigma^{\,p-2}\exp\Big(-\frac12\frac{p\,\hat\sigma^2}{\sigma^2}\Big),$$
$$x := (n-1)\frac{\hat\sigma^2}{\sigma^2} = \frac{p\,\hat\sigma^2}{\sigma^2}:\qquad f_2(x)\,dx = \frac{1}{2^{p/2}\Gamma(\tfrac p2)}\,x^{\tfrac p2-1}\exp\Big(-\frac12 x\Big)dx.$$
$f_1(\hat\mu\,|\,\mu,\sigma^2/n)$, as the marginal pdf of the sample mean BLUUE of $\mu$, is a Gauss-Laplace pdf with mean $\mu$ and variance $\sigma^2/n$. $f_4(x)$, $x := \sqrt p\,\hat\sigma/\sigma$, is the standard pdf of the normalized root-mean-square error with $p$ degrees of freedom. In contrast, $f_2(x)$, $x := p\hat\sigma^2/\sigma^2$, is a Helmert Chi Square $\chi^2_p$ pdf with $p = n-1$ degrees of freedom.

Before we present a sketch of a proof of Theorem D10, which will be run with five action items and a special reference to the first and second vehicle, we give some historical comments. S. Kullback (1934) refers the marginal pdf $f_1(\hat\mu)$ of the "arithmetic mean" $\hat\mu$ to S. D. Poisson (1827), F. Hausdorff (1901) and J. O. Irwin (1927). He has also solved the problem of finding the marginal pdf of the "geometric mean". The marginal pdf $f_2(\hat\sigma^2)$ of the sample variance $\hat\sigma^2$ has been originally derived by F. R. Helmert (1875, 1876a, b). A historical discussion of Helmert's distribution is offered by H. A. David (1957), W. Kruskal (1946), H. O. Lancaster (1965, 1966), K. Pearson (1931) and O. Sheynin (1995). The marginal pdf $f_4(\hat\sigma)$ has not found much interest in practice so far. The reason may be found in the fact that $\hat\sigma$ is not an unbiased estimate of the standard deviation $\sigma$, namely $E\{\hat\sigma\} \ne \sigma$. E. Czuber (1891, p. 162), K. D. P. Rosen (1948, p. 37), L. Schmetterer (1956, p. 203), R. Storm (1967, pp. 199, 218), M. Fisz (1971, p. 240) as well as H. Richter and V. Mammitzsch (1973, p. 42) have documented that
$$\bar\sigma = q\,\hat\sigma,\qquad q := \sqrt{\frac p2}\,\frac{\Gamma(\tfrac p2)}{\Gamma(\tfrac{p+1}{2})},$$
is an unbiased estimation $\bar\sigma$ of the standard deviation $\sigma$, namely $E\{\bar\sigma\} = \sigma$. $p = n-1$ again denotes the number of degrees of freedom. B. Schaffrin (1979, p. 240) has proven that
$$\hat\sigma_p := \sqrt{\frac{(y-1\hat\mu)'(y-1\hat\mu)}{p-\tfrac12}} = \hat\sigma\,\sqrt{\frac{2p}{2p-1}}$$
is an asymptotically ("from above") unbiased estimation $\hat\sigma_p$ of the standard deviation $\sigma$. Let us implement $\bar\sigma$, BLUUE of $\sigma$, into the marginal pdf $f_4(\hat\sigma)$:
Corollary D11
(marginal probability distribution of $\bar\sigma$ for Gauss-Laplace i.i.d. observations, $E\{\bar\sigma\} = \sigma$):
The marginal pdf of $\bar\sigma$, an unbiased estimation of the standard deviation $\sigma$, is represented by
$$dF_4 = f_4(\bar\sigma)\,d\bar\sigma,\qquad
f_4(\bar\sigma) = \frac{2}{\sigma^p\,\Gamma(\tfrac p2)}\Bigg(\frac{\Gamma(\tfrac{p+1}{2})}{\Gamma(\tfrac p2)}\Bigg)^{p}\bar\sigma^{\,p-1}\exp\Bigg(-\Bigg(\frac{\Gamma(\tfrac{p+1}{2})}{\Gamma(\tfrac p2)}\Bigg)^{2}\frac{\bar\sigma^{2}}{\sigma^{2}}\Bigg)$$
and
$$dF_4 = f_4(x)\,dx,\qquad x := \sqrt2\,\frac{\Gamma(\tfrac{p+1}{2})}{\Gamma(\tfrac p2)}\,\frac{\bar\sigma}{\sigma},\qquad
f_4(x) = \frac{2}{2^{p/2}\Gamma(\tfrac p2)}\,x^{p-1}\exp\Big(-\frac12 x^2\Big)$$
subject to
$$E\{x\} = \sqrt2\,\frac{\Gamma(\tfrac{p+1}{2})}{\Gamma(\tfrac p2)}.$$
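The bias-correction factor $q$ of Corollary D11 is easy to compute and to test by simulation. The sketch below (Python with NumPy and the standard library; illustrative only) evaluates $q$ from the gamma functions and checks that $q\hat\sigma$ is (up to Monte Carlo noise) unbiased for $\sigma$.

```python
import math
import numpy as np

def q(p):
    """Bias-correction factor of Corollary D11: sigma_bar = q * sigma_hat."""
    return math.sqrt(p / 2.0) * math.gamma(p / 2.0) / math.gamma((p + 1) / 2.0)

rng = np.random.default_rng(0)
sigma, n, N = 2.0, 4, 200_000
p = n - 1
y = rng.normal(0.0, sigma, size=(N, n))
sigma_hat = y.std(axis=1, ddof=1)         # sample r.m.s., square root of BIQUUE
print(sigma_hat.mean())                   # < sigma: sigma_hat is biased downwards
print(q(p) * sigma_hat.mean(), sigma)     # ~ sigma: q * sigma_hat is unbiased
```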
Figure D6 illustrates the marginal pdf for "degrees of freedom" $p \in \{1,2,3,4,5\}$.

Figure D6: Marginal pdf $2x^{p-1}e^{-x^2/2}/[2^{p/2}\Gamma(p/2)]$ for the sample standard deviation $\bar\sigma$ (r.m.s.).

Proof:
The first action item
The pdf of $n$ Gauss-Laplace i.i.d. observations is given by
$$f(y_1,\ldots,y_n) = f(y_1)\cdots f(y_n) = \prod_{i=1}^n f(y_i),$$
$$f(y_1,\ldots,y_n) = (2\pi)^{-n/2}\sigma^{-n}\exp\Big(-\frac{1}{2\sigma^2}(y-1\mu)'(y-1\mu)\Big).$$
The coordinates of the observation space $Y$ have been denoted by $[y_1,\ldots,y_n]' = y$. Note $\dim Y = n$.
The second action item
The quadratic form $(y-1\mu)'(y-1\mu) > 0$ allows the fundamental decomposition
$$y-1\mu = (y-1\hat\mu) + 1(\hat\mu-\mu),$$
$$(y-1\mu)'(y-1\mu) = (y-1\hat\mu)'(y-1\hat\mu) + 1'1(\hat\mu-\mu)^2,\qquad 1'1 = n,$$
(y 1P )c(y 1P ) = (n 1)Vˆ 2 + n( Pˆ P ) 2 . Here, Pˆ is BLUUE of P and Vˆ 2 BIQUUE of V 2 . The decomposition of the quadratic ( y 1P )c( y 1P ) into the sample variance Vˆ 2 and the square ( Pˆ P ) 2 of the shifted sample mean Pˆ P has already been proved for n=2 and n=3. The general result is obvious. The third action item Let us transform the cumulative probability into its canonical forms. dF = f ( y1 ," , yn )dy1 " dyn = § 1 · = (2S ) n / 2 V n exp ¨ 2 (y 1P )c(y 1P ) ¸ dy1 " dyn = © 2V ¹ § 1 · = (2S ) n / 2 V n exp ¨ 2 [( n 1)Vˆ 2 + n( Pˆ P ) 2 ] ¸ dy1 " dyn V 2 © ¹ z = V 1 H ( y 1P )
or
y 1P = V Hcz
1 ( y 1P )c( y 1P ) = z12 + z22 + " + zn21 + zn2 . V2 Here, we have substituted the divert Helmert transformation (quadratic Helmert matrix H) and its inverse. Again V 1 is the scale factor, also called dilatation, H an orthonormal matrix, also called rotation matrix, and 1P R n the translation, also called shift. dF = f ( y1 ," , yn )dy1 " dyn = =
§ 1 · 1 § 1 · exp ¨ zn2 exp ¨ ( z12 + " + zn2 ) ¸ dz1dz2 " dzn 1dzn ( n 1) / 2 ¸ 2 2 (2 S ) 2S © ¹ © ¹ 1
based upon dy1dy2 " dyn 1dyn = V n dz1dz2 " dzn 1dzn J = V n | Hc |= V n | H |= V n . J again denotes the absolute value of the Jacobian determinant introduced by the first vehicle. The fourth action item First, we identify the marginal distribution of the sample mean Pˆ . dF1 = f1 ( Pˆ | P ,
V2 )d Pˆ . n
D4 A second confidence interval for the mean, variance known
589
According to the specific structure of the quadratic Helmert matrix zn is generated by ª y1 P º 1 1 « y + " + yn n P # » = V 1 1 zn = V [ ," , ] , « » n n n «¬ yn P »¼ 1
upon substituting 1 y + " + yn Pˆ = 1cy = 1 y1 + " + yn = n Pˆ n n n( Pˆ P ) ( Pˆ P ) 2 1 zn = V 1 zn2 = n , dzn = n d Pˆ . 2 V V n Let us implement dzn in the marginal distribution. dF1 = +f
1
1 exp( zn2 )dzn 2 2S
+f
1
³ " ³ (2S ) ( n 1) / 2 exp[ ( z12 + " + zn21 )]dz1 " dzn 1 , 2 f f +f
+f
1 ( n 1) / 2 exp[ ( z12 + " + zn21 )]dz1 " dzn 1 = 1 ³f " f³ (2S ) 2 dF1 = dF1 =
1
1 exp( zn2 )dzn 2 2S
V2 1 1 ( Pˆ P ) 2 ˆ ˆ P = P P exp( ) d f ( | , )d Pˆ . 1 2 V2 n 2S V n n 1
The fifth action item Second, we identify the marginal distribution of the sample variance Vˆ 2 . We depart the ansatz dF2 = f 2 (Vˆ )dVˆ = f 2 (Vˆ 2 )dVˆ 2 in order to determine the marginal distribution f 2 (Vˆ ) of the sample root-meansquare errors Vˆ and the marginal distribution f 2 (Vˆ 2 ) of the sample variance Vˆ 2 . A first version of the marginal probability distribution dF2 is dF2 =
1 2S
f
1
³ exp( 2 z
f
2 n
)dzn
1 1 exp[ ( z12 + " + zn21 )]dz1 " dzn 1 . ( n 1) / 2 (2S ) 2
Transform the Cartesian coordinates ( z1 ," , zn 1 ) R n 1 to spherical coordinates ()1 , ) 2 ," , ) n 2 , rn 1 ) . From the operational point of view, p = n 1 , the number of “degrees of freedom”, is an optional choice. Let us substitute the global hypersurface element Z n 1 or Z p into dF2 , namely f
1 2S dF2 = +S / 2
³
1
³ exp( 2 z
2 n
)dzn = 1
f
1 1 r n 2 exp( r 2 )dr 2( n 1) / 2 2 +S / 2
cosn 3In 2 dIn 2
S / 2
³
+S / 2
cosn 4In 3dIn 3 "
S / 2
³
S / 2
2S
cosI2 dI2 ³ dI1 0
1 1 dF2 = r p 1 exp( r 2 )dr p/2 (2S ) 2 +S / 2
³
S / 2
+S / 2
cos p 2I p 1
³
+S / 2
cos p 3I p 2 dI p 2 "
S / 2
S / 2
Z n 1 = Z p = S /2
=
³
S / 2
³
³
³
S / 2
S /2
cos p 3I p 2 dI p 2 "
S / 2
dF2 =
0
³
S /2
cos 2I3dI3
S / 2
dF2 =
2S
cosI2 dI2 ³ dI1
2 2 S ( n 1) / 2 = S p/2 = n 1 p ) *( *( ) 2 2
S /2
cos p 2I p 1dI p 1
+S / 2
cos2I3dI3
Zp (2S )
p/2
³
S / 2
2S
cosI2 dI2 ³ dM1 0
1 r p 1 exp( r 2 )dr 2
2 p 2 p / 2 *( ) 2
1 r p 1 exp( r 2 )dr . 2
The marginal distribution of the r.m.s. f 2 (Vˆ ) is generated as soon as we substitute the radius r = z12 + " + z n21 = n 1 Vˆ1 / V . Alternatively the marginal distribution of the sample variance f 2 (Vˆ 2 ) is produced when we substitute the radius square r 2 = z12 + " + zn21 = ( n 1)Vˆ 2 / V 2 . Project A p Vˆ dr = dVˆ V V dF2 = f 2 (Vˆ )dVˆ
r = n 1 Vˆ / V =
p
f 2 (Vˆ ) =
2pp 1 p 2 Vˆ p -1 exp( Vˆ ) . 2V2 V p 2p/2
Indeed, f 2 (Vˆ ) establishes the marginal distribution of the root-mean-square error Vˆ with p=n-1 degrees of freedom. Project B ª dx = 2rdr Vˆ 2 Vˆ 2 2 x := r = (n 1) 2 = p 2 =: F p « « dr = dx = 1 dx V V 2r 2 x «¬ 2
r p 1 dr =
1 2p 1 x dx 2
dF2 = f 2 ( x)dx f 2 ( x) :=
p 1 1 x 2 exp( x ). p 2 2 p / 2 *( ) 2
1
Finally, we have derived Helmert’s Chi Square F p2 distribution f 2 ( x)dx = dF2 by substituting r 2, r p-1 and dr in factor of x := r 2 and dx = 2r dr. Project C Replace the radical coordinate squared r 2 = (n 1)Vˆ 2 / V 2 = pVˆ 2 / V 2 by rescaling on the basis p / V 2 x = r 2 = z12 + " + zn21 = z12 + " + z 2p = (n 1) dx =
Vˆ 2 p = 2 Vˆ 2 2 V V
p dVˆ V2
within Helmert’s F p2 with p=n-1 degrees of freedom dF2 = f 2 (Vˆ 2 )dVˆ 2 f 2 (Vˆ 2 ) =
1 p V p 2 p / 2 *( ) 2
p p / 2Vˆ p 2 exp(
1 p 2 Vˆ ) 2V2 .
Recall that f 2 (Vˆ 2 ) establishes the marginal distribution of the sample variance Vˆ 2 with p = n -1 degrees of freedom.
Both the marginal pdf $f_4(\hat\sigma)$ of the sample standard deviation $\hat\sigma$, also called root-mean-square error, and the marginal pdf $f_2(\hat\sigma^2)$ of the sample variance $\hat\sigma^2$ document the dependence on the variance $\sigma^2$ and its power $\sigma^p$. □
"Here is my journey's end." (W. Shakespeare: Hamlet)

D42  The confidence interval for the sample mean, variance known

An application of Theorem D10 is Lemma D12, where we construct the confidence interval for the sample mean $\hat\mu$, BLUUE of $\mu$, variance $\sigma^2$ known, on the basis of its sampling distribution. Example D14 is an example of a random sample of size $n = 4$.

Lemma D12 (confidence interval for the sample mean, variance known): The sampling distribution of the sample mean $\hat\mu = n^{-1}1'y$, BLUUE of $\mu$, is Gauss-Laplace normal, $N(\hat\mu\,|\,\mu,\sigma^2/n)$, if the observations $y_i$, $i \in \{1,\ldots,n\}$, are Gauss-Laplace i.i.d. The "true" mean $\mu$ is an element of the two-sided confidence interval
$$\mu \in\, \Big]\hat\mu - \frac{\sigma}{\sqrt n}c_{1-\alpha/2},\ \hat\mu + \frac{\sigma}{\sqrt n}c_{1-\alpha/2}\Big[$$
with confidence
$$P\Big\{\hat\mu - \frac{\sigma}{\sqrt n}c_{1-\alpha/2} < \mu < \hat\mu + \frac{\sigma}{\sqrt n}c_{1-\alpha/2}\Big\} = 1-\alpha$$
of level $1-\alpha$. For three values of the coefficient of confidence $\gamma = 1-\alpha$, Table D7 is a list of associated quantiles $c_{1-\alpha/2}$.

Example D14 (confidence interval for the sample mean $\hat\mu$, $\sigma^2$ known): Suppose that a random sample
$$y := [y_1, y_2, y_3, y_4]' = [1.2,\ 3.4,\ 0.6,\ 5.6]',\qquad \hat\mu = 2.7,\qquad \sigma^2 = 9,\ \text{r.m.s.}\ \sigma = 3,$$
of four observations is known from a Gauss-Laplace normal distribution with unknown mean $\mu$ and a known standard deviation $\sigma = 3$. $\hat\mu$ BLUUE of $\mu$ is the arithmetic mean. We intend to determine upper and lower limits which are rather
certain to contain the unknown parameter $\mu$ between them. Previously, for samples of size 4, we have known that the random variable
$$z = \frac{\hat\mu-\mu}{\sqrt{\sigma^2/n}} = \frac{\hat\mu-\mu}{3/2}$$
is normally distributed with mean zero and unit variance. $\hat\mu$ is the sample mean 2.7 and $3/2$ is $\sigma/\sqrt n$. The probability $\gamma = 1-\alpha$ that $z$ will be between any two arbitrarily chosen numbers $c_1 = -c$ and $c_2 = +c$ is
$$P\{c_1 < z < c_2\} = \int_{c_1}^{c_2} f(z)\,dz = \gamma = 1-\alpha,$$
$$P\{-c < z < +c\} = \int_{-c}^{+c} f(z)\,dz = \gamma = 1-\alpha,$$
$$P\{-c < z < +c\} = \int_{-\infty}^{+c} f(z)\,dz - \int_{-\infty}^{-c} f(z)\,dz = \gamma = 1-\alpha,$$
$$\int_{-\infty}^{-c} f(z)\,dz = \frac{\alpha}{2},\qquad \int_{-\infty}^{+c} f(z)\,dz = 1-\frac{\alpha}{2}.$$
$\gamma$ is the coefficient of confidence, $\alpha$ the coefficient of negative confidence, also called complementary coefficient of confidence. The four representations of the probability $\gamma = 1-\alpha$ to include $z$ in the confidence interval $-c < z < +c$ have led to the linear Volterra integral equation of the first kind
$$\int_{-\infty}^{c} f(z)\,dz = 1-\frac{\alpha}{2} = \frac{1+\gamma}{2}.$$
Three values of the coefficient of confidence $\gamma$ or its complement $\alpha$ are popular and listed in Table D6.

Table D6
Values of the coefficient of confidence
$\gamma$:                          0.950   0.990   0.999
$\alpha$:                          0.050   0.010   0.001
$\alpha/2$:                        0.025   0.005   0.000,5
$1-\alpha/2 = (1+\gamma)/2$:       0.975   0.995   0.999,5
In solving the linear Volterra integral equation of the first kind
$$\int_{-\infty}^{z} f(z^*)\,dz^* = 1-\frac{\alpha(z)}{2} = \frac{1+\gamma(z)}{2},$$
Table D7 collects the quantiles $c_{1-\alpha/2}$ for the coefficients of confidence and their complements listed in Table D6.

Table D7
Quantiles for the confidence interval of the sample mean, variance known
$\gamma$:                      0.95    0.99    0.999
$\alpha$:                      0.05    0.01    0.001
$1-\alpha/2 = (1+\gamma)/2$:   0.975   0.995   0.999,5
$c_{1-\alpha/2}$:              1.960   2.576   3.291
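The quantiles of Table D7 can be reproduced with any implementation of the inverse standard normal cdf. A minimal sketch, assuming Python 3.8 or later for the statistics.NormalDist class:

```python
from statistics import NormalDist

for gamma in (0.95, 0.99, 0.999):
    alpha = 1.0 - gamma
    c = NormalDist().inv_cdf(1.0 - alpha / 2.0)   # quantile c_{1-alpha/2}
    print(gamma, round(c, 3))
# 0.95 -> 1.96, 0.99 -> 2.576, 0.999 -> 3.291
```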
Given the quantiles $c_{1-\alpha/2}$, we are going to construct the confidence interval for the sample mean $\hat\mu$, the variance $\sigma^2$ known. For this purpose we solve the forward transformation $\hat\mu \mapsto z = \sqrt n(\hat\mu-\mu)/\sigma$ for $\mu$:
$$\hat\mu_1 := \hat\mu - \frac{\sigma}{\sqrt n}c_{1-\alpha/2} < \mu < \hat\mu + \frac{\sigma}{\sqrt n}c_{1-\alpha/2} =: \hat\mu_2.$$
The interval $\hat\mu_1 < \mu < \hat\mu_2$ for the fixed value $z = c_{1-\alpha/2}$ contains the "true" mean $\mu$ with probability $\gamma = 1-\alpha$:
$$P\Big\{\hat\mu - \frac{\sigma}{\sqrt n}c_{1-\alpha/2} < \mu < \hat\mu + \frac{\sigma}{\sqrt n}c_{1-\alpha/2}\Big\} = \int_{-c_{1-\alpha/2}}^{+c_{1-\alpha/2}} f(z)\,dz = \int_{\hat\mu_1}^{\hat\mu_2} f(\hat\mu^*\,|\,\mu,\sigma^2/n)\,d\hat\mu^* = \gamma = 1-\alpha$$
since
$$\int_{-\infty}^{\hat\mu_2} f(\hat\mu^*)\,d\hat\mu^* = \int_{-\infty}^{\hat\mu + c_{1-\alpha/2}\sigma/\sqrt n} f(\hat\mu^*)\,d\hat\mu^* = \int_{-\infty}^{c_{1-\alpha/2}} f(z)\,dz = 1-\frac{\alpha}{2}.$$
An illustration of the coefficient of confidence and the probability functions of a confidence interval is offered by Figures D6 and D7.
Figure D6: Two-sided confidence interval $\mu \in\, ]\hat\mu_1, \hat\mu_2[$ for the pdf $f(\hat\mu\,|\,\mu,\sigma^2/n)$; the probability mass $\gamma = 1-\alpha$ lies between $\hat\mu_1$ and $\hat\mu_2$, the mass $\alpha/2$ in each tail.

Figure D7: Two-sided confidence interval in terms of the quantile $c_{1-\alpha/2}$: $\hat\mu_1 = \hat\mu - c_{1-\alpha/2}\,\sigma/\sqrt n$, $\hat\mu_2 = \hat\mu + c_{1-\alpha/2}\,\sigma/\sqrt n$.

Let us specify all the integrals for our example:
$$\int_{-\infty}^{c_{1-\alpha/2}} f(z)\,dz = 1-\frac{\alpha}{2}:$$
$$\int_{-\infty}^{1.960} f(z)\,dz = 0.975,\qquad \int_{-\infty}^{2.576} f(z)\,dz = 0.995,\qquad \int_{-\infty}^{3.291} f(z)\,dz = 0.999{,}5.$$
Those data lead to a triplet of confidence intervals.
case (i) $\gamma = 0.95$, $\alpha = 0.05$, $c_{1-\alpha/2} = 1.960$:
$$P\Big\{2.7 - 1.96\cdot\frac32 < \mu < 2.7 + 1.96\cdot\frac32\Big\} = 0.95,\qquad P\{-0.24 < \mu < +5.64\} = 0.95.$$
case (ii) $\gamma = 0.99$, $\alpha = 0.01$, $c_{1-\alpha/2} = 2.576$:
$$P\Big\{2.7 - 2.576\cdot\frac32 < \mu < 2.7 + 2.576\cdot\frac32\Big\} = 0.99,\qquad P\{-1.164 < \mu < +6.564\} = 0.99.$$
case (iii) $\gamma = 0.999$, $\alpha = 0.001$, $c_{1-\alpha/2} = 3.291$:
$$P\Big\{2.7 - 3.291\cdot\frac32 < \mu < 2.7 + 3.291\cdot\frac32\Big\} = 0.999,\qquad P\{-2.236 < \mu < +7.636\} = 0.999.$$
With probability 95% the "true" mean $\mu$ is an element of the interval $]-0.24, +5.64[$. In contrast, with probability 99% the "true" mean $\mu$ is an element of the larger interval $]-1.164, +6.564[$. Finally, with probability 99.9% the "true" mean $\mu$ is an element of the largest interval $]-2.236, +7.636[$.
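The three intervals of Example D14 follow directly from Lemma D12. A short sketch (Python standard library; illustrative only, not part of the original text) that reproduces them from the data:

```python
from statistics import NormalDist, mean

y = [1.2, 3.4, 0.6, 5.6]
sigma = 3.0
n = len(y)
mu_hat = mean(y)                                    # 2.7, BLUUE of mu

for gamma in (0.95, 0.99, 0.999):
    c = NormalDist().inv_cdf((1.0 + gamma) / 2.0)   # quantile c_{1-alpha/2}
    half = c * sigma / n ** 0.5                     # c * sigma / sqrt(n)
    print(gamma, (round(mu_hat - half, 3), round(mu_hat + half, 3)))
# 0.95 -> (-0.24, 5.64), 0.99 -> (-1.164, 6.564), 0.999 -> (-2.236, 7.636)
```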
D5  Sampling from the Gauss-Laplace normal distribution: a third confidence interval for the mean, variance unknown

In order to derive the sampling distributions for the sample mean, variance unknown, of Gauss-Laplace i.i.d. observations, D51 introduces two examples (two and three observations, respectively) for generating Student's t-distribution. Lemma D13 reviews Student's t-distribution of the random variable $\sqrt n(\hat\mu-\mu)/\hat\sigma$, where the sample mean $\hat\mu$ is BLUUE of $\mu$, whereas the sample variance $\hat\sigma^2$ is BIQUUE of $\sigma^2$. D52, by means of Lemma D13, introduces the confidence interval for the "true" mean $\mu$, variance $\sigma^2$ unknown, which is based on Student's probability distribution. For easy computation, Table D12 is its flow chart. D53 discusses The Uncertainty Principle generated by The Magic Triangle of (i) the length of the confidence interval, (ii) the coefficient of negative confidence, also called the uncertainty number, and (iii) the number of observations. Various figures and examples pave the way for the routine analyst's use of the confidence interval for the mean, variance unknown.

D51  Student's sampling distribution of the random variable $(\hat\mu-\mu)/\hat\sigma$

Two examples for $n = 2$ and $n = 3$ Gauss-Laplace i.i.d. observations prepare us to derive Student's t-distribution for the random variable $\sqrt n(\hat\mu-\mu)/\hat\sigma$, where $\hat\mu$ is BLUUE of $\mu$, whereas $\hat\sigma^2$ is BIQUUE of $\sigma^2$. Lemma D12 and its proof are the highlight of this paragraph in generating the sampling probability distribution of Student's t.

Example D15 (Student's t-distribution for two Gauss-Laplace i.i.d. observations): First, assume an experiment of two Gauss-Laplace i.i.d. observations called $y_1$ and $y_2$. We want to prove that $(y_1+y_2)/2$ and $(y_1-y_2)^2/2$, i.e. the sample mean $\hat\mu$ and the sample variance $\hat\sigma^2$, are stochastically independent. $y_1$ and $y_2$ are elements of the joint pdf
$$f(y_1,y_2) = f(y_1)f(y_2),$$
$$f(y_1,y_2) = \frac{1}{2\pi\sigma^2}\exp\Big(-\frac{1}{2\sigma^2}[(y_1-\mu)^2 + (y_2-\mu)^2]\Big),$$
$$f(y_1,y_2) = \frac{1}{2\pi\sigma^2}\exp\Big(-\frac{1}{2\sigma^2}(y-1\mu)'(y-1\mu)\Big).$$
The quadratic form $(y-1\mu)'(y-1\mu)$ is decomposed into the sample variance $\hat\sigma^2$, BIQUUE of $\sigma^2$, and the deviate of the sample mean $\hat\mu$, BLUUE of $\mu$, from $\mu$ by means of the fundamental separation
$$y-1\mu = (y-1\hat\mu) + 1(\hat\mu-\mu),$$
$$(y-1\mu)'(y-1\mu) = (y-1\hat\mu)'(y-1\hat\mu) + 1'1(\hat\mu-\mu)^2,$$
$$(y-1\mu)'(y-1\mu) = \hat\sigma^2 + 2(\hat\mu-\mu)^2,$$
$$f(y_1,y_2) = \frac{1}{2\pi\sigma^2}\exp\Big(-\frac{\hat\sigma^2}{2\sigma^2}\Big)\exp\Big(-\frac{(\hat\mu-\mu)^2}{\sigma^2}\Big).$$
The joint pdf $f(y_1,y_2)$ is transformed into a special form if we replace
$$\hat\mu = \frac12(y_1+y_2),\qquad \hat\sigma^2 = (y_1-\hat\mu)^2 + (y_2-\hat\mu)^2 = \frac12(y_1-y_2)^2,$$
namely
$$f(y_1,y_2) = \frac{1}{2\pi\sigma^2}\exp\Big(-\frac{1}{4\sigma^2}(y_1-y_2)^2\Big)\exp\Big(-\frac{1}{\sigma^2}\Big(\frac{y_1+y_2}{2}-\mu\Big)^2\Big).$$
Obviously the product decomposition of the joint pdf documents that $(y_1+y_2)/2$ and $(y_1-y_2)^2/2$, i.e. $\hat\mu$ and $\hat\sigma^2$, are independent random variables. Second, we intend to derive the pdf of Student's random variable $t := \sqrt2(\hat\mu-\mu)/\hat\sigma$, the deviate of the sample mean $\hat\mu$ from the "true" mean $\mu$, normalized by the sample standard deviation $\hat\sigma$. Let us introduce the direct Helmert transformation
$$z_1 = \frac{y_1-y_2}{\sigma\sqrt2} = \frac{\hat\sigma}{\sigma},\qquad z_2 = \frac{y_1+y_2-2\mu}{\sigma\sqrt2} = \frac{\sqrt2(\hat\mu-\mu)}{\sigma},$$
or
$$\begin{bmatrix} z_1\\ z_2\end{bmatrix} = \frac{1}{\sigma}\begin{bmatrix} \tfrac{1}{\sqrt2} & -\tfrac{1}{\sqrt2}\\ \tfrac{1}{\sqrt2} & \tfrac{1}{\sqrt2}\end{bmatrix}\begin{bmatrix} y_1-\mu\\ y_2-\mu\end{bmatrix},$$
as well as the inverse
ª 1 « 2 « « 1 « ¬ 2
ª y1 P º «y P» =V ¬ 2 ¼
1 º 2 »» ª z1 º , 1 » «¬ z2 »¼ » 2¼
which brings the joint pdf dF = f(y1, y2)dy1dy2 = f(z1, z2)dz1dz2 into the canonical form. 1
§ 1 · 1 § 1 · exp ¨ z12 ¸ exp ¨ z22 ¸ dz1dz2 2S © 2 ¹ 2S © 2 ¹
dF =
1 1 dy1dy2 = V2
1
2 1
2 dz1dz2 = dz1dz2 . 1
2
2
The Helmert random variable x := z12 or z1 = x replaces the random variable z1 such that dz1dz2 = dF =
1
1 1 dxdz2 2 x
1
§ 1 · 1 § 1 · exp ¨ x ¸ exp ¨ z22 ¸ dxdz2 2 2S 2 x © ¹ 2S © 2 ¹
is the joint pdf of x and z2. Finally, we introduce Student’s random variable t := 2 z2 =
V 2 z2 ( Pˆ P ) Pˆ P = V 2
z1 = x = t=
Pˆ P Vˆ
z2 x
Vˆ V
Vˆ = V z1 = V x
z2 = x t z22 = x t 2 .
Let us transform dF =f (x, z2) dx dz2 to dF =f (t, x) dt dx, namely from the joint pdf of the Helmert random variable x and the Gauss-Laplace normal variate z2 to the joint pdf of the Student random variable t and the Helmert random variable x.
dz2 dx =
dz2 dx =
Dt z2 Dt x
Dx z2 dt dx Dx x 2 3/ 2 x t dt dx 2 1
x 0
dz2 dx = x dt dx dF = f (t, x)dt dx f (t , x) =
1 1 1 exp[ (1 + t 2 ) x ] . 2S 2 2
The marginal distribution of Student’s random variable t, namely dF3=f3(t)dt, is generated by f
f 3 (t ) :=
1 1 1 exp[ (1 + t 2 ) x]dx ³ 2S 2 0 2
subject to the standard integral f
1
³ exp(E x)dx = [ E exp ( E x )]
f 0
0
=
1 E
1 1 2 E := (1 + t 2 ), = 2 E 1+ t2 such that f 3 (t ) = dF3 =
1 1 , 2S 1 + t 2
1 1 dt 2S 1 + t 2
and characterized by a pdf f3(t) which is reciprocal to (1+t2). Example D16
(Student’s t-distribution for three Gauss-Laplace i.i.d. observations):
First, assume an experiment of three Gauss-Laplace i.i.d. observations called y1, y2 and y3: We want to derive the joint pdf f(y1, y2, y3) in terms of the sample mean Pˆ , BLUUE of P, and the sample variance Vˆ 2 , BIQUUE of V2. f(y1, y2, y3) = f(y1) f(y2) f(y3)
600
Appendix D: Sampling distributions and their use
f ( y1 , y2 , y3 ) =
1 § 1 · exp ¨ 2 [( y1 P ) 2 + ( y2 P ) 2 + ( y3 P ) 2 ] ¸ 3/ 2 3 (2S ) V 2 V © ¹
f ( y1 , y2 , y3 ) =
1 § 1 · exp ¨ 2 (y 1P )c(y 1P ) ¸ . 3/ 2 3 (2S ) V © 2V ¹
The quadratic form (y 1P )c(y 1P ) is decomposed into the sample variance Vˆ 2 and the deviate sample mean Pˆ from the “true” mean P by means of the fundamental separation y 1P = y 1Pˆ + 1( Pˆ P ) (y 1P )c(y 1P ) = (y 1Pˆ )c(y 1Pˆ ) + 1c1( Pˆ P ) 2 (y 1P )c(y 1P ) = 2Vˆ 2 + 3( Pˆ P ) 2 dF = f ( y1 , y2 , y3 )dy1dy2 dy3 = =
1 § 1 · § 3 · exp ¨ 2 2Vˆ 2 ¸ exp ¨ 2 ( Pˆ P ) 2 ¸ dy1dy2 dy3 . (2S )3/ 2 V 3 2 2 V V © ¹ © ¹
Second, we intend to derive the pdf of Student’s random variable t := 3( Pˆ P ) / Vˆ , the deviate of the sample mean Pˆ from the “true” mean P, normalized by the sample standard deviation Vˆ . Let us introduce the direct Helmert transformation z1 = z2 = z3 =
1
V 2
( y1 P + y2 P ) =
1
V 23 1
V 3
1
V 2
( y1 + y2 2 P )
( y1 P + y2 P 2 y3 + 2 P ) =
( y1 P + y2 P + y3 P ) =
1
V 3
1
V 23
( y1 + y2 2 y3 )
( y1 + y2 + y3 3P )
or ª « « ª z1 º «z » = 1 « « 2» V « « «¬ z3 »¼ « « ¬
1
1
1 2 1
1 2 1
23 1
23 1
3
3
º » » ª y1 P º 2 »« » » « y2 P » 23 » « y P »¼ 1 »¬ 3 » 3 ¼ 0
as well as its inverse
601
D5 A third confidence interval for the mean, variance unknown
ª « « ª y1 P º «y P» = V « « « 2 » « «¬ y3 P »¼ « « ¬
1
1
1 2 1
23 1
1 2
23 2
0
23
1 º » 3» z ª 1º 1 »« » » z2 3»« » «z » 1 »¬ 3¼ » 3¼
in general z = V 1 H (y P ) versus (y 1P ) = V H cz , which help us to bring the joint pdf dF(y1, y2, y3)dy1 dy2 dy3 = f(z1, z2, z3)dz1 dz2 dz3 into the canonical form. dF =
1 § 1 · § 1 · exp ¨ ( z12 + z22 ) ¸ exp ¨ z32 ¸ dz1dz2 dz3 2 (2S )3/ 2 © ¹ © 2 ¹
1 dy1dy2 dy3 = V3
1
1
1
1 2 1
23 1
3 1
1 2
23 2
3 1
0
23
dz1dz2 dz3 = dz1dz2 dz3 .
3
The Helmert random variable x := z12 + z22 or x = z12 + z22 replaces the random variable z12 + z22 as soon as we introduce polar coordinates z1 = r cos I1 , z2 = r sin I1 , z12 + z22 = r 2 =: x and compute the marginal pdf dF ( z3 , x) =
2S
1
1 1 § 1 · § 1 · exp ¨ z32 ¸ dz3 exp ¨ x ¸ dx ³ dI1 , 2S 2 2S © 2 ¹ © 2 ¹ 0 by means of
x := z12 + z22 = r 2 dx = 2rdr , dr = dz1dz2 = rdrdI1 = dF ( z3 , x) =
dx , 2r
1 dxdI1 , 2
1 § 1 · 1 § 1 · exp ¨ x ¸ exp ¨ z32 ¸ dxdz3 , 2 2 © ¹ 2S © 2 ¹
the joint pdf of x and z. Finally, we inject Student’s random variable
602
Appendix D: Sampling distributions and their use
t := 3
Pˆ P , Vˆ
decomposed into z3 =
3 V ( Pˆ P ) Pˆ P = z3 V 3
z12 + z22 = x = 2 t=
z3
Vˆ V V Vˆ = z12 + z22 = x V 2 2
2 z3 =
x
1
x t z32 =
2
1 2 xt . 2
Let us transform dF = f (x, z3) dx dz3 to dF = f (t, x) dt dx. Alternatively we may say that we transform the joint pdf of the Helmert random variable x and the Gauss-Laplace normal variate z3 to the joint pdf of the Student random variable t and the Helmert random variable x. Dt z3 Dt x
dz3 dx =
dz3 dx =
Dx z3 dtdx Dx x
x
1
2 0
2 2
dz3 dx =
x 3/ 2t
dtdx
1 x 2
dtdx
dF = f (t , x)dtdx f (t , x) =
1
1
2 2
2S
1 t2 x exp[ (1 + ) x]. 2 2
The marginal distribution of Student’s random variable t, namely dF3 = f3(t)dt, is generated by f 3 (t ) :=
1
1
2 2
2S
f
³ 0
1 t2 x exp[ (1 + ) x]dx 2 2
subject to the standard integral f
D ³ x exp( E x)dx = 0
*(D + 1) 1 1 t2 , D = , E = (1 + ) 2 2 2 E D +1
D5 A third confidence interval for the mean, variance unknown
603
such that f
³ 0
t2 1 x exp[ (1 + ) x]dx = 23/ 2 2 2
3 *( ) 2 t 2 3/ 2 (1 + ) 2
3 1 *( ) = S 2 2 3 1 f 3 (t ) = *( ) 2 2S
dF3 =
1 , t2 (1 + )3/ 2 2
2 dt . t 2 3/ 2 4(1 + ) 2
Again Student’s t-distribution is reciprocal t(1+t2/2)3/2. Lemma D12 (Student’s t-distribution for the derivate of the mean ( Pˆ P ) / Vˆ ,W.S. Gosset 1908): Let the random vector of observations y = [ y1 ," , yn ]c be Gauss-Laplace i.i.d. Student’s random variable t :=
Pˆ P n, Vˆ
where the sample mean Pˆ is BLUUE of P and the sample variance Vˆ 2 is BIQUUE of V 2 , associated to the pdf p +1 ) 2 f (t ) = p *( ) 2 *(
1 pS
1 . t 2 ( p +1) / 2 (1 + ) p
p = n 1 is the “degree of freedom” of Student’s distribution f p (t ) . W.S. Gosset published the t-distribution of the ratio n ( Pˆ P ) / Vˆ under the pseudonym “Student”: The probable error of a mean, Biometrika 6 (1908-09) 125. :Proof: The joint probability distribution of the random variable zn := n ( Pˆ P ) / V and the Helmert random variable x := z12 + " + zn21 = ( n 1)Vˆ 2 / V 2 represented by dF = f1 ( zn ) f 2 ( x )dzn dx
604
Appendix D: Sampling distributions and their use
due to the effect that zn and z12 + " + zn21 or Pˆ P and ( n 1)Vˆ 2 are stochastically independent. Let us take reference to the specific pdfs with n and n 1 = p degrees of freedom dF1 = f1 ( zn )dzn 1
§ 1 · exp ¨ zn2 ¸ 2S © 2 ¹
f1 ( zn ) =
dF2 = f 2 ( x)dx ,
and
and
f 2 ( x) =
1 1 p / 2 ( p 2) / 2 § 1 · ( ) x exp ¨ x ¸ , p 2 © 2 ¹ *( ) 2
or dF1 = f1 ( Pˆ )d Pˆ
f1 (Pˆ ) =
dF2 = f 2 (Vˆ 2 )dVˆ 2 ,
and
f 2 (Vˆ 2 ) =
2
§ n (Pˆ P ) · exp ¨ ¸ and 2 V 2S © 2 V ¹ n
1 p p / 2Vˆ p2 V 2 *( p /2) p
p/2
§ 1 p 2· exp ¨ Vˆ ¸ 2 © 2V ¹
we derived earlier. Here, let us introduce Student’s random variable t :=
zn Pˆ P n 1 = n Vˆ x
zn =
or
x t= n 1
x t. p
By means of the Jacobi matrix J and its absolute Jacobi determinant | J | ª dzn º ª dt º ª Dz t Dx t º ª dzn º « dx » = « D x D x » « dx » = J « dx » ¬ ¼ «¬ z ¬ ¼ x » ¼¬ ¼ n
n
ª p « J=« x «¬ 0
º 1 x 3/ 2 zn p » 2 » , | J |= »¼ 1
p x
, | J|-1 =
x p
we transform the surface element dzn dx to the surface element | J |1 dtdx, namely dzn dx =
x p
dtdx.
Lemma D13 (Gamma Function): Our final action item is to calculate the marginal probability distributions dF3 = f 3 (t )dt of Student’s random variable t, namely
605
D5 A third confidence interval for the mean, variance unknown
dF3 = f 3 (t )dt f 3 (t ) :=
f § 1 1 1 t2 · ( ) p / 2 ³ x ( p 1) / 2 exp ¨ (1 + ) x ¸ dx. p ¹ 2 pS *( p / 2) 2 © 2 0
1
Consult W. Gröbner and N. Hofreiter (1973 p.55) for the standard integral f
³x
D
exp( E x )dx =
0
D ! *(D + 1) = E D +1 E D +1
and f
1 t2 ( p 1) / 2 x exp[ (1 + ) x ]dx = 2( p +1) / 2 ³0 2 p
*(
p +1 ) 2
t2 (1 + )( p +1) / 2 p
,
where p = n 1 is the rank of the quadratic Helmert matrix Hn. Notice a result p +1 p 1 )=( )! . In summary, substituting the stanof the gamma function *( 2 2 dard integral dF3 = f 3 (t )dt p +1 ) 2 f 3 (t ) = p *( ) 2 *(
1
1 2 t pS (1 + )( p +1) / 2 p
resulted in Student’s t distribution, namely the pdf of Student’s random variable ( Pˆ P ) / Vˆ . h D52
The confidence interval for the mean, variance unknown
Lemma D12 is the basis for the construction of the confidence interval of the “true” mean, variance unknown, which we summarize in Lemma D13. Example D17 contains all details for computing such a confidence interval, namely Table D8, a collection of the most popular values of the coefficient of confidence, as well as Table D9, D10 and D11, listing the quantiles for the confidence interval of the Student random variable with p = n 1 degrees of freedom. Figure D8 and Figure D9 illustrate the probability of two-sided confidence interval for the mean, variance unknown, and the limits of the confidence interval. Table D12 as a flow chart paves the way for the “fast computation” of the confidence interval for the “true” mean, variance unknown.
606
Appendix D: Sampling distributions and their use
Lemma D14 (confidence interval for the sample mean, variance unknown): The random variable t = n ( Pˆ P ) / Vˆ characterized by the ratio of the deviate of the sample mean Pˆ = n 1 1cy , BLUUE of P , from the “true” mean P and the standard deviation Vˆ , Vˆ 2 = (y 1Pˆ )c(y 1Pˆ ) /( n 1) BIQUUE of the “true” variance V 2 , has the Student t-distribution with p = n 1 “degrees of freedom”. The “true” mean P is an element of the two-sided confidence interval
P ]Pˆ
Vˆ Vˆ c1D / 2 , Pˆ + c1D / 2 [ n n
with confidence P{Pˆ
Vˆ Vˆ c1D / 2 < P < Pˆ + c1D / 2 } = 1 D n n
of level 1 D . For three values of the coefficient of confidence J = 1 D , Table D9 is a list of associated quantiles c1D / 2 . Example D17 (confidence interval for the example mean Pˆ , V 2 unknown): Suppose that a random sample ª y1 º ª1.2 º «y » « » « 2 » = «3.4 » , Pˆ = 2.7, Vˆ 2 = 5.2, Vˆ = 2.3 « y3 » «0.6 » « » « » «¬ y4 »¼ ¬5.6 ¼ of four observations is characterized by the sample mean Pˆ = 2.7 and the sample variance Vˆ 2 = 5.2 . 2(2.7 P ) / 2.3 = n ( Pˆ P ) / Vˆ = t has Student’s pdf with p = n 1 = 3 degrees of freedom. The probability J = 1 D that t will be between any two arbitrarily chosen numbers c1 = c and c2 = +c is c2
P{c1 < t < c2 } =
³ f (t )dt = J = 1 D
c1
P{c < t < c} =
+c
³ f (t )dt = J = 1 D
c
c
c
P{c < t < c} =
³
f (t )dt
f c
³
f
f (t )dt = 1
³
f (t )dt = J = 1 D
f
D , 2
c
³
f
f (t )dt =
D 2
607
D5 A third confidence interval for the mean, variance unknown
J is the coefficient of confidence, D the coefficient of negative confidence, also called complementary coefficient of confidence. The four representations of the probability J = 1 D to include t in the confidence interval c < t < +c have led to the linear Volterra integral equation of the first kind c
³
f (t )dt = 1
f
D 1 = (1 + J ) . 2 2
Three values of the coefficient of confidence J or its complement D are popular and listed in Table D7. Table D8 Values of the coefficient of confidence
1
J
0.950
0.990
0.999
D
0.050
0.010
0.001
D 2
0.025
0005
0.000,5
0.975
0.995
0.999,5
D 1+ J = 2 2
In solving the linear Volterra integral equation of the first kind t
³
f
1 1 f (t )dt = 1 D (t ) = [1 + J (t )] , 2 2
which depends on the degrees of freedom p = n 1 , Table D9-D11 collect the quantiles c1D / 2 / n given the coefficients of confidence or their complements which we listed in Table D8. Table D9 Quantiles c1D / 2 / n for the confidence interval of the Student random variable with p = n 1 degrees of freedom 1 D / 2 = (1 + J ) / 2 = 0.975, J = 0.95, D = 0.05 p
n
1 2 3 4
2 3 4 5
1
c1D / 2
n 8.99 2.48 1.59 1.24
p
n
14 19 24 29
15 20 25 30
1
c1D / 2 n 0.554 0.468 0.413 0.373
608
Appendix D: Sampling distributions and their use
5 6 7 8 9
6 7 8 9 10
1.05 0.925 0.836 0.769 0.715
39 49 99 199 499
40 50 100 200 500
0.320 0.284 0.198 0.139 0.088
Table D10 Quantiles c1D / 2 / n for the confidence interval of the Student random variable with p = n 1 degrees of freedom 1 D / 2 = (1 + J ) / 2 = 0.995, J = 0.990,D = 0.010 n p n 1 1 c1D / 2 c1D / 2 n n
p
1 2 3 4 5 6 7 8 9
2 3 4 5 6 7 8 9 10
45.01 5.73 2.92 2.06 1.65 1.40 1.24 1.12 1.03
14 19 24 29 39 49 99 199 499
15 20 25 30 40 50 100 200 500
0.769 0.640 0.559 0.503 0.428 0.379 0.263 0.184 0.116
Table D11 Quantiles c1D / 2 / n for the confidence interval of the Student random variable with p = n 1 degrees of freedom 1 D / 2 = (1 + J ) / 2 = 0.999.5, J = 0.999, D = 0.001 p n n c1D / 2 c1D / 2 / n p c1D / 2 c1D / 2 / n 1 2 3 4 5 6 7 8 9
2 3 4 5 6 7 8 9 10
636.619 31.598 12.941 8.610 6.859 5.959 5.405 5.041 4.781
450.158 18.243 6.470 3.851 2.800 2.252 1.911 1.680 1.512
14 19 24 29 30 40 60 120 f
15 20 25 30 31 41 61 121 f
4.140 3.883 3.725 3.659 3.646 3.551 3.460 3.373 3.291
1.069 0.868 0.745 0.668 0.655 0.555 0.443 0.307 0
D5 A third confidence interval for the mean, variance unknown
609
Since Student’s pdf depends on p = n 1 , we have tabulated c1D / 2 / n for the confidence interval we are going to construct. The Student’s random variable t = n ( Pˆ P ) / Vˆ is solved for P , evidently our motivation to introduce the confidence interval Pˆ1 < P < Pˆ 2
Pˆ P Vˆ Vˆ Pˆ P = t P = Pˆ t Vˆ n n Vˆ Vˆ Pˆ1 := Pˆ c1D / 2 < P < Pˆ + c1D / 2 =: Pˆ 2 . n n
t= n
The interval Pˆ1 < P < Pˆ 2 for the fixed value t = c1D / 2 contains the “true” mean P with probability J = 1 D . P{Pˆ -
Vˆ Vˆ c1D / 2 < P < Pˆ + c1D / 2 } = n n =
+c
³ f (t )dt = J = 1 D
c
because of c1D / 2
c
³
f (t )dt =
³
f
f (t )dt = 1
f
D . 2
Figure D8 and Figure D9 illustrate the coefficient of confidence and the probability function of a confidence interval.
f (t )
J =1-D D/2
D/2
c1D / 2
+ c1D / 2
t
Figure D8: Two-sided confidence interval P ]Pˆ1 , Pˆ 2 [ , Student’s pdf for p = 3 degrees of freedom (n = 4) Pˆ1 = Pˆ Vˆ c1D / 2 / n , Pˆ 2 = Pˆ + Vˆ c1D / 2 / n
Pˆ + Vˆ c1D / 2 / n
Pˆ Vˆ c1D / 2 / n Pˆ1
P
Pˆ 2
Figure D9: Two-sided confidence interval for the “true” mean P , quantile c1D / 2
610
Appendix D: Sampling distributions and their use
Let us specify all the integrals to our example. c1D / 2
³
f (t )dt = 1 D / 2
f
c0.975
³
f
c0.999 ,5
c0.995
f (t )dt = 0.975,
³
f (t )dt =0.995,
f
³
f (t )dt =0.999,5.
f
These data substituted into Tables D9-D11 lead to the triplet of confidence intervals for the “true” mean. case (i ) : J = 0.95, D = 0.05, 1 D / 2 = 0.975 p = 3, n = 4, c1D / 2 / n = 1.59 P{2.7 2.3 1.59 < P < 2.7 + 2.3 1.59} = 0.95 P{0.957 < P < 6.357} = 0.95 . case (ii ) : J = 0.99, D = 0.01, 1 D / 2 = 0.995 p = 3, n = 4, c1D / 2 / n = 2.92 P{2.7 2.3 2.92 < P < 2.7 + 2.3 2.92} = 0.99 P{4.016 < P < 9.416} = 0.99 . case (iii ) : J = 0.999, D = 0.001, 1 D / 2 = 0.999,5 p = 3, n = 4, c1D / 2 / n = 6.470 P{2.7 2.3 6.470 < P < 2.7 + 2.3 6.470} = 0.999 P{12.181 < P < 17.581} = 0.999 . The results may be summarized as follows: With probability 95% the “true” mean P is an element of the interval ]-0.957,+6.357[. In contrast, with probability 99% the “true” mean P is an element of the larger interval ]-4.016,+9.416[. Finally, with probability 99.9% the “true” mean P is an element of the largest interval ]-12.181,+17.581[. If we compare the confidence intervals for the mean P , V 2 known, versus V 2 unknown, we realize much larger intervals for the second model. Such a result is not too much a surprise, since the model “ V 2 unknown” is much weaker than the model “ V 2 known”.
D5 A third confidence interval for the mean, variance unknown
611
Flow chart: Confidence interval for the mean P , V 2 unknown: Choose a coefficient of confidence J according to Table D8.
Table D12:
Solve the linear Volterra integral equation of the first kind F (c1D / 2 ) = 1 D / 2 = (1 + J ) / 2 by Table 9, Table 10 or Table 11 for a Student pdf with p = n 1 degrees of freedom: read c1D / 2 / n . Compute the sample mean Pˆ and the sample standard deviation Vˆ . Compute Vˆ c1D / 2 / n and Pˆ Vˆ c1D / 2 / n , Pˆ + Vˆ c1D / 2 / n . D53
The Uncertainty Principle
Figure D10: Length of the confidence interval for the mean against the number of observations Figure D10 is the graph of the function 'P (n;D ) := 2Vˆ c1D / 2 / n , where 'P is the length of the confidence interval of the “true” mean P , V 2 unknown. The independent variable of the function is the number of observations n. The function 'P (n;D ) is plotted for fixed values of the coefficient of complementary confidence D , namely D = 10%, 5%, 1%. For reasons given later the coefficient of complementary confidence D is called uncertainty number. The graph function 'P (n;D) illustrates two important facts. Fact #1: For a constant uncertainty number D , the length of the confidence interval 'P is smaller the larger the number of observations n is chosen.
612
Appendix D: Sampling distributions and their use
Fact #2:
For a constant number of observations n, the smaller the number of uncertainly D is chosen, the larger is the confidence interval 'P . Evidently, the diversive influences of (i) the length of the confidence interval 'P , (ii) the uncertainly number D and (iii) the number of observations n, which we collected in The Magic Triangle of Figure D11, constitute The Uncertainly Principle, in formula 'P (n 1) t: k PD , where kPD is called the quantum number for the mean P which depends on the uncertainty number D . Table D13 is a list of those quantum numbers. Let us interpret the uncertainty relation 'P (n 1) t kPD . The product 'P (n 1) defines geometrically a hyperbola which we approximated out of the graph of Figure D10. Given the uncertainty number D , the product 'P (n 1) has a smallest number, here denoted by k PD . For instance, choose D = 1% such that k PD /(n 1) d 'P or 16.4 /(n 1) d 'P . For n taken values 2, 11, 101, we get the inequalities 8.2 d 'P , 1.64 d 'P , 0.164 d 'P . length of the confidence interval
uncertainty number D Figure D11:
number of observations n
The Magic Triangle, constituents: (i) uncertainty number D , (ii) number of observations n, (iii) length of the confidence interval 'P . Table D13
Coefficient of complementary confidence D , uncertainty number D , versus quantum number of the mean k PD (E. Grafarend 1970) D kPD 10%
6.6
5%
9.6
1%
16.4
o0
of
D6 A fourth confidence interval for the variance
D6
613
Sampling from the Gauss-Laplace normal distribution: a fourth confidence interval for the variance
Theorem D10 already supplied us with the sampling distribution of the sample variance, namely with the probability density function f ( x) of Helmert’s random variable x = (n 1)Vˆ 2 / V 2 of Gauss-Laplace i.i.d. observations n of sample variance Vˆ 2 , BIQUUE of V 2 . D61 introduces accordingly the so far missing confidence interval for the “true” variance V 2 . Lemma D15 contains the details, followed by Example D16. Table D15, Table D16 and Table D17 contain the properly chosen coefficients of complementary confidence and their quantiles c1 ( p;D / 2), c2 ( p;1 D / 2) , dependent on the “degrees of freedom” p = n 1 . Table D18 as a flow chart summarizes various steps in computing a confidence interval for the variance V 2 . D 62 reviews The Uncertainty Principle which is built on (i) the coefficient of complementary confidence D , also called uncertainty number, (ii) the number of observations n and (iii) the length 'V 2 (n;D ) of the confidence interval for the “true” variance V 2 . D61
The confidence interval for the variance
Lemma D15 summaries the construction of a two-sided confidence interval for the “true” variance based upon Helmert’s Chi Square distribution of the random variable (n-1)Vˆ 2 / V 2 where Vˆ 2 is BIQUUE of V 2 . Example D18 introduces a random sample of size n = 100 with an empirical variance Vˆ 2 = 20.6 . Based upon coefficients of confidence and complementary confidence given in Table D14, related confidence intervals are computed. The associated quantiles for Helmert’s Chi Square distribution are tabulated in Table D15 (J = 0.95) , Table D16 (J = 0.99) and Table D17 (J = 0.998). Finally, Table D18 as a flow chart pares the way for the “fast computation” of the confidence interval for the “true” variance V 2 . Lemma D15 (confidence interval for the variance): The random variable x =(n-1)Vˆ 2 / V 2 , also called Helmert’s F 2 , characterized by the ration of the sample variance Vˆ 2 BIQUUE of V 2 , and the “true” variance V 2 , has the FM2 pdf of p = n 1 degrees of freedom, if the random observations yi , i {1," n} are Gauss-Laplace i.i.d. The “true” variance V 2 is an element of the two-sided confidence interval (n 1)Vˆ 2 (n 1)Vˆ 2 V2 ] , [ c2 ( p;1 D / 2) c1 ( p; D / 2) with confidence (n 1)Vˆ 2 (n 1)Vˆ 2 P{
614
Appendix D: Sampling distributions and their use
of level 1 D . Tables D5, D16 and D14 list the quantiles c1 ( p;1 D / 2) and c2 ( p;D / 2) associated to three values of Table D17 of the coefficient of complementary confidence 1 D . In order to make yourself more familiar with Helmert’s Chi Square distribution we recommend to solve the problems of Exercise D1. Exercise D1 (Helmert’s Chi Square FM2 distribution): Helmert’s random variable x := (n-1)Vˆ 2 / V 2 = pVˆ 2 / V 2 has the non-symmetric FM2 pdf. Prove that the first four central moments are, (i ) S 1 = 0, E{x} = P = p (ii ) S 2 = V x2 = 2 p (iii ) S 3 = (2 p )3 / 2 (8 / p)1/ 4 (coefficient of skewness S 32 / S 22 = 8 / p ) (iv) S 4 = 6V 4 (1 + 2 p) (coefficient of curtosis S 4 / S 22 3 = 3 + 12 / p ). Guide p º E{Vˆ 2 } » E{x} = p (i) V2 » E{Vˆ 2 } = V 2 ("unbiasedness") »¼ E{x} =
2 2 º V4 = V4» n 1 p » D{x} = 2 p. (ii) 2 » p D{x} = 4 D{Vˆ 2 } » V ¼ D{Vˆ 2 } =
Example D18 (confidence interval for the sample variance Vˆ 2 ): Suppose that a random sample ( y1, " yn ) Y of size n = 100 has led to an empirical variance Vˆ 2 = 20.6 . x =(n-1)Vˆ 2 / V 2 or 99 20.6 / V 2 = 2039.4 / V 2 has Helmert’s pdf with p = n 1 = 99 degrees of freedom. The probability J = 1 D that x will be between c1 ( p;D / 2) and c2 ( p;1 D / 2) is c2
P{c1 ( p;D / 2) < x < c 2 ( p;1 D / 2)} =
³ f ( x)dx = J = 1 D
c1
615
D6 A fourth confidence interval for the variance c2
P{c1 ( p;D / 2) < x < c 2 ( p;1 D / 2)} =
c1
³ f ( x)dx ³ f ( x)dx = J = 1 D 0
c2
0
c1
³ f ( x)dx = F(c ) = 1 D / 2 = (1 + J ) / 2, ³ f ( x)dx = F(c ) = D / 2 = (1 J ) / 2 2
1
0
0
P{c1 ( p; D / 2) < x < c2 ( p;1 D / 2)} = F(c2 ) F(c1 ) = (1 + J ) / 2 (1 J ) / 2 = J c1 ( p; D / 2) < x < c2 ( p;1 D / 2) c1 ( p; D / 2) < (n 1)
Vˆ 2 < c2 ( p;1 D / 2) V2
or
V2 1 1 (n 1)Vˆ 2 (n 1)Vˆ 2 2 V < < < < c2 (n 1)Vˆ 2 c1 c2 c1 P{
(n 1)Vˆ 2 (n 1)Vˆ 2
Since Helmert’s pdf f ( x; p ) is now-symmetric there arises the question how to distribute the confidence J or the complementary confidence D = J 1 on the confidence interval limits c1 and c2 , respectively. If we setup F(c1 ) = D / 2 , we define a cumulative probability half of the complementary confidence. If F(c2 ) F(c1 ) = P{c1 ( p;D / 2) < x < c2 ( p;1 D / 2)} = 1 D = J is the cumulative probability contained in the interval c1 < x < c2 , we derive F(c2 ) = 1 D / 2 . Accordingly c1 ( p;D / 2) < x < c2 ( p;1 D / 2) is the confidence interval based upon the quantile c1 with cumulative probability D / 2 and the quantile c2 with cumulative probability 1 D / 2 . The four representations of the cumulative probability of the confidence interval c1 < x < c2 establish two linear Volterra integral equations of the first kind c1
³ 0
c2
f ( x)dx = D / 2
and
³ f ( x)dx = 1 D / 2 , 0
dependent on the degree of freedom p = n 1 of Helmert’s pdf f ( x, p ) . As soon as we have established the confidence interval c1 ( p;D / 2) < x < c2 ( p;1 D / 2) for Helmert’s random variable x = (n -1)Vˆ 2 / V 2 = pVˆ 2 / V 2 , we are left with the problem of how to generate a confidence interval for the “true” variance V 2 , the sample variance Vˆ 2 given. If we take the reciprocal interval c21 < x 1 < c11 for Helmert’s inverse random variable 1/ x = V 2 /[(n 1)Vˆ 2 ] , we are able to multiply both sides by (n 1)Vˆ 2 . In summary, a confidence interval which corresponds to c1 < x < c2 is (n 1)Vˆ 2 / c2 < V 2 < (n 1)Vˆ 2 / c1 . Three values of the coefficient of confidence J or of complementary confidence D = 1 J which are most popular we list in Table 14.
616
Appendix D: Sampling distributions and their use
0.5
f(x)
0
c1 ( p;D / 2)
c2 ( p;1 D / 2)
x := (n-1)Vˆ 2 / V 2 = pVˆ 2 / V Table 14: y = f ( x ), x := ( n 1)Vˆ 2 / V 2 = pVˆ 2 / V 2 Figure D12: Two-sided confidence interval V 2 ] pV 2 / c2 , pV 2 / c1 [ Helmert’s pdf, f(x) F(c2 ) = 1 D / 2, F(c1 ) = D / 2 , F(c2 ) F(c1 ) = 1 D = J (n 1)Vˆ 2 c2 ( p;1 D / 2) Figure D13:
V2
(n 1)Vˆ 2 c1 ( p;D / 2)
Two-sided confidence interval for the “true” variance V 2 , quantiles c1 ( p;D / 2) and c2 ( p;1 D / 2) Table D14 Values of the coefficient of confidence 0.950 0.990 0.998
J D
0.050
0.010
0.002
D /2
0.025
0.005
0.001
1- D / 2
0.975
0.995
0.999
In solving the linear Volterra integral equations of the first kind c1
³ 0
c2
f ( x)dx = F(c1 ) = D / 2
and
³ f ( x)dx = F(c ) = 1 D / 2 , 2
0
which depend on the degrees of freedom p = n 1 , Tables D15-D17 collect the quantiles c1 ( p;D / 2) and c2 ( p;1 D / 2) for given values p and D .
617
D6 A fourth confidence interval for the variance
Table D15 Quantiles c1 ( p;D / 2) , c2 ( p;1 D / 2) for the confidence interval of the Helmert random variable with p = n 1 degrees of freedom D / 2 = 0.025,1 D / 2 = 0.975,D = 0.05, J = 0.95 p
n
c1 ( p;D / 2)
c2 ( p;1 D / 2)
p
n
c1 ( p;D / 2)
c2 ( p;1 D / 2)
1 2 3 4 5 6 7 8 9
2 3 4 5 6 7 8 9 10
0.000 0.506 0.216 0.484 0.831 1.24 1.69 2.18 2.70
5.02 7.38 9.35 11.1 12.8 14.4 16.0 17.5 19.0
14 19 24 29 39 49 99
15 20 25 30 40 50 100
5.63 8.91 12.4 16.0 23.7 31.6 73.4
26.1 32.9 39.4 45.7 58.1 70.2 128
Table D16 Quantiles c1 ( p;D / 2) , c2 ( p;1 D / 2) for the confidence interval of the Helmert random variable with p = n 1 degrees of freedom D / 2 = 0.005, 1 D / 2 = 0.995, D = 0.01, J = 0.99 p
n
c1 ( p;D / 2)
c2 ( p;1 D / 2)
p
n
c1 ( p;D / 2)
c2 ( p;1 D / 2)
1 2 3 4
2 3 4 5
0.000 0.010 0.072 0.207
7.88
8 9
9 10
1.34 1.73
22.0 23.6
5 6 7
6 7 8
0.412 0.676 0.989
14 19 24 29
15 20 25 30
4.07 6.84 9.89 13.1
31.3 38.6 45.6 52.3
39 49 99
40 50 100
20.0 27.2 66.5
65.5 78.2 139
Table D17 Quantiles c1 ( p;D / 2) , c2 ( p;1 D / 2) for the confidence interval of the Helmert random variable for p = n 1 degrees of freedom D / 2 = 0.001, 1 D / 2 = 0.999, D = 0.002, J = 0.998 p
n
c1 ( p;D / 2)
c2 ( p;1 D / 2)
p
n
c1 ( p;D / 2)
c2 ( p;1 D / 2)
1
2
0.00
10.83
9
10
1.15
27.88
618 2 3 4 5 6 7 8
Appendix D: Sampling distributions and their use
3 4 5 6 7 8 9
0.00 0.02 0.09 0.21 0.38 0.60 0.86
13.82 16.27 18.47 20.52 22.46 24.32 26.13
14 19 24 29 50 70 99 100
15 20 25 30 51 71 100 101
3.04 5.41 8.1 11.0 24.7 39.0 60.3 61.9
36.12 43.82 51.2 58.3 86.7 112.3 147.3 149.4
Those data collected in Table D15, Table D16 and Table D17 lead to the triplet of confidence intervals for the “true” variance case (i ) : J = 0.95, D = 0.05, D / 2 = 0.025, 1 D / 2 = 0.975 p = 99, n = 100, c1 ( p;D / 2) = 73.4, c2 ( p;1 D / 2) = 128 P{
99 20.6 99 20.6
case (ii ) : J = 0.99, D = 0.01, D / 2 = 0.005, 1 D / 2 = 0.995 p = 99, n = 100, c1 ( p;D / 2) = 66.5, c2 ( p;1 D / 2) = 139 P{
99 20.6 99 20.6
case (iii ) : J = 0.998, D = 0.002,D / 2 = 0.001,1 D / 2 = 0.999 p = 99, n = 100, c1 ( p;D / 2) = 60.3, c2 ( p;1 D / 2) = 147.3 P{
99 20.6 99 20.6
The results can be summarized as follows. With probability 95%, the “true” variance V 2 is an element of the interval ]15.9,27.8[. In contrast, with probability 99%, the “true” variance is an element of the larger interval ]14.7,30.7[. Finally, with probability 99.8% the “true” variance is an element of the largest interval ]13.8, 33.8[. If we compare the confidence intervals for the variance V 2 , we realize much larger intervals for smaller complementary confidence namely 5%, 1% and 0.2%. Such a result is subject of The Uncertainty Principle.
D6 A fourth confidence interval for the variance
619
Flow chart:
Table D18:
Confidence interval for the variance V 2 Choose a coefficient of confidence J according to Table D14. Solve the linear Volterra integral equation of the first kind F(c1 ( p;D / 2)) = D / 2, F(c2 ( p;1 D / 2)) = 1 D / 2 by Table 15, Table 16 or Table 17 for a Helmert pdf with p = n 1 degrees of freedom. Compute the sample variance Vˆ 2 and the quantiles
V2 ] D62
(n 1)Vˆ 2 (n 1)Vˆ 2 , [. c2 ( p;1 D / 2) c1 ( p; D / 2)
The Uncertainty Principle
Figure D14:
Length of the confidence interval for the variance against the number of observations.
Figure D14 is the graph of the function 'V 2 (n;D ) := (n 1)Vˆ 2 (
1 1 ), c1 (n 1;D / 2) c2 (n 1;1 D / 2)
where 'V 2 is the length of the confidence interval of the “true” variance V 2 . The independent variable of the functions is the number of observations n. The function 'V 2 (n;D ) is plotted for fixed values of the coefficient of complementary confidence D , namely D = 5% , 1%, 0.2%. For reasons given later on the coefficient of complementary confidence D is called uncertainty number. The graph of the function 'V 2 (n;D ) illustrates two important facts.
620
Appendix D: Sampling distributions and their use
For a contrast uncertainty number D , the length of the confidence interval 'V 2 is smaller, the larger number of observations n is chosen. For a contrast number of observations n, the smaller the number of uncertainty D is chosen, the larger is the confidence interval 'V 2 .
Fact #1:
Fact #2:
Evidently, the divisive influences of (i) the length of the confidence interval 'V 2 , (ii) the uncertainty number D and (iii) the number of observations n, which we collect in The Magic Triangle of Figure D15, constitute The Uncertainty Principle, formulated by the inequality 'V 2 (n 1) t kV D , 2
where kV D is called quantum number for the variance V 2 . The quantum number depends on the uncertainty number D . Let us interpret the uncertainty relation 'V 2 (n 1) t kV D . The product 'V 2 (n 1) defines geometrically a hyperbola which we approximated to the graph of Figure D14. Given the uncertainty number D , the product 'V 2 (n 1) has a smallest number denoted by kV D . For instance, choose D = 1% such that kV D /(n 1) d 'V 2 or 42 /(n 1) d 'V 2 . For the number of observations n, for instance n = 2, 11, 101, we find the inequalities 42 d 'V 2 , 4.2 d 'V 2 , 0.42 d 'V 2 . length of the confidence interval 2
2
2
2
uncertainty number D number of observations n Figure D15: The Magic Triangle, constituents: (i) uncertainty number D , (ii) number of observations n, (iii) length of the confidence interval 'V 2 . Table D19 Coefficient of complementary confidence D , uncertainty number D versus quantum number of the variance kV D (E. Grafarend 1970) 2
D
kV D
10%
19.5
5%
25.9
1%
42.0
o0
of
2
D7 Sampling from the multidimensional Gauss-Laplace normal distribution
621
At the end, pay attention to the quantum numbers kV D listed in Table D19. 2
D7
Sampling from the multidimensional Gauss-Laplace normal distribution: the confidence region for the fixed parameters in the linear Gauss-Markov model
Example D19
(special linear Gauss-Markov model, marginal probability distributions)
For a simple linear Gauss-Markov model E{y} = Aȟ, A \ n × m , rk A = m , namely n=3, m=2, D{y} = IV 2 of Gauss-Laplace i.i.d. observations, we are going to compute •
the sample pdf of ȟˆ BLUUE of ȟ ,
•
the sample pdf of Vˆ 2 BIQUUE of V 2 .
We follow the action within seven items. First, we identify the pdf of GaussLaplace i.i.d. observations y \ n . Second, we review the estimations of ȟˆ BLUUE of ȟ as well as Vˆ 2 BIQUUE of V 2 . Third, we decompose the Euclidean norm of the observation space || y E{y} ||2 into the Euclidean norms || y Aȟˆ ||2 and || ȟˆ ȟ ||2 . Fourth, we present the eigenspace A cA analysis and the eigenspace synthesis of the associated matrices M := I n A( A cA) 1 A c and N := A cA within the Eulidean norms || y Aȟˆ ||2 = y cMy and || ȟˆ ȟ ||2AcA = = (ȟˆ ȟ )cN (ȟˆ ȟ ) , respectively. The eigenspace representation leads us to canonical random variables ( z1 ," , zn m ) relating to the norm || y Aȟˆ ||2 = = y cMy and ( zn m+1 ," , zn ) relating to the norm || ȟˆ ȟ ||2N which are standard Gauss-Laplace normal. Fifth, we derive the cumulative probability of Helmert’s random variable x := (n rk A)Vˆ 2 / V 2 = ( n m)Vˆ 2 / V 2 and of the unknown parameter vector ȟˆ BLUUE of ȟ or its canonical counterpart Șˆ BLUUE of Ș , multivariate Gauss-Laplace normal. Action six generates the marginal pdf of ȟˆ or Șˆ , both BLUUE of ȟ or Ș , respectively. Finally, action seven leads us to Helmert’s Chi Square distribution F p2 with p = n rk A = n m (here: n–m = 1) “degrees of freedom”. The first action item Let us assume an experiment of three Gauss-Laplace i.i.d. observations [ y1 , y2 , y3 ]c = y which constitute the coordinates of the observation space Y, dim Y = 3. The observations yi , i {1, 2,3} are related to a parameter space ; with coordinates [[1, [ 2 ] = ȟ in the sense to generate a straight line y{k} = [1 + [ 2 k . The fixed effects [ j , j {1, 2} , define geometrically a straight line, statistically a special linear Gauss-Markov model of one variance component, namely the first moment
622
Appendix D: Sampling distributions and their use
ª y1 º ½ ª1 k1 º ª1 0 º ª [1 º « ª[ º °« » ° « » E{y} = Aȟ E ® « y2 » ¾ = «1 k2 » « » = «1 1 »» « 1 » , rk A = 2 . ° « y » ° «1 k » ¬[ 2 ¼ «1 2 » ¬[ 2 ¼ 3¼ ¬ ¼ ¯¬ 3 ¼ ¿ ¬ The central second moment ª y1 º ½ ª1 0 0 º ° ° D{y} = I nV D{y} = D ® «« y2 »» ¾ = ««0 1 0 »» V 2 , V 2 < 0 . ° « y » ° «0 0 1 » ¼ ¯¬ 3 ¼ ¿ ¬ 2
k represents the abscissa as a fixed random, y the ordinate as the observation, naturally a random effect. Samples of the straight line are taken at k1 = 0, k2 = 1, k3 = 2 , this calling for y (k1 ) = y (0) = y1 , y (k2 ) = y (1) = y2 , y (k3 ) = = y (2) = y3 , respectively. E{y} is a consistent equation. Alternatively, we may say E{y} R ( A ) . The matrix A \ 3× 2 is rank deficient by p = n rk A = 1, also called “degree of freedom”. The dispersion matrix D{y} , the central moment of second order, is represented as a linear model, too, namely by onevariance component V 2 . The joint probability function of the three GaussLaplace i.i.d. observations dF = f ( y1 , y2 , y3 )dy1dy2 dy3 = f ( y1 ) f ( y2 ) f ( y3 )dy1dy2 dy3 § 1 · f ( y1 , y2 , y3 ) = (2S ) 3/ 2 V 3 exp ¨ 2 (y E{y})c( y E{y}) ¸ 2 V © ¹ will be transformed by means of the special linear Gauss-Markov model with one-variance component. The second action item For such a transformation, we need ȟˆ BLUUE of ȟ and Vˆ 2 BIQUUE V 2 . ȟˆ = ( AcA) 1 Acy ,
ȟˆ BLUUE of ȟ :
ª 3 3º 1 ª 5 3º 1 A cA = « » , ( A cA) = 6 « 3 3 » , 3 5 ¬ ¼ ¬ ¼ ª y1 º ˆȟ = 1 ª 5 2 1º « y » , 2 6 «¬ 3 0 3 »¼ « » «¬ y3 »¼
D7 Sampling from the multidimensional Gauss-Laplace normal distribution
Vˆ 2 = Vˆ 2 BIQUUE V 2 :
=
623
1 (y Aȟˆ )c(y Aȟˆ ) = n rk A
1 y c(I n A ( A cA ) 1 A c)y , n rk A
1 Vˆ 2 = ( y12 4 y1 y2 + 2 y1 y3 + 4 y22 4 y2 y3 + y32 ) . 6 The third action item The quadratic form (y E{y})c(y E{y}) allows the fundamental decomposition y E{y} = y Aȟ = y Aȟˆ + A(ȟˆ ȟ ) (y E{y})c(y E{y}) = ( y Aȟ)c( y Aȟ) = ( y Aȟˆ )c( y Aȟˆ ) + (ȟˆ ȟ)cAcA(ȟˆ ȟ) (y E{y})c(y E{y}) = (n rk A)ıˆ 2 + (ȟˆ ȟ )cA cA(ȟˆ ȟ ) ª3 3º ˆ (y E{y})c(y E{y}) = ıˆ 2 + (ȟˆ ȟ )c « » (ȟ ȟ ). ¬3 5 ¼ The fourth action item In order to bring the quadratic form (y E{y})c(y E{y}) = ( n rk A)Vˆ 2 + +(ȟˆ ȟ )cAcA(ȟˆ ȟ ) into a canonical form, we introduce the generalised forward and backward Helmert transformation HH c = I n z = V H (y E{y}) = V 1 H (y Aȟ ) 1
and y E{y} = V H cz (y E{y})c(y E{y}) = V 2 z cHH cz = V 2 z cz 1 (y E{y})c(y E{y}) = z cz = z12 + z22 + z32 . 2 V ?How to relate the sample variance Vˆ 2 and the sample quadratic form (ȟˆ ȟ )cAcA(ȟˆ ȟ ) to the canonical quadratic form z cz ? Previously, for the example of direct observations in the special linear GaussMarkov model E{y} = 1P , D{y} = I nV 2 we succeeded to relate z12 + " + zn21 to Vˆ 2 and zn2 to ( Pˆ P ) 2 . Here the sample variance Vˆ 2 , BIQUUE V 2 , as well as
624
Appendix D: Sampling distributions and their use
the quadratic form of the deviate of the sample parameter vector ȟˆ from the “true” parameter vector ȟ have been represented by
Vˆ 2 =
1 1 (y Aȟˆ )c(y Aȟˆ ) = y c[I n A( A cA) 1 A c]y n rk A n rk A rk[I n A( A cA) 1 A c] = n rk A = n m = 1 versus (ȟˆ ȟ )cA cA(ȟˆ ȟ ), rk( A cA) = rk A = m .
The eigenspace of the matrices M and N, namely M := I n A ( A cA ) 1 A =
N := A cA =
and
ª 1 2 1 º 1« = « 2 4 2 »» 6 ¬« 1 2 1 »¼
ª3 3 º =« », ¬3 5 ¼
will be analyzed. The eigenspace analysis of the
The eigenspace analysis of the
matrix M
matrices N, N-1
j {1," , n} V cMV = ª Vcº = « 1 » M [ V1 , V2 ] = ¬ V2c ¼
i {1," , m} U cNU = Diag (J 1 ," , J m ) = ȁ N U cN 1U = Diag (O1 ," , Om ) = ȁ N
= Diag ( P1 ," , P n m , 0," , 0) = ȁ M
J1 =
1 1 ," , J m = O1 Om
O1 =
1 1 ," , Om = J1 Jm
1
rk M = n m
rk N = rk N 1 = m
Orthonormality of the eigencolumns
Orthonormality of the eigencolumns
V1cV1 = I n m , V2cV2 = I m V1cV2 = 0 \ ( n m )× m ª V1c º « V c » [ V1 ¬ 2¼
ªI V2 ] = « n m ¬ 0
0º I m »¼
U cU = I m
625
D7 Sampling from the multidimensional Gauss-Laplace normal distribution
v1cv1 = 1 v1cv2 = 0 ... vnc 1vn = 0 vnc vn = 1
u1cu1 = 1 u1cu2 = 0 ... umc 1um = 0 umc um = 1
V SO(n) :eigencolumns: (M P j I n ) v j = 0 :eigenvalues: | M P j I n |= 0
U SO(m) :eigencolumns: (N J j I n )u j = 0 :eigenvalues: | N J i I m |= 0 in particular
eigenspace analysis of the matrix M, rkM=n-m, M \ 3×3 , A \ 3× 2 , rk M = 1
eigenspace analysis of the matrix N, rkN=m, N \ 2× 2 , rk N = 2
ȁ M = Diag (1, 0, 0)
ȁ N = Diag (0.8377, 7.1623)
ª 0.4082 0.7024 0.5830 º V = [ V1 , V2 ] = «« 0.8165 0.5667 0.1109»» «¬ 0.4082 0.4307 0.8049»¼ V1 \ 3×1 V2 \ 3×2
ª 0.8112 0.5847 º U=« » ¬ 0.5847 0.8112 ¼ U \ 2×2
to be completed by eigenspace synthesis of the matrix M M = Vȁ M V c
eigenspace synthesis of the matrix N, N-1 N = Uȁ N Uc, N 1 = Uȁ N Uc 1
ª Vcº M = [V1 , V2 ]ȁ M « 1 » = ¬ V2c ¼ ª Vcº = [V1 , V2 ]Diag ( P1 ," , P n m , 0," , 0) « 1 » ¬ V2c ¼ M = V1 Diag ( P1 ," , P n m )V1c
N = UDiag (J 1 ," , J m )U c N 1 = UDiag (O1 ," , Om )U c
in particular M= ª 0.4082 º « 0.8165» P 0.4082 0.8165 0.4082 ] « » 1[ «¬ 0.4082 »¼
P1 = 1
versus
N= ª 0.8112 0.5847 º ªJ 1 0 º « 0.5847 0.8112 » « 0 J » ¬ ¼¬ 2¼ ª 0.8112 0.5847 º «0.5847 0.8112 » ¬ ¼
J 1 = 0.8377 , J 2 = 7.1623
626
Appendix D: Sampling distributions and their use
N 1 =
P1 = 1
ª 0.8112 0.5847 º ªO1 0 º ª 0.8112 0.5847 º =« »« »« » ¬ 0.5847 0.8112 ¼ ¬ 0 O2 ¼ ¬ 0.5847 0.8112 ¼ 1 1 versus O1 = J 1 = 1.1937, O2 = J 2 = 0.1396.
The non-vanishing eigenvalues of the matrix M have been denoted by ( P1 ," , P n m ) , m eigenvalues are zero such that eigen (M ) = ( P1 ," , P n m , , 0," , 0) . The eigenvalues of the regular matrix N span eigen (N) = (J 1 ," , J m ) . Since the dispersion matrix D{ȟˆ} = ( AcA) 1V 2 = N 1V 2 is generated by the inverse of the matrix A cA = N , we have computed, in addition, the eigenvalues of the matrix N 1 by means of eigen(N 1 ) = (O1 ," , Om ) . The eigenvalues of N and N 1 , respectively, are related by
J 1 = O11 ," , J m = Om1
O1 = J 11 ," , Om = J m1 .
or
In the example, the matrix M had only one non-vanishing eigenvalue P1 = 1 . In contrast, the regular matrix N was characterized by two eigenvalues J 1 = 0.8377 and J 2 = 7.1623 , its inverse matrix N 1 by O1 = 1.1937 and O2 = 0.1396 . The two quadratic forms, namely y cMy = y cVȁ M V cy
versus (ȟˆ ȟ )cN (ȟˆ ȟ ) = (ȟˆ ȟ )cUȁ M U c(ȟˆ ȟ ) ,
build up the original quadratic form 1 (y E{y})c(y E{y}) = V2 1 1 = 2 y cMy + 2 (ȟˆ ȟ )cN (ȟˆ ȟ ) = V V 1 1 = 2 y cVȁ M V cy + 2 (ȟˆ ȟ )cUȁ N U c(ȟˆ ȟ ) V V in terms of the canonical random variables V cy = y y = Vy
Uc(ȟˆ ȟ ) = Șˆ Ș
and such that
1 1 1 ( y E{y})c( y E{y}) = 2 ( y )cȁ M y + 2 ( Șˆ Ș) ȁ N ( Șˆ Ș) 2 V V V 1 1 (y E{y})c(y E{y}) = 2 2 V V
nm
¦ (y j =1
) Pj +
2 j
1 V2
m
¦ ( Șˆ
i
i =1
1 (y E{y})c( y E{y}) = z12 + " + zn2 m + z 2 V2
n m +1
Și ) 2 J i + " + zn2 .
627
D7 Sampling from the multidimensional Gauss-Laplace normal distribution
The quadratic form z cz splits up into two terms, namely z12 + " + zn2 m = =
1 V2
nm
2 j
¦ (y ) j =1
zn2 m +1 + " + zn2 = and
Pj
=
1 V2
m
¦ (Kˆ K ) J 2
i =1
i
i
j
,
here z12 =
1 2 Vˆ 2 y1* = 2 V2 V and
z22 + z32 =
z12 =
1 [(Kˆ1 K1 ) 2 J 1 + (Kˆ2 K2 ) 2 J 2 ] , V2 or
1 ( y12 4 y1 y2 + 2 y1 y3 + 4 y22 4 y2 y3 + y32 ) 6V 2 and
z22 + z32 =
ª3 3 º ˆ 1 1 [0.8377(Kˆ1 K1 )2 + 7.1623(Kˆ2 K 2 ) 2 ] = 2 (ȟˆ ȟ )c « » (ȟ ȟ ). 2 V V ¬3 5 ¼ The fifth action item
We are left with the problem to transform the cumulative probability dF = = f ( y1 , y2 , y3 )dy1dy2 dy3 into the canonical form dF = f ( z1 , z2 , z3 )dz1dz2 dz3 . Here we take advantage of Corollary D3. First, we introduce Helmert’s random variable x := z12 and the random variables [ˆ1 and [ˆ2 of the unknown parameter vector ȟˆ of fixed effects (ȟˆ ȟ )cA cA(ȟˆ ȟ ) = z22 + z32 = || z ||2AcA if we denote z := [ z2 , z3 ]c . dz1dz2 = det A cA d [ˆ1d [ˆ2 = 6 d [ˆ1d [ˆ2 , according to Corollary D3 is special represention of the surface element by means of the matrix of the metric A cA . In summary, the volume element dz1dz2 dz3 =
dx x
det A cA d[ˆ1d[ˆ2
leads to the first canonical representation of the cumulative probability dF =
1 dx | A cA |1/ 2 1 exp( x) exp[ 2 (ȟˆ ȟ )cA cA(ȟˆ ȟ )]d [ˆ1d [ˆ2 . 2 2 2V 2S x 2SV 1
The left pdf establishes Helmert’s pdf of x = z12 = Vˆ 2 / V 2 , dx = V 2 dVˆ 2 . In contrast, the right pdf characterizes the bivariate Gauss-Laplace pdf of || ȟˆ ȟ ||2 .
628
Appendix D: Sampling distributions and their use
Unfortunately, the bivariate Gauss-Laplace A cA normal pdf is not given in the canonical form. Therefore, second we do correlate || ȟˆ ȟ ||2AcA by means of eigenspace synthesis. A cA = Uȁ AcA Uc = UDiag (J 1 , J 2 )Uc = UDiag ( || ȟˆ ȟ ||2AcA := (ȟˆ ȟ )cA cA(ȟˆ ȟ ) = (ȟˆ ȟ )cUDiag ( | A cA |1/ 2 = J 1J 2 =
1 1 , )Uc O1 O2
1 1 , )U c(ȟˆ ȟ ) O1 O2
1
O1O2
Șˆ Ș := U c(ȟˆ ȟ ) ȟˆ ȟ = U ( Șˆ Ș) || ȟˆ ȟ ||2AcA = ( Șˆ Ș)c Diag (
|| ȟˆ ȟ ||
2 A cA
1 1 , )( Șˆ Ș) =|| Șˆ Ș ||2D O1 O2
ª 1 « 1.1937 3 3 ª º = (ȟˆ ȟ ) c « (ȟˆ ȟ ) = ( Șˆ Ș) c « » « 0 ¬3 5 ¼ «¬
º » » ( Șˆ Ș) =|| Șˆ Ș ||D2 . 1 » 0.1396 »¼ 0
By means of the canonical variables Șˆ = Ucȟˆ we derive the cumulative probability dF =
1
V 2S
1 1 1 (Kˆ1 K1 )2 (Kˆ2 K2 )2 exp( x)dx exp[ ( + )]dKˆ1dKˆ2 2 O1 O2 2V 2 x 2SV 2 O1O2
1
or dF = f ( x) f (Kˆ1 ) f (Kˆ2 ) dxdKˆ1dKˆ2 .
Third, we prepare ourselves for the cumulative probability dF = = f ( z1 , z2 , z3 )dz1dz2 dz3 . We depart from the representation of the volume element dz1dz2 dz3 =
dx 2 x
det A cA d[ˆ1d[ˆ2
subject to x = z12 =
1 2 1 Vˆ = 2 y cMy 2 V V
ȟˆ = ( A cA) 1 A cy .
D7 Sampling from the multidimensional Gauss-Laplace normal distribution
629
The Helmert random variable is a quadratic form of the coordinates ( y1 , y2 , y3 ) of the observation vector y. In contrast, ȟˆ BLUUE of ȟ is a linear form of the coordinates ( y1 , y2 , y3 ) of observation vector y. The transformation of the volume element dxd [ˆ1d [ˆ2 = | J x | dy1dy2 dy3 is based upon the Jacobi matrix J x ( y1 , y2 , y3 ) ª D1 x « ˆ J x = « D1[1 « ˆ ¬ D1[ 2
D2 x D [ˆ
2 1
D2[ˆ2
D3 x º ac » ª º 1× 3 D3[ˆ1 » = « 1 c c» » ¬( A A) A ¼ 2 × 3 ˆ D3[ 2 ¼
ª D1 x º ª y1 2 y2 + y3 º 2 1 « « » a = « D2 x » = 2 My = 2 y1 + 4 y2 2 y3 »» V 3V 2 « «¬ D3 x »¼ «¬ y1 2 y2 + y3 »¼ x=
1 1 y cMy = ( y12 4 y1 y2 + 2 y1 y3 + 4 y22 4 y2 y3 + y32 ) 2 V 6V 2 ( A cA) 1 A c =
ª 2 y1 4 y2 + 2 y3 1 « Jx = 5 6V 2 « «¬ 3
1 ª 5 2 1º 6 «¬ 3 0 3 »¼
4 y1 + 8 y2 4 y3 2 0
2 y1 4 y2 + 2 y3 º » 1 » »¼ 3
det J x = det( A cA) 1 det[aca acA( AcA) 1 Aca] det J x =
det[aca acA ( A cA ) 1 A ca] det( A cA)
4 4 y cM cMy = 4 y cMy 4 V V 4 acA( A cA) 1 Aca = 4 y cM cA( AcA) 1 AcMy = V aca =
=
4 y c[I 3 A ( A cA ) 1 A c]A( A cA) 1 Ac[I 3 A( A cA) 1 Ac]y = 0 V4 det J x =
2 y cMy
V
2
det( A cA)
630
Appendix D: Sampling distributions and their use
| det J y |=| det J x |1 =
V2 2
det( A cA) y cMy
.
The various representations of the Jacobian will finally lead us to the special form of the volume element 1 1 y cMy dx d[ˆ1 d[ˆ2 = 2 dy1 dy2 dy3 2 V | AcA |1/ 2 and the cumulative probability 1 1 dF = 3 exp[ ( z12 + z22 + z32 )]dz1dz2 dz3 = 3/ 2 2 V (2S ) 1 dx | A cA |1/ 2 1 exp( x) exp[ 2 (ȟˆ ȟ )cA cA(ȟˆ ȟ )]d[ˆ1d[ˆ2 = 2 2 2V V 2S x 2SV 1 1 exp[ (y E{y})c(y E{y})]dy1dy2 dy3 . = 3 2 V (2S )3/ 2 1
=
The sixth action item The first target is to generate the marginal pdf of the unknown parameter vector ȟˆ , BLUUE of ȟ . f
dF1 = ³ 0
f
dF1 = ³ 0
1 dx | A cA |1/ 2 § 1 · exp( x) exp ¨ 2 (ȟˆ ȟ )cAcA(ȟˆ ȟ ) ¸ d [ˆ1 d [ˆ2 2 2 2S x 2SV © 2V ¹ 1
§ 1 1 dx 1 exp( x) exp ¨ 2 2 2 2S x 2SV O1O2 © 2V 1
2
¦ i =1
(Kˆi Ki ) 2 Oi
Let us substitute the standard integral f
³ 0
1 dx exp( x) =1, 2 2S x 1
in order to have derived the marginal probability dF1 = f1 (ȟˆ | ȟ, ( A cA) 1V 2 ) d[ˆ1d[ˆ2 1/ 2
| A cA | § 1 · f1 (ȟˆ ) := exp ¨ 2 (ȟˆ ȟ )cA cA(ȟˆ ȟ ) ¸ 2 2SV © 2V ¹ dF1 = f1 ( Șˆ | Ș, ȁ N V 2 )dKˆ1dKˆ2 1
f1 ( Șˆ ) =
1 2SV 2
§ 1 2 (Kˆ Ki ) 2 · exp ¨ ¦ i ¸. Oi O1O2 © 2 i =1 ¹ 1
· ¸ dKˆ1 dKˆ2 ¹
D7 Sampling from the multidimensional Gauss-Laplace normal distribution
631
The seventh action item The second target is to generate the marginal pdf of Helmert’s random variable x := Vˆ 2 / V 2 , Vˆ 2 BIQUUE V 2 , namely Helmert’s Chi Square pdf F p2 with p=nrkA (here p=1) “degree of freedom”. dF2 =
1 dx exp( x) 2 2S x 1
+f +f
| A cA |1/ 2 § 1 · 2 ³f f³ 2SV 2 exp ¨© 2V 2 || ȟˆ ȟ ||A cA ¸¹ d[ˆ1d[ˆ2 .
Let us substitute the integral +f +f
| A cA |1/ 2 § 1 · 2 ³f f³ 2SV 2 exp ¨© 2V 2 || ȟˆ ȟ ||A cA ¸¹ d[ˆ1d[ˆ2 = 1 in order to have derived the marginal distribution dF2 = f 2 ( x)dx, 0 d x d f f 2 ( x) =
p2 1 1 2 x exp( x) , p/2 2 2 *( p / 2)
subject to 1 p = n rk A = n m , here: p = 1 , *( ) = S 2 f 2 ( x) =
1 2S
1
1 exp( x) . 2 x
The results of the example will be generalized in Lemma D. Theorem D16
(marginal probability distributions, special linear GaussMarkov model):
E{y} = Aȟ
ª A \ n×m , rk A = m, E{y} R ( A ) subject to « 2 + D{y} = I nV 2 ¬V \ defines a special linear Gauss-Markov model of fixed effects ȟ \ m and V 2 \ + based upon Gauss-Laplace i.i.d. observations y := [ y1 ," , yn ]c . ª E{ȟˆ} = ȟ ȟˆ = ( A cA) 1 A cy subject to « «¬ D{ȟˆ} = ( A cA) 1V 2 and
632
Appendix D: Sampling distributions and their use
ª E{Vˆ 2 } = Vˆ 2 1 4 Vˆ = (y Aȟˆ )c(y Aȟˆ ) subject to « « D{Vˆ 2 } = 2V n rk A «¬ n rk A 2
identify ȟˆ BLUUE ȟ and Vˆ 2 BIQUUE of V 2 . The cumulative pdf of the multidimensional Gauss-Laplace probability distribution of the observation vector y = [ y1 ," , yn ]c Y f (y | E{y}, D{y} = I nV 2 )dy1 " dyn = =
1 § 1 · exp ¨ 2 (y E{y})c(y E{y}) ¸ dy1 " dyn = n/2 n (2S ) V © 2V ¹ 2 2 ˆ ˆ ˆ = f1 (ȟ ) f 2 (Vˆ )d [1 " d [ m dVˆ
can be split into two marginal pdfs f1 (ȟˆ ) of ȟˆ , BLUUE of ȟ , and f 2 (Vˆ 2 ) of Vˆ 2 , BIQUUE of V 2 . (i) ȟˆ BLUUE of ȟ The marginal pdf of ȟˆ , BLUUE of ȟ , is represented by (1st version) dF1 = f1 (ȟˆ )d[1 " d[ m f1 (ȟˆ ) =
1 (2S )
m/ 2
V
m
A cA
1/ 2
§ 1 · exp ¨ 2 (ȟˆ ȟ )cA cA(ȟˆ ȟ ) ¸ d [1 " d [ m © 2V ¹ or (2nd version)
dF1 = f1 ( Șˆ )dK1 " dKm f1 (Kˆ ) =
§ 1 1 (O1O2 " Om 1Om ) 1/ 2 exp ¨ 2 (2S ) m / 2 (V 2 ) m / 2 © 2V
(Kˆi K ) 2 · ¸, ¦ Oi i =1 ¹ m
by means of Principal Component Analysis PCA also called Singular Value Decomposition (SVD) or Eigenvalue Analysis (EIGEN) of ( A cA ) 1 , Ș = U[c ȟ
ª U[c ( A cA) 1 U[ = ȁ = Diag (O1 , O2 ," , Om 1 , Om ) subject to « Șˆ = U[c ȟˆ «¬ U[c U[ = I m , det U[ = +1 f1 ( Șˆ | Ș, ȁV 2 ) = f (Kˆ1 ) f (Kˆ2 ) " f (Kˆm 1 ) f (Kˆm )
D7 Sampling from the multidimensional Gauss-Laplace normal distribution
f (Kˆi ) =
1
V Oi
633
§ 1 (Kˆ Ki ) 2 · exp ¨ 2 i ¸ i {1," , m}. Oi 2S © 2V ¹
The transformed fixed effects (Kˆ1 ," ,Kˆm ) , BLUUE of (K1 ," ,Km ) , are mutually independent and Gauss-Laplace normal
Kˆi N (Ki | V 2 Oi ) i {1," , m} . (3rd version) zi :=
Kˆi K 2
V Oi
: f1 ( zi )dzi =
1
1 exp( zi2 )dzi i {1," , m} . 2 2S
(ii) Vˆ 2 BIQUUE V 2 The marginal pdf of Vˆ 2 , BIQUUE V 2 , is represented by (1st version) p = n rk A dF2 = f 2 (Vˆ 2 )dVˆ 2 f 2 (Vˆ 2 ) =
§ 1 Vˆ 2 · 1 p / 2 p2 ˆ V exp p ¨ p 2 ¸. V p 2 p / 2 *( p / 2) © 2 V ¹ (2nd version) dF2 = f 2 ( x)dx
Vˆ 2 p 1 = 2 Vˆ 2 = 2 (y Aȟˆ )c(y Aȟˆ ) 2 V V V p 1 1 1 f 2 ( x) = p / 2 x 2 exp( x) . 2 2 *( p / 2)
x := (n rk A)
f2(x) as the standard pdf of the normalized sample variance is a Helmert Chi Square F p2 pdf with p = n rk A “degree of freedom”. :Proof: The first action item n First, let us decompose the quadratic form || y E{y} ||2 into estimates E {y} of E{y} . n n y E{y} = y E {y} + ( E {y} E{y}) y E{y} = y A[ˆ + A(ȟˆ ȟ )
634
Appendix D: Sampling distributions and their use
and n n n n (y E{y})c(y E{y}) = ( y E {y})c( y E {y}) + ( E {y} E{y})c( E {y} E{y}) n n || y E{y} ||2 =|| y E {y} ||2 + || E {y} E{y} ||2 ( y E{y})c( y E{y}) = ( y Aȟˆ )c( y Aȟˆ ) + ( ȟˆ ȟ )cA cA( ȟˆ ȟ) || y E{y} ||2 = || y Aȟˆ ||2 + || ȟˆ ȟ ||2AcA . Here, we took advantage of the orthogonality relation. (ȟˆ ȟ )cAc( y Aȟˆ )c = (ȟˆ ȟ)cAc(I n A( AcA) 1 Ac) y = = (ȟˆ ȟ )c( A c A cA( A cA) 1 A c)y = 0. The second action item Second, we implement Vˆ 2 BIQUUE of V 2 into the decomposed quadratic form. || y Aȟˆ ||2 = (y Aȟˆ )c(y Aȟˆ ) = y c(I n A( A cA) 1 A c)y = = y cMy = (n rk A)Vˆ 2 || y E{y} ||2 = (n rk A)Vˆ 2 + (ȟˆ ȟ )cA cA(ȟˆ ȟ ) || y E{y} ||2 = y cMy + (ȟˆ ȟ )cN(ȟˆ ȟ ). The matrix of the normal equations N:=A cA, rk N = rk A cA = rk A = m , and the matrix of the variance component estimation M := I n A( A cA) 1 A c, rk M = = n rk A = n m have been introduced since their rank forms the basis of the generalized forward and backward Helmert transformation. HH c = I n z = V 1H( y E{y}) = V 1H( y Aȟ ) and y E{y} = V H cz 1 (y E{y})c(y E{y}) = z cH cHz = z cz V2 1 || y E{y} ||2 =|| z ||2 . V2
D7 Sampling from the multidimensional Gauss-Laplace normal distribution
635
The standard canonical variable z \ n has to be associated with norms || y Aȟˆ || and || ȟˆ ȟ ||AcA . The third action item Third, we take advantage of the eigenspace representation of the matrices (M, N) and their associated norms. y cMy = y cVȁ M V cy versus (ȟˆ -ȟ )cN(ȟˆ -ȟ )=(ȟˆ -ȟ )cUȁ N U c(ȟˆ -ȟ ) ȁ M = Diag ( P1 ," , P n m , 0," , 0) versus ȁ N = Diag (J 1 ," , J m ). \ n = \ nm × \ m \m m eigenvalues of the matrix M are zero, but n rk A = n m is the number of its non-vanishing eigenvalues which we denote by ( P1 ," , P n m ) . In contrast, m = rk A is the n-m number of eigenvalues of the matrix N, all non-zero. The canonical random variable V cy = y y = Vy and Uc(ȟˆ ȟ ) = Șˆ Ș lead to 1 1 1 (y E{y})c(y E{y}) = 2 ( y ) ȁ M y + 2 ( Șˆ Ș)cȁ N ( Șˆ Ș) 2 V V V 1 1 (y E{y})c(y E{y}) = 2 2 V V
nm
¦ ( y j ) 2 P j + j =1
1 V2
m
¦ (Kˆ i =1
i
Ki ) 2 J i
1 (y E{y})c(y E{y}) = z12 + " + zn2 m + zn2 m +1 + " + zn2 2 V subject to z12 + " + zn2 m = =
1 V2
nm
2 j
¦ (y ) j =1
and
zn2 m +1 + " + zn2 = =
Pj
1 V2
m
¦ (Kˆ i =1
i
Ki ) 2 J i
|| z ||2 = z cz = z12 + " + zn2 m + zn2 m +1 + " + zn2 = 1 1 y cMy + 2 (ȟˆ ȟ )cN(ȟˆ ȟ ) = V2 V 1 1 = 2 || y E{y} ||2 = 2 (y E{y})c(y E{y}). V V =
Obviously, the eigenspace synthesis of the matrices N = A cA and M = I n A( A cA) 1 A c has guided us to the proper structure synthesis of the generalized Helmert transformation.
636
Appendix D: Sampling distributions and their use
The fourth action item Fourth, the norm decomposition unable us to split the cumulative probability dF = f ( y1 ," , yn )dy1 " dyn into the pdf of the Helmert random variable x := z12 + " + zn2 m = V 2 (n rk A)Vˆ 2 = = V 2 (n m)Vˆ 2 and the pdf of the difference random parameter vector zn2m+1 + " + zn2 = V 2 (ȟˆ ȟ )cA cA(ȟˆ ȟ ) . dF = f ( z1 ," , zn m , zn m +1 ," , zn )dz1 " dzn m dzn m +1dzn f ( z1 ," , zn ) = =(
1 § 1 · exp ¨ z cz ¸ = (2S ) n / 2 © 2 ¹
m 1 n 2m § 1 · 1 § 1 · ) exp ¨ ( z12 + " + zn2 m ) ¸ ( ) 2 exp ¨ ( zn2 m +1 + " + zn2 ) ¸ . 2S © 2 ¹ 2S © 2 ¹
Part A x := r 2 = z12 + " + zn2 m dx = 2( z1 dz1 + " + zn m dzn m ) z1 = r cos In m 1 cos In m 2 " cos I2 cos I1 z2 = r cos In m 1 cos In m 2 " cos I2 sin I1 ... zn m 1 = r cos In m1 sin In m 2 zn m = r sin In m 1 ª z1 º « " » = 1 Diag ( P ," , P )V cy 1 nm 1 « » V «¬ zn m »¼ V = [V1 , V2 ] , V1cV1 = I n m , V2cV2 = I m , V1cV2 = 0 VV c = I n , V \ n× n , V1 \ n× ( n m ) , V2 \ n× m and ª zn m+1 º « " » = 1 Diag ( J ," , J )Uc(ȟˆ ȟ ) 1 m « » V «¬ zn »¼ altogether
D7 Sampling from the multidimensional Gauss-Laplace normal distribution
637
ª z1 º « ... » « » ª Diag ( P1 ," , Pn m ) V1cy º « zn m » 1 ». « » =V « «¬ Diag ( J 1 ," , J m )Uc(ȟˆ ȟ ) »¼ « zn m+1 » « ... » « » «¬ zn »¼ The partitioned vector of the standard random variable z is associated with the norm || zn m ||2 and || zm ||2 , namely || zn m ||2 + || zm ||2 = z12 + " + zn2 m + zn2 m +1 + " + zn2 = = + =
1 y cV1 Diag ( P1 ," , P n m ) Diag ( P1 ," , P n m )V1cy + V2
1 ˆ (ȟ ȟ )cUDiag ( J 1 ," , J m ) Diag ( J 1 ," , J m )Uc(ȟˆ ȟ ) = V2
1 1 y cV1 Diag ( P1 ," , P n m )V1cy + 2 (ȟˆ ȟ )cUDiag (J 1 ," , J m )U c(ȟˆ ȟ ) = 2 V V dz1dz2 " dzn m 1dzn m = r n m 1dr (cos In m 1 ) n m 1 (cos In m 2 ) n m 2 " " cos 2 I3 cos I2 dIn m 1dIn m 2 " dI3 dI2 dI1 .
The representation of the local (n-m) dimensional hypervolume element in terms of polar coordinates (I1 , I2 ," , In m 1 , r ) has already been given by Lemma D4. Here, we only transform the random variable r to Helmert’s random variable x. x := r 2 : dx = 2rdr , dr = r n m 1dr =
dx 2 x
, r n m 1 = x ( n m 1) / 2
1 ( n m 1) / 2 x dx. 2
Part A concludes with the representation of the left pdf in terms of Helmert’s polar coordinates dFA = ( =
1 n 2m 1 ) exp ( z12 + " + zn2 m )dz1 " dzn m = 2S 2
1 1 n 2m n m2 2 ( ) x dx(cos In m 1 ) n m 1 (cos In m 2 ) n m 2 " 2 2S " cos 2 I3 cos I2 dIn m 1dIn m 2 " dI3 dI2 dI1 . Part B
638
Appendix D: Sampling distributions and their use
Part B focuses on the representation of the right pdf, first in terms of the random variables ȟˆ , second in terms of the canonical random variables Kˆ . dFr = (
1 m2 § 1 · ) exp ¨ ( zn2 m +1 + " + zn2 ) ¸ dzn m +1 " dzn 2S 2 © ¹ zn2m+1 + " + zn2 =
dzn m +1 " dzn =
1 ˆ (ȟ ȟ )cA cA(ȟˆ ȟ ) V2
1
V m/2
| A cA |1/ 2 d[ˆ1 " d[ˆm .
The computation of the local m-dimensional hyper volume element dzn m +1 " dzn has followed Corollary D3 which is based upon the matrix of the metric V 2 AcA . The first representation of the right pdf is given by dFr = ( =(
1 m2 § 1 · ) exp ¨ ( zn2 m +1 + " + zn2 ) ¸ dzn m +1 " dzn = 2S © 2 ¹
1 m2 | A cA |1/ 2 § 1 · ) exp ¨ 2 (ȟˆ ȟ )cA cA(ȟˆ ȟ ) ¸ d [ˆ1 " d [ˆm . m/ 2 2S V © 2V ¹
Let us introduce the canonical random variables (Kˆ1 ," ,Kˆm ) which are generated by the correlating quadratic form || ȟˆ ȟ ||2AcA . (ȟˆ ȟ )cA cA(ȟˆ ȟ ) = (ȟˆ ȟ )cUDiag (
1 1 ," , )U c(ȟˆ ȟ ) . O1 Om
Here, we took advantage of the eigenspace synthesis of the matrix A cA =: N and ( A cA) 1 =: N 1 . Such an inverse normal matrix is the representing dispersion matrix D{ȟˆ} = ( AcA) 1V 2 = N 1V 2 . UUc = I m U SO(m) := {U \ n× m | UUc = I m ,| U |= +1} N := A cA = UDiag (J 1 ," , J m )U c versus N 1 := ( A cA) 1 = UDiag (O1 ," , Om )U c subject to
J 1 = O11 ," , J m = Om1 or O1 = J 11 ," , Om = J m1 | A cA |1/ 2 = J 1 "J m =
1 O1 " Om
Șˆ Ș := Uc(ȟˆ ȟ ) ȟˆ ȟ := Uc( Șˆ Ș)
D7 Sampling from the multidimensional Gauss-Laplace normal distribution
639
1 1 || ȟˆ ȟ ||2AcA =: (ȟˆ ȟ )cA cA (ȟˆ ȟ ) = ( Șˆ Ș)cDiag ( ,", )( Șˆ Ș) . O1 Om The local m-dimensional hypervolume element d[ˆ1 " d[ˆm is transformed to the local m-dimensional hypervolume element dKˆ1 " dKˆm by d[ˆ1 " d[ˆm =| U | dKˆ1 " dKˆm = dKˆ1 " dKˆm . Accordingly we have derived the second representation of the right pdf f ( Șˆ ) . dFr = (
§ 1 · 1 m2 1 1 1 ) exp ¨ 2 (Șˆ Ș)cDiag ( ,", )(Șˆ Ș) ¸ dKˆ1 " dKˆm . m/2 O1 Om 2S V O1 "Om © 2V ¹ Part C
Part C is an attempt to merge the left and right pdf 1 1 n 2m ( n m 2) / 2 ( ) x dxd Zn m 1 2 2S § 1 · exp ¨ (ȟˆ ȟ )cA cA(ȟˆ ȟ ) ¸ d [ˆ1 " d [ˆm 2 © ¹
dF = dFA dFr =
(
1 m2 | A cA |1/ 2 ) 2S Vm
or dF = dFA dFr =
(
1 1 n 2m ( n m 2) / 2 ( ) x dxd Zn m 1 2 2S
§ 1 · 1 m2 1 1 1 ) exp ¨ 2 ( Șˆ Ș)c Diag ( ," , )( Șˆ Ș) ¸ dKˆ1 " dKˆm . m 2S V O1 " Om O1 Om © 2V ¹
The local (n-m-1)-dimensional hypersurface element has been denoted by dZ n m 1 according to Lemma D4. The fifth action item Fifth, we are going to compute the marginal pdf of ȟˆ BLUUE of ȟ . dF1 = f1 (ȟˆ )d[ˆ1 " d[ˆm as well as dF1 = f1 ( Șˆ )dKˆ1 " dKˆm include the first marginal pdf f1 (ȟˆ ) and f1 ( Șˆ ) , respectively. The definition
640
Appendix D: Sampling distributions and their use
f 1 1 nm 1 m | AcA |1/ 2 § 1 · f1 (ȟˆ ) := ³ dx³9dZn m 1 ( ) 2 x( n m 2) / 2 ( ) 2 exp ¨ (ȟˆ ȟ)cAcA(ȟˆ ȟ) ¸ 2 m/2 2 2 S 2 S 2 V ( ) © ¹ 0 subject to f
1 1 n 2m ( n m 2) / 2 9 dx d Z =1 ³0 ³ n m 1 2 ( 2S ) x
leads us to f1 (ȟˆ ) = (
1 m2 | A cA |1/ 2 § 1 · ) exp ¨ (ȟˆ ȟ )cA cA(ȟˆ ȟ ) ¸ . 2S 2 Vm © ¹
Unfortunately, such a general multivariate Gauss-Laplace normal distribution cannot be tabulated. An alternative is offered by introducing canonical unknown parameters Șˆ as random variables. The definition f
1 1 nm f1 ( Șˆ ) := ³ dx ³9d Zn m 1 ( ) 2 x ( n m 2) / 2 2 2S 0
(
§ 1 · 1 m2 1 1 1 ) (O1O2 " Om 1Om ) 1/ 2 exp ¨ 2 ( Șˆ Ș)c Diag ( ," , )( Șˆ Ș) ¸ m 2S V O1 Om © 2V ¹ subject to f
1 1 n 2m ( n m 2) / 2 9 dx d Z ( ) x =1 1 n m ³0 ³ 2 2S
alternatively leads us to f1 ( Șˆ ) = (
1 m2 1 ) 2S V m
§ 1 exp ¨ 2 O1 " Om © 2V 1
(Kˆi K ) 2 · ¸ ¦ Oi i =1 ¹ m
f1 (Kˆ1 ," ,Kˆm ) = f1 (Kˆ1 )" f1 (Kˆm ) f1 (Kˆi ) :=
1
V Oi
§ 1 (Kˆi K ) 2 · exp ¨ ¸ i {1," , m} . Oi 2S © 2 ¹
Obviously the transformed random variables (Kˆ1 ," ,Kˆm ) BLUUE of (K1 ," ,Km ) are mutually independent and Gauss-Laplace normal. The sixth action item Sixth, we shall compute the marginal pdf of Helmert’s random variable x = (n rkA)Vˆ 2 / V 2 = (n m)Vˆ 2 / V 2 , Vˆ 2 BIQUUE V 2 ,
D7 Sampling from the multidimensional Gauss-Laplace normal distribution
641
dF2 = f 2 ( x)dx includes the second marginal pdf f 2 ( x) . The definition +f +f 1 1 nm 1 m | A cA |1/ 2 f 2 ( x) := 9³ d Zn m 1 ( ) 2 x ( n m 2) / 2 ³ d [ˆ1 " ³ d [ˆm ( ) 2 exp 2 2S 2S Vm f f
§ 1 ˆ · ˆ ¨ 2 (ȟ ȟ )cAcA(ȟ ȟ ) ¸ © 2V ¹ subject to
Zn m 1 = ³9d Zn m 1 =
2 S ( n m 1) / 2 , n m 1 *( ) 2
according to Lemma D4 +f
+f m 1/ 2 ˆ " d [ˆ ( 1 ) 2 | A cA | exp § 1 (ȟˆ ȟ )cA cA(ȟˆ ȟ ) · = " d [ m 1 ¨ ¸ 2 ³ f³ 2S Vm © 2V ¹ f
=
+f
³
f
+f
dz1 " ³ dzm ( f
1 m2 § 1 · ) exp ¨ ( z12 + " + zm2 ) ¸ = 1 2S 2 © ¹ leads us to
p := n rk A = n m
S *(
n m 1 1 n m 1 nm p ) = * ( )* ( ) = *( ) = *( ) 2 2 2 2 2 f 2 ( x) =
p 1 1 1 2 x exp( x) , p/2 2 2 *( p / 2)
namely the standard pdf of the normalised sample variance, known as Helmert’s Chi Square pdf F p2 with p = n rk A = n m “degree of freedom”. If you substitute x = (n rkA)Vˆ 2 / V 2 = (n m)Vˆ 2 / V 2 , dx = (n rk A)V 2 dVˆ 2 = ( n m)V 2 dVˆ 2 we arrive at the pdf of the sample variance Vˆ 2 , in particular dF2 = f 2 (Vˆ 2 )dVˆ 2 f 2 (Vˆ 2 ) =
§ 1 Vˆ 2 · 1 p / 2 p 2 ˆ V exp p ¨ p 2 ¸ . V r 2 p / 2 *( p / 2) © 2 V ¹
642
Appendix D: Sampling distributions and their use
Here is my proof's end.

Theorem D17 (marginal probability distributions, special linear Gauss-Markov model with datum defect):

$$E\{y\} = A\xi,\qquad A\in\mathbb{R}^{n\times m},\quad r := \mathrm{rk}\,A < \min\{n,m\},$$

subject to $E\{y\}\in R(A)$, and

$$D\{y\} = V\sigma^2,\qquad V\in\mathbb{R}^{n\times n},\quad \mathrm{rk}\,V = n,$$

defines a special linear Gauss-Markov model with datum defect of fixed effects $\xi\in\mathbb{R}^m$ and a positive definite variance-covariance matrix $D\{y\} = \Sigma_y$ of multivariate Gauss-Laplace distributed observations $y := [y_1,\ldots,y_n]'$. Then

$$\hat\xi = Ly\ \text{("linear")},\qquad \hat\xi = A^{+}y,$$

subject to

$$\|LA - I_m\|^2 = \min\ \text{("minimum bias")},\qquad \mathrm{tr}\,D\{\hat\xi\} = \mathrm{tr}\,L\Sigma_y L' = \min\ \text{("best")},$$

and

$$\hat\sigma^2 = \frac{1}{n-\mathrm{rk}\,A}\,(y-A\hat\xi)'\Sigma_y^{-1}(y-A\hat\xi)$$

subject to

$$E\{\hat\sigma^2\} = \sigma^2,\qquad D\{\hat\sigma^2\} = \frac{2\sigma^4}{n-\mathrm{rk}\,A},$$

identify $\hat\xi$ BLIMBE of $\xi$ and $\hat\sigma^2$ BIQUUE of $\sigma^2$.

Part A

The cumulative pdf of the multivariate Gauss-Laplace probability distribution of the observation vector $y = [y_1,\ldots,y_n]'$,

$$f(y\mid E\{y\},D\{y\}=\Sigma_y)\,dy_1\cdots dy_n = \frac{1}{(2\pi)^{n/2}|\Sigma_y|^{1/2}}\exp\Big(-\frac{1}{2}(y-E\{y\})'\Sigma_y^{-1}(y-E\{y\})\Big)\,dy_1\cdots dy_n,$$

is transformed by

$$\Sigma_y = W'\,\mathrm{Diag}(\sigma_1,\ldots,\sigma_n)\,W = W'\,\mathrm{Diag}(\sqrt{\sigma_1},\ldots,\sqrt{\sigma_n})\,\mathrm{Diag}(\sqrt{\sigma_1},\ldots,\sqrt{\sigma_n})\,W,$$
$$\Sigma_y^{1/2} := \mathrm{Diag}(\sqrt{\sigma_1},\ldots,\sqrt{\sigma_n})\,W,\qquad \Sigma_y = (\Sigma_y^{1/2})'\,\Sigma_y^{1/2},$$

versus
$$\Sigma_y^{-1} = W'\,\mathrm{Diag}\Big(\frac{1}{\sigma_1},\ldots,\frac{1}{\sigma_n}\Big)W = W'\,\mathrm{Diag}\Big(\frac{1}{\sqrt{\sigma_1}},\ldots,\frac{1}{\sqrt{\sigma_n}}\Big)\mathrm{Diag}\Big(\frac{1}{\sqrt{\sigma_1}},\ldots,\frac{1}{\sqrt{\sigma_n}}\Big)W,$$

$$\Sigma_y^{-1/2} := \mathrm{Diag}\Big(\frac{1}{\sqrt{\sigma_1}},\ldots,\frac{1}{\sqrt{\sigma_n}}\Big)W,\qquad \Sigma_y^{-1} = (\Sigma_y^{-1/2})'\,\Sigma_y^{-1/2},$$

subject to the orthogonality condition $WW' = I_n$,

$$\|y-E\{y\}\|^2_{\Sigma_y^{-1}} := (y-E\{y\})'\Sigma_y^{-1}(y-E\{y\}) = (y-E\{y\})'W'\,\mathrm{Diag}\Big(\frac{1}{\sqrt{\sigma_1}},\ldots,\frac{1}{\sqrt{\sigma_n}}\Big)\mathrm{Diag}\Big(\frac{1}{\sqrt{\sigma_1}},\ldots,\frac{1}{\sqrt{\sigma_n}}\Big)W(y-E\{y\}) = (y^*-E\{y^*\})'(y^*-E\{y^*\}) =: \|y^*-E\{y^*\}\|^2_{I_n},$$

subject to the star or canonical coordinates

$$y^* = \mathrm{Diag}\Big(\frac{1}{\sqrt{\sigma_1}},\ldots,\frac{1}{\sqrt{\sigma_n}}\Big)W\,y = \Sigma_y^{-1/2}\,y,$$

$$f(y^*\mid E\{y^*\},D\{y^*\}=I_n)\,dy_1^*\cdots dy_n^* = \frac{1}{(2\pi)^{n/2}}\exp\Big(-\frac{1}{2}(y^*-E\{y^*\})'(y^*-E\{y^*\})\Big)\,dy_1^*\cdots dy_n^* = \frac{1}{(2\pi)^{n/2}}\exp\Big(-\frac{1}{2}\|y^*-E\{y^*\}\|^2\Big)\,dy_1^*\cdots dy_n^*,$$

into the canonical Gauss-Laplace pdf.

Part B

The marginal pdf of $\hat\xi$ BLUUE of $\xi$ is represented by (1st version)

$$dF_1 = f_1(\hat\xi)\,d\hat\xi_1\cdots d\hat\xi_m.$$
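To make the two marginal results just derived concrete, the following minimal Monte Carlo sketch in Python checks numerically that $\hat\xi$ scatters around $\xi$ with dispersion $\sigma^2(A'V^{-1}A)^{-1}$ and that Helmert's random variable $x = (n-m)\hat\sigma^2/\sigma^2$ has mean $p$ and variance $2p$, as the $\chi^2_p$ pdf requires. The design matrix, cofactor matrix and parameter values are hypothetical, and the sketch assumes a full-rank design (no datum defect), so it is an illustration rather than the book's derivation.

```python
import numpy as np

# Minimal Monte Carlo sketch (hypothetical numbers): special linear
# Gauss-Markov model y ~ N(A xi, sigma^2 V) with full-rank A.
rng = np.random.default_rng(42)
n, m, sigma = 10, 3, 0.7
A = rng.normal(size=(n, m))
xi = np.array([1.0, -2.0, 0.5])
V = np.diag(rng.uniform(0.5, 2.0, size=n))     # known positive definite cofactor matrix
W = np.linalg.inv(V)                           # weight matrix V^{-1}
N = A.T @ W @ A                                # "normal equation" matrix A' V^{-1} A
L = np.linalg.solve(N, A.T @ W)                # BLUUE operator

runs, xis, xs = 5000, [], []
for _ in range(runs):
    y = A @ xi + rng.multivariate_normal(np.zeros(n), sigma**2 * V)
    xi_hat = L @ y                             # xi-hat, BLUUE of xi
    e = y - A @ xi_hat
    sigma2_hat = e @ W @ e / (n - m)           # sigma^2-hat, BIQUUE of sigma^2
    xis.append(xi_hat)
    xs.append((n - m) * sigma2_hat / sigma**2) # Helmert's random variable x

xis, xs, p = np.array(xis), np.array(xs), n - m
print("E{xi_hat} ~", xis.mean(axis=0), "vs", xi)
print("D{xi_hat} ~", np.cov(xis.T).round(3))
print("sigma^2 (A'V^-1 A)^-1 =", (sigma**2 * np.linalg.inv(N)).round(3))
print("mean(x), var(x) ~", xs.mean().round(2), xs.var().round(2), "vs", p, 2 * p)
```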
Appendix E: Statistical Notions

Definitions and lemmas relating to statistics are given, neglecting their proofs. First, we review the statistical moments of a probability distribution of random vectors and list the Gauss-Laplace normal distribution and its moments. We slightly generalize the Gauss-Laplace normal distribution by introducing the notion of a quasi-normal distribution. At the end of Chapter E1 we review two lemmas about the Gauss-Laplace 3σ rule, namely the Gauss-Laplace inequality and the Vysochanskii–Petunin inequality. Chapter E2 reviews the special linear error propagation as well as the general nonlinear error propagation, namely based on y = g(x), introducing the moments of second, third and fourth order. The special role of the Jacobi matrix as well as of the Hesse matrix is clarified. Chapter E3 reviews useful identities like E{yy′ ⊗ y} and E{yy′ ⊗ yy′} as well as Ψ := E{(y − E{y})(y − E{y})′ ⊗ (y − E{y})}, relating to the matrix of obliquity, and Γ, built from E{(y − E{y})(y − E{y})′ ⊗ (y − E{y})(y − E{y})′}, relating to the matrix of kurtosis. The various notions of identifiability and unbiased estimability are reviewed in Chapter E4, both for the moments of first order and for the central moments of second order.
E1: Moments of a probability distribution, the Gauss-Laplace normal distribution and the quasi-normal distribution

First, we define the moments of a probability distribution of a vector-valued random function and explain the notion of a Gauss-Laplace normal distribution and its moments. Especially we define the terminology of a quasi-Gauss-Laplace normal distribution.

Definition E1 (statistical moments of a probability distribution):

(1) The expectation or first moment of a continuous random vector $[X_i]$ for all $i = 1,\ldots,n$ with probability density $f(x_1,\ldots,x_n)$ is defined as the mean vector $\mu := [\mu_1,\ldots,\mu_n]'$ with components $E\{X_i\} = \mu_i$,

$$\mu_i := E\{X_i\} = \int_{-\infty}^{+\infty}\cdots\int_{-\infty}^{+\infty} x_i\,f(x_1,\ldots,x_n)\,dx_1\cdots dx_n. \tag{E1}$$

The first moment related to the random vector $[e_i] := [X_i - E\{X_i\}]$ is called the first central moment,

$$\pi_i := E\{X_i - \mu_i\} = E\{X_i\} - \mu_i = 0. \tag{E2}$$

(2) The second moment of a continuous random vector $[X_i]$ for all $i = 1,\ldots,n$ with probability density $f(x_1,\ldots,x_n)$ is the mean matrix

$$\mu_{ij} := E\{X_i X_j\} = \int_{-\infty}^{+\infty}\cdots\int_{-\infty}^{+\infty} x_i x_j\,f(x_1,\ldots,x_n)\,dx_1\cdots dx_n. \tag{E3}$$
The second moment related to the random vector $[e_i] := [X_i - E\{X_i\}]$ is called the variance-covariance matrix or dispersion matrix $[\sigma_{ij}]$, especially the variance or dispersion $\sigma_i^2$ for $i=j$ or the covariance for $i\neq j$:

$$\pi_{ii} = \sigma_i^2 = V\{X_i\} = D\{X_i\} := E\{(X_i-\mu_i)^2\}, \tag{E4}$$
$$\pi_{ij} = \sigma_{ij} = C\{X_i,X_j\} := E\{(X_i-\mu_i)(X_j-\mu_j)\}, \tag{E5}$$
$$D\{x\} = [\sigma_{ij}] = [C\{X_i,X_j\}] = E\{(x-E\{x\})(x-E\{x\})'\}. \tag{E6}$$

$x := [X_1,\ldots,X_n]'$ is the collection of the $n\times 1$ random vector. The random variables $X_i, X_j$ for $i\neq j$ are called uncorrelated with respect to the central moment of second order if $\sigma_{ij} = 0$ for $i\neq j$.

(3) The third central moment with respect to the random vector $[e_i] := [X_i - E\{X_i\}]$, defined by

$$\pi_{ijk} := E\{e_i e_j e_k\} = E\{(X_i-\mu_i)(X_j-\mu_j)(X_k-\mu_k)\}, \tag{E7}$$

contains for $i=j=k$ the vector of obliquity with the components

$$\psi_i := E\{e_i^3\}. \tag{E8}$$

Uncorrelatedness with respect to the central moments up to third order is defined by

$$E\{e_{i_1}^{n_1} e_{i_2}^{n_2} e_{i_3}^{n_3}\} = \prod_{j=1}^{3} E\{e_{i_j}^{n_j}\}\qquad\text{for } 1\le i_1\neq i_2\neq i_3\le n \text{ and } 0\le n_1+n_2+n_3\le 3. \tag{E9}$$

(4) The fourth central moment relative to the random vector $[e_i] := [X_i - E\{X_i\}]$,

$$\pi_{ijk\ell} := E\{e_i e_j e_k e_\ell\} = E\{(X_i-\mu_i)(X_j-\mu_j)(X_k-\mu_k)(X_\ell-\mu_\ell)\}, \tag{E10}$$

leads for $i=j=k=\ell$ to the vector of kurtosis with the components

$$\gamma_i := E\{e_i^4\} - 3\{\sigma_i^2\}^2. \tag{E11}$$

Uncorrelatedness with respect to the central moments up to fourth order is defined by

$$E\{e_{i_1}^{n_1} e_{i_2}^{n_2} e_{i_3}^{n_3} e_{i_4}^{n_4}\} = \prod_{j=1}^{4} E\{e_{i_j}^{n_j}\}\qquad\text{for } 1\le i_1\neq i_2\neq i_3\neq i_4\le n \text{ and } 0\le n_1+n_2+n_3+n_4\le 4. \tag{E12}$$
(5) The central moments of n-th order relative to the random vector $[e_i] := [X_i - E\{X_i\}]$ are defined by

$$\pi_{i_1\ldots i_n} := E\{e_{i_1}\cdots e_{i_n}\} = E\{(X_{i_1}-\mu_{i_1})\cdots(X_{i_n}-\mu_{i_n})\}. \tag{E13}$$

A special distribution is the Gauss-Laplace normal distribution of random vectors in $\mathbb{R}^n$. Note that alternative distributions on manifolds exist in large numbers, for instance the von Mises distribution on $S^2$ or the Fisher distribution on $S^3$ of Chapter 7.

Definition E2 (Gauss-Laplace normal distribution): An $n\times 1$ random vector $x := [X_1,\ldots,X_n]'$ is Gauss-Laplace normally distributed if its probability density $f(x_1,\ldots,x_n)$ has the representation

$$f(x) = (2\pi)^{-n/2}|\Sigma|^{-1/2}\exp\Big[-\frac{1}{2}(x-E\{x\})'\Sigma^{-1}(x-E\{x\})\Big]. \tag{E14}$$

Symbolically we can write

$$x \sim N(\mu,\Sigma) \tag{E15}$$

with the moment of first order or mean vector $\mu := E\{x\}$ and the central moment of second order or variance-covariance matrix $\Sigma := E\{(x-E\{x\})(x-E\{x\})'\}$. The moments of the Gauss-Laplace normal distribution are given next.

Lemma E3 (moments of the Gauss-Laplace normal distribution): Let the $n\times 1$ random vector $x := [X_1,\ldots,X_n]'$ follow a Gauss-Laplace normal distribution. Then all central moments of odd order vanish and the central moments of even order are product sums of the central moments of second order exclusively:

$$E\{e_{i_1}\cdots e_{i_n}\} = 0,\qquad n = 2m+1,\ m = 1,\ldots,\infty, \tag{E16}$$
$$E\{e_{i_1}\cdots e_{i_n}\} = \mathrm{fct}(\sigma_{i_1}^2,\sigma_{i_1 i_2},\ldots,\sigma_{i_n}^2),\qquad n = 2m,\ m = 1,\ldots,\infty, \tag{E17}$$
$$\pi_{ij} = \sigma_{ij},\qquad \pi_{ii} = \sigma_i^2, \tag{E18}$$
$$\pi_{ijk} = 0, \tag{E19}$$
$$\pi_{ijk\ell} = \pi_{ij}\pi_{k\ell} + \pi_{ik}\pi_{j\ell} + \pi_{i\ell}\pi_{jk}, \tag{E20}$$
$$\pi_{iijj} = \pi_{ii}\pi_{jj} + 2\pi_{ij}\pi_{ij} = \sigma_i^2\sigma_j^2 + 2\sigma_{ij}^2,\qquad \pi_{iiii} = 3(\sigma_i^2)^2, \tag{E21}$$
$$\pi_{ijk\ell m} = 0, \tag{E22}$$
$$\pi_{i_1 i_2\ldots i_{2m}} = \pi_{i_1 i_2\ldots i_{2m-2}}\,\pi_{i_{2m-1}i_{2m}} + \pi_{i_1 i_2\ldots i_{2m-3}i_{2m-1}}\,\pi_{i_{2m-2}i_{2m}} + \cdots + \pi_{i_2 i_3\ldots i_{2m-1}}\,\pi_{i_1 i_{2m}}. \tag{E23}$$
The vector of obliquity $\psi := [\psi_1,\ldots,\psi_n]'$ and the vector of kurtosis $\gamma := [\gamma_1,\ldots,\gamma_n]'$ vanish.

A weaker assumption than the Gauss-Laplace normal distribution is the assumption that the central moments up to order four are of the form (E18)-(E21). Thus we allow for a larger class of distributions which have a structure similar to (E18)-(E21).

Definition E4 (quasi-Gauss-Laplace normal distribution): A random vector $x$ is quasi-Gauss-Laplace normally distributed if it has a continuous symmetric probability distribution $f(x)$ which allows a representation of its central moments up to order four of type (E18)-(E21).

Of special importance is the computation of error bounds, for instance for the Gauss-Laplace normal distribution. As an example we have the so-called 3σ rule, which states that the probability of the random variable $X$ falling away from its mean by more than 3 standard deviations is at most 5%,

$$P\{|X-\mu|\ge 3\sigma\} \le \frac{4}{81} < 0.05. \tag{E24}$$

Another example is the Gauss-Laplace inequality, which bounds the probability of the deviation from the mode $\nu$.

Lemma E5 (Gauss inequality): With the expected squared deviation from the mode $\nu$, $\tau^2 := E\{(X-\nu)^2\}$, there holds

$$P\{|X-\nu|\ge r\} \le \frac{4\tau^2}{9r^2}\qquad\text{for all } r\ge\sqrt{4/3}\,\tau, \tag{E25}$$
$$P\{|X-\nu|\ge r\} \le 1-\frac{r}{\sqrt{3}\,\tau}\qquad\text{for all } r\le\sqrt{4/3}\,\tau. \tag{E26}$$

Alternatively, we take advantage of the Vysochanskii–Petunin inequality.

Lemma E6 (Vysochanskii–Petunin inequality): With the expected squared deviation from an arbitrary point $\alpha\in\mathbb{R}$, $\rho^2 := E\{(X-\alpha)^2\}$, a unimodal random variable $X$ obeys

$$P\{|X-\alpha|\ge r\} \le \frac{4\rho^2}{9r^2}\qquad\text{for all } r\ge\sqrt{8/3}\,\rho, \tag{E27}$$
$$P\{|X-\alpha|\ge r\} \le \frac{4\rho^2}{3r^2}-\frac{1}{3}\qquad\text{for all } r\le\sqrt{8/3}\,\rho. \tag{E28}$$
References for the two inequalities are C.F. Gauss (1823), F. Pukelsheim (1994), J.R. Savage (1961), B. Ulin (1953), and D.F. Vysochanskii and Y.E. Petunin (1980, 1983).
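The bounds (E24)-(E28) are easy to evaluate numerically. The following small Python sketch, under the assumption of a standard Gauss-Laplace distributed variable with mode equal to mean, compares the classical Chebyshev bound (not discussed above, added only as a baseline) with the Gauss bound (E25)-(E26), the Vysochanskii–Petunin bound (E27)-(E28) and the exact two-sided normal tail. At $r = 3\sigma$ the Vysochanskii–Petunin bound reproduces the 3σ rule (E24), namely 4/81.

```python
import math

# Illustrative comparison of the tail bounds at r = k*sigma around the mean/mode.
def chebyshev(k):            # baseline: P{|X - mu| >= k sigma} <= 1/k^2
    return 1.0 / k**2

def gauss_bound(k):          # (E25)/(E26) with tau = sigma (mode assumed equal to mean)
    return 4.0 / (9.0 * k**2) if k >= math.sqrt(4.0 / 3.0) else 1.0 - k / math.sqrt(3.0)

def vysochanskii_petunin(k): # (E27)/(E28) with rho = sigma
    return 4.0 / (9.0 * k**2) if k >= math.sqrt(8.0 / 3.0) else 4.0 / (3.0 * k**2) - 1.0 / 3.0

def normal_tail(k):          # exact two-sided Gauss-Laplace tail via the error function
    return math.erfc(k / math.sqrt(2.0))

for k in (1.0, 2.0, 3.0):
    print(k, round(chebyshev(k), 4), round(gauss_bound(k), 4),
          round(vysochanskii_petunin(k), 4), round(normal_tail(k), 5))
# At k = 3 the Vysochanskii-Petunin bound gives 4/81 ~ 0.0494, the 3-sigma rule (E24).
```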
E2: Error propagation

At the beginning we note some properties of the operators expectation E and dispersion D. Afterwards we review the special and the general, also nonlinear, error propagation.

Lemma E7 (expectation operator E): E is defined as a linear operator in the space of random variables in $\mathbb{R}^n$, also called the expectation operator. For arbitrary constants $\alpha,\beta,\delta\in\mathbb{R}$ there holds the identity

$$E\{\alpha X_i + \beta X_j + \delta\} = \alpha E\{X_i\} + \beta E\{X_j\} + \delta. \tag{E29}$$

Let A and B be two $m\times n$ and $m\times\ell$ matrices, $\delta$ an $m\times 1$ vector of constants, and $x := [X_1,\ldots,X_n]'$ and $y := [Y_1,\ldots,Y_\ell]'$ two $n\times 1$ and $\ell\times 1$ random vectors; then

$$E\{Ax + By + \delta\} = AE\{x\} + BE\{y\} + \delta \tag{E30}$$

holds. The expectation operator E is not multiplicative, that is

$$E\{X_i X_j\} = E\{X_i\}E\{X_j\} + C\{X_i,X_j\} \neq E\{X_i\}E\{X_j\}, \tag{E31}$$

if $X_i$ and $X_j$ are correlated.

Lemma E8 (special error propagation): Let $y$ be an $n\times 1$ random vector which depends linearly on the $m\times 1$ random vector $x$ by means of a constant $n\times m$ matrix A and a constant $n\times 1$ vector $\delta$ of the form $y := Ax + \delta$. Then the "error propagation law"

$$D\{y\} = D\{Ax+\delta\} = A\,D\{x\}\,A' \tag{E32}$$

holds. The dispersion operator D is not additive, and in consequence a nonlinear operator, that is

$$D\{X_i + X_j\} = D\{X_i\} + D\{X_j\} + 2C\{X_i,X_j\} \neq D\{X_i\} + D\{X_j\}, \tag{E33}$$

if $X_i$ and $X_j$ are correlated. The "special error propagation law" holds for a linear transformation $x \mapsto y = Ax + \delta$. The "general nonlinear error propagation" will be presented in Corollary E9 and Lemma E10. The detailed proofs are taken from Chapter 8-2, Examples.
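As an illustration of the special error propagation law (E32), the following minimal Python sketch compares $A\,D\{x\}\,A'$ with the empirical dispersion of simulated $y = Ax + \delta$. The matrices A, δ and D{x} are hypothetical example values, not taken from the text.

```python
import numpy as np

# Minimal sketch of the special error propagation law (E32): D{y} = A D{x} A'.
rng = np.random.default_rng(1)
A = np.array([[1.0, 2.0, 0.0],
              [0.5, -1.0, 3.0]])           # constant n x m matrix, n = 2, m = 3
delta = np.array([0.1, -0.2])              # constant shift; irrelevant for the dispersion
Sigma_x = np.array([[1.0, 0.3, 0.0],
                    [0.3, 2.0, -0.4],
                    [0.0, -0.4, 0.5]])     # D{x}

Sigma_y = A @ Sigma_x @ A.T                # error propagation law D{y} = A D{x} A'

# Monte Carlo cross-check of the law
x = rng.multivariate_normal(np.zeros(3), Sigma_x, size=200_000)
y = x @ A.T + delta
print(np.round(Sigma_y, 3))
print(np.round(np.cov(y.T), 3))
```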
Corollary E9 ("nonlinear error propagation"): Let $y = g(x)$ be a scalar-valued function between one random variable $x$ and one random variable $y$. $g(x)$ is assumed to allow a Taylor expansion around the fixed approximation point $\xi_0$:

$$g(x) = g(\xi_0) + g'(\xi_0)(x-\xi_0) + \frac{1}{2}g''(\xi_0)(x-\xi_0)^2 + O(3) = \gamma_0 + \gamma_1(x-\xi_0) + \gamma_2(x-\xi_0)^2 + O(3). \tag{E34}$$

Then the expectation and dispersion identities hold:

$$E\{y\} = g(\mu_x) + \frac{1}{2}g''(\xi_0)\sigma_x^2 + O(3) = g(\mu_x) + \gamma_2\sigma_x^2 + O(3), \tag{E35}$$

$$D\{y\} = \gamma_1^2\sigma_x^2 + 4\sigma_x^2\big[\gamma_1\gamma_2(\mu_x-\xi_0) + \gamma_2^2(\mu_x-\xi_0)^2\big] - \gamma_2^2\sigma_x^4 + E\{(x-\mu_x)^3\}\big[2\gamma_1\gamma_2 + 4\gamma_2^2(\mu_x-\xi_0)\big] + E\{(x-\mu_x)^4\}\gamma_2^2 + O(3). \tag{E36}$$

For the special case of a fixed approximation point $\xi_0$ chosen to coincide with the mean value, $\mu_x = \xi_0$, and $x$ being quasi-Gauss-Laplace normally distributed, we arrive at the identities

$$E\{y\} = g(\mu_x) + \frac{1}{2}g''(\mu_x)\sigma_x^2 + O(4), \tag{E37}$$
$$D\{y\} = [g'(\mu_x)]^2\sigma_x^2 + \frac{1}{2}[g''(\mu_x)]^2\sigma_x^4 + O(3). \tag{E38}$$

The representations (E37) and (E38) characterize the nonlinear error propagation, which in general depends on the central moments of order two and higher, especially on the obliquity and the kurtosis.

Lemma E10 ("nonlinear error propagation"): Let $y = g(x)$ be a vector-valued function between the $m\times 1$ random vector $x$ and the $n\times 1$ random vector $y$. $g(x)$ is assumed to allow a Taylor expansion around the $m\times 1$ fixed approximation vector $\xi_0$:

$$g(x) = g(\xi_0) + g'(\xi_0)(x-\xi_0) + \frac{1}{2}g''(\xi_0)\big[(x-\xi_0)\otimes(x-\xi_0)\big] + O(3) = \gamma_0 + J(x-\xi_0) + \frac{1}{2}H\big[(x-\xi_0)\otimes(x-\xi_0)\big] + O(3). \tag{E39}$$
With the $n\times m$ Jacobi matrix and the $n\times m^2$ Hesse matrix

$$J := [\partial_j g_i(\xi_0)],\qquad H := [\mathrm{vec}\,H_1,\ldots,\mathrm{vec}\,H_n]',\quad H_i := [\partial_j\partial_k g_i(\xi_0)]\quad(i=1,\ldots,n;\ j,k=1,\ldots,m), \tag{E40}$$

there hold the following expectation and dispersion identities ("nonlinear error propagation"):

$$E\{y\} = \mu_y = g(\mu_x) + \frac{1}{2}H\,\mathrm{vec}\,\Sigma + O(3), \tag{E41}$$

$$\begin{aligned}
D\{y\} = \Sigma_y = {}& J\Sigma J' + \frac{1}{2}J\big[\Sigma\otimes(\mu_x-\xi_0)' + (\mu_x-\xi_0)'\otimes\Sigma\big]H' + \frac{1}{2}H\big[\Sigma\otimes(\mu_x-\xi_0) + (\mu_x-\xi_0)\otimes\Sigma\big]J'\\
&+ \frac{1}{4}H\big[\Sigma\otimes(\mu_x-\xi_0)(\mu_x-\xi_0)' + (\mu_x-\xi_0)(\mu_x-\xi_0)'\otimes\Sigma + (\mu_x-\xi_0)\otimes\Sigma\otimes(\mu_x-\xi_0)' + (\mu_x-\xi_0)'\otimes\Sigma\otimes(\mu_x-\xi_0)\big]H'\\
&+ \frac{1}{2}J\,E\{(x-\mu_x)(x-\mu_x)'\otimes(x-\mu_x)'\}\,H' + \frac{1}{2}H\,E\{(x-\mu_x)\otimes(x-\mu_x)(x-\mu_x)'\}\,J'\\
&+ \frac{1}{4}H\big[E\{(x-\mu_x)(x-\mu_x)'\otimes(x-\mu_x)\}(\mu_x-\xi_0)' + (\mu_x-\xi_0)E\{(x-\mu_x)'\otimes(x-\mu_x)(x-\mu_x)'\}\\
&\qquad\;\; + (I\otimes(\mu_x-\xi_0))E\{(x-\mu_x)(x-\mu_x)'\otimes(x-\mu_x)'\} + E\{(x-\mu_x)\otimes(x-\mu_x)(x-\mu_x)'\}((\mu_x-\xi_0)'\otimes I)\big]H'\\
&+ \frac{1}{4}H\big[E\{(x-\mu_x)(x-\mu_x)'\otimes(x-\mu_x)(x-\mu_x)'\} - \mathrm{vec}\,\Sigma(\mathrm{vec}\,\Sigma)'\big]H' + O(3).
\end{aligned} \tag{E42}$$

In the special case that the fixed approximation vector $\xi_0$ coincides with the mean vector, $\mu_x = \xi_0$, and the random vector $x$ is quasi-Gauss-Laplace normally distributed, the following identities hold:

$$E\{y\} = \mu_y = g(\mu_x) + \frac{1}{2}H\,\mathrm{vec}\,\Sigma + O(4), \tag{E43}$$

$$D\{y\} = \Sigma_y = J\Sigma J' - \frac{1}{4}H\,\mathrm{vec}\,\Sigma(\mathrm{vec}\,\Sigma)'H' + \frac{1}{4}H\,E\{(x-\mu_x)(x-\mu_x)'\otimes(x-\mu_x)(x-\mu_x)'\}\,H' + O(3) = J\Sigma J' + \frac{1}{4}H\big[\Sigma\otimes\Sigma + E\{(x-\mu_x)'\otimes\Sigma\otimes(x-\mu_x)\}\big]H' + O(3). \tag{E44}$$
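A minimal numerical sketch of the scalar special case (E37)-(E38): assuming a quasi-normally (here exactly normally) distributed x and the hypothetical choice g(x) = exp(x), the second-order approximations of E{y} and D{y} are compared with Monte Carlo values. For this choice the approximation error stems from the neglected higher-order terms only.

```python
import numpy as np

# Sketch of the scalar second-order propagation formulas (E37)-(E38),
# using the hypothetical example g(x) = exp(x) and a normal x.
rng = np.random.default_rng(7)
mu_x, sigma_x = 0.3, 0.2
g, g1, g2 = np.exp, np.exp, np.exp           # g, g', g'' coincide for the exponential

E_y_approx = g(mu_x) + 0.5 * g2(mu_x) * sigma_x**2                      # (E37)
D_y_approx = g1(mu_x)**2 * sigma_x**2 + 0.5 * g2(mu_x)**2 * sigma_x**4  # (E38)

x = rng.normal(mu_x, sigma_x, size=1_000_000)
y = g(x)
print("E{y}:", round(E_y_approx, 5), "Monte Carlo:", round(y.mean(), 5))
print("D{y}:", round(D_y_approx, 6), "Monte Carlo:", round(y.var(), 6))
```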
E3: Useful identities

Notable identities about higher-order moments are the following.

Lemma E11 (identities: higher-order moments):

(i) Kronecker products #1

$$E\{yy'\} = E\{(y-E\{y\})(y-E\{y\})'\} + E\{y\}E\{y\}', \tag{E45}$$

$$E\{yy'\otimes y\} = E\{(y-E\{y\})(y-E\{y\})'\otimes(y-E\{y\})\} + E\{yy'\}\otimes E\{y\} + E\{y\otimes E\{y\}'\otimes y\} + E\{y\}\otimes E\{yy'\} - 2\,E\{y\}\otimes E\{y\}'\otimes E\{y\}, \tag{E46}$$

$$E\{yy'\otimes yy'\} = \Gamma + \Psi\otimes E\{y\}' + E\{y\}'\otimes\Psi + \Psi'\otimes E\{y\} + E\{y\}\otimes\Psi' + E\{yy'\}\otimes E\{yy'\} + E\{y'\otimes E\{yy'\}\otimes y\} + E\{y\otimes y\}E\{y\otimes y\}' - 2\,E\{y\}E\{y\}'\otimes E\{y\}E\{y\}'. \tag{E47}$$

(ii) Kronecker products #2

$$\Psi := E\{(y-E\{y\})(y-E\{y\})'\otimes(y-E\{y\})\}, \tag{E48}$$

$$\Gamma := E\{(y-E\{y\})(y-E\{y\})'\otimes(y-E\{y\})(y-E\{y\})'\} - E\{(y-E\{y\})(y-E\{y\})'\}\otimes E\{(y-E\{y\})(y-E\{y\})'\} - E\{(y-E\{y\})'\otimes E\{(y-E\{y\})(y-E\{y\})'\}\otimes(y-E\{y\})\} - E\{(y-E\{y\})\otimes(y-E\{y\})\}E\{(y-E\{y\})'\otimes(y-E\{y\})'\}. \tag{E49}$$

The $n^2\times n$ matrix $\Psi$ contains the components of obliquity, the $n^2\times n^2$ matrix $\Gamma$ the components of kurtosis relative to the $n\times 1$ central random vector $y-E\{y\}$.

(iii) Covariances between linear and quadratic forms

$$C\{F_1 y+\delta_1,\ F_2 y+\delta_2\} := E\{(F_1 y+\delta_1-E\{F_1 y+\delta_1\})(F_2 y+\delta_2-E\{F_2 y+\delta_2\})'\} = F_1\Sigma F_2' \tag{E50}$$

(linear error propagation),

$$C\{Fy+\delta,\ y'Hy\} := E\{(Fy+\delta-E\{Fy+\delta\})(y'Hy-E\{y'Hy\})\} = F\big[E\{yy'\otimes y'\} - E\{y\}E\{y'\otimes y'\}\big]\,\mathrm{vec}\,H = \frac{1}{2}F\Psi'\,\mathrm{vec}(H+H') + F\Sigma(H+H')E\{y\}, \tag{E51}$$
$$C\{y'Gy,\ y'Hy\} := E\{(y'Gy-E\{y'Gy\})(y'Hy-E\{y'Hy\})\} = (\mathrm{vec}\,G)'\big[E\{yy'\otimes yy'\} - E\{y\otimes y\}E\{y'\otimes y'\}\big]\,\mathrm{vec}\,H = \frac{1}{4}[\mathrm{vec}(G+G')]'\,\Gamma\,\mathrm{vec}(H+H') + \frac{1}{2}[\mathrm{vec}(G+G')]'\big(\Psi'\otimes E\{y\} + \Psi\otimes E\{y\}'\big)\,\mathrm{vec}(H+H') + \frac{1}{2}\mathrm{tr}\big[(G+G')\Sigma(H+H')\Sigma\big] + E\{y\}'(G+G')\Sigma(H+H')E\{y\}. \tag{E52}$$

(iv) Quasi-Gauss-Laplace normally distributed data

$$C\{F_1 y+\delta_1,\ F_2 y+\delta_2\} = F_1\Sigma F_2' \tag{E53}$$

(independent of any distribution),

$$C\{Fy+\delta,\ y'Hy\} = F\Sigma(H+H')E\{y\}, \tag{E54}$$

$$C\{y'Gy,\ y'Hy\} = \frac{1}{2}\mathrm{tr}\big[(G+G')\Sigma(H+H')\Sigma\big] + E\{y\}'(G+G')\Sigma(H+H')E\{y\}. \tag{E55}$$
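The quasi-normal identities (E54) and (E55) can be checked by simulation. The following Python sketch uses hypothetical small matrices F, G, H, a hypothetical mean and covariance, and a Gauss-Laplace distributed y; the Monte Carlo covariances should match the closed-form expressions up to sampling noise.

```python
import numpy as np

# Simulation check of (E54) and (E55) for Gauss-Laplace distributed y (illustrative numbers).
rng = np.random.default_rng(3)
mu = np.array([1.0, -0.5, 2.0])
Sigma = np.array([[1.0, 0.2, 0.0],
                  [0.2, 0.8, -0.3],
                  [0.0, -0.3, 1.5]])
F = np.array([[1.0, 0.0, 2.0]])
G = np.diag([1.0, 2.0, 3.0])
H = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 0.5],
              [0.0, 0.5, 0.0]])

y = rng.multivariate_normal(mu, Sigma, size=2_000_000)
lin = y @ F.T                                  # F y
qG = np.einsum('ij,jk,ik->i', y, G, y)         # y' G y
qH = np.einsum('ij,jk,ik->i', y, H, y)         # y' H y

# (E54): C{F y + delta, y' H y} = F Sigma (H + H') E{y}
print((F @ Sigma @ (H + H.T) @ mu).round(3), np.cov(lin[:, 0], qH)[0, 1].round(3))
# (E55): C{y'G y, y'H y} = 1/2 tr[(G+G') Sigma (H+H') Sigma] + E{y}'(G+G') Sigma (H+H') E{y}
rhs = 0.5 * np.trace((G + G.T) @ Sigma @ (H + H.T) @ Sigma) + mu @ (G + G.T) @ Sigma @ (H + H.T) @ mu
print(round(rhs, 3), np.cov(qG, qH)[0, 1].round(3))
```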
E4: The notions of identifiability and unbiasedness

The notions of identifiability and unbiasedness are introduced. According to the classical concept we call an arbitrary vector $\varphi := F\xi$ in a linear model "identifiable" if, for a given probability distribution of the observed random vector $y$ with a given dispersion structure for the likelihood $L(y;\varphi)$, there holds the implication

$$L(y;\varphi_1) = L(y;\varphi_2)\ \Rightarrow\ \varphi_1 = \varphi_2. \tag{E56}$$

If the likelihood function $L(y;\varphi)$ belongs to the class of exponential distributions, for instance a Gauss-Laplace normal distribution, the likelihood function is identifiable if the related "information matrix"

$$I(\varphi) := -E\Big\{\frac{\partial^2\ln L(y;\varphi)}{\partial\varphi_i\,\partial\varphi_j}\Big\} = E\Big\{\Big(\frac{\partial\ln L}{\partial\varphi}\Big)\Big(\frac{\partial\ln L}{\partial\varphi}\Big)'\Big\} = \Big[\frac{\partial\varphi}{\partial\xi}\Big]\,I(\xi)\,\Big[\frac{\partial\varphi}{\partial\xi}\Big]' \tag{E57}$$

is regular. It has to be emphasized that, in the case of a Gauss-Laplace normal distribution, the matrix $J(\xi)$ and the corresponding "normal equation matrix" are identical:

$$y \sim N(A\xi,\Sigma)\ \Rightarrow\ J(\xi) = A'\Sigma^{-1}A. \tag{E58}$$

In order to rely on the more general notion of a probability distribution, for instance characterized by the moments of order one to four, we introduce the notion of "identifiability".
Definition E12 (identifiability): An arbitrary vector $\varphi := F\xi$ with respect to a linear model is identifiable if the implication

$$A\xi_1 = A\xi_2\ \Rightarrow\ \varphi_1 = F\xi_1 = F\xi_2 = \varphi_2 \tag{E59}$$

holds for all $\xi_i\in\mathbb{R}^m$ $(i = 1,2)$.

Corollary E13 informs us about the equivalent formulation, namely that $\varphi := F\xi$ is identifiable if $R(F')\subset R(A')$.

Corollary E13 ($R(F')\subset R(A')$): $\varphi := F\xi$ is identifiable in a linear model if and only if $R(F')\subset R(A')$.

The concept of "identifiability" is related to the concept of "estimability" or "unbiasedness".

Definition E14 (estimability): An arbitrary vector $\varphi := F\xi$ in a linear model is called "estimable" if there exists a function $L(\cdot)$ such that

$$E\{L(y)\} = \varphi \tag{E60}$$

holds. $\varphi = F\xi$ with respect to a linear model is "linearly estimable" if there exists a matrix L such that

$$E\{Ly\} = \varphi \tag{E61}$$

holds. In all these cases $L(y)$ or $Ly$ is an unbiasedly estimable quantity of $\varphi$.

A bridge between "identifiability" and "unbiased estimation" is built by the result of Theorem E15.

Theorem E15 ("identifiability" versus "unbiased estimability"): Let $\varphi = F\xi$ be an arbitrary vector built in a linear model. The following statements are equivalent:
(i) $\varphi$ is identifiable,
(ii) $\varphi$ is estimable,
(iii) $\varphi$ is linearly estimable,
(iv) $\varphi$ is invariant with respect to all transformations which do not influence the observed random vector $y$ ("S-transformation"),
(v) $R(F')\subset R(A')$.
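Corollary E13 and Theorem E15(v) reduce the identifiability question to the range inclusion $R(F')\subset R(A')$, which can be tested numerically by a rank comparison: the inclusion holds exactly when appending the rows of F to A does not increase the rank. The following Python sketch, with a hypothetical rank-deficient design matrix, illustrates this test; the matrices are invented for the example only.

```python
import numpy as np

# Sketch of the identifiability test of Corollary E13: phi = F xi is identifiable
# (equivalently, linearly estimable) iff rank([A', F']) = rank(A').
def identifiable(A, F, tol=1e-10):
    rA = np.linalg.matrix_rank(A.T, tol=tol)
    rAF = np.linalg.matrix_rank(np.hstack([A.T, F.T]), tol=tol)
    return rAF == rA

# rank-deficient design (datum defect): column 3 = column 1 + column 2
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 2.0],
              [2.0, 1.0, 3.0]])
F1 = np.array([[1.0, 1.0, 2.0]])   # lies in the row space of A -> identifiable
F2 = np.array([[1.0, 0.0, 0.0]])   # a single parameter          -> not identifiable
print(identifiable(A, F1), identifiable(A, F2))   # True False
```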
The above statements are the reason that for the vectors $\varphi = F\xi$ only linear estimations are analyzed. Up to now we have only analyzed the notions of identifiability and estimability of the unknown parameter vector. Now we turn to the question of how to analyze the notions of "identifiability" and "estimability" of the unknown variance component within a linear model.

Definition E16 (identifiability of a variance component): The variance component $\sigma^2$ with respect to a linear model $E\{y\} = A\xi$, $D\{y\} = V\sigma^2$ is called "identifiable" if the implication

$$V\sigma_1^2 = V\sigma_2^2\ \Rightarrow\ \sigma_1^2 = \sigma_2^2 \tag{E62}$$

is fulfilled for all $\sigma_i^2\in\mathbb{R}^+$ $(i = 1,2)$.

Similarly to Corollary E13 there exists an equivalent characterization, with a special comment: Definition E16 works only if we recognize that $\mathbb{R}^+$ allows only for positive real numbers. In addition, the matrix V is not allowed to be the zero matrix in order that $\sigma^2$ becomes identifiable. Similarly to Definition E14 we obtain the notion of estimability:

Definition E17 (estimability of a variance component): The variance component $\sigma^2$ is called "estimable" with respect to the linear model $E\{y\} = A\xi$, $D\{y\} = V\sigma^2$ if a function M exists such that

$$E\{M(y)\} = \sigma^2 \tag{E63}$$

is fulfilled. $\sigma^2$ is called "quadratically estimable" within this linear model if there is a symmetric $n\times n$ matrix M such that

$$E\{(\mathrm{vec}\,M)'(y\otimes y)\} = E\{y'My\} = \sigma^2. \tag{E64}$$

In these cases we call $M(y)$ or $(\mathrm{vec}\,M)'(y\otimes y)$ an unbiased estimate of the scalar $\sigma^2$. The necessary condition for a quadratic unbiased estimation of the variance component $\sigma^2$ in a linear model is $(\mathrm{vec}\,M)'(A\otimes A) = 0$. In addition, for $y\neq 0$ the estimate of the variance component is always positive, namely if M is a positive definite matrix.
Appendix F: Bibliographic Indexes

There are various bibliographic indexes to publications in statistics. For instance, the American Statistical Association and the Institute of Mathematical Statistics publish the "Current Index to Statistics" (www.statindex.org). References are drawn from about 120 "core journals" that are fully indexed, from about 400 "noncore journals" from which articles with statistical content are selected, from proceedings and edited books, and from other sources. The Current Index to Statistics Extended Database (CIS-ED) includes coverage of about 120 "core journals", selected articles since 1974 from about 900 additional journals (cumulatively), and about 8,000 books in statistics published since 1974. The CIS Print Volume is published annually. Each year's edition indexes primarily material appearing in the statistics literature in a single year. The Print Volume consists of two main components. The Author Index lists articles by the names of authors and includes the full citation for each article. The Subject Index lists articles by the significant words appearing in the title and by additional key words descriptive of the subject matter of the article. (For instance, Volume 24 of the Print Volume contains approximately 9300 entries, 71% of which appeared in 1998. About 4800 of the entries came from 111 core journals.)

Alternatively there are the "Statistical Theory & Method Abstracts" published by the International Statistical Institute (Voorburg, Netherlands). Abstracts are given of papers on probability and statistics, as well as of important applications. Relevant papers are taken from journals specialized in these subject fields, as well as from journals that are largely devoted to other fields but regularly contain papers of interest. Other sources include collective works such as conference proceedings, Festschriften, and commemorative volumes. In addition, research reports and monographs in a series, which are not proper textbooks but have the character of extended research papers, are abstracted. For the following journals, abstracting is virtually complete: Appl. Prob., Ann. Appl. Prob., Ann. Inst. Henri Poincaré, Ann. Prob., Ann. Statist., Appl. Statist., Appl. Stoch. Models Data Anal., Austral. & New Zealand J. Statist., Bernoulli, Biometrics, Biometrika,
Biom. J., Calcutta Statist. Ass. Bull., Canad. J. Statist., Commun. Statist.-Simul. Comp., Commun. Statist.-Theor. Meth., Comp. Statist. Data Anal., Comp. Statist., Econometric Rev., Econometric Theory, Egypt. Statist. J., Environmetrics,
Extremes, Finance and Stochastics, Insurance: Math. Econom., Int. Statist. Rev., J. Amer. Statist. Ass., J. Appl. Prob., J. Appl. Statist., J. Appl. Statist. Sci., J. Biopharmaceutical Statist., J. Chinese Statist. Ass., J. Comp. Graph. Statist., J. Econometrics, J. Indian Soc. Agri. Statist., J. Indian Statist. Ass., J. Italian Statist. Soc., J. Japan Statist. Soc., J. Korean Statist. Soc., J. Multivariate Anal., J. Nonparametric Statist., J. R. Statist. Soc. B, J. Risk and Uncertainty, J. Statist. Comp. Simul., J. Statist. Planning Infer., J. Theor. Prob., J. Time Ser. Anal., Korean J. Appl. Statist., Kybernetika, Lifetime Data Anal., Markov Process. Rel. Fields, Math. Methods Statist., Metrika, Metron, Monte Carlo Methods Appl., Pakistan J. Statist., Prob. Math. Statist., Prob. Theory Relat. Fields, Prob. Engrg. Mechanics, Proc. Inst. Statist. Math., Psychometrika, Queueing Systems Theory Appl., Random Operators & Stoch. Equat., Random Structures & Algorithms, Rev. Brasil. Prob. Estatist., Rev. Financial Studies, S. Afr. Statist. J., Sankhya (A and B), Scand. J. Statist., Sequent. Anal., Statist. & Dec., Statist. Med., Statistica, Statistica Sinica, Statistics, Statist. & Computing, Statist. Methods Med. Res., Statist. Neerl., Statist. Papers, Statist. Prob. Letters, Statist. Sci., Statistician, Stoch. Anal. Appl., Stoch. Models, Stoch. Processes Appl., Stoch. & Stoch. Reports, Student, Technometrics, Test, Theory Prob. Appl.
Each issue contains an Author Index where all the abstracts are listed according to the names of all the authors, showing the abstract number and the classification number. The Classification Scheme 2000 (http://www.cbs.nl/isi/stma.htm) contains 16 subject entries.
References Abbe, E. (1906): Über die Gesetzmäßigkeit in der Verteilung der Fehler bei Beobachtungsreihen, Gesammelte Abhandlungen, vol. II, Jena 1863 (1906) Abdous, B. and R. Theodorescu (1998): Mean, median, mode IV, Statistica Neerlandica 52 (1998) 356-359 Absil, P.-A. et al. (2002) : A Grassmann-Rayleigh quotient iteration for computing invariant subspaces, SIAM Review 44 (2002) 57-73 Abramowitz, M. and J.A. Stegun (1965): Handbook of mathematical functions, Dover Publication, New York 1965 Adams, M. and V. Guillemin (1996): Measure theory and probability, 2nd edition, Birkhäuser Verlag, Boston 1996 Adatia, A. (1996): Asymptotic blues of the parameters of the logistic distribution based on doubly censored samples, J. Statist. Comput. Simul. 55 (1996) 201-211 Adelmalek, N.N. (1974): On the discrete linear L1 approximation and L1 solutions of overdetermined linear equations, J. Approximation Theory 11 (1974) 38-53 Afifi, A.A. and V. Clark (1996): Computer-aided multivariate analysis, Chapman and Hall, Boca Raton 1996 Agostinelli, C. and M. Markatou (1998): A one-step robust estimator for regression based on the weighted likelihood reweighting scheme, Stat.& Prob. Letters 37 (1998) 341350 Agrò, G. (1995): Maximum likelihood estimation for the exponential power function parameters, Comm. Statist. Simul. Comput. 24 (1995) 523-536 Aickin, M. and C. Ritenbaugh (1996): Analysis of multivariate reliability structures and the induced bias in linear model estimation, Statistics in Medicine 15 (1996) 16471661 Aitchinson, J. and I.R. Dunsmore (1975): Statistical prediction analysis, Cambridge University Press, Cambridge 1975 Aitken, A.C. (1935): On least squares and linear combinations of observations, Proc. Roy. Soc. Edinburgh 55 (1935) 42-48 Airy, G.B. (1861): On the algebraical and numerical theory of errors of observations and the combination of observations, Macmillan Publ., London 1861 Akdeniz, F., Erol, H. and F. Oeztuerk (1999): MSE comparisons of some biased estimators in linear regression model, J. Applied Statistical Science 9 (1999) 73-85 Albert, A. (1969): Conditions for positive and nonnegative definiteness in terms of pseudo inverses, SIAM J. Appl. Math. 17 (1969) 434-440 Albertella, A. and F. Sacerdote (1995): Spectral analysis of block averaged data in geopotential global model determination, J. Geodesy 70 (1995) 166-175 Alcala, J.T., Cristobal, J.A. and W. Gonzalez-Manteiga (1999): Goodness-of-fit test for linear models based on local polynomials, Statistics & Probability Letters 42 (1999) 39-46 Aldrich, J. (1999): Determinacy in the linear model: Gauss to Bose and Koopmans, International Statistical Review 67 (1999) 211-219 Alefeld, G. and J. Herzberger (1983): Introduction to interval computation. Computer science and applied mathematics, Academic Press, New York - London 1983 Ali, A.K.A., Lin, C.Y. and E.B. Burnside (1997): Detection of outliers in mixed model analysis, The Egyptian Statistical Journal 41 (1997) 182-194 Allende, S., Bouza, C. and I. Romero (1995): Fitting a linear regression model by combining least squares and least absolute value estimation, Questiio 19 (1995) 107-121
Allmer, F. (2001): Louis Krüger (1857-1923), 25 pages, Technical University of Graz, Graz 2001 Alzaid, A.A. and L. Benkherouf (1995): First-order integer-valued autoregressive process with Euler marginal distributions, J. Statist. Res. 29 (1995) 89-92 Ameri, A. (2000): Automatic recognition and 3D reconstruction of buildings through computer vision and digital photogrammetry, Dissertation, Stuttgart University, Stuttgart 2000 An, H.-Z., F.J. Hickernell and L.-X. Zhu (1997): A new class of consistent estimators for stochastic linear regressive models, J. Multivar. Anal. 63 (1997) 242-258 Anderson, P.L. and M.M. Meerscaert (1997): Periodic moving averages of random variables with regularly varying tails, Ann. Statist. 25 (1997) 771-785 Anderson, T.W. (1973): Asymptotically efficient estimation of covariance matrices with linear structure, The Annals of Statistics 1 (1973) 135-141 Anderson, T.W. (2003): An introduction to multivariate statistical analysis, Wiley, Stanford CA, 2003 Anderson, T.W. and M.A. Stephens (1972): Tests for randomness of directions against equatorial and bimodal alternatives, Biometrika 59 (1972) 613-621 Anderson, W.N. and R.J. Duffin (1969): Series and parallel addition of matrices, J. Math. Anal. Appl. 26 (1969) 576-594 Ando, T. (1979): Generalized Schur complements, Linear Algebra Appl. 27 (1979) 173186 Andrews, D.F. (1971): Transformations of multivariate data, Biometrics 27 (1971) 825840 Andrews, D.F. (1974): A robust method for multiple linear regression, Technometrics 16 (1974) 523-531 Andrews, D.F., Bickel, P.J. and F.R. Hampel (1972): Robust estimates of location, Princeton University Press, Princeton 1972 Anh, V.V. and T. Chelliah (1999): Estimated generalized least squares for random coefficient regression models, Scandinavian J. Statist. 26 (1999) 31-46 Anido, C. and T. Valdés (2000): Censored regression models with double exponential error distributions: an iterative estimation procedure based on medians for correcting bias, Revista Matemática Complutense 13 (2000) 137-159 Ansley, C.F. (1985): Quick proofs of some regression theorems via the QR Algorithm, The American Statistician 39 (1985) 55-59 Antoch, J. and H. Ekblom (2003): Selected algorithms for robust M- and L-Regression estimators, Developments in Robust Statistics, pp. 32-49, Physica Verlag, Heidelberg 2003 Anton, H. (1994): Elementary linear algebra, Wiley, New York 1994 Arnold, B.C. and N. Balakrishnan (1989): Relations, bounds and approximations for order statistics, Lecture Notes in Statistics 53 (1989) 1-37 Arnold, B.C. and R.M. Shavelle (1998): Joint confidence sets for the mean and variance of a normal distribution, American Statistical Association 52 (1998) 133-140 Arnold, B.F. and P. Stahlecker (1998): Prediction in linear regression combining crisp data and fuzzy prior information, Statistics & Decisions 16 (1998) 19-33 Arnold, B.F. and P. Stahlecker (1999): A note on the robustness of the generalized least squares estimator in linear regression, Allg. Statistisches Archiv 83 (1999) 224-229 Arnold, K.J. (1941): On spherical probability distributions, P.H. Thesis, Boston 1941 Arnold, S.F. (1981): The theory of linear models and multivariate analysis, J. Wiley, New York 1981
Arrowsmith, D.K. and C.M. Place (1995): Differential equations, maps and chaotic behaviour, Champman and Hall, London 1995 Arun, K.S. (1992): A unitarily constrained total least squares problem in signal processing, SIAM J. Matrix Anal. Appl. 13 (1992) 729-745 Atkinson, A.C. and L.M. Haines (1996): Designs for nonlinear and generalized linear models, Handbook of Statistik 13 (1996) 437-475 Aven, T. (1993): Reliability and risk analysis, Chapman and Hall, Boca Raton 1993 Awange, J.L. (2002): Gröbner bases, multipolynomial resultants and the Gauss-Jacobi combinatorial algorithms – adjustment of nonlinear GPS/LPS observations, Schriftenreihe der Institute des Studiengangs Geodäsie und Geoinformatik, Report 2002.1 Awange, J. and E.W. Grafarend (2002): Linearized Least Squares and nonlinear GaussJacobi combinatorial algorithm applied to the 7-parameter datum transformation C7(3) problem, Z. Vermessungswesen 127 (2002) 109-116 Axler, S., Bourdon, P. and W. Ramey, (2001): Harmonic function theory, 2nd ed., Springer Verlag, New York 2001 Azzalini, A. (1996): Statistical inference, Chapman and Hall, Boca Raton 1996 Azzam, A.M.H. (1996): Inference in linear models with nonstochastic biased factors, Egyptian Statistical Journal, ISSR - Cairo University 40 (1996) 172-181 Azzam, A., Birkes, D. and J. Seely (1988): Admissibility in linear models with polyhedral covariance structure, probability and statistics, essays in honor of Franklin A. Graybill, J.N. Srivastava, Ed. Elsevier Science Publishers, B.V. (North-Holland) 1988 Baarda, W. (1967): A generalization of the concept strength of the figure, Publications on Geodesy, New Series 2, Delft 1967 Baarda, W. (1968): A testing procedure for use in geodetic networks, Netherlands Geodetic Commission, New Series, Delft, Netherlands, 2 (5) 1968 Baarda, W. (1973): S-transformations and criterion matrices, Netherlands Geodetic Commission, Vol. 5, No. 1, Delft 1973 Baarda, W. (1977): Optimization of design and computations of control networks, F. Halmos and J. Somogyi (eds.) Akademiai Kiado, Budapest 1977, 419-436 Babai, L. (1986): On Lovasz' lattice reduction and the nearest lattice point problem, Combinatorica 6 (1986) 1-13 Babu, G.J. and E.D. Feigelson (1996): Astrostatistics, Chapman and Hall, Boca Raton 1996 Baddeley, A.J. (2000): Time-invariance estimating equations, Bernoulli 6 (2000) 783-808 Bai, J. (1994): Least squares estimation of a shift in linear processes, J.Time Series Analysis 15 (1994) 453-472 Bai, Z.D. and Y. Wu (1997): General M-estimation, J. Multivariate Analysis 63 (1997) 119-135 Bai, Z.D., Rao, C.R. and Y.H. Wu (1997): Robust inference in multivariate linear regression using difference of two convex functions as the discrepancy measure, Handbook of Statistics 15 (1997) 1-19 Bai, Z.D., Chan, X.R., Krishnaiah, P.R. and L.C. Zhao (1987): Asymptotic properties of the EVLP estimation for superimposed exponential signals in noise, Technical Report 87-19, CMA, University of Pittsburgh, Pittsburgh 1987 Baksalary, J.K. and A. Markiewicz (1988): Admissible linear estimators in the general Gauss-Markov model, J. Statist. Planning and Inference 19 (1988) 349-359 Baksalary, J.K. and F. Pukelsheim (1991b): On the Löwner, minus and star partial orderings of nonnegative definite matrices and their squares, Linear Algebra and its Applications 151 (1991) 135-141
Baksalary, J.K., Liski, E.P. and G. Trenkler (1989): Mean square error matrix improvements and admissibility of linear estimators, J. Statist. Planning and Inference 23 (1989) 313-325 Baksalary, J.K., Rao, C.R. and A. Markiewicz (1992): A study of the influence of the 'natural restrictions' on estimation problems in the singular Gauss-Markov model, J. Statist. Planning and Inference 31 (1992) 335-351 Balakrishnan, N. and Basu, A.P. (1995) (eds.): The exponential distribution, Gordon and Breach Publishers, Amsterdam 1995 Balakrishnan, N. and R.A. Sandhu (1996): Best linear unbiased and maximum likelihood estimation for exponential distributions under general progressive type-II censored samples, Sankhya 58 (1996) 1-9 Bamler, R. and P. Hartl (1998): Synthetic aperture radar interferometry, Inverse Problems 14 (1998) R1-R54 Banachiewicz, T. (1937): Zur Berechnung der Determinanten, wie auch der Inversen und zur darauf basierten Auflösung der Systeme linearen Gleichungen, Acta Astronom. Ser. C3 (1937) 41-67 Bansal, N.K., Hamedani, G.G. and H. Zhang (1999): Non-linear regression with multidimensional indices, Statistics & Probability Letters 45 (1999) 175-186 Bapat, R.B. (2000): Linear algebra and linear models, Springer, New York 2000 Barankin, E.W. (1949): Locally best unbiased estimates, Ann. Math. Statist. 20 (1949) 477-501 Barbieri, M.M. (1998): Additive and innovational outliers in autoregressive time series: a unified Bayesian approach, Statistica 3 (1998) 395-409 Barham, R.H. and W. Drane (1972): An algorithm for least squares estimation of nonlinear parameters when some of the parameters are linear, Technometrics 14 (1972) 757766 Bar-Itzhack, I.Y. and F. L., Markley (1990): Minimal parameter solution of the orthogonal matrix differential equation, IEEE Transactions on automatic control 35 (1990) 314-317 Barlow, J.L. (1993): Numerical aspects of solving linear least squares problems, C.R. Rao, ed., Handbook of Statistic 9 (1993) 303-376 Barlow, R.E. and F. Proschan (1966): Tolerance and confidence limits for classes of distributions based on failure rate, Ann. Math. Statist 37 (1966) 1593-1601 Barlow, R.E., Clarotti, C.A., and F. Spizzichino (eds) (1993): Reliability and decision making, Chapman and Hall, Boca Raton 1993 Barnard, J., McCulloch, R. and X.-L. Meng (2000): Modeling covariance matrices in terms of standard deviations and correlations, with application to shrinkage, Statistica Sinica 10 (2000) 1281-1311 Barnard, M.M. (1935): The scalar variations of skull parameters in four series of Egyptian skulls, Ann. Eugen. 6 (1935) 352-371 Barndorff-Nielsen, O.E. (1978): Information and exponential families in statistical theory, Wiley & Sons, Chichester & New York 1978 Barndorff-Nielsen, O.E., Cox, D.R. and C. Klüppelberg (2001): Complex stochastic systems, Chapman and Hall, Boca Raton, Florida 2001 Barnett, V. (1999): Comparative statistical inference, 3rd ed., Wiley, Chichester 1999 Barone, J. and A. Novikoff (1978): A history of the axiomatic formulation of probability from Borel to Kolmogorov, Part I, Archive for History of Exact Sciences 18 (1978) 123-190 Barrio, R. (2000): Parallel algorithms to evaluate orthogonal polynomial series, SIAM J. Sci. Comput 21 (2000) 2225-2239
Barrlund, A. (1998): Efficient solution of constrained least squares problems with Kronecker product structure, SIAM J. Matrix Anal. Appl. 19 (1998) 154-160 Barrodale, I. and D.D. Oleski (1981): Exponential approximation using Prony's method, The Numerical Solution of Nonlinear Problems, eds. Baker, C.T.H. and C. Phillips, 258-269, 1998 Bartlett, M.S. and D.G. Kendall (1946): The statistical analysis of variance-heterogeneity and the logarithmic transformation, Queen's College Cambridge, Magdalen College Oxford, Cambridge/Oxford 1946 Bartoszynski, R. and M. Niewiadomska-Bugaj (1996): Probability and statistical inference, Wiley, New York 1996 Barut, A.O. and R.B.Haugen (1972): Theory of the conformally invariant mass, Annals of Physics 71 (1972) 519-541 Basset JR., G. and R. Koenker (1978): Asymptotic theory of least absolute error , J. Amer. Statist. Assoc. 73 (1978) 618-622 Bateman, H. (1910a): The transformation of the electrodynamical equations, Proc. London Math. Soc. 8 (1910) 223-264, 469-488 Bateman, H. (1910b): The transformation of coordinates which can be used to transform one physical problem into another, Proc. London Math. Soc. 8 (1910) 469-488 Bates, D.M. and M.J. Lindstorm (1986): Nonlinear least squares with conditionally linear parameters, Proceedings of the Statistical Computation Section, American Statistical Association, Washington 1986 Bates, D.M. and D.G. Watts (1980): Relative curvature measures of nonlinearity (with discussion), J. Royal Statist. Soc. Ser. B 42 (1980) 1-25 Bates, D.M. and D.G. Watts (1988a): Nonlinear regression analysis and its applications, John Wiley, New York 1988 Bates, D.M. and D.G. Watts (1988b): Applied nonlinear regression, J. Wiley, New York 1988 Bates, R.A., Riccomagno, E., Schwabe, R. and H.P. Wynn (1998): Lattices and dual lattices in optimal experimental design for Fourier models, Computational Statistics & Data Analysis 28 (1998) 283-296 Batschelet, E. (1965): Statistical methods for the analysis of problems in animal orientation and certain biological rhythms, Amer. Inst. Biol. Sciences, Washington 1965 Batschelet, E. (1971): Recent statistical methods for orientation, (Animal Orientation Symposium 1970 on Wallops Island), Amer. Inst. Biol. Sciences, Washington, D.C., 1971 Batschelet, E. (1981): Circular statistics in biology, Academic Press, London 1981 Bauer, H. (1992): Maß- und Integrationstheorie, 2. Auflage, Walter de Gruyter, Berlin / New York 1992 Bauer, H. (1996): Probability theory, de Gruyter Verlag, Berlin-New York 1996 Bayen, F. (1976): Conformal invariance in physics, in: Cahen, C. and M. Flato (eds.), Differential geometry and relativity, Reidel Publ., pages 171-195, Dordrecht 1976 Beale, E.M. (1960): Confidence regions in non-linear estimation, J. Royal Statist. Soc. B 22 (1960) 41-89 Beaton, A.E. and J.W. Tukey (1974): The fitting of power series, meaning polynomials, illustrated on band-spectroscopic data, Technometrics 16 (1974) 147-185 Becker, T., Weispfennig, V. and H. Kredel (1998): Gröbner bases: a computational approach to commutative algebra, New York, Springer 1998 Beckermann, B. and E.B. Saff (1999): The sensitivity of least squares polynomial approximation, Int. Series of Numerical Mathematics, vol. 131: Applications and computation of orthogonal polynomials (eds. W. Gautschi, G.H. Golub, G. Opfer) pp. 119, Birkhäuser Verlag, Basel 1999
Beckers, J., Harnad, J., Perroud, M. and P. Winternitz (1978): Tensor fields invariant under subgroups of the conformal group of space-time, J. Math. Phys. 19 (1978) 2126-2153 Behnken, D.W. and N.R. Draper (1972): Residuals and their variance, Technometrics 11 (1972) 101-111 Behrens, W.A. (1929): Ein Beitrag zur Fehlerberechnung bei wenigen Beobachtungen, Landwirtschaftliche Jahrbücher 68 (1929) 807-837 Beichelt, F. (1997): Stochastische Prozesse für Ingenieure, Teubner Stuttgart 1997 Belikov, M.V. (1991): Spherical harmonic analysis and synthesis with the use of columnwise recurrence relations, Manuscripta Geodaetica 16 (1991) 384-410 Belikov, M.V. and K.A. Taybatorov (1992): An efficient algorithm for computing the Earth’s gravitational potential and its derivatives at satellite altitudes, Manuscripta Geodaetica 17 (1992) 104-116 Belmehdi, S., Lewanowicz, S. and A. Ronveaux (1997): Linearization of the product of orthogonal polynomials of a discrete variable, Applicationes Mathematicae 24 (1997) 445-455 Ben-Israel, A. and T. Greville (1974): Generalized inverses: Theory and applications, Wiley, New York 1974 Benbow, S.J. (1999): Solving generalized least-squares problems with LSQR, SIAM J. Matrix Anal. Appl. 21 (1999) 166-177 Benda, N. and R. Schwabe (1998): Designing experiments for adaptively fitted models, in: MODA 5 – Advances in model-oriented data analysis and experimental design, Proceedings of the 5th International Workshop in Marseilles, eds. Atkinson, A.C., Pronzato, L. and H.P. Wynn, Physica-Verlag, Heidelberg 1998 Bennett, R.J. (1979): Spatial time series, Pion Limited, London 1979 Beran, R.J. (1968): Testing for uniformity on a compact homogeneous space, J. App. Prob. 5 (1968) 177-195 Beran, R.J. (1979): Exponential models for directional data, Ann. Statist. 7 (1979) 11621178 Beran, R.J. (1994): Statistical methods for long memory processes, Chapman and Hall, Boca Raton 1994 Berberan, A. (1992): Outlier detection and heterogeneous observations – a simulation case study, Australian J. Geodesy, Photogrammetry and Surveying 56 (1992) 49-61 Berger, M.P.F. and F.E.S. Tan (1998): Optimal designs for repeated measures experiments, Kwantitatieve Methoden 59 (1998) 45-67 Berman, A. and R.J. Plemmons (1979): Nonnegative matrices in the mathematical sciences, Academic Press, New York 1979 Bertsekas, D.P. (1996): Incremental least squares methods and the extended Kalman filter, Siam J. Opt. 6 (1996) 807-822 Berry, J.C. (1994): Improving the James-Stein estimator using the Stein variance estimator, Statist. Probab. Lett. 20 (1994) 241-245 Bertuzzi, A., Gandolfi, A. and C. Sinisgalli (1998): Preference regions of ridge regression and OLS according to Pitman’s criterion, Sankhya: The Indian J.Statistics 60 (1998) 437-447 Bessel, F.W. (1838): Untersuchungen über die Wahrscheinlichkeit der Beobachtungsfehler, Astronomische Nachrichten 15 (1838) 368-404 Betensky, R.A. (1997): Local estimation of smooth curves for longitudinal data, Statistics in Medicine 16 (1997) 2429-2445 Beylkin, G. and N. Saito (1993): Wavelets, their autocorrelation function and multidimensional representation of signals, in: Proceedings of SPIE - The international society of optical engineering, Vol. LB 26, Int. Soc. for Optical Engineering, Bellingham 1993
Bhatia, R. (1996): Matrix analysis, Springer Verlag, New York 1996 Bhattacharya, R.N. and C.R. Rao (1976): Normal approximation and asymptotic expansions, Wiley, New York 1976 Bhattacharya, R.N. and E.C. Waymire (2001): Iterated random maps and some classes of Markov processes, D. N. Shanbhag and C.R. Rao, eds., Handbook of Statistic 19 (2001) 145-170 Bibby, J. (1974): Minimum mean square error estimation, ridge regression, and some unanswered questions, colloquia mathematica societatis Janos Bolyai, Progress in statistics, ed. J. Gani, K. Sarkadi, I. Vincze, Vol. I, Budapest 1972, North Holland Publication Comp., Amsterdam 1974 Bickel, P.J. and K.A. Doksum (1977a): Mathematical statistics - Distribution theory for transformations of random vectors, pp. 9-41, Holden-Day Inc 1977 Bickel, P.J. and K.A. Doksum (1977b): Mathematical statistics – Optimal tests and confidence intervals: Likelihood ratio tests and related procedures, pp. 192-247, HoldenDay Inc 1977 Bickel, P.J. and K.A. Doksum (1977c): Mathematical statistics – Basic ideas and selected topics, pp. 369-406, Holden-Day Inc 1977 Bickel, P.J. and K.A. Doksum (1981): An analysis of transformations revisited, J.the Maerican Statistical Association 76 (1981) 296-311 Bickel, P.J., Doksum, K. and J.L. Hodges (1982): A Festschrift for Erich L. Lehmann, Chapman and Hall, Boca Raton 1982 Bierman, G.J. (1977): Factorization Methods for discrete sequential estimation, Academic Press, New York 1997 Bill, R. (1985b): Kriteriummatrizen ebener geodätischer Netze, Deutsche Geodätische Kommission, München, Reihe A, No. 102 Bilodeau, M. and D. Brenner (1999): Theory of multivariate statistics, Springer Verlag 1999 Bilodeau, M. and P. Duchesne (2000): Robust estimation of the SUR model, The Canadian J.Statistics 28 (2000) 277-288 Bingham, C. (1964): Distributions on the sphere and propetive plane, PhD. Thesis, Yale University 1964 Bingham, C. (1974): An antipodally symmetric distribution on the sphere, Ann. Statist. 2 (1974) 1201-1225 Bingham, C., Chang, T. and D. Richards (1992): Approximating the matrix Fisher and Bingham distributions: Applications to spherical regression and Procrustes analysis, J.Multivariate Analysis 41 (1992) 314-337 Bingham, N.H. (2001): Random Walk and fluctuation theory, D. N. Shanbhag and C.R. Rao, eds., Handbook of Statistic 19 (2001) 171-213 Bini, D. and V. Pan (1994): Polynomial and matrix computations, Vol. 1: Fundamental Algorithms, Birkhäuser, Boston 1994 Bill, R. (1984): Eine Strategie zur Ausgleichung und Analyse von Verdichtungsnetzen, Deutsche Geodätische Kommission, Bayerische Akademie der Wissenschaften, Report C295, 91 pp., München 1984 Bischof, C.H. and G. Quintana-Orti (1998): Computing rank-revealing QR factorizations of dense matrices, ACM Transactions on Mathematical Software 24 (1998) 226-253 Bischoff, W. (1992): On exact D-optimal designs for regression models with correlated observations, Ann. Inst. Statist. Math. 44 (1992) 229-238 Bjerhammar, A. (1951a): Rectangular reciprocal matrices with special reference to calculation, Bull. Géodésique 20 (1951) 188-210
Bjerhammar, A. (1951b): Application of calculus of matrices to the method of leastsquares with special reference to geodetic calculations, Trans. RIT, No. 49, Stockholm 1951 Bjerhammar, A. (1955): En ny matrisalgebra, SLT 211-288, Stockholm 1955 Bjerhammar, A. (1958): A generalized matrix algebra, Trans. RIT, No. 124, Stockholm 1958 Bjerhammar, A. (1973): Theory of errors and generalized matrix inverses, Elsevier, Amsterdam 1973 Björck, A. (1967): Solving linear least squares problems by Gram-Schmidt orthogonalization, Nordisk Tidskr. Informationsbehandlung (BIT) 7 (1967) 1-21 Björck, A. (1996): Numerical methods for least squares problems, SIAM, Philadelphia 1996 Björck, A. and G.H. Golub (1973): Numerical methods for computing angles between linear subspaces, Mathematics of Computation 27 (1973) 579-594 Björkström, A. and R. Sundberg (1999): A generalized view on continuum regression, Scandinavian J.Statistics 26 (1999) 17-30 Blaker, H. (1999): Shrinkage and orthogonal decomposition, Scandinavian J.Statistics 26 (1999) 1-15 Blewitt, G. (2000): Geodetic network optimization for geophysical parameters, Geophysical Research Letters 27 (2000) 2615-3618 Bloomfield, P. and W.L. Steiger (1983): Least absolute deviations - theory, applications and algorithms, Birkhäuser Verlag, Boston 1983 Bobrow, J.E. (1989): A direct minimization approach for obtaining the distance between convex polyhedra, Int. J. Robotics Research 8 (1989) 65-76 Boggs, P.T., Byrd, R.H. and R.B. Schnabel (1987): A stable and efficient algorithm for nonlinear orthogonal distance regression, SIAM J. Sci. Stat. Comput. 8 (1987) 10521078 Bolfarine, H. and M. de Castro (2000): ANOCOVA models with measurement errors, Statistics & Probability Letters 50 (2000) 257-263 Bollerslev, T. (1986): Generalized autoregressive conditional heteroskedasticity, J. Econometrics 31 (1986) 307-327 Booth, J.G. and J.P. Hobert (1996): Standard errors of prediction in generalized linear mixed models, J. American Statist. Assoc. 93 (1996) 262-272 Bordes L., Nikulin, M. and V. Voinov (1997): Unbiased estimation for a multivariate exponential whose components have a common shift, J. Multivar. Anal. 63 (1997) 199-221 Borg, I. and P. Groenen (1997): Modern multidimensional scaling, Springer Verlag, New York 1997 Borovkov, A.A. (1998): Mathematical statistics, Gordon and Breach Science Publishers, Amsterdam 1998 Borre, K. (2001): Plane networks and their applications, Birkhäuser Verlag, Basel 2001 Bose, R.C. (1944): The fundamental theorem of linear estimation, Proc. 31st Indian Scientific Congress (1944) 2-3 Bossler, J. (1973): A note on the meaning of generalized inverse solutions in geodesy, J. Geoph. Res. 78 (1973) 2616 Bossler, J., Grafarend, E.W. and R. Kelm (1973): Optimal design of geodetic nets II, J. Geoph. Res. 78 (1973) 5887-5897 Boulware, D.G., Brown, L.S. and R.D. Peccei (1970): Deep inelastic electroproduction and conformal symmetry, Physical Review D2 (1970) 293-298
Box, G.E.P. and D.R. Cox (1964): Analysis of transformations, J.the Royal Statistical Society, Series B 26 (1964) 211-252 Box, G.E.P. and G. Tiao (1973): Bayesian inference in statistical analysis, AddisonWesley, Reading 1973 Box, G.E.P. and N.R. Draper (1987): Empirical model-building and response surfaces, J. Wiley, New York 1987 Box, M.J. (1971): Bias in nonlinear estimation, J. Royal Statistical Society B33 (1971) 171-201 Branco, M.D. (2001): A general class of multivariate skew-elliptical distributions, J.Multivariate Analysis 79 (2001) 99-113 Brandt, S. (1992): Datenanalyse. Mit statistischen Methoden und Computerprogrammen, 3. Aufl., BI Wissenschaftsverlag, Mannheim 1992 Brandt, S. (1999): Data analysis: statistical and computational methods for scientists and engineers, 3rd ed., Springer Verlag, New York 1999 Braess, D. (1986): Nonlinear approximation theory, Springer-Verlag, Berlin 1986 Breckling, J. (1989): Analysis of directional time series: application to wind speed and direction, Springer Verlag, Berlin 1989 Brémaud P. (1999): Markov Chains – Gibbs Fields, Monte Carlo Simulation and Queues, Springer Verlag New York 1999 Breslow, N.E. and D.G. Clayton (1993): Approximate inference in generalized linear mixed models, J. Amer. Statist. Assoc. 88 (1993) 9-25 Brezinski, C. (1999): Error estimates for the solution of linear systems, SIAM J. Sci. Comput. 21 (1999) 764-781 Brill, M. and E. Schock (1987): Iterative solution of ill-posed problems - a survey, in: Model optimization in exploration geophysics, ed. A. Vogel, Vieweg, Braunschweig 1987 Bro, R. (1997): A fast non-negativity-constrained least squares algorithm, J. Chemometrics 11 (1997) 393-401 Bro, R. and S. de Jong (1997): A fast non-negativity-constrained least squares algorithm, J.Chemometrics 11 (1997) 393-401 Brock, J.E. (1968): Optimal matrices describing linear systems, AIAA J. 6 (1968) 12921296 Brockwell, P.J. (2001): Continuous-time ARMA Processes, D. N. Shanbhag and C.R. Rao, eds., Handbook of Statistic 19 (2001) 249-276 Brovelli, M.A., Sanso, F. and G. Venuti (2003): A discussion on the Wiener-Kolmogorov prediction principle with easy-to-compute and robust variants, J. Geodesy 76 (2003) 673-683 Brown, B. and R. Mariano (1989): Measures of deterministic prediction bias in nonlinear models, Int. Econ. Rev. 30 (1989) 667-684 Brown, B.M., Hall, P. and G.A. Young (1997): On the effect of inliers on the spatial median, J. Multivar. Anal. 63 (1997) 88-104 Brown, H. and R. Prescott (1999): Applied mixed models in medicine, J. Wiley, Chichester 1999 Brown, K.G. (1976): Asymptotic behavior of Minque-type estimators of variance components, The Annals of Statistics 4 (1976) 746-754 Brualdi, R.A. and H. Schneider (1983): Determinantal identities: Gauss, Schur, Cauchy, Sylvester, Kronecker, Jacobi, Binet, Laplace, Muir and Cayley, Linear Algebra Appl. 52/53 (1983) 765-791 Brunk, H.D. (1958): On the estimation of parameters restricted by inequalities, Ann. Math. Statist. 29 (1958) 437-454
Brunner, F.K., Hartinger, H. and L. Troyer (1999): GPS signal diffraction modelling: the stochastic sigma- ' model, J. Geodesy 73 (1999) 259-267 Bruno, A.D. (2000): Power geometry in algebraic and differential equations, Elsevier, Amsterdam-Lausanne-New York-Oxford-Shannon-Singapore-Tokyo 2000 Brzézniak, Z. and T. Zastawniak (1959): Basic stochastic processes, Springer Verlag, Berlin 1959 Buja, A. (1996): What Criterion for a Power Algorithm?, Rieder, H. (editor): Robust statistics, data analysis and computer intensive methods, In honour of Peter Huber’s 60th Birthday, Springer 1996 Buhmann, M.D. (2001): Approximation and interpolation with radial functions, In: Multivariate Approximation and Applications, pp. 25-43, Cambridge University Press, Cambridge 2001 Bunday, B.D., Bokhari S.M.H. and K.H. Khan (1997): A new algorithm for the normal distribution function, Sociedad de Estadistica e Investigacion Operativa 6 (1997) 369377 Bunke, H. und O. Bunke (1974): Identifiability and estimability, Math. Operationsforschg. Statist. 5 (1974) 223-233 Bunke, H. and O. Bunke (1986): Statistical inference in linear models, J. Wiley, New York 1986 Bunke, O. (1977): Mixed models, empirical Bayes and Stein estimators, Math. Operationsforschg. Ser. Statistics 8 (1977) 55-68 Buonaccorsi, J., Demidenko, E. and T. Tosteson (2000): Estimation in longitudinal random effects models with measurement error, Statistica Sinica 10 (2000) 885-903 Burgio, G. and Y. Nitkitin (1998): Goodness-of-fit tests for normal distribution of order p and their asymptotic effiency, Statistica 58 (1998) 213-230 Burns, F., Carlson, D., Haynsworth, E., and T. Markham (1974): Generalized inverse formulas using the Schur complement, SIAM J. Appl. Math. 26 (1974) 254-259 Businger, P. and G.H. Golub (1965): Linear least squares solutions by Householder transformations, Numer. Math., 7 (1965) 269-276 Butler, N.A. (1999): The efficiency of ordinary least squares in designed experiments subject to spatial or temporal variation, Statistics & Probability Letters 41 (1999) 7381 Caboara, M. and E. Riccomagno (1998): An algebraic computational approach to the identifiability of Fourier models, J. Symbolic Computation 26 (1998) 245-260 Cadet, A. (1996): Polar coordinates in Rnp, application to the computation of the Wishart and Beta laws, Sankhya: The Indian J.Statistics 58 (1996) 101-114 Cai, J., Grafarend, E.W. and B. Schaffrin (2004): The A-optimal regularization parameter in uniform Tykhonov-Phillips regularization - D -weighted BLE -, V Hotine-Marussi Symposium on Mathematical Geodesy, Matera / Italy 2003, in: International Association of Geodesy Symposia 127, pp. 309-324, Springer Verlag Berlin – Heidelberg 2004 Cambanis, S. and I. Fakhre-Zakeri (1996): Forward and reversed time prediction of autoregressive sequences, J. Appl. Prob. 33 (1996) 1053-1060 Campbell, H.G. (1977): An introduction to matrices, vectors and linear programming, 2nd ed., Printice Hall, Englewood Cliffs 1977 Candy, J.V. (1988): Signal processing, McGrow Hill, New York 1988 Cantoni, E. (2003): Robust inference based on quasi-likelihoods for generalized linear models and longitudinal data, Developments in Robust Statistics, pp. 114-124, Physica Verlag, Heidelberg 2003 Carlin, B.P. and T.A. Louis (1996): Bayes and empirical Bayes methods, Chapman and Hall, Boca Raton 1996
Carlitz, L. (1963): The inverse of the error function, Pacific J. Math. 13 (1963) 459-470 Carlson, D., Haynsworth, E. and T. Markham (1974): A generalization of the Schur complement by means of the Moore-Penrose inverse, SIAM J. Appl. Math. 26 (1974) 169179 Carlson, D. (1986): What are Schur complements, anyway?, Linear Algebra and its Applications 74 (1986) 257-275 Carroll, J.D., Green, P.E. and A. Chaturvedi (1999): Mathematical tools for applied multivariate analysis, Academic Press, San Diego 1999 Carroll, J.D. and P.E. Green (1997): Mathematical tools for applied multivariate analysis, Academic Press, San Diego 1997 Carroll, R.J. and D. Ruppert (1982): A comparison between maximum likelihood and generalized least squares in a heteroscedastic linear model, M. American Statist. Assoc. 77 (1982) 878-882 Carroll, R., Ruppert, D. and L. Stefanski (1995): Measurement error in nonlinear models, Chapman and Hall, Boca Raton 1995 Carruthers, P. (1971): Broken scale invariance in particle physics, Phys. Lett. Rep. 1 (1971) 1-30 Caspary, W. (2000): Zur Analyse geodätischer Zeitreihen, Schriftenreihe, Heft 60-1, Neubinerg 2001 Caspary, W. and K. Wichmann (1994): Lineare Modelle. Algebraische Grundlagen und statistische Anwendungen, Oldenbourg Verlag, München / Wien 1994 Castillo, J. (1994): The singly truncated normal distribution, a non-steep exponential family, Ann. Inst. Statist. Math 46 (1994) 57-66 Castillo, J. and M. Perez-Casany (1998): Weights poison distributions for overdispersion and underdispersion situations, Ann. Ins. Statist. Math. 50 (1998) 567-585 Castillo, J. and P. Puig (1997): Testing departures from gamma, Rayleigh and truncated normal distributions, Ann. Inst. Statist.Math. 49 (1997) 255-269 Cayley, A. (1855): Sept différents mémoires d'analyse, No. 3, Remarque sur la notation des fonctions algebriques, Journal für die reine und angewandte Mathematik 50 (1855) 282-285 Cayley, A. (1858): A memoir on the theory of matrices, Phil. Transactions, Royal Society of London 148 (1858) 17-37 Cenkov, N.N. (1972): Statistical decision rule and optimal inference, Nauka 1972 Chan, K.-S. and H. Tong (2001): Chaos, a statistical perspective, Springer-Verlag, New York 2001 Chan, K.-S. and H. Tong (2002): A note on the equivalence of two approaches for specifying a Markov process, Bernoulli 8 (2002) 117-122 Chan, L.-Y. (2000): Optimal designs for experiments with mixtures: a survey, Commun. Statist.-Theory Meth. 29 (2000) 2281-2312 Chan, T.F. and P.C. Hansen (1991): Some applications of the rank revealing QR factorizations, Numer. Linear Algebra Appl., 1 (1991) 33-44 Chan, T.F. and P.C. Hansen (1992): Some applications of the rank revealing QR factorization, SIAM J. Sci. Statist. Comput., 13 (1992), pp. 727-741 Chandrasekaran, S. (2000): An efficient and stable algorithm for the symmetric-definite generalized eigenvalue problem, SIAM J. Matrix Anal. Appl. 21 (2000) 1202-1228 Chandrasekaran, S. and I.C.F. Ipsen (1995): Analysis of a QR algorithm for computing singular values, SIAM J. Matrix Anal. Appl. 16 (1995) 520-535 Chandrasekaran, S., Gu, M. and A.H. Sayed (1998): A stable and efficient algorithm for the indefinite linear least-squares problem, SIAM J. Matrix Anal. Appl. 20 (1998) 354-362
Chandrasekaran, S., Golub, G.H., Gu, M. and A.H. Sayed (1998): Parameter estimation in the presence of bounded data uncertainties, SIAM J. Matrix Anal. Appl. 19 (1998) 235-252 Chang, F.-C. (1999): Exact D-optimal designs for polynomial regression without intercept, Statistics & Probability Letters 44 (1999) 131-136 Chang, F.-C. and Y.-R. Yeh (1998): Exact A-optimal designs for quadratic regression, Statistica Sinica 8 (1998) 527-533 Chang, T. (1986): Spherical regression, Annals of Statistics 14 (1986) 907-924 Chang, T. (1988): Estimating the relative rotations of two tectonic plates from boundary crossings, American Statis. Assoc. 83 (1988) 1178-1183 Chang, T. (1993): Spherical regression and the statistics of tectonic plate reconstructions, International Statis. Rev. 51 (1993) 299-316 Chapman, D.G. and H. Robbins (1951): Minimum variance estimation without regularity assumptions, Ann. Math. Statist. 22 (1951) 581-586 Chartres, B.A. (1963): A geometrical proof of a theorem due to Slepian, SIAM Review 5 (1963) 335-341 Chatfield, C. and A.J. Collins (1981): Introduction to multivariate analysis, Chapman and Hall, Boca Raton 1981 Chatterjee, S. and A.S. Hadi (1988): Sensitivity analysis in linear regression, J. Wiley, New York 1988 Chatterjee, S. and M. Mächler (1997): Robust regression: a weighted least-squares approach, Commun. Statist. Theor. Meth. 26 (1997) 1381-1394 Chaturvedi, A. and A.T.K. Wan (1998): Stein-rule estimation in a dynamic linear model, J. Appl. Stat. Science 7 (1998) 17-25 Chaturvedi, A. and A.T.K. Wan (2001): Stein-rule restricted regression estimator in a linear regression model with nonspherical disturbances, Commun. Statist.-Theory Meth. 30 (2001) 55-68 Chaturvedi, A. and A.T.K. Wan (1999): Estimation of regression coefficients subject to interval constraints in models with non-spherical errors, Indian J.Statistics 61 (1999) 433-442 Chauby, Y.P. (1980): Minimum norm quadratic estimators of variance components, Metrika 27 (1980) 255-262 Chen, C. (2003): Robust tools in SAS, Developments in Robust Statistics, pp. 125133, Physica Verlag, Heidelberg 2003 Chen, H.-C. (1998): Generalized reflexive matrices: Special properties and applications, Society for Industrial and Applied Mathematics, 9 (1998) 141-153 Chen, R.-B. and M.-N.L., Huang (2000): Exact D-optimal designs for weighted polynomial regression model, Computational Statistics & Data Analysis 33 (2000) 137-149 Chen, X. (2001): On maxima of dual function of the CDT subproblem, J. Comput. Mathematics 19 (2001) 113-124 Chen, Z. and J. Mi (1996): Confidence interval for the mean of the exponential distribution, based on grouped data, IEEE Transactions on Reliability 45 (1996) 671-677 Cheng, C.L. (1998): Polynomial regression with errors in variables, J. Royal Statistical Soc. B60 (1998) 189-199 Cheng, C.L. and J.W. van Ness (1999): Statistical regression with measurement error, Arnold Publ., London 1999 Cheng, C.-S. (1996): Optimal design: exact theory, Handbook of Statistik 13 (1996) 9771006
Cherrie, J.B., Beatson, R.K. and G.N. Newsam (2002): Fast evaluation of radial basis functions: methods for generalized multiquadrics in RN*, SIAM J. Sci. Comput. 23 (2002) 1459-1571 Chiang, C.-Y. (1998): Invariant parameters of measurement scales, British J.Mathematical and Statistical Psychology 51 (1998) 89-99 Chiang, C.L. (2003): Statistical methods of analysis, University of California, Berkeley, USA 2003 Chikuse, Y. (1999): Procrustes analysis on some special manifolds, Commun. Statist. Theory Meth. 28 (1999) 885-903 Chilès, J.P. and P. Delfiner (1999): Geostatistics - modelling spatial uncertainty, J. Wiley, New York 1999 Chiodi, M. (1986): Procedures for generating pseudo-random numbers from a normal distribution of order, Riv. Stat. Applic. 19 (1986) 7-26 Chmielewski, M.A. (1981): Elliptically symmetric distributions: a review and bibliography, International Statistical Review 49 (1981) 67-74 Chow, T.L. (2000): Mathematical methods for physicists, Cambridge University Press, Cambridge 2000 Chow, Y.S. and H. Teicher (1978): Probability theory, Springer Verlag, New York 1978 Christensen, R. (1996): Analysis of variance, design and regression, Chapman and Hall, Boca Raton 1996 Chu, M.T. and N.T. Trendafilov (1998): Orthomax rotation problem. A differential equation approach, Behaviormetrika 25 (1998) 13-23 Chui, C.K. and G. Chen (1989): Linear Systems and optimal control, Springer Verlag, New York 1989 Chui, C.K. and G. Chen (1991): Kalman filtering with real time applications, Springer Verlag, New York 1991 Clark, G.P.Y.(1980): Moments of the least squares estimators in a nonlinear regression model, JR. Statist. Soc. B42 (1980) 227-237 Clerc-Bérod, A. and S. Morgenthaler (1997): A close look at the hat matrix, Student 2 (1997) 1-12 Cobb, G.W. (1997): Introduction to design and analysis of experiments, Springer Verlag, New York 1997 Cobb, L., Koppstein, P. and N.H. Chen (1983): Estimation and moment recursions relations for multimodal distributions of the exponential family, J. American Statist. Assoc. 78 (1983) 124-130 Cochran, W. (1972a): Some effects of errors of measurement on linear regression, in: Proceedings of the Sixth Berkeley Symposium on Mathematical Statistics and Probability, pages 527-539, UCP, Berkeley 1972 Cochran, W. (1972b): Stichprobenverfahren, de Gruyter, Berlin 1972 Cohen, A. (1966): All admissible linear estimates of the mean vector, Ann. Math. Statist. 37 (1966) 458-463 Cohen, C. and A. Ben-Israel (1969): On the computation of canonical correlations, Cahiers Centre Études Recherche Opér 11 (1969) 121-132 Collett, D. (1992): Modelling binary data, Chapman and Hall, Boca Raton 1992 Collet, D. and T. Lewis (1981): Discriminating between the von Mises and wrapped normal distributions, Austr. J. Statist. 23 (1981) 73-79 Colton, D., Coyle, J. and P. Monk (2000): Recent developments in inverse acoustic scattering theory, SIAM Review 42 (2000) 369-414 Cook, R.D., Tsai, C.L. and B.C. Wei (1986): Bias in nonlinear regression, Biometrika 73 (1986) 615-623
Cook, R.D. and S. Weisberg (1982): Residuals and influence in regression, Chapman and Hall, London 1982 Cottle, R.W. (1974): Manifestations of the Schur complement, Linear Algebra Appl. 8 (1974) 189-211 Cox, A.J. and N.J. Higham (1999): Row-wise backward stable elimination methods for the equality constrained least squares problem, SIAM J. Matrix Anal. Appl. 21 (1999) 313-326 Cox, D., Little, J. and D. O’Shea (1992): Ideals varieties and algorithms, Springer Verlag, New York 1992 Cox, D.R. and D.V. Hinkley (1979): Theoretical statistics, Chapman and Hall, Boca Raton 1979 Cox, D.R. and V. Isham (1980): Point processes, Chapman and Hall, Boca Raton 1980 Cox, D.R. and E.J. Snell (1989): Analysis of binary data, Chapman and Hall, Boca Raton 1989 Cox, D.R. and N. Reid (2000): The theory of the design of experiments, Chapman & Hall, Boca Raton 2000 Cox, D.R. and N. Wermuth (1996): Multivariate dependencies, Chapman and Hall, Boca Raton 1996 Cox, T.F. and M.A.A. Cox (2001): Multidimensional scaling, Chapman and Hall, Boca Raton, Florida 2001 Cox, D.R. and P.J. Salomon (2003): Components of variance, Chapman & Hall/CRC, Boca Raton – London – New York – Washington D.C. 2003 Craig, A.T. (1943): Note on the independence of certain quadratic forms, The Annals of Mathematical Statistics 14 (1943) 195-197 Cross, P.A. (1985): Numerical methods in network design, in: Grafarend, E.W. and F. Sanso (eds.), Optimization and design of geodetic networks, pp. 429-435, Springer, Berlin - Heidelberg - New York 1985, Crowder, M.J. (1987): On linear and quadratic estimating function, Biometrika 74 (1987) 591-597 Crowder, M.J. and D.J. Hand (1990): Analysis of repeated measures, Chapman and Hall, Boca Raton 1990 Crowder, M.J., Sweeting, T. and R. Smith (1994): Statistical analysis of reliability data, Chapman and Hall, Boca Raton 1994 Crowder, M.J., Kimber, A., Sweeting, T., and R. Smith (1993): Statistical analysis of reliability data, Chapman and Hall, Boca Raton 1993 Csörgö, M. and L. Horvath (1993): Weighted approximations in probability and statistics, J. Wiley, Chichester 1993 Csörgö, S. and L. Viharos (1997): Asymptotic normality of least-squares estimators of tail indices, Bernoulli 3 (1997) 351-370 Csörgö, S. and J. Mielniczuk (1999): Random-design regression under long-range dependent errors, Bernoulli 5 (1999) 209-224 Cummins, D. and A.C. Webster (1995): Iteratively reweighted partial least-squares: a performance analysis by Monte Carlo simulation, J. Chemometrics 9 (1995) 489-507 Cunningham, E. (1910): The principle of relativity in electrodynamics and an extension thereof, Proc. London Math. Soc. 8 (1910) 77-98 Czuber, E. (1891): Theorie der Beobachtungsfehler, Leipzig 1891 D‘Agostino, R. and M.A. Stephens (1986): Goddness-of-fit techniques, Marcel Dekker, New York 1986 Daniel, J.W. (1967): The conjugate gradient method for linear and nonlinear operator equations, SIAM J. Numer. Anal. 4 (1967) 10-26
Dantzig, G.B. (1940): On the nonexistence of tests of „Student’s“ hypothesis having power functions independent of V2, Ann. Math. Statist. 11 (1949) 186-192 Das, I. (1996): Normal-boundary intersection: A new method for generating the Pareto surface in nonlinear multicriteria optimization problems, SIAM J. Optim. 3 (1998) 631ff. Das, R. and B.K. Sinha (1987): Robust optimum invariant unbiased tests for variance components. In Proc. of the Second International Tampere Conference in Statistics. T. Pukkila and S. Puntanen (eds.), University of Tampere - Finland (1987) 317-342 Das Gupta, S. (1980): Distribution of the correlation coefficient, in: Fienberg, S., Gani, J., Kiefer, J. and K. Krickeberg (eds): Lecture notes in statistics, pp. 9-16, Springer Verlag 1980 Das Gupta, S., Mitra, S.K., Rao, P.S., Ghosh, J.K., Mukhopadhyay, A.C. and Y.R. Sarma (1994a): Selected papers of C.R.Rao, vol. 1, John Wiley, New York 1994 Das Gupta, S., Mitra, S.K., Rao, P.S., Ghosh, J.K., Mukhopadhyay, A.C. and Y.R. Sarma (1994b): Selected papers of C.R.Rao, vol. 2, John Wiley, New York 1994 David, F.N. and N.L. Johnson (1948): The probability integral transformation when parameters are estimated from the sample, Biometrika 35 (1948) 182-190 David, F.N. (1954): Tables of the ordinates and probability integral of the distribution of the correlation coefficient in small samples, Cambridge University Press, London 1954 David, H.A. (1957): Some notes on the statistical papers of Friedrich Robert Helmert (1943-1917), Bull. Stat. Soc. NSW 19 (1957) 25-28 David, H.A. (1970): Order Statistics, J. Wiley, New York 1970 David, H.A., Hartley, H.O. and E.S. Pearson (1954): The distribution of the ratio in a single normal sample, of range to standard deviation, Biometrika 41 (1954) 482-293 Davidian, M. and A.R. Gallant (1993): The nonlinear mixed effects model with a smooth random effects density, Biometrika 80 (1993) 475-488 Davidian, M., and D.M. Giltinan (1995): Nonlinear models for repeated measurement data, Chapman and Hall, Boca Raton 1995 Davis, R.A.(1997) : M-estimation for linear regression with infinite variance, Probability and Mathematical Statistics 17 (1997) 1-20 Davis, R. A. and W.T.M. Dunsmuir (1997): Least absolute deviation estimation for regression with ARMA errors, J. theor. Prob. 10 (1997) 481-497 Davis, J.H. (2002): Foundations of deterministic and stochastic control, Birkhäuser, Boston-Basel-Berlin 2002 Davison, A.C. and D.V. Hinkley (1997): Bootstrap methods and their application, Cambridge University Press, Cambridge 1997 Dax, A. (1997): An elementary proof of Farkas’ lemma, SIAM Rev. 39 (1997) 503-507 Decreusefond, L. and A.S. Üstünel (1999): Stochastic analysis of the fractional Brownian motion, Potential Anal. 10 (1999) 177-214 Dedekind, R. (1901): Gauß in seiner Vorlesung über die Methode der kleinsten Quadrate, Berlin 1901 Defant, A. and K. Floret (1993): Tensor norms and operator ideals, North Holland, Amsterdam 1993 Deitmar, A. (2002): A frist course in harmonic analysis, Springer Verlag, New York 2002 Demidenko, E. (2000): Is this the least squares estimate?, Biometrika 87 (2000) 437-452 Denham, M.C. (1997): Prediction intervals in partial least-squares, J. Chemometrics 11 (1997) 39-52 Denham, W. and S. Pines (1966): Sequential estimation when measurement function nonlinearity is comparable to measurement error, AIAA J4 (1966) 1071-1076
Denis, J.-B. and A. Pazman (1999): Bias of LS estimators in nonlinear regression models with constraints. Part II: Biadditive models, Applications of Mathematics 44 (1999) 375-403 Denison, D.G.T., Walden, A.T., Balogh, A. and R.J. Forsyth (1999): Multilayer testing of spectral lines and the detection of the solar rotation frequency and its harmonics, Appl. Statist. 48 (1999) 427-439 Dermanis, A. (1977): Geodetic linear estimation techniques and the norm choice problem, Manuscripta Geodetica 2 (1977) 15-97 Dermanis, A. (1978): Adjustment of geodetic observations in the presence of signals, International School of Advanced Geodesy, Erice, Sicily, May-June 1978, Bollettino di Geodesia e Scienze Affini 38 (1979) 513-539 Dermanis, A. (1998): Generalized inverses of nonlinear mappings and the nonlinear geodetic datum problem, J. Geodesy 72 (1998) 71-100 Dermanis, A. and E.W. Grafarend (1981): Estimability analysis of geodetic, astrometric and geodynamical quantities in Very Long Baseline Interferometry, Geoph. J. R. Astronom. Soc. 64 (1981) 31-56 Dermanis, A. and F. Sanso (1995): Nonlinear estimation problems for nonlinear models, Manuscripta Geodaetica 20 (1995) 110-122 Dermanis, A. and R. Rummel (2000a): Parameter estimation as an inverse problem, Lecture Notes in Earth Sciences 95 (2000) 24-47 Dermanis, A. and R. Rummel (2000b): The statistical approach to parameter determination: Estimation and prediction, Lecture Notes in Earth Sciences 95 (2000) 53-73 Dermanis, A. and R. Rummel (2000c): From finite to infinite-dimensional models (or from discrete to continuous models), Lecture Notes in Earth Sciences 95 (2000) 53-73 Dermanis, A. and R. Rummel (2000d): Data analysis methods in geodesy, Lecture Notes in Earth Sciences 95, Springer 2000 De Santis, A. (1991): Translated origin spherical cap harmonic analysis, Geoph. J. Int 106 (1991) 253-263 Dette, H. (1993): A note on E-optimal designs for weighted polynomial regression, Ann. Stat. 21 (1993) 767-771 Dette, H. (1997a): Designing experiments with respect to 'standardized' optimality criteria, J.R. Statist. Soc. B 59 (1997) 97-110 Dette, H. (1997b): E-optimal designs for regression models with quantitative factors – a reasonable choice?, The Canadian J.Statistics 25 (1997) 531-543 Dette, H. and W. J. Studden (1993): Geometry of E-optimality, Ann. Stat. 21 (1993) 416433 Dette, H. and W. J. Studden (1997): The theory of canonical moments with applications in statistics, probability, and analysis, J. Wiley, New York 1997 Dette, H. and T.E. O'Brien (1999): Optimality criteria for regression models based on predicted variance, Biometrika 86 (1999) 93-106 Deutsch, F. (2001): Best approximation in inner product spaces, Springer Verlag, New York 2001 Devidas, M. and E.O. George (1999): Monotonic algorithms for maximum likelihood estimation in generalized linear models, The Indian J.Statistical 61 (1999) 382-396 Dewess, G. (1973): Zur Anwendung der Schätzmethode MINQUE auf Probleme der Prozeßbilanzierung, Math. Operationsforschg. Statistik 4 (1973) 299-313 DiCiccio, T.J. and B. Efron (1996): Bootstrap confidence intervals, Statistical Science 11 (1996) 189-228 Diebolt, J. and J. Zuber (1999): Goodness-of-fit tests for nonlinear heteroscedastic regression models, Statistics & Probability Letters 42 (1999) 53-60
Dieck, T. (1987): Transformation groups, W de Gruyter, Berlin - New York 1987 Diggle, P.J., Liang, K.Y. and S.L. Zeger (1994): Analysis of longitudinal data, Clarendon Press, Oxford 1994 Ding, C.G. (1999): An efficient algorithm for computing quantiles of the noncentral chisquared distribution, Computational Statistics & Data Analysis 29 (1999) 253-259 Dixon, W.J. (1951): Ratio involving extreme values, Ann. Math. Statistics 22 (1951) 6878 Dobson, A.J. (1990): An introduction to generalized linear models, Chapman and Hall, Boca Raton 1990 Dobson, A.J. (2002): An introduction to generalized linear models, 2nd ed., Chapman Hall - CRC, Boca Raton 2002 Dodge, Y. (1987): Statistical data analysis based on the L1-norm and related methods, Elsevier, Amsterdam 1987 Dodge, Y. (1997): LAD Regression for Detecting Outliers in Response and Explanatory Variables, J. Multivariate Analysis 61 (1997) 144-158 Dodge, Y. and A.S. Hadi (1999): Simple graphs and bounds for the elements of the hat matrix, J. Applied Statistics 26 (1999) 817-823 Dodge, Y. and D. Majumdar (1979): An algorithm for finding least square generalized inverses for classification models with arbitrary patterns, J. Statist. Comput. Simul. 9 (1979) 1-17 Dodge, Y. and J. Jurecková (1997): Adaptive choice of trimming proportion in trimmed least-squares estimation, Statistics & Probability Letters 33 (1997) 167-176 Donoho, D.L. and P.J. Huber (1983): The notion of breakdown point, Festschrift für Erich L. Lehmann, eds. P.J. Bickel, K.A. Doksum and J.L. Hodges, Wadsworth, Belmont, Calif. 157-184, 1983 Dorea, C.C.Y. (1997): L1-convergence of a class of algorithms for global optimization, Student 2 (1997) Downs, T.D. and A.L. Gould (1967): Some relationships between the normal and von Mises distributions, Biometrika 54 (1967) 684-687 Dragan, V. and A. Halanay (1999): Stabilization of linear systems, Birkhäuser Boston – Basel – Berlin 1999 Draper, N.R. and R. C. van Nostrand (1979): Ridge regression and James-Stein estimation: review and comments, Technometrics 21 (1979) 451-466 Draper, N.R. and J.A. John (1981): Influential observations and outliers in regression, Technometrics 23 (1981) 21-26 Draper, N.R. and F. Pukelsheim (1996): An overview of design of experiments, Statistical Papers 37 (1996) 1-32 Draper, N.R. and F. Pukelsheim (2000): Ridge analysis of mixture response surfaces, Statistics & Probability Letters 48 (2000) 131-140 Draper, N.R. and C.R. Van Nostrand (1979): Ridge regression and James Stein estimation: review and comments, Technometrics 21 (1979) 451-466 Driscoll, M.F. (1999): An improved result relating quadratic forms and Chi-Square Distributions, The American Statistician 53 (1999) 273-275 Driscoll, M.F. and B. Krasnicka (1995): An accessible proof of Craig’s theorem in the general case, The American Statistician 49 (1995) 59-62 Droge, B. (1998): Minimax regret analysis of orthogonal series regression estimation: selection versus shrinkage, Biometrika 85 (1998) 631-643 Drygas, H. (1975): Estimation and prediction for linear models in general spaces, Math. Operationsforsch. Statistik 6 (1975) 301-324
Drygas, H. (1983): Sufficiency and completeness in the general Gauss-Markov model, Sankhya Ser. A 45 (1983) 88-98 Du, Z. and D.P. Wiens (2000): Jackknifing, weighting, diagnostics and variance estimation in generalized M-estimation, Statistics & Probability Letters 46 (2000) 287-299 Duan, J.C. (1997): Augmented GARCH (p,q) process and its diffusion limit, J. Econometrics 79 (1997) 97-127 Duncan, W.J. (1944): Some devices for the solution of large sets of simultaneous linear equations, London, Edinburgh and Dublin Philosophical Magazine and J.Science (7th series) 35 (1944) 660-670 Dunfour, J.M. (1986): Bias of s2 in linear regression with dependent errors, The American Statistician 40 (1986) 284-285 Dunnett, C.W. and M. Sobel (1954): A bivariate generalization of Student’s t-distribution, with tables for certain special cases, Biometrika 41 (1954) 153-69 Dupuis, D.J. and C.A. Field (1998): A comparison of confidence intervals for generalized extreme-value distributions, J. Statist. Comput. Simul. 61 (1998) 341-360 Durand, D. and J.A. Greenwood (1957): Random unit vectors II: usefulness of GramCharlier and related series in approximating distributions, Ann. Math. Statist. 28 (1957) 978-986 Durbin, J. and G.S. Watson (1950): Testing for serial correlation in least squares regression, Biometrika 37 (1950) 409-428 Durbin, J. and G.S. Watson (1951): Testing for serial correlation in least squares regression II, Biometrika 38 (1951) 159-177 D’Urso, P. and T. Gastaldi (2000): A least-squares approach to fuzzy linear regression analysis, Computational Statistics & Data Analysis 34 (2000) 427-440 Dyn, N., Leviatan, D., Levin, D. and A. Pinkus (2001): Multivariate approximation and applications, Cambridge University Press, Cambridge 2001 Ecker, E. (1977): Ausgleichung nach der Methode der kleinsten Quadrate, Öst. Z. Vermessungswesen 64 (1977) 41-53 Eckert, M (1935): Eine neue flächentreue (azimutale) Erdkarte, Petermann’s Mitteilungen 81 (1935) 190-192 Eckhart, C. and G. Young (1939): A principal axis transformation for non-Hermitean matrices, Bull. Amer. Math. Soc. 45 (1939) 188-121 Eckl, M.C., Snay, R.A., Solder, T., Cline, M.W. and G.L. Mader (2001): Accuracy of GPS-derived positions as a function of interstation distance and observing-session duration, J. Geodesy 75 (2001) 633-640 Edelman, A. (1989): Eigenvalues and condition numbers of random matrices, PhD dissertation, Massachussetts Institute of Technology 1989 Edelman, A. (1998): The geometry of algorithms with orthogonality constraints, SIAM J. Matrix Anal Appl. 20 (1998) 303-353 Edelman, A., Arias, T.A. and Smith, S.T. (1998): The geometry of algorithms with orthogonality constraints, SIAM J. Matrix Anal. Appl. 20 (1998) 303-353 Edelman, A., Elmroth, E. and B. Kagström (1997): A geometric approach to perturbation theory of matrices and matrix pencils. Part I: Versal deformations, SIAM J. Matrix Anal. Appl. 18 (1997) 653-692 Edgar, G.A. (1998): Integral, probability, and fractal measures, Springer Verlag, New York 1998 Edgeworth, F.Y. (1883): The law of error, Philosophical Magazine 16 (1883) 300-309 Edlund, O., Ekblom, H. and K. Madsen (1997): Algorithms for non-linear M-estimation, Computational Statistics 12 (1997) 373-383
Eeg, J. and T. Krarup (1973): Integrated geodesy, Danish Geodetic Institute, Report No. 7, Copenhagen 1973 Effros, E.G. (1997): Dimensions and C* algebras, Regional Conference Series in Mathematics 46, Rhode Island 1997 Efromovich, S. (2000): Can adaptive estimators for Fourier series be of interest to wavelets?, Bernoulli 6 (2000) 699-708 Efron, B. and R.J. Tibshirani (1994): An introduction to the bootstrap, Chapman and Hall, Boca Raton 1994 Eibassiouni, M.Y. and J. Seely (1980): Optimal tests for certain functions of the parameters in a covariance matrix with linear structure, Sankya: The Indian J.Statistics 42 (1980) 64-77 Ekblom, S. and S. Henriksson (1969): Lp-criteria for the estimation of location parameters, SIAM J. Appl. Math. 17 (1969) 1130-1141 Elden, L. (1977): Algorithms for the regularization of ill-conditioned least squares problems, BIT 17 (1977) 134-145 Elhay, S., Golub, G.H. and J. Kautsky (1991): Updating and downdating of orthogonal polynomials with data fitting applications, SIAM J. Matrix Anal. Appl. 12 (1991) 327-353 Elian, S.N. (2000): Simple forms of the best linear unbiased predictor in the general linear regression model, American Statistician 54 (2000) 25-28 Ellis, R.L. and I. Gohberg (2003): Orthogonal systems and convolution operators, Birkhäuser Verlag, Basel-Boston-Berlin 2003 Ellenberg, J.H. (1973): The joint distribution of the standardized least squares residuals from a general linear regression, J.the American Statistical Association 68 (1973) 941-943 Elpelt, B. (1989): On linear statistical models of commutative quadratic type, Commun. Statist.-Theory Method 18 (1989) 3407-3450 El-Basssiouni, M.Y. and Seely, J. (1980): Optimal tests for certain functions of the parameters in a covariance matrix with linear structure, Sankya A42 (1980) 64-77 Elfving, G. (1952): Optimum allocation in linear regression theory, Ann. Math. Stat. 23 (1952) 255-263 El-Sayed, S.M. (1996): The sampling distribution of ridge parameter estimator, Egyptian Statistical Journal, ISSR – Cairo University 40 (1996) 211-219 Engel, J. and A. Kneip (1995): Model estimation in nonlinear regression, Lecture Notes in Statistics 104 (1995) 99-107 Engl, H.W., Hanke, M. and A. Neubauer (1996): Regularization of inverse problems, Kluwer Academic Publishers, Dordrecht 1996 Engl, H.W., Louis, A.K. and W. Rundell (1997): Inverse problems in geophysical applications, SIAM, Philadelphia 1997 Engler, K., Grafarend, E.W., Teunissen, P. and J. Zaiser (1982): Test computations of three-dimensional geodetic networks with observables in geometry and gravity space, Proceedings of the International Symposium on Geodetic Networks and Computations. Vol. VII, 119-141. Report B 258/VII. Deutsche Geodätische Kommission, Bayerische Akademie der Wissenschaften, München 1982. Ernst, M.D. (1998): A multivariate generalized Laplace distribution, Computational Statistics 13 (1998) 227-232 Eubank, R.L. and P. Speckman (1991): Convergence rates for trigonometric and polynomial-trigonometric regression estimators, Statistical & Probability Letters 11 (1991) 119-124 Euler, N. and W.H. Steeb (1992): Continuous symmetry, Lie algebras and differential equations, B.I. Wissenschaftsverlag, Mannheim 1992
Even-Tzur, G. (1998): Application of the set covering problem to GPS measurements, surveying and land Information Systems 58 (1998) 25-29 Even-Tzur, G. (1999): Reliability designs and control of geodetic networks, Z. Vermessungswesen 4 (1999) 128-134 Even-Tzur, G. (2001): Graph theory application to GPS networks, GPS Solution 5 (2001) 31-38 Everitt, B.S. (1987): Introduction to optimization methods and their application in statistics, Chapman and Hall, London 1987 Fagnani, F. and L. Pandolci (2002): A singular perturbation approach to a recursive deconvolution problem, SIAM J. Control Optim 40 (2002) 1384-1405 Fahrmeir, L. and G. Tutz (2001): Multivariate statistical modelling based on generalized linear models, Springer Verlag, New York 2001 Fakeev, A.G. (1981): A class of iterative processes for solving degenerate systems of linear algebraic equations, USSR. Comp. Maths. Math. Phys. 21 (1981) 15-22 Falk, M., Hüsler, J. and R.D. Reiss (1994): Law of small numbers, extremes and rare events, Birkhäuser Verlag, Basel 1994 Fan, J. and I. Gijbels (1996): Local polynomial modelling and its applications, Chapman and Hall, Boca Raton 1996 Fang, K.-T. and Y. Wang (1993): Number-theoretic methods in statistics, Chapman and Hall, Boca Raton 1993 Fang, K.-T. and Y.-T. Zhang (1990): Generalized multivariate analysis, Science Press Beijing - Springer Verlag, Bejing - Berlin 1990 Fang, K.-T., Kotz, S. and K.W. Ng (1990): Symmetric multivariate and related distributions, Chapman and Hall, London 1990 Fang, Z. and D.P. Wiens (2000): Integer-valued, minimax robust designs for estimation and extrapolation in heteroscedastic, approximately linear models, J. the American Statistical Association 95 (2000) 807-818 Farahmand, K. (1996): Random polynomials with complex coefficients, Statistics & Probability Letters 27 (1996) 347-355 Farahmand, K. (1999): On random algebraic polynomials, Proceedings of the American Math. Soc. 127 (1999) 3339-3344 Farebrother, R.W. (1987): The historical development of the L1 and Lf estimation procedures, Statistical Data Analysis Based on the L1-Norm and Related Methods, Y. Dodge (ed.) 1987 Farebrother, R.W. (1988): Linear least squares computations, Dekker, New York 1988 Farebrother, R.W. (1999): Fitting linear relationships, Springer Verlag, New York 1999 Farrel, R.H. (1964): Estimators of a location parameter in the absolutely continuous case, Ann. Math. Statist. 35 (1964) 949-998 Fassò, A. (1997): On a rank test for autoregressive conditional heteroscedasticity, Student 2 (1997) 85-94 Faulkenberry , G.D. (1973): A method of obtaining prediction intervals, J. Amer. Statist. Ass. 68 (1973) 433-435 Fausett, D.W. and C.T. Fulton (1994): Large least squares problems involving Kronecker products, SIAM J. Matrix Anal. Appl. 15 (1994) 219-227 Fedi, M. and G. Florio (2002): A stable downward continuation by using the ISVD method, Geophys. J. Int. 151 (2001) 146-156 Fedorov, V.V. and P. Hackl (1997): Model-oriented design of experiments, Springer Verlag, New York 1997 Fedorov, V.V., Montepiedra, G. and C.J. Nachtsheim (1999): Design of experiments for locally weighted regression, J.Statistical Planning and Inference 81 (1999) 363-382
Feinstein, A.R. (1996): Multivariate analysis, Yale University Press, New Haven 1996 Fengler, M., Freeden, W. and V. Michel (2003): The Kaiserslautern multiscale geopotential model SWITCH-03 from orbit pertubations of the satellite CHAMP and its comparison to the models EGM96, UCPH2002_02_0.5, EIGEN-1s, and EIGEN-2, Geophysical Journal International (submitted) 2003 Feuerverger, A. and P. Hall (1998): On statistical inference based on record values, Extremes 1:2 (1998) 169-190 Fiebig, D.G., Bartels, R. and W. Krämer (1996): The Frisch-Waugh theorem and generalized least squares, Econometric Reviews 15 (1996) 431-443 Fierro, R.D. (1996): Pertubation analysis for twp-sided (or complete) orthogonal decompositions, SIAM J. Matrix Anal. Appl. 17 (1996) 383-400 Fierro, R.D. and J.R. Bunch (1995): Bounding the subspaces from rank revealing twosided orthogonal decompositions, SIAM J. Matrix Anal. Appl. 16 (1995) 743-759 Fierro, R.D. and P.C. Hansen (1995): Accuracy of TSVD solutions computed from rankrevealing decompositions, Numer. Math. 70 (1995) 453-471 Fierro, R.D. and P.C. Hansen (1997): Low-rank revealing UTV decompositions, Numer. Algorithms 15 (1997) 37-55 Fill, J.A. and D.E. Fishkind (1999): The Moore-Penrose generalized inverse for sums of matrices, SIAM J. Matrix Anal. Appl. 21 (1999) 629-635 Fisher, N.I. (1993): Statistical analysis of circular data, Cambridge University Press, Cambridge 1993 Fisher, N.J. (1985): Spherical medians, J. Royal Statistical Society, Series B: 47 (1985) 342-348 Fisher, N.J. and A.J. Lee (1983): A correlation coefficient for circular data, Biometrika 70 (1983) 327-332 Fisher, N.J. and A.J. Lee (1986): Correlation coefficients for random variables on a sphere or hypersphere, Biometrika 73 (1986) 159-164 Fisher, N.I. and P. Hall (1989): Bootstrap confidence regions for directional data, J. American Statist. Assoc. 84 (1989) 996-1002 Fisher, R.A. (1915): Frequency distribution of the values of the correlation coefficient in samples from an indefinitely large population, Biometrika 10 (1915) 507-521 Fisher, R.A. (1935): The fiducial argument in statistical inference, Annals of Eugenics 6 (1935) 391-398 Fisher, R.A. (1939): The sampling distribution of some statistics obtained from nonlinear equations, Ann. Eugen. 9 (1939) 238-249 Fisher, R.A. (1953): Dispersion on a sphere, Pro. Roy. Soc. Lond. A 217 (1953) 295-305 Fisher, R.A. and F. Yates (1942): Statistical tables for biological, agricultural and medical research, 2nd edition, Oliver and Boyd, Edinburgh 1942 Fisz, M. (1970): Wahrscheinlichkeitsrechnung und mathematische Statistik, WEB deutscher Verlag der Wissenschaften, Berlin 1970 Fitzgerald, W.J., Smith, R.L., Walden, A.T. and P.C. Young (2001): Non-linear and nonstationary signal processing, Cambridge University Press, Cambridge 2001 Fletcher, R. and C.M. Reeves (1964): Function minimization by conjugate gradients, Comput. J. 7 (1964) 149-154 Flury, B. (1997): A first course in multivariate statistics, Springer Verlag, New York 1997 Focke, J. and G. Dewess (1972): Über die Schätzmethode MINQUE von C.R. Rao und ihre Verallgemeinerung, Math. Operationsforschg. Statistik 3 (1972) 129-143 Foerstner, W. (1979a): Konvergenzbeschleunigung bei der a posteriori Varianzschätzung, Z. Vermessungswesen 104 (1979) 149-156
Foerstner, W. (1979b): Ein Verfahren zur Schätzung von Varianz- und Kovarianz- Komponenten, Allg. Vermessungsnachrichten 86 (1979) 446-453 Foerstner, W. (1983): Reliability and discernability of extended Gauss-Markov models, in: Seminar – Mathematical models of geodetic/Photogrammetric point determination with regard to outliers and systematic errors, Ackermann, F.E. (ed), München 1983 Foerstner, W. and B. Moonen (2003): A metric for covariance matrices, in: E.W. Grafarend, F. Krumm and V. Schwarze: Geodesy – the Challenge of the 3rd Millenium, pp. 299-309, Springer Verlag, Berlin 2003 Forsgren, A. and W. Murray (1997): Newton methods for large-scale linear inequalityconstrained minimization, Siam J. Optim. 7 (1997) 162-176 Forsgren, A. and G. Sporre (2001): On weighted linear least-squares problems related to interior methods for convex quadratic programming, SIAM J. Matrix Anal. Appl. 23 (2001) 42-56 Forsythe, A.B. (1972): Robust estimation of straight line regression coefficients by minimizing p-th power deviations, Technometrics 14 (1972) 159-166 Foster, L.V. (2003): Solving rank-deficient and ill-posed problems using UTV and QR factorizations, SIAM J. Matrix Anal. Appl. 25 (2003) 582-600 Fotiou, A. and D. Rossikopoulos (1993): Adjustment, variance component estimation and testing with the affine and similarity transformations, Z. Vermessungswesen 118 (1993) 494-503 Foucart, T. (1999): Stability of the inverse correlation matrix. Partial ridge regression, J.Statistical Planning and Inference 77 (1999) 141-154 Fox, M. and H. Rubin (1964): Admissibility of quantile estimates of a single location parameter, Ann. Math. Statist. 35 (1964) 1019-1031 Franses, P.H. (1998): Time series models for business and esonomic forecasting, Cambridge University Press, Cambridge 1998 Fraser, D.A.S. (1963): On sufficiency and the exponential family, J. Roy. Statist. Soc. 25 (1963) 115-123 Fraser, D.A.S. (1968): The structure of inference, J. Wiley, New York 1968 Fraser, D.A.S. and I. Guttman (1963): Tolerance regions, Ann. Math. Statist. 27 (1957) 162-179 Freeman, R.A. and P.V. Kokotovic (1996): Robust nonlinear control design, Birkhäuser Verlag, Boston 1996 Freiberg, B. (1985): Exact design for regression models with correlated errors, Statistics 16 (1985) 479-484 Freund, P.G.O. (1974): Local scale invariance and gravitation, Annals of Physics 84 (1974) 440-454 Frey, M. and J.C. Kern (1997): The Pitman Closeness of a Class of Scaled Estimators, The American Statistician, May 1997, Vol. 51 (1997) 151-154 Fristedt, B. and L. Gray (1995): A modern approach to probability theory, Birkhäuser, Basel 1997 Frobenius, F.G. (1893): Gedächtnisrede auf Leopold Kronecker (1893), Ferdinand Georg Frobenius, Gesammelte Abhandlungen, ed. J.S.Serre, Band III, pages 705-724, Springer Verlag, Berlin 1968 Fujikoshi, Y. (1980): Asymptotic expansions for the distributions of sample roots under non-normality, Biometrika 67 (1980) 45-51 Fulton, T., Rohrlich, F. and L. Witten (1962): Conformal invariance in physics, Reviews of Modern Physics 34 (1962) 442-457 Furno, M. (1997): A robust heteroskedasticity consistent covariance matrix estimator, Statistics 30 (1997) 201-219
Gabor, D. (1946): Theory of communication, J. the Electrical Engineers 93 (1946) 429441 Gaffke, N. and B. Heiligers (1996): Approximate designs for polynomial regression: invariance, admissibility and optimality, Handbook of Statistik 13 (1996) 1149-1199 Galil, Z. (1985): Computing d-optimum weighing designs: Where statistics, combinatorics, and computation meet, in: Proceedings of the Berkeley Conference in Honor of Jerzy Neyman and Jack Kiefer, Vol. II, eds. L.M. LeCam and R.A. Olshen, Wadsworth 1985 Gallant, A.R. (1987): Nonlinear statistical models, John Wiley, New York 1987 Gallavotti, G. (1999): Statistical mechanics: A short treatise, Springer-Verlag, New York 1999 Gander, W. (1981): Least squares with a quadratic constraint, Numer. Math. 36 (1981) 291-307 Gao, S. and T.M.F. Smith (1995): On the nonexistence of a global nonengative minimum bias invariant quadratic estimator of variance components, Statistics and Probability letters 25 (1995) 117-120 Gao, S. and T.M.F. Smith (1998): A constrained Minqu estimator of correlated response variance from unbalanced dara in complex surveys, Statistica Sinica 8 (1998) 11751188 Gao, Y., Lahaye, F., Heroux, P., Liao, X., Beck, N. and M. Olynik (2001): Modelling and estimation of C1-P1 bias in GPS receivers, J. Geodesy 74 (2001) 621-626 Garcia, A.G. (2000): Orthogonal sampling formulas: a unified approach, SIAM Review 42 (2000) 499-512 García-Escudero, L.A., Gordaliza, A. and C. Matrán (1997): k-Medians and trimmed kmedians, Student 2 (1997) 139-148 Garcia-Ligero, M.J., Hermoso, A. and J. Linares (1998): Least squared estimation for distributed parameter systems with uncertain observations: Part 1: Linear prediction and filtering, Applied Stochastic Models and Data Analysis 14 (1998) 11-18 Gassmann, H. (1989): Einführung in die Regelungstechnik, Verlag Harri Deutsch, Frankfurt am Main 1989 Gather, U. and C. Becker (1997): Outlier identification and robust methods, Handbook of Statistics 15 (1997) 123-143 Gauss, C.F. (1809): Theoria Motus, Corporum Coelesium, Lib. 2, Sec. III, Perthes u. Besser Publ., 205-224, Hamburg 1809 Gauss, C.F. (1816): Bestimmung der Genauigkeit der Beobachtungen, Z. Astronomi 1 (1816) 185-197 Gautschi, W. (1982): On generating orthogonal polynomials, SIAM Journal on Scientific and Statistical Computing 3 (1982) 289-317 Gautschi, W. (1985): Orthogonal polynomials - constructive theory and applications, J. Comput. Appl. Math. 12/13 (1985) 61-76 Gautschi, W. (1997): Numerical analysis - an introduction, Birkhäuser Verlag, BostonBasel-Berlin 1997 Gelfand, A.E. and D.K. Dey (1988): Improved estimation of the disturbance variance in a linear regression model, J. Econometrics 39 (1988) 387-395 Gelman, A., Carlin, J.B., Stern, H.S. and D.B. Rubin (1995): Bayesian data analysis, Chapman and Hall, London 1995 Genton, M.G. (1998): Asymptotic variance of M-estimators for dependent Gaussian random variables, Statistics and Probability Lett. 38 (1998) 255-261 Genton, M.G. and Y. Ma (1999): Robustness properties of dispersion estimators, Statistics & Probability Letters 44 (1999) 343-350
Ghosh, M. and G. Meeden (1978): Admissibility of the MLE of the normal integer mean, The Indian J.Statistics 40 (1978) 1-10 Ghosh, M., Mukhopadhyay, N. and P.K. Sen (1997): Sequential estimation, Wiley, New York 1997 Ghosh, S. (1996): Wishart distribution via induction, The American Statistician 50 (1996) 243-246 Ghosh, S. (1999a): Multivariate analysis, design of experiments, and survey sampling, Marcel Dekker, Basel 1999 Ghosh, S. (ed.)(1999b): Multivariate analysis, design of experiments, and survey sampling, Marcel Dekker, New York 1999 Ghosh, S., Beran, J. and J. Innes (1997): Nonparametric conditional quantile estimation in the presence of long memory, Student 2 (1997) 109-117 Giacolone, M. (1997): Lp-norm estimation for nonlinear regression models, Student 2 (1997) 119-130 Gil, A. and J. Segura (1998): A code to evaluate prolate and oblate spheroidal harmonics, Computer Physics Communications 108 (1998) 267-278 Gil, J.A. and R. Romera (1998): On robust partial least squares (PLS) methods, J. Chemometrics 12 (1998) 365-378 Gilbert, E.G. and C.P. Foo (1990): Computing the distance between general convex objects in three-dimensional space, JEEE Transactions on Robotics and Automation 6 (1990) 53-61 Gilberg, F., Urfer, F. and L. Edler (1999): Heteroscedastic nonlinear regression models with random effects and their application to enzyme kinetic data, Biometrical Journal 41 (1999) 543-557 Gilchrist, R. and G. Portides (1995): M-estimation: some remedies, Lectures Notes in Statistics 104 (1995) 117-124 Gill, P.E., Murray, W. and M.A. Saunders (2002): Snopt: An SQP algorithm for large scale constrained optimization, Siam J. Optim. 12 (2002) 979-1006 Gille, J.C., Pelegrin, M. and P. Decaulne (1964): Lehrgang der Regelungstechnik, Verlag Technik, Berlin 1964 Giri, N. (1977): Multivariate statistical inference, Academic Press, New York 1977 Giri, N. (1993): Introduction to probability and statistics, 2nd edition, Marcel Dekker, New York 1993 Giri, N. (1996a): Multivariate statistical analysis, Marcel Dekker, New York 1996 Giri, N. (1996b): Group invariance in statistical inference, World Scientific, Singapore 1996 Girko, V.L. (1988): Spectral theory of random matrices, Nauka, Moscow 1988 Girko, V.L. (1990): Theory of random determinants, Kluwer Academic Publishers, Dordrecht 1990 Girko, V.L. and A.K.Gupta (1996): Multivariate elliptically contoured linear models and some aspects of the theory of random matrices, in: Multidimensional statistical analysis and theory of random matrices, Proceedings of the Sixth Lukacs Symposium, eds. Gupta, A.K. and V.L. Girko, pages 327-386, VSP, Utrecht 1996 Glatzer, E. (1999): Über Versuchsplanungsalgorithmen bei korrelierten Beobachtungen, Master's Thesis, Wirtschaftsuniversität Wien Gleason, J.R. (2000): A note on a proposed student t approximation, Computational Statistics & Data Analysis 34 (2000) 63-66 Gleick, J. (1987): Chaos, Viking, New York 1987 Glesser, L.J. and I. Olkin (1972): Estimation for a regression model with an unknown covariance matrix, in: Proceedings of the Sixth Berkeley Symposium on Mathemati-
cal Statistics and Probability, pp. 541-569, Cam, L.M.L, Neyman, J. and Scott, E.L., University of California Press, Berkley and Los Angeles 1972 Glimm, J. (1960): On a certain class of operator algebras, Trans. American Mathematical Society 95 (1960) 318-340 Gnedenko, B.V. and A.N. Kolmogorov (1968): Limit distributions for sums of independent random variables, Addison-Wesley Publ., Reading, Mass. 1968 Gnedin, A.V. (1993): On multivariate extremal processes, J. Multivariate Analysis 46 (1993) 207-213 Gnedin, A.V. (1994): On a best choice problem with dependent criteria, J. Applied Probability 31 (1994) 221-234 Gneiting, T. (1999): Correlation functions for atmospheric data analysis, Q. J. R. Meteorol. Soc. 125 (1999) 2449-2464 Gnot, S. and A. Michalski (1994): Tests based on admissible estimators in two variance components models, Statistics 25 (1994) 213-223 Gnot, S. and G. Trenkler (1996): Nonnegative quadratic estimation of the mean squared errors of minimax estimators in the linear regression model, Acta Applicandae Mathematicae 43 (1996) 71-80 Goad, C.C. (1996): Single-site GPS models, in: GPS for Geodesy, pp. 219-237, Teunissen, P.J.G. and A. Kleusberg (eds), Berlin 1996 Godambe, V.P. (1991): Estimating Functions, Oxford University Press 1991 Godambe, V.P. (1995): A unified theory of sampling from finite populations, J. Roy. Statist. Soc B17 (1955) 268-278 Göbel, M. (1998): A constructive description of SAGBI bases for polynomial invariants of permutation groups, J. Symbolic Computation 26 (1998) 261-272 Goldberger, A.S. (1962): Best linear unbiased prediction in the generalized linear regression model, J. Amer. Statist. Ass. 57 (1962) 369-375 Goldie, C.M. and S. Resnick (1989): Records in a partially ordered set, Annals Probability 17 (1989) 678-689 Goldie, C.M. and S. Resnick (1995): Many multivariate records, Stochastic Processes Appl. 59 (1995) 185-216 Goldie, C.M. and S. Resnick (1996): Ordered independent scattering, Commun. Statist. Stochastic Models 12 (1996) 523-528 Goldstine, H. (1977): A history of numerical analysis from the 16th through the 19th century, Springer Verlag, New York 1977 Golshstein, E.G. and N.V. Tretyakov (1996): Modified Lagrangian and monotone maps in optimization, J. Wiley, New York 1996 Golub, G.H. (1965): Numerical methods for solving linear least squares solution, Numer. Math. 7 (1965) 206-216 Golub G.H. (1968): Least squares, singular values and matrix approximations, Aplikace Matematiky 13 (1968) 44-51 Golub, G.H. (1973): Some modified matrix eigenvalue problems, SIAM Review 15 (1973) 318-334 Golub, G.H. and C.F. van Loan (1996): Matrix computations, 3rd edition, John Hopkins University Press, Baltimore 1996 Golub, G.H. and W. Kahan (1965): Calculating the singular values and pseudo-inverse of a matrix, SIAM J Numer. Anal. 2 (1965) 205-224 Golub, G.H. and C. Reinsch (1970): Singular value decomposition and least squares solutions, Numer. Math. 14 (1970) 403-420 Golub, G.H. and U. von Matt (1991): Quadratically constrained least squares and quadratic problems, Numer. Math. 59 (1991) 561-580
Golub, G.H., Hansen, P.C. and D.P. O'Leary (1999): Tikhonov regularization and total least squares, SIAM J. Matrix Anal. Appl. 21 (1999) 185-194 Gómez, E., Gómez-Villegas, M.A. and J.M. Marín (1998): A multivariate generalization of the power exponential family of distributions, Commun. Statist. - Theory Meth. 27 (1998) 589-600 Gonin, R. and A.H. Money (1987a): Outliers in physical processes: L1- or adaptive Lpnorm estimation?, in: Statistical Data Analysis Based on the L1 Norm and Related Methods, Dodge Y. (ed), North-Holland 1987 Gonin, R. and A.H. Money (1987b): A review of computational methods for solving the nonlinear L1 norm estimation problem, in: Statistical data analysis based on the L1 norm and related methods, Ed. Y. Dodge, North Holland 1987 Gonin, R. and A.H. Money (1989): Nonlinear lp-norm estimation, Marcel Dekker, New York 1989 Goodall, C. (1991): Procrustes methods in the statistical analysis of shape, J. Royal Statistical Society B 53 (1991) 285-339 Goodal, C.R. (1993): Computation using the QR decomposition, C.R. Rao, ed., Handbook of Statistic 9 (1993) 467-508 Goodman, J.W. (1985): Statistical optics, Wiley, New York 1985 Gordon, A.D. (1997): L1-norm and L2-norm methodology in cluster analysis, Student 2 (1997) 181-193 Gordon, A.D. (1999): Classification, 2nd edition, Chapman and Hall, Yew York 1999 Gordon, L. and M. Hudson (1977): A characterization of the Von Mises Distribution, Ann. Statist. 5 (1977) 813-814 Gordonova, V.I. (1973): The validation of algorithms for choosing the regularization parameter, Zh. vychisl. mat. mat. fiz. 13 (1973) 1328-1332 Gorman, T.W.: (2001): Adaptive estimation using weighted least squares, Aust. N. Z. J. Stat. 43 (2001) 287-297 Gotthardt, E. (1978): Einführung in die Ausgleichungsrechnung, 2. Auflage, Karlsruhe 1978 Gould, A.L. (1969): A regression technique for angular varietes, Biometrica 25 (1969) 683-700 Gower, J.C. and G.B. Dijksterhuis (2004): Procrustes Problems, Oxford Statistical Science Series 30, Oxford 2004 Grafarend, E.W. (1967a): Bergbaubedingte Deformation und ihr Deformationstensor Bergbauwissenschaften 14 (1967) 125-132 Grafarend, E.W. (1967b): Allgemeiner Fehlertensor bei a priori und a posteriori Korrelationen, Z. Vermessungswesen 92 (1967) 157-165 Grafarend, E.W. (1969): Helmertsche Fußpunktkurve oder Mohrscher Kreis?, Allg. Vermessungsnachrichten 76 (1969) 239-240 Grafarend, E.W. (1970a): Verallgemeinerte Methode der kleinsten Quadrate für zyklische Variable, Z. Vermessungswesen 4 (1970) 117-121 Grafarend, E.W. (1970b): Die Genauigkeit eines Punktes im mehrdimensionalen Euklidischen Raum, Deutsche Geodätische Kommission bei der Bayerischen Akademie der Wissenschaften C 153, München 1970 Grafarend, E.W. (1970c): Fehlertheoretische Unschärferelation, Festschrift Professor Dr.Ing. Helmut Wolf, 60. Geburtstag, Bonn 1970 Grafarend, E.W. (1971a): Mittlere Punktfehler und Vorwärtseinschneiden, Z. Vermessungswesen 96 (1971) 41-54 Grafarend, E.W. (1971b): Isotropietests von Lotabweichungen Westdeutschlands, Z. Geophysik 37 (1971) 719-733
Grafarend, E.W. (1972a): Nichtlineare Prädiktion, Z. Vermessungswesen 97 (1972) 245255 Grafarend, E.W. (1972b): Isotropietests von Lotabweichungsverteilungen Westdeutschlands II, Z. Geophysik 38 (1972) 243-255 Grafarend, E.W. (1972c): Genauigkeitsmaße geodätischer Netze, Deutsche Geodätische Kommission bei der Bayerischen Akademie der Wissenschaften A 73, München 1972 Grafarend, E.W. (1973a): Nichtlokale Gezeitenanalyse, Mitt. Institut für Theoretische Geodäsie No. 13, Bonn 1973 Grafarend, E.W. (1973b): Optimales Design geodätische Netze 1 (zus. P. Harland), Deutsche Geodätische Kommission bei der Bayerischen Akademie der Wissenschaften A 74, München 1973 Grafarend, E.W. (1974): Optimization of geodetic networks, Bollettino di Geodesia e Scienze Affini 33 (1974) 351-406 Grafarend, E.W. (1975): Second order design of geodetic nets, Z. Vermessungswesen 100 (1975) 158-168 Grafarend, E.W. (1976): Geodetic applications of stochastic processes, Physics of the Earth and Planetory Interiors 12 (1976) 151-179 Grafarend, E.W. (1978): Operational geodesy, in: Approximation Methods in Geodesy, eds. H. Moritz and H. Sünkel, pp. 235-284, H. Wichmann Verlag, Karlsruhe 1978 Grafarend, E.W. (1979): Kriterion-Matrizen I - zweidimensional homogene und isotope geodätische Netze - Z. Vermessungswesen 104 (1979) 133-149 Grafarend, E.W. (1983): Stochastic models for point manifolds, in: Mathematical models of geodetic/ photogrammetric point determination with regard to outliers and systematic errors, ed. F.E. Ackermann, Report A 98, 29-52, Deutsche Geodätische Kommission, Bayerische Akademie der Wissenschaften, München 1983 Grafarend, E.W. (1984): Variance-covariance component estimation of Helmert type in the Gauss-Helmert model, Z. Vermessungswesen 109 (1984) 34-44 Grafarend, E.W. (1985a): Variance-covariance component estimation, theoretical results and geodetic applications, Statistics and Decision, Supplement Issue No. 2 (1985) 407-447 Grafarend, E.W. (1985b): Criterion matrices of heterogeneously observed threedimensional networks, Manuscripta Geodaetica 10 (1985) 3-22 Grafarend, E.W. (1985c): Criterion matrices for deforming networks, in: Optimization and Design of Geodetic Networks, E.W. Grafarend and F. Sanso (eds.) pages 363428, Springer-Verlag, Berlin-Heidelberg-New York-Tokyo 1985 Grafarend, E.W. (1986): Generating classes of equivalent linear models by nuisance parameter elimination - applications to GPS observations, Manuscripta Geodaetica 11 (1986) 262-271 Grafarend, E.W. (1989a): Four lectures on special and general relativity, Lecture Notes in Earth Sciences, F. Sanso and R. Rummel (eds.), Theory of Satellite Geodesy and Gravity Field Determination, Nr. 25, pages 115-151, Springer Verlag Berlin - Heidelberg - New York - London - Paris - Tokyo - Hongkong 1989 Grafarend, E.W. (1989b): Photogrammetrische Positionierung, Festschrift Prof. Dr.-Ing. Dr. h.c. Friedrich Ackermann zum 60. Geburtstag, Institut für Photogrammetrie, Universität Stuttgart, Report 14, pages 45-55, Suttgart 1989. Grafarend, E.W. (1991a): Relativistic effects in geodesy, Report Special Study Group 4.119, International Association of Geodesy, Contribution to "Geodetic Theory and Methodology" ed. F. Sanso, 163-175, Politecnico di Milano, Milano/Italy 1991 Grafarend, E.W. (1991b): The Frontiers of Statistical Scientific Theory and Industrial Applications (Volume II of the Proceedings of ICOSCO-I), American Sciences Press, pages 405-427, New York 1991
Grafarend, E.W. (1998): Helmut Wolf – das wissenschaftliche Werk - the scientific work, Heft A 115, Deutsche Geodätische Kommission, Bayerische Akademie der Wissenschaften, C.H. Beck’sche Verlagsbuchhandlung, 97 Seiten, München 1998 Grafarend, E.W. (2000): Mixed integer-real valued adjustment (IRA) problems, GPS Solutions 4 (2000) 31-45 Grafarend, E.W. and J. Awange (2002a): Nonlinear adjustment of GPS observations of type pseudo-ranges, GPS Solutions 5 (2002) 80-93 Grafarend, E.W. and J. Awange (2002b): Algebraic solution of GPS pseudo-ranging equations, GPS Solutions 5 (2002) 20-32 Grafarend, E.W. and J. Shan (1997): Estimable quantities in projective networks, Z. Vermessungswesen, Part I, 122 (1997) 218-226, Part II, 122 (1997) 323-333 Grafarend, E.W. and J. Shan (2002): GPS Solutions: closed forms, critical and special configurations of P4P, GPS Solutions 5 (2002) 29-42 Grafarend, E.W. and A. d'Hone (1978): Gewichtsschätzung in geodätischen Netzen, Deutsche Geodätische Kommission bei der Bayerischen Akademie der Wissenschaften A 88, München 1978 Grafarend, E.W. and B. Richter (1978): Threedimensional geodesy II-the datum problem, Z. Vermessungswesen 103 (1978) 44-59 Grafarend, E.W. and G. Kampmann (1996): C10(3): The ten parameter conformal group as a datum transformation in threedimensional Euclidean space, Z. Vermessungswesen 121 (1996) 68-77 Grafarend, E.W. W. and A. Kleusberg (1980): Expectation and variance component estimation of multivariate gyrotheodolite observations, I. Allg. Vermessungsnachrichten 87 (1980) 129-137 Grafarernd, E.W. and F. Krumm (1985): Continuous networks I, in: Optimization and Design of Geodetic Net-works. E.W. Grafarend and F. Sanso (Ed.) pp. 301-341, Springer-Verlag, Berlin-Heidelberg-New York-Tokyo 1985 Grafarend, E.W. and A. Mader (1989): A graph-theoretical algorithm for detecting configuration defects in triangular geodetic networks, Bulletin Géodésique 63 (1989) 387-394 Grafarend, E.W. and V. Mueller (1985): The critical configuration of satellite networks, especially of Laser and Doppler type, for planar configurations of terrestrial points, Manuscripta Geodaetica 10 (1985) 131-152 Grafarend, E.W. and F. Sanso (1985): Optimization and design of geodetic networks, Springer Verlag, Berlin-Heidelberg-New York-Tokyo 1985 Grafarend, E.W. and B. Schaffrin (1974): Unbiased free net adjustment, Surv. Rev. XXII, 171 (1974) 200-218 Grafarend, E.W. and B. Schaffrin (1976): Equivalence of estimable quantities and invariants in geodetic networks, Z. Vermessungswesen 191 (1976) 485-491 Grafarend, E.W. and B. Schaffrin (1979): Kriterion-Matrizen I – zweidimensional homogene und isotope geodätische Netze – Z. Vermessungswesen 104 (1979), 133-149 Grafarend, E.W. and B. Schaffrin (1982): Kriterion Matrizen II: Zweidimensionale homogene und isotrope geodätische Netze, Teil II a: Relative cartesische Koordinaten, Z. Vermessungswesen 107 (1982), 183-194, Teil IIb: Absolute cartesische Koordinaten, Z. Vermessungswesen 107 (1982) 485-493 Grafarend, E.W. and B. Schaffrin (1988): Von der statistischen zur dynamischen Auffasung geodätischer Netze, Z. Vermessungswesen 113 (1988) 79-103 Grafarend, E.W. and B. Schaffrin (1989): The geometry of nonlinear adjustment - the planar trisection problem, Festschrift to Torben Krarup eds. E. Kejlso, K. Poder and C.C. Tscherning, Geodaetisk Institut, Meddelelse No. 58, pages 149-172, Kobenhavn 1989
Grafarend, E.W. and B. Schaffrin (1991): The planar trisection problem and the impact of curvature on non-linear least-squares estimation, Comput. Stat. Data Anal. 12 (1991) 187-199 Grafarend, E.W. and B. Schaffrin (1993): Ausgleichungsrechnung in linearen Modellen, Brockhaus, Mannheim 1993 Grafarend, E.W. and G. Offermanns (1975): Eine Lotabweichungskarte Westdeutschlands nach einem geodätisch konsistenten Kolmo-gorov-Wiener Modell, Deutsche Geodätische Kommission bei der Bayerischen Akademie der Wissenschaften A 82, München 1975 Grafarend, E.W. and P. Xu (1994): Observability analysis of integrated INS/GPS system, Bollettino di Geodesia e Scienze Affini 103 (1994) 266-284 Grafarend, E.W. and P. Xu (1995): A multi-objective second-order optimal design for deforming networks, Geoph. Journal Int. 120 (1995) 577-589 Grafarend, E.W., Kleusberg, A. and B. Schaffrin (1980): An introduction to the variancecovariance- component estimation of Helmert type, Z. Vermessungswesen 105 (1980) 161-180 Grafarend, E.W., Krarup, T. and R. Syffus (1996): An algorithm for the inverse of a multivariate homogeneous polynomial of degree n, J. Geodesy 70 (1996) 276-286 Grafarend, E.W., Krumm, F. and F. Okeke (1995): Curvilinear geodetic datum transformations, Z. Vermessungswesen 120 (1995) 334-350 Grafarend, E.W., Knickemeyer, E.H. and B. Schaffrin (1982): Geodätische Datumstransformationen, Z. Vermessungswesen 107 (1982) 15-25 Grafarend, E.W., Krumm, F. and B. Schaffrin (1985): Criterion matrices of heterogeneously observed threedimensional networks, Manuscripta Geodaetica 10 (1985) 3-22 Grafarend, E.W., Krumm, F. and B. Schaffrin (1986): Kriterion-Matrizen III: Zweidimensional homogene und isotrope geodätische Netze, Z. Vermessungswesen 111 (1986) 197-207 Grafarend, E.W., Schmitt, G. and B. Schaffrin (1976): Über die Optimierung lokaler geodätischer Netze (Optimal design of local geodetic networks), 7th course, High precision Surveying Engineering (7.Int. Kurs für Ingenieurvermessung hoher Präzision) 29 Sept - 8 Oct 1976, Darmstadt 1976 Grafarend, E.W., Mueller, J.J., Papo, H.B. and B. Richter (1979): Concepts for reference frames in geodesy and geodynamics: the reference directions, Bulletin Géodésique 53 (1979) 195-213 Graham, A. (1981): Kronecker products and matrix calculus, J. Wiley, New York 1981 Gram, J.P. (1883): Über die Entwicklung reeller Funktionen in Reihen mittelst der Methode der kleinsten Quadrate, J. Reine Angew. Math. 94 (1883) 41-73 Granger, C.W.J. and P. Newbold (1986): Forecasting economic time series, 2nd ed., Academic Press, New York 1986 Granger, C.W.J. and T. Teräsvirta (1993): Modelling nonlinear economic relations, Oxford University Press, New York 1993 Graybill, F.A. (1954): On quadratic estimates of variance components, The Annals of Mathematical Statistics 25 (1954) 267-372 Graybill, F.A. (1983): Matrices with applications in statistics, 2nd ed., Wadsworth, Beltmont 1983 Graybill, F.A. and R.A. Hultquist (1961): Theorems concerning Eisenhart’s model II, The Annals of Mathematical Statistics 32 (1961) 261-269 Green, B. (1952): The orthogonal approximation of an oblique structure in factor analysis, Psychometrika 17 (1952) 429-440 Green, P.J. and B.W. Silverman (1993): Nonparametric regression and generalized linear models, Chapman and Hall, Boca Raton 1993
Greenbaum, A. (1997): Iterative methods for solving linear systems, SIAM, Philadelphia 1997 Greenwood, J.A. and D. Durand (1955): The distribution of length and components of the sum of n random unit vectors, Ann. Math. Statist. 26 (1955) 233-246 Greenwood, P.E. and G. Hooghiemstra (1991): On the domain of an operator between supremum and sum, Probability Theory Related Fields 89 (1991) 201-210 Grenander, U. (1981): Abstract inference, Wiley, New York 1981 Griffith, D.F. and D.J. Higham (1997): Learning LaTeX, SIAM, Philadelphia 1997 Grimstad, A-A. and T. Mannseth (2000): Nonlinearity, scale and sensitivity for parameter estimation problems, SIAM J. Sci. Comput. 21 (2000) 2096-2113 Grodecki, J. (1999): Generalized maximum-likelihood estimation of variance components with inverted gamma prior, J. Geodesy 73 (1999) 367-374 Groechenig, K. (2001): Foundations of time-frequency analysis, Birkäuser Verlag, Boston-Basel-Berlin 2001 Gross, J. (1996a): On a class of estimators in the general Gauss-Markov model, Commun. Statist. – Theory Meth. 25 (1996) 381-388 Gross, J. (1996b): Estimation using the linear regression model with incomplete ellipsoidal restrictions, Acta Applicandae Mathematicae 43 (1996) 81-85 Gross, J. (1998): Statistical estimation by a linear combination of two given statistics, Statistics and Probability Lett. 39 (1998) 379-384 Gross, J. and G. Trenkler (1997): When do linear transforms of ordinary least squares and Gauss-Markov estimator coincide?, Sankhya 59 (1997) 175-178 Gross, J., Trenkler, G. and E.P. Liski (1998): Necessary and sufficient conditions for superiority of misspecified restricted least squares regression estimator, J. Statist. Planning and Inference 71 (1998) 109-116 Gross, J., Trenkler, G. and H.J. Werner (2001): The equality of linear transforms of the ordinary least squares estimator and the best linear unbiased estimator, The Indian J. Statistics 63 (2001) 118-127 Grossmann, W. (1973): Grundzüge der Ausgleichungsrechnung, Springer-Verlag, Berlin 1973 Grubbs, F.E. (1973): Errors of measurement, precision, accuracy and the statistical comparison of measuring instruments, Technometrics 15 (1973) 53-66 Grubbs, F.E. and G. Beck (1972): Extension of sample sizes and percentage points for significance tests of outlying observations, Technometrics 14 (1972) 847-854 Guenther, W.C. (1964): Another derivation of the non-central Chi-Square distribution, J. the American Statistical Association 59 (1964) 957-960 Guérin, C.-A. (2000): Wavelet analysis and covariance structure of some classes of nonstationary processes, The J.Fourier Analysis and Applications 4 (2000) 403-425 Gui, Q. and J. Zhang (1998): Robust biased estimation and its applications in geodetic adjustments, J. Geodesy 72 (1998) 430-435 Gui, Q.M. and J.S. Liu (2000): Biased estimation in the Gauss-Markov model, Allg. Vermessungsnachrichten 107 (2000) 104-108 Gulliksson, M. and P.A. Wedin (2000): The use and properties of Tikhonov filter matrices, SIAM J. Matrix Anal. Appl. 22 (2000) 276-281 Gulliksson, M., Soederkvist, I. and P.A. Wedin (1997): Algorithms for constrained and weighted nonlinear least-squares, Siam J. Optim. 7 (1997) 208-224 Gumbel, E.J., Greenwood, J.A. and D. Durand (1953): The circular normal distribution: theory and tables, J. Amer. Statist. Assoc. 48 (1953) 131-152 Guolin, L. (2000): Nonlinear curvature measures of strength and nonlinear diagnosis, Allg. Vermessungsnachrichten 107 (2000) 109-111
Guolin, L., Jinyun, G. and T. Huaxue (2000): Two kinds of explicit methods to nonlinear adjustments of free-networks with rank deficiency, Geomatics Research Australasia 73 (2000) 25-32 Guolin, L., Lianpeng, Z. and J. Tao (2001): Linear Space [L,M] N and the law of generalized variance-covariance propagation, Allg. Vermessungsnachrichten 10 (2001) 352356 Gupta, A.K. and D.G. Kabe (1997): Linear restrictions and two step multivariate least squares with aplications, Statistics & Probability Letters 32 (1997) 413-416 Gupta, A.K. and D.G. Kabe (1999a): On multivariate Liouville distribution, Metron 57 (1999) 173-179 Gupta, A.K. and D.G. Kabe (1999b): Distributions of hotelling’s T2 and multiple and partial correlation coefficients for the mixture of two multivariate Gaussian populations, Statistics 32 (1999) 331-339 Gupta, A.K. and D.K. Nagar (1998): Quadratic forms in disguised matrix T-variate, Statistics 30 (1998) 357-374 Gupta, S.S. (1963): Probability integrals of multivariate normal and multivariate t 1, Annals of Mathematical Statistics 34 (1963) 792-828 Gut, A. (2002): On the moment problem, Bernoulli 8 (2002) 407-421 Guttman, I. (1982): Linear models: An Introduction, J. Willey & Sons 1982 Guttman, L. (1946): Enlargement methods for computing the inverse matrix, Ann. Math. Statist. 17 (1946) 336-343 Guu, S.M., Lur, Y.Y. and C.T. Pang (2001): On infinite products of fuzzy matrices, SIAM J. Matrix Anal. Appl. 22 (2001) 1190-1203 Haantjes, J. (1937): Conformal representations of an n-dimensional Euclidean space with a non-definite fundamental form on itself, in: Nederl. Akademie van Wetenschappen, Proc. Section of Sciences, vol. 40, pages 700-705, Noord-Hollandsche Uitgeversmaatschappij, Amsterdam 1937 Haantjes, J. (1940): Die Gleichberechtigung gleichförmig beschleunigter Beobachter für die elektromagnetischen Erscheinungen, in: Nederl. Akademie van Wetenschappen, Proc. Section of Sciences, vol. 43, pages 1288-1299, Noord-Hollandsche Uitgeversmaatschappij, Amsterdam 1940 Habermann, S.J. (1996): Advanced statistics, volmue I: description of populations, Springer Verlag, New York 1996 Hadamard, J. (1899): Theorem sur les series entieres, Acta Math. 22 (1899) 1-28 Haerdle, W., Liang, H. and J. Gao (2000): Partially linear models, Physica-Verlag, Heidelberg 2000 Hager, W.W. (1989): Updating the inverse of a matrix, SIAM Rev. 31 (1989) 221-239 Hager, W.W. (2000): Iterative methods for nearly singular linear systems, SIAM J. Sci. Comput. 22 (2000) 747-766 Hager, W.W. (2002): Minimizing the profile of a symmetric matrix, SIAM J. Sci. Comput. 23 (2002) 1799-1816 Hahn, M. and R. Bill (1984): Ein Vergleich der L1- und L2 - Norm am Beispiel Helmerttransformation, Allg. Vermessungsnachrichten 91 (1984) 441-450 Hahn, W. and P. Weibel (1996): Evolutionäre Symmetrietheorie, Wiss. Verlagsgesellschaft, Stuttgart 1996 Haimo, D. (eds) (1967): Orthogonal expansions and their continuous analogues, Southern Illinois University Press, Carbondale 1967 Haines, G.V. (1985): Spherical cap harmonic analysis, J. Geophysical Research 90 (1985) 2583-2591
Hald, A. (1998): A history of mathematical statistics from 1750 to 1930, J. Wiley, New York 1998 Hald, A. (2000): The early history of the cumulants and the Gram-Charlier series, International Statistical Review 68 (2000) 137-153 Halmos, P.R. (1946): The theory of unbiased estimation, Ann. Math. Statist. 17 (1946) 34-43 Hammersley, J.M. (1950): On estimating restricted parameters, J.R. Statist. Soc. (B) 12 (1950) 192Hampel, F.R. (1973): Robust estimation: a condensed partial survey, Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete 27 (1973) 87-104 Hampel, F.R., Ronchetti, E.M., Rousseeuw, P.J. and W.A. Stahel (1986): Robust statistics, J. Wiley, New York 1986 Hanagal, D.D. (1996): UMPU tests for testing symmetry and stress-passing in some bivariate exponential models, Statistics 28 (1996) 227-239 Hand, D.J. and M.J. Crowder (1996): Practical longitudinal data analysis, Chapman and Hall, Boca Raton 1996 Hand, D.J., Daly, F., McConway, K., Lunn, D. and E. Ostrowski (1993): Handbook of small data sets, Chapman and Hall, Boca Raton 1993 Hand, D.J. and C.C. Taylor (1987): Multivariate analysis of variance and repeated measures, Chapman and Hall, Boca Raton 1987 Handl, A.: Multivariate Analysemethoden. Theorie und Praxis multivariater Verfahren unter besonderer Berücksichtigung von S-PLUS, Springer-Verlag Hanke, M. (1991): Accelerated Landweber iterations for the solution of ill-posed equations, Numer. Math. 60 (1991) 341-375 Hanke, M. and P.C. Hansen (1993): Regularization methods for large-scale problems, Surveys Math. Indust. 3 (1993) 253-315 Hansen, P.C. (1987): The truncated SVD as a method for regularization, BIT 27 (1987) 534-553 Hansen, P.C. (1990): The discrete Picard condition for discrete ill-posed problems, BIT 30 (1990) 658-672 Hansen, P.C. (1990): Truncated singular value decomposition solutions to discrete illposed problems with ill-determined numerical rank, SIAM J. Sci. Statist. Comput. 11 (1990) 503-518 Hansen, P.C. (1994): Regularization tools: a matlab package for analysis and solution of discrete ill-posed problems, Numer. Algorithms 6 (1994) 1-35 Hansen, P.C. (1995): Test matrices for regularization methods, SIAM J. Sci. Comput. 16 (1995) 506-512 Hansen, P.C. (1998): Rank-deficient and discrete ILL-posed problems, SIAM, Philadelphia 1998 Hardtwig, E. (1968): Fehler- und Ausgleichsrechung, Bibliographisches Institut, Mannheim 1968 Harley, B.I. (1956): Some properties of an angular transformation for the correlation coefficient, Biometrika 43 (1956) 219-223 Harter, H.L. (1964): Criteria for best substitute interval estimators with an application to the normal distribution, J. Amer. Statist. Assoc 59 (1964) 1133-1140 Harter, H.L. (1974/75): The method of least squares and some alternatives (five parts) International Statistics Review 42 (1974) 147-174, 235-264, 43 (1975) 1-44, 125-190, 269-278 Harter, H.L. (1977): The non-uniqueness of absolute values regression, Commun. Statist. Simul. Comput. 6 (1977) 829-838
Hartley, H.O. and J.N.K. Rao (1967): Maximum likelihood estimation for the mixed analysis of variance model, Biometrika 54 (1967) 93-108 Hartman, P. and G.S. Watson (1974): „Normal“ distribution functions on spheres and the modified Bessel function, Ann. Prob. 2 (1974) 593-607 Hartmann, C., Van Keer Berghen P., Smeyersverbeke, J. and D.L. Massart (1997): Robust orthogonal regression for the outlier detection when comparing two series of measurement results, Analytica Chimica Acta 344 (1997) 17-28 Hartung, J. (1981): Non-negative minimum biased invariant estimation in variance componet models, Annals of Statistics 9 (1981) 278-292 Hartung, J. (1999): Ordnungserhaltende positive Varianzschätzer bei gepaarten Messungen ohne Wiederholungen, Allg. Statistisches Archiv 83 (1999) 230-247 Hartung, J. and B. Elpelt (1989): Multivariate Statistik, Oldenbourg Verlag, München 1989 Hartung, J. and K.H. Jöckel (1982): Zuverlässigkeits- und Wirtschaftlichkeitsüberlegungen bei Straßenverkehrssignalanlagen, Qualität und Zuverlässigkeit 27 (1982) 65-68 Hartung, J. and D. Kalin (1980): Zur Zuverlässigkeit von Straßenverkehrssignalanlagen, Qualität und Zuverlässigkeit 25 (1980) 305-308 Hartung, J. and B. Voet (1986): Best invariant unbiased estimators for the mean squared error of variance component estimators, J. American Statist. Assoc. 81 (1986) 689691 Hartung, J. and H.J. Werner (1980): Zur Verwendung der restringierten Moore-PenroseInversen beim Testen von linearen Hypothesen, Z. Angew. Math. Mechanik 60 (1980) T344-T346 Hartung, J., Elpelt, B. and K.H. Klösener (1995): Statistik, Oldenbourg Verlag, München 1995 Hartung, J. et al (1982): Statistik, R. Oldenbourg Verlag, München 1982 Harvey, A.C. (1993): Time series models, 2nd ed., Harvester Wheatsheaf, New York 1993 Harville, D.A. (1976): Extension of the Gauss-Markov theorem to include the estimation of random effects, Annals of Statistics 4 (1976) 384-395 Harville, D.A. (1977): Maximum likelihood approaches to variance component estimation and to related problems, J.the American Statistical Association 72 (1977) 320-339 Harville, D.A. (1997): Matrix algebra from a statistician’s perspective, Springer Verlag, New York 1997 Harville, D.A. (2001): Matrix algebra: exercises and solutions, Springer Verlag, New York 2001 Hasssanein, K.M. and E.F. Brown (1996): Moments of order statistics from the rayleigh distribution, J. Statistical Research 30 (1996) 133-152 Hassibi, A. and S. Boyd (1998): Integer parameter estimation in linear models with applications to GPS, JEEE Trans. on Signal Processing 46 (1998) 2938-2952 Haslett, J. and K. Hayes (1998): Residuals for the linear model with general covariance structure, J. Royal Statistical Soc. B60 (1998) 201-215 Hastie, T.J. and R.J. Tibshirani (1990): Generalized additive models, Chapman and Hall, Boca Raton 1990 Hauser, M.A., Pötscher, B.M. and E. Reschenhofer (1999): Measuring persistence in aggregate output: ARMA models, fractionally integrated ARMA models and nonparametric procedures, Empirical Economics 24 (1999) 243-269 Haussdorff, F. (1901): Beiträge zur Wahrscheinlichkeitsrechnung, Königlich Sächsische Gesellschaft der Wissenschaften zu Leipzig, berichte Math. Phys. Chasse 53 (1901) 152-178
Hawkins, D.M. (1993): The accuracy of elemental set approximation for regression, J. Amer. Statist. Assoc. 88 (1993) 580-589 Hayes, K. and J. Haslett (1999): Simplifying general least squares, American Statistician 53 (1999) 376-381 He, K. (1995): The robustness of bootstrap estimator of variance, J. Ital. Statist. Soc. 2 (1995) 183-193 He, X. (1991): A local breakdown property of robust tests in linear regression, J. Multivar. Analysis 38, 294-305, 1991 He, X., Simpson, D.G. and Portnoy, S.L. (1990): Breakdown robustness of tests, J. Am. Statis. Assn 85, 446-452, 1990 Healy, D.M. (1998): Spherical Deconvolution, J. Multivariate Analysis 67 (1998) 1-22 Heck, B. (1981): Der Einfluss einzelner Beobachtungen auf das Ergebnis einer Ausgleichung und die Suche nach Ausreißern in den Beobachtungen, Allg. Vermessungsnachrichten 88 (1981) 17-34 Heideman, M.T., Johnson, D.H. and C.S. Burrus (1984): Gauss and the history of the fast Fourier transform, JEEE ASSP Magazine 1 (1984) 14-21 Heiligers, B. (1994): E-optimal designs in weighted polynomial regression, Ann. Stat. 22 (1994) 917-929 Heine, V. (1955): Models for two-dimensional stationary stochastic processes, Biometrika 42 (1955) 170-178 Heinrich, L. (1985): Nonuniform estimates, moderate and large derivations in the central limit theorem for m-dependent random variable, Math. Nachr. 121 (1985) 107-121 Hekimoglu, S. (1998): Application of equiredundancy design to M-estimation, J.Surveying Engineering 124 (1998) 103-124 Hekimoglu, S. (2005): Do robust methods identify outliers more reliably than conventional tests for outliers, Z. Vermessungswesen 3 (2005) 174-180 Hekimoglu, S. and M. Berber (2003): Effectiveness of robust methods in heterogeneous linear models, J. Geodesy 76 (2003) 706-713 Hekimoglu, S and K.-R. Koch (2000): How can reliability of the test for outliers be measured?, Allg. Vermessungsnachrichten 7 (2000) 247-253 Helmert, F.R. (1875): Über die Berechnung des wahrscheinlichen Fehlers aus einer endlichen Anzahl wahrer Beobachtungsfehler, Z. Math. U. Physik 20 (1875) 300-303 Helmert, F.R. (1876): Diskussion der Beobachtungsfehler in Koppes Vermessung für die Gotthardtunnelachse, Z. Vermessungswesen 5 (1876) 129-155 Helmert, F.R. (1876a): Die Genauigkeit der Formel von Peters zur Berechnung des wahrscheinlichen Fehlers direkter Beobachtungen gleicher Genauigkeit, Astron. Nachrichten 88 (1976) 113-132 Helmert, F.R. (1876b): Über die Wahrscheinlichkeit der Potenzsummen der Beobachtungsfehler, Z. Math. U. Phys. 21 (1876) 192-218 Helmert, F.R. (1907): Die Ausgleichungsrechnung nach der Methode der kleinsten Quadrate, mit Anwendungen auf die Geodäsie, die Physik und die Theorie der Messinstrumente, B.G. Teubner, Leipzig – Berlin 1907 Henderson, H.V. (1981): The vec-permutation matrix, the vec operator and Kronecker products: a review, Linear and Multilinear Algebra 9 (1981) 271-288 Henderson, H.V. and S.R. Searle (1981a): Vec and vech operators for matrices, with some uses in Jacobians and multivariate statistics Henderson, H.V. and S.R. Searle (1981b): On deriving the inverse of a sum of matrices, SIAM Review 23 (1981) 53-60 Henderson, H.V., Pukelsheim, F. and S.R. Searle (1983): On the history of the Kronecker product, Linear and Multilinear Algebra 14 (1983) 113-120
Hendriks, H. and Z. Landsman (1998): Mean location and sample mean location on manifolds: Asymptotics, tests, confidence regions, J. Multivar. Analysis 67 (1998) 227-243 Hengst, M. (1967): Einführung in die Mathematische Statistik und ihre Anwendung, Bibliographisches Institut, Mannheim 1967 Henrici, P. (1962): Bounds for iterates, inverses, spectral variation and fields of values of non-normal matrices, Numer. Math. 4 (1962) 24-40 Herzberg, A.M. and A.V. Tsukanov (1999): A note on the choice of the best selection criterion for the optimal regression model, Utilitas Mathematica 55 (1999) 243-254 Hesse, K. (2003): Domain decomposition methods in multiscale geopotential determination from SST and SGG, Berichte aus der Mathematik, Shaker Verlag, Aachen 2003 Hetherington, T.J. (1981): Analysis of directional data by exponential models, PhD. Thesis, University of California, Berkeley 1981 Hext, G.R. (1963): The estimation of second-order tensors, with related tests and designs, Biometrika 50 (1963) 353-373 Heyde, C.C. (1997): Quasi-likelihood and its application. A general approach to optimal parameter estimation, Springer Verlag, New York 1997 Hickernell, F.J. (1999): Goodness-of-fit statistics, discrepancies and robust designs, Statistics and Probability Letters 44 (1999) 73-78 Hida, T. and S. Si (2004): An innovation approach to random field, Application to white noise theory, Probability and Statistics 2004 Higham, N.J. and F. Tisseur (2000): A block algorithm for matrix 1-norm estimation, with an application to 1-norm pseudospectra, SIAM J. Matrix Anal. Appl. 21 (2000) 11851201 Hinde, J. (1998): Overdispersion: models and estimation, Comput. Stat. & Data Anal. 27 (1998) 151-170 Hinkelmann, K. (ed) (1984): Experimental design, statistical models, and genetic statistics, Marcel Dekker, Inc. 1984 Hinkley, D. (1979): Predictive likelihood, Ann. Statist. 7 (1979) 718-728 Hinkley, D., Reid, N. and E.J. Snell (1990): Statistical theory and modelling, Chapman and Hall, Boca Raton 1990 Hjorth, J.S.U. (1993): Computer intensive statistical methods, Chapman and Hall, Boca Raton 1993 Ho, L.L. (1997): Regression models for bivariate counts, Brazilian J. Probability and Statistics 11 (1997) 175-197 Hoaglin, D.C. and R.E. Welsh (1978): The Hat Matrix in regression and ANOVA, The American Statistician 32 (1978) 17-22 Hocking, R.R. (1996): Methods and applications of linear models – regression and the analysis of variance, John Wiley & Sons. Inc 1996 Hodge, W. and D. Pedoe (1968): Methods of algebraic geometry, I, Cambridge University Press, Cambridge 1968 Hoel, P.G. (1965): Minimax distance designs in two dimensional regression, Ann. Math. Statist. 36 (1965) 1097-1106 Hoel, P.G., S.C. Port and C.J. Stone (1972): Introduction to stochastic processes, Houghton Mifflin Publ., Boston 1972 Hoerl, A.E. and R.W. Kennard (2000): Ridge regression: biased estimation for nonorthogonal problems, Technometrics 42 (2000) 80-86 Hoepke, W. (1980): Fehlerlehre und Ausgleichungsrechnung, De Gruyter, Berlin 1980 Hoffmann, K. (1992): Improved estimation of distribution parameters: Stein-type estimators, Teubner-Texte zur Mathematik, Stuttgart/Leipzig 1992
Hofmann, B. (1986): Regularization for applied inverse and ill-posed problems, Teubner Texte zur Mathematik 85, Leipzig 1986 Hogg, R.V. (1972): Adaptive robust procedures: a partial review and some suggestions for future applications and theory, J. American Statistical Association 43 (1972) 10411067 Hogg, R.V. (1974): Adaptive robust procedures: a partial review and some suggestions for future applications and theory, J. American Statistical Association 69 (1974) 909923 Hogg, R.V. and R.H. Randles (1975): Adaptive distribution free regression methods and their applications, Technometrics 17 (1975) 399-407 Holota, P. (2001): Variational methods in the representation of the gravitational potential, Cahiers du Centre Européen de Géodynamique et de Sésmologie 2001 Holota, P. (2002): Green’s function and external masses in the solution of geodetic boundary-value problems, Presented at the 3rd Meeting of the IAG Intl. Gravity and Geoid Comission, Thessaloniki, Greece, August 26-30, 2002 Holschneider, M. (2000): Introduction to continuous wavelet analysis, in: Klees, R. and R. Haagmans (eds): Wavelets in the geosciences, Springer 2000 Hong, C.S. and H.J. Choi (1997): On L1 regression coefficients, Commun. Statist. Simul. Comp. 26 (1997) 531-537 Hora, R.B. and R.J. Buehler (1965): Fiducial theory and invariant estimation, Ann. Math. Statist. 37 (1965) 643-656 Horn, R.A. (1989): The Hadamard product, in Matrix Theory and Applications, C.R. Johnson, ed., Proc. Sympos. Appl. Math. 40 (1989) 87-169 Horn, R.A. and C.R. Johnson (1990): Matrix analysis, Cambridge University Press, Cambridge 1990 Horn, R.A. and C.R. Johnson (1991): Topics on Matrix analysis, Cambridge University Press, Cambridge 1991 Hornoch, A.T. (1950): Über die Zurückführung der Methode der kleinsten Quadrate auf das Prinzip des arithmetischen Mittels, Österr. Z. Vermessungswesen 38 (1950) 13-18 Hosking, J.R.M. and J.R. Wallis (1997): Regional frequency analysis. An approach based on L-moments, Cambridge University Press 1997 Hosoda, Y. (1999): Truncated least-squares least norm solutions by applying the QR decomposition twice, trans. Inform. Process. Soc. Japan 40 (1999) 1051-1055 Hotelling, H. (1953): New light on the correlation coefficient and its transform, J. Royal Stat. Society, Series B, 15 (1953) 225-232 Hoyle, M.H. (1973): Transformations- an introduction and a bibliography, Int. Statist. Review 41 (1973) 203-223 Hsu, J.C. (1996): Multiple comparisons, Chapman and Hall, Boca Raton 1996 Hsu, P.L. (1940): An algebraic derivation of the distribution of rectangular coordinates, Proc. Edinburgh Math. Soc. 6 (1940) 185-189 Hsu, R. (1999): An alternative expression for the variance factors in using Iterated Almost Unbiased Estimation, J. Geodesy 73 (1999) 173-179 Hsu, Y.S., Metry, M.H. and Y.L. Tong (1999): Hypotheses testing for means of dependent and heterogeneous normal random variables, J. Statist. Planning and Inference 78 (1999) 89-99 Huang, J.S. (1999): Third-order expansion of mean squared error of medians, Statistics & Probability Letters 42 (1999) 185-192 Huber, P.J. (1964): Robust estimation of a location parameter, Annals Mathematical Statistics 35 (1964) 73-101
Huber, P.J. (1972): Robust statistics: a review, Annals Mathematical Statistics 43 (1972) 1041-1067 Huber, P.J. (1981): Robust Statistics, J. Wiley, New York 1981 Huda, S. and A.A. Al-Shiha (1999): On D-optimal designs for estimating slope, The Indian J. Statistics 61 (1999) 485-495 Huet, S., A. Bouvier, M.A. Gruet and E. Jolivet (1996): Statistical tools for nonlinear regression, Springer Verlag, New York 1996 Hunter, D.B. (1995): The evaluation of Legendre functions of the second kind, Numerical Algorithms 10 (1995) 41-49 Huwang, L. and Y.H.S. Huang (2000): On errors-in-variables in polynomical regressionBerkson case, Statistica Sinica 10 (2000) 923-936 Hwang, C. (1993): Fast algorithm for the formation of normal equations in a least-squares spherical harmonic analysis by FFT, Manuscripta Geodaetica 18 (1993) 46-52 Ibragimov, F.A. and R.Z. Kasminskii (1981): Statistical estimation, asymptotic theory, Springer Verlag, New York 1981 Ihorst, G. and G. Trenkler (1996): A general investigation of mean square error matrix superiority in linear regression, Statistica 56 (1996) 15-23 Imhof, L. (2000): Exact designs minimising the integrated variance in quadratic regression, Statistics 34 (2000) 103-115 Imhof, J.P. (1961): Computing the distribution of quadratic forms in normal variables, Biometrika 48 (1961) 419-426 Inda, M.A. de et al (1999): Parallel fast Legendre transform, proceedings of the ECMWF Workshop “Towards TeraComputing – the Use of Parallel Processors in Meteorology”, Worls Scientific Publishing Co 1999 Irle, A. (1990): Sequentialanalyse: Optimale sequentielle Tests, Teubner Skripten zur Mathematischen Stochastik. Stuttgart 1990 Irle, A. (2001): Wahrscheinlichkeitstheorie und Statistik, Teubner 2001 Irwin, J.O. (1927): On the frequency distribution of the means of samples from a population having any law of frequency with finite moments with special reference to Pearson’s Type II, Biometrika 19 (1927) 225-239 Isham, V. (1993): Statistical aspects of chaos, in: Networks and Chaos, Statistical and Probabilistic Aspects (ed. D.E. Barndorff-Nielsen et al) 124-200, Chapman and Hall, London 1993 Ishibuchi, H., Nozaki, K. and H. Tanaka (1992): Distributed representation of fuzzy rules and its application to pattern classification, Fuzzy Sets and Systems 52 (1992) 21-32 Ishibuchi, H., Nozaki, K., Yamamoto, N. and H. Tanaka (1995): Selecting fuzzy if-then rules for classification problems using genetic algorithms, IEEE Transactions on Fuzzy Systems 3 (1995) 260-270 Ishibuchi, H. and T. Murata (1997): Minimizing the fuzzy rule base and maximizing its performance by a multi-objective genetic algorithm, in: Sixth FUZZ-IEEE Conference, Barcelona 1997, pp. 259-264 Izenman, A.J. (1975): Reduced-rank regression for the multivariate linear model, J. Multivariate Analysis 5 (1975) 248-264 Jacob, N. (1996): Pseudo-differential operators and Markov processes, Akademie Verlag, Berlin 1996 Jacobi, C.G.J. (1841): Deformatione et proprietatibus determinatum, Crelle's J. reine angewandte Mathematik, Bd.22 Jacod, J. and P. Protter (2000): Probability essentials, Springer Verlag, Berlin 2000 Jaeckel, L.A. (1972): Estimating regression coefficients by minimizing the dispersion of the residuals, Annals Mathematical Statistics 43 (1972) 1449-1458
Jajuga, K. (1995): On the principal components of time series, Statistics in Transition 2 (1995) 201-205 James, A.T. (1954): Normal multivariate analysis and the orthogonal group, Ann. Math. Statist. 25 (1954) 40-75 Jammalamadaka, S.R. and A.SenGupta (2001): Topics in circular statistics, World Scientific, Singapore 2001 Janacek, G. (2001): Practical time series, Arnold, London 2001 Jennison, C. and B.W. Turnbull (1997): Distribution theory of group sequential t, x2 and F-Tests for general linear models, Sequential Analysis 16 (1997) 295-317 Jennrich, R.I. (1969): Asymptotic properties of nonlinear least squares estimation, Ann. Math. Statist. 40 (1969) 633-643 Jensen, J.L. (1981): On the hyperboloid distribution, Scand. J. Statist. 8 (1981) 193-206 Jiancheng, L., Dingbo, C. and N. Jinsheng (1995): Spherical cap harmonic expansion for local gravity field representation, Manuscripta Geodaetica 20 (1995) 265-277 Jiang, J. (1997): A derivation of BLUP - Best linear unbiased predictor, Statistics & Probability Letters 32 (1997) 321-324 Jiang, J. (1999): On unbiasedness of the empirical BLUE and BLUP, Statistics & Probability 41 (1999) 19-24 Jiang, J., Jia, H. and H. Chen (2001): Maximum posterior estimation of random effects in generalized linear mixed models, Statistica Sinica 11 (2001) 97-120 Joe, H. (1997): Multivariate models and dependence concepts, Chapman and Hall, Boca Raton 1997 John, P.W.M. (1998): Statistical design and analysis of experiments, SIAM 1998 Johnson, N.L. and S. Kotz (1970) : Continuous univariate distributions-1, distributions in statistics, Houghton Mifflin Company Boston 1970 Johnson, N.L., Kotz, S. and A.W. Kemp (1992): Univariate discrete distributions, J. Willey & Sons 1992 Joergensen, B. (1984): The delta algorithm and GLIM, Int. Statist. Review 52 (1984) 283300 Joergensen, B. (1997): The theory of dispersion models, Chapman and Hall, Boca Raton 1997 Joergensen, B., Lundbye-Christensen, S., Song, P.X.-K. and L. Sun (1996b): State space models for multivariate longitudinal data of mixed types, Canad. J. Statist. 24 (1996b) 385-402 Jorgensen, P.C., Kubik, K., Frederiksen, P. and W. Weng (1985): Ah, robust estimation!, Australian J. Geodesy, Photogrammetry and Surveying 42 (1985) 19-32 John, S. (1962): A tolerance region for multivariate normal distributions, Sankya A24 (1962) 363-368 Johnson, N.L. and S. Kotz (1970a): Continuous univariate distributions – 2, Houghton Mifflin Company, Boston 1970 Johnson, N.L. and S. Kotz (1970b): Discrete distribution, Houghton Mifflin Company, Boston 1970 Johnson, N.L. and S. Kotz (1972): Distributions in statistics: continuous multivariate distributions, J. Wiley, New York 1972 Johnson, N.L., Kotz, S. and X. Wu (1991): Inspection errors for attributes in quality control, Chapman and Hall, Boca Raton 1991 Joshi, V.M. (1966): Admissibility of confidence intervals, Ann. Math. Statist. 37 (1966) 629-638 Judge, G.G. and M.E. Bock (1978): The statistical implications of pre-test and Stein-rule estimators in econometrics, Amsterdam 1978
Judge, G.G. and T.A. Yancey (1981): Sampling properties of an inequality restricted estimator, Economics Lett. 7 (1981) 327-333 Judge, G.G. and T.A. Yancey (1986): Improved methods of inference in econometrics, Amsterdam 1986 Jukiü, D. and R. Scitovski (1997): Existence of optimal solution for exponential model by least squares, J. Comput. Appl. Math. 78 (1997) 317-328 Jupp, P.E. and K.V. Mardia (1980): A general correlation coefficient for directional data and related regression problems, Biometrika 67 (1980) 163-173 Jupp, P.E. and K.V. Mardia (1989): A unified view of the theory of directional statistics, 1975-1988, International Statist. Rev. 57 (1989) 261-294 Jureckova, J. (1995): Affine- and scale-equivariant M-estimators in linear model, Probability and Mathematical Statistics 15 (1995) 397-407 Jurisch, R. and G. Kampmann (1997): Eine Verallgemeinerung des arithmetischen Mittels für einen Freiheitsgrad bei der Ausgleichung nach vermittelnden Beobachtungen, Z. Vermessungswesen 11 (1997) 509-520 Jurisch, R. and G. Kampmann (1998): Vermittelnde Ausgleichungsrechnung mit balancierten Beobachtungen – erste Schritte zu einem neuen Ansatz, Z. Vermessungswesen 123 (1998) 87-92 Jurisch, R. and G. Kampmann (2001): Plücker-Koordinaten – ein neues Hilfsmittel zur Geometrie- Analyse und Ausreissersuche, Vermessung, Photogrammetrie und Kulturtechnik 3 (2001) 146-150 Jurisch, R. and G. Kampmann (2002): Teilredundanzen und ihre natürlichen Verallgemeinerungen, Z. Vermessungswesen 127 (2002) 117-123 Jurisch, R., Kampmann, G. and B. Krause (1997): Über eine Eigenschaft der Methode der kleinsten Quadrate unter Verwendung von balancierten Beobachtungen, Z. Vermessungswesen 122 (1997) 159-166 Jurisch, R., Kampmann, G. and J. Linke (1999a): Über die Analyse von Beobachtungen in der Ausgleichungsrechnung - Teil I, Z. Vermessungswesen 124 (1999) 350-357 Jurisch, R., Kampmann, G. and J. Linke (1999b): Über die Analyse von Beobachtungen in der Ausgleichungsrechnung - Teil II, Z. Vermessungswesen 124 (1999) 350-357 Kagan, A.M., Linnik, J.V. and C.R. Rao (1965): Characterization problems of the normal law based on a property of the sample average, Sankya Ser. A 27 (1965) 405-406 Kagan, A. and L.A. Shepp (1998): Why the variance?, Statist. Prob. Letters 38 (1998) 329-333 Kagan, A. and Z. Landsman (1999): Relation between the covariance and Fisher information matrices, Statistics & Probability Letters 42 (1999) 7-13 Kahn, M., Mackisack, M.S., Osborne, M.R. and G.K. Smyth (1992): On the consistency of Prony's method and related algorithms, J. Comp. and Graph. Statist. 1 (1992) 329349 Kahng, M.W. (1995): Testing outliers in nonlinear regression, J. the Korean Stat. Soc. 24 (1995) 419-437 Kakihara, Y. (2001): The Kolmogorov isomorphism theorem and extensions to some nonstationary processes, D. N. Shanbhag and C.R. Rao, eds., Handbook of Statistic 19 (2001) 443-470 Kallianpur, G. (1963): Von Mises functionals and maximum likelihood estimation, Sankya A25 (1963) 149-158 Kallianpur, G. and Y.-T. Kim (1996): A curious example from statistical differential geometry, Theory Probab. Appl. 43 (1996) 42-62 Kallenberg, O. (1997): Foundations of modern probability, Springer Verlag, New York 1997
Kaminsky, K.S. and P.I. Nelson (1975): Best linear unbiased prediction of order statistics in location and scale families, J. Amer. Statist. Ass. 70 (1975) 145-150 Kaminsky, K.S. and L.S. Rhodin (1985): Maximum likelihood prediction, Ann. Inst. Statist. Math. 37 A (1985), 507-517 Kampmann, G. (1988): Zur kombinativen Norm-Schätzung mit Hilfe der L1-, der L2- und der Boskovic-Laplace-Methode mit den Mittlen der linearen Programmierung, PhD. Thesis, Bonn University, Bonn 1988 Kampmann, G. (1992): Zur numerischen Überführung verschiedener linearer Modelle der Ausgleichungsrechnung, Z. Vermessungswesen 117 (1992), 278-287 Kampmann, G. (1994): Robuste Deformationsanalyse mittels balancierter Ausgleichung, Allg. Vermessungsnachrichten 1 (1994) 8-17 Kampmann, G (1997): Eine Beschreibung der Geometrie von Beobachtungen in der Ausgleichungsrechnung, Z. Vermessungswesen 122 (1997) 369-377 Kampmann, G. and B. Krause (1996): Balanced observations with a straight line fit, Bolletino di Geodesia e Scienze Affini 2 (1996) 134-141 Kampmann, G. and B. Krause (1997a): Minimierung von Residuenfunktionen unter Ganzzahligkeitrestriktionen, Allg. Vermessungsnachrichtungen 8-9 (1997) 325-331 Kampmann, G. and B. Krause (1997b): A breakdown point analysis for the straight line fit based on balanced observations, Bolletino di Geodesia e Scienze Affini 3 (1997) 294-303 Kampmann, G. and B. Krause (2004): Zur statistischen Begründung des Regressionsmodells der balanzierten Ausgleichungsrechnung, Z. Vermessungswesen 129 (2004) 176-183 Kampmann, G. and B. Renner (1999): Über Modellüberführungen bei der linearen Ausgleichungsrechnung, Allg. Vermessungsnachrichten 2 (1999) 42-52 Kampmann, G. and B. Renner (2000): Numerische Beispiele zur Bearbeitung latenter Bedingungen und zur Interpretation von Mehrfachbeobachtungen in der Ausgleichungsrechnung, Z. Vermessungswesen 125 (2000) 190-197 Kanani, E. (2000): Robust estimators for geodetic transformations and GIS, Institut für Geodäsie und Photogrammetrie an der Eidgenössischen Technischen Hochschule Zürich, Mitteilungen Nr. 70, Zürich 2000 Kannan, N. and D. Kundu (1994): On modified EVLP and ML methods for estimating superimposed exponential signals, Signal Processing 39 (1994) 223-233 Kantz, H. and Scheiber, T. (1997): Nonlinear rime series analysis, Cambridge University Press, Cambridge 1997 Karatzas, I. and S.E. Shreve (1991): Brownian motion and stochastic calculus, SpringerVerlag, New York 1991 Karian, Z.A. and E.J. Dudewicz (2000): Fitting statistical distributions, CRC Press 2000 Kariya, T. (1989): Equivariant estimation in a model with an ancillary statistic, Ann. Statist 17 (1989) 920-928 Karlin, S. and W.J. Studden (1966a): Tchebychev systems, Interscience, New York (1966) Karlin, S. and W.J. Studden (1966b): Optimal experimental designs, Ann. Math. Statist. 57 (1966) 783-815 Karr, A.F. (1993): Probability, Springer Verlag, New York 1993 Kasala, S. and T. Mathew (1997): Exact confidence regions and tests in some linear functional relationships, Statistics & Probability Letters 32 (1997) 325-328 Kasietczuk, B. (2000): Geodetic network adjustment by the maximum likelihood method with application of local variance, asymmetry and excess coefficients, Anno LIX Bollettino di Geodesia e Scienze Affini 3 (2000) 221-235
Kass, R.E. and P.W. Vos (1997): Geometrical foundations of asymptotic inference, Wiley, New York 1997 Kastrup, H.A. (1962): Zur physikalischen Deutung und darstellungstheoretischen Analyse der konformen Transformationen von Raum und Zeit, Annalen der Physik 9 (1962) 388-428 Kastrup, H.A. (1966): Gauge properties of the Minkowski space, Physical Review 150 (1966) 1183-1193 Kay, S.M. (1988): Sinusoidal parameter estimation, Prentice Hall, Englewood Cliffs, N.J. 1988 Keller, J.B. (1975): Closest unitary, orthogonal and Hermitian operators to a given operator, Math. Mag. 46 (1975) 192-197 Kelly, R.J. and T. Mathew (1993): Improved estimators of variance components having smaller probability of negativity, J. Royal Stat. Soc. B 55 (1993) 897-911 Kemperman, J.H.B. (1956): Generalized tolerance limits, Ann. Math. Statist. 27 (1956) 180-186 Kendall, D.G. (1974): Pole seeking Brownian motion and bird navigation, Joy. Roy. Stat. Soc. B. 36 (1974) 365-417 Kendall, D.G. (1984): Shape manifolds, Procrustean metrics, and complex projective space, Bulletin of the London Mathematical Society 16 (1984) 81-121 Kendall, M.G. (1960): The evergreen correlation coefficient, pages 274-277, in: Essays on Honor of Harold Hotelling, ed. I. Olkin, Stanford University Press, Stanford 1960 Kenney, C.S., A.J. Laub and M.S. Reese (1998): Statistical condition estimation for linear systems, SIAM J. Scientific Computing 19 (1998) 566-584 Kent, J.T. (1976): Distributions, processes and statistics on “spheres”, PhD. Thesis, University of Cambridge Kent, J.T. (1983): Information gain and a general measure of correlation, Biometrika 70 (1983) 163-173 Kent, J.T. (1997): Consistency of Procrustes estimators, J. R. Statist. Soc. B 59 (1997) 281-290 Kent, J.T. and K.V. Mardia (1997): Consistency of Procrustes estimators, J. R. Statist. Soc. 59 (1997) 281-290 Kent, J.T. and M. Mohammadzadeh (2000): Global optimization of the generalized crossvalidation criterion, Statistics and Computing 10 (2000) 231-236 Khan, R.A. (1973): On some properties of Hammersley’s estimator of an integer mean, The Annals of Statistics 1 (1973) 756-762 Khan, R.A. (1978): A note on the admissibility of Hammersley’s estimator of an integer mean, The Canadian J.Statistics 6 (1978) 113-119 Khan, R.A. (1998): Fixed-width confidence sequences for the normal mean and the binomial probability, Sequential Analysis 17 (1998) 205-217 Khan, R.A. (2000): A note on Hammersley's estimator of an integer mean, J. Statist. Planning and Inference 88 (2000) 37-45 Khatri, C.G. and C.R. Rao (1968): Solutions to some fundamental equations and their applications to characterization of probability distributions, Sankya, Series A, 30 (1968) 167-180 Khatri, C.G. and S.K. Mitra (1976): Hermitian and nonnegative definite solutions of linear matrix equations, SIAM J. Appl. Math. 31 (1976) 597-585 Khuri, A.I. (1999): A necessary condition for a quadratic form to have a chi-squared distribution: an accessible proof, Int. J. Math. Educ. Sci. Technol. 30 (1999) 335-339 Khuri, A.I., Mathew, T. and B.K. Sinha (1998): Statistical tests for mixed linear models, Wiley, New York 1998
Kidd, M. and N.F. Laubscher (1995): Robust confidence intervals for scale and its application to the Rayleigh distribution, South African Statist. J. 29 (1995) 199-217 Kiefer, J. (1974): General equivalence theory for optimal designs (approximate theory), Ann. Stat. 2 (1974) 849-879 Kiefer, J.C. and J. Wolfowitz (1959): Optimum design in regression problem, Ann. Math. Statist. 30 (1959) 271-294 Kilmer, M.E. and D.P. O’Leary (2001): Choosing regularization parameters in iterative methods for ILL-Posed Problems, SIAM J. Matrix Anal. Appl. 22 (2001) 1204-1221 Kim, C. and B.E. Storer (1996): Reference values for Cook’s distance, Commun. Statist. – Simula. 25 (1996) 691-708 King, J.T. and D. Chillingworth (1979): Approximation of generalized inverses by iterated regularization, Numer. Funct. Anal. Optim. 1 (1979) 499-513 King, M.L. (1980): Robust tests for spherical symmetry and their application to least squares regression, Ann. Statist. 8 (1980) 1265-1271 Kirkwood, B.H., Royer, J.Y., Chang,T.C. and R.G.Gordon (1999): Statistical tools for estimating and combining finite rotations and their uncertainties, Geoph. J. Int. 137(1999) 408-428 Kirsch, A. (1996): An introduction to the mathematical theory of inverse problems, Springer Verlag, New York 1996 Kitagawa, G. and W. Gersch (1996): Smoothness priors analysis of time series, Springer Verlag, New York 1996 Klebanov, L.B. (1976): A general definition of unbiasedness, Theory of Probability and Appl. 21 (1976) 571-585 Klees, R., Ditmar, P. and P. Broersen (2003): How to handle colored observation noise in large least-squares problems, J. Geodesy 76 (2003) 629-640 Kleffe, J. (1976): A note on MINQUE for normal models, Math. Operationsforschg. Statist. 7 (1976) 707-714 Kleffe, J. and R. Pincus (1974): Bayes and the best quadratic unbiased estimators for variance components and heteroscedastic variances in linear models, Math. Operationsforschg. Statistik 5 (1974) 147-159 Kleffe, J. and J.N.K. Rao (1986): The existence of asymptotically unbiased nonnegative quadratic estimates of variance components in ANOVA models, J. American Statistical Assoc. 81(1986) 692-698 Kleusberg, A. and E.W. Grafarend (1981): Expectation and variance component estimation of multivariate gyrotheodolite observation II, Allg. Vermessungsnachrichten 88 (1981) 104-108 Klonecki, W. and S. Zontek (1996): Improved estimators for simultaneous estimation of variance components, Statistics & Probability Letters 29 (1996) 33-43 Knautz, H. (1996): Linear plus quadratic (LPQ) quasiminimax estimation in the linear regression model, Acta Applicandae Mathematicae 43 (1996) 97-111 Knautz, H. (1999): Nonlinear unbiased estimation in the linear regression model with nonnormal disturbances, J. Statistical Planning and Inference 81 (1999) 293-309 Knickmeyer, E.H. (1984): Eine approximative Lösung der allgemeinen linearen Geodätischen Randwertaufgabe durch Reihenentwicklungen nach Kugelfunktionen, Deutsche Geodätische Kommission bei der Bayerischen Akademie der Wissenschaften, München 1984 Knobloch, E. (1992): Historical aspects of the foundations of error theory, in: Echeveria, J., Ibarra, J. and T. Mormann (eds): The space of mathematics – philosophical, epistemological and historical explorations, Walter de Gruyter 1992 Kobilinsky, A. (1990): Complex linear models and cyclic designs, Linear Algebra and its Application 127 (1990) 227-282
Koch, G.G. (1968): Some further remarks on “A general approach to the estimation of variance components“, Technometrics 10 (1968) 551-558 Koch, K.R. (1979): Parameter estimation in the Gauß-Helmert model, Boll. Geod. Sci. Affini 38 (1979) 553-563 Koch, K.R. (1982): S-transformations and projections for obtaining estimable parameters, in: Blotwijk, M.J. et al. (eds.): 40 Years of Thought, Anniversary volume for Prof. Baarda’s 65th Birthday Vol. 1. pp. 136-144, Technische Hogeschool Delft, Delft 1982 Koch, K.R. (1987): Parameterschaetzung und Hypothesentests in linearen Modellen, 2nd ed., Duemmler, Bonn 1987 Koch, K.R. (1988): Parameter estimation and hypothesis testing in linear models, Springer-Verlag, Berlin – Heidelberg – New York, 1988 Koch, K.R. (1999): Parameter estimation and hypothesis testing in linear models, 2nd ed., Springer Verlag, Berlin 1999 Koch, K.R. and J. Kusche (2002): Regularization of geopotential determination from satellite data by variance components, J. Geodesy 76 (2002) 259-268 Koch, K.R. and Y. Yang (1998): Konfidenzbereiche und Hypothesentests für robuste Parameterschätzungen, Z. Vermessungswesen 123 (1998) 20-26 König, D. and V. Schmidt (1992): Zufällige Punktprozesse, Teubner Skripten zur Mathematischen Stochastik, Stuttgart 1992 Koenker, R. and G. Basset (1978): Regression quantiles, Econometrica 46 (1978) 33-50 Kokoszka, P. and T. Mikosch (2000): The periodogram at the Fourier frequencies, Stochastic Processes and their Applications 86 (2000) 49-79 Kollo, T. and H. Neudecker (1993): Asymptotics of eigenvalues and unit-length eigenvectors of sample variance and correlation matrices, J. Multivariate Anal. 47 (91993) 283-300 Kollo, T. and D. von Rosen (1996): Formal density expansions via multivariate mixtures, in: Multidimensional statistical analysis and theory of random matrices, pp. 129-138, Proceedings of the Sixth Lukacs Symposium, eds. Gupta, A.K. and V.L.Girko, VSP, Utrecht 1996 Koopmans, T.C. and O. Reiersol (1950): The identification of structural characteristics, Ann. Math. Statistics 21 (1950) 165-181 Kosko, B. (1992): Networks and fuzzy systems, Prentice-Hall, Englewood Cliffs 1992 Kotecky, R. and J. Niederle (1975): Conformally covariant field equations: First order equations with non-vanishing mass, Czech. J. Phys. B25 (1975) 123-149 Kotlarski, I. (1967): On characterizing the gamma and the normal distribution, Pacific J. Mathematics 20 (1967) 69-76 Kotsakis, C. and M.G. Sideris (2001): A modified Wiener-type filter for geodetic estimation problems with non-stationary noise, J. Geodesy 75 (2001) 647-660 Kott, P.S. (1998): A model-based evaluation of several well-known variance estimators for the combined ratio estimator, Statistica Sinica 8 (1998) 1165-1173 Kotz, S., Kozubowski, T.J. and K. Podgórki (2001): The Laplace distribution and generalizations, Birkhäuser 1999 Koukouvinos, C. and J. Seberry (1996): New weighing matrices, Sankhya: The Indian J. Statistics B58 (1996) 221-230 Kowalewski, G. (1995): Robust estimators in regression, Statistics in Transition 2 (1995) 123-135 Krämer, W., Bartels, R. and D.G. Fiebig (1996): Another twist on the equality of OLS and GLS, Statistical Papers 37 (1996) 277-281 Krantz, S.G. and H.R. Parks (2002): The implicit function theorem – history, theory and applications, Birkhäuser, Boston 2002
Krarup, T., Juhl, J. and K. Kubik (1980): Götterdämmerung over least squares adjustment, in: Proc. 14th Congress of the International Society of Photogrammetry, vol. B3, Hamburg 1980, 369-378 Krengel, U. (1985): Ergodic theorems, de Gruyter, Berlin-New York 1985 Kres, H. (1983): Statistical tables for multivariate analysis, Springer, Berlin-HeidelbergNew York 1985 Krishnakumar, J. (1996): Towards a general robust estimation approach for generalised regression models, Physics Abstract, Science Abstract Series A, INSPEC 1996 Kronecker, L. (1903): Vorlesungen über die Theorie der Determinanten, Erster Band, Bearbeitet und fortgeführt von K.Hensch, B.G. Teubner, Leipzig 1903 Krumbein, W.C. (1939): Preferred orientation of pebbles in sedimentary deposits, J. Geol. 47 (1939) 673-706 Krumm, F. (1987): Geodätische Netze im Kontinuum: Inversionsfreie Ausgleichung und Konstruktion von Kriterionmatrizen, Deutsche Geodätische Kommission, Bayerische Akademie der Wissenschaften, PhD. Thesis, Report C334, München 1987 Krumm, F. and F. Okeke (1998): Graph, graph spectra, and partitioning algorithms in a geodetic network structural analysis and adjustment, Bolletino di Geodesia e Science Affini 57 (1998) 1-24 Krumm, F., Grafarend, E.W. and B. Schaffrin (1986): Continuous networks, Fourier analysis and criterion matrices, Manuscripta Geodaetica 11 (1986) 57-78 Kruskal, W. (1946): Helmert’s distribution, American Math. Monthly 53 (1946) 435-438 Kruskal, W. (1968): When are Gauß-Markov and least squares estimators identical? A coordinate-free approach, Ann. Statistics 39 (1968) 70-75 Kryanev, A.V. (1974): An iterative method for solving incorrectly posed problem, USSR. Comp. Math. Math. Phys. 14 (1974) 24-33 Krzanowski, W.J. (1995): Recent advances in descriptive multivariate analysis, Clarendon Press, Oxford 1995 Krzanowski, W.J. and F.H.C. Marriott (1994): Multivariate analysis: part I - distribution, ordination and inference, Arnold Publ., London 1994 Krzanowski, W.J. and F.H.C. Marriott (1995): Multivariate analysis: part II - classification, covariance structures and repeated measurements, Arnold Publ., London 1995 Kshirsagar, A.M. (1983): A course in linear models, Marcel Dekker Inc, New York – Basel 1983 Kuang, S.L. (1991): Optimization and design of deformations monitoring schemes, PhD dissertation, Department of Surveying Engineering, University of New Brunswick, Tech. Rep. 91, Fredericton 1991 Kuang, S. (1996): Geodetic network analysis and optimal design, Ann Arbor Press, Chelsea, Michigan 1996 Kubácek, L. (1996a): Linear model with inaccurate variance components, Applications of Mathematics 41 (1996) 433-445 Kubácek, L. (1996b): Nonlinear error propagation law, Applications of Mathematics 41 (1996) 329-345 Kubik, K. (1982): Kleinste Quadrate und andere Ausgleichsverfahren, Vermessung Photogrammetrie Kulturtechnik 80 (1982) 369-371 Kubik, K., and Y. Wang (1991): Comparison of different principles for outlier detection, Australian J. Geodesy, Photogrammetry and Surveying 54 (1991) 67-80 Kuechler, U. and M. Soerensen (1997): Exponential families of stochastic processes, Springer Verlag, Berlin 1997 Kullback, S. (1934): An application of characteristic functions to the distribution problem of statistics, Annals Math. Statistics 4 (1934) 263-305
Kumaresan, R. (1982): Estimating the parameters of exponentially damped or undamped sinusoidal signals in noise, PhD thesis, The University of Rhode Island, Rhode Island 1982 Kundu, D. (1993a): Estimating the parameters of undamped exponential signals, Technometrics 35 (1993) 215-218 Kundu, D. (1993b): Asymptotic theory of least squares estimator of a particular nonlinear regression model, Statistics and Probability Letters 18 (1993) 13-17 Kundu, D. (1994a): Estimating the parameters of complex valued exponential signals, Computational Statistics and Data Analysis 18 (1994) 525-534 Kundu, D. (1994b): A modified Prony algorithm for sum of damped or undamped exponential signals, Sankhya A 56 (1994) 525-544 Kundu, D. (1997): Asymptotic theory of the least squares estimators of sinusoidal signal, Statistics 30 (1997) 221-238 Kundu, D. and R.D. Gupta (1998): Asymptotic properties of the least squares estimators of a two dimensional model, Metrika 48 (1998) 83-97 Kundu, D. and A. Mitra (1998a): Fitting a sum of exponentials to equispaced data, The Indian J.Statistics 60 (1998) 448-463 Kundu, D. and A. Mitra (1998b): Different methods of estimating sinusoidal frequencies: a numerical comparison, J. Statist. Comput. Simul. 62 (1998) 9-27 Kunst, R.M. (1997): Fourth order moments of augmented ARCH processes, Commun. Statist. Theor. Meth. 26 (1997) 1425-1441 Kunz, E. (1985): Introduction to commutative algebra and algebraic geometry, Birkhäuser Boston – Basel – Berlin 1985 Kuo, W., Prasad, V.R., Tillman, F.A. and C.-L. Hwang (2001): Optimal reliability design, Cambridge University Press, Cambridge 2001 Kuo, Y. (1976): An extended Kronecker product of matrices, J. Math. Anal. Appl. 56 (1976) 346-350 Kupper, L.L (1972): Fourier series and spherical harmonic regression, J. Royal Statist. Soc. C21 (1972) 121-130 Kupper, L.L (1973): Minimax designs of Fourier series and spherical harmonic regressions: a characterization of rota table arrangements, J. Royal Statist. Soc. B35 (1973) 493-500 Kurata, H. (1998): A generalization of Rao's covariance structure with applications to several linear models, J. Multivariate Analysis 67 (1998) 297-305 Kurz, L. and M.H. Benteftifa (1997): Analysis of variance in statistical image processing, Cambridge University Press, Cambridge 1997 Kusche, J. (2001): Implementation of multigrid solvers for satellite gravity anomaly recovery, J. Geodesy 74 (2001) 773-782 Kusche, J. (2002): Inverse Probleme bei der Gravitationsfeldbestimmung mittels SSTund SGG- Satellitenmissionen , Deutsche Geodätische Kommission, Report C548, München 2002 Kusche, J. (2003): A Monte-Carlo technique for weight estimation in satellite geodesy, J. Geodesy 76 (2003) 641-652 Kusche, J. and R. Klees (2002): Regularization of gravity field estimation from satellite gravity gradients, J. Geodesy 76 (2002) 359-368 Kushner, H. (1967a): Dynamical equations for optimal nonlinear filtering, J. Diff. Eq. 3 (1967) 179-190 Kushner, H. (1967b): Approximations to optimal nonlinear filters, IEEE Trans. Auto. Contr. AC-12 (1967) 546-556
Kutoyants, Y.A. (1984): Parameter estimation for stochastic processes, Heldermann, Berlin 1984 Kutterer, H. (1994): Intervallmathematische Behandlung endlicher Unschärfen linearer Ausgleichsmodelle, PhD Thesis, Deutsche Geodätische Kommission DGK C423, München 1994 Kutterer, H. and S. Schoen (1999): Statistische Analyse quadratischer Formen - der Determinantenansatz, Allg. Vermessungsnachrichten 10 (1999) 322-330 Kutterer, H. (1999): On the sensitivity of the results of least-squares adjustments concerning the stochastic model, J. Geodesy 73 (1999) 350-361 Kutterer, H. (2002): Zum Umgang mit Ungewissheit in der Geodäsie – Bausteine für eine neue Fehlertheorie - , Deutsche Geodätische Kommission, Report C553, München 2002 Laeuter, H. (1970): Optimale Vorhersage und Schätzung in regulären und singulären Regressionsmodellen, Math. Operationsforschg. Statistik 1 (1970) 229-243 Laeuter, H. (1971): Vorhersage bei stochastischen Prozessen mit linearem Regressionsanteil, Math. Operationsforschg. Statistik 2 (1971) 69-85, 147-166 Laha, R.G. (1956): On the stochastic independence of two second-degree polynomial statistics in normally distributed variates, The Annals of Mathematical Statistics 27 (1956) 790-796 Laha, R.G. and E. Lukacs (1960): On a problem connected with quadratic regression, Biometrika 47 (1960) 335-343 Lai, T.L. and C.Z. Wie (1982): Least squares estimates in stochastic regression model with applications to stochastic regression in linear dynamic systems, Anals of Statistics 10 (1982) 154-166 Laird, N.M. and J.H. Ware (1982): Random-effects models for longitudinal data, Biometrics 38 (1982) 963-974 LaMotte, L.R. (1973): Quadratic estimation of variance components, Biometrics 29 (1973) 311-330 LaMotte, L.R. (1973): On non-negative quadratic unbiased estimation of variance components, J. American Statist. Assoc. 68 (1973) 728-730 LaMotte, L.R. (1976): Invariant quadratic estimators in the random, one-way ANOVA model, Biometrics 32 (1976) 793-804 Lamotte, L.R., McWhorter, A. and R.A. Prasad (1988): Confidence intervals and tests on the variance ratio in random models with two variance components, Com. Stat. – Theory Meth. 17 (1988) 1135-1164 Lamperti, J. (1966): Probability, Benjamin Publ. 1966 Lancaster, H.O. (1965): The helmert matrices, American Math. Monthly 72 (1965) 4-11 Lancaster, H.O. (1966): Forerunners of the Pearson Chi2 , Australian J. Statistics 8 (1966) 117-126 Langevin, P. (1905): Magnetisme et theorie des electrons, Ann. de Chim. et de Phys. 5 (1905) 70-127 Lanzinger, H. and U. Stadtmüller (2000): Weighted sums for i.i.d. random variables with relatively thin tails, Bernoulli 6 (2000) 45-61 Lardy, L.J. (1975): A series representation for the generalized inverse of a closed linear operator, Atti Accad. Naz. Lincei Rend. Cl. Sci. Fis. Mat. Natur. 58 (1975) 152-157 Lauritzen, S. (1973): The probabilistic background of some statistical methods in Physical Geodesy, Danish Geodetic Institute, Maddelelse 48, Copenhagen 1973 Lauritzen, S. (1974): Sufficiency, prediction and extreme models, Scand. J. Statist. 1 (1974) 128-134
Lawson, C.L. and R.J. Hanson (1995): Solving least squares problems, SIAM, Philadelphia 1995 (reprinting with corrections and a new appendix of a 1974 Prentice Hall text) Lawson, C.L. and R.J. Hanson (1974): Solving least squares problems, Prentice-Hall, Englewood Cliffs 1974 Laycock, P.J. (1975): Optimal design: regression models for directions, Biometrika 62 (1975) 305-311 LeCam, L. (1960): Locally asymptotically normal families of distributions, University of California Publication 3, Los Angeles 1960 LeCam, L. (1970): On the assumptions used to prove asymptotic normality of maximum likelihood estimators, Ann. Math. Statistics 41 (1970) 802-828 LeCam, L. (1986): Proceedings of the Berkeley conference in honor of Jerzy Neyman and Jack Kiefer, Chapman and Hall, Boca Raton 1986 Lee, J.C. and S. Geisser (1996): On the Prediction of Growth Curves, in: Lee, C., Johnson, W.O. and A. Zellner (eds.): Modelling and Prediction Honoring Seymour Geisser, pp. 77-103, Springer Verlag, New York 1996 Lee, J.C., Johnson, W.O. and A. Zellner (1996): Modeling and prediction, Springer Verlag, New York 1996 Lee, M. (1996): Methods of moments and semiparametric econometrics for limited dependent variable models, Springer Verlag, New York 1996 Lee, P. (1992): Bayesian statistics, Wiley, New York 1992 Lee, S.L. (1995): A practical upper bound for departure from normality, SIAM J. Matrix Anal. Appl. 16 (1995) 462-468 Lee, S.L. (1996): Best available bounds for departure from normality, SIAM J. Matrix Anal. Appl. 17 (1996) 984-991 Lee, S.Y. and J.-Q. Shi (1998): Analysis of covariance structures with independent and non-identically distributed observations, Statistica Sinica 8 (1998) 543-557 Lee, Y. and J.A. Nelder (1996): Hierarchical generalized linear models, J. Roy. Statist. Soc., Series B, 58 (1996) 619-678 Lehmann, E.L. (1959): Testing statistical hypotheses, J. Wiley, New York 1959 Lehmann, E.L. and H. Scheffé (1950): Completeness, similar regions and unbiased estimation, Part I, Sankhya 10 (1950) 305-340 Lehmann, E.L. and G. Casella (1998): Theory of point estimation, Springer Verlag, New York 1998 Lenk, U. (2001a): Schnellere Multiplikation großer Matrizen durch Verringerung der Speicherzugriffe und ihr Einsatz in der Ausgleichungsrechnung, Z. Vermessungswesen 4 (2001) 201-207 Lenk, U. (2001b): 2.5D-GIS und Geobasisdaten – Integration von Höheninformation und Digitalen Situationsmodellen, Wiss. Arbeiten der Fachrichtung Vermessungswesen der Uni. Hannover, Hannover 2001 Lenstra, A.K., Lenstra, H.W. and L. Lovász (1982): Factoring polynomials with rational coefficients, Math. Ann. 261 (1982) 515-534 Lenstra, H.W. (1983): Integer programming with a fixed number of variables, Math. Operations Res. 8 (1983) 538-548 Lenth, R.V. (1981): Robust measures of location for directional data, Technometrics 23 (1981) 77-81
Lentner, M.N. (1969): Generalized least-squares estimation of a subvector of parameters in randomized fractional factorial experiments, Ann. Math. Statist. 40 (1969) 13441352 Lenzmann, L. (2003): Strenge Auswertung des nichtlnearen Gauß-Helmert-Modells, Allg. Vermessungsnachrichten 2 (2004) 68-73 Lesaffre, E. and G. Verbeke (1998): Local influence in linear mixed models, Biometrics 54 (1998) 570 – 582 Letac, G. and M. Mora (1990): Natural real exponential families with cubic variance functions, Ann. Statist. 18 (1990) 1-37 Lether, F.G. and P.R. Wentson (1995): Minimax approximations to the zeros of P n (x) and Gauss-Legendre quadrature, J. Comput. Appl. Math. 59 (1995) 245-252 Levenberg, K. (1944): A method for the solution of certain non-linear problems in leastsquares, Quart. Appl. Math. Vol. 2 (1944) 164-168 Levin, J. (1976): A parametric algorithm for drawing pictures of solid objects composed of quadratic surfaces, Communications of the ACM 19 (1976) 555-563 Lewis, R.M. and V. Torczon (2000): Pattern search methods for linearly constrained minimization, SIAM J. Optim. 10 (2000) 917-941 Lewis, T.O. and T.G. Newman (1968): Pseudoinverses of positive semidefinite matrices, SIAM J. Appl. Math. 16 (1968) 701-703 Li, B.L. and C. Loehle (1995): Wavelet analysis of multiscale permeabilities in the subsurface, Geoph. Res. Lett. 22 (1995) 3123-3126 Li, C.K. and R. Mathias (2000): Extremal characterizations of the Schur complement and resulting inequalities, SIAM Review 42 (2000) 233-246 Li, T. (2000): Estimation of nonlinear errors-in variables models: a simulated minimum distance estimator, Statistics & Probability Letters 47 (2000) 243-248 Liang, K. and K. Ryu (1996): Selecting the form of combining regressions based on recursive prediction criteria, in: Lee, C., Johnson, W.O. and A. Zellner (eds.): Modelling and prediction honoring Seymour Geisser, Springer Verlag, New York 1996, 122-135 Liang, K.Y. and S.L. Zeger (1986): Longitudinal data analysis using generalized linear models, Biometrika 73 (1986) 13-22 Liang, K.Y. and S.L. Zeger (1995): Inference based on estimating functions in the presence of nuisance parameters, Statist. Sci. 10 (1995) 158-199 Liesen, J., Rozlozník, M. and Z. Strakos (2002): Least squares residuals and minimal residual methods, SIAM J. Sci. Comput. 23 (2002) 1503-1525 Lindley, D.V. and A.F.M. Smith (1972): Bayes estimates for the linear model, J. Roy. Stat. Soc. 34 (1972) 1-41 Lilliefors, H.W. (1967): On the Kolmogorov-Smirnov test for normality with mean and variance unknown, J. American Statistical Assoc. 62 (1967) 399-402 Lin, A. and S.-P. Han (2004): A class of methods for projection on the intersection of several ellipsoids, SIAM J. Optim 15 (2004) 129-138 Lin, X. and N.E. Breslow (1996): Bias correction in generalized linear mixed models with multiple components of dispersion, J. American Statistical Assoc. 91 (1996) 10071016 Lin, X.H. (1997): Variance component testing in generalised linear models with random effects, Biometrika 84 (1997) 309-326 Lindsey, J.K. (1997): Applying generalized linear models, Springer Verlag, New York 1997 Lindsey, J.K. (1999): Models for repeated measurements, 2nd edition, Oxford University Press, Oxford 1999
Lingjaerde, O. and N. Christophersen (2000): Shrinkage structure of partial least squares, Board of the Foundation of the Scandivnavian J.Statistics 27 (2000) 459-473 Linke, J., Jurisch, R., Kampmann, G. und H. Runne (2000): Numerisches Beispiel zur Strukturanalyse von geodätischen und mechanischen Netzen mit latenten Restriktionen, Allg. Vermessungsnachrichten 107 (2000) 364-368 Linnik, J.V. and I.V. Ostrovskii (1977): Decomposition of random variables and vectors, Transl. Math. Monographs Vol. 48, American Mathematical Society, Providence 1977 Liptser, R.S. and A.N. Shiryayev (1977): Statistics of random processes, Vol. 1, Springer Verlag, New York 1977 Liski, E.P. and T. Nummi (1996): Prediction in repeated-measures models with engineering applications, Technometrics 38 (1996) 25-36 Liski, E.P. and S. Puntanen (1989): A further note on a theorem on the difference of the generalized inverses of two nonnegative definite matrices, Commun. Statist.-Theory Meth. 18 (1989) 1747-1751 Liski, E.P., Luoma, A. and A. Zaigraev (1999): Distance optimality design criterion in linear models, Metrika 49 (1999) 193-211 Liski, E.P., Luoma, A., Mandal, N.K. and B.K. Sinha (1998): Pitman nearness, distance criterion and optimal regression designs, Calcutta Statistical Ass. Bull. 48 (1998) 191192 Liski, E.P., Mandal, N.K., Shah, K.R. and B.K. Sinha (2002): Topics in optimal design, Springer Verlag, New York 2002 Liu, J. (2000): MSEM dominance of estimators in two seemingly unrelated regressions, J. Statistical Planning and Inference 88 (2000) 255-266 Liu, S. (2000): Efficiency comparisons between the OLSE and the BLUE in a singular linear model, J. Statistical Planning and Inference 84 (2000) 191-200 Liu, X.-W. and Y.-X. Yuan (2000): A robust algorithm for optimization with general equality and inequality constraints, SIAM J. Sci. Comput. 22 (2000) 517-534 Liu, W., Yao, Y. and C. Shi (2001): Theoretic research on robustified least squares estimator based on equivalent variance-covariance, Geo-spatial Information Science 4 (2001) 1-8 Liu, W., Xia, Z. and M. Deng (2001): Modelling fuzzy geographic objects within fuzzy fields, Geo-spatial Information Science 4 (2001) 37-42 Ljung, L. (1979): Asymptotic behavior of the extended Kalman filter as a parameter estimator for linear systems, IEEE Trans. Auto. Contr. AC-24 (1979) 36-50 Lloyd, E.H. (1952): Least squares estimation of location and scale parameters using order statistics, Biometrika 39 (1952) 88-95 Lohse, P. (1994): Ausgleichungsrechnung in nichtlinearen Modellen, Deutsche Geod. Kommission C 429, München 1994 Long, J.S. and L.H. Ervin (2000): Using heteroscedasticy consistent standard errors in the linear regression model, The American Statistician 54 (2000) 217-224 Longford, N.T. (1993): Random coefficient models, Clarendon Press, Oxford 1993 Longford, N. (1995): Random coefficient models, Oxford University Press, 1995 Longley, J.W. and R.D. Longley (1997): Accuracy of Gram-Schmidt orthogonalization and householder transformation for the solution of linear least squares problems, Numerical Linear Algebra with Applications 4 (1997) 295-303 López-Blázquez, F. (2000): Unbiased estimation in the non-central Chi-Square distribution, J. Multivariate Analysis 75 (2000) 1-12 Lord, R.D. (1948): A problem with random vectors, Phil. Mag. 39 (1948) 66-71
Lord, R.D. (1995): The use of the Hankel transform in statistics, I. General theory and examples, Biometrika 41 (1954) 44-55 Lorentz, G.G. (1966): Metric entropy and approximation, Bull. American Math. Soc. 72 (1966) 903-937 Ludlow, J. and W. Enders (2000): Estimating non-linear ARMA models using Fourier coefficients, International J.Forecasting 16 (2000) 333-347 Lütkepol, H. (1996): Handbook of matrices, J. Wiley, Chichester U.K. 1996 Lund, U. (1999): Least circular distance regression for directional data, J. Applied Statistics 26 (1999) 723-733 Macinnes, C.S. (1999): The solution to a structured matrix approximation problem using Grassmann coordinates, SIAM J. Matrix Analysis Appl. 21 (1999) 446-453 Madansky, A. (1959): The fitting of straight lines when both variables are subject to error, J. American Statistical Ass. 54 (1959) 173-205 Madansky, A. (1962): More on lenght of confidence intervals, J. Amer. Statist. Assoc. 57 (1962) 586-589 Maejima, M. (1978): Some Lp versions for the central limit theorem, Ann. Probability 6 (1978) 341-344 Maekkinen, J. (2002): A bound for the Euclidean norm of the difference between the best linear unbiased estimator and a linear unbiased estimator, J. Geodesy 76 (2002) 317322 Maess, G. (1988): Vorlesungen über numerische Mathematik II. Analysis, Birkhäuser Verlag, Basel Boston 1988 Magder, L.S. and S.L. Zeger (1996): A smooth nonparametric estimate of a mixing distribution using mixtures of Gaussians, J. Amer. Statist. Assoc. 91 (1996) 1141-1151 Magee, L. (1998): Improving survey-weighted least squares regression, J. Roy Statist. Soc. B 60 (1998) 115-126 Magness, T.A. and J.B. McGuire (1962): Comparison of least-squares and minimum variance estimates of regression parameters, Ann. Math. Statist. 33 (1962) 462-470 Magnus, J.R. and H. Neudecker (1988): Matrix differential calculus with applications in statistics and econometrics, Wiley, Chichester 1988 Mahalanabis, A. and M. Farooq (1971): A second-order method for state estimation of nonlinear dynamical systems, Int. J. Contr. 14 (1971) 631-639 Mahanobis, P.C., Bose, R.C. and S.N. Roy (1937): Normalisation of statistical variates and the use of rectangular coordinates in the theory of sampling distributions, Sankhya 3 (1937) 1-40 Mallet, A. (1986): A maximum likelihood estimation method for random coefficient regression models, Biometrika 73 (1986) 645-656 Malliavin, P. (1997): Stochastic analysis, Springer Verlag, New York 1997 Mallows, C.L. (1961): Latent vectors of random symmetric matrices, Biometrika 48 (1961) 133-149 Malyutov, M.B. and R.S. Protassov (1999): Functional approach to the asymptotic normality of the non-linear least squares estimator, Statistics & Probability Letters 44 (1999) 409-416 Mamontov, Y. and M. Willander (2001): High-dimensional nonlinear diffusion stochastic processes, World Scientific, Singapore 2001 Mandel, J. (1994): The analysis of two-way layouts, Chapman and Hall, Boca Raton 1994 Mangasarian, O.L. and D.R. Musicant (2000): Robust linear and support vector regression, IEEE Transactions on Pattern analysis and Maschen Intelligence 22 (2000) 950955
Mangoubi, R.S. (1998): Robust estimation and failure detection, Springer Verlag, Berlin Heidelberg New York 1998 Manly, B.F.J. (1976): Exponential data transformations, The Statistician 25 (1976) 37-42 Mardia, K.V. (1962): Multivariate Pareto distributions, Ann. Math. Statistics 33 (1962) 1008-1015 Mardia, K.V. (1970): Measures of multivariate skewness and kurtosis with applications, Biometrika 57 (1970) 519-530 Mardia, K.V. (1972): Statistics of directional data, Academic Press, New York 1972 Mardia, K.V. (1975a): Characterization of directional distributions, Statistical Distributions, Scientific Work 3 (1975), G.P. Patil et al. (eds.), 365-385 Mardia, K.V. (1975b): Statistics of directional data, J. Royal Statistical Society, Series B, 37 (1975) 349-393 Mardia, K.V. (1976): Linear-circular correlation coefficients and rhythmometry, Biometrika 63 (1976) 403-405 Mardia, K.V. (1988): Directional data analysis: an overview, J. Applied Statistics 2 (1988) 115-122 Mardia, K.V. and M.L. Puri (1978): A robust spherical correlation coefficient against scale, Biometrika 65 (1978) 391-396 Mardia, K.V., Kent, J. and J.M. Bibby (1979): Multivariate analysis, Academic Press, London 1979 Mardia, K.V., Southworth, H.R. and C.C. Taylor (1999): On bias in maximum likelihood estimators, J. Statist. Planning and Inference 76 (1999) 31-39 Mardia, K.V. and P.E. Jupp (1999): Directional statistics, J. Wiley, England 1999 Marinkovic, P., Grafarend, E.W. and T. Reubelt (2003): Space gravity spectroscopy: the benefits of Taylor-Karman structured criterion matrices, Advances in Geosciences 1 (2003) 113-120 Mariwalla, K.H. (1971): Coordinate transformations that form groups in the large, in: De Sitter and Conformal Groups and their Applications, A.O. Barut and W.E. Brittin (eds.), vol. 13, pages 177-191, Colorado Associated University Press, Boulder 1971 Markatou, M. and X. He (1994): Bounded influence and high breakdown point testing procedures in linear models, J. Am. Statist. Assn. 89 (1994) 543-549 Markiewicz, A. (1996): Characterization of general ridge estimators, Statistics & Probability Letters 27 (1996) 145-148 Markov, A.A. (1912): Wahrscheinlichkeitsrechnung, 2nd edition, Teubner, Leipzig 1912 Marošević, T. and D. Jukić (1997): Least orthogonal absolute deviations problem for exponential function, Student 2 (1997) 131-138 Marquardt, D.W. (1963): An algorithm for least-squares estimation of nonlinear parameters, J. Soc. Indust. Appl. Math. 11 (1963) 431-441 Marquardt, D.W. (1970): Generalized inverses, ridge regression, biased linear estimation and nonlinear estimation, Technometrics 12 (1970) 591-612 Marriott, J. and P. Newbold (1998): Bayesian comparison of ARIMA and stationary ARMA models, J. Statistical Review 66 (1998) 323-336 Marsaglia, G. and G.P.H. Styan (1972): When does rank (A+B) = rank A + rank B?, Canad. Math. Bull. 15 (1972) 451-452 Marshall, J. (2002): L1-norm pre-analysis measures for geodetic networks, J. Geodesy 76 (2002) 334-344 Martinek, Z. (2002a): Spherical harmonic analysis of regularly distributed data on a sphere with a uniform or a non-uniform distribution of data uncertainties, or, shortly, what I know about: scalar surface spherical harmonics, Geodätisches Oberseminar Stuttgart 2002
Martinek, Z. (2002b): Lecture Notes 2002. Scalar surface spherical harmonics, Geo Forschungs Zentrum Potsdam 2002 Martinez, W.L. and E.J. Wegman (2000): An alternative criterion useful for finding exact E-optimal designs, Statistic & Probability Letters 47 (2000) 325-328 Maruyama, Y. (1998): Minimax estimators of a normal variance, Metrika 48 (1998) 209214 Masry, E. (1997): Additive nonlinear arx time series and projection estimates, Econometric Theory 13 (1997) 214-252 Mastronardi, N., Lemmerling, P. and S. van Huffel (2000): Fast structured total least squares algorithm for solving the basic deconvolution problem, Siam J. Matrix Anal. Appl. 22 (2000) 533-553 Matérn, B. (1989): Precision of area estimation: a numerical study, J. Microscopy 153 (1989) 269-284 Mathai, A.M. (1997): Jacobians of matrix transformations and functions of matrix arguments, World Scientific, Singapore 1997 Mathar, R. (1997): Multidimensionale Skalierung: mathematische Grundlagen und algorithmische Aspekte, Teubner Stuttgart 1997 Mathew, T. (1989): Optimum invariant tests in mixed linear models with two variance components, Statistical Data Analysis and Inference, Y. Dodge (ed.), North-Holland (1989) 381-388 Mathew, T. (1997): Wishart and Chi-Square Distributions Associated with Matrix Quadratic Forms, J. Multivariate Analysis 61 (1997) 129-143 Mathew, T. and B.K. Sinha (1988): Optimum tests in unbalanced two-way models without interaction, Ann. Statist. 16 (1988) 1727-1740 Mathew, T. and S. Kasala (1994): An exact confidence region in multivariate calibration, The Annals of Statistics 22 (1994) 94-105 Mathew, T. and W. Zha (1996): Conservative confidence regions in multivariate calibration, The Annals of Statistics 24 (1996) 707-725 Mathew, T. and K. Nordstroem (1997): An inequality for a measure of deviation in linear models, The American Statistician 51 (1997) 344-349 Mathew, T. and W. Zha (1997): Multiple use confidence regions in multivariate calibration, J. American Statist. Assoc. 92 (1997) 1141-1150 Mathew, T. and W. Zha (1998): Some single use confidence regions in a multivariate calibration problem, Applied Statist. Science III (1998) 351-363 Mathew, T., Sharma, M.K. and K. Nordström (1998): Tolerance regions and multiple-use confidence regions in multivariate calibration, The Annals of Statistics 26 (1998) 1989-2013 Mathias, R. and G.W. Stewart (1993): A block QR algorithm and the singular value decomposition, Linear Algebra Appl. 182 (1993) 91-100 Mauly, B.F.J. (1976): Exponential data transformations, Statistician 27 (1976) 37-42 Mautz, R. (2002): Solving nonlinear adjustment problems by global optimization, Boll. di Geodesia e Scienze Affini 61 (2002) 123-134 Maxwell, S.E. (2003): Designig experiments and analyzing data. A model comparison perspective, Lawrence Erlbaum Associates, Publishers, London - New Jersey 2003 Maybeck, P. (1979): Stochastic models, estimation and control, vol. 1, Academic Press, New York 1979 Mayer, D.H. (1975): Vector and tensor fields on conformal space, J. Math. Physics 16 (1975) 884-893 Mazya, V. and T. Shaposhnikova (1998): Jacques Hadamand, a universal mathematician American Mathematical Society, Providence 1998
McCullagh, P. (1983): Quasi-likelihood functions, The Annals of Statistics 11 (1983) 5967 McCullagh, P. (1987): Tensor methods in statistics, Chapman and Hall, London 1987 McCullagh, P. and J.A. Nelder (1989): Generalized linear models, Chapman and Hall, London 1989 McCulloch, C.E. (1997): Maximum likelihood algorithms for generalized linear mixed models, J. American Statist. Assoc. 92 (1997) 162-170 McCulloch, C.E. and S.R. Searle (2002): Generalized, lineas and mixed models, Wiley Series in Probability and Statistic 2002 McElroy, F.W. (1967): A necessary and sufficient condition that ordinary least-squares estimators be best linear unbiased, J.the American Statistical Association 62 (1967) 1302-1304 McGilchrist, C.A. (1994): Estimation in generalized mixed models, J. Roy. Statist. Soc., Series B, 56 (1994) 61-69 McGilchrist, C.A. and C.W. Aisbett (1991): Restricted BLUP for mixed linear models, Biometrical Journal 33 (1991) 131-141 McGilchrist, C.A. and K.K.W. Yau (1995): The derivation of BLUP, ML, REML estimation methods for generalised linear mixed models, Commun. Statist.-Theor. Meth. 24 (1995) 2963-2980 McMorris, F.R. (1997): The median function on structured metric spaces, Student 2 (1997) 195-201 Mehta, M.L. (1991): Random matrices, Academic Press, New York 1991 Meissl, P. (1965): Über die Innere Genauigkeit dreidimensionaler Punkthaufen, Z. Vermessungswesen 90 (1965) 198-118 Meissl, P. (1969): Zusammenfassung und Ausbau der inneren Fehlertheorie eines Punkthaufens, Deutsche Geod. Kommission A 61, München 1994, 8-21 Meissl, P. (1976): Hilbert spaces and their application to geodetic least-squares problems, Bolletino di Geodesia e Scienze Affini 35 (1976) 49-80 Melbourne, W. (1985): The case of ranging in GPS-based geodetic systems, Proc. 1st Int. Symp. on Precise Positioning with GPS, Rockville, Maryland (1985) 373-386 Menz, J. (2000): Forschungsergebnisse zur Geomodellierung und deren Bedeutung, Manuskript 2000 Menz, J. and N. Kolesnikov (2000): Bestimmung der Parameter der Kovarianzfunktionen aus den Differenzen zweier Vorhersageverfahren, Manuskript, 2000 Merriman, M. (1877): On the history of the method of least squares, The Analyst 4 (1877) Merriman, M. (1884): A textbook on the method of least squares, Wiley, New York 1884 Meyer, C.D. (1973): Generalized inverses and ranks of block matrices, SIAM J. Appl. Math. 25 (1973) 597-602 Mhaskar, H.N., Narcowich, F.J and J.D. Ward (2001): Representing and analyzing scattered data on spheres, In: Multivariate approximations and applications, Cambridge University Press, Cambridge 2001, 44-72 Midi, H. (1999): Preliminary estimators for robust non-linear regression estimation, J. Applied Statistics 26 (1999) 591-600 Migon, H.S. and D. Gammermann (1999): Statistical inference, Arnold London 1999 Miller, R.G. (1981): Simultaneous statistical inference, Springer Verlag 1981 Minami, M. and K. Shimizu (1998): Estimation for a common correlation coefficient in bivariate normal distributions with missing observations, Biometrics 54 (1998) 11361146 Minzhong, J. and C. Xiru (1999): Strong consistency of least squares estimate in multiple regression when the error variance is infinite, Statistica Sinica 9 (1999) 289-296
Minkler, G. and J. Minkler (1993): Theory and application of Kalman filtering, Magellan Book Company 1993 Misra, R.K. (1996): A multivariate procedure for comparing mean vectors for populations with unequal regression coefficient and residual covariance matrices, Biom. J. 38 (1996) 415-424 Mitra, S.K. (1982): Simultaneous diagonalization of rectangular matrices, Linear Algebra Appl. 47 (1982) 139-150 Mjulekar, M.S. and S.N. Mishra (2000): Confidence interval estimation of overlap: Equal means case, Computational Statistics & Data Analysis 34 (2000) 121-137 Mohan, S.R. and S.K. Neogy (1996): Algorithms for the generalized linear complementarity problem with a vertical block Z-matrix, Siam J. Optimization 6 (1996) 9941006 Moire, C. and J.A. Dawson (1992): Distribution, Chapman and Hall, Boca Raton 1996 Money, A.H. et al. (1982): The linear regression model: Lp-norm estimation and the choice of p, Commun. Statist. Simul. Comput. 11 (1982) 89-109 Monin, A.S. and A.M. Yaglom (1981): Statistical fluid mechanics: mechanics of turbulence, vol. 2, The Mit Press, Cambridge 1981 Montromery, D.C. (1996): Introduction to statistical quality control, 3rd edition, J. Wiley, New York 1996 Mood, A.M., F.A. Graybill and D.C. Boes (1974): Introduction to the theory of statistics, 3rd ed., McGraw-Hill, New York 1974 Moon, M.S. and R.F. Gunst (1994): Estimation of the polynomial errors-in-variables model with decreasing error variances, J. Korean Statist. Soc. 23 (1994) 115-134 Moon, M.S. and R.F. Gunst (1995): Polynomial measurement error modelling, Comput. Statist. Data Anal. 19 (1995) 1-21 Moore, E.H. (1900): A fundamental remark concerning determinantal notations with the evaluation of an important determinant of special form, Ann.Math. 2 (1900) 177-188 Moore, E.H. (1920): On the reciprocal of the general algebraic matrix, Bull. Amer. Math. Soc 26 (1920) 394-395 Morgan, B.J.T. (1992): Analysis of quantal response data, Chapman and Hall, Boca Raton 1992 Morgenthaler, S. (1992): Least-absolute-deviations fits for generalized linear models, Biometrika 79 (1992) 747-754 Morgera, S. (1992): The role of abstract algebra in structured estimation theory, IEEE Trans. Inform. Theory 38 (1992) 1053-1065 Moritz, H. (1976): Covariance functions in least-squares collocation, Rep. Ohio State Uni. 240, 1976 Morris, C.N. (1982): Natural exponential families with quadratic variance functions, Ann. Statist. 10 (1982) 65-80 Morrison, D.F. (1967): Multivariate statistical methods, Mc Graw-Hill Book Comp., New York 1967 Morrison, T.P. (1997): The art of computerized measurement, Oxford University Press, Oxford 1997 Morsing, T. and C. Ekman (1998): Comments on construction of confidence intervals in connection with partial least-squares, J. Chemometrics 12 (1998) 295-299 Moser, B.K. and M.H. McCann (1996): Maximum likelohood and restricted maximum likelihood estimators as functions of ordinary least squares and analysis of variance estimators, Commun. Statist.-Theory Meth. 25 (1996) 631-646 Moser, B.K. and J.K. Sawyer (1998): Algorithms for sums of squares and covariance matrices using Kronecker Products, The American Statistician 52 (1998) 54-57
Moutard, T. (1894): Notes sur la propagation des ondes et les équations de l'hydrodynamique, Paris 1893, reprint Chelsea Publ., New York 1949 Mudholkar, G.S. (1997): On the efficiencies of some common quick estimators, Commun. Statist.-Theory Meth. 26 (1997) 1623-1647 Mueller, C.H. (1997): Robust planning and analysis of experiments, Springer Verlag, New York 1997 Mueller, C.H. (1998): Breakdown points of estimators for aspects of linear models, in: MODA 5 – Advances in model-oriented data analysis and experimental design, pp. 137-144, Atkinson, A.K., Pronzato, L. and H.P. Wynn (eds), Physica Verlag 1998 Mueller, C.H. (2003): Robust estimators for estimating discontinuous functions, Developments in Robust Statistics, pp. 266-277, Physica Verlag, Heidelberg 2003 Mueller-Gronbach, T. (1996): Optimal designs for approximating the path of a stochastic process, J. Statistical Planning and Inference 49 (1996) 371-385 Mueller, H. (1983): Strenge Ausgleichung von Polygonnetzen unter rechentechnischen Aspekten, Deutsche Geodätische Kommission, Bayerische Akademie der Wissenschaften C279, München 1983 Mueller, H. (1985): Second-order design of combined linear-angular geodetic networks, Bulletin Géodésique 59 (1985) 316-331 Mueller, J. (1987): Sufficiency and completeness in the linear model, J. Multivariate Anal. 21 (1987) 312-323 Mueller, J., Rao, C.R. and B.K. Sinha (1984): Inference on parameters in a linear model: a review of recent results, in: Experimental design, statistical models and genetic statistics, K. Hinkelmann (ed.), Chap. 16, Marcel Dekker, New York 1984 Mueller, W.G. (1995): Ein Beispiel zur Versuchsplanung bei korrelierten Beobachtungen, Österreichische Zeitschrift für Statistik N.F. 24 (1995) 9-15 Mueller, W. (1998a): Spatial data collection, contributions to statistics, Physica Verlag, Heidelberg 1998 Mueller, W. (1998b): Collecting spatial data - optimum design of experiments for random fields, Physica-Verlag, Heidelberg 1998 Mueller, W. (2001): Collecting spatial data - optimum design of experiments for random fields, 2nd ed., Physica-Verlag, Heidelberg 2001 Mueller, W. and A. Pázman (1998): Design measures and extended information matrices for experiments without replications, J. Statist. Planning and Inference 71 (1998) 349-362 Mueller, W. and A. Pázman (1999): An algorithm for the computation of optimum designs under a given covariance structure, Computational Statistics 14 (1999) 197-211 Mukhopadhyay, P. and R. Schwabe (1998): On the performance of the ordinary least squares method under an error component model, Metrika 47 (1998) 215-226 Muir, T. (1911): The theory of determinants in the historical order of development, volumes 1-4, Dover, New York 1911, reprinted 1960 Muirhead, R.J. (1982): Aspects of multivariate statistical theory, J. Wiley, New York 1982 Mukherjee, K. (1996): Robust estimation in nonlinear regression via minimum distance method, Mathematical Methods of Statistics 5 (1996) 99-112 Mukhopadhyay, N. (2000): Probability and statistical inference, Dekker, New York 2000 Muller, C. (1966): Spherical harmonics – Lecture notes in mathematics 17 (1966), Springer-Verlag, New York, 45pp.
Muller, D. and W.W.S. Wei (1997): Iterative least squares estimation and identification of the tranfer function model, J. Time Series Analysis 18 (1997) 579-592 Muller, H. and M. Illner (1984): Gewichtsoptimierung geodätischer Netze. Zur Anpassung von Kriteriumsmatrizen bei der Gewichtsoptimierung, Allg. Vermessungsnachrichten (1984), 253-269 Muller, H. and G. Schmitt (1985): SODES2 – Ein Programm-System zur Gewichtsoptimierung zweidimensionaler geodätischer Netze. Deutsche Geodätische Kommission, München, Reihe B, 276 (1985) Munoz-Pichardo, J.M., Munoz-García, J., Fernández-Ponce, J.M. and F. López-Bláquez (1998): Local influence on the general linear model, Sankhya: The Indian J. Statistics 60 (1998) 269-292 Murray, J.K. and J.W. Rice (1993): Differential geometry and statistics, Chapman and Hall, Boca Raton 1993 Myers, J.L. (1979): Fundamentals of experimental designs, Allyn and Bacon, Boston 1979 Nadaraya, E. (1993): Limit distribution of the integrated squared error of trigonometric series regression estimator, Proceedings of the Georgian Academy of Sciences. Mathematics 1 (1993) 221-237 Naes, T. and H. Martens (1985): Comparison of prediction methods for multicollinear data, Commun. Statist. Simulat. Computa. 14 (1985) 545-576 Naether, W. (1985): Exact designs for regression models with correlated errors, Statistics 16 (1985) 479-484 Nagaev, S.V. (1979): Large deviations of sums of independent random variables, Ann. Probability 7 (1979) 745-789 Nagar, D.K. and A.K. Gupta (1996): On a test statistic useful in Manova with structured covariance matrices, J. Appl. Stat. Science 4 (1996) 185-202 Nagaraja, H.N. (1982): Record values and extreme value distributions, J. Appl. Prob. 19 (1982) 233-239 Nagarsenker, B.N. (1977): On the exact non-null distributions of the LR criterion in a general MANOVA model, Sankya A39 (1977) 251-263 Nahin, P.J. (2004): When least is best, how mathematicians discovered many clever ways to make things as small (or a s large) as possible, Princeton University Press, Princeton and Oxford 2004 Nakamura, N. and S. Konishi (1998): Estimation of a number of components for multivariate normal mixture models based on information criteria, Japanese J. Appl. Statist 27 (1998) 165-180 Nakamura, T. (1990): Corrected score function for errors-invariables models: methodology and application to generalized linear models, Biometrika 77 (1990) 127-137 Nandi, S. and D. Kundu (1999): Least-squares estimators in a stationary random field, J. Indian Inst. Sci. 79 (1999) 75-88 Nashed, M.Z. (1976): Generalized inverses and applications, Academic Press, New York 1976 Nelson, R. (1995): Probability, stochastic processes, and queueing theory, Springer Verlag, New York 1995 Nesterov, Y.E. and A.S. Nemirovskii (1992): Interior point polynomial methods in convex programming, Springer Verlag, New York 1992 Neudecker, H. (1968): The Kronecker matrix product and some of its applications in econometrics, Statistica Neerlandica 22 (1968) 69-82 Neudecker, H. (1969): Some theorems on matrix differentiation with special reference to Kronecker matrix products, J. American Statist. Assoc. 64 (1969) 953-963
Neudecker, H. (1978): Bounds for the bias of the least squares estimator of V2 in the case of a first-order autoregressive process (positive autocorrelation), Econometrica 45 (1977) 1257-1262 Neudecker, H. (1978): Bounds for the bias of the LS estimator of V2 in the case of a firstorder (positive) autoregressive process when the regression contains a constant term, Econometrica 46 (1978) 1223-1226 Neumaier, A. (1998): Solving ILL-conditioned and singular systems: a tutorial on regularization, SIAM Rev. 40 (1998) 636-666 Neuts, M.F. (1995): Algorithmic probability, Chapman and Hall, Boca Raton 195 Neutsch, W. (1995): Koordinaten: Theorie und Anwendungen, Spektrum Akademischer Verlag, Heidelberg 1995 Neway, W.K. and J.K. Powell (1987): Asymmetric least squares estimation and testing, Econometrica 55 (1987) 819-847 Newman, D. (1939): The distribution of range in samples from a normal population, expressed in terms of an independent estimate of standard deviation, Biometrika 31 (1939) 20-30 Neykov, N.M. and C.H. Mueller (2003): Breakdown point and computation of trimmed likelihood estimators in generalized linear models, in: R. Dutter, P. Filzmoser, U. Gather, P.J. Rousseeuw (Eds.), Developments in Robust Statistics, pp. 277-286, Physica Verlag, Heidelberg 2003 Neyman, J. (1934): On the two different aspects of the representative method, J. Royal Statist. Soc. 97 (1934) 558-606 Neyman, J. (1937): Outline of the theory of statistical estimation based on the classical theory of probability, Phil. Trans. Roy. Soc. London 236 (1937) 333-380 Ng, M.K. (2000): Preconditioned Lanczos methods for the minimum eigenvalue of a symmetric positive definitive toeplitz matrix, SIAM J. Svi. Comput 21 (2000) 19731986 Nicholson, W.K. (1999): Introduction to abstract algebra, 2nd ed., J. Wiley, New York 1999 Niemeier, W. (1985): Deformationsanalyse, in: Geodätische Netze in Landes- und Ingenieurvermessung II, Kontaktstudium, ed. H. Pelzer, Wittwer, Stuttgart 1985 Nkuite, G. (1998): Ausgleichung mit singulärer Varianzkovarianzmatrix am Beispiel der geometrischen Deformationsanalyse, Dissertation, TU München, München 1998 Nobre, S. and M. Teixeiras (2000): Der Geodät Wilhelm Jordan und C.F. Gauss, GaussGesellschaft e.V. Goettingen, Mitt. 38, pp. 49-54 Goettingen 2000 Nyquist, H. (1988): Least orthogonal absolute deviations, Comput. Statist. Data Anal. 6 (1988) 361-367 O'Neill, M., Sinclair, L.G. and F.J. Smith (1969): Polynomial curve fitting when abscissa and ordinate are both subject ot error, Comput. J. 12 (1969) 52-56 O'Neill, M.E. and K. Mathews (2000): A weighted least squares approach to Levene's test of homogeneity of variance, Austral. & New Zealand J. Statist. 42 (2000) 81-100 Offlinger, R. (1998): Least-squares and minimum distance estimation in the threeparameter Weibull and Fréchet models with applications to river drain data, in: Kahle, et al (eds.) Advances in stochastic models for reliability, quality and safety, pages 8197, Birkhäuser Verlag, Boston 1998 Ogawa, J. (1950): On the independence of quadratic forms in a non-central normal system, Osaka Mathematical Journal 2 (1950) 151-159 Ohtani, K. (1988a): Optimal levels of significance of a pre-test in estimating the disturbance variance after the pre-test for a linear hypothesis on coefficients in a linear regression, Econom. Lett. 28 (1988) 151-156
Ohtani, K. (1998b): On the sampling performance of an improved Stein inequality restricted estimator, Austral. and New Zealand J. Statis. 40 (1998) 181-187 Ohtani, K. (1998c): The exact risk of a weighted average estimator of the OLS and Steinrule estimators in regression under balanced loss, Statistics & Decisions 16 (1998) 3545 Ohtani, K. (1996): Further improving the Stein-rule estimator using the Stein variance estimator in a misspecified linear regression model, Statist. Probab. Lett. 29 (1996) 191-199 Okamoto, M. (1973): Distinctness of the eigenvalues of a quadratic form in a multivariate sample, Ann. Stat. 1 (1973) 763-765 Okeke, F. and F. Krumm (1998): Graph, graph spectra and partitioning algorithms in a geodetic network structural analysis and adjustment, Bolletino di Geodesia e Scienze Affini 57 (1998) 1-24 Olkin, I. (1998): The density of the inverse and pseudo-inverse of a random matrix, Statistics and Probability Letters 38 (1998) 131-135 Olkin, J. (2000): The 70th anniversary of the distribution of random matrices: a survey, Technical Report No. 2000-06, Department of Statistics, Stanford University, Stanford 2000 Olkin, I. and S.N. Roy (1954): On multivariate distribution theory, Ann. Math. Statist. 25 (1954) 329-339 Olkin, I. and A.R. Sampson (1972): Jacobians of matrix transformations and induced functional equations, Linear Algebra Appl. 5 (1972) 257-276 Olkin, I. and J.W. Pratt (1958): Unbiased estimation of certain correlation coefficient, Annals Mathematical Statistics 29 (1958) 201-211 Olsen, A., Seely, J. and D. Birkes (1976): Ivariant quadratic unbiased estimators for two variance components, Annals of Statistics 4 (1976) 878-890 Ord, J.K. and S. Arnold (1997): Kendall’s advanced theory of statistics, volume IIA, classical inference, Arnold Publ., 6th edition, London 1997 Ortega, J.M. and W.C. Rheinboldt (2000): Iterative solution of nonlinear equations in several variables, SIAM 2000 Osborne, M.R. (1972): Some aspects of nonlinear least squares calculations, Numerical Methods for Nonlinear Optimization, ed. F.A. Lootsma, Academic Press, New York 1972 Osborne, M.R. (1976): Nonlinear least squares the Levenberg algorithm revisited, J. Aust. Math. Soc. B 19 (1976) 343-357 Osborne, M.R. and G.K. Smyth (1986): An algorithm for exponential fitting revisited, J. App. Prob. (1986) 418-430 Osborne, M.R. and G.K. Smyth (1995): A modified Prony algorithm for fitting sums of exponential functions, SIAM J. Sc. and Stat. Comp. 16 (1995) 119-138 Osiewalski, J. and M.F.J. Steel (1993): Robust Bayesian inference in elliptical regression models, J. Economatrics 57 (1993) 345-363 Ouellette, D.V. (1981): Schur complements and statistics, Linear Algebra Appl. 36 (1981) 187-295 Owens, W.H. (1973): Strain modification of angular density distributions, Techtonophysics 16 (1973) 249-261 Oyet, A.J. and D.P. Wiens (2000): Robust designs for wavelet approximations of regression models, Nonparametric Statistics 12 (2000) 837-859 Ovtchinnikov, E.E. and L.S. Xanthis (2001): Successive eigenvalue relaxation : a new method for the generalized eigenvalue problem and convergence estimates, Proc. R. Soc. Lond. A 457 (2001) 441-451
Padmawar, V.R. (1998): On estimating nonnegative definite quadratic forms, Metrika 48 (1998) 231-244 Pagano, M. (1978): On periodic and multiple autoregressions, Annals of Statistics 6 (1978) 1310-1317 Paige, C.C. and M.A.Saunders (1975): Solution of sparse indefinite systems of linear equations, SIAM J. Numer. Anal. 12 (1975) 617-629 Paige, C. and C. van Loan (1981): A Schur decomposition for Hamiltonian matrices, Linear Algebra and its Applications 41 (1981) 11-32 Pakes, A.G. (1999): On the convergence of moments of geometric and harmonic means, Statistica Neerlandica 53 (1999) 96-110 Pal, N. and W.K. Lim (2000): Estimation of a correlation coefficient: some second order decision – theoretic results, Statistics and Decisions 18 (2000) 185-203 Pal, S.K. and P.P. Wang (1996): Genetic algorithms for pattern recognition, CRC Press, Boca Raton 1996 Papoulis, A. (1991): Probability, random variables and stochastic processes, McGraw Hill, New York 1991 Park, H. (1991): A parallel algorithm for the unbalanced orthogonal Procrustes problem, Parallel Computing 17 (1991) 913-923 Parker, W.V. (1945): The characteristic roots of matrices, Duke Math. J. 12 (1945) 519526 Parthasaratky, K.R. (1967): Probability measures on metric spaces, Academic Press, New York 1967 Parzen, E. (1962): On estimation of a probability density function and mode, Ann. Math. Statistics 33 (1962) 1065-1073 Patel, J.K. and C.B. Read (1982): Handbook of the normal distribution, Marcel Dekker, New York and Basel 1982 Patil, V.H. (1965): Approximation to the Behrens-Fisher distribution, Biometrika 52 (1965) 267-271 Pázman, A. (1986): Foundations of optimum experimental design, Mathematics and its applications, D. Reidel, Dordrecht 1986 Pázman, A. and J.-B. Denis (1999): Bias of LS estimators in nonlinear regression models with constraints. Part I: General case, Applications of Mathematics 44 (1999) 359-374 Pázman, A. and W.G. Mueler (1998): A new interpretation of design measures, in: MODA 5 – Advances in model-oriented data analysis and experimental design, pp. 239-246, Atkinson, A.K., Pronzato, L. and H.P. Wynn (eds), Physica Verlag 1998 Pearson, E.S. (1970): William Sealy Gosset, 1876-1937: "Student" as a statistician, Studies in the History of Statistics and Probalbility (E.S. Pearson and M.G. Kendall, eds.) Hafner Publ., 360-403, New York 1970 Pearson, E.S. and H.O. Hartley (1958): Biometrika Tables for Statisticians Vol. 1, Cambridge University Press, Cambridge 1958 Pearson, K. (1905): The problem of the random walk, Nature 72 (1905) 294 Pearson, K. (1906): A mathematical theory of random migration, Mathematical Contributions to the Theory of Evolution, XV Draper’s Company Research Memoirs, Biometrik Series III, London 1906 Pearson, K. (1931): Historical note on the distributional of the Standard Deviations of Samples of any size from any indefinitely large Normal Parent Population, Biometrika 23 (1931) 416-418 Peddada, S.D. and T. Smith (1997): Consistency of a class of variance estimators in linear models under heteroscedasticity, Sankhya 59 (1997) 1-10
Pelzer, H. (1971): Zur Analyse geodätischer Deformationsmessungen, Deutsche Geodätische Kommission, Akademie der Wissenschaften, Reihe C (164), München 1971 Pelzer, H. (1974): Zur Behandlung singulärer Ausgleichungsaufgaben, Z. Vermessungswesen 99 (1974) 181-194, 479-488 Pena, D., Tiao, G.C., and R.S. Tsay (2001): A course in time series analysis, Wiley, New York 2001 Penrose, R. (1955): A generalised inverse for matrices, Proc. Cambridge Phil. Soc. 51 (1955) 406-413 Penny, K.I. (1996): Appropriate critical values when testing for a single multivariate outlier by using the Mahalanobis distance, in: Applied Statistics, ed. S.M. Lewis and D.A. Preece, J. Royal Stat. Soc. 45 (1996) 73-81 Percival, D.B. and A.T. Walden (1993): Spectral analysis for physical applications, Cambridge, Cambridge University Press 1997 Pereyra, V. and G. Scherer (1973): Efficient computer manipulation of tensor products with application to multidimensional approximation, Math. Computation 27 (1973) 595-605 Perron, F. and N. Giri (1992): Best equivariant estimation in curved covariance models, J. Multivariate Analysis 40 (1992) 46-55 Perron, F. and N. Giri (1990): On the best equivariant estimator of mean of a multivariate normal population, J. Multivariate Analysis 32 (1990) 1-16 Percival, D.B. and A.T. Walden (1999): Wavelet methods for time series analysis, Cambridge University Press, Cambridge 1999 Petrov, V.V. (1975): Sums of independent random variables, Berlin 1975 Pfeufer, A. (1990): Beitrag zur Identifikation und Modellierung dynamischer Deformationsprozesse, Vermessungstechnik 38 (1990) 19-22 Pfeufer, A. (1993): Analyse und Interpretation von Überwachungsmessungen - Terminologie und Klassifikation, Z. Vermessungswesen 118 (1993) 19-22 Phillips, G.M. (2000): Two millennia of mathematics – From Archimedes to Gauss, Springer 2000 Piepho, H.-P. (1998): An algorithm for fitting the shifted multiplicative model, J. Statist. Comput. Simul. 62 (1998) 29-43 Pilz, J. (1983): Bayesian estimation and experimental design in linear regression models, Teubner-Texte zur Mathematik 55, Teubner, Leipzig 1983 Pincus, R. (1974): Estimability of parameters of the covariance matrix and variance components, Math. Operationsforschg. Statistik 5 (1974) 245-248 Pinheiro, J.C. and D.M. Bates (2000): Mixed-effects models in S and S-Plus, Statistics and Computing, Springer-Verlag, New York 2000 Pison, G., Van Aelst, S. and G. Willems (2003): Small sample corrections for LTS and MCD, Developments in Robust Statistics, pp. 330-343, Physica Verlag, Heidelberg 2003 Pistone, G. and M.P. Rogantin (1999): The exponential statistical manifold: mean parameters. Orthogonality and space transformations, Bernoulli 5 (1999) 721-760 Pitman, E.J.G. (1979): Some basic theory for statistical inference, Chapman and Hall, Boca Raton 1979 Pitman, J. and M. Yor (1981): Bessel processes and infinitely divisible laws, unpublished report, University of California, Berkeley Plachky, D. (1993): An estimation-theoretical characterization of the Poisson distribution, Statistics and Decisions, Supplement Issue 3 (1993) 175-178 Plackett, R.L. (1949): A historical note on the method of least-squares, Biometrika 36 (1949) 458-460
Plackett, R.L. (1972): The discovery of the method of least squares, Biometrika 59 (1972) 239-251 Plato, R. (1990): Optimal algorithms for linear ill-posed problems yield regularization methods, Numer. Funct. Anal. Optim. 11 (1990) 111-118 Plemmons, R.J. (1990): Recursive least squares computation, Proceedings of the International Symposium MTNS 3 (1990) 495-502 Pohst, M. (1987): A modification of the LLL reduction algorithm, J. Symbolic Computation 4 (1987) 123-127 Poirier, D.J. (1995): Intermediate statistics and econometrics, The MIT Press, Cambridge 1995 Poisson, S.D. (1827): Connaissance des temps de l’annee 1827 Polasek, W. and S. Liu (1997): On generalized inverses and Bayesian analysis in simple ANOVA models, Student 2 (1997) 159-168 Pollock, D.S.G. (1999): A handbook of time-series analysis, signal processing and dynamics, Academic Press, Cambridge 1999 Polya, G. (1919): Zur Statistik der sphaerischen Verteilung der Fixsterne, Astr. Nachr. 208 (1919) 175-180 Polya, G. (1930): Sur quelques points de la théorie des probabilités, Ann. Inst. H. Poincare 1 (1930) 117-161 Pope, A.J. (1976): The statistics of residuals and the detection of outliers, NOAA Technical Report, NOW 65 NGS 1, U.S. Dept. of Commerce, Rockville, Md., 1976 Popinski, W. (1999): Least-squares trigonometric regression estimation, Applicationes Mathematicae 26 (1999) 121-131 Portnoy, S. and R. Koenker (1997): The Gaussian hare and the Laplacian tortoise: computability of squared error versus absolute-error estimators, Statistical Science 12 (1997) 279-300 Potts, D., Steidl, G. and M. Tasche (1996): Kernels of spherical harmonics and spherical frames, in: Advanced Topics in Multivariate Approximation pp. 287-301, Fontanella, F., Jetter, K. and P.J. Laurent (eds), World Scientific Publishing 1996 Powers, D.L. (1999): Boundary value problems, Harcourt Academic Press 1999 Pratt, J.W. (1961): Length of confidence intervals, J. American Statistical Assoc. 56 (1961) 549-567 Pratt, J.W. (1963): Shorter confidence intervals for the mean of a normal distribution with known variance, Ann. Math. Statist. 34 (1963) 574-586 Prescott, P. (1975): An approximate tests for outliers in linear models, Technometric 17 (1975) 129-132 Presnell, B., Morrison, S.P. and R.C. Littell (1998): Projected multivariate linear models for directional data, J. American Statist. Assoc. 93 (1998) 1068-1077 Press, S.J. (1989): Bayesian statistics: Principles, models and applications, Wiley, New York 1989 Press, W.H., Teukolsky, S.A., Vetterling, W.T. and B.P. Flannery (1992): Numerical Recipes in FORTRAN (2nd edition), Cambridge University Press, Cambridge 1992 Priestley, M.B. (1981): Spectral analysis and time series, Vol. 1 and 2, Academic Press, London 1981 Priestley, M.B. (1988): Nonlinear and nonstationary time series analysis, Academic Press, London 1988 Prony, R. (1795): Essai experimentale et analytique, J.Ecole Polytechnique (Paris) 1 (1795) 24-76 Prószynski, W. (1997): Measuring the robustness potential of the least-squares estimation: geodetic illustration, J. Geodesy 71 (1997) 652-659
Pruscha, H. (1996): Angewandte Methoden der Mathematischen Statistik, Teubner Skripten zur Mathematischen Stochastik, Stuttgart 1996 Pugachev, V.S. and I.N. Sinitsyn (2002): Stochastic systems, Theory and applications, Russian Academy of Sciences 2002 Puntanen, S., Styan, G.P.H. and H.J. Werner (2000): Two matrix-based proofs that the linear estimator Gy is the best linear unbiased estimator, J. Statist. Planning and Inference 88 (2000) 173-179 Pukelsheim, F. (1981a): Linear models and convex geometry: Aspects of non-negative variance estimation, Math. Operationsforsch. u. Stat. 12 (1981) 271-286 Pukelsheim, F. (1981b): On the existence of unbiased nonnegative estimates of variance covariance components, Ann. Statist. 9 (1981) 293-299 Pukelsheim, F. (1993): Optimal design of experiments, Wiley, New York 1993 Pukelsheim, F. (1994): The three sigma rule, American Statistician 48 (1994) 88-91 Pukelsheim, F. and B. Torsney (1991): Optimal weights for experimental designs on linearly independent support points, The Annals of Statistics 19 (1991) 1614-1625 Pukelsheim, F. and W.J. Studden (1993): E-optimal designs for polynomial regression, Ann. Stat. 21 (1993) 402-415 Qingming, G. and L. Jinshan (2000): Biased estimation in Gauss-Markov model, Allg. Vermessungsnachrichten 107 (2000) 104-108 Qingming, G., Yuanxi, Y. and G. Jianfeng (2001): Biased estimation in the Gauss-Markov model with constraints, Allg. Vermessungsnachrichten 108 (2001) 28-30 Qingming, G., Lifen, S., Yuanxi, Y. and G. Jianfeng (2001): Biased estimation in the Gauss-Markov model not full of rank, Allg. Vermessungsnachrichten 108 (2001) 390-393 Quintana, E.S., Quintana, G., Sun, X. and R. van de Geijn (2001): A note on parallel matrix inversion, SIAM J. Sci. Comput. 22 (2001) 1762-1771 Quintana-Orti, G., Sun, X. and C.H. Bischof (1998): A BLAS-3 version of the QR factorization with column pivoting, SIAM J. Sci. Comput. 19 (1998) 1486-1494 Rader, C. and A.O. Steinhardt (1988): Hyperbolic Householder transforms, SIAM J. Matrix Anal. Appl. 9 (1988) 269-290 Rafajlowicz, E. (1988): Nonparametric least squares estimation of a regression function, Statistics 19 (1988) 349-358 Raj, D. (1968): Sampling theory, Mc Graw-Hill Book Comp., Bombay 1968 Ramsey, J.O. and B.W. Silverman (1997): Functional data analysis, Springer Verlag, New York 1997 Rao, B.L.S.P. (1997a): Variance components, Chapman and Hall, Boca Raton 1997 Rao, B.L.S.P. (1997b): Weighted least squares and nonlinear regression, J. Ind. Soc. Ag. Statistics 50 (1997) 182-191 Rao, B.L.S.P. and B.R. Bhat (1996): Stochastic processes and statistical inference, New Age International, New Delhi 1996 Rao, C.R. (1945): Generalisation of Markoff's Theorem and tests of linear hypotheses, Sankhya 7 (1945) 9-16 Rao, C.R. (1952a): Some theorems on Minimum Variance Estimation, Sankhya 12 (1952) 27-42 Rao, C.R. (1952b): Advanced statistical methods in biometric research, Wiley, New York 1952 Rao, C.R. (1965a): Linear statistical inference and its applications, Wiley, New York 1965 Rao, C.R. (1965b): The theory of least squares when the parameters are stochastic and its application to the analysis of growth curves, Biometrika 52 (1965) 447-458
Rao, C.R. (1970): Estimation of heteroscedastic variances in linear models, J. Am. Stat. Assoc. 65 (1970) 161-172 Rao, C.R. (1971a): Estimation of variance and covariance components - MINQUE theory, J. Multivar. Anal. 1 (1971) 257-275 Rao, C.R. (1971b): Unified theory of linear estimation, Sankhya Ser. A33 (1971) 371-394 Rao, C.R. (1971c): Minimum variance quadratic unbiased estimation of variance components, J. Multivar. Anal. 1 (1971) 445-456 Rao, C.R. (1972a): Unified theory of least squares, Communications in Statistics 1 (1972) 1-8 Rao, C.R. (1972b): Estimation of variance and covariance components in linear models, J. Am. Stat. Ass. 67 (1972) 112-115 Rao, C.R. (1973a): Linear statistical inference and its applications, 2nd ed., Wiley, New York 1973 Rao, C.R. (1973b): Representation of best linear unbiased estimators in the Gauss-Markoff model with a singular dispersion matrix, J. Multivariate Analysis 3 (1973) 276-292 Rao, C.R. (1976): Estimation of parameters in a linear model, Ann. Statist. 4 (1976) 1023-1037 Rao, C.R. (1985): The inefficiency of least squares: extensions of the Kantorovich inequality, Linear Algebra and its Applications 70 (1985) 249-255 Rao, C.R. and S.K. Mitra (1971): Generalized inverse of matrices and its applications, Wiley, New York 1971 Rao, C.R. and J. Kleffe (1988): Estimation of variance components and applications, North Holland, Amsterdam 1988 Rao, C.R. and R. Mukerjee (1997): Comparison of LR, score and Wald tests in a non-IID setting, J. Multivariate Analysis 60 (1997) 99-110 Rao, C.R. and H. Toutenburg (1995a): Linear models, least-squares and alternatives, Springer-Verlag, New York 1995 Rao, C.R. and H. Toutenburg (1995b): Linear models, Springer Verlag, New York 1995 Rao, C.R. and H. Toutenburg (1999): Linear models, least squares and alternatives, 2nd ed., Springer Verlag, New York 1999 Rao, C.R. and G.J. Szekely (2000): Statistics for the 21st century - Methodologies for applications of the future, Marcel Dekker, Basel 2000 Rao, P.S.R.S. and Y.P. Chaubey (1978): Three modifications of the principle of the MINQUE, Commun. Statist. Theor. Methods A7 (1978) 767-778 Ravishanker, N. and D.K. Dey (2002): A first course in linear model theory – Multivariate normal and related distributions, Chapman & Hall/CRC, Boca Raton 2002 Ravi, V. and H.-J. Zimmermann (2000): Fuzzy rule based classification with FeatureSelector and modified threshold accepting, European J. Operational Research 123 (2000) 16-28 Ravi, V., Reddy, P.J. and H.-J. Zimmermann (2000): Pattern classification with principal component analysis and fuzzy rule bases, European J. Operational Research 126 (2000) 526-533 Rayleigh, L. (1880): On the resultant of a large number of vibrations of the same pitch and of arbitrary phase, Phil. Mag. 5 (1880) 73-78 Rayleigh, L. (1905): The problem of random walk, Nature 72 (1905) 318 Rayleigh, L. (1919): On the problem of random vibrations, and of random flights in one, two or three dimensions, Phil. Mag. 37 (1919) 321-347
Reeves, J. (1998): A bivariate regression model with serial correlation, The Statistician 47 (1998) 607-615 Reich, K. (2000): Gauss' Schüler. Studierten bei Gauss und machten Karriere. Gauss' Erfolg als Hochschullehrer (Gauss's students: studied with him and were successful. Gauss's success as a university professor), Gauss Gesellschaft e.V. Göttingen, Mitteilungen Nr. 37, pages 33-62, Göttingen 2000 Relles, D.A. (1968): Robust regression by modified least squares, PhD Thesis, Yale University, Yale 1968 Remondi, B.W. (1984): Using the Global Positioning System (GPS) phase observable for relative geodesy: modelling, processing and results, PhD Thesis, Center for Space Research, The University of Texas, Austin 1984 Ren, H. (1996): On the error analysis and implementation of some eigenvalue decomposition and singular value decomposition algorithms, UT-CS-96-336, LAPACK working note 115 (1996) Rencher, A.C. (2000): Linear models in statistics, J. Wiley, New York 2000 Renfer, J.D. (1997): Contour lines of L1-norm regression, Student 2 (1997) 27-36 Resnikoff, G.J. and G.J. Lieberman (1957): Tables of the noncentral t-distribution, Stanford University Press, Stanford 1957 Riccomagno, E., Schwabe, R. and H.P. Wynn (1997): Lattice-based optimum design for Fourier regression, Ann. Statist. 25 (1997) 2313-2327 Rice, J.R. (1969): The approximation of functions, vol. II - Nonlinear and multivariate theory, Addison-Wesley, Reading 1969 Richards, F.S.G. (1961): A method of maximum likelihood estimation, J. Royal Stat. Soc. B 23 (1961) 469-475 Richter, H. and V. Mammitzsch (1973): Methode der kleinsten Quadrate, Stuttgart 1973 Riedel, K.S. (1992): A Sherman-Morrison-Woodbury identity for rank augmenting matrices with application to centering, SIAM J. Matrix Anal. Appl. 13 (1992) 659-662 Riedwyl, H. (1997): Lineare Regression, Birkhäuser Verlag, Basel 1997 Richter, W.D. (1985): Laplace-Gauß integrals, Gaussian measure asymptotic behaviour and probabilities of moderate deviations, Z. Analysis und ihre Anwendungen 4 (1985) 257-267 Rilstone, P., Srivastava, V.K. and A. Ullah (1996): The second order bias and mean squared error of nonlinear estimators, J. Econometrics 75 (1996) 369-395 Rivest, L.P. (1982): Some statistical methods for bivariate circular data, J. Royal Statistical Society, Series B: 44 (1982) 81-90 Rivest, L.P. (1988): A distribution for dependent unit vectors, Comm. Statistics A: Theory Methods 17 (1988) 461-483 Rivest, L.P. (1989): Spherical regression for concentrated Fisher-von Mises distributions, Annals of Statistics 17 (1989) 307-317 Roberts, P.H. and H.D. Ursell (1960): Random walk on the sphere and on a Riemannian manifold, Phil. Trans. Roy. Soc. A252 (1960) 317-356 Robinson, G.K. (1982): Behrens-Fisher problem, Encyclopedia of the Statistical Sciences, Vol. 1, Wiley, New York 1982 Robinson, P.M. and C. Velasco (1997): Autocorrelation-robust inference, Handbook of Statistics 15 (1997) 267-298 Rodgers, J.L. and W.A. Nicewander (1988): Thirteen ways to look at the correlation coefficient, The American Statistician 42 (1988) 59-66 Rohatgi, V.K. (1987): Statistical inference, J. Wiley & Sons 1987 Rohde, C.A. (1966): Some results on generalized inverses, SIAM Rev. 8 (1966) 201-205
Wood, A. (1982): A bimodal distribution for the sphere, Applied Statistics 31 (1982) 52-58 Woolson, R.F. and W.R. Clarke (1984): Analysis of categorical incomplete data, J. Royal Statist. Soc. Series A 147 (1984) 87-99 Worbs, E. (1955): Carl Friedrich Gauß, ein Lebensbild, Leipzig 1955 Wu, C.F.J. (1981): Asymptotic theory of nonlinear least squares estimation, Ann. Stat. 9 (1981) 501-513 Wu, Q. and Z. Jiang (1997): The existence of the uniformly minimum risk equivariant estimator in Sure model, Commun. Statist. - Theory Meth. 26 (1997) 113-128 Wunsch, G. (1986): Handbuch der Systemtheorie, Oldenbourg Verlag, München 1986 Xi, Z. (1993): Iterated Tikhonov regularization for linear ill-posed problems, PhD. Thesis, Universität Kaiserslautern, Kaiserslautern 1993 Xu, P. (1989): On robust estimation with correlated observations, Bulletin Géodésique 63 (1989) 237-252 Xu, P. (1991): Least squares collocation with incorrect prior information, Z. Vermessungswesen 116 (1991) 266-273 Xu, P. (1992): The value of minimum norm estimation of geopotential fields, Geoph. J. Int. 111 (1992) 170-178 Xu, P. (1995): Testing the hypotheses of non-estimable functions in free net adjustment models, Manuscripta Geodaetica 20 (1995) 73-81 Xu, P. (1999a): Biases and accuracy of, and an alternative to, discrete nonlinear filters, J. Geodesy 73 (1999) 35-46 Xu, P. (1999b): Spectral theory of constrained second-rank symmetric random tensors, Geoph. J. Int. 138 (1999) 1-24 Xu, P. (2001): Random simulation and GPS decorrelation, J. Geodesy 75 (2001) 408-423 Xu, P. (2002): Isotropic probabilistic models for directions, planes and referential systems, Proc. Royal Soc. London A 458 (2002) 2017-2038 Xu, P. and E.W. Grafarend (1996): Statistics and geometry of the eigenspectra of three-dimensional second-rank symmetric random tensors, Geophysical Journal International 127 (1996) 744-756 Xu, P. and E.W. Grafarend (1996): Probability distribution of eigenspectra and eigendirections of a two-dimensional, symmetric rank two random tensor, J. Geodesy 70 (1996) 419-430 Yaglom, A.M. (1961): Second-order homogeneous random fields, in: Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, pages 593-622, University of California Press, Berkeley 1961 Yang, H. (1996): Efficiency matrix and the partial ordering of estimate, Commun. Statist. - Theory Meth. 25(2) (1996) 457-468 Yang, Y. (1994): Robust estimation for dependent observations, Manuscripta Geodaetica 19 (1994) 10-17 Yang, Y. (1999a): Robust estimation of geodetic datum transformation, J. Geodesy 73 (1999) 268-274 Yang, Y. (1999b): Robust estimation of systematic errors of satellite laser range, J. Geodesy 73 (1999) 345-349 Yazji, S. (1998): The effect of the characteristic distance of the correlation function on the optimal design of geodetic networks, Acta Geod. Geoph. Hung., Vol. 33 (2-4) (1998) 215-234 Ye, Y. (1997): Interior point algorithms: Theory and analysis, Wiley, New York 1997 Yeh, A.B. (1998): A bootstrap procedure in linear regression with nonstationary errors, The Canadian J. Stat. 26 (1998) 149-160
Yeung, M.-C. and T.F. Chan (1997): Probabilistic analysis of Gaussian elimination without pivoting, SIAM J. Matrix Anal. Appl. 18 (1997) 499-517 Ylvisaker, D. (1977): Test resistance, J. American Statist. Assoc. 72 (1977) 551-557 Yohai, V.J. and R.H. Zamar (1997): Optimal locally robust M-estimates of regression, J. Statistical Planning and Inference 64 (1997) 309-323 Yor, M. (1992): Some aspects of Brownian motion, Part I: Some special functionals, Birkhäuser Verlag, Basel 1992 Yor, M. (1997): Some aspects of Brownian motion, Part II: Some recent martingale problems, Birkhäuser Verlag, Basel 1997 Youcai, H. and S.P. Mertikas (1995): On the designs of robust regression estimators, Manuscripta Geodaetica 20 (1995) 145-160 Youssef, A.H.A. (1998): Coefficient of determination for random regression model, Egypt. Statist. J. 42 (1998) 188-196 Yu, Z.C. (1996): A universal formula of maximum likelihood estimation of variance-covariance components, J. Geodesy 70 (1996) 233-240 Yuan, K.H. and P.M. Bentler (1997): Mean and covariance structure analysis: theoretical and practical improvements, J. American Statist. Assoc. 92 (1997) 767-774 Yuan, K.H. and P.M. Bentler (1998a): Robust mean and covariance structure analysis, British J. Mathematical and Statistical Psychology (1998) 63-88 Yuan, K.H. and P.M. Bentler (1998b): Robust mean and covariance structure analysis through iteratively reweighted least squares, Psychometrika 65 (2000) 43-58 Yuan, Y. (2000): On the truncated conjugate gradient method, Math. Prog. 87 (2000) 561-571 Yuan, Y. (1999): On the truncated conjugate gradient method, Springer-Verlag Yusuf, S., Peto, R., Lewis, J., Collins, R. and P. Sleight (1985): Beta blockade during and after myocardial infarction: An overview of the randomized trials, Progress in Cardiovascular Diseases 27 (1985) 335-371 Zabell, S. (1992): R.A. Fisher and the fiducial argument, Statistical Science 7 (1992) 369-387 Zackin, R., de Gruttola, V. and N. Laird (1996): Nonparametric mixed-effects for repeated binary data arising in serial dilution assays: Application to estimating viral burden in AIDS, J. American Statist. Assoc. 91 (1996) 52-61 Zacks, S. (1971): The theory of statistical inference, J. Wiley, New York 1971 Zacks, S. (1996): Adaptive designs for parametric models, 151-180 Zadeh, L. (1965): Fuzzy sets, Information and Control 8 (1965) 338-353 Závoti, J. (1999): Modified versions of estimates based on least squares and minimum norm, Acta Geod. Geoph. Hung. 34 (1999) 79-86 Závoti, J. (2001): Filtering of earth's polar motion using trigonometric interpolation, Acta Geod. Geoph. Hung. 36 (2001) 345-352 Zehfuss, G. (1858): Über eine gewisse Determinante, Zeitschrift für Mathematik und Physik 3 (1858) 298-301 Zellner, A. (1971): An introduction to Bayesian inference in econometrics, J. Wiley, New York 1971 Zha, H. (1995): Comments on large least squares problems involving Kronecker products, SIAM J. Matrix Anal. Appl. 16 (1995) 1172 Zhan, X. (2000): Singular values of differences of positive semidefinite matrices, SIAM J. Matrix Anal. Appl. 22 (2000) 819-823 Zhang, Y. (1985): The exact distribution of the Moore-Penrose inverse of X with a density, in: Multivariate Analysis VI, Krishnaiah, P.R. (ed.), pages 633-635, Elsevier, New York 1985
Zhang, J.Z., Chen, L.H. and N.Y. Deng (2000): A family of scaled factorized Broyden-like methods for nonlinear least squares problems, SIAM J. Optim. 10 (2000) 1163-1179 Zhang, Z. and Y. Huang (2003): A projection method for least squares problems with a quadratic equality constraint, SIAM J. Matrix Anal. Appl. 25 (2003) 188-212 Zhao, L. (2000): Some contributions to M-estimation in linear models, J. Statistical Planning and Inference 88 (2000) 189-203 Zhao, Y. and S. Konishi (1997): Limit distributions of multivariate kurtosis and moments under Watson rotational symmetric distributions, Statistics & Probability Letters 32 (1997) 291-299 Zhdanov, M.S. (2002): Geophysical inverse theory and regularization problems, Methods in Geochemistry and Geophysics 36, Elsevier, Amsterdam-Boston-London 2002 Zhenhua, X. (1993): Iterated Tikhonov regularization for linear ill-posed problem, PhD. Thesis, University of Kaiserslautern, Kaiserslautern 1993 Zhen-Su She, Jackson, E. and Orszag, S. (1990): Intermittency of turbulence, in: The Legacy of John von Neumann (J. Glimm, J. Impagliazzo, I. Singer eds.), Proc. Symp. Pure Mathematics, vol. 50, pages 197-211, American Mathematical Society, Providence, Rhode Island 1990 Zhou, J. (2001): Two robust design approaches for linear models with correlated errors, Statistica Sinica 11 (2001) 261-272 Zhou, K.Q. and S.L. Portnoy (1998): Statistical inference on heteroskedastic models based on regression quantiles, J. Nonparametric Statistics 9 (1998) 239-260 Zhong, D. (1997): Robust estimation and optimal selection of polynomial parameters for the interpolation of GPS geoid heights, J. Geodesy 71 (1997) 552-561 Zhu, J. (1996): Robustness and the robust estimate, J. Geodesy 70 (1996) 586-590 Ziegler, A., Kastner, C. and M. Blettner (1998): The generalised estimating equations: an annotated bibliography, Biom. J. 40 (1998) 115-139 Zimmermann, H.-J. (1991): Fuzzy set theory and its applications, 2nd ed., Kluwer Academic Publishers, Dordrecht 1991 Zioutas, G., Camarinopoulos, L. and E.B. Senta (1997): Theory and Methodology: Robust autoregressive estimates using quadratic programming, European J. Operational Research 101 (1997) 486-498 Zippel, R. (1993): Effective polynomial computations, Kluwer Academic Publishers, Boston 1993 Zolotarev, V.M. (1997): Modern theory of summation of random variables, VSP, Utrecht 1997 Zucker, D.M., Lieberman, O. and O. Manor (2000): Improved small sample inference in the mixed linear model: Bartlett correction and adjusted likelihood, J. R. Statist. Soc. B. 62 (2000) 827-838 Zurmuehl, R. and S. Falk (1984): Matrizen und ihre Anwendungen, Teil 1: Grundlagen, 5. ed., Springer-Verlag, Berlin 1984 Zurmuehl, R. and S. Falk (1986): Matrizen und ihre Anwendungen, Teil 2: Numerische Methoden, 5. ed., Springer-Verlag, Berlin 1986 Zwet, W.R. van and J. Osterhoff (1967): On the combination of independent test statistics, Ann. Math. Statist. 38 (1967) 659-680 Zyskind, G. (1969): On best linear estimation and a general Gauss-Markov theorem in linear models with arbitrary nonnegative covariance structure, SIAM J. Appl. Math. 17 (1969) 1190-1202
Index
1-way classification, 460, 461, 463, 464 2-way classification, 464, 467, 469, 470, 473, 475 3-way classification, n-way classification, 476 3d datum transformation, 431, 433, 441 algebraic regression, 40 A-optimal design, 323, 359 ARIMA, 455, 477, 478 arithmetic mean, 191, 195, 184 ARMA, 455, 478 ARMA process, 477 associativity, 486, 497 augmented Helmert matrix, 583 autoregressive integrated moving-average process, see ARIMA best homogeneous linear prediction, 400 best homogeneous linear unbiased prediction, 400 best inhomogeneous linear prediction, 400 Best Invariant Quadratic Uniformly Unbiased Estimation, see BIQUUE Best Linear Uniformly Unbiased Estimation, see BLUUE Best Linear V-Norm Uniformly Minimum Bias S-Norm Estimation, 462 bias matrix, 87, 88, 91, 93, 94, 288, 312, 313, 349 bias vector, 87, 91, 93, 90, 288, 293, 292, 300, 304, 309, 312, 313, 314, 320, 321, 322, 349, 350, 355, 357, 358 bias weight matrix, 359 BIQE, 285, 294, 299, 300, 301, 303, 304, 305, 311 BIQUUE, 187, 189-196, 198-201, 217, 236-241, 285, 294, 298-301, 303, 304, 305, 311, 380, 379, 385, 386, 387, 569, 571, 572, 574, 579, 581, 582, 588, 596, 597, 599, 603, 606, 613, 621,
622, 623, 631, 632, 633, 634, 640, 642 bivariate Gauss-Laplace pdf, 627 Bjerhammar formula, 485, 516 BLE, 285, 287 BLIMBE, 311, 642 BLIP, 347 BLUMBE, 285, 291, 293, 287, 298, 299, 300 BLUUE, 188, 187, 194, 195, 201, 208, 210, 379, 380, 387, 430, 467, 567, 569, 571, 572, 574, 579, 581, 582, 585, 586, 588, 592, 596, 597, 599, 603, 606, 621, 622, 629, 630, 632, 633, 639, 640, 643 BLUUP, 380, 387 bordering, 302, 485 break points, 176, 177, 181, 182, 143 Brown process, 476 canonical LESS, 135, 137, 139, 131 canonical MINOLESS, 281 canonical MINOS, 1, 13, 26, 36, 37, 41, 212, 485 Cayley inverse, 322, 323, 324, 359, 360, 497, 498, 499, 501, 502, 513, 517, 576 Cayley multiplication, 513 Cayley-product, 486, 497 characteristic equation, 509 characteristic polynomial, 509 Choleski decomposition, 503 collocation, 386, 399 column space, 6, 496, 497 commutativity, 486 condition equations with unknowns, 429 conditional equations, 413 confidence coefficient, 554 confidence interval, 543, 553, 557, 564, 567, 592, 596, 605, 606, 611, 612, 613, 614, 617, 619, 620 confidence region, 543, 621 consistent linear equation, 2, 6 cumulative pdf, 569, 571, 581
cumulative probability, 554, 615, 627, 628, 630 curtosis, 614, 645, 649, 647, 651 curved manifolds, 327 datum defect, 74 datum transformation, 2, 78, 83 determinant, 485, 504, 512 determinantal Form, 526 diagonal, 488 distributed observation, 91 distribution - circular normal, 328, 329, 336, 339, 340 - circular normal Fisher, 335 - Fisher - von Mises or Langevin, 329 - Langevin, 328 - oblique normal, 339, 340, 335 - von Mises, 335, 336 - Gauss normal, 328 distributivity, 486 Duncan-Guttman matrix identity, 323, 324, 359, 360, 502 dynamical Systems, 476 eigenspace, 31, 509 eigenspace analysis, 1, 27, 30, 31, 33, 34, 40, 118, 151, 131, 133, 135, 243, 281, 621, 624, 625 eigenspace synthesis, 1, 28, 30, 33, 34, 36, 118, 131, 133, 135, 138, 144, 243, 281, 621, 625, 628, 635, 638 eigenvalue analysis, 632 eigenvalue decomposition, 26 eigenvalue-eigenvector analysis, 510 eigenvalue-eigenvector decomposition, 485 eigenvalue-eigenvector synthesis, 510 eigenvalues, 509, 512 eigenvectors, 509, 512 Equivalence Theorem, 216 error propagation, 365, 366, 648 error-in-variables, 347, 348, 401-405 estimability, 653, 654 extended Helmert matrix, 583 fibering, 3, 7, 12, 96, 100, 103, 244, 248, 256 Fisher - von Mises, 328 Fisher distribution, 646 Fisher pdf, 330
Fisher sphere, 327 fixed effects, 304, 306, 309, 312, 313, 314, 347, 348, 397 Fourier analysis, 45, 46, 47, 48, 49, 50, 51 Fourier coefficient, 45 Fourier series, 40, 41, 42, 44, 45 Fourier synthesis, 45 Fourier-Legendre analysis, 59, 63, 64, 65, 57 Fourier-Legendre series, 52, 67 Fourier-Legendre synthesis, 57, 68 Frobenius error matrix, 433 Frobenius matrix norm, 209, 290, 291, 314, 313, 350 Frobenius matrix W-seminorm, 438, 439 Frobenius norm, 87, 88, 91, 92, 94, 93, 122 gamma function, 550, 553, 604, 605 Gauss-Laplace normal pdf, 573 Gauss elimination, 23 Gauss matrix, 552 Gauss process, 476 Gauss-Helmert model, 411, 412, 413, 419, 428, 429 Gauss-Jacobi Combinatorial Algorithm, 176, 177, 179, 180, 182 Gauss-Laplace inequality, 644, 647 Gauss-Laplace normal distribution, 567, 580, 644, 646, 647, 652 Gauss-Laplace normally distributed observations, 553, 557, 564 Gauss-Laplace pdf, 554, 555, 581, 585 Gauss-Laplace probability distribution, 556, 559, 561, 562 Gauss-Markov model - consistent linear, 86, 87 - general with mixed effects, 379, 385 - general linear with fixed effects, 380 - inhomogeneous general linear with fixed effects, 379 - linear, 621 - mixed, 382, 383 - mixed with fixed effects, 389, 390, 391, 392 - mixed with random effects, 389, 390, 391, 392 - multivariate with constraints, 455, 459, 460 - multivariate model, 455, 456, 457 - multivariate with linear homogeneous constraints, 459
- special consistent linear, 88 - special consistent linear model of fixed effects, 89, 90 - special error propagation law, 648 - special model, 85, 192, 208, 209, 210, 211, 213, 223, 224, 226, 228, 236, 237, 240, 241, 285, 297, 302, 306 - special with fixed effects, 313, 315, 316, 319, 320 - special with random effects, 347, 349, 348, 355 - special without datum defect, 187 - special linear, 216, 287, 288, 623, 631 - special linear of fixed effects, 631 - special linear with datum defect, 642 - special linear with datum defect of fixed effects, 642 - special linear with fixed effects, 321 - special linear with random effects, 351, 352, 353, 355, 356, 357 general Gauß-Helmert model, 445, 453 general linear model of type conditional equations with unknowns, 453 generalized Helmert transformation, 635 geometric mean, 184 g-inverse, 16, 254, 298, 302, 376, 377, 415, 425, 447, 448, 485, 513, 514, 516, 517, 519, 521 - reflexive and symmetric, 520 - reflexive, 16, 110, 485, 513, 515, 517, 520 - reflexive symmetric, 485, 513 - symmetric reflexive, 518, 519 goodness of fit, 375 Grand Linear Model, 445, 446, 449, 450 Grassmann coordinates, 143, 152, 156, 157, 158, 161, 162, 165, 169-175 Grassmann product, 545 Gx-LESS, 180, 181 (Gx, Gy)-MINOLESS, 261, 264, 263, 265, 271, 273, 274, 276, 277, 282 Gx-MINOS, 1, 18-23, 25, 26, 36, 37, 68, 86, 90, 91, 273
Gy-LESS, 95, 112-116, 118, 129, 130, 131, 138, 139, 188, 202, 216, 273, 373, 374, 376, 377 Gy-MINOS, 264 Gy-norm, 375 Gy-seminorm, 112, 374, 375, 376, 377 Hadamard-product, 487 Hankel matrix, 492, 493, 485, 494 HAPS, 243, 244, 282, 283, 284, 287 Hellenic mean, 185 Helmert's Chi Square χ² pdf, 585, 591, 614, 621, 631, 633 Helmert equation, 234, 235, 236 Helmert matrix, 233, 234, 235, 485, 491, 492, 578, 579 Helmert random variable, 598, 601, 602, 603, 617, 629, 636 Helmert transformation, 555, 572, 575, 576, 577, 578, 579, 588, 597, 600, 623, 634 Helmert's inverse random variable, 615 Helmert's pdf, 614, 615, 616 Helmert's polar coordinates, 637 Helmert's random variable, 613, 614, 615, 621, 627, 631 Helmert's random variable x, 637 Helmert's ansatz, 230-234 Hesse matrix, 366, 367, 368, 370, 442, 531, 542, 644, 650 Higher classifications with interaction, 474 Hilbert space, 41, 44, 54, 45, 56, 114 HIQUUE, 221, 232, 236, 230, 235, 217, 234 Hodge star operator, 152, 153, 164, 545 hom BLE, 312, 313, 315, 316, 317, 319, 320, 321, 323, 324, 325, 326 hom BLIP, 347, 349, 351, 353, 355, 359, 360, 361, 362 hom BLUUP, 349 hom LUMBE, 90, 91 hom S-BLE, 312, 313, 315, 316, 317, 318, 319, 320, 321, 323, 324, 325, 326 hom S-BLIP, 347, 349, 351, 353, 354, 355, 356, 357, 359, 360, 361, 362 hom S-LUMBE, 88, 90 hom α-BLE, 312, 313, 315, 316, 317, 318, 319, 321, 322, 324, 325, 326 hom α-VIP, 347, 349, 351, 352, 353, 355, 357, 359, 360, 361, 362
homogeneous, 124, 125 homogeneous conditional equations with unknowns, 421 homogeneous conditions with unknowns, 415, 417, 419 homogeneous inconsistent condition equations with unknowns, 422 homogeneous linear prediction, 349, 397, 398 homogeneous Linear Uniformly Minimum Bias Estimation, 88 homogeneous α-weighted hybrid minimum variance - minimum bias estimation, 316 Hybrid APproximate Solution, 419 I, I-BLUMBE, 467, 472 I, I-HAPS, 421, 423, 412 I, I-LESS, 422 I, I-MINOLESS, 412, 421, 422, 423 I, I-BLE, 311 I, I-BLUMBE, 311 I3, I3-BLE, 306 I3, I3-BLUMBE, 301 I3, I3-MINOLESS, 301 I-BIQUUE, 203, 202 I-BLUUE, 203 idempotence, 216, 488, 494 idempotent, 24, 25, 231, 514 identifiability, 644, 652, 653, 654 identity, 497 I-LESS, 113, 144, 127, 148, 149, 159, 202, 412, 421, 422, 439, 441 I-MINOS, 68, 72, 75 implicit function theorem, 2, 3, 96, 74, 533, 537, 538 inconsistent homogeneous condition equations, 373, 375, 376 inconsistent inhomogeneous condition equations, 374, 377 inconsistent linear equation, 466 inconsistent linear system of equations, 433 inconsistent, inhomogeneous condition equations with unknowns, 426, 427 inconsistent, inhomogeneous system of linear equations, 445, 446 independent identically, 91 independent identically distributed, 93
infinite Fourier series, 48 inhomogeneous conditions with unknowns, 425, 428 invariant quadratic estimation, 191, 223 invariant quadratic estimation: IQE, 188 invariant quadratic uniformly unbiased estimation: IQUUE, 188 Invariant Quadratic Uniformly Unbiased Estimation, 193, 226, 230 inverse, 497 - Cayley, see Cayley inverse - generalized, 2, 16, 18, 23, 25, 26, 109, 110, 111, 129, 130, 197, 254, 259, 265, 273, 276, 291, 293, 294, 297, 413, 416, 421, 427, 511, 513 - left, 114, 127, 130, 197, 226, 270, 271, 312 - left and right, 485 - left generalized, 430 - Moore-Penrose, see Moore-Penrose inverse - pseudo, 254, 265, 268, 293, 485, 511, 516, 517, 518, 520, 521 - reflexive, 36, 37, 138, 254, 259, 265, 268, 273, 293 - reflexive, symmetric generalized, 455 - right, 2, 20, 68, 270, 271, 572 - symmetric reflexive generalized, 462, 467, 472 - weighted right, 21 inverse addition, 486 inverse Helmert transformation, 583 inverse partitioned matrices, 19, 22, 115, 211, 212, 375 IQE, 198, 199, 223, 224, 225, 228, 231, 235, 239 IQUUE, 193, 197, 198, 217, 226, 228, 230, 231, 232, 237, 238 IQUUE of Helmert type: HIQUUE, 189 isotropic, 124, 125 Jacobi matrix, 366, 367, 368, 370 Jacobian determinant, 552 Kalman-Bucy Filter, 455 Kalman filter, 478, 479, 480 Khatri-Rao-product, 487, 488 Kolmogorov-Wiener prediction, 379, 398, 399 Kronecker-Zehfuss, 528, 531 Kronecker products, 651
Kronecker-Rao product, 122 Kronecker-Zehfuss Product, 89, 122, 211, 318, 354, 456, 486, 521 Lagrange multipliers, 19, 374, 377, 376, 382, 415, 417, 419, 425, 426, 428, 447, 449, 450, 533, 536, 538, 541 Langevin sphere, 327 Laplace transform, 484 Laplace-Beltrami operator, 41 latent conditions, 143 latent restrictions, 152, 158, 160, 163, 168, 169, 172 least squares, 100, 101, 103, 109, 107, 111-115, 185, 187 least squares solution, 96, 248, 252, 375 left eigenspace, 437 left eigenvectors, 509 LESS, 104, 107-110, 117, 119, 121, 122, 126, 135, 189, 190, 480 LESS model, 458 leverage point, 165, 166, 143, 144, 148, 149, 151, 176 likelihood function, 652 linear equations - underdetermined, 40 linear independence, 495 linear uniformly minimum bias estimator, 85, 86 Linear Uniformly Unbiased Estimation, 193, 209 linear Volterra integral equation of the first kind, 555, 556, 557, 594, 615, 616, 619 logarithmic mean, 184 longitudinal and lateral correlation functions, 366 Löwner partial ordering, 508 LUMBE, 85, 189 LUUE, 193 MALE, 205, 206, 208, 217 matrix - adjoint, 503 - antisymmetric, 485, 488, 523 - canonically simple dispersion, 93 - commutation, 494, 495, 485, 527
- criterion, 124, 365 - diagonal, 485 - dispersion, 90, 93 - extended Helmert, see extended Helmert matrix - hat, 128, 143, 144, 146, 148, 151 - Helmert, see Helmert matrix - Hesse, see Hesse matrix - idempotent, 485, 509 - inverse partitioned, 485, 497, 499, 501 - Jacobi, see Jacobi matrix - lower triangular, 488 - normal, 485 - null, 485 - orthogonal, 485, 489, 490, 492 - orthonormal, 485, 509, 510 - permutation, 201, 494, 485 - projection, 16, 24, 25, 216 - quadratic, 497, 501, 509, 517 - quadratic Helmert, see quadratic Helmert matrix - rectangular, 493, 496, 503, 506, 516 - rectangular Helmert, see rectangular Helmert matrix - reflexive, 25 - rotation, 509 - simple dispersion, 91 - simple variance-covariance matrix, 91, 92, 93 - singular, 496 - substitute, 90, 301, 306, 308 - symmetric, 485, 497, 516, 523, 488 - Taylor Karman criterion, see Taylor Karman criterion matrix - Taylor-Karman, see Taylor-Karman matrix - unity, 485, 488 - upper triangular, 488 - Vandermonde, see Vandermonde matrix - zero, 488 matrix multiplication - Cayley, 485 - Khatri-Rao, 485 - Kronecker-Zehfuss, 485 matrix of curtosis, 644 matrix of obliquity, 644 Maximum Likelihood Estimation, 191, 205, 206 maximum likelihood estimator MLE, 336 Mean Square Error, see MSE
Mean Square Estimation Error, see MSE Mean Square Prediction Error, see MSPE median, 184, 191, 201 minimum norm, 7, 252, 248 minimum norm solution, 2, 3, 4, 7, 17, 72, 79, 254, 250 minimum variance, 190 MINQUUE, 217 MINOLESS, 235, 243, 244, 245, 256, 258, 263, 269, 270, 272, 281, 286, 464, 469 MINOS, 12, 14-18, 24, 32, 33, 34, 37, 38, 52, 64, 65, 75, 77, 85, 189, 243, 255 mixed model, 348, 402 modified Mean Square Estimation Error, 314 modified Mean Square Estimation Error MSE, 319 modified Mean Square Prediction Error, 350, 355, 357 modified method of least squares, 407, 408 moment estimator, 401 Moore-Penrose inverse, 499, 516, 520 MSE, 288, 300, 304, 307, 309, 313, 315, 314, 319, 320, 321, 322, 323, 325, 326 MSPE, 349, 350, 351, 355, 356, 358, 352, 359, 362 multidimensional Gauss-Laplace normal distribution, 621 multivariate BLUUE, 458 multivariate Gauss-Laplace normal distribution, 640 multivariate Gauss-Laplace probability distribution, 642 multivariate Gauss-Markov, 458 multivariate least squares solution, 458 multivariate LESS, 457 necessary conditions, 19, 89, 101, 112, 115, 194, 250, 264, 284, 291, 207, 210, 332, 334, 353, 354, 375, 448, 451, 317, 318, 404, 415, 420 nonlinear error propagation, 644, 648, 649, 650
norm, 485 normal, 488 normal equation, 18, 20, 22, 23, 89, 114, 116, 117, 114, 194, 212, 240, 262, 293, 405 null space, 4, 5, 6, 9, 10, 11, 12, 14, 24, 97, 98, 99, 104, 245, 247, 509, 517, 518, 520 n-way classification model, 455, 460 obliquity, 645, 647, 649, 651 observability, 483 observation space, 6 orthogonal, 488 orthogonal complement, 9 orthogonal functions, 40 outlier, 176, 191, 202, 312 p.d.f., 543, 544, 547 parameter space, 11, 12 partial redundancies, 143, 148 partitioning - algebraic, 3, 7, 12, 15, 96, 100, 103, 244, 248, 256 - geometric, 3, 7, 12, 15, 96, 100, 103, 244, 248, 256 - rank, 4, 7, 10, 12, 96, 100 - set-theoretical, 3, 7, 12, 96, 100, 103, 244, 248, 256 - rank, 100, 101, 103, 104, 106, 127, 128, 133, 137, 139, 160, 166, 244, 245, 248, 249, 252, 256, 261 Penrose equations, 516 Plücker coordinates, 143, 152, 156, 157, 158, 161, 162, 165, 169-175 PLUUP, 479 polar decomposition, 28, 31 positive-definite, 494 positive-semidefinite, 494 Principal Component Analysis, 632 prior information, 2, 23, 119 probability density function, 543 Procrustes algorithm, 2, 75, 78, 79, 81, 431, 433, 437, 440, 441 Procrustes transformation, 442, 443 projections, 485 pseudo-observations, 179, 181 quadratic Helmert matrix, 583, 588, 589, 605 quadratic Helmert transformation, 582, 583
quasi-Gauss-Laplace normal distribution, 647 quasi-normal distribution, 644 R, W-HAPS, 412, 413, 414, 420, 421, 424, 427, 428, 445, 446, 451, 453 R, W-MINOLESS, 411, 412, 413, 414, 416, 417, 419, 421, 423, 424, 426, 427, 428, 445, 446, 449, 450, 453 random effect, 301, 305, 307, 309, 347, 348, 351, 379, 380, 397, 398, 402, 460, 385 random effect model, 401, 404 random variable, 544, 543, 545, 547 range space, 6, 9 rank, 485, 512 rank - column, 495 - row, 495 rank factorization, 235, 270, 485, 496, 503, 517, 510 rank partitioning - additive, 271, 272, 263 - multiplicative, 263, 270 - vertical, 107 Rao's Pandora Box, 375 rectangular Helmert matrix, 576, 582 rectangular Helmert transformation, 582 reflexive, 516 regression, 40, 85 ridge estimator, 283, 322, 359 right eigenspace, 437 right eigenvector, 509 Σ, In-BLUUE, 458, 460 Sampling distributions, 543 S-BLE, 316 Σ-BLUUE, 404 Schur complements, 382, 383, 384, 385, 388, 391, 448, 498, 500, 501, 502 second derivatives, 415 second order design, 120-123 Sherman-Morrison-Woodbury matrix identity, 502 S-hom BLIP, 352 similarity transformation, 83, 481, 482 simple bias matrix, 92 singular value decomposition, 41, 437, 438, 439, 485, 511, 632
singular value representation, 485 singular values, 512 skew product, 545 skewness, 614 slicing, 3, 7, 12, 96, 100, 103, 244, 248, 256 S-LUMBE, 86, 89, 90 special linear error propagation, 644 state differential equation, 484 state equation, 479, 482 state vector, 483 state-space form, 480 statistical homogeneity and isotropy, 365 S-transformation, 515, 518, 519, 653 Student random variable, 598, 600-609 Student's pdf, 606, 609, 611 Student's random variable, Student's t distribution, 596, 603, 605-606 Sturm-Liouville boundary condition, 41, 45, 57 Sturm-Liouville equations, 46 sufficiency condition, 19, 89, 102, 113, 115, 195, 207, 211, 251, 264, 284, 291, 317, 318, 375, 448, 451, 332, 334, 353, 354, 355 Σy-BLUUE, 187, 188, 210, 211, 216, 379, 380, 381, 382, 383 system of conditional equations, 373 system of conditional equations with unknowns, 411 system of consistent linear equations, 17 system of directional observational equations - inconsistent, 327 system of homogeneous condition equations with unknowns, 414 system of inconsistent linear equations, 99, 111 system of inhomogeneous condition equations with unknowns, 424 system of homogeneous equations - consistent, 521 system of inhomogeneous linear equations - consistent, 13 - inconsistent, 101 - overdetermined, 100 system of linear equations - consistent, 1, 5, 15, 16, 18, 20, 25, 520
- inconsistent, 100, 104, 105, 109, 111, 112, 113, 130, 208, 243, 245, 246, 256, 258, 280 - overdetermined, 95, 96, 104, 103, 106, 134, 135, 137, 189 - underdetermined, 1, 3, 10, 32, 33, 34, 36, 49, 52, 87, 90 system of linear observational equations, 84 system of linear observational equations - inconsistent, 95-99 system of nonlinear equations - overdetermined, 327 system of observational equations - consistent, 74 Taylor Karman criterion matrix, 124 Taylor-Karman matrix, 367 Taylor-Karman structure, 125, 365, 366 Three Sigma Rule, 553, 562 total least squares, 114, 348, 402, 403, 404 total least squares estimator, 401 trace, 485, 512, 523, 525 trace of a matrix, 503, 506 trace, 507 two-sided confidence interval, 595, 609, 613, 616 Tykhonov-Phillips regularization, 282, 283, 287, 322, 359 unbiased estimability, 644 unbiased estimation, 191 unbiasedness, 652, 653 Uncertainty Principle, 612, 613, 618, 620 uncertainty number, 619 uncertainty number D, 620 underdetermined, 5, 6, 7, 15 underdetermined regression problem, 41 uniform unbiasedness, 190 V, S-BLE, 308, 309, 311 V, S-BLUMBE, 301, 302, 303, 304, 305, 311, 462 Vandermonde determinant, 494 Vandermonde matrix, 485, 493 variance component, 218, 221, 222, 224, 226, 227, 228, 230, 231, 232, 236, 237, 238
variance component estimation, 634 variance-covariance component estimation, 217, 218, 460 variance-covariance components, 218, 219, 223, 225, 229, 232, 236 Variational Problem, 541 vec, 506, 507 vech, 506 veck, 506 vector valued matrix Forms, 506 Venn diagrams, 4 VIP, 347 Volterra integral equation of the first kind, 565, 607 von Mises, 328 von Mises circle, 327 von Mises pdf, 330 von Mises distribution, 646 Vysochainskii-Potunin inequality, 644, 647, 556 Wassel's family of means, 185 wedge product, 545 W-LESS, 411, 412, 414, 415, 416, 424, 425, 431, 432, 433, 434, 435, 437, 438, 441, 443, 445, 446, 447 Zlobec formula, 485, 516