Measurement Data Modeling and Parameter Estimation
Systems Evaluation, Prediction, and Decision-Making Series

Series Editor: Yi Lin, PhD, Professor of Systems Science and Economics, School of Economics and Management, Nanjing University of Aeronautics and Astronautics

Editorial Board:
Dr. Antonio Caselles, University of Valencia, Spain
Dr. Lei Guo, Chinese Academy of Sciences, China
Dr. Tadeusz Kaczorek, Bialystok Technical University, Poland
Dr. Salvatore Santoli, International Nanobiological Testbed Ltd., Italy
Dr. Vladimir Tsurkov, Russian Academy of Sciences, Russia
Dr. Robert Vallee, Organisation of Cybernetics and Systems, France
Dr. Dongyun Yi, National University of Defense Technology, China
Dr. Elias Zafiris, University of Athens, Greece
Other titles in the series:

Efficiency of Scientific and Technological Activities and Empirical Tests, Hecheng Wu, Nanjing University of Aeronautics and Astronautics. ISBN: 978-1-4200-8846-5
Grey Game Theory and Its Applications in Economic Decision-Making, Zhigeng Fang, Nanjing University of Aeronautics and Astronautics. ISBN: 978-1-4200-8739-0
Hybrid Rough Sets and Applications in Uncertain Decision-Making, Lirong Jian, Sifeng Liu, and Yi Lin, Nanjing University of Aeronautics and Astronautics. ISBN: 978-1-4200-8748-2
Irregularities and Prediction of Major Disasters, Yi Lin, Nanjing University of Aeronautics and Astronautics. ISBN: 978-1-4200-8745-1
Measurement Data Modeling and Parameter Estimation, Zhengming Wang, Dongyun Yi, Xiaojun Duan, Jing Yao, and Defeng Gu, National University of Defense Technology, Changsha, PR of China. ISBN: 978-1-4398-5378-8
Optimization of Regional Industrial Structures and Applications, Yaoguo Dang, Sifeng Liu, and Yuhong Wang, Nanjing University of Aeronautics and Astronautics. ISBN: 978-1-4200-8747-5
Systemic Yoyos: Some Impacts of the Second Dimension, Yi Lin, Nanjing University of Aeronautics and Astronautics. ISBN: 978-1-4200-8820-5
Theory and Approaches of Unascertained Group Decision-Making, Jianjun Zhu, Nanjing University of Aeronautics and Astronautics. ISBN: 978-1-4200-8750-5
Theory of Science and Technology Transfer and Applications, Sifeng Liu, Zhigeng Fang, Hongxing Shi, and Benhai Guo, Nanjing University of Aeronautics and Astronautics. ISBN: 978-1-4200-8741-3
Measurement Data Modeling and Parameter Estimation

Zhengming Wang • Dongyun Yi • Xiaojun Duan • Jing Yao • Defeng Gu
CRC Press, Taylor & Francis Group, 6000 Broken Sound Parkway NW, Suite 300, Boca Raton, FL 33487-2742
CRC Press is an imprint of Taylor & Francis Group, an Informa business.
Version Date: 20110829
International Standard Book Number-13: 978-1-4398-5379-5 (eBook - PDF)
Visit the Taylor & Francis Web site at http://www.taylorandfrancis.com and the CRC Press Web site at http://www.crcpress.com
Contents

Preface
Authors

Chapter 1  Error Theory
    1.1  Measurement
        1.1.1  Measurement Data
        1.1.2  Classification of Measurement
            1.1.2.1  Concept of Measurement
            1.1.2.2  Methods of Measurement
            1.1.2.3  Equal Precision and Unequal Precision Measurements
            1.1.2.4  Measurements of Static and Dynamic Objects
    1.2  Measurement Error
        1.2.1  Concept of Error
        1.2.2  Source of Errors
        1.2.3  Error Classification
        1.2.4  Quality of Measurement Data
        1.2.5  Summary
    1.3  Random Error in Independent Measurements with Equal Precision
        1.3.1  Postulate of Random Error and Gaussian Law of Error
        1.3.2  Numerical Characteristics of a Random Error
            1.3.2.1  Mean
            1.3.2.2  Standard Deviation
            1.3.2.3  Estimation of Standard Deviation
            1.3.2.4  Estimation of Mean and Standard Deviation
        1.3.3  Distributions and Precision Indices of Random Errors
            1.3.3.1  Distributions of Random Errors
            1.3.3.2  Precision Index of Measurement
    1.4  Systematic Errors
        1.4.1  Causes of Systematic Errors
        1.4.2  Variation Rules of Systematic Errors
        1.4.3  Identification of Systematic Errors
        1.4.4  Reduction and Elimination of Systematic Errors
    1.5  Negligent Errors
        1.5.1  Causes and Avoidance of Negligent Errors
            1.5.1.1  Causes of Negligent Errors
            1.5.1.2  Avoidance of Negligent Errors
        1.5.2  Negligent Errors in Measurement Data of Static Objects
            1.5.2.1  Romannovschi Criterion
            1.5.2.2  Grubbs Criterion
            1.5.2.3  Summary of Identification Criteria
    1.6  Synthesis of Errors
        1.6.1  Uncertainty of Measurement
            1.6.1.1  Estimation of Measurement Uncertainty
            1.6.1.2  Propagation of Uncertainties
        1.6.2  Functional Errors
            1.6.2.1  Functional Systematic Errors
            1.6.2.2  Functional Random Errors
    1.7  Steps of Data Processing: Static Measurement Data
    References

Chapter 2  Parametric Representations of Functions to Be Estimated
    2.1  Introduction
    2.2  Polynomial Representations of Functions to Be Estimated
        2.2.1  Weierstrass Theorem
        2.2.2  Best Approximation Polynomials
        2.2.3  Best Approximation of Induced Functions
        2.2.4  Degrees of Best Approximation Polynomials
        2.2.5  Bases of Polynomial Representations of Functions to Be Estimated
            2.2.5.1  Significance of Basis Selection
            2.2.5.2  Chebyshev Polynomials
            2.2.5.3  Bases of Interpolation Polynomials of Order n
            2.2.5.4  Chebyshev Polynomial Bases
            2.2.5.5  Bases and Coefficients
    2.3  Spline Representations of Functions to Be Estimated
        2.3.1  Basic Concept of Spline Functions
        2.3.2  Properties of Cubic Spline Functions
        2.3.3  Standard B Splines
        2.3.4  Bases of Spline Representations of Functions to Be Estimated
    2.4  Using General Solutions of Ordinary Differential Equations to Represent Functions to Be Estimated
        2.4.1  Introduction
        2.4.2  General Solutions of Linear Ordinary Differential Equations
        2.4.3  General Solutions of Nonlinear Equation or Equations
    2.5  Empirical Formulas
        2.5.1  Empirical Formulas from Scientific Laws
        2.5.2  Empirical Formulas from Experience
        2.5.3  Empirical Formulas of Mechanical Type
        2.5.4  Empirical Formulas of Progressive Type
    References

Chapter 3  Methods of Modern Regression Analysis
    3.1  Introduction
    3.2  Basic Methods of Linear Regression Analysis
        3.2.1  Point Estimates of Parameters
        3.2.2  Hypothesis Tests on Regression Coefficients
        3.2.3  Interval Estimates of Parameters
        3.2.4  Least Squares Estimates and Multicollinearity
    3.3  Optimization of Regression Models
        3.3.1  Dynamic Measurement Data and Regression Models
        3.3.2  Compound Models for Signals and Systematic Errors
    3.4  Variable Selection
        3.4.1  Consequences of Variable Selection
        3.4.2  Criteria of Variable Selection
        3.4.3  Fast Algorithms to Select Optimal Reduced Regression Model
        3.4.4  Summary
    3.5  Biased Estimation in Linear Regression Models
        3.5.1  Introduction
        3.5.2  Biased Estimates of Compression Type
        3.5.3  A New Method to Determine Ridge Parameters
        3.5.4  Scale Factors
        3.5.5  Numerical Examples
    3.6  The Method of Point-by-Point Elimination for Outliers
        3.6.1  Introduction
        3.6.2  Derivation of Criteria
        3.6.3  Numerical Examples
    3.7  Efficiency of Parameter Estimation in Linear Regression Models
        3.7.1  Introduction
        3.7.2  Efficiency of Parameter Estimation in Linear Regression Models with One Variable
        3.7.3  Efficiency of Parameter Estimation in Multiple Linear Regression Models
    3.8  Methods of Nonlinear Regression Analysis
        3.8.1  Models of Nonlinear Regression Analysis
        3.8.2  Methods of Parameter Estimation
    3.9  Additional Information
        3.9.1  Sources of Additional Information
        3.9.2  Applications of Additional Information
    References

Chapter 4  Methods of Time Series Analysis
    4.1  Introduction to Time Series
        4.1.1  Time Series and Random Process
        4.1.2  Time Series Analysis
    4.2  Stationary Time Series Models
        4.2.1  Stationary Random Processes
        4.2.2  Autoregressive Models
        4.2.3  Moving Average Model
        4.2.4  ARMA(p,q) Model
        4.2.5  Partial Correlation Function of a Stationary Model
    4.3  Parameter Estimation of Stationary Time Series Models
        4.3.1  Estimation of Autocovariance Functions and Autocorrelation Functions
        4.3.2  Parameter Estimation of AR(p) Models
            4.3.2.1  Moment Estimation of Parameters in AR Models
            4.3.2.2  Least Squares Estimation of Parameters in AR Models
        4.3.3  Parameter Estimation of MA(q) Models
            4.3.3.1  Linear Iteration Method
            4.3.3.2  Newton–Raphson Algorithm
        4.3.4  Parameter Estimation of ARMA(p,q) Models
            4.3.4.1  Moment Estimation
            4.3.4.2  Nonlinear Least Squares Estimation
    4.4  Tests of Observational Data from a Time Series
        4.4.1  Normality Test
        4.4.2  Independence Test
        4.4.3  Stationarity Test: Reverse Method
            4.4.3.1  Testing the Mean Stationarity
            4.4.3.2  Testing the Variance Stationarity
    4.5  Modeling Stationary Time Series
        4.5.1  Model Selection: Box–Jenkins Method
        4.5.2  AIC Criterion for Model Order Determination
            4.5.2.1  AIC for AR Models
            4.5.2.2  AIC for MA and ARMA Models
        4.5.3  Model Testing
            4.5.3.1  AR Models Testing
            4.5.3.2  MA Models Testing
            4.5.3.3  ARMA Models Testing
    4.6  Nonstationary Time Series
        4.6.1  Nonstationarity of Time Series
            4.6.1.1  Processing Variance Nonstationarity
            4.6.1.2  Processing Mean Nonstationarity
        4.6.2  ARIMA Model
            4.6.2.1  Definition of ARIMA Model
            4.6.2.2  ARIMA Model Fitting for Time Series Data
        4.6.3  RARMA Model
        4.6.4  PAR Model
            4.6.4.1  Model and Parameter Estimation
            4.6.4.2  PAR Model Fitting
            4.6.4.3  Further Discussions
        4.6.5  Parameter Estimation of RAR Model
        4.6.6  Parameter Estimation of RMA Model
        4.6.7  Parameter Estimation of RARMA Model
    4.7  Mathematical Modeling of CW Radar Measurement Noise
    References

Chapter 5  Discrete-Time Kalman Filter
    5.1  Introduction
    5.2  Random Vector and Estimation
        5.2.1  Random Vector and Its Process
            5.2.1.1  Mean Vector and Variance Matrix
            5.2.1.2  Conditional Mean Vector and Conditional Variance Matrix
            5.2.1.3  Vector Random Process
        5.2.2  Estimate of the State Vector
            5.2.2.1  Minimum Mean Square Error Estimate
            5.2.2.2  Linear Minimum Mean Square Error Estimate (LMMSEE)
            5.2.2.3  The Relation between MMSEE and LMMSEE
    5.3  Discrete-Time Kalman Filter
        5.3.1  Orthogonal Projection
        5.3.2  The Formula of Kalman Filter
        5.3.3  Examples
    5.4  Kalman Filter with Colored Noise
        5.4.1  Kalman Filter with Colored State Noise
        5.4.2  Kalman Filtering with Colored Measurement Noise
        5.4.3  Kalman Filtering with Both Colored State Noise and Measurement Noise
    5.5  Divergence of Kalman Filter
    5.6  Kalman Filter with Noises of Unknown Statistical Characteristics
        5.6.1  Selection of Correlation Matrix Qk of the Dynamic Noise
        5.6.2  Extracting Statistical Features of Measurement Noises
    References

Chapter 6  Processing Data from Radar Measurements
    6.1  Introduction
        6.1.1  Space Measurements
        6.1.2  Tracking Measurements and Trajectory Determination Principle
            6.1.2.1  Optical Measurements
            6.1.2.2  Radar Measurements
        6.1.3  Precision Appraisal and Calibration of Measurement Equipments
            6.1.3.1  Precision Appraisal
            6.1.3.2  Precision Calibration
        6.1.4  Systematic Error Model of CW Radar
        6.1.5  Mathematical Processing for Radar Measurement Data
    6.2  Parametric Representation of the Trajectory
        6.2.1  Equation Representation of Trajectory
        6.2.2  Polynomial Representation of Trajectory
        6.2.3  Matching Principle
        6.2.4  Spline Representation of Trajectory
    6.3  Trajectory Calculation
        6.3.1  Mathematical Method for MISTRAM System Trajectory Determination
            6.3.1.1  Problem Introduction
            6.3.1.2  Mathematical Model for the MISTRAM System Measurement Data
            6.3.1.3  Mathematical Method for Trajectory Determination
            6.3.1.4  Error Propagation Relationship
        6.3.2  Nonlinear Regression Analysis Method for Trajectory Determination
            6.3.2.1  Introduction
            6.3.2.2  Mathematical Model Establishment
            6.3.2.3  Algorithm and Error Analysis
            6.3.2.4  Simulation Calculation Results
    6.4  Composite Model of Systematic Error and Trajectory Parameters
        6.4.1  Measurement Data Models
        6.4.2  Matched Systematic Error and Unmatched Systematic Error
        6.4.3  Summary
    6.5  Time Alignment of CW Radar Multistation Tracking Data
        6.5.1  Introduction
        6.5.2  Velocity Measurement Mechanism of CW Radars
        6.5.3  Mathematical Model of the Multistation Measurement Data
        6.5.4  Solving Method and Error Analysis
        6.5.5  Time Alignment between the Distance Sum and Its Change Rate
    6.6  Estimation for Constant Systematic Error of CW Radars
        6.6.1  Mathematical Model of Measurement Data
        6.6.2  EMBET Method Analysis
        6.6.3  Nonlinear Modeling Method
        6.6.4  Algorithm and Numerical Examples
        6.6.5  Conclusions
    6.7  Systematic Error Estimation for the Free Flight Phase
        6.7.1  Trajectory Equations in the Free Flight Phase
        6.7.2  Nonlinear Model of the Measurement Data
        6.7.3  Parameter Estimation Method
        6.7.4  Numerical Example and Analysis
    6.8  Estimation of Slow Drift Error in Range Rate Measurement
        6.8.1  Mathematical Model of Measurement Data
        6.8.2  Selection of the Spline Nodes
        6.8.3  Estimation of the Slow Drift Errors
    6.9  Summary of Radar Measurement Data Processing
        6.9.1  Data Processing Procedures
            6.9.1.1  Analysis of Abnormal Data
            6.9.1.2  Analysis of the Measurement Principle and the Measurement Data
            6.9.1.3  Measurement Data Modeling
            6.9.1.4  Estimation of Statistical Features of Random Errors
            6.9.1.5  Estimation of True Signal and Systematic Error
            6.9.1.6  Engineering Analysis for Data Processing Results
        6.9.2  Basic Conclusions
    References

Chapter 7  Precise Orbit Determination of LEO Satellites Based on Dual-Frequency GPS
    7.1  Introduction
    7.2  Spaceborne Dual-Frequency GPS Data Preprocessing
        7.2.1  Basic Observation Equations
        7.2.2  Pseudocode Outliers Removal
            7.2.2.1  Threshold Method of Signal-to-Noise Ratio
            7.2.2.2  Threshold Method of Ionospheric Delay
            7.2.2.3  Fitting Residual Method of Ionospheric Delay
            7.2.2.4  Method of Monitoring Receiver Autonomous Integrity
        7.2.3  Carrier Phase Outliers Removal and Cycle Slip Detection
            7.2.3.1  M–W Combination Epoch Difference Method
            7.2.3.2  Ionosphere-Free Ambiguity Epoch Difference Method
            7.2.3.3  Cumulative Sum Method
        7.2.4  Data Preprocessing Flow
    7.3  Orbit Determination by Zero-Difference Reduced Dynamics
        7.3.1  Observational Equations and Error Correction
            7.3.1.1  Relativity Adjustments
            7.3.1.2  Antenna Offset Corrections for GPS Satellites
            7.3.1.3  Antenna Offsets for LEO Satellites
        7.3.2  Parameter Estimation of Orbit Models
        7.3.3  Dynamic Orbit Models and Parameter Selections
            7.3.3.1  Earth Nonspherical Perturbation
            7.3.3.2  Third Body Gravitational Perturbations
            7.3.3.3  Tide Perturbations
            7.3.3.4  Atmospheric Drag Forces
            7.3.3.5  Solar Radiation Pressures
            7.3.3.6  Relativity Perturbations
            7.3.3.7  Empirical Forces
            7.3.3.8  Dynamic Orbit Models and Parameter Selections
        7.3.4  Re-Editing Observational Data
            7.3.4.1  Re-Editing Pseudocode Data
            7.3.4.2  Re-Editing Phase Data
        7.3.5  The Flow of Zero-Difference Reduced Dynamic Orbit Determination
        7.3.6  Analysis of Results from Orbit Determination
    References

Appendix 1  Matrix Formulas in Common Use
    A1.1  Trace of a Matrix
    A1.2  Inverse of a Block Matrix
    A1.3  Positive Definite Character of a Matrix
    A1.4  Idempotent Matrix
    A1.5  Derivative of a Quadratic Form

Appendix 2  Distributions in Common Use
    A2.1  χ²-Distribution
    A2.2  Noncentral χ²-Distribution
    A2.3  t-Distribution
    A2.4  F-Distribution

Index
Preface

Data are the basis of scientific research. People use various methods of measurement to obtain data. For various reasons, however, measurement data are not identical to the true values of the corresponding physical quantities; they contain measurement errors. To improve data quality and apply measurement data effectively, it is necessary to analyze the rationale of the measurement data, evaluate the magnitudes and causes of measurement errors, and perform mathematical processing of the available measurement data before applying them. There are many books on how to process measurement data. The main feature of this book is that it transforms problems in processing measurement data into problems of model building, parameter estimation, outlier detection, and hypothesis testing. The theory and methods of parameter estimation are presented throughout the book, and mathematical modeling of measurement data is one of its focal contents. As a book combining mathematics and its applications, its background is the processing of measurement data in aerospace. This topic is usually a tough problem in the field of processing dynamic measurement data, because aerospace measurements have many complex sources of error and high precision requirements.
There are mainly two ways of improving the precision of aerospace measurements. One is from the perspective of measuring devices and includes equipment rehabilitation and upgrading. The other relies on advanced techniques for processing measurement data. Equipment and data-processing techniques are complementary. Mathematical modeling and processing of measurement data in aerospace play important roles in the following aspects: (1) analyzing the causes of various measurement errors and verifying the rationale of measurement data; (2) identifying and adjusting outliers, estimating and calibrating systematic errors, and improving the precision of measurement; (3) analyzing the statistical properties of random errors and providing the basis for the application of measurement data; and (4) evaluating the precision of measurement data and measuring equipment. This book, Measurement Data Modeling and Parameter Estimation, has been written around aspects (1) through (4) above. It particularly emphasizes methods of mathematical processing of measurement data. We convert problems in processing measurement data into problems of parameter estimation and apply ideas from mathematical branches such as numerical analysis and mathematical statistics to the mathematical modeling and processing of measurement data. The book is both innovative and rigorous in terms of mathematical theory. We focus on both theory and methods in the mathematical modeling and parameter estimation of measurement data, but we do not rigidly adhere to the derivation of every specific formula, so that readers can grasp the main points easily and acquire a large amount of information in a short period of time. The book is based on the authors' many years of experience in teaching and scientific research. A large number of figures, tables, examples, and exercises are provided to link theory with practice, develop the content step by step, and maintain good readability. The book aims at building a bridge between mathematical theory and engineering practice in processing measurement data from aerospace, so that researchers in mathematical theory and engineers can communicate with each other within a short period of time. In particular, it aims to create opportunities for mathematicians who are interested in the aerospace industry to apply mathematical theory innovatively to engineering problems in aerospace measurement and control.
The relevant theories and methods introduced in this book are also applicable to processing dynamic measurement data in other fields. The book is mainly intended for senior undergraduates majoring in applied mathematics, graduate students of relevant majors, and engineering technicians. A basic knowledge of linear algebra, probability and statistics, and computing is recommended as a prerequisite. The book is organized in seven chapters. Chapter 1, "Error Theory," presents a brief introduction to the basic concepts of measurements and errors. Chapter 2, "Parametric Representations of Functions to Be Estimated," discusses modeling methods for both the true signals to be estimated and systematic errors; methods of parametric representation, such as polynomials, polynomial splines, general solutions of ordinary differential equations, and empirical formulas, are introduced. Chapter 3, "Methods of Modern Regression Analysis," discusses methods of regression analysis that are closely related to the mathematical processing of dynamic measurement data; in addition to methods of parameter estimation and hypothesis testing, the focus is on converting problems of processing dynamic measurement data into problems of regression analysis, on model building and optimal model selection, and on methods of outlier detection. The feature of Chapter 4, "Methods of Time Series Analysis," is the transformation of time series modeling into problems of parameter estimation; these contents have direct applications in the fields of space monitoring and control. Chapter 5, "Discrete-Time Kalman Filter," focuses on Kalman filtering with colored noises and its applications. Chapter 6, "Processing Data from Radar Measurements," includes analyses of measurement principles, systematic errors, and trajectory parameters; modeling of radar measurement data; and estimation of trajectory parameters and systematic errors. Chapter 7, "Precise Orbit Determination of LEO Satellites Based on Dual-Frequency GPS," focuses on the mathematical methods of data processing related to techniques of precise orbit determination for dual-frequency GPS-based LEO satellites. Chapters 1–3 were written by Zhengming Wang and Xiaojun Duan; Chapters 4 and 5 by Dongyun Yi and Jing Yao; Chapter 6 by Zhengming Wang and Jing Yao; and Chapter 7 by Defeng Gu and Jing Yao. Zhengming Wang finalized the organization and unification of the book. Some materials in the book are adapted, with the required written permission, from a previous work, Measurement, Modeling and Estimation (in Chinese), authored by Zhengming Wang and Dongyun Yi, two authors of this book, and published in 1996 by National University of Defense Technology Press. This book cites several results from various sources; our research, together with the work in these references, makes the book a complete system. We express our sincere appreciation to all the authors of the cited treatises. We also thank Hao Wang, Yi Wu, Haiyin Zhou, Jubo Zhu, Yu Sha, Lisheng Liu, and Jinyou Xiao of the National University of Defense Technology in the People's Republic of China for their careful examination of the manuscript and many valuable comments and suggestions. Our special gratitude goes to Xianggui Qu of Oakland University in the United States for his painstaking efforts in proofreading and polishing the entire book. The research contained in this book is funded by grants from the Natural Science Foundation of China (NSFC, Nos. 60902089, 60974124, 61002033).
Authors

Dr. Zhengming Wang received his BS and MS degrees in applied mathematics and a PhD degree in system engineering in 1982, 1986, and 1998, respectively. Currently, he is a professor in applied mathematics. He is also Standing Director of the Chinese Association for Quality Assurance Agencies in Higher Education, Director of the Chinese Mathematical Society, chairman of the Hunan Institute of Computational Mathematics and Application Software, and Associate Provost of the National University of Defense Technology. He has completed four projects funded by the National Science Foundation of China and has won five state awards for Science and Technology Progress. He has co-published four monographs (ranked first on all of them) and well over 80 papers, including 50 SCI- or EI-indexed ones. His research interests cover areas such as mathematical modeling of tracking data, image processing, experiment evaluation, and data fusion.

Dr. Dongyun Yi received his BS and MS degrees in applied mathematics and a PhD degree in system engineering in 1985, 1992, and 2003, respectively. Currently, he is a professor in systems analysis and integration and the director of the Department of Mathematics and Systems Science, College of Science, National University of Defense Technology. He has been engaged in intelligent data processing research for over 20 years. He is in charge of the National Foundation Research Project "The Structural Properties of Resource Aggregation—Analysis and Applications" and also participates, as a deputy chair, in the National Science Foundation of China project "Pattern Recognition Research Based on High-Dimensional Data Structure." He has co-published two monographs and published more than sixty papers. His research interests include data fusion, parameter estimation of satellite positioning, mathematical modeling, and analysis of financial data.

Dr. Xiaojun Duan received her BS and MS degrees in applied mathematics and a PhD degree in system engineering in 1997, 2000, and 2003, respectively. She spent a year as a visiting scholar at Ohio State University during 2007–2008. Currently, she is an associate professor in systems analysis and integration. She teaches data analysis, systems science, linear algebra, probability and statistics, and mathematical modeling, and trains undergraduates as a faculty advisor for the Mathematical Contest in Modeling held by the Society for Industrial and Applied Mathematics in the United States. Through teaching data analysis she has gathered extensive experience, as well as student suggestions on how to better organize materials to impart knowledge of data analysis. Her research is funded by the Natural Science Foundation of China and the Spaceflight Science Foundation in China. She has published about 30 SCI- or EI-indexed research papers. Her research interests cover areas such as data analysis, mathematical positioning and geodesy, and complex system testing and evaluation.

Dr. Jing Yao received her BS and MS degrees in applied mathematics and a PhD degree in systems analysis and integration in 2001, 2003, and 2008, respectively. Currently, she is a lecturer at the Department of Mathematics and Systems Science, College of Science, National University of Defense Technology. She teaches probability and statistics at the undergraduate level and time series analysis with applications at the graduate level. Some of her research is funded by the National Science Foundation of China and the Spaceflight Science Foundation in China. She has published more than 20 research papers. Her research interests include mathematical geodesy, data analysis, and data processing in navigation systems.
Dr. Defeng Gu received his BS degree in applied mathematics and a PhD degree in systems analysis and integration in 2003 and 2009, respectively. Currently, he is a lecturer at the Department of Mathematics and Systems Science, College of Science, National University of Defense Technology. He has published more than 20 research papers. His research interests are in mathematical modeling, data analysis, and spaceborne Global Positioning System (GPS) data processing. The GPS processing software maintained by Dr. Gu has been applied successfully to real satellite orbit determination.
1 Error Theory

1.1 Measurement

1.1.1 Measurement Data [1–7]
Data are basic elements for studying the nature and movement regularities of objects, as well as the linkages among them. They are the basis of scientific research. In order to grasp the rules of development in scientific experiments and production practice, people use various methods to measure the required physical quantities. The data recorded are called measurement data (observational data or measured values). Owing to the complexity of the objects themselves, the inaccuracy of measurement devices, or the inappropriateness of measurement methods, measurement data may contain large measurement errors (the part of a measurement that is not consistent with the true value). In many cases, measurement data cannot be used directly. In order to use measurement data effectively, we need to apply mathematical processing and treatments to the data. This includes analyzing the accuracy of the data and explaining its rationale; sorting and summarizing the data, analyzing interrelationships, and finding regularities in the data; and analyzing, estimating, and correcting measurement errors in the data to improve data quality.

Records from measurement equipment used to measure an object are measurement data. To obtain the data, we go through three steps: selection and installation of devices, observation, and reading. In any case, it is impossible to obtain measurement data that coincide exactly with true values. The accuracy limitations of measurement devices, the subjective factors of measurers, and the environmental conditions inevitably bring various errors into measurement data.

Measurement practice shows that repeated measurements of one physical quantity inevitably fluctuate numerically. Such fluctuation reflects the uncertainty of measurement data. The difference between a measurement datum and the true value—due to the accuracy limitations of measurement devices, the uncertainty of device performance, the subjective factors of measurers, and the environmental conditions—is called measurement error. Measurement error always exists. Therefore, measurement data cannot be applied directly in scientific research and engineering practice that require high accuracy; a series of analyses and processing steps is necessary. This includes engineering analysis (analysis of measurement devices, measurement principles, the measurement process, and data rationale) and mathematical analysis (data modeling, error analysis, parameter estimation, hypothesis testing, precision evaluation, etc.). In dealing with measurement data mathematically, we should

1. Take full advantage of the characteristics of the measured physical quantity to establish an accurate mathematical model for its true value.
2. Take full advantage of the characteristics of the measuring device to build mathematical models for the various measurement errors.
3. Use a solid mathematical theory as a basis.
4. Make full use of the accuracy, speed, and storage of a computer.

1.1.2 Classification of Measurement
1.1.2.1 Concept of Measurement
Definition 1.1
A quantity whose physical characteristics (states, movements, etc.) can be evaluated or expressed by numerical values is called a physical quantity. The process of comparing a physical quantity with a unit value through experiment is called a measurement. The process can be expressed as

L = u · q

where L is the value of the measurement, u is the value of the unit, and q is the ratio. For instance, for the length of an object L = 100 mm, u = 1 mm and q = 100.

Definition 1.2
The data recorded from an experiment measuring a given physical quantity with a specific method are called measurement data.

1.1.2.2 Methods of Measurement
According to the way data are obtained, there are three methods of measurement:
1. Direct measurement: y = x, where y is the value of the measurement and x is the unknown quantity. For example, the height of a person is obtained by direct measurement.
2. Indirect measurement: y = f(x), where y is the value of the measurement, x is the unknown quantity, and f is a known function. For example, the volume of a steel ball V = (4/3)πR³ is obtained by indirect measurement.
3. Combined measurement: There are several unknown quantities x₁, x₂, . . ., xₜ, and various ways of measuring are combined. A series of equations between the measured data y₁, y₂, . . ., yₙ and the unknown quantities x₁, x₂, . . ., xₜ is established as follows:

   y₁ = f₁(x₁, x₂, . . ., xₜ)
   y₂ = f₂(x₁, x₂, . . ., xₜ)
   . . .
   yₙ = fₙ(x₁, x₂, . . ., xₜ)

   If n > t, the least squares method (introduced in Chapter 3) is usually used to determine x₁, x₂, . . ., xₜ.
Example 1.1
Let (x(t), y(t), z(t)) be the coordinates of the position of a spacecraft at time t. The distance between the jth radar station at position (xⱼ, yⱼ, zⱼ) and the spacecraft is

Rⱼ(t) = √[(x(t) − xⱼ)² + (y(t) − yⱼ)² + (z(t) − zⱼ)²]

The measured distance is

Yⱼ(t) = Rⱼ(t) + εⱼ(t),  j = 1, 2, . . ., n
When n > 3, the orbital position parameters (x(t), y(t), z(t)) can be obtained from Yⱼ(t), j = 1, 2, . . ., n. This process is a combined measurement. Multistation and multiequipment measurements in determining the orbit of a spacecraft are combined measurements. A well-designed combined measurement exploits the strengths and compensates for the weaknesses of the measurement devices, improves the quality of measurement data, and reveals the rationality of physical laws. The accuracy of measuring (x(t), y(t), z(t)) in Example 1.1 can be improved by selecting the positions of the measuring stations (xⱼ, yⱼ, zⱼ), j = 1, 2, . . ., n, and by increasing the number of radars n.
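To make the combined-measurement computation concrete, the following is a minimal numerical sketch (not from the book) of recovering a single position from n = 4 noisy range measurements by Gauss–Newton least squares, the estimation approach developed in Chapter 3. The station coordinates, noise level, and initial guess are invented for illustration.

```python
import numpy as np

def estimate_position(stations, ranges, x0, n_iter=10):
    """Gauss-Newton least squares for Example 1.1: recover a position
    from n measured ranges Y_j = R_j + eps_j."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iter):
        diff = x - stations                    # (n, 3) vectors station -> target
        pred = np.linalg.norm(diff, axis=1)    # predicted ranges R_j(x)
        J = diff / pred[:, None]               # Jacobian dR_j/dx
        r = ranges - pred                      # measurement residuals
        dx, *_ = np.linalg.lstsq(J, r, rcond=None)
        x += dx
    return x

# Simulated check: four stations, an assumed true position, noisy ranges.
rng = np.random.default_rng(0)
stations = np.array([[0., 0., 0.], [50e3, 0., 0.], [0., 50e3, 0.], [0., 0., 50e3]])
true_pos = np.array([120e3, 80e3, 200e3])
ranges = np.linalg.norm(true_pos - stations, axis=1) + rng.normal(0.0, 5.0, 4)
print(estimate_position(stations, ranges, x0=[100e3, 100e3, 100e3]))
```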
1.1.2.3 Equal Precision and Unequal Precision Measurements
According to the process of measurement, there are two methods of measurement:
Equal precision measurement: In the entire process of measurement, all factors that affect the measurement error (equipment, methods, environmental conditions, states of the measurers, characteristics of the objects measured, etc.) are invariant. Strictly speaking, it is difficult to fulfill all of these conditions. In general, if all factors, or the main factors, determining the error are invariant, the measurement can be regarded as an equal precision measurement.

Unequal precision measurement: Some of the factors that determine the measurement error change during the process of measurement.
1.1.2.4 Measurements of Static and Dynamic Objects
There are two types of measurements according to the state of the objects to be measured:
Measurement of static objects: The targets measured are static during the process of measurement. The physical quantity to be measured is a constant and can be measured repeatedly many times. For example, measurements of the length of a table are measurements of a static object.

Measurement of dynamic objects: The physical quantity to be measured is a function of time, space, and other factors during the process of measurement. Tracking measurements of missiles or satellites and measurements of the expansion of railways or bridges with temperature are dynamic measurements.

Remark 1.1
Sometimes there is no essential difference between static and dynamic object measurement. If the change of the physical quantity is relatively small, a dynamic measurement can be treated as a static measurement. Dynamic object measurement has many applications in scientific research and production processes, and the design of dynamic object measurements can sometimes reveal useful information.

Example 1.2
Determine the expansion coefficients of a steel ruler using a high-accuracy device. It is known that L = L₀(1 + αt + βt²), where L₀ is the length of the ruler at 0°C, α and β are expansion coefficients, and t is the temperature. Let

Lᵢ = L₀(1 + αtᵢ + βtᵢ²) + εᵢ,  i = 1, 2, . . ., m

be the measured length of the ruler at temperature tᵢ, where m > 2 and the εᵢ, i = 1, 2, . . ., m, are the measurement errors. Values of α and β can be estimated by the least squares method (Chapter 3). Once α and β are estimated, the expansion rule of this steel ruler is determined. Here, the data Lᵢ, i = 1, 2, . . ., m, are obtained by dynamic object measurement, and an empirical formula L = L₀(1 + αt + βt²) is used.
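A minimal sketch (not from the book) of the least squares estimation invoked in Example 1.2: dividing by the known L₀ makes the model linear in (α, β), so a direct linear solve suffices. The numerical values of L₀, α, β, and the noise level are invented for illustration.

```python
import numpy as np

# Example 1.2 as a linear least squares problem:
#   L_i = L0 * (1 + a*t_i + b*t_i**2) + eps_i
# With L0 known, y_i = L_i/L0 - 1 = a*t_i + b*t_i**2 is linear in (a, b).
rng = np.random.default_rng(1)
L0 = 1000.0                       # ruler length at 0 deg C (mm), assumed known
a_true, b_true = 1.2e-5, 3.0e-9   # invented expansion coefficients

t = np.linspace(5.0, 40.0, 12)    # measurement temperatures (deg C)
L = L0 * (1 + a_true * t + b_true * t**2) + rng.normal(0.0, 1e-3, t.size)

X = np.column_stack([t, t**2])    # design matrix of the linear model
y = L / L0 - 1.0
a_hat, b_hat = np.linalg.lstsq(X, y, rcond=None)[0]
print(a_hat, b_hat)               # estimates of the expansion coefficients
```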
1.2 Measurement Error

1.2.1 Concept of Error
Error postulate: Error exists in any manufacturing and measuring process. Inevitably, there are errors in the manufactured parts of measurement devices, in the measurement devices themselves, and so on. This book focuses mainly on the mathematical processing of measurement data; thus, the analysis, estimation, and correction of measurement errors are emphasized.

Measurement error is the difference between the observed value and the true value. Let μ be the true value, x the observed value, and δ the measurement error. Then δ = x − μ, or μ = x − δ. The true value exists objectively but is unknown or cannot be determined in the process of measurement. Another term used in measurement is discrepancy: the difference between two measurements. A discrepancy is always known, but it cannot show the accuracy of a measurement.

1.2.2 Source of Errors

Generally, errors come from multiple sources: measurement methods, measurement devices, environmental conditions, human factors, and so on.

1. Methodological error: Measurement methods and principles cause measurement errors. For example, the volume of a steel ball is V = (1/6)πd³, where d is the diameter of the ball. Since (1/6)π can only be approximated, the difference between the true and the approximate values of the volume is a methodological error.
2. Equipment error: This kind of error includes benchmark error and measuring equipment error. A benchmark error is an error in comparing standard physical quantities, such as the error of a balance weight. Measuring equipment error is caused by the equipment's production, installation, accessories, and other factors.
3. Environmental error: This kind of error is caused by environmental factors such as temperature, humidity, air pressure, vibration, and magnetic fields. For example, the refraction of electric waves due to the heterogeneity of air causes measurement error in tracking the orbit of a spacecraft.
4. Human error: Errors due to the surveyors' senses, work attitudes, work habits, proficiency, and so on.

There are many sources of measurement error. In the process of analyzing measurement data, we should focus on the main sources of error and analyze their characteristics to ensure the relevance and accuracy of the mathematical models.

1.2.3 Error Classification
According to the methods of mathematical processing, there are three types of measurement errors:

1. Systematic error: This kind of measurement error has an obvious tendency or regularity in the process of measurement. Balance weight errors are systematic errors: if a standard balance weight of 1 kg actually weighs 990 g, measurement data obtained using it will be consistently 10 g higher than the true values. Systematic errors have a dramatic impact on measurement outcomes. Usually, a systematic error can be adjusted, estimated, and amended through experiments. Some successful engineering practice in dealing with systematic errors can be found in Refs. [8,9].
2. Random error: The magnitude and sign of such an error are unknown beforehand. However, a random error follows a specific statistical distribution as the number of measurements increases; it is a random variable with a certain distribution. A random error cannot be amended, but it can be estimated. Thermal noise and reading errors are random errors. In the process of measuring static objects, the measurement errors come from the same population, or follow the same distribution, if all measurement conditions are the same. In the process of measuring dynamic objects, the random errors form a time series; in many cases, such a time series can be regarded as stationary. For example, the random errors of the measured range and its change rate in tracking the orbit of a spacecraft with a radar system can be well described by a second-order normal, stationary, zero-mean autoregressive time series (AR(2) for short). More details are discussed in Chapter 4.
3. Negligent error: Such errors are caused abruptly by accidents or unexpected factors, such as malfunctioning devices. Negligent errors severely distort measurements and can usually be identified and removed easily. Negligent errors are also known as gross errors. Data that contain negligent errors are called outliers or wild values.

1.2.4 Quality of Measurement Data
The magnitude of measurement errors reflects the accuracy of measurement data. If there are no negligent errors, accuracy mainly comprises the following:

1. Correctness: the magnitude of systematic errors.
2. Precision: the magnitude of random errors.
3. Accuracy: the degree of closeness of measurements of a quantity to its true value; that is, accuracy is the combination of correctness and precision.

1.2.5 Summary
In the process of measurement, we should pay special attention to analyzing the causes of measurement errors so as to avoid or minimize them as much as possible. An important job in analyzing measurement data is to determine, estimate, and adjust systematic errors, reveal the statistical characterization of random errors, and identify and eliminate negligent errors. Figure 1.1 presents a flow chart that illustrates the sources and classifications of measurement errors and the corresponding methods of processing.

[Figure 1.1: Flowchart of sources, classification, and processing of measurement errors.]

1.3 Random Error in Independent Measurements with Equal Precision

This section, using static objects as an example, considers random errors in independent measurements with equal precision. Strictly speaking, completely independent measurements with exactly equal precision do not exist. However, many dynamic object measurements can be treated as independent measurements with equal precision under certain conditions. In most cases, the random error can be described by a random variable that follows a normal distribution.
1. Postulate of random error: There exist random errors in any measurement process. In any measurement process, the magnitude and sign of a random error is unknown. However, as the number of measuring experiments increases, values of a random error usually follow a certain probability distribution. Random errors are
© 2012 by Taylor & Francis Group, LLC
10
M e A suReM en t DAtA M o D eLIn G
caused by many uncontrolled but frequently varied factors. The magnitude of perturbation from each factor is tiny compared to the overall random error. A random error is usually the sum of many such tiny errors. Therefore, by the Linderberg–Feller center limit theorem, a random error in a measurement process follows a normal distribution with mean 0. 2. Gaussian law of random error i. Bounded property: The probability that the absolute value of the random error exceeds a large positive number called a bound is close to zero. ii. Unique peak property: The chance for the random error taking an interval of small absolute values is much larger than that of taking an interval of large absolute values. iii. Symmetry property: If the number of measurements is large enough, the probabilities of the random error within intervals of the same absolute values but different signs are the same. 1.3.2 Numerical Characteristics of a Random Error
The numerical characteristics of a random error are mean and standard deviation. Measurement data obtained from multiple measurements distribute around their arithmetic mean. Standard deviation expresses the decentralization of a random error. Generally, the value of arithmetic mean can be used as an outcome of static object. Root-mean-squared deviation can be used as a precision index of data. 1.3.2.1 Mean Let xi = μ + δi, i = 1, 2, . . ., n be n observations from independent and equal precision measurements where μ is the true value of a physical quantity and δi is the measurement error. Then
1 x = n −
n
∑ i =1
1 xi = µ + n
n
∑δ i =1
i
(1.1)
is called the arithmetic mean of the data. If δi ~ N(μ, σ2) for i = 1, 2, . . ., n are identically and independently distributed (i.i.d.), xi’s are independent with the same distribution
© 2012 by Taylor & Francis Group, LLC
11
eRR o R t heo Ry
N(μ, σ2) and lim x = µ almost surely by Kolmogorov law of large n →∞ numbers and Gaussian law of error. Also, it can be shown that x is the uniformly minimum variance unbiased estimation of μ. Therefore, mean can be a point estimation true value of a physical quantity. However, the precision of measurements cannot be described by mean. 1.3.2.2 Standard Deviation Example 1.3 The following are observational data from five measurements on a certain workpiece using two different measuring devices: Group I: 20.0005, 19.9996, 20.0003, 19.9994, 20.0002 Group II: 19.99, 20.01, 20.00, 19.98, 20.02 The data in the two groups have the same mean of 20 but their degrees of concentration are different. The first group has a higher degree of concentration. According to statistical methods, metrology uses the second moment to represent the degree of concentration, σ2 = Eδ2, or estimates σ by the following formula: = σ
n
∑ δ /n i =1
(1.2)
2 i
where δi ’s are measurement errors and n is the number of measurements. Equations 1.1 and 1.2 show that mean and standard deviation have the same unit.
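As a quick check of Example 1.3's point, here is a small sketch computing the mean and the sample standard deviation of each group (using the Bessel form introduced in the next subsection, since the true errors δᵢ are unknown):

```python
import numpy as np

group1 = np.array([20.0005, 19.9996, 20.0003, 19.9994, 20.0002])
group2 = np.array([19.99, 20.01, 20.00, 19.98, 20.02])

# ddof=1 gives the Bessel estimator with divisor n - 1.
print(group1.mean(), group1.std(ddof=1))   # 20.0, sigma ~ 4.7e-4
print(group2.mean(), group2.std(ddof=1))   # 20.0, sigma ~ 1.6e-2
```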
1.3.2.3 Estimation of Standard Deviation
Since the δᵢ are unknown, the standard deviation σ cannot be obtained directly from Equation 1.2; it can only be estimated from the measurement data. The commonly used estimates are the following.

Bessel estimator:

σ̂ₛ = √( Σᵢ₌₁ⁿ (xᵢ − x̄)² / (n − 1) )  (1.3)

Suppose that

δᵢ ~ N(0, σ²), i = 1, 2, . . ., n;  Eδᵢδⱼ = 0, i ≠ j  (1.4)

Then it can be shown that Eσ̂ₛ² = σ², that is, σ̂ₛ² is an unbiased estimator of σ². In fact,

E(δ₁² + δ₂² + ⋯ + δₙ²) = nσ²

E[ Σᵢ₌₁ⁿ (xᵢ − x̄)² ] = E[ Σᵢ₌₁ⁿ (δᵢ − (1/n) Σⱼ₌₁ⁿ δⱼ)² ]
= Σᵢ₌₁ⁿ Eδᵢ² + (1/n²) · n · E(Σⱼ₌₁ⁿ δⱼ)² − (2/n) Σᵢ₌₁ⁿ E[δᵢ (Σⱼ₌₁ⁿ δⱼ)]
= nσ² + (1/n) · nσ² − (2/n) · nσ² = (n − 1)σ²

Therefore,

E σ̂ₛ² = E[ Σᵢ₌₁ⁿ (xᵢ − x̄)² / (n − 1) ] = σ²

Peters estimator:

σ̂ₚ = √(π/2) · Σᵢ₌₁ⁿ |xᵢ − x̄| / √(n(n − 1))  (1.5)
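A small numerical sketch (not from the book) comparing the Bessel and Peters estimators of Equations 1.3 and 1.5 on simulated data, with and without an injected outlier; the sample size, noise level, and outlier magnitude are invented for illustration.

```python
import numpy as np

def bessel(x):
    """Bessel estimator, Equation 1.3."""
    return np.sqrt(np.sum((x - x.mean())**2) / (x.size - 1))

def peters(x):
    """Peters estimator, Equation 1.5."""
    n = x.size
    return np.sqrt(np.pi / 2) * np.sum(np.abs(x - x.mean())) / np.sqrt(n * (n - 1))

rng = np.random.default_rng(2)
x = 10.0 + rng.normal(0.0, 0.1, 50)    # clean data, sigma = 0.1
x_out = x.copy()
x_out[0] += 1.0                         # one negligent error (outlier)

print(bessel(x), peters(x))             # both close to 0.1
print(bessel(x_out), peters(x_out))     # the Peters estimate degrades less
```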
Under the assumptions of Equation 1.4 and that δᵢ has a normal distribution, it can be shown that σ̂ₚ is an unbiased estimator of σ. In general, if the measurement data contain no negligent errors, σ̂ₛ is better than σ̂ₚ; when there are negligent errors in the data, σ̂ₚ is better than σ̂ₛ, since it is more robust.

1.3.2.4 Estimation of Mean and Standard Deviation
Suppose that the δᵢ ~ N(0, σ²) are i.i.d. If x̄ = (1/n) Σᵢ₌₁ⁿ xᵢ, then

Var(x̄) = σ²/n  (1.6)

Therefore,

x̄ = (1/n) Σᵢ₌₁ⁿ xᵢ  and  σ̂_x̄ = σ̂ₛ/√n = √( Σᵢ₌₁ⁿ (xᵢ − x̄)² / (n(n − 1)) )  (1.7)

are popularly used as estimators of the mean and its standard deviation.

1.3.3 Distributions and Precision Indices of Random Errors

1.3.3.1 Distributions of Random Errors
Measurement practice shows that many random errors in measurement processes follow normal (Gaussian) distributions. The central limit theorems of probability theory affirm the inevitability of this phenomenon: if a random variable is a combination of a large number of random elements, each carrying a small share of the total, the random variable follows a normal distribution. Theoretically, the central limit theorems explain why the random errors produced by a large number of measurement processes follow normal distributions.
Example 1.4
Figure 1.2 shows the histogram of measurement errors in distance measured by a continuous-wave radar tracking a spacecraft. The graph is very similar to the density function of a normal distribution.

[Figure 1.2: The histogram of measurement errors in distance.]

It is necessary to point out that not every measurement error has a normal distribution; random errors with other distributions also appear frequently. Below, we focus on random errors with normal distributions. The density function of a normal distribution is

f(x) = (1/(√(2π) σ)) exp(−(x − μ)²/(2σ²))  (1.8)

where

μ = ∫_{−∞}^{+∞} x f(x) dx  (1.9)

Suppose that x ~ N(μ, σ²). In practice, only discrete samples x₁, x₂, . . ., xₙ of x are available, and μ and σ² are usually unknown. The statistics

x̄ = (1/n) Σᵢ₌₁ⁿ xᵢ,  σₛ² = Σᵢ₌₁ⁿ (xᵢ − x̄)² / (n − 1)

are estimators of μ and σ² because

E x̄ = μ,  Var(x̄) = σ²/n  (1.10)

Let v = x − x̄. It can be proved that v has a normal distribution with

Ev = 0,  Var(v) = ((n − 1)/n) σ²  (1.11)

Note that w = √(n/(n − 1)) · v/σ ~ N(0, 1). For any t > 0,

P(|w| < t) = P(|v| < √((n − 1)/n) σt) = (2/√(2π)) ∫₀ᵗ exp(−w²/2) dw  (1.12)

Let t* = √((n − 1)/n) · t and define Pα by

(2/√(2π)) ∫₀ᵗ exp(−w²/2) dw = Pα = 1 − α  (1.13)

Using Equations 1.12 and 1.13,

P(|x − x̄| < σt*) = P(x̄ − σt* < x < x̄ + σt*) = Pα = 1 − α  (1.14)

Pα is called the confidence level, α the level of significance, t* the confidence coefficient, and σt* the precision index or error limit of x. Since t* is uniquely determined by Pα and is independent of the measurement process, σ is called the precision index of measurement.

1.3.3.2 Precision Index of Measurement
This section considers only the impact of random errors on the precision index of measurements of static objects.
1.3.3.2.1 Precision Index of a Single Measurement
Assume that the measurement error follows a normal distribution and that σ is its population standard deviation. A single measurement can be regarded as a sample from this population. When the confidence level is 0.9973, its precision index is 3σ. Below are some precision indices for a single measurement.

• Standard deviation σ or its estimate σₛ.
• Average error θ or its estimate θₛ:

  θ = 0.7979σ  or  θₛ = 0.7979 σₛ = 0.7979 √( Σᵢ₌₁ⁿ (xᵢ − x̄)² / (n − 1) )  (1.15)

• Limit error:

  δ_lim = ±3σ,  δ̂_lim = ±3σₛ  (1.16)

The standard deviation and the limit error are the indices most often supplied with measurement data.

1.3.3.2.2 Determination of Limit Error
In addition to the value of a measurement, it is necessary to provide its precision index, and the limit error is the most commonly used precision index in practice. Essentially, the limit error is determined by the confidence level

P{(x̄ − 3σ) ≤ x ≤ (x̄ + 3σ)} = Pα = 0.9973  (1.17)

When n is relatively large, σₛ is very close to σ, so approximately

P{(x̄ − 3σₛ) ≤ x ≤ (x̄ + 3σₛ)} = Pα = 0.9973

Therefore, the limit error can be estimated by δ_lim = ±3σₛ. When n is not large, there is a difference between σₛ and σ, and the limit error should be determined from the corresponding confidence level. It can be proved (as an exercise) that

x̄ − μ ~ N(0, σ²/n),
(n − 1) σₛ² σ⁻² ~ χ²(n − 1),
x̄ − μ is independent of σₛ² σ⁻²  (1.18)

Thus,

t = √n (x̄ − μ)/σₛ ~ t(n − 1)  (1.19)

With a given level of significance α, the value of tα can be determined by

P(|t| < tα) = Pα = 1 − α  (1.20)

or, equivalently, by

P{μ − tα σₛ/√n ≤ x̄ ≤ μ + tα σₛ/√n} = Pα = 1 − α  (1.21)

and ±tα σₛ/√n is the precision index of the mean x̄.
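Where a t table is unavailable, the quantile tα can be computed directly; the following sketch (assuming SciPy is available) reproduces the arithmetic of the worked example that follows, with n = 5, σₛ = 0.01 mm, and Pα = 0.95.

```python
import numpy as np
from scipy import stats

n, sigma_s, alpha = 5, 0.01, 0.05

# Two-sided quantile of the t distribution with n - 1 degrees of freedom:
# P(|t| < t_alpha) = 1 - alpha.
t_alpha = stats.t.ppf(1 - alpha / 2, df=n - 1)
print(t_alpha)                          # ~2.776

# Precision index of the mean, Equation 1.21.
print(t_alpha * sigma_s / np.sqrt(n))   # ~0.012 (mm)
```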
Example 1.5
Let n = 5, σₛ = 0.01 (mm), x̄ = 1.12. Find the precision index for Pα = 0.95.

Solution
Since n = 5, the degrees of freedom of the t distribution are 4. For P{|t| ≤ tα} = 1 − 0.05 = 0.95, the t table gives tα = 2.776. Thus, the precision index of the mean x̄ is

± tα σₛ/√n = ± (2.776 × 0.01)/√5 ≈ ±0.012 (mm)

Some commonly used distributions in processing measurement data are the normal, χ², t, and F distributions; more information on these distributions is given in Appendix 2. Precision indices and confidence levels are closely correlated: under the same conditions, different confidence levels produce different precision indices. Below are some empirical rules for selecting Pα:

• Pα = 0.95 (t = 2) for a general accuracy of measurement.
• Pα = 0.9973 (t = 3) for a relatively important measurement in scientific research.
• Pα = 0.9999 (t = 4) for an ultrahigh reliability requirement in specific scientific research and precision measurement.

1.4 Systematic Errors
A systematic error is fixed or follows a certain kind of regularity. Systematic errors exist in any measurement process, and they are sometimes large. Systematic errors are different from random errors: they are not easily identified or compensated. Hence, it is very important to analyze, estimate, and amend systematic errors.

1.4.1 Causes of Systematic Errors
1. Measuring equipment: Systematic errors can be caused by the design, structure, and principles of the measuring equipment.
2. Environmental factors: The measurement environment is not consistent with the standard conditions required for the devices to operate normally.
3. Measuring principles and computational methods: The measurement depends on an approximate formula, that is, a formula with a fixed error.
4. Factors of the surveyors: The psychological and physiological factors of the surveyors influence the measurements.

1.4.2 Variation Rules of Systematic Errors
A systematic error is fixed or follows a certain kind of functional relationship, and in many cases the regularity can be found. Several commonly encountered types are introduced as follows:
1. Constant systematic errors: the sign and the magnitude of the error are fixed throughout the measuring process. For example, constant errors may be caused by various nonstandard weights when weighing the same object on multiple balances.
2. Linear systematic errors: measurements increase or decrease proportionally with factors such as time and position during the measurement process, and the changes can be described by multivariate linear functions through the origin.
3. Periodic systematic errors: measurement errors change periodically and can be described by periodic functions. For example, if there exists an eccentricity error e between the rotary and dial centers of a meter pointer, the reading error of the pointer at angle ϕ is ΔL = e sin ϕ.
4. Other systematic errors: measurement errors follow some rules, but these rules are too complicated to be described explicitly.
5. Uncertain systematic errors: the magnitude and the sign of the error are unknown, but the range of the error is estimable. Dealing with such an uncertain systematic error is relatively complex. Sometimes systematic errors, such as correction errors in the acceleration of gravity or in atmospheric refraction, can be treated as random errors.
The five types of systematic errors and their combinations may appear in the same engineering process. For example, in tracking spacecraft with radars, there exist constant, linear, and periodic systematic errors of radar tracking, location errors of the tracking station, time errors, atmospheric refraction correction errors, and so on.

1.4.3 Identification of Systematic Errors
In general, systematic errors are large and noncompensatory. They rarely cause fluctuations in the data, have a large latency, and are difficult to distinguish from signals. Therefore, it is very important to analyze measured targets, measuring devices, measuring principles, and measuring environments in detail. The following discussion introduces some useful methods for identifying systematic errors.
1. Contrast method: measure the same target with equipment of different accuracies. Adjusting ranges of shooting targets and optical alignments use contrast methods.
2. Residual analysis: if there is no systematic error, the residual v_i = x_i − x̄ is a zero-mean white-noise random error, and there will be no trend in the residual plot. This method can identify certain kinds of systematic errors but cannot rule them out. Figure 1.3 displays residuals from several systematic errors. It is especially important to point out that residual plots can only confirm the existence of systematic errors; they cannot show their nonexistence. In other words, if there is no evidence of systematic errors in a residual plot, it does not mean there are none in the measurement process: systematic errors may be hidden in the data or compensated by other systematic errors.

Example 1.6 Suppose there are 100 measurements of a physical quantity with a true value of 10, the random error is ε_i, and the systematic error is a constant α. The value of each measurement is

x_i = 10 + α + ε_i, for i = 1, 2, …, 100
Figure 1.3 Residuals caused by several types of systematic errors. (Plots of residuals Y1(j)–Y4(j) versus j = 0–200: (1) no systematic error; (2) a linear systematic error; (3) a periodic systematic error; (4) a combination of linear and periodic systematic errors.)
The residual is

v_i = x_i − x̄ = x_i − (1/100) ∑_{i=1}^{100} x_i = ε_i − (1/100) ∑_{i=1}^{100} ε_i, for i = 1, 2, …, 100

regardless of whether α = 0 or α ≠ 0. In this case, the residual cannot identify the constant systematic error.
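A quick simulation confirms Example 1.6: the residuals are numerically identical whether or not the constant systematic error α is present. A minimal sketch (the noise level 0.1 and the value α = 0.5 are arbitrary assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
eps = rng.normal(0.0, 0.1, 100)              # random errors eps_i
for alpha in (0.0, 0.5):                     # without / with a constant systematic error
    x = 10 + alpha + eps                     # x_i = 10 + alpha + eps_i
    v = x - x.mean()                         # residuals v_i = x_i - x_bar
    print(alpha, np.allclose(v, eps - eps.mean()))
# prints: 0.0 True  and  0.5 True -- the residuals cannot see alpha
```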
Remark 1.2 Example 1.6 shows a case in which the residual plot does not reveal the systematic error even though one is present. A more detailed analysis is needed from both the engineering background and the characteristics of the residuals.
3. Residual analysis in regression models: test residuals by constructing statistics. Constructing the statistics requires analyzing the structure of the residuals and the mathematical model of the systematic errors, and even then this method may not detect all systematic errors; to find all possible systematic errors, one has to analyze systematically the structure of the residuals and the mathematical models of the systematic errors. This book focuses on using regression analysis to find, estimate, and correct systematic errors. Detailed discussion is presented in Chapters 3 and 6.

1.4.4 Reduction and Elimination of Systematic Errors
The existence of systematic errors can generally be identified by thoroughly analyzing measurement equipment, measurement principles, observational data, and residuals. However, it is difficult to determine the rules of change of systematic errors. The main task in dealing with systematic errors is to remove them from the data. There are three ways of elimination:
1. Eliminate the root cause of systematic errors.
i. Prior to making accurate measurements, comprehensively analyze the principles, methods, equipment, and environmental conditions of measurement to eliminate and correct sources of systematic errors.
ii. Test whether benchmarks and standards (metric gauge, light wavelength, light speed, etc.) are accurate and reliable.
iii. Make sure that measuring equipment is in normal working condition and that there are no accidents or abnormalities.
iv. Test the accuracy and rationality of the installation and location of measuring equipment. For example, the position of the observation station is a key factor in tracking spacecraft.
v. Check whether the principles, methods, and calculation formulas of measurement are correct and whether there are theoretical errors.
vi. Check whether measurement sites are consistent with objective requirements.
vii. Avoid subjective errors of surveyors.
2. Eliminate systematic errors in the measuring process: if systematic errors cannot be removed at the root cause, they can be eliminated during measurement. The method of elimination is based on a clear understanding of the characteristics of the measuring equipment, methods, and processes as well as of the measured targets. In measurement practice, measuring equipment and measured objects vary, so specific problems call for specific methods. Since many problems in modern science and technology (such as space technology) produce data of extremely high precision, the requirement for eliminating systematic errors is correspondingly high, and the methods must be highly problem-specific. Commonly used methods of eliminating systematic errors in the measuring process can be found in Refs. 1–3, 6, 8, and 10.
3. Estimate and correct systematic errors after measurement: many systematic errors can be expressed as explicit functions of finitely many unknown parameters and then estimated and corrected from experimental data. This is the main focus of this book and is discussed thoroughly in Chapters 3 and 6. Estimate systematic errors by finding their rules of change, describing them with explicit functions of finitely many unknown parameters, and estimating those parameters by mathematical methods (Chapter 3); then amend the measurement data according to the estimates of the systematic errors.

1.5 Negligent Errors
In general, the magnitude of negligent errors is larger than that of systematic errors and random errors. Therefore, negligent errors distort the measurement severely and must be identified and eliminated from the measurement data.
1.5.1 Causes and Avoidance of Negligent Errors
1.5.1.1 Causes of Negligent Errors
1. Subjective reasons: reading or recording errors and improper operations by staff due to lack of responsibility or experience, or due to fatigue.
2. Objective reasons: unexpected changes of targets and environmental conditions, failures of measuring equipment, etc. For example, measuring targets and environmental conditions change drastically when rocket stages separate during tracking.

1.5.1.2 Avoidance of Negligent Errors In general, negligent errors can be avoided in the measuring process by strengthening surveyors' sense of responsibility, avoiding fatigue and overload, and intensifying the maintenance of measuring equipment. It is also helpful to identify negligent errors by using multiple measuring devices.

1.5.2 Negligent Errors in Measurement Data of Static Objects
The general method of dealing with negligent errors is identification and elimination. Only measurements of static objects are discussed in the following; negligent errors in measurement data of dynamic objects will be discussed in Chapter 3.

1.5.2.1 Romannovschi Criterion The premise of applying the Romannovschi criterion is that there exist a small number of outliers in a series of independent and equal precision data and that the data set is not large.

Romannovschi criterion: Suppose that x_1, x_2, …, x_n are n independent and equal precision measurements of a certain physical quantity and x_j is a suspected abnormal value. Let

x̄(j) = (1/(n − 1)) ∑_{i≠j} x_i,  v_i = x_i − x̄(j), 1 ≤ i ≤ n
σ²(j) = (1/(n − 2)) ∑_{i≠j} v_i²

Suppose x_i ∼ N(μ, σ²) and the {x_i} are i.i.d. Then,

v_j ∼ N(0, (n/(n − 1))σ²),  √((n − 1)/n) · v_j/σ(j) ∼ t(n − 2)   (1.22)

Therefore, if

|x_j − x̄(j)| > K(n, α)σ(j)   (1.23)
then x_j has a negligent error and should be eliminated; otherwise, x_j should be retained. Here, α is the level of significance. For n from 4 to 29 and α of 0.05 and 0.01, the values of K(n, α) are given in Table 1.1.

Table 1.1 K(n, α) Values

n    α = 0.05   α = 0.01  |  n    α = 0.05   α = 0.01
4    4.97       11.45     |  17   2.20       3.04
5    3.56       6.53      |  18   2.18       3.01
6    3.04       5.04      |  19   2.17       3.00
7    2.78       4.36      |  20   2.16       2.95
8    2.62       3.96      |  21   2.15       2.93
9    2.51       3.71      |  22   2.14       2.91
10   2.43       3.54      |  23   2.13       2.90
11   2.37       3.41      |  24   2.12       2.88
12   2.33       3.31      |  25   2.11       2.86
13   2.29       3.23      |  26   2.10       2.85
14   2.26       3.17      |  27   2.10       2.84
15   2.24       3.12      |  28   2.09       2.83
16   2.22       3.08      |  29   2.09       2.82
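The criterion translates directly into code. Below is a minimal sketch; the function name and injected outlier are illustrative, and K(n, α) is assumed to be looked up from Table 1.1 by the caller:

```python
import numpy as np

def romannovschi_flag(x, j, k_crit):
    """Return True if x[j] is judged to carry a negligent error (Equation 1.23)."""
    x = np.asarray(x, dtype=float)
    rest = np.delete(x, j)                    # all data except the suspected x_j
    x_bar_j = rest.mean()                     # x_bar(j)
    sigma_j = np.sqrt(((rest - x_bar_j) ** 2).sum() / (len(x) - 2))  # sigma(j)
    return abs(x[j] - x_bar_j) > k_crit * sigma_j

rng = np.random.default_rng(1)
x = rng.normal(1.12, 0.01, 10)
x[3] += 0.05                                  # inject an abnormal value
print(romannovschi_flag(x, j=3, k_crit=2.43)) # K(10, 0.05) = 2.43 -> True
```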
Proof Let x_i = μ + ε_i, ε_i ∼ N(0, σ²), i = 1, …, n. Then

v_j = x_j − x̄(j) = ε_j − (1/(n − 1)) ∑_{i≠j} ε_i

Thus,

Ev_j = 0,  Var(v_j) = σ² + σ²/(n − 1) = (n/(n − 1))σ²

Therefore,

v_j ∼ N(0, (n/(n − 1))σ²)

Without loss of generality, assume j = n. Let A = (a_ik) be an (n − 1) × (n − 1) orthogonal matrix whose first row is (1/√(n − 1), …, 1/√(n − 1)). By orthogonality,

∑_{k=1}^{n−1} a_{1k} a_{ik} = 1 for i = 1 and 0 for i > 1,  ∑_{k=1}^{n−1} a_{ik} = 0 for i = 2, 3, …, n − 1

Let

(y_1, …, y_{n−1})^τ = A (x_1, …, x_{n−1})^τ

Since the x_i (i = 1, 2, …, n) are independent and identically distributed as N(μ, σ²), the y_i (i = 1, …, n − 1) are independent and normally distributed. More specifically:

1. y_1 ∼ N(√(n − 1) μ, σ²) and y_i = ∑_{k=1}^{n−1} a_{ik} x_k ∼ N(0, σ²) for i = 2, 3, …, n − 1.
2. y_1 = (1/√(n − 1)) ∑_{i=1}^{n−1} x_i = √(n − 1) x̄(n), and

∑_{i=1}^{n−1} y_i² = ∑_{i=1}^{n−1} x_i² = ∑_{i=1}^{n−1} (x_i − x̄(n))² + (n − 1)x̄²(n) = ∑_{i=1}^{n−1} v_i² + (n − 1)x̄²(n)

Therefore, ∑_{i=2}^{n−1} y_i² = ∑_{i=1}^{n−1} v_i². Since the y_i (i = 2, 3, …, n − 1) are independent with mean zero and standard deviation σ, ∑_{i=2}^{n−1} y_i²/σ² ∼ χ²(n − 2).
3. Since

v_n = x_n − (1/(n − 1)) ∑_{j=1}^{n−1} x_j,  y_i = ∑_{k=1}^{n−1} a_{ik} x_k, i = 2, 3, …, n − 1

we have

E(v_n y_i) = ∑_{k=1}^{n−1} a_{ik} E(x_n x_k) − (1/(n − 1)) ∑_{j=1}^{n−1} ∑_{k=1}^{n−1} a_{ik} E(x_j x_k)
= μ² ∑_{k=1}^{n−1} a_{ik} − (1/(n − 1)) [(n − 1)μ² ∑_{k=1}^{n−1} a_{ik} + σ² ∑_{j=1}^{n−1} a_{ij}] = 0, i = 2, 3, …, n − 1

because ∑_{k=1}^{n−1} a_{ik} = 0 for i ≥ 2. Hence v_n and y_2, …, y_{n−1} are jointly normal and uncorrelated, and therefore mutually independent. Therefore,

√((n − 1)/n) (v_n/σ) / √( (1/σ²) ∑_{i=2}^{n−1} y_i² / (n − 2) ) = √((n − 1)/n) (v_n/σ) / √( (1/σ²) ∑_{i=1}^{n−1} v_i² / (n − 2) ) = √((n − 1)/n) · v_n/σ(n) ∼ t(n − 2)
1.5.2.2 Grubbs Criterion The premise of applying the Grubbs criterion is that the data set is large (in general, n ≥ 20). Suppose that there are n independent measurements of equal precision for a quantity with a true value of μ, and the random error of each measurement follows N(0, σ²). The model of the measurement data is

y_i = μ + ε_i   (1.24)
where the ε_i are i.i.d. N(0, σ²). By rearranging y_1, …, y_n from the smallest to the largest, we have the following criterion.

Grubbs criterion: Let z_1, …, z_n in ascending order be the data obtained by rearranging y_1, …, y_n from the smallest to the largest, and

ȳ = (1/n) ∑ y_i,  z̄ = (1/n) ∑ z_i = ȳ,  σ_s² = (1/(n − 1)) ∑_{i=1}^{n} (y_i − ȳ)²

Then,

max_{1≤i≤n} |y_i − ȳ| = max{|z_1 − z̄|, z_n − z̄} = |y_d − ȳ|
Under the level of significance α, if |y_d − ȳ| ≥ G(n, α)σ_s, then y_d has a negligent error and is eliminated; otherwise, it is retained. Again, for given n and α, G(n, α) is obtained from Table 1.2.

Table 1.2 G(n, α) Values

n    α = 0.05   α = 0.01  |  n     α = 0.05   α = 0.01
3    1.15       1.16      |  17    2.48       2.79
4    1.46       1.49      |  18    2.50       2.82
5    1.67       1.75      |  19    2.53       2.85
6    1.82       1.94      |  20    2.56       2.88
7    1.94       2.10      |  21    2.58       2.91
8    2.03       2.22      |  22    2.60       2.94
9    2.11       2.32      |  23    2.62       2.96
10   2.18       2.41      |  24    2.64       2.99
11   2.23       2.49      |  25    2.66       3.01
12   2.29       2.55      |  30    2.75       3.10
13   2.33       2.61      |  35    2.81       3.18
14   2.37       2.66      |  40    2.87       3.24
15   2.41       2.71      |  50    2.96       3.34
16   2.44       2.75      |  100   3.17       3.59
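A sketch of the Grubbs criterion follows; the helper name and the test data are illustrative, and G(n, α) is assumed to come from Table 1.2:

```python
import numpy as np

def grubbs_flag(y, g_crit):
    """Locate the most deviant value y_d and test |y_d - y_bar| >= G(n, alpha) * sigma_s."""
    y = np.asarray(y, dtype=float)
    y_bar, sigma_s = y.mean(), y.std(ddof=1)   # Bessel-corrected standard deviation
    d = int(np.argmax(np.abs(y - y_bar)))      # index of the suspected outlier
    return d, abs(y[d] - y_bar) >= g_crit * sigma_s

rng = np.random.default_rng(2)
y = rng.normal(10.0, 0.05, 25)
y[7] += 0.4                                    # inject a negligent error
print(grubbs_flag(y, g_crit=2.66))             # G(25, 0.05) = 2.66 -> (7, True)
```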
1.5.2.3 Summary of Identification Criteria
1. The two criteria above are essentially derived from the assumptions that y_i = μ + ε_i with ε_i ∼ N(0, σ²) and that the y_i are mutually independent. When n is between 20 and 100, the Grubbs criterion is better; when n is between 4 and 20, the Romannovschi criterion is better.
2. Both criteria assume that there is only one outlier in the data. When there are multiple outliers, one way of applying the two criteria is to divide the n measurements into N groups such that each group has at most one outlier; the method of division should be independent of the measurement data [10,11]. Also, the Tietjen–Moore test is a generalization of the Grubbs test to the case of multiple outliers [12]. However, the Tietjen–Moore test requires that the suspected number of outliers be specified exactly; if this is not known, it is recommended that the generalized extreme studentized deviate test be used instead, which only requires an upper bound on the number of suspected outliers [13].
3. The method of eliminating outliers one by one, introduced in Chapter 3, is applicable to both static and dynamic data and can also be used to treat data with multiple outliers.

1.6 Synthesis of Errors
Measurement errors exist in any measuring process. They are combinations of a series of errors arising in various measurement stages. The primary tasks of error synthesis are to analyze the various error factors accurately and to describe their combined influence precisely.

1.6.1 Uncertainty of Measurement
Because of measurement errors, measurement data cannot be regarded as true values of the quantity to be measured. When a surveyor provides the data to users, he or she should give not only the numbers in the data but also a neighborhood of the data. Such a neighborhood contains the true value with a large probability and is called the uncertainty of measurement.
In general, an uncertainty of measurement should be provided together with the measurement data to show the confidence level of the measurement.

1.6.1.1 Estimation of Measurement Uncertainty The uncertainty of measurement has various components. In order to estimate the overall uncertainty, each component has to be estimated first. There are two kinds of methods for estimating measurement uncertainty: statistical methods and nonstatistical methods.

1.6.1.1.1 Statistical Methods: Utilize the Principle of Small Probability to Determine the Uncertainty For example, for a level of significance α and confidence level Pα = 1 − α, we have

P{μ − t_α σ_s/√n ≤ x̄ ≤ μ + t_α σ_s/√n} = Pα

Therefore, the uncertainty of the mean x̄ of n measurements is ±t_α σ_s/√n, where t_α is determined from a normal distribution or a t distribution.

1.6.1.1.2 Nonstatistical Methods: Estimate Uncertainty Empirically or by Other Information In many cases, the uncertainty of measurement cannot be estimated by statistical methods, and nonstatistical methods should be used. Whenever it is feasible, a nonstatistical method is a very efficient way to quantify the uncertainty of measurement. For example, the uncertainty can be obtained by comparing data from multiple measurements of a quantity using a device with those from multiple measurements of the same quantity using a device of higher precision (e.g., an order of magnitude higher). Comparing radar tracking measurements using continuous waves with aircraft measurements using light waves is a nonstatistical method of estimation.

The data provided to users are preprocessed data. They should contain only random errors and nonestimable systematic errors. Since measurement uncertainty has several components, its description should cover all components and identify whether each component is obtained by a statistical method or a nonstatistical method.

1.6.1.2 Propagation of Uncertainties
Note that errors considered in uncertainty are always treated as random errors, and random errors are described by standard deviations (or variances). The propagation of uncertainties follows the propagation of variances. Some important points are as follows:
1. Precisely identify the properties and distributions of the various errors. For a specific confidence level, random errors of different distributions have different uncertainties.
2. Accurately describe the relationships among the various errors.
3. Correctly determine how the various errors propagate into the total error.

The distribution of an error (see common error distributions in Refs. 7, 10) has a direct impact on the synthesis of measurement uncertainties. Regardless of the distributions of the random errors, the method of synthesizing random errors by variance is the same. (The dependence among errors should be checked, and limit errors should correspond to the same confidence probability.) Since the confidence levels corresponding to the same confidence probability differ for random errors with different distributions, it is necessary to identify the distribution of a random error in order to determine its measurement uncertainty accurately.

In the previous discussion, the random errors of measurement were supposed to be independent. Such an assumption is difficult to fulfill in practical applications. The dependence among random errors influences the measurement uncertainty; the covariance matrix should be taken into consideration in studying measurements of high precision, and sometimes it must be estimated.

In general, error factors propagate their impact into the final measurements according to certain transfer relationships. Nonlinear propagation is commonly encountered in indirect and combined measurements. The components of measurement uncertainty can only be obtained accurately by manipulating these propagation relationships correctly. The practice of measurement and the synthesis of measurement uncertainty are closely related: since there are many sources of measurement errors, the measurement uncertainty for the same object and equipment changes with environmental conditions and surveyors.
1.6.2 Functional Errors
Section 1.1 introduced direct, indirect, and combined measurements. The last two are often used in high-precision measurement practice. There is no need to discuss error propagation in direct measurements, and the error propagation relationship of combined measurements is complex, requiring modern linear and nonlinear regression analysis (see Chapter 3). Here, we focus on functional errors in indirect measurements.

Let y = f(x_1, x_2, …, x_n) be a multivariable function, where x_1, x_2, …, x_n are direct measurements and y is an indirect measurement; that is, the indirect measurement is one or more functions of direct measurements. Thus, errors of indirect measurements are function errors of direct measurements. This kind of error is called a functional error. There are two types of functional errors: functional systematic errors and functional random errors.

1.6.2.1 Functional Systematic Errors Suppose that f is a continuously differentiable function of n variables x_1, x_2, …, x_n and the systematic errors of the direct measurements are Δx_1, Δx_2, …, Δx_n. The systematic error of the indirect measurement y is defined as
Δy ≈ (∂f/∂x_1)Δx_1 + (∂f/∂x_2)Δx_2 + ⋯ + (∂f/∂x_n)Δx_n   (1.25)

where the partial derivatives ∂f/∂x_i (i = 1, 2, …, n) are called error transfer functions.

1.6.2.2 Functional Random Errors Functional random errors are usually described by standard deviations. Suppose that the random errors of the direct measurements δx_i (i = 1, 2, …, n) have the following properties:
Eδx_i = 0,  Eδx_i² = σ²_{x_i},  Eδx_i δx_j = ρ_ij σ_{x_i} σ_{x_j} (i, j = 1, 2, …, n)   (1.26)
According to the total differential formula, the random error of the indirect measurement can be obtained from

δy = ∑_{i=1}^{n} (∂f/∂x_i) δx_i   (1.27)
Therefore,

Eδy = 0   (1.28)

σ_y² = E(δy)² = E[∑_{i=1}^{n} (∂f/∂x_i) δx_i]² = ∑_{i=1}^{n} (∂f/∂x_i)² σ²_{x_i} + 2 ∑_{1≤i<j≤n} (∂f/∂x_i)(∂f/∂x_j) ρ_ij σ_{x_i} σ_{x_j}   (1.29)

In particular, if

ρ_ij = 0 (i ≠ j)   (1.30)

then

σ_y² = ∑_{i=1}^{n} (∂f/∂x_i)² σ²_{x_i}   (1.31)
In practice, σ_{x_i} and ρ_ij have to be estimated, which can be done as follows. Suppose that there are N_j measurements of the jth physical quantity and the measurement data are

x_{j1}, x_{j2}, …, x_{jN_j} (j = 1, 2, …, n)   (1.32)

σ_{x_j} can be estimated by the Bessel formula:

x̄_j = (1/N_j) ∑_{i=1}^{N_j} x_{ji}   (1.33)

σ̂²_{x_j} = (1/(N_j − 1)) ∑_{i=1}^{N_j} (x_{ji} − x̄_j)²   (1.34)

And Eδx_i δx_j and ρ_ij can be estimated as follows:

Êδx_i δx_j = (1/(N_i N_j)) ∑_{k=1}^{N_i} ∑_{l=1}^{N_j} (x_{ik} − x̄_i)(x_{jl} − x̄_j)   (1.35)

ρ̂_ij = (1/(N_i N_j σ̂_{x_i} σ̂_{x_j})) ∑_{k=1}^{N_i} ∑_{l=1}^{N_j} (x_{ik} − x̄_i)(x_{jl} − x̄_j)   (1.36)
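Equations 1.29 and 1.31 translate directly into a small propagation routine. A minimal sketch (the example function y = x1·x2 and its working point are illustrative assumptions):

```python
import numpy as np

def indirect_sigma(grad, sigmas, rho=None):
    """Standard deviation of y = f(x1, ..., xn) per Equation 1.29.
    grad holds the error transfer functions df/dxi evaluated at the working point."""
    grad, sigmas = np.asarray(grad, float), np.asarray(sigmas, float)
    if rho is None:
        rho = np.eye(len(grad))              # independent errors: Equation 1.31
    cov = rho * np.outer(sigmas, sigmas)     # covariance matrix of the delta x_i
    return float(np.sqrt(grad @ cov @ grad))

# y = x1 * x2 at x1 = 2.0, x2 = 3.0, so df/dx1 = 3.0 and df/dx2 = 2.0
print(indirect_sigma(grad=[3.0, 2.0], sigmas=[0.01, 0.02]))  # -> 0.05
```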
1.7 Steps of Data Processing: Static Measurement Data
So far we have introduced some basic concepts of measurement and discussed some basic issues in measurements of static objects. The basic steps of data processing are briefly explained below using measurements of static objects as an example. Let μ be the true value and x_1, x_2, …, x_n be n independent and equal precision measurements.

1. Identify and eliminate negligent errors.
2. Calculate x̄, the residuals v_i = x_i − x̄ (i = 1, 2, …, n), and the standard deviation σ_s = √((1/(n − 1)) ∑_{i=1}^{n} v_i²).
3. Check and amend systematic errors. Let x̃_i (i = 1, …, n) be the amended data. Calculate

x̃̄ = (1/n) ∑_{i=1}^{n} x̃_i,  ṽ_i = x̃_i − x̃̄,  σ̃_s = √((1/(n − 1)) ∑_{i=1}^{n} ṽ_i²),  σ̃_x̄ = σ̃_s/√n

4. Calculate the precision index ±t_α σ̃_x̄, where

P{μ − t_α σ̃_s/√n ≤ x̃̄ ≤ μ + t_α σ̃_s/√n} = 1 − α   (1.37)

P(μ − t_α σ̃_x̄ ≤ x̃̄ ≤ μ + t_α σ̃_x̄) = 1 − α   (1.38)

5. Obtain the total limit error δ_lim x̄ based on methods of error propagation, and present the measurement interval x̄ ± δ_lim x̄ and its confidence level Pα = 1 − α.

It is necessary to point out that:
1. Steps 1–5 may need to be repeated several times for a complex problem.
2. Measurement uncertainty is supported internationally; the recommendation INC-1 (1980) on measurement uncertainty suggested some regulations.
3. There are some differences between the steps for dealing with static and dynamic data. The key issue in processing dynamic data is to establish mathematical models based on statistical characteristics of true signals, systematic errors, and random errors; the mathematical methods for dynamic data are far more complex.

Exercise 1
1. Let x_i = μ + δ_i, i = 1, 2, …, n, and x̄ = (1/n) ∑_{i=1}^{n} x_i, where μ is the true value and Eδ_i = 0, Eδ_i² = σ², i = 1, 2, …, n, Eδ_iδ_j = 0 (i ≠ j). Prove that

E[(1/(n − 1)) ∑_{i=1}^{n} (x_i − x̄)²] = σ²
2. Let x_i = μ + δ_i, i = 1, 2, …, n, and x̄ = (1/n) ∑_{i=1}^{n} x_i, where μ is the expectation and the {δ_i} are i.i.d. ∼ N(0, σ²). Prove that

E[ ∑_{i=1}^{n} |x_i − x̄| / √(n(n − 1)) ] = √(2/π) σ
3. Let x_i = μ + δ_i, i = 1, 2, …, n, and x̄ = (1/n) ∑_{i=1}^{n} x_i, where μ is the expectation and the {δ_i} are i.i.d. ∼ N(0, σ²). Prove that
i. x̄ − μ ∼ N(0, σ²/n).
ii. (n − 1)σ_s²/σ² ∼ χ²(n − 1).
iii. x̄ − μ and σ_s²/σ² are independent.
iv. t = √n(x̄ − μ)/σ_s ∼ t(n − 1), where σ_s = √( ∑_{i=1}^{n} (x_i − x̄)²/(n − 1) ).
4. Suppose that ε ∼ N(0, σ² I_n), ε = (ε_1, ε_2, …, ε_n)^τ. Prove that η ∼ N(0, σ²(I_n − H_n)), where

η = ε − ((ε_1 + ⋯ + ε_n)/n) (1, …, 1)^τ,  H_n = (1/n) (1, …, 1)^τ (1, …, 1)
5. Let x_i = μ + δ_i, i = 1, 2, …, n, with the {δ_i} i.i.d. ∼ N(0, σ²), and

x̄(j) = (1/(n − 1)) ∑_{i≠j} x_i,  v_i = x_i − x̄(j), i = 1, 2, …, n,  σ(j) = √((1/(n − 2)) ∑_{i≠j} v_i²)

Prove that √((n − 1)/n) · v_j/σ(j) ∼ t(n − 2).
6. Let x_1, x_2, …, x_n be n independent and equal precision measurements of the physical quantity μ. Suppose that the measurement errors follow a normal distribution N(0, σ²), with x̄, σ_s, x̄(j), σ(j), v_j defined as in Section 1.5. Use x̄, σ_s, and x_j − x̄ to express x̄(j), σ(j), and x_j − x̄(j).
7. Let x_1, x_2, …, x_n be n independent and equal precision measurements of a physical quantity μ. Suppose that the measurement errors follow a normal distribution N(0, σ²). If x_i is the only outlier among x_1, x_2, …, x_n, calculate the probability of misidentifying x_j (j ≠ i) as an outlier.
8. For a fixed α, explain why K(n, α) decreases while G(n, α) increases as n increases.
9. Explain why Δx_1, Δx_2, …, Δx_n have to be relatively small in Equation 1.25. If they are relatively large, how should Equation 1.25 be amended? Give examples to show whether σ²_{x_1}, σ²_{x_2}, …, σ²_{x_n} in Equation 1.29 must also be relatively small.
10. In Section 1.7, why is it necessary to identify and eliminate negligent errors first? If not, what is the influence on the other steps?
11. What are the main points of this chapter?
References
1. Jinwen Liang, Lincai Chen, Gong He. Error Theory and Data Processing. Beijing: China Metrology Publishing Press, 1989 (in Chinese). 2. Xiuyin Zhou. Error Theory and Experimental Data Processing. Beijing: Beijing University of Aeronautics and Astronautics Press, 1986 (in Chinese). 3. Yetai Fei. Error Theory and Data Processing. Beijing: China Mechanics Press, 1986 (in Chinese). 4. Shiying Zhang, Zhimin Liu. Measurement Practices and Data Processing. Beijing: Science Press, 1977 (in Chinese). 5. Shiji Zhang. Measurement Errors and Data Processing. Beijing: Science Press, 1979 (in Chinese). 6. Yaoming Xiao. Error Theory and Applications. Beijing: China Metrology Publishing Press, 1987 (in Chinese).
7. Zhimin Liu. Theory on Error Distribution. Beijing: Atomic Energy Press, 1988 (in Chinese). 8. Yu Sha, Yi Wu, Zhengming Wang, Mengda Wu, Guangxian Cheng, Liren Wu. Introduction to Accuracy Analysis of Ballistic Missiles. Changsha: National University of Defense Technology Press, 1995 (in Chinese). 9. Peizhang Jia. Error Analysis and Data Processing. Beijing: National Defense Industry Press, 1992 (in Chinese). 10. Dingguo Sha. Applied Error Theory and Data Processing. Beijing: Beijing Institute of Technology Press, 1993 (in Chinese). 11. Douglas M. Hawkins. Identification of Outliers. London: Chapman and Hall, 1980. 12. Vic Barnett, Toby Lewis. Outliers in Statistical Data. 2nd ed. New York: John Wiley & Sons, 1994. 13. Gary L. Tietjen, Roger H. Moore. Some Grubbs-type statistics for the detection of outliers, Technometrics, 1972, 14(3): 583–597. 14. Bernard Rosner. Percentage points for a generalized ESD many outlier procedure. Technometrics, 1983, 25(2): 165–172.
2 Parametric Representations of Functions to Be Estimated
2.1 Introduction
Chapter 1 introduces the basic knowledge on measurement and measurement error and presents a preliminary discussion on processing static measurement data. The models in processing static measurement data are simple. In processing dynamic measurement data, modeling is extremely important because it is the key factor in determining the results of data analysis. The core idea of this book is to transform problems of processing measurement data into problems of parameter estimation in linear or nonlinear regression models and to solve them using results of modern regression analysis. The effectiveness of this approach depends on two complementary factors: using advanced methods of estimation and establishing good models. A good model uses the fewest possible parameters to represent true signals and systematic measurement errors, has a small representation error, separates signals and systematic errors easily using the theory and methods of modern regression analysis, and allows statistical properties of random errors to be estimated easily [1,2]. Unlike books on functional approximation [3–8], this chapter does not study problems of function approximation. It applies results in function approximation and other mathematical methods to express an unknown function (the true signal measured, or systematic errors) in terms of known functions or expressions with the fewest possible
unknown parameters. To save space, proofs of many results on function approximation in this chapter are not provided; interested readers can refer to the references of this chapter.

Remark 2.1 The main focus of this chapter is the parametric representation of functions to be estimated. Unlike in numerical analysis, such parametric representations are not for the convenience of computation but for the accurate estimation, from measurement data, of the functions to be estimated and of the measurement errors. Therefore, the criterion for judging the various methods of representation is not computational convenience but whether they help to estimate the true function and the systematic error [1,2].

The primary ways of representing unknown functions (true signals or systematic errors) by known functions with a few estimable parameters in mathematical modeling of dynamic data are as follows [9–11]:
1. Using algebraic polynomials or trigonometric polynomials
2. Using splines of algebraic polynomials or trigonometric polynomials
3. Using known differential equations satisfied by the estimable functions, with estimable initial values
4. Using empirical formulas derived from scientific laws or engineering background; such empirical formulas take different parameter values in different situations

This chapter mainly introduces the four methods above for converting estimable functions to known functions with estimable parameters. Through such representations, problems of processing dynamic data become problems of estimating parameters in regression models.

2.2 Polynomial Representations of Functions to Be Estimated
Definition 2.1 Suppose that a_0, a_k, b_k (k = 1, 2, …, n) ∈ R,

P_n(t) = a_0 + a_1 t + ⋯ + a_n t^n (a_n ≠ 0)

T_n(t) = a_0 + ∑_{k=1}^{n} (a_k cos kt + b_k sin kt) (a_n² + b_n² ≠ 0)
are an algebraic polynomial of degree n and a trigonometric polynomial of degree n, respectively.

Next we discuss how to use algebraic polynomials (or trigonometric polynomials) to fit a continuously differentiable function (or a periodic, continuously differentiable function). Our main focus is to determine the degree of the polynomial that best fits the function under a given accuracy. The following discussion has five sections. Section 2.2.1 proves that any continuous function on a closed interval can be approximated by a polynomial; Section 2.2.2 proves the existence and uniqueness of the best fitting polynomial; Section 2.2.3 discusses the conversion among best approximations on different intervals and the relationship between the approximation of continuous functions on closed intervals and that of periodic functions with period 2π; Section 2.2.4 studies the selection of degrees of polynomial approximations for functions with continuous derivatives of high order; Section 2.2.5 provides basic methods of polynomial approximation.

2.2.1 Weierstrass Theorem
The following two theorems are the groundbreaking work in polynomial approximation of continuous functions; their proofs are detailed in Ref. 3. It is necessary to point out that the approximation polynomial given in Theorem 2.1 below is not the best one; that is, it does not have the highest accuracy among all approximation polynomials of the same degree.

Theorem 2.1 Suppose that f ∈ C[0, 1] and

B_n(f, t) = ∑_{k=0}^{n} f(k/n) C_n^k t^k (1 − t)^{n−k}   (2.1)
Then B_n(f, t) is a polynomial of degree n and

lim_{n→∞} max_{0≤t≤1} |B_n(f, t) − f(t)| = 0   (2.2)
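Equation 2.1 is easy to evaluate directly; the sketch below illustrates the convergence asserted by Equation 2.2, using sin on [0, 1] as an illustrative target:

```python
import numpy as np
from math import comb

def bernstein(f, n, t):
    """Bernstein polynomial B_n(f, t) of Equation 2.1 on [0, 1]."""
    t = np.asarray(t, dtype=float)
    return sum(f(k / n) * comb(n, k) * t**k * (1 - t)**(n - k) for k in range(n + 1))

t = np.linspace(0.0, 1.0, 201)
for n in (10, 50, 200):
    print(n, np.max(np.abs(bernstein(np.sin, n, t) - np.sin(t))))
# the maximum error shrinks as n grows, but only slowly: B_n is not the best approximation
```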
Theorem 2.2 Suppose that f ∈ C_2π (the set of all continuous functions with period 2π). Then ∀ε > 0, ∃T_n(t) such that

max_{|t|≤π} |T_n(t) − f(t)| < ε   (2.3)
Note that functions in C[a, b] can easily be transformed into functions in C[0, 1] through a transformation of the independent variable, so Theorem 2.1 has universal applicability. The conclusion of Theorem 2.1, that a continuous function can be approximated by a polynomial to any accuracy, was first proved by Weierstrass and was improved by Bernstein, who provided a constructive proof using Bernstein polynomials [3]; Theorem 2.1 combines the works of Weierstrass and Bernstein. The Weierstrass theorem shows that continuous functions can be approximated by polynomials to any accuracy. Here are some interesting and important questions regarding the approximation:
1. Given a fixed approximation accuracy ε, how does one choose an approximation tool with a small number of parameters that attains this accuracy? Should we use algebraic polynomials, trigonometric polynomials, splines, or wavelet functions?
2. Given the approximation accuracy ε, how should the approximating algebraic polynomial be selected?
3. If an algebraic polynomial is used to approximate a continuous function, how is its degree determined? In general, the lower the degree of an approximation polynomial, the better.
4. If an algebraic polynomial of a certain degree is used, the number of estimable parameters is fixed. How should these parameters be estimated so that the highest accuracy is achieved?
There are no effective methods in general to answer question 1; such questions can only be answered case by case. Questions 2 and 3 are discussed in the following sections.

2.2.2 Best Approximation Polynomials
Let H_n and H_n* be the sets of algebraic and trigonometric polynomials of degree n with real coefficients, respectively. Then

Δ(P) = max_{a≤t≤b} |f(t) − P(t)|, P(t) ∈ H_n   (2.4)

is called the deviation between f(t) and P(t), and

E_n = E_n(f) = inf_{P∈H_n} {Δ(P)}   (2.5)

is called the least deviation between H_n and f(t). A polynomial in H_n that satisfies Δ(P) = E_n is called the best approximation polynomial of f(t) of degree n. Similarly, a polynomial in H_n* that satisfies Δ(T) = E_n* is the best trigonometric approximation polynomial of f(t) of degree n.

Theorem 2.3 ∀f ∈ C[a, b], ∃P(t) ∈ H_n such that

Δ(P) = E_n   (2.6)
Theorem 2.4 P(t) ∈ H_n is the best approximation polynomial of f(t) if and only if |P(t) − f(t)| has at least n + 2 points of maximum deviation in [a, b] and the signs of P(t) − f(t) at two adjacent such points are opposite.

Theorem 2.5 ∀f ∈ C[a, b], H_n contains one and only one best approximation polynomial of f.
Theorem 2.6 ∀f ∈ C_2π, H_n* contains one and only one best approximation polynomial of f.

Theorem 2.7 If f ∈ C_2π is an odd (respectively, even) function, the best approximation polynomial in H_n* is also an odd (respectively, even) function.

The proofs of Theorems 2.3 through 2.6 are given in Ref. 3. The proof of Theorem 2.7 is left as an exercise.

2.2.3 Best Approximation of Induced Functions
When analyzing the relationship between the structure of a nonperiodic function and the degree of its algebraic approximation polynomial, the simplest method is to transform the approximated function into a trigonometric function by variable substitution. In fact, ∀f ∈ C[a, b], let x = ((b − a)/2)t + (b + a)/2, t ∈ [−1, 1]; then

φ(t) = f(x) = f(((b − a)/2)t + (b + a)/2) ∈ C[−1, 1]

Let t = cos θ, θ ∈ [0, π]. Then

ψ(θ) = φ(cos θ), θ ∈ (−∞, +∞)   (2.7)

is an even function and ψ(θ) ∈ C_2π. ψ(θ) is called the induced function of f.

Theorem 2.8 Suppose that E_n is the least deviation between algebraic polynomials of degree n and the function f(x) ∈ C[a, b], and E_n* is the least deviation between trigonometric polynomials of degree n and the induced function ψ(θ). Then E_n = E_n*.
Proof By the theorem of best approximation, there is one and only one polynomial P(x) ∈ H_n such that

|f(x) − P(x)| ≤ max_{x∈[a,b]} |f(x) − P(x)| ≤ E_n   (2.8)

Suppose T(θ) is the induced function of P(x); obviously, T(θ) ∈ H_n*. Then inequality (2.8) can be rewritten as

|ψ(θ) − T(θ)| ≤ E_n   (2.9)

Therefore, E_n* ≤ E_n. On the other hand, since ψ(θ) ∈ C_2π, by Theorems 2.6 and 2.7 there is one and only one best approximation polynomial, and since ψ(θ) is an even function, T(θ) is also an even function:

T(θ) = ∑_{k=0}^{n} a_k cos kθ   (2.10)

By a property of trigonometric functions, T(θ) can also be expressed as

T(θ) = ∑_{k=0}^{n} c_k cos^k θ   (2.11)

Since T(θ) is the least deviated polynomial approximating ψ(θ),

|ψ(θ) − T(θ)| ≤ E_n*   (2.12)

Let cos θ = t. We have

|φ(t) − ∑_{k=0}^{n} c_k t^k| ≤ E_n*   (2.13)
Substituting the linear transform of x for t,

|f(x) − P(x)| ≤ E_n*   (2.14)

Hence, E_n ≤ E_n*. Combining the two inequalities, we have E_n = E_n*.

2.2.4 Degrees of Best Approximation Polynomials
The core issue in processing measurement data is the determination of the degree of the best approximation function. There are many tools in functional approximation, and knowing the approximation rates of the various tools is useful for comparing the numbers of parameters to be estimated. Given the approximation precision, this helps in choosing an appropriate approximation function with the least number of parameters. When polynomials are chosen as the approximation tool, it is necessary to know the approximation rate of an approximation polynomial. Jackson obtained some good results on best approximation polynomials in his PhD thesis in 1905 [4].

Theorem 2.9 Suppose that f ∈ C_2π^(k). Then

E_n* ≤ (π/(2(n + 1)))^k ||f^(k)||_∞   (2.15)

where π/2 is the best approximating coefficient independent of f, k, and n, and ||f^(k)||_∞ = max_{|t|≤π} |f^(k)(t)|.

Theorem 2.10 Suppose that f ∈ C[−1, 1]. Then
1. E_n = E_n(f) ≤ πλ/(2(n + 1)) if |f(x) − f(y)| < λ|x − y|   (2.16)
2. E_n(f) ≤ (π/2)^k ||f^(k)||_∞ / ((n + 1)n⋯(n + 2 − k)) (n ≥ k) if f^(k) ∈ C[−1, 1]   (2.17)

Proofs of Theorems 2.9 and 2.10 are long; interested readers are referred to Ref. 4 for details. If f(t) ∈ C[a, b], Theorem 2.10 becomes Theorem 2.11.

Theorem 2.11 If f(t) ∈ C[a, b], then
1. E_n(f) ≤ (πλ/(2(n + 1))) · ((b − a)/2) when |f(x) − f(y)| < λ|x − y|   (2.18)
2. E_n(f) ≤ (π/2)^k ((b − a)/2)^k ||f^(k)||_∞ / ((n + 1)n⋯(n + 2 − k)) when f^(k) ∈ C[a, b], n > k   (2.19)

Example 2.1 Suppose that f ∈ C^(4)[−1, 1], ||f^(4)||_∞ ≤ 1, and the best approximation polynomial P(t) satisfies

max_{|t|≤1} |f(t) − P(t)| ≤ 10⁻³   (2.20)

Determine the degree of P(t).

Solution Using Theorem 2.10, the degree of the best approximation polynomial is determined by the inequality

(π/2)⁴ ||f^(4)||_∞ / ((n + 1)n(n − 1)(n − 2)) ≤ 10⁻³   (2.21)

Since ||f^(4)||_∞ ≤ 1, we need (n + 1)n(n − 1)(n − 2) ≥ 10³(π/2)⁴, so n is at least 10.
The meaning of basis selection
2.2.5.1 Significance of Basis Selection
has two folds:
1. What type of basis functions should we choose? Algebraic polynomials, trigonometric polynomials, or splines? 2. For a given basis function, what is the proper order? This section only considers the approximating problem of f (t) ∈ C[−1,1]. Given the approximating precision ε such that max f (t ) − P (t ) < ε t ≤1
then order N of the approximation polynomial can be determined by the information of polynomial derivatives and Theorem 2.10. Then P (t ) =
N
∑c t i =0
i
i
(2.22)
Here, (1, t, t 2, . . ., tN )is a basis set of approximation polynomials. When using this kind of approximating basis, if the objective function
© 2012 by Taylor & Francis Group, LLC
PA R A M e t RI C Re P Re sen tAtI o ns o F F un C tI o ns
49
f (t) is unknown (as in the case of processing dynamic data), estimating f (t) is equivalent to estimating parameters (c0, c1, . . ., cN). The number of estimable parameters can be reduced if we can find another set of basis functions, say, (Q 0, Q 1, . . ., Q N), when we use Q(t ) = ∑ iN=1 biQi (t ) to approximate f (t), some of bi Qi (t)’s are small enough to be negligible. This is the method of reducing the number of parameters by basis selection. Note that, for ∀t ∈ [−1,1], there exists one and only one θ ∈ [0,π] such that θ ∈ [0, π]. Let
2.2.5.2 Chebyshev Polynomials
Tn (t ) = cos(nθ) = cos(n arccos t ), n = 0, 1, 2,…
(2.23)
Then Tn (t) is a polynomial of t and is called the Chebyshev polynomial. Using the trigonometry identity cos nθ + cos(n − 2)θ = 2 cos θ cos(n − 1)θ we have T0 (t ) = 1, T1 (t ) = t Tn (t ) = 2tTn − 1 (t ) − Tn − 2 (t ), n = 2, 3,…
(2.24)
The Chebyshev polynomial Tn (n = 1,2,...) has the following properties: Property 1 Tn is an algebraic polynomial of degree n and its first coefficient is 2 n−1. Property 2 Tn (t ) ≤ 1, t ∈ [−1, 1]
© 2012 by Taylor & Francis Group, LLC
(2.25)
50
M e A suReM en t DAtA M o D eLIn G
Property 3 Let tk = cos(kπ/n). Then Tn (t k ) = cos kπ = ( −1)k Tn
∞
= ( −1)k , k = 0, 1,…, n (2.26)
Property 4 τk = cos((2k+1)π/2n), π 2k + 1 Tn ( τ k ) = cos π = cos kπ + = 0, k = 0, 1,…, n − 1 2 2 (2.27) Property 5 Orthogonality π, Tm (t )Tn (t ) dt = π / 2, 1 − t2 0, −1 1
∫
m =n=0 m =n≠0 m ≠n
(2.28)
2.2.5.3 Bases of Interpolation Polynomials of Order n Consider f (t) ∈ Cn+1[−1, 1]. Given the values of f(t) at t 0, t 1, . . ., tn, say y 0, y1, . . . , yn, the following theorem is provided.
Theorem 2.12 There exists a unique algebraic polynomial Pn(t) of order n such that Pn (ti ) = f (ti ) = yi , i = 0, 1, … , n
© 2012 by Taylor & Francis Group, LLC
(2.29)
PA R A M e t RI C Re P Re sen tAtI o ns o F F un C tI o ns
51
Proof Let l i (t ) =
(t − t 0 )(t − t1 ) … (t − ti − 1 )(t − ti + 1 ) … (t − tn ) (ti − t 0 )(ti − t1 ) … (ti − ti − 1 )(ti − ti + 1 ) … (ti − tn )
(2.30)
and Pn (t ) =
n
∑ y l (t ) i =0
i i
(2.31)
It is straightforward to prove that Pn(t) is the unique algebraic polynomial satisfying Equation 2.29. Next we study the interpolation error. li(t), i = 0, 1, . . . , n are bases of interpolation polynomials of order n and En ( f ; t ) = f (t ) − Pn (t )
(2.32)
is the interpolation error. Theorem 2.13 Suppose f (t) ∈ Cn+1 [−1, 1] and t0, t1, . . ., tn are n + 1 different interpolation points in [−1, 1]. Then, for any t, there exists a unique ξ ∈ (−1, 1) such that En ( f ; t ) =
ω(t ) f (n + 1) (ξ) (n + 1)!
(2.33)
where ω(t ) = (t − t 0 )(t − t1 )…(t − tn ). Proof If ω(t) = 0, t must be an interpolation point. Equation 2.33 holds naturally. Suppose that ω(t) ≠ 0. Note that F (z ) = f (z ) − Pn (z ) −
© 2012 by Taylor & Francis Group, LLC
w(z ) [ f (t ) − Pn (t )] ω(t )
(2.34)
52
M e A suReM en t DAtA M o D eLIn G
Obviously, F(z) ∈ Cn+1 [−1,1] and there are n + 2 different points such that F(z) = 0. In other words, F (z ) = 0, z = t , t 0 , t1 ,..., tn By the Rolle theorem, F ′(z) has n + 1 different roots in (−1, 1) and (z) has at least one root in (−1, 1), that is, there exists a ξ such F that (n+1)
F (n + 1) (ξ) = f (n + 1) (ξ) − P (n + 1) (ξ) −
(n + 1)! ( f (t ) − P (t )) = 0 ω(t )
Therefore, Equation 2.33 holds. If sup f (n + 1) (t ) ≤ M n + 1 t ≤1
Equation 2.33 shows that En ( f , t ) ≤
M n +1 ω(t ) (n + 1)!
(2.35)
If interpolation points t 0, t 1, . . ., tn are chosen such that max ω(t ) = min t ≤1
(2.36)
then Pn(t) is a good approximation at any point t ∈ [−1,1]. By property 4 of Chebyshev polynomials, if the n + 1 different roots of Tn+1(t) = 0, that is, t k = cos
2k + 1 π, k = 0, 1,…, n 2(n + 1)
(2.37)
are selected as the interpolation points, then polynomials ω(t) and Tn+1(t) have the same roots and one is a multiple of the other. Comparing the first coefficients of ω(t) and Tn+1(t), we have ω(t ) =
© 2012 by Taylor & Francis Group, LLC
1 Tn + 1 (t ) 2n
(2.38)
PA R A M e t RI C Re P Re sen tAtI o ns o F F un C tI o ns
53
Using Equations 2.35 and 2.38, we have En ( f , t ) ≤
Mn+1 Tn + 1 (t ) 2 (n + 1)!
(2.39)
n
Using property 3 of Chebyshev polynomials, we have max En ( f , t ) ≤ t ≤1
Mn+1 2 (n + 1)!
(2.40)
n
Equation 2.39 provides an estimation of En( f, t) for each t and Equation 2.40 gives a uniform estimation of En( f, t) in the whole interval. In general, if M n + 1 = sup|t |≤ 1 | f (n + 1) (t )| is not very large for a function f(t) ∈ Cn+1 [−1, 1], the nth interpolation polynomial Pn(t) approximates f(t) with a relatively high precision when interpolation points are roots of Tn+1(t) = 0. More specifically, if f (t) ∈ Cn+1 [−1, 1] and Mn+1 is not very large, there exists a set of (c 0, c 1, . . ., cN) such that max | f (t ) − |t |≤ 1
N
∑C l (t )| ≤ 2 (n + 1)! i =0
i i
n
Mn+1
(2.41)
where (l 0(t), l1(t), . . ., ln(t)) approximates f (t) as a set of interpolation polynomial bases and the interpolation points of such bases are the roots of Tn+1(t) = 0. If the interval is [a,b] instead of [−1,1], we have the following theorem. Theorem 2.14 Let f (t ) ∈ C n [a, b ] ∩ C n + 1[a, b ], M n + 1 = sup f (n + 1) (t ) and a
tk =
© 2012 by Taylor & Francis Group, LLC
(
)
(b + a ) + (b − a )cos ( 2k + 1) ( 2(n + 1)) π 2
(2.42)
54
M e A suReM en t DAtA M o D eLIn G
Then, En ( f , t ) ≤
M n + 1 (b − a )n + 1 ⋅ (n + 1)! 2 2n + 1
(2.43)
The proof is left as an exercise. Example 2.2 Let f (t) = sin x. Construct an algebraic interpolation polynomial of degree 10 on [0, π] and compare it with the Taylor polynomial of 10th order. Solution
Let xk = (π /2) (1 + cos ((2K + 1) / 22)π), k = 0, 1, . . . , 10, be interpolation points and P10 (t ) = Σ 10 i = 0 (sin xi )l i ( x ) be the interpolation polynomial. Using Theorem 2.14, sin x − P10 (x ) ≤
π11 ≈ 3 × 10 −9 , 0 ≤ x ≤ π 11 ! 2 21
This shows that, when using P 10 (x) to approximate sin x, the precision can reach the 8th digit after the decimal point. If the Taylor polynomial of 10th order is used, we have 4
sin x −
x 2i + 1
∑ (−1) (2i + 1)! ≈ i
i =0
| sin ξ | 11 x , 0≤x≤π 11!
The precision reaches the second digit after the decimal point.
Using Chebyshev polynomial bases reduces orders of approximate polynomials without sacrificing the approximation precision. Thus, Chebyshev polynomial bases are commonly used in data processing. Let us see an example first.
2.2.5.4 Chebyshev Polynomial Bases
Example 2.3 For −1 ≤ x ≤ 1, the 12th Taylor polynomial of e−x is P12 (x ) = 1 −
© 2012 by Taylor & Francis Group, LLC
x x2 x3 x 12 + − ++ 1! 2 ! 3 ! 12 !
PA R A M e t RI C Re P Re sen tAtI o ns o F F un C tI o ns
55
Since each xk (k = 0, 1, . . ., 12) can be represented by a linear combination of Chebyshev polynomial bases, we have P12 (x ) = 1.266065877752 ⋅ T0 − 1.130318207984 ⋅ T1 + 0.2714953339533 ⋅ T2 − 0.044336849849 ⋅ T3 + 0.005474240442 ⋅ T4 − 0.000542926312 ⋅ T5 + 0.000044977322 ⋅ T6 − 0.000003198436 ⋅ T7 + 0.000000199212 ⋅ T8 − 0.000000011037 ⋅ T9 + 0.000000000550 ⋅ T10 − 0.000000000025 ⋅ T11 + 0.000000000001 ⋅ T12 Note that |Ti (x)| ≤ 1 (i = 0, 1, 2,. . ., 12). We have P12 (x ) ≈ 1.266065877752 ⋅ T0 − 1.130318207984 ⋅ T1 + 0.2714953339533 ⋅ T2 − 0.044336849849 ⋅ T3 + 0.005474240442 ⋅ T4 − 0.000542926312 ⋅ T5 = ϕ 5 (x ) and |e−x − φ5(x)| ≤ 0.00005, −1 ≤ x ≤ 1. If we use the 5th Taylor polynomial P5 (x ) = 1 −
x x2 x3 x4 x5 + − + − , 1! 2 ! 3 ! 4 ! 5 !
max e − x − P5 (x ) ≤ 0.001615 x ≤1
The absolute error of using P 5(x) to approximate e−x is 33 times larger than that of φ5(x).
Example 2.3 shows that Chebyshev polynomial bases provide better approximations. What is the theoretic basis? Is this a coincidence or a fact? This is what would be discussed next. Section 2.2.3 discusses the approximation of induced functions. The optimal approximation using algebraic polynomial in C[a,b] can be transformed into the approximation using trigonometric polynomials in C2π . Moreover, their maximum approximation errors are
© 2012 by Taylor & Francis Group, LLC
56
M e A suReM en t DAtA M o D eLIn G
equal, that is, En ( f ; x ) = En * ( g ; t ) . Let f (x) ∈ C[a,b] and x = ((b − a) cos t + b + a)/2. We have g (t) = f (((b − a) cos t + b + a)/2) ∈ C2π . Therefore, ∞
g ( t ) = f ( x ) = a0 +
∑ (a cos kt + b sin kt ) k =1
k
k
(2.44)
where 1 a0 = 2π bk =
1 π
π
∫
−π
1 f (x ) dx , ak = π
π
∫ f (x)cos kx dx
−π
π
∫ f (x)sin kx dx foor k = 1, 2, 3,…
(2.45)
−π
We can use the sum of the first n terms,

S_n[f] = a_0 + ∑_{k=1}^{n} (a_k cos kt + b_k sin kt)   (2.46)

to approximate g(t). Here g(t) is an even function, so b_k = 0 (k = 1, 2, 3, …). If f(x) is a trigonometric polynomial of degree n, its Fourier series is itself. In order to analyze the result of using S_n[f] to approximate f, two important lemmas from mathematical analysis are introduced.

Lemma 2.1 Suppose that f(t) ∈ C_2π. We have

S_n[f] = (1/π) ∫_0^{π/2} [f(t + 2x) + f(t − 2x)] (sin(2n + 1)x / sin x) dx   (2.47)
This is the famous Dirichlet lemma; its long proof is usually given in textbooks of mathematical analysis.

Lemma 2.2 For a positive integer n ≥ 2, the following inequality holds:

(1/π) ∫_0^{π/2} |sin(2n + 1)x / sin x| dx < (1/2)(2 + log n)   (2.48)
Proof Note that

|sin nt| ≤ n sin t,  sin t ≥ (2/π)t, 0 ≤ t ≤ π/2

We have

(1/π) ∫_0^{π/2} |sin(2n + 1)t / sin t| dt ≤ (1/π) ∫_0^{π/(4n+2)} (2n + 1) dt + (1/π) ∫_{π/(4n+2)}^{π/2} (π/2)(dt/t) < 1 + (1/2) log n
The Lebesgue theorem below demonstrates the result of using Fourier series to approximate f(t) ∈ C_2π; all logarithms are to base e.

Theorem 2.15 Let f(t) ∈ C_2π and E_n* be the error bound of the optimal approximation of f(t) by trigonometric polynomials in H_n*. When n ≥ 2, we have

||S_n[f] − f|| ≤ (3 + log n)E_n*   (2.49)
Proof Using Lemmas 2.1 and 2.2, we have

||S_n[f]|| ≤ ||f||_∞ (2 + log n), ∀f(t) ∈ C_2π

Let T(t) ∈ H_n* be the optimal approximation of f(t). Note that S_n[T] = T. We have

||S_n[f] − f|| ≤ ||S_n[f] − T|| + ||T − f|| ≤ ||S_n[f − T]|| + E_n* ≤ ||f − T||_∞ (2 + log n) + E_n* ≤ (3 + log n)E_n*

Theorem 2.15 demonstrates that using S_n[f] to approximate f(t) ∈ C_2π is similar to using the best trigonometric polynomials: if n ≤ 1100, the approximation error of S_n[f] is at most one order of magnitude larger than that of the optimal trigonometric polynomial T(t) ∈ H_n* of the same degree. The following example shows the result of using Chebyshev polynomial bases.

Example 2.4 Estimate the approximation precision of using algebraic polynomials on [−1, 1] to approximate f(t) = arcsin t.

Solution Let ψ(θ) = f(cos θ) = arcsin cos θ. Then
S_{2l+1}[ψ] = (4/π)[cos θ + cos 3θ/9 + ⋯ + cos(2l + 1)θ/(2l + 1)²]

Note that cos kθ = T_k(t) (k = 0, 1, 2, …) for t = cos θ. We have

arcsin t ≈ (4/π) ∑_{l=0}^{m} T_{2l+1}(t)/(2l + 1)²

To guarantee that the approximation error is less than 10⁻¹⁰ when calculating arcsin t on [−1, 1], it is enough to take m = 9 in the algebraic polynomial above, while 25 terms are needed with the Taylor expansion. In summary, for f(t) ∈ C[−1, 1], {T_0(t), T_1(t), …, T_n(t)} can be used as approximation bases.
2.2.5.5 Bases and Coefficients For f(t) ∈ C[−1, 1], we recommend two basis representations:

1. f(t) = ∑_{i=0}^{n} a_i T_i(t),  f(t) ∈ C^k[−1, 1], k ≤ n
2. f(t) = ∑_{i=0}^{n} c_i l_i(t),  f(t) ∈ C^{n+1}[−1, 1]

where the {T_i(t)} are Chebyshev polynomial bases and the {l_i(t)} are interpolation polynomial bases whose nodes are the roots of the Chebyshev polynomial of order n + 1. One advantage of the two representations is that the degree n is relatively small for a given approximation precision. The selection of n has been discussed here from the perspective of functional approximation; criteria are still needed to determine how large n should be and which set of bases should be used in specific scenarios. Such criteria are studied in Chapter 3. It should be pointed out that in many practical engineering problems there is information on derivatives of the functions to be estimated. Such information can be used as supplementary information on the parameters and is helpful for constructing biased estimates of the parameters to be estimated.

Example 2.5 For

f(t) ∈ C²[−1, 1],  ∫_{−1}^{1} [f″(t)]² (1 + t²) dt ≤ 10
Express f(t) by known functions with finitely many estimable coefficients.

Solution By using Chebyshev polynomial bases we have

f(t) = ∑_{i=0}^{n} a_i T_i(t),  f″(t) = ∑_{i=2}^{n} a_i T_i″(t)

∫_{−1}^{1} [f″(t)]² (1 + t²) dt ≤ 10

Namely,

∑_{i,j=2}^{n} [ ∫_{−1}^{1} (1 + t²) T_i″(t) T_j″(t) dt ] a_i a_j ≤ 10   (2.50)
© 2012 by Taylor & Francis Group, LLC
PA R A M e t RI C Re P Re sen tAtI o ns o F F un C tI o ns
61
2.3 Spline Representations of Functions to Be Estimated 2.3.1 Basic Concept of Spline Functions
Section 2.2 discusses issues of polynomial approximation. Any continuous function defined on a bounded, closed interval can be approximated by algebraic polynomials to any precision. Several disadvantages of polynomial approximation are also pointed out. The goal of this section is to represent an unknown function f (t) with a known function containing finite estimable parameters such that the number of estimable parameters is as small as possible under the premise of ensuring approximation accuracy. Besides polynomial approximations, other tools in functional approximation can also be used [5,8]. One effective tool is spline functions. Example 2.6 (4) (4) Suppose f (x ) ∈ C [−1, 1], f
∞
≤ 10. Compare approximations
of f (t) using polynomials and polynomial splines. Solution
Let the approximation precision ε = 10 −3 or ε = 10 −5. Using Theorem 2.10, in order for Q ( t ) − f ( t ) < ε, − 1 ≤ t ≤ 1 the degree of approximation polynomial Q(t) should be n = 16 or n = 50 and there are 17 or 51 coefficients to be estimated, respectively. If we use spline functions to approximate f (t), Theorem 2.21 shows that there are 10 or 25 parameters to be estimated, respectively.
Example 2.6 shows that using spline functions to approximate an unknown function is better than using polynomials in some situations. However, concrete analyses are needed for specific scenarios. For example, if f ( 4 ) is very large and f ( 5) is very small, the poly∞ ∞ nomial representation of the function is better than the spline one. Further discussion of selecting approximation tools for a specific situation will be presented in Chapter 3.
© 2012 by Taylor & Francis Group, LLC
62
M e A suReM en t DAtA M o D eLIn G
Owing to the special status of cubic splines in functional approximations and dynamic data processing, this section mainly introduces cubic splines. Definition 2.2 Let π : a = t 0 < t1 < < tn = b be a partition on [a, b]. If s(t) satisfies 1. s(t) ∈ Ck−1 [a, b] 2. s(t) is a polynomial of order k on [ti−1, ti] then s(t) is called a spline function of order k. The set of all spline functions of order k is denoted by s(π, k) and it is a linear subspace of C[a, b]. The following discussion uses cubic spline functions in s(π, 3) as examples. Let t +m
t m , t > 0 = 0, t ≤ 0
Consider the following two types of questions: s(ti ) = f (ti ), i = 0, 1, … , n s ′(t 0 ) = f ′(t 0 ), s ′(tn ) = f ′(tn )
(2.51)
s(ti ) = f (ti ), i = 0, 1, …, n s ′′(t 0 ) = f ′′(t 0 ), s ′′(tn ) = f ′′(tn )
(2.52)
and
Theorem 2.16 There exists a unique cubic spline satisfying Equation 2.51.
© 2012 by Taylor & Francis Group, LLC
PA R A M e t RI C Re P Re sen tAtI o ns o F F un C tI o ns
63
Proof Suppose that s(t ) = a−1 + a0 t + a1t + a2 t + 2
3
i −1
∑a k =1
k+2
(t − t k )3+ , t ∈ [ti − 1 , ti ] (2.53)
When t ∈ [t 0 , t1 ], there are s(t ) = a−1 + a0 t + a1t 2 + a2 t 3 , S′(t 0 ) = a0 + 2a1t 0 + 3a2 t 02 ; and when t ∈ [tn − 1 , tn ], we have s(t ) = a−1 + a0 t + a1t 2 + a2 t 3 +
n −1
∑a k =1
n −1
∑a
S′(tn ) = a0 + 2a1tn + 3a2 tn2 + 3 Using Equation 2.51, 0 1 1 1 1 0
2t 0
1 t0
t
2 0
t
2 2
3t 02
t12
t1
t2
tn2
tn 1
=
2t n
© 2012 by Taylor & Francis Group, LLC
0
t
0
t13
0
t
3 2
tn3
3t
f ′(t 0 ) f (t 0 ) f (t1 ) f (t 2 ) f (tn ) f ′(tn )
k =1
3 0
2 n
3
( t n − t 1 )3
(t 2 − t1 )
3(tn − t1 )
2
k+2
k+2
( t − t k )3 ,
(tn − t k )2
a −1 0 a0 a 0 1 a2 0 (tn − tn − 1 )3 an 3(tn − tn − 1 )2 an + 1 0
(2.54)
64
M e A suReM en t DAtA M o D eLIn G
Since the coefficient matrix on the left side of Equation 2.54 is nonsingular, the linear equations above have a unique solution. To prove the uniqueness, we only need to prove that if s(t) satisfies s(ti ) = 0, i = 0, 1,…, n s ′(t 0 ) = s ′(tn ) = 0
(2.55)
then s(t) ≡ 0. In fact, using Equation 2.55, B
b
n
ti
∫ s ′′ (t )dt = ∫ s ′′(t )ds ′(t ) = s ′′(t )s ′(t ) − ∑ ∫ s ′(t )s ′′′(t )dt = 0 2
A
b
a
a
Thus,
∫
b
a
s ′′ 2 (t )dt =
[ti−1, ti]. So we have
n
∑∫ i =1
ti
ti − 1
i =1 ti −1
(2.56)
s ′′ 2 (t )dt = 0. For any i, s″ = 0 on
s(t ) = c + dt , t ∈ [ti − 1 , ti ] Since s(ti−1) = s(ti) = 0 and ti−1 ≠ ti, 1 1
t i − 1 c 0 = , c = d = 0 ti d 0
Finally, we have s(t) ≡ 0, t ∈ [a, b]. Remark 2.2 The proof of Theorem 2.16 actually provides a way for constructing a cubic spline function satisfying Equation 2.51. By the uniqueness, the constructed function is the exact spline function we need whose coefficients a−1, a 0, a1,. . ., an+1 (total n + 3 coefficients) are determined by interpolation conditions or observed data. (1, t , t 2, t 3, (t − t1 )3+ ,…,(t − tn − 1 )3+ ) are spline bases of the order 3 on [a, b].
© 2012 by Taylor & Francis Group, LLC
65
PA R A M e t RI C Re P Re sen tAtI o ns o F F un C tI o ns
Theorem 2.17 There exists a unique cubic spline satisfying Equation 2.52. The proof is omitted since it is similar to that of Theorem 2.17. 2.3.2 Properties of Cubic Spline Functions
Theorem 2.18
Suppose that f (t) ∈ C2[a, b], Sf (t) ∈ S (π, 3) is a solution of interpolation (2.51). Then, b
∫ S ′′(t )
2
f
dt =
a
b
∫ f ′′(t )
2
dt −
a
b
∫ f ′′(t ) − S ′′(t ) f
2
dt
(2.57)
a
Proof We have b
∫ f ′′(t ) − S ′′(t )
2
f
b
∫ f ′′
dt =
a
2
dt +
a b
∫ f ′′
=
b
∫ S ′′
2
f
b
∫
dt − 2 f ′′ S ′′f dt
a
2
dt −
a
b
∫ S ′′
a
2
f
b
∫
dt − 2 ( f ′′− S ′′f )S ′′f dt
a
a
(2.58) and b
∫
( f ′′− S ′′f )S ′′f dt =
a
b
∫ S ′′d ( f ′′ − S ′ ) f
f
a
b a
= S ′′f (t )( f ′(t ) − S ′f (t )) − = S ′′f (t )( f ′(t ) − S ′f (t )) + © 2012 by Taylor & Francis Group, LLC
n
∑∫ i =1
ti
ti − 1
b a
n
∑∫
−
i =1
ti
ti − 1
( f ′(t ) − S ′f (t ))S ′′′ f ( t ) dt
n
∑ ( f ′(t ) − S ′ (t ))S ′′′(t ) i =1
( f (t ) − S f (t ))S (f4 ) (t ) dt
f
f
ti ti − 1
∫ ( f ′′− S ′′ )S ′′ dt = ∫ S ′′d ( f ′′ − S ′ ) f
f
f
a
f
a
n
ti
∑∫
= S ′′f (tM )(ef A′(suReM t ) − S ′f (ten )) bat− DAtA M ( fo′D (teLIn ) − S ′f G(t ))S ′′′ f ( t ) dt
66
ti − 1
i =1
b a
= S ′′f (t )( f ′(t ) − S ′f (t )) +
n
∑∫
ti
ti − 1
i =1
n
∑ ( f ′(t ) − S ′ (t ))S ′′′(t )
−
f
i =1
f
ti ti − 1
(2.59)
( f (t ) − S f (t ))S (f4 ) (t ) dt
For the three terms on the right side of Equation 2.59, the first term is 0 because f ′(a ) = S ′f (a ), f ′(b ) = S ′f (b ); the second term is 0 since f ′(ti ) = S ′f (ti ) (i = 0, 1,…, n); and the third term is 0 because f (ti ) = S f (ti ), S (f4 ) (t ) = 0. Therefore, we have b
b
∫ ( f ′′ − S ′′ )S ′′ dt = ∫ S ′′d ( f ′′ − S ′ ) = 0 f
f
f
a
(2.60)
f
a
and b
∫ S ′′(t )
2
f
dt =
a
b
∫ f ′′(t )
2
dt −
a
b
∫ f ′′(t ) − S ′′(t ) f
2
dt
a
Remark 2.3
Theorem 2.18 shows that 1.
∫
b
∫
b
a
a
2
S ′′f (t ) dt ≤
∫
b
a
2
f ′′(t ) dt . So if function f (t) satisfies
∫
2
f ′′(t ) dt ≤ v , then
a
positive number. 2.
∫
b
a
2
f ′′(t ) − S ′′f (t ) dt ≤
b
∫
b
a
2
S ′′f (t ) dt ≤ v, here v is a given
2
f ′′(t ) dt
Theorem 2.19 Suppose that Sf (t) ∈ S(π, 3) is a solution of interpolation (2.51) and S(t) ∈ S(π, 3). Then,
© 2012 by Taylor & Francis Group, LLC
PA R A M e t RI C Re P Re sen tAtI o ns o F F un C tI o ns 2
2
67
2
f ′′(t ) − S ′′(t ) 2 = f ′′(t ) − S ′′f (t ) + S ′′f (t ) − S ′′(t ) 2 2 2
(2.61)
≥ f ′′(t ) − S ′f′ (t ) 2 where f
2
=
∫
b
a
f dt 2
12
.
Theorem 2.19 shows that, among all functions in S(π, 3), S ′′f is an optimal solution of approximating to f ″ under ⋅ 2 . Next, we discuss uniform partitions, that is, ti − ti−1 = h = (b − a)/n. Theorem 2.20 Suppose that Sf (t) ∈ S(π, 3) is the solution of interpolation (2.51) and the nodes of π are equally distributed. Then S ′f − f ′
2 2
2
Sf − f
Sf − f
2
2 2
≤
4h 2 S ′′f − f ′′ π2
≤
h2 S ′f − f ′ π2
≤
2 2
2
(2.63)
2
4h 4 S ′′f − f ′′ π4
(2.62)
2 2
(2.64)
Proof 1. If g ∈ C2[0, h], g(0) = g(h) = 0, we can extend g to an odd and periodic function such that ∞
g (t ) =
kπ
∑ a sin h t k =1
k
where ak (k = 1, 2, . . .) are Fourier coefficients of g. Then
© 2012 by Taylor & Francis Group, LLC
(2.65)
68
M e A suReM en t DAtA M o D eLIn G ∞
g ′( t ) =
kπ
kπ
∑ h a cos h t k =1
k
By the quadrature principle of trigonometric function, h
h
∞
∫ g (t ) dt = ∑ a ∫ k =1
0
h
∫
2 k
2
( g ′(t )) dt = 2
0
∞
∑ k =1
≥
0
kπ h sin t dt = 2 h 2
∞
∑a k =1
h
2 k
k2π2 2 h π2 2 kπ a t t cos d = k h 2 h2 h2
∫ 0
h π2 2 h2
∞
∑ k =1
ak2 =
π2 h2
h
∞
∑k a k =1
∫ g dt 2
2 2 k
(2.66)
0
Then h
∫ 0
h2 g dt ≤ 2 π 2
h
∫ g ′ dt 2
(2.67)
0
Equation 2.63 is proved. 2. Proof of Equation 2.62: Note that Sf (t) − f (t) ∈ C2[a, b] and s f (ti ) = f (ti ), i = 0, 1,…, n . s t f t s t f t ( ) − ( ) = 0 , ( ) − ( ) = 0 ′ ′ ′ ′ 0 0 f f n n By Rolle theorem, there are ξ 0 = t 0 < ξ1 < t 1 < ξ 2 < t 2 < < t n − 1 < ξn < t n = ξn + 1 such that S ′f (ξi ) − f ′(ξi ) = 0, i = 0, 1,…, n + 1 and ξi − ξi−1 ≤ 2h.
© 2012 by Taylor & Francis Group, LLC
(2.68)
69
PA R A M e t RI C Re P Re sen tAtI o ns o F F un C tI o ns
So S ′f − f ′
2
=
2
n + 1 ξi
∑ ∫ (S ′ − f ′) dt i = 1 ξi − 1
4h 2 ≤ 2 π
2
f
n + 1 ξi
∑∫
i = 1 ξi − 1
(S ′′f − f ′′)2 dt =
4h 2 S ′′f − f ′′ π2
2 2
(2.69)
3. Equation 2.64 can be easily derived by combining Equation 2.63 with Equation 2.62. Theorem 2.21 Suppose f (t) ∈ C2[a, b], π is a uniform partition (h denotes the distance between any two consecutive nodes), Sf (t) ∈ S(π, 3), and Sf satisfies the interpolation condition (2.51). Let M 2 = f ′′(t ) 2 . Then, f ′(t ) − S ′f (t ) ≤ M 2 h ∞
f (t ) − S f (t )
∞
≤
12
M2 3 2 h 2
(2.70)
(2.71)
Proof Since f (ti ) − Sf (ti ) = 0, i = 0, 1,. . ., n, by Rolle theorem, there are ξ 0 = t 0 < ξ1 < t 1 < ξ 2 < t 2 < < t n − 1 < ξn < ξn + 1 = t n , ξi − ξi − 1 ≤ 2 h such that f ′(ξi ) − S ′f (ξi ) = 0, i = 0, 1,…, n + 1
© 2012 by Taylor & Francis Group, LLC
70
M e A suReM en t DAtA M o D eLIn G
Since f ′(t ) − S ′f (t ) ∈ C [a, b ], there is a t * ∈ [a, b] such that f ′(t ) − S ′f (t ) = max f ′(t ) − S ′f (t ) = f ′(t * ) − S ′f (t * ) ∞ t ∈[ a ,b ]
Suppose t * ∈ [ξi−1, ξi, ξi−ξi−1 ≤ 2h]. One of |t * − ξi−1| ≤ h, |t * − ξi| ≤ h holds. If |t * − ξ i−1| ≤ h is true, *
*
f ′(t ) − S ′f (t ) = f ′(t ) − S ′f (t ) = ∞
∫
t*
ξi − 1
( f ′′(t ) − S ′′f (t ))dt
Using the Holder inequality, t* ≤ dt ξi − 1
1/ 2
∫
f ′(t ) − S ′f (t ) ∞
t* ( f ′′(t ) − S ′′f (t ))2 dt ξi − 1
1/ 2
∫
b 1/ 2 ≤ h ( f ′′(t ) − S ′′f (t ))2 dt a
∫
1/ 2
= h 1 / 2 f ′′(t ) − S ′′f (t ) 2 ≤ h 1 / 2 f ′′(t ) 2
Th .2.18
Equation 2.70 is proved. Similarly, f (t ) − S f (t )
∞
= max f (t ) − S f (t ) = f (t ) − S f (t ) t ∈[ a ,b ]
t ∈ [t j − 1 , t j ] Suppose t − t j −1 ≤ ( h / 2). Then, by f (t j − 1 ) − S f (t j − 1 ) = 0,
f (t ) − S f (t )
∞
= f (t ) − S f (t ) =
t
∫ ( f ′(t ) − S ′ (t ))dt f
t j −1
© 2012 by Taylor & Francis Group, LLC
t ≤ dt t j −1
∫
12
t ( f ′(t ) − S ′f (t ))2 dt t j −1
∫
12
f (t ) − S f (t )
= f (t ) − S f (t ) =
t
∫ ( f ′(t ) − S ′ (t ))dt
f PA R A M e∞t RI C Re P Re sen tAtI o ns o F F un C tI o ns t j −1
t ≤ dt t j −1
12
∫
t ( f ′(t ) − S ′f (t ))2 dt t j −1
≤
Th. 2.19
12
∫
h f ′ − S ′f 2
≤
71
2
h 2h f ′′ − S ′′f 2 π
≤
2 Th. 2.18
2 32 M2 3 2 h M2 ≤ h π 2
Equation 2.71 is proved. Theorems 2.20 and 2.21 show that, given f (t) ∈ C2[a, b], the spline function satisfying interpolation condition (2.51) could approximate f (t) and f ′(t) very well in L2[a, b] and C[a, b]. Furthermore, if f (t) has higher orders of differentiability, cubic spline functions Sf (t) that satisfy condition (1) may have better approximation properties. Theorem 2.22 Suppose that f (t) ∈ C4[a, b], π is a uniform partition (h = (b − a)/n), and Sf (t) ∈ S(π, 3) is a solution of interpolation (2.51). Then f ( α ) (t ) − S (fα ) (t )
∞
≤ Cα f ( 4 ) (t ) h 4 − α ∞
(2.72)
where α = 0, 1, 2, 3, C 0 = 5/384, C1 = 1/24, C2 = 3/8, C3 = 1, and C 0, C1 are optimal coefficients independent of f, n. Proof See Ref. 5. Remark 2.4 Theorem 2.22 shows that 1. If f(t) ∈ C4[a, b], the spline function that satisfies interpolation (2.51) is a good approximation for f, and its first, second, and third derivatives are also good approximations for corresponding derivatives of f, respectively.
© 2012 by Taylor & Francis Group, LLC
72
M e A suReM en t DAtA M o D eLIn G
2. Whether to select a polynomial or a spline approximation should be determined by specific situations. 3. Theorem 2.21 holds for uniform partitions. For nonuniform partitions, spline functions are also commonly used. A general rule of partitions is to select large distance between two consecutive nodes when f (4) is small and small distance between two consecutive nodes when f (4) is large. Such a technique of selection not only guarantees the approximation accuracy but also reduces the number of spline coefficients to be estimated as much as possible. Example 2.7 Consider cubic spline function approximations of f(t) based on two different uniform partitions where t 3, f (t ) = 3 t + 100(t − 1 2)4+ ,
−1 ≤ t ≤ 0 0≤t ≤1
1. If π is a uniform partition such that h = ti − ti −1 =
2 1 i = , ti = −1 + , i = 0, 1,…, 20 20 10 10
then the spline function that satisfies interpolation condition (2.51) is 19
S f (t ) = a−1 + a0t + a1t 2 + a 2t 3 +
∑a i =1
i+2
(t − ti )3+
It can be proved that a3 = a4 = = a11 = 0 There are only 14 spline coefficients to be estimated. 2. If π is a uniform partition such that π : τ 0 = −1, τ1 = 0, τ 2 =
1 ,…, τ11 = 1 10
then the cubic spline function that satisfies the interpolation condition (1) is
© 2012 by Taylor & Francis Group, LLC
73
PA R A M e t RI C Re P Re sen tAtI o ns o F F un C tI o ns 10
* f
S (t ) = a−1 + a0 t + a1t + a2 t + 2
3
∑a i =1
i+2
(t − τi )3+
* It can be proved that S f (t ) = S f (t ), − 1 ≤ t ≤ 1.
Remark 2.5 Example 2.7 shows that S *f has 14 coefficients to be estimated while Sf has 23 coefficients. Cubic spline approximations based on nonuniform partitions may not be worse than those based on uniform partitions. Partition selection in cubic spline approximation is a fine art. For a given approximation precision, the number of coefficients to be estimated in a spline approximation can be reduced by selecting proper partitions. 2.3.3 Standard B Splines
B splines (i.e., basic splines) are powerful in the theory and computation of spline functions. Any spline function can be expressed as a linear combination of B splines. For simplicity, only standard B splines of equidistant nodes are discussed in the following. Definition 2.3 1 M m (t ) = (m − 1)!
m
∑
m −1
m ( −1) C t + − j + 2 j =0 j
j m
(2.73)
is called a standard B spline of degree m − 1. Theorem 2.23 A standard B spline of degree m − 1 has the following properties: 1. Mm(t) ∈ C m − 2 (−∞, +∞) 2. Mm(t) = 0, |t| ≥ (m/2)
© 2012 by Taylor & Francis Group, LLC
74
M e A suReM en t DAtA M o D eLIn G
3. Mm(t) is an m − 1 piecewise polynomial on (−∞, +∞) and has m + 1 nodes −
m m m m , − + 1,…, − + m = 2 2 2 2
4. Mm(t) is an even function 5. Mm(t)> 0, |t| < (m/2) 6.
∫
m 2
−m 2
M m (t )dt =
∫
+∞
−∞
M m (t )dt = 1
Proof The proof is left as an exercise. Commonly used standard B splines are as follows: 0 0 0, 1 1 M 1 (t ) = t + − t + − 1 = + 1, 2 + 2
1 2 1 t < 2 t ≥
(2.74)
M 2 (t ) = (t + 1)+ − 2(t + 1 − 1)+ + (t + 1 − 2)+ = (t + 1)+ − 2t + + (t − 1)+ 0, = 1− t,
t >1
2
M 3 (t ) =
(2.75)
t ≤1 2
2
2
1 3 3 1 3 1 1 3 t + − t + + t − − t − 2 2 + 2 2 + 2 2 + 2 2 +
0, 3 = − t2, 4 2 t − 3 t + 9 , 2 2 8
© 2012 by Taylor & Francis Group, LLC
3 2 1 t < 2 t ≥
other
(2.76)
75
PA R A M e t RI C Re P Re sen tAtI o ns o F F un C tI o ns
0, 3 t 2 − t2 + , M 4 (t ) = 3 32 t 4 2 − 6 + t − 2 t + 3 ,
t ≥ 2 t <1
(2.77)
other
Tables 2.1 through 2.3 show the commonly used values. Next we use M4 to construct an interpolation cubic spline with equidistant nodes. Table 2.1 value Summary of Mk (t ) t
K 2 3 4
0
±0.5
±1
±1.5
±2
1 3/4 2/3
1/2 1/2 23/48
0 1/8 1/6
0 0 1/48
0 0 0
Table 2.2 value Summary of M k′ (t ) t
0
±0.5
±1
±1.5
±2
2
—
∓1
—
0
0
3
0
∓1
∓
1 2
0
0
4
0
∓
1 2
∓1
0
K
∓
5 8
Table 2.3 value Summary of M k′′(t ) t
K 2 3 4
© 2012 by Taylor & Francis Group, LLC
0
±0.5
±1
±1.5
±2
— −2 −2
0 — − 1/2
— 1 1
0
0 0 0
1/2
76
M e A suReM en t DAtA M o D eLIn G
Theorem 2.24 Suppose that there is a uniform partition π : a = t 0 < t1 < < tn = b , h = ti − ti − 1 =
b−a n
Note that t−1 = t 0 − h, tn+1 = tn + h, Sf (t) ∈ S(π, 3) satisfies the interpolation condition S ′(a ) = f ′(a ), S ′(b ) = f ′(b ) S (ti ) = f (ti ), i = 0, 1,…, n
(2.78)
Then there is only one set of coefficients b−1, b 0, b1,. . ., bn+1 that satisfies n+1
t − tj h
∑ b B
(2.79)
n+1 b a − tj j B′ = f ′( a ) h j = −1 h n + 1 ti − t j bjB = f (ti ), i = 0, 1,…, n h j = −1 n+1 bj b − t j B′ = f ′(b ) h h j = −1
(2.80)
S f (t ) =
j = −1
j
where B(t) = M4(t). Proof It is enough to show that
∑ ∑ ∑
© 2012 by Taylor & Francis Group, LLC
77
PA R A M e t RI C Re P Re sen tAtI o ns o F F un C tI o ns
Since B ( 0) =
2 1 1 1 , B( ±1) = , B ′(0) = 0, B ′( −1) = , B ′(1) = − 3 6 2 2
ti − t j ti − t j ti − t j B = B′ = 0, j − i ≥ 2 namely ≥ 2 h h h We have 1 2h 1 6 0 0 0
0 2 3 1 6
−
1 2h 1 6 2 3
0
0
0
0
0
0
0
0
1 6
0
0
0
0
0
0
0
0
0
1 6 1 2h
2 3 0
1 6 1 − 2h
b f ′(a ) −1 b ( ) f a 0 b1 f (t1 ) = bn f (b ) bn + 1 f ′(b )
(2.81)
Since the coefficient matrix is nonsingular, there is a unique solution b−1, b 0, b1,. . ., bn+1. Remark 2.6 1. This theorem shows that if a function can be approximated well by cubic spline functions, it can also be approximated well by linear combinations of standard B splines. However, the computation of standard B splines is much simpler than that of cubic splines. This can be observed by comparing two coefficient matrices. t − t −1 t − t 0 t − tn + 1 2. B ,B ,…, B h h h is a basis of the cubic spline space on [a, b]. Therefore, the dimension of the cubic spline space is n + 3.
© 2012 by Taylor & Francis Group, LLC
78
M e A suReM en t DAtA M o D eLIn G
3. The proof itself provides a way of representing a function with standard B splines. Such a method has the following advantages over the one proposed by Theorem 2.16: (i) It is easy to calculate the inverse of the coefficient matrix because of its sparsity. (ii) This solution is more accurate and stable. (iii) For any t ∈ [tj, tj+1], the calculation of S(t) is only related to four coefficients bj−1, bj, bj+1, bj+2, that is, standard B splines have local support. 2.3.4 Bases of Spline Representations of Functions to Be Estimated
For f (t) ∈ C (4)[a, b], there are two methods to approximate functions by splines. 1. For any partition π : a = t 0 < t1 < < tn = b any cubic spline function on π can be expressed as follows: s(t ) = a−1 + a0 t + a1t 2 + a2 t 3 +
n −1
∑a i =1
i+2
(t − ti )3+
(2.82)
This method is applicable to both uniform and nonuniform partitions. 2. For a uniform partition π : h = ti − ti −1 =
b−a n
let t−1 = a − h, tn+1 = b + h, then f (t ) =
n+1
t − tj h
∑ b B j = −1
j
(2.83)
Equations 2.81 through 2.83 can be unified in the form of f (t ) =
© 2012 by Taylor & Francis Group, LLC
n+1
∑ d ψ (t ) j = −1
j
j
(2.84)
PA R A M e t RI C Re P Re sen tAtI o ns o F F un C tI o ns
79
where ψj(t) is a cubic spline given a division π. There are n + 3 coefficients in Equations 2.81 through 2.83. When we use these two methods of representation in data processing, one method is usually chosen by a specific criterion (a detailed discussion of criterion selection is given in Chapter 3). If some aj ’s in Equations 2.81 and 2.82 are very small or 0, or some bj ’s in Equation 2.83 are very small or 0, variable selection criteria can be used to remove parameters that have little impact on the model (see Section 3.4 for details). For Equation 2.52, another method to reduce parameters to be estimated is to increase h. The two methods have their own advantages and disadvantages: 1. The first method is widely applied in practice. It is applicable for both uniform and nonuniform partitions. However, it is difficult and complicated to calculate the inverse matrix and it is computationally intensive. 2. The second method is only applicable for uniform partitions. It is easy to calculate the inverse matrix. Its solution is stable and has a high precision. General B splines can be used for nonuniform partitions. If we know not only the observation data of f(ti) but also some information on f (t) or its derivative, supplement information on spline coefficients dj ( j = − 1, 0,. . ., n + 1) can be derived. For example, if b
∫ (2 + t )[ f ′′(t )] dt ≤ 5 2
2
(2.85)
a
using Equation 2.85, we have a restriction on spline coefficients d−1, d 0,. . .,dn+1} in Equation 2.84 n+1
b
∑ [∫ (2 + t )ψ ′′(t )ψ ′′(t ) dt ] d
i , j = −1 a
2
i
j
i
j
≤5
(2.86)
or similar information. Such information is very helpful for estimating spline coefficients dj ( j = −1, 0, . . ., n + 1). We will discuss this in Section 3.8.
© 2012 by Taylor & Francis Group, LLC
80
M e A suReM en t DAtA M o D eLIn G
2.4 Using General Solutions of Ordinary Differential Equations to Represent Functions to Be Estimated 2.4.1 Introduction
One of the main tasks of dynamic data processing is to give a proper estimation of the signal and the systematic error. In order to obtain a precise estimation, an accurate and parsimonious model should be constructed first. Since signals (or systematic errors) generally satisfy certain differential equations in practice, they can be expressed by differential equations with a few parameters. Example 2.8 Suppose a true signal f (t) satisfies the following ordinary differential equations (ODE). f ′′(t ) + 4 f (t ) = 4t ( −1 ≤ t ≤ 1)
(2.87)
Find an expression of f(t) with a finite number of parameters to be estimated. Solution There are at least three ways to express f(t): f (t ) = C1 cos 2t + C 2 sin 2t + t f (t ) =
f (t ) =
N
∑ a ϕ (t ) i
i =0
i
(2.88) (2.89)
n+1
∑ b ψ (t ) j =−1
j
j
(2.90)
where (φ0(t), φ1(t),. . .,φ N (t)) is a set of polynomial bases of degree N and (ψ−1(t),ψ0(t),. . ., ψn+1(t)) is a set of cubic splines for a partition π : −1 = t 0 < t1 < < tn = 1 . Here, Equation 2.88 is a general solution of ODE (2.87).
© 2012 by Taylor & Francis Group, LLC
81
PA R A M e t RI C Re P Re sen tAtI o ns o F F un C tI o ns
It is well known that both polynomial and spline approximations of f(t) need a large number of parameters to achieve higher precision. For this example, a polynomial approximation uses n + 1 parameters while a spline approximation needs n + 3. If the general solution of ODE is used to approximate f (t), there are only two unknown parameters. It is very helpful for dynamic data processing. It is important to point out that many dynamic quantities in practice actually satisfy certain known ODEs. Therefore, using the general solution of an ODE to express f (t) has a relatively wide adaptability. 2.4.2 General Solutions of Linear Ordinary Differential Equations
1. The observed functions are vector functions of dimension n. Suppose that an observed function f(t) is a vector function of dimension n that satisfies df = A (t ) f (t ) + g (t ) dt
(2.91)
where A(t) is a known n × n matrix whose elements are continuous differentiable functions in [a, b]. g(t) is a known vector function of dimension n whose elements are also continuous differentiable functions on [a, b]. Theorem 2.25 Suppose that f (t) satisfies differential equation 2.91. Then f (t ) = c1ψ 1 (t ) + + cn ψ n (t ) + ψ 0 (t )
(2.92)
where ψi(t) is a vector function of dimension n that satisfies the initial value condition dψ 0 dt = A (t )ψ 0 (t ) + g (t ) ψ 0 (a ) = (0,…, 0)′
© 2012 by Taylor & Francis Group, LLC
(2.93)
82
M e A suReM en t DAtA M o D eLIn G
dψ i dt = A (t )ψ i (t ), i = 1, 2,…, n ψ i (a ) = (0,…0, 1, 0,…, 0)′
(2.94)
2. The observed functions are scalar functions. Suppose an observed function f 1(t) is a scalar function of t that satisfies dn f 1 dn − 1 f 1 + a ( t ) + + an (t ) f 1 = g n (t ) 1 dt n dt n − 1
(2.95)
This is a homogeneous linear equation of order n. Note that f = ( f 1 , f 1′,…, f 1(n − 1) )τ , 0 0 A = 0 an
g = (0,…, 0, g n )τ
1
0
0
0
1
0
0
0
0
an − 1
an − 2
an − 3
0 0 1 a1
Then, Equation 2.95 can be written as df = A (t ) f (t ) + g (t ) dt
(2.96)
Using Theorem 2.25, there are n + 1 vector functions of dimension n ψ i (t ) (i = 0, 1,..., n) that satisfy: f (t ) = c1ψ 1 (t ) + + cn ψ n (t ) + ψ 0 (t )
(2.97)
Take φi(t) as the first element of ψi(t), then f 1 (t ) = c1ϕ 1 (t ) + + cn ϕ n (t ) + ϕ 0 (t )
© 2012 by Taylor & Francis Group, LLC
(2.98)
PA R A M e t RI C Re P Re sen tAtI o ns o F F un C tI o ns
83
2.4.3 General Solutions of Nonlinear Equation or Equations
1. Basic issues and basic conclusions In many cases, f(t), as an n-dimension vector function to be observed, satisfies a nonlinear ODE, not a linear ODE. For example, a six-dimensional vector function of a spacecraft orbit X (t ) = (x(t ), y(t ), z(t ), x(t ), y (t ), z(t ))T satisfies a nonlinear ODE [12]. Suppose that the objective vector function with dimension n satisfies the following differential equation: df dt = F (t , f ),T0 < t < T1 f (T0 ) = a
(2.99)
Here, F(t, f ) is a known function and a is an unknown parameter. Two basic conclusions are introduced (their proofs are given in Refs. 15 and 16). Theorem 2.26
Let F(t, f ) be a continuous function on interval G : T0 < t < T1 , f : T0 < t < T1 , f ∞ < +∞ and F(t, f ) satisfies the following inequality
F (t , f ) ∞ < L(r ) r =
n
∑f i =1
i
2
and L(r) > 0 is continuous at the condition of r ≥ 0. When r > 0, L(r) > 0, and
∫
+∞
1
(dr /L(r )) = +∞. Then the solution of Equation 2.99
exists for t ∈ (T0, T1 ).
© 2012 by Taylor & Francis Group, LLC
∞
< +∞
84
M e A suReM en t DAtA M o D eLIn G
Theorem 2.27 Under the same conditions as in Theorem 2.26, if F(t, f )satisfies the Lipschitz condition, that is, for f 1, f 2 ∈ Rn, there is a constant N satisfying F (t , f 1 ) − F (t , f 2 ) ∞ ≤ N f 1 − f 2
∞
(2.100)
then the solution of differential equation 2.99 is unique. 2. Runge–Kutta formula (for numerical computation) Consider solving the initial value problem of Equation 2.99. If the existence and uniqueness conditions of Theorems 2.26 and 2.27 are satisfied, the solution can be obtained from the famous Runge–Kutta formula. Let h be the step length. Ti = T0 + ih, i = 1, 2, 3,… ∆ ∆ f (i ) = f (Ti ) = f (T0 + ih )
(2.101)
The Runge–Kutta formula is f ( 0) = a 1 f (i + 1) = f (i ) + ( K 1 + 2K 2 + 2K 3 + K 4 ) 6
(2.102)
where f (0), f (i), f (i + 1), K1, K 2, K 3, K4 are all vectors of dimension n: K 1 K 2 K 3 K 4
© 2012 by Taylor & Francis Group, LLC
= h ⋅ F (Ti , f (i )) 1 h = h ⋅ F Ti + , f (i ) + K 1 2 2 1 h = h ⋅ F Ti + , f (i ) + K 2 2 2 = h ⋅ F (Ti + 1 , f (i ) + K 3 )
(2.103)
PA R A M e t RI C Re P Re sen tAtI o ns o F F un C tI o ns
85
Remark 2.7 Here, h is the step length to solve the ODE, not the observed or sampling time interval. The step length h could be chosen as any small value to obtain a precise numerical solution, while the sampling time interval is determined by the performance of equipment and the practical requirement. 3. Use initial value a to express f (t). If a is given, f (t) can be calculated by the Runge–Kutta formula if it satisfies Theorems 2.26 and 2.27. If a is unknown, although F(t, f ) is known, f(t) cannot be calculated directly. Based on the discussion above, f(t) can actually be described as follows: f (t ) = Wt (a ), T0 ≤ t ≤ T1
(2.104)
where Wt (⋅) is a known function determined by the Runge–Kutta formula. When t and a are given, Wt (⋅) can be determined completely and is a vector function whose components are functions of n variables. In fact, Equation 2.104 exists in an implicit form that is commonly used in dynamic data processing. It is of particular note that when F(t, f ) is a nonlinear function of f, Wt (a) would be also a nonlinear function of a. 2.5 Empirical Formulas
Recall that the focus of this chapter is to express the unknown function (a signal or a systematic error) in terms of basis functions with a few unknown parameters. Sections 2.2 through 2.4 introduce functional representations of algebraic polynomials, trigonometric polynomials, splines, and solutions of ODE with initial values, etc. We introduce another way of representation, that is, the method of empirical formulas. The method of empirical formula summarizes and refines the previous experience of dealing identical or similar problems in data processing. Empirical formulas are explicit functions previously refined from observed signals or systematic errors. Such functions usually contain a few unknown parameters to be estimated. A good empirical
© 2012 by Taylor & Francis Group, LLC
86
M e A suReM en t DAtA M o D eLIn G
formula is usually established through a cyclical process. First, an initial mathematical model is established based on the understanding of the practical problem; second, a new model is obtained by improving the inadequacies of the initial model exposed in various tests and evaluations; and lastly, the new model is applied in specific situations for testing and improvement [13,14]. There are two advantages of using empirical formulas in the mathematical processing of measurement data. One is the precision enhancement of measurement data and the other is the result validation of data processing. Many empirical formulas come from the laws of physics, mechanics, and other disciplines. Others come from a large number of scientific experiments and productive practices. 2.5.1 Empirical Formulas from Scientific Laws Example 2.9 A particle moves along a straight line due to a constant force (magnitude and direction are fixed). Tracking the particle at time ti = 0.05i (i = 1, 2,. . ., 200) s, the observed locations of S(t) are yi (i = 1, 2, . . . ,200). Suppose yi = S (ti ) + εi , i = 1, 2,… , 200 2 {εi } i.i.d. ∼ N (0, σ )
(2.105)
Estimate the location S(300) at t = 300 s. Solution By Newton’s law, the particle is moving with a constant acceleration. So S ( t ) = S0 + V0 t +
1 2 at 2
(2.106)
where S 0, V0 are the particle’s location and velocity at t = 0 and a is the acceleration. Substituting Equation 2.106 in Equation 2.105, we get
© 2012 by Taylor & Francis Group, LLC
PA R A M e t RI C Re P Re sen tAtI o ns o F F un C tI o ns
1 2 t1 at1 y1 1 2 1 2 S0 y 2 1 t2 at 2 2 V0 + = a y 200 1 2 1 t 20 at 200 2 ( ε1 , ε 2 ,… , ε 200 )T ∼ N (0, σ 2 I 200 )
ε1 ε2 ε 200
87
(2.107)
From Equation 2.107 and yi (i = 1, 2,. . ., 200), the estimate of (S 0, V0, a), that is, (S 0 ,V 0 , a ), can be obtained (estimation method will be introduced in Chapter 3). Substituting (S 0 ,V 0 , a ) in Equation 2.106, we get
S (300) = S 0 + V 0 ⋅ 300 +
1 a ⋅ 300 2 2
(2.108)
Equation 2.106 is an empirical formula and is derived from Newton’s law.
2.5.2 Empirical Formulas from Experience
Empirical formulas from experience are formulas derived from scientific research, production practice, and real life. Such formulas are extremely useful. However, when applying empirical formulas from experience, complete analysis must be done to the problem we are concerned with so that a proper formula is selected, perfected, and applied successfully. Example 2.10 Growth models [1,2] are widely used in agriculture, biology, economics, and chemistry. For example, the growth of forage production in grassland with time, the weights of pigs with time, the economic growth with time, etc. follow such models. The typical characteristics of growth models is that the growth rate increases at the beginning and starts to decrease when it comes to a turning point. Growth curves are in the shape of S. Commonly used models include y = exp(θ1 − θ 2 ⋅ θ3x ) + ε x
© 2012 by Taylor & Francis Group, LLC
(Gompertz model ) (2.109)
88
M e A suReM en t DAtA M o D eLIn G
y =
θ1 + εx 1 + exp(θ 2 − θ3x )
y = θ1 − θ 2 exp( −θ3 ⋅ x θ4 ) + ε x
(Logistic model )
(Gompertz model )
(2.110)
(2.111)
where θ1, θ2, θ3, θ4 are parameters to be estimated, y is the observed data, and ε stands for random factors affecting the growth and is not interpreted as random error. Obviously, if we choose a model to deal with experimental data that have fewer random factors but more deterministic factors, laws revealed by the model are more accurate. Example 2.11 The yield and density model [1,2] is described. Let x be the density of a crop plant and y be the yield per plant. The crop yield per unit area is xy. The Hollidany model y = (θ1 + θ 2x + θ3x 2 )−1 + ε
(2.112)
is commonly used to model crop yield. Example 2.12 The systematic error model [17] of an R − R′ spaceflight tracking system is ∆R = a0 + a1t + a2tR + a3R + a4 R ′ + a5 csc E + ε R d ∆R ′ = a1 + a 2 ( R + R ′t ) + a3R ′ + a4 R ′′ + a5 csc E + δ R dt where a 0 is the constant error, a1 is the linear error, a2 is the drift term of the station’s frequency, also called the scale bias, a 3 is the scale error, a 4 is the time offset, a5 is the residual of the first-order refraction error, and εR , δ′R are random errors in measuring R and R′.
2.5.3 Empirical Formulas of Mechanical Type
Empirical formulas of mechanical type are derived from the knowledge of scientific laws or assumed differential, integral, or difference
© 2012 by Taylor & Francis Group, LLC
PA R A M e t RI C Re P Re sen tAtI o ns o F F un C tI o ns
89
equations combined with further analysis of the effect of random errors. Example 2.13 Suppose that the growth limit of y is a. Its growth rate is directly proportional to the growth allowance a – y. Then, we have dy = k(a − y ) dt
(2.113)
y = a[1 − β exp( − kt )] + ε
(2.114)
Therefore,
This model is frequently applied to describe the relation between the trunk perimeter of an orange tree y and its number of growth days t. Suppose the relative growth rate is directly proportional to the relative growth allowance log a – log y. We have the Gompertz model y = a exp[−β exp( − kt )] + ε
(2.115)
More details are given in Refs. 1 and 2.
2.5.4 Empirical Formulas of Progressive Type
Empirical formulas of progressive are widely used in engineering and biology. Example 2.14 The volume of a ladle containing molten steel in tapping is increased due to the erosion of refractory materials. To find the relationship between the increment of the volume (y) and the number of times (x) the ladle has been used, the data in Table 2.4 are observed.
Two characteristics of this problem are 1. The volume of the ladle containing molten steel increases with the number of times used. 2. The volume of the ladle cannot exceed a limit.
© 2012 by Taylor & Francis Group, LLC
90 Table 2.4
M e A suReM en t DAtA M o D eLIn G
Number of Times of Using Ladle xi and Increments of Volume yi
xi
yi
xi
yi
xi
yi
2 3 4 5 6
6.42 8.20 9.58 9.50 9.70
7 8 9 10 11
10.00 9.93 9.99 10.49 10.59
12 13 14 15 16
10.60 10.80 10.60 10.90 10.76
Consider the following two models: y =
1 b a+ x
+ε
(2.116)
d y = c exp + ε x
(2.117)
Using Equation 2.116, 15
a = 0.0823, b = 0.1312,
∑v i =1
= 1.19
2 i
(2.118)
Using Equation 2.117, 15
c = 11.6789, d = −1.1107,
∑v i =1
2 i
= 0.94
(2.119)
Comparing Equation 2.118 with Equation 2.119, model 2.117 is better in this case. Example 2.14 shows that sometimes empirical formulas are not accurate and some formulas are not mature enough or have some limitations. The commonly used empirical progressive formulas [1,2] include
© 2012 by Taylor & Francis Group, LLC
y = α − βγ 2 + ε,(0 < γ < 1)
(2.120)
y = α − exp[−(β + γ 2 )] + ε,(0 < γ < 1)
(2.121)
y = α − β exp( − γz ) + ε,(0 < γ )
(2.122)
PA R A M e t RI C Re P Re sen tAtI o ns o F F un C tI o ns
91
EXERciSE 2
1. Prove Theorem 2.7. 2. Use Theorem 2.10 to prove Theorem 2.11. 3. Prove Theorem 2.12. 4. Prove Theorem 2.14. 5. Suppose t 0 < t1 < < tn . Show that the coefficient matrix of Equation 2.54 is nonsingular. 6. Prove Theorem 2.19. 7. Complete the proof of Theorem 2.18. 8. Prove Theorem 2.23. 9. Suppose f(t) = exp(t), t ∈ [−1, 1]. How many interpolation points are needed such that the residual of using Lagrange interpolation polynomial to approximate function f(t) is less than 10− 4? 10. Suppose f (t ) ∈ c[ −1, 1], f ( 5) ≤ 1 . How large is n if the ∞ residual of using Sn[f] to approximate f (t) is less than 10−4 ? 11. Prove that M m (t ) =
1 t + (m / 2) 1 (m / 2) − t M m −1 t − + M m −1 t + m−1 m−1 2 2
holds for standard B splines of order m. Use this property to prove Theorem 2.23. 12. Prove that when t ∈ [τj, τj+1], spline function S(t) defined by Equation 2.79 only depends on four coefficients bj−1, bj, bj+1, bj+2. 13. Suppose that f (t) satisfies the differential equation d3 f d2 f df − 3 +3 − f = t (6 − t ) 3 2 dt dt dt Express f(t) in terms of known function using the least number of parameters. 14. What is the role of empirical formulas in data processing? How do we interpret ε in Equations 2.109 through 2.122? Is ε a measurement error or a modeling error? If there are two empirical formulas for one specific situation, one has fewer unknown parameters and ε is small, while the other has more
© 2012 by Taylor & Francis Group, LLC
92
M e A suReM en t DAtA M o D eLIn G
unknown parameters and ε is large. Which formula is better? Justify your answer. 15. What are the contents of this chapter? What is the main line? What types of problems do these contents address?
References
1. Bocheng Wei. Modern Nonlinear Regress Analysis. Nanjing: Publishing House of Southeast University, 1989 (in Chinese). 2. Ratkowsky D. A. Nonlinear Regression Modeling. New York: Marcel Dekker, 1983. 3. Lizhi Xu, Renhong Wang, Yunshi Zhou. The Theory and Method of Function Approximation. Shanghai: Shanghai Scientific and Technical Publishers, 1983 (in Chinese). 4. Cheney E. W. Introduction to Approximation Theory. Shanghai: Shanghai Scientific and Technical Publishers, 1981 (in Chinese). 5. Shengfu Wang. Spline Function and Its Application. Xi’an: Northwestern Polytechnical University Press, 1989 (in Chinese). 6. Youqian Huang, Yuesheng Li. Numerical Approximation. Beijing: Higher Education Press, 1987 (in Chinese). 7. Lorentz G. G. Approximation of Functions. Shanghai: Shanghai Scientific and Technical Publishers, 1981 (in Chinese). 8. Zhengxing Cheng. Data Fitting. Xi’an: Xi’an Jiaotong University Press, 1986 (in Chinese). 9. Jinwen Liang, Lincai Chen, Gong He. Theory of Error and Data Analysis. Beijing: China Metrology Publishing House, 1989 (in Chinese). 10. Yetai Fei. Theory of Error and Data Analysis. Beijing: China Machine Press, 1987 (in Chinese). 11. Xiuyin Zhou. Theory of Error and Experimental Data Analysis. Beijing: Beihang Press, 1986 (in Chinese). 12. Peiran Jia, Kejun Chen, Li He. Long-Range Ballistic Rocket. Changsha: National University of Defense Technology Press, 1992 (in Chinese). 13. Zhdanyuk, B. J. Fundamentals of Wireless External Ballistic Measurement Data Statistical Processing. Beijing: China Astronautic Publishing House, 1987 (in Chinese). 14. Zhengming Wang. A practicable method and algorithm of data processing. Journal of National University of Defense Technology, 1994(3) (in Chinese). 15. Hartman P. Ordinary Differential Equations. Birkhauser: Boston, 1982. 16. Rouhuai Wang, Zhuoqun Wu. Lectures on Ordinary Differential Equations. Beijing: People’s Education Press, 1963 (in Chinese). 17. Yu Sha, Yi Wu, Zhengming Wang, Mengda Wu, Guangxian Cheng, Liren Wu, Introduction to Accuracy Analysis of Ballistic Missiles. Changsha: National University of Defense Technology Press, 1995 (in Chinese).
© 2012 by Taylor & Francis Group, LLC
3 m E Tho ds of m od Ern r Eg rEs si o n a nalysis
3.1 Introduction
Chapter 1 explained the processing of static measurement data. It discussed the recognition of systematic errors and negligent errors, the estimation of statistical characteristics of random errors, the precision analysis of physical quantities, etc. Chapter 2 studied various parametric representations of signals from dynamic measurement targets. It laid the foundation for constructing regression models of information to be estimated. In this chapter, we will focus on how to estimate unknown parameters in regression models. Since the goodness of a regression model has a very important influence on the precision of parameter estimation, model evaluation and optimization will also be studied. Regression analysis is one of the most important subjects of statistics. There are a lot of research in the theory and applications of regression analysis (see Refs. 1–7). Most problems in processing dynamic measurement data can be transformed into problems of regression analysis. In this chapter, we mainly focus on solving problems in processing dynamic measurement data by using the theory and methods in regression analysis. First, let us see an example to understand the relationship between measurement data processing and regression analysis. Example 3.1 Suppose that a certain particle moves in a straight line. At time t, its position is expressed by f (t) that satisfies f
(4)
(t )
∞
≤ 1.5.
At time ti = 0.01i (i = 0, 1, …, 200), f (t) is yi. Suppose that there are no systematic errors or negligent errors in measuring processes
93
© 2012 by Taylor & Francis Group, LLC
94
M e A suReM en t DAtA M o D eLIn G
and random errors ei ~ N(0,0.22), i.i.d. How do we estimate f(t) for t ∈ [0, 2]? Solution
From the information mentioned above, the following model is set up: y i = f ( t i ) + ei 2 ei ∼ N(0, 0.2 ), i.i.d.
(3.1)
f(ti) can be estimated in the following ways: 1. Take yi as an estimate of f(ti). Then E( yi − f (ti ))2 = 0.04 200 E ( yi − f (ti )) 2 = 201 × 0.04 = 8.04 i = 0
∑
(3.2)
The estimation has the following issues: • The accuracy is not good when using yi as the estimate of f(ti) directly. • There is no effective way to give a proper estimation of f(t) when t ∈ [0, 2]\{t 0, t 1, …, t 200}. 2. Give an equidistant division of [0, 2] first: π : t −1 = − h , t 0 = 0,… , t N = 2, t N + 1 = 2 + h, h = 2 /N
(3.3)
Using Theorem 2.22 in Section 2.2, there is an interpolation cubic spline function to approximate an objective function f(t) such that s( t ) − f ( t ) ∞ ≤
5 (4) s (t ) h 4 ∞ 384
(3.4)
In order to make the modeling error as small as possible (generally, no larger than 1/100 of standard deviation of random error), note that σ = 0.2, s( t ) − f ( t ) ∞ ≤
© 2012 by Taylor & Francis Group, LLC
σ = 0.002 100
(3.5)
M e t h o D s o F M o D eRn ReG Re s sI o n A n A LysIs 4
5 2 × 1.5 × ≤ 0.002 ⇒ N = 3.536 384 N
(3.6)
Therefore, N = 4, h = 0.5. Then it is appropriate to construct the following regression model: 5
f (t ) =
j =−1
5
yi =
t − tj , 0.5
∑ b B j
t j = j × 0.5 (3.7)
ti − t j + ei , i = 0, 1,… , 200 0.5
∑ b B j =−1
j
Let Y = (y 0, y1, . . ., y 200)τ , β = (b−1, b 0, . . ., b5)τ , and e = (e 0, e 1, . . ., e 200)τ , τ j = 0.5 j ,
j = −1, 0,..., 5.
t − tj xij = B i 0.5
, X = (xij )201×7 .
Then, Y = X β + e e ∼ N(0, 0.04 I 201 )
(3.8)
Equation 3.8 is a linear regression model. Hence, the least squares estimate of β is β = ( X τ X )−1 X τY and f (ti ) = X i β (Xi is the ith row of X), 200
∑ E( f (t ) − f (t )) = 7 × 0.04 i =1
i
i
2
(3.9)
3. Furthermore, if more prior information is available, such as the particle moves in a straight line with uniform acceleration, the model could be simplified as s(t ) = a0 + a1t + a2t 2
© 2012 by Taylor & Francis Group, LLC
95
96
M e A suReM en t DAtA M o D eLIn G
Let
1 1 Z = 1
t0
t1
t 200
t 02 a 0 t12 = a , α 1 . a 2 2 t 200
Then Y = Z α + e e ∼ N(0, 0.04 I 201 ) This is also a linear regression model. The corresponding estimate of s(t) is s(ti ) = Zi α = Zi (Z τZ )−1Z τY and
200
∑ E( s(t ) − s(t )) i =1
i
i
2
= 3 × 0.04
(refer to Theorem 3.6).
The aforementioned analysis shows that, to give an estimation of signal, it is better to present the signal using regression analysis model and estimate the unknown parameters rather than to use the observed data directly. Concretely, (1) the regression analysis model can improve the representation accuracy directly; (2) it can predict the signal values at the other moments besides giving the estimation at observation moments; and (3) furthermore, given more engineering background of the object to be observed, a regression model with restriction would be constructed, which could help to improve the estimation accuracy. The following aspects should be focused in data processing: 1. For specific measurement background and measurement data, establish and select linear or nonlinear regression models (mainly parametric models) of the signals (both true signals and systematic errors) to be estimated using corresponding engineering background and physical laws. 2. Based on estimates of unknown parameters in the regression model, estimate the precision of signals measured and provide precision index of estimation.
© 2012 by Taylor & Francis Group, LLC
M e t h o D s o F M o D eRn ReG Re s sI o n A n A LysIs
97
3. If there are outliers in dynamic measurement data, there should be some proper ways to identify and eliminate outliers (in particular when some outliers are interwoven in the measurement data).
4. The correctness of measurement data, mathematical modeling, and results of data processing should be validated. The following four regression models are commonly used in processing dynamic measurement data: Y = X β + e , e ∼ (0, σ 2 I )
(3.10)
Y = X β + e , e ∼ (0,G )
(3.11)
Y = f (β) + e , e ∼ (0, σ 2 I )
(3.12)
Y = f (β) + e , e ∼ (0,G )
(3.13)
where Xβ and f(β) are nonstochastic terms, Xβ is a known linear function of β, f(β) is a known nonlinear function of β, and G is a known positive definite symmetric matrix. Models 3.10 and 3.11 are linear regression models and models 3.12 and 3.13 are nonlinear regression models.
Definition 3.1 β in Equations 3.10 through 3.13 is the vector of regression coefficients and contains unknown parameters to be estimated. Y is the vector of measurement data, e is the vector of measurement errors, and X is the design matrix. It should be pointed out that research on linear regression analysis starts early and has been well established while research on nonlinear regression starts late and there are many imperfections. Since each column of X in a multiple linear regression analysis corresponds to an independent variable, it is frequently called an independent variable. This chapter mainly focuses on linear regression analysis, and some applications of nonlinear regression models are also introduced. For more details, see Ref. 4.
© 2012 by Taylor & Francis Group, LLC
98
M e A suReM en t DAtA M o D eLIn G
It should also be pointed out that, in Equations 3.10 through 3.13, σ2 and G are often unknown in practice and should be estimated. The estimation of σ2 will be discussed in the next section. In many cases, {ei} can be modeled by a zero-mean stationary AR, MA, or ARMA time series with some low order. Matrix G can be estimated by the method of iterative estimation of parameters in time series and the details are given in Chapter 4. 3.2 Basic Methods of Linear Regression Analysis 3.2.1 Point Estimates of Parameters
Consider a linear regression model Y = X β + e , e ∼ (0, σ 2 I )
(3.14)
Here, X is the design matrix, β is the vector of regression coefficients, and e is the vector of observation errors. Generally, e ~ (0, σ2 I) is called the Gauss–Markov assumption. Suppose Y = Ym×1, X = Xm×n, rank(X) = n, β = βn×1, e = em×1. Let τ
Q = (Y − X β) (Y − X β) = Y − X β
2
=
m
∑ i =1
yi −
β k X ik k =1 n
∑
2
(3.15)
β is the least squares estimate of β if it satisfies the equation Q = min. Note that m ∂Q = −2 yi − ∂β j i =1
β k X ik X ij = 0, k =1 n
∑
∑
j = 1, 2,…, n
namely m
∑ i =1
yi X ij =
β k X ik X ij = k =1 k =1
m
n
∑∑ i =1
n
m
∑∑ i =1
X ik X ij β k
Let m
∑X i =1
ik
X ij = l kj ,
© 2012 by Taylor & Francis Group, LLC
m
∑Y X i =1
i
ij
= l jy , (l kj )n × n = X τ X ,(l jy )n × l = X τ Y
M e t h o D s o F M o D eRn ReG Re s sI o n A n A LysIs
99
Then, X τY = Xτ Xβ. If X τ X is nonsingular, the least squares estimate of β is β = ( X τ X )−1 X τ Y
(3.16)
The least squares estimate β has some statistical properties that are introduced as follows. Theorem 3.1 β is an unbiased linear estimate of β. Moreover, for ∀c ∈ Rn, c τβ is an unbiased linear estimate of c τβ. Theorem 3.2 Suppose that c ∈ Rn, and S = X τX
(3.17)
Then, COV(β ) = σ 2S −1
(3.18)
Var(c τβ) = σ 2 c τS −1c
(3.19)
The proofs of Theorems 3.1 and 3.2 can be obtained by applying Equations 3.14 and 3.16 and are left as exercises. Theorem 3.3 Let ∀c ∈ Rn. Among all unbiased linear estimators of cτβ, cτβ is the unique uniformly minimum variance unbiased estimator of cτβ. Proof First, we prove the property of minimum variance. Suppose that aτY is an unbiased linear estimator of cτβ. By the unbiasedness, we have E(a τ Y ) = a τ E(Y ) = a τ X β = c τβ
© 2012 by Taylor & Francis Group, LLC
(3.20)
10 0
M e A suReM en t DAtA M o D eLIn G
So cτ = aτ X. Then
X τ a − c = X τ a − X τ X ( X τ X )−1 c = X τ (a − XS −1c ) = 0 Var(c τ β ) = σ 2 c τS −1c = σ 2 XS −1c
2
(3.21) (3.22)
Var(a τ Y ) = E(a τ Y − E(a τ Y ))2 = E(a τ (Y − EY ))2 = E(a τ e )2 = E(a τ e )(e τ a ) = σ 2a τa = σ 2 a
2
= σ 2 a − XS −1c + XS −1c
= σ 2 a − XS −1c
2
+ XS −1c
= σ 2 a − XS −1c
2
2 + XS −1c ≥ σ 2 XS −1c
2
2
+ 2c τS −1 X τ (a − XS −1c ) 2
= Var(c τβ )
only when a = XS−1c, the equal sign holds. Now we consider the uniqueness. Suppose aτY is a linear unbiased estimate of cτβ, which satisfies Var(a τ Y ) = Var(c τβ ). By the above proof, it is easy to get a = XS−1c. So a τY = c τS −1 X τY = c τβ There is an assumption that e ~ (0, σ2 I) in the linear regression model. This assumption shows that the random errors of m measurements are i.i.d. For non-i.i.d. random errors, the linear model derived from processing dynamic measurement data can be expressed as Y = X β + e , e ∼ (0,G )
(3.23)
where σ2 > 0, and σ2 is known or unknown. G is a known positive matrix. For model 3.10 we have the following theorem. Theorem 3.4 Let β = ( X τG −1 X ) X τG −1Y . Under the assumption of model 3.10, ∀c ∈ Rn, c τβ is the unique uniformly minimum variance unbiased estimator of cτβ.
© 2012 by Taylor & Francis Group, LLC
M e t h o D s o F M o D eRn ReG Re s sI o n A n A LysIs
101
Proof Since G is a positive definite matrix, it is possible to use Cholesky decomposition to get G = Aτ A. Here, A is an inverse upper (lower) triangular matrix. λ 1 A = P
λ n
λi (i = 1, …, n) are eigenvalues of G. P is an orthogonal matrix. Then the model can be changed into A −1Y = A −1 X β + A −1e A −1e ∼ (0, σ 2 I )
(3.24)
Using Theorem 3.3, cβ = ( A −1 X )τ A −1 X ( A −1 X )τ Y = ( X τG −1 X ) X τG −1Y is the uniformly minimum variance unbiased estimator of cτβ. Theorems 3.3 and 3.4 are called Gauss–Markov theorems in statistics. β is called a weighted least squares estimator. When applying Theorems 3.3 and 3.4, the G–M condition on random errors should be verified, especially it needs to be checked whether Ee = 0 holds or not. Here, it should be pointed out that the G–M condition is important not only for studying unbiased estimators but also for studying biased estimators. Note that S is a symmetric positive matrix. There is an orthogonal matrix P that satisfies: S = P ΛP τ , Λ = diag(λ 1 , λ 2 ,…, λ n ) P = (P 1, P 2, . . ., Pn) 0 < λ1 ≤ λ 2 ≤ ≤ λn
© 2012 by Taylor & Francis Group, LLC
10 2
M e A suReM en t DAtA M o D eLIn G
Then we consider model 3.14. Suppose that β = S −1 X τ Y
θ = P τβ, θi = Pi τβ
θ = P τ β , θ i = Pi τβ
Using Theorem 3.3 and e ~ (0, σ2 I), θ i is the uniformly minimum variance unbiased estimator of θi (i = 1, 2, . . ., n) among all linear unbiased estimators. Obviously, estimating β is equivalent to estimating θ. There are many methods of obtaining biased estimators, such as principle component estimation, modified principle component estimation, ridge estimation, generalized ridge estimation, etc. They are all efficient methods to improve the estimation efficiency of θ (see Ref. 1, 8, and 9). Those methods will be discussed in Section 3.5. Theorems 3.3 and 3.4 discuss the estimation of parameter β. The estimation of σ2 is discussed as follows. Under the G–M condition of e ~ (0, σ2 I), σ2 is an unknown parameter that reflects the magnitude of measurement errors. Estimation of σ2 is very important because it is necessary in constructing biased estimators and providing measurement precision. Consider model 3.10. Let vi = yi − X i β, i = 1, 2,…, m be the residual from the ith observation. The residual sum of squares RSS =
m
∑v i =1
2 i
= Y − X β
2
measures the magnitude of σ2. Residuals and residual sum of squares are widely used in regression analysis. In order to use RSS to estimate σ2, we prove the following lemma first. lemma 3.1 Suppose that ξ is an m-dimensional random vector. ξ ~ (a,V), A is a constant matrix of rank m. Then Eξ τ A ξ = a τ Aa + tr ( AV )
© 2012 by Taylor & Francis Group, LLC
(3.25)
M e t h o D s o F M o D eRn ReG Re s sI o n A n A LysIs
10 3
In particular, when COV(ξ) = σ2 Im, Eξ τ A ξ = a τ Aa + σ 2 tr( A )
(3.26)
Proof Let ξ = a + η, then η ~ (0,V). Note that tr(CD) = tr(DC); we have Eξ τ A ξ = E (a τ + ητ ) A (a + η) = a τ Aa + 2a τ AEη + E( ητ A η) = a τ Aa + Etr( A ητ η) = a τ Aa + trE( A ητ η) = a τ Aa + tr( AV ) Let V = σ2 I. Equation 3.26 holds. Now we can use Lemma 3.1 to analyze RSS. Note that EY = X β, COV (Y ) = σ 2 I ( X β)τ ( I m − XS −1 X τ ) X β = 0 tr( I m − XS −1 X τ ) = tr( I m ) − tr( XS −1 X τ ) = m − tr( XS −1 X τ ) = m − n Using Lemma 3.1, we can easily get Theorem 3.5 Let σ 2 =
RSS m−n
Under the G–M condition, we have Eσ 2 = σ 2 .
© 2012 by Taylor & Francis Group, LLC
(3.27)
10 4
M e A suReM en t DAtA M o D eLIn G
Proof Since Y ~ (Xβ, σ2 Im), let H X = X ( X τ X )−1 X τ ) = Y τ ( I − H X )τ ( I − H X )Y = Y τ ( I − H X )Y We have RSS = Y − X β
2
= Y − X ( X τ X )−1 X τ Y
2
= ( I − H X )Y
2
Using Lemma 3.1 we have ERSS = E Y τ ( I − H X )Y = ( X β)τ ( I − H X )( X β) + tr ( I − H X )σ 2 I m = β τ X τ ( I − H X ) X β + σ 2 tr I − H X = β τ X τ ( X − X ( X τ X )−1 X τ X )β + σ 2 m − tr( X ( X τ X )−1 X τ ) = σ 2 m − tr(( X τ X )−1 X τ X ) = σ 2 (m − n) So E σ 2 = E
RSS = σ2 m−n
3.2.2 Hypothesis Tests on Regression Coefficients
In many cases of using regression analysis to solve practical problems in processing measurement data, we need to test whether some components or combinations of components of vector β are zero. Consider model 3.14. Let H = X ( X τ X )−1 X τ = ( hij )m × m Then, H is a half-positive definite matrix, which satisfies H 2 = H , tr( H ) = n
© 2012 by Taylor & Francis Group, LLC
(3.28) (3.29)
M e t h o D s o F M o D eRn ReG Re s sI o n A n A LysIs
10 5
The following theorem shows the necessity of hypothesis tests on regression coefficients. Theorem 3.6 Suppose that e ~ (0, σ2 I), β is the least squares estimate of coefficient β in model 3.14 Then X β − X β = He
(3.30)
X i β − X i β ∼ (0, σ 2 hii )
(3.31)
E X β − X β
(3.32)
2
= nσ 2
Proof Since
β = ( X τ X )−1 X τY = β + ( X τ X )−1 X τ e
(3.33)
we have X β − X β = X ( X τ X )−1 X τ e = He and Equation 3.30 could be easily derived from the definition of H. From Equation 3.33 we know that X i β − X i β = X i ( X τ X )−1 X τ e So E( X i β − X i β) = E( X i ( X τ X )−1 X τ e ) = 0 Var( X i β − X i β) = E( X i β − X i β)2 = E X i ( X τ X )−1 X τ ee τ X ( X τ X )−1 X iτ = X i ( X τ X )−1 X τ σ 2 I m X ( X τ X )−1 X iτ = σ 2 X i ( X τ X )−1 X iτ = σ 2 hii
© 2012 by Taylor & Francis Group, LLC
10 6
M e A suReM en t DAtA M o D eLIn G
Thus, X i β − X i β ∼ (0, σ 2 hii ) 2 E X β − X β = E
m
∑ i =1
( X i β − X i β)2 =
m
∑ σ h = σ tr(H ) = nσ i =1
2
ii
2
2
Theorem 3.6 shows that the estimation error is positively proportional to the number of unknown parameters when using X β as the estimate of Xβ. The larger the number of parameters, the worse the estimate. This fact should be carefully considered in the process of modeling dynamic measurement data. Now we use an example to illustrate the point. Example 3.2 Suppose that the true observed function f(t) is a cubic polynomial. Independent measurements f(ti) (i = 1, 2, . . . , m) with equal precision are measured at time ti and are denoted by tuple (ti, yi) (i = 1, 2, . . ., n). Also,
yi = f (i ) + ei , i , j = 1, 2,…, m 2 Eei = 0, Eei ej = σ δ ij Estimate f(t). Solution
Note that f(ti) = a 0 + a1t + . . . a2t2 + a3t3. Let
β = (a0 , a1 , a2 , a3 )τ Y = ( y1 , y 2 ,…, ym )τ e = (e1 , e 2 ,…, em )τ 1 1 X = 1
© 2012 by Taylor & Francis Group, LLC
t1
t12
t2
tm
t 22
tm2
t13 t 23 tm3
(3.34)
M e t h o D s o F M o D eRn ReG Re s sI o n A n A LysIs
Then Equation 3.34 can be rewritten as Y = X β + e , e ∼ (0, σ 2 I )
(3.35)
Using Equation 3.35 the least squares estimate of polynomial coefficient β is β = (a 0 , a 1 , a 2 , a 3 )τ , the estimate of f(t) is f(ti ) = a 0 + a 1 t + a 2 t 2 + a 3 t 3 Using Theorem 3.6, E
m
∑ f (t ) − f (t ) j =1
i
2
i
= 4σ 2
(3.36)
Suppose that f(t) is a quadratic polynomial and is expressed by f(ti) = a 0 + a1t + a2t 2. Repeating the process above, we have f (ti ) = a0 + a1t + a2t 2 and E
m
∑ f (t ) − f (t ) j =1
i
i
2
= 3σ 2
(3.37)
By comparing Equation 3.36 with 3.37, we conclude that if f (t) is indeed a quadratic polynomial but is treated as a cubic polynomial, the result may not be as good as the quadratic case. When there are no errors of modeling or modeling errors are negligible, the smaller the number of parameters the better. It is a hypothesis testing problem to choose a cubic polynomial or a quadratic polynomial to estimate f (t). The hypothesis is H : a3 = 0 If the hypothesis is accepted, we use a quadratic polynomial f (t ) as an estimate of f(t). Otherwise, we use a cubic polynomial f (t ) as an estimate of f(t). Commonly, the number of unknown parameters derived by function approximation is more than necessary. Theorem 3.6 shows that overparameterization has negative influence on data processing. If we assert via an analysis of engineering background or a hypothesis test that some parameters or their linear combinations are 0, it is very helpful to increase the accuracy of data processing.
© 2012 by Taylor & Francis Group, LLC
10 7
10 8
M e A suReM en t DAtA M o D eLIn G
Next, we discuss the general case that whether some linear combinations of regression coefficients are zero or not, that is, (3.38)
H : Gβ = 0 where G is a matrix of k × n, and k ≤ n, rank(G ) = k . By matrix theory, there exists an L(n−k)×n such that L D = G is a nonsingular matrix of n × n. Let Z = XD−1, α= Dβ. Then, Z α = X β.
(3.39)
Model 3.14 is equivalent to (3.40)
Y = Z α + e , e ∼ (0, σ 2 I ) Let
Z = (z1 ,… , zn ) α = (α 1 ,… , α n ) τ Z * = (z1 ,… , zn − k ) α * = (α 1 , α 2 ,… , α n − k ) τ α = (Z τZ )−1Z τY −1
α * = (Z * )τ Z * (Z * )τ Y RSS H = Y − Z *α *
2
2 RSS = Y − X β = Y − Z α
2
Obviously, Gβ = 0 ⇔ αn− k +1 = = αn = 0
(3.41)
So, when H holds, model 3.14 can be rewritten as Y = Z *α * + e , e ∼ (0, σ 2 I )
© 2012 by Taylor & Francis Group, LLC
(3.42)
10 9
M e t h o D s o F M o D eRn ReG Re s sI o n A n A LysIs
It is easy to prove that RSSH ≥ RSS. Furthermore, we have the following theorem.
Theorem 3.7 If e ~ N(0, σ2 Im) in model 3.14, then RSS ∼ χ 2 (m − n) σ2
(3.43)
when hypothesis 3.38 holds, 1.
RSS H − RSS ∼ χ 2 ( k) 2 σ
(3.44)
2. RSS is independent of RSSH − RSS 3. FH =
m − n RSSH − RSS ⋅ ∼ F ( k, m − k ) k RSS
(3.45)
where χ2(m − n), F(k, m − k) are χ2 distribution with degrees of freedom (m − n) and F distribution with degrees of freedom (k, m − k). The proof of Theorem 3.7 is complicated and will be given in Section 3.4. Readers can also refer to Ref. 1. Using Theorem 3.7, we give the following criterion. criterion 3.1 Consider model 3.14. If e ~ N(0, σ2 Im), then we can use the following method to test whether hypothesis (Equation 3.38) holds or not: given a certain level of significance α (generally, 0.05 or 0.01), if FH > Fk,m−n(α), hypothesis 3.38 is rejected. Otherwise, it is accepted. 3.2.3 Interval Estimates of Parameters
Consider the linear regression model 3.14 and assume that e ~ N(0, σ2 Im). We study interval estimates of regression coefficients, their linear combinations, and σ2. Interval estimation of parameter is not only the basic content of regression analysis, but also the foundation of evaluating precision indices of measurements.
© 2012 by Taylor & Francis Group, LLC
110
M e A suReM en t DAtA M o D eLIn G
lemma 3.2 Suppose that ξ ∈ R n, ξ ~ N(0,G), and G is a symmetrical, positivedefinite matrix of rank n, then ξ τG −1 ξ ∼ χ 2 (n) Proof Since G is a symmetrical square matrix, there exists a nonsingular matrix A satisfying G = AA τ and A −1ξ ~ N(0, In). Then ξ τG −1 ξ = ( A −1ξ)τ A −1ξ ∼ χ 2 (n) lemma 3.3 Consider regression model 3.14. If e ~ N(0, σ2 Im), then parameter β is independent of RSS. Proof Note that e ~ N(0, σ2 Im), H is the idempotent symmetrical matrix defined by Equation 3.28 satisfying Equation 3.29. So
β = ( X τ X )−1 X τ ( X β + e ) = β + ( X τ X )−1 X τ e ∼ N(β, σ 2 ( X τ X )−1 ); ( I − H )e ∼ N(0, σ 2 ( I − H )) Note that τ E[( I − H )e ]β = 0
RSS = Y − X β = ( X β + e ) − X ( X τ X )−1 X τ (( X β + e )) 2
= ( I − H )e
© 2012 by Taylor & Francis Group, LLC
2
2
M e t h o D s o F M o D eRn ReG Re s sI o n A n A LysIs
111
Therefore, RSS is independent Then (I − H)e is independent of β. of β. Theorem 3.8 Suppose A is a matrix of k × n, rank(A) = k, e ~ N(0, σ2 Im). Then, for model 3.14, we have m − n (β − β)τ A τ ( A ( X τ X )−1 A τ )−1 A (β − β) ⋅ ∼ F ( k, m − n) (3.46) k RSS
Proof Since e ~ N(0, σ2 Im), A (β − β) = A (( X τ X )−1 X τ ( X β + e ) − β) = A ( X τ X )−1 X τ e ∼ N(0, σ 2 A ( X τ X )−1 A τ )
Since σ2 A(X τ X)−1 Aτ is a symmetrical positive definite matrix of rank k, and by Lemma 3.2, we know that (β − β)τ A τ ( A ( X τ X )−1 A τ )−1 A (β − β) σ2
∼ χ 2 ( k)
(3.47)
Using (1) of Theorem 3.7, RSS/σ 2 ∼ χ 2 (m − n). Also, we know that β is independent of RSS based on Lemma 3.7; therefore, (β − β)τ A τ ( A ( X τ X )−1 A τ )−1 A (β − β) k σ2 RSS m−n σ2 =
m − n (β − β)τ A τ ( A ( X τ X )−1 A τ )−1 A (β − β) ∼ F ( k , m − n) ⋅ k RSS
© 2012 by Taylor & Francis Group, LLC
112
M e A suReM en t DAtA M o D eLIn G
Theorem 3.8 has some special cases: Suppose the confidence probability Pα = 1− α (α is the level of significance) is given. 1. Let A = In. Then 2 (β − β)τ A τ ( A ( X τ X )−1 A τ )−1 A (β − β) = X β − X β
RSS = (m − n)σ 2 . Using Theorem 3.8, X β − X β 2 P F ( ) ≤ α = 1−α n , m − n 2 nσ
(3.48)
So the accuracy index of using X β as an estimate of Xβ is nσ 2 Fn,m − n (α). 2. Let A = c = c 1×n. Then (β − β)τ A τ ( A ( X τ X )−1 A τ )−1 A (β − β) =
2 cβ − cβ
c ( X τ X )−1 c τ
, k =1
Using Theorem 3.8, − cβ 2 c β P F ( ) ≤ α = 1−α , m n 1 − τ −1 τ 2 c ( X X ) c σ we can get an interval estimate of cβ. In particular, if we take c = (0, … , 0, 1, 0, … , 0), β i − βi P 2 sii σ
2
≤ F1,m − n (α ) = 1 − α
(3.49)
sii is the element in the ith row and the ith column in matrix (X τ X)−1. We can also get the interval estimate of parameter βi. 3. Assume that A = Xi (the ith row of X). Then − X β2 X i β i P F ( ) ≤ α = 1−α 1,m − n 2 hii σ
© 2012 by Taylor & Francis Group, LLC
(3.50)
M e t h o D s o F M o D eRn ReG Re s sI o n A n A LysIs
113
hii is the element of ith row and ith column in matrix H = X(Xτ X)−1 Xτ . The interval estimate of the ith real signal Xi β and its precision index can be obtained. Cases (1), (2), and (3) are commonly used in studying the precision index of measurement. Particularly, Equation 3.50 can be expressed in terms of t distribution. In fact, X i β − X i β = X i ( X τ X )−1 X τ e ∼ N(0, hii σ 2 ), RSS/σ 2 ∼ χ 2 (m − n) β is independent of RSS, so X i β − X i β hii σ
RSS X i β − X i β 1 ⋅ 2 = ∼ t (m − n) m−n σ hii σ
then X β − X i β α ≤ tm − n = 1 − α P i 2 hii σ So α tm − n α ≤ X i β ≤ X i β + hii σ tm − n = 1 − α P X i β − hii σ 2 2 (3.51) The interval estimate of Xiβ has a precision index of α ± hii σ tm − n . 2 Using Theorem 3.7, RSS/σ2 ~ χ2 (m − n), the interval estimate of σ2 can be determined. Theorem 3.9 In model 3.14, suppose e ~ N(0,σ2 Im), χm2 − n (α ) is a upper quatile of χ2 distribution with m−n degrees of freedom. Then RSS RSS P ≤ σ2 ≤ = 1−α α α 2 2 χm − n χm − n 1 − 2 2
© 2012 by Taylor & Francis Group, LLC
(3.52)
114
M e A suReM en t DAtA M o D eLIn G
Proof Using Theorem 3.7, RSS/σ2 ~ χ2 (m − n). Then α RSS α P χm2 − n 1 − ≤ 2 ≤ χm2 − n = 1 − α 2 2 σ and Equation 3.52 is derived. 3.2.4 Least Squares Estimates and Multicollinearity
So far, we have studied linear regression model 3.14. Section 3.2.1 discussed point estimations of β,σ2; Section 3.2.2 dealt with hypothesis tests on some linear combinations of β are zero; and Section 3.2.3 provided interval estimates of β,σ2. All these conclusions are classical. However, all conclusions so far are based on the least squares estimates of the coefficients β. The Gauss–Markov theorems, Theorems 3.3 and 3.4, guarantee that the least squares estimate has the minimum variance among all linear unbiased estimates. Moreover, if we assume that measurement errors follow normal distributions, it can be shown that least squares estimates have more good properties. Because of their good properties, least squares estimates have been the only estimates that are widely used for a long period of time since they were discovered by Carl Friedrich Gauss. However, with the development of modern science and technology, people encounter more and more large-scale regression problems with many variables. For example, it is common to have a linear regression model with 30–50 variables or a design matrix of 30–50 columns in error separation models in guidance tools of a spacecraft and data processing models in space measurements (see Chapter 6). Practical applications in data processing show that least squares estimates are inefficient in dealing with large-scale regression models with many variables and can be very bad in specific scenarios. Many statisticians have done a lot of research to improve least squares estimates since the 1950s. It is envisaged that, since least squares estimates are optimal among linear unbiased estimates and if they are satisfactory in dealing with a problem using regression analysis,
© 2012 by Taylor & Francis Group, LLC
M e t h o D s o F M o D eRn ReG Re s sI o n A n A LysIs
115
other linear unbiased estimators will be more powerless. Naturally, biased estimates should be applied (see Section 3.5 for details). It is natural to ask when least squares estimates deteriorate and under what conditions biased estimates should be adopted. To answer such questions, the mean square error (MSE) is introduced as a reasonable criterion to compare the merits of biased and unbiased estimates. Definition 3.2 Suppose θ is a vector of unknown parameters of dimension n. θ is an estimate of θ. 2 MSE(θ ) = E θ − θ
(3.53)
is called the mean square error of θ. MSE(θ ) measures the difference between a parameter θ and its esti A good estimate has a small mean square error. Now we calmate θ. culate the mean square error of a least squares estimate. Using Equation 3.34 β = β + ( X τ X )−1 X τ e Since e ~ N(0,σ2 Im), then − β 2 = ( X τ X )−1 X τ e MSE(β ) = E β Using Lemma 3.1,
2
MSE(β ) = σ 2 tr( X τ X )−1
(3.54)
Note that Xτ X is a positive definite matrix; there exists an orthogonal matrix P satisfying that X τ X = P ΛP τ , where Λ = diag(λ 1 , λ 2 , …, λ n )
(3.55)
where 0 < λ1 ≤ λ2 ≤ . . . ≤ λn are the eigenvalues of Xτ X. Then, using Equations 3.54 and 3.55, we have ( X τ X )−1 = P Λ −1P τ
© 2012 by Taylor & Francis Group, LLC
116
M e A suReM en t DAtA M o D eLIn G
MSE(β ) = σ 2 tr( X τ X )−1 = σ 2 tr( P Λ −1P τ ) = σ 2 tr( Λ −1PP τ ) = σ 2 tr( Λ −1 ) =σ
2
n
∑λ i =1
(3.56)
−1 i
From Equation 3.55, we know that if X τ X has one small eigenvalue, its corresponding λ−1σ2 will be large and MSE(β ) is large. Least squares estimates are inefficient in such a case. An example is given next for further elaboration. This example demonstrates that X τ X can have eigenvalues close to zero. Example 3.3 Suppose that there is a set of observation data (ti, yi) (i = 0, 1, . . . , 200); the mathematical model can be described by yi = f (ti ) + ei , i = 0, 1, …, 200 {ei }i.i.d. ∼ N (0, 0.01)
(3.57)
where ti = 0.01i − 1(i = 0, 1, . . . , 200). The function f (t) to be estimated satisfies 1
∫
2
1 − t 2 f ′′(t ) dt ≤ 20
(3.58)
−1
Estimate f(t).
Solution
Using the discussion in Section 2.2, Chebyshev polynomials can be chosen as basis functions. Let 32
P (t ) =
© 2012 by Taylor & Francis Group, LLC
∑β T j =1
j
j −1
(t )
(3.59)
M e t h o D s o F M o D eRn ReG Re s sI o n A n A LysIs
be an approximation function to f(t) where Ti (t) (i = 0, 1, ..., 31) is the ith-order Chebyshev polynomial and (β1, β2, ... ,β32) are the unknown coefficients. Using Theorems 2.11 and 2.15 we know that P(t) can approximate f (t) to a fairly high precision (the truncation error is much less than 0.1, which is the mean square error of the random error). Based on the discussion above, we have 32 yi = β jT j −1(t ) + ei , i = 0, 1,…, 200 j =1 {e }i.i.d. ∼ N(0, 0.01) i
∑
(3.60)
Note that β = (β1 , β 2 , ..., β32 )τ , Y = ( y1 , y 2 , ..., y 200 )τ , e = (e1 , e 2 , ..., e 200 )τ .
T0 (t 0 ) T0 (t1 ) X = T t ( 0 200 )
T1(t 0 )
...
T1(t1 )
...
T1(t 200 )
...
T31(t 0 ) T31(t1 ) T31(t 200 )
Then, model 3.60 can be simply rewritten as Y = X β + e , e ∼ N(0, σ 2 I )
(3.61)
Every element of X can be expressed in terms of ti, Tj (i = 0, 1, . . . , 200, j = 0, 1, . . ., 31). It can be shown that the minimum and the second minimum eigenvalues of X τ X are λ1 = 0.000054, λ2 = 0.000065. Since σ2 = 0.01, we have MSE(β ) >
σ2 σ2 + > 339 λ1 λ 2
Obviously, it is not good to use β as an estimation of β. Similarly, it is also not good to use
© 2012 by Taylor & Francis Group, LLC
117
118
M e A suReM en t DAtA M o D eLIn G
P (t ) =
32
∑ β T j =1
j
j −1
(t )
as an estimation of f(t). (Further discussion is given in Example 3.8.) This example shows a situation that X τ X has eigenvalues close to zero in practical applications of measurement data processing. One may argue that if we just use a lower-order polynomial to approximate the function, we can avoid the situation having eigenvalues close to zero. However, lower-order polynomials cannot guarantee that P(t) ≈ f(t) holds in many cases. Therefore, we cannot obtain Equation 3.61 from Equation 3.57. If X τ X has one or more eigenvalues close to zero, then it can be proved that all X′s columns are approximately linear dependent. In fact, suppose λ1 is an eigenvalue of Xτ X close to 0, and P 1 is an eigenvector of λ1 that satisfies ∙P 1∙ = 1. Then XP1
2
= P1τ X τ XP1 = λ 1
So if λ1 ≈ 0, then XP 1 ≈ 0. Since P 1 is a nonzero vector (∙P 1∙ = 1), columns of X are approximately linear dependent. This is customarily referred to as multicollinearity. The root cause of deterioration of least squares estimates is multicollinearity rather than the number of unknown parameters to be estimated. However, it is important to point out that, based on the premise of small modeling errors, for the same regression model of dynamic measurement data, the smaller the number of columns in the design matrix the better. Therefore, biased estimates are only applied when there are some eignvalues of X τ X close to zero. Generally, least squares estimates are used.
3.3 Optimization of Regression Models
The most effective way to process measurement data is to transform the data processing problem into a problem of regression analysis so that research results in modern regression analysis can be well applied to increase the precision of data processing. In mathematical processing of radar measurements, the estimation of orbit parameters and systematic errors of measurement can both be transformed into problems in regression analysis (see Chapter 6 for details).
© 2012 by Taylor & Francis Group, LLC
M e t h o D s o F M o D eRn ReG Re s sI o n A n A LysIs
119
The key in transforming problems of processing dynamic data into problems of regression analysis is transformation. One data processing problem may be transformed into different regression analysis models. Different methods of transformation and different models transformed result in different model accuracies, different results of mathematical processing, and different interpretations of measurement data and errors. It is natural to ask: What is the optimal model? To get an ideal mathematical model of dynamic measurement data, it is necessary to study the following issues: 1. Comparing different models in processing the same measurement data 2. Separating true signals and systematic errors 3. Selecting optimal submodels from a full model 3.3.1 Dynamic Measurement Data and Regression Models
We first consider dynamic measurement data with random errors only. Suppose (ti, yi)(i = 1,2,. . .,m) are observed data of f (t) at time ti for i = 1, 2, . . ., m. Assume that the model for the observed data is yi = f ( t i ) + ei , i = 1, 2,…, m i.i.d . ei ∼ N(0, σ 2 )
(3.62)
The main task is measuring data to estimate f(t). If f (ti) (i = 1,2, . . . , m) can be estimated accurately, then f(t)(ti ≤ t ≤ ti) can be estimated by the method of functional approximation in Chapter 2. Let y1 y2 Y = , ym
© 2012 by Taylor & Francis Group, LLC
f =
f (t1 ) e1 f (t 2 ) e2 , e = em f (tm )
12 0
M e A suReM en t DAtA M o D eLIn G
Using Equation 3.62 we have (3.63)
Y = f + e , e ∼ (0, σ 2 I ) Suppose that f (t ) =
N
∑ β ψ (t ) + b(t ) j =1
j
(3.64)
j
where (ψ1, ψ2, …, ψN) is a set of linearly independent basis functions, and b(t) is the residual between f (t) and its approximation using the best combination of the basis functions. Let X = (xij )m × N , xij = ψ j (ti ) β = (β1 , β 2 , …, β N )τ b = (b(t1 ), b(t 2 ), …, b(tm ))τ Then f = Xβ + b
(3.65)
Y = X β + b + e 2 e ∼ (0, σ I )
(3.66)
Using model 3.66, the least squares estimate of β is β = ( X τ X )−1 X τ Y = β + ( X τ X )−1 X τ b + ( X τ X )−1 X τ e (3.67) Thus, X β is an estimate of f with an estimation error E X β − f
2
= E X β − ( X β + b )
= E b − X ( X τ X )−1 X τ b Let HX = X(X τ X)−1 X τ . Then
© 2012 by Taylor & Francis Group, LLC
2
2
+ E X ( X τ X )−1 X τ e
2
M e t h o D s o F M o D eRn ReG Re s sI o n A n A LysIs
E X β − f
2
= ( I − H X )b
2
+ N σ2
121
(3.68)
For a measurement device, σ2 is fixed. Using Equation 3.68, we observe that the key to estimate f accurately is to select a proper basis function such that ∙(I − H X)b ∙2 and the number of unknown parameters N are minimized simultaneously. It should be pointed out that by using Equation 3.65, we have ( I − H X )b
2
= (I − H X ) f
2
Therefore, using Equation 3.68, we have E X β − f
2
= (I − H X ) f
2
+ N σ2
Readers can naturally ask the following question. Since the selection of basis functions is so important, if there are two or more than two sets of basis functions available, how to select a better one? Next, we present a method from the perspective of residual analysis. Suppose that Y = f + e, f = Xβ + b. Consider the sum of squared residuals: E Y − X β
2
= E f + e − X ( X τ X )−1 X τ ( f + e ) = E (I − H X ) f + (I − H X e) = (I − H X ) f
2
2
2
(3.69)
m − N )σ 2 + (m
When σ2 is known, 2 E ( 2 N − m )σ 2 + Y − X β
= ( I − H X )b
2
+ Nσ
2
(3.70)
Suppose that there is another set of basis functions. Using Equation 3.63, we get model Y = Z α + d + e , 2 e ∼ (0, σ I )
© 2012 by Taylor & Francis Group, LLC
rank (Zm × M ) = M
(3.71)
12 2
M e A suReM en t DAtA M o D eLIn G
Let = (Z τZ )−1 Z τ Y H Z = Z (Z τZ )−1 Z τ , α Similar to Equation 3.70, we conclude that 2
E[( 2 M − m )σ 2 + Y − Z α ] = M σ 2 + ( I − H X )d
2
(3.72)
Based on two different sets of basis functions, we use X β and Zα to estimate f. Their mean squared errors are E X β − f
E Z α − f
2
2
2
= E (I − H X ) f
+ N σ2
= E ( I − H X )b
2
+ N σ2
= E (I − HZ ) f
2
+ M σ2
= E ( I − H Z )d
2
+ M σ2
(3.73)
(3.74)
By comparing the right sides of Equations 3.73 and 3.74, we can determine whether X β or Zα is a better estimate of f. Using Equations 3.70 and 3.71, we need to construct the following two statistics: D = ( 2 N − m )σ 2 + Y − X β 2 = ( 2 N − m)σ 2 + ( I − H )Y X X 2 DZ = ( 2 M − m )σ 2 + Y − Z α = ( 2 M − m )σ 2 + ( I − H Z )Y
2
2
(3.75) If DX < DZ , model 3.66 is better than model 3.71, otherwise model 3.71 is better. In particular, when N = M, the model with a smaller sum of squared error is better. When σ2 is unknown, if m − N, m − M are relatively large, the corresponding residual sum of squares can be used as an estimate of σ2 by Theorem 3.6. Let
© 2012 by Taylor & Francis Group, LLC
M e t h o D s o F M o D eRn ReG Re s sI o n A n A LysIs
2 , RSS = Y − Z α RSS X = Y − X β Z RSS X RSSZ σ *2 = min , m − N m − M
12 3
2
(3.76)
2 Replacing σ2 in statistic 3.75 with σ * , we can calculate values of statistics DX and DZ , respectively. It should be pointed out that, since 2 the values of statistics σ * , DX, and DZ are only related to residual sum of squares and the number of unknown parameters, Equations 3.75 and 3.76 can be rewritten as
DX* = ( 2 N − m)σ *2 + RSS X * 2 DZ = ( 2 M − m)σ * + RSSZ
(3.77)
RSS X RSSZ σ *2 = min , m − N m − M
(3.78)
criterion 3.2 If DX* < DZ* , model 3.66 is better than model 3.71, otherwise model 3.71 is better. Remark 3.1 1. Criterion 3.2 is applicable for both linear and nonlinear models. For nonlinear models, 2 DX* = ( 2 N − m )σ *2 + Y − g (β )
(3.79)
2 DZ* = ( 2 M − m )σ*2 + Y − h(α )
(3.80)
where g and h are the nonlinear functions of unknown parameters β and α. β , α are the nonlinear least squares estimates of parameter β and α.
© 2012 by Taylor & Francis Group, LLC
12 4
M e A suReM en t DAtA M o D eLIn G
2. When there are no systematic errors in the observed data, it is required to have the number of unknown parameters small and the model accurate and good in estimating f. It is not required to have eigenvalues close to zero in the design matrix. 3.3.2 Compound Models for Signals and Systematic Errors
Consider dynamic measurement data with both systematic errors and random errors. Let s(t) be the unknown function for systematic errors, s = (s(t 1),…, s(tm))τ . Consider the following measurement data model: Y = f + s + e 2 e ∼ (0, σ I )
(3.81)
Select two sets of basis functions simultaneously and express f(t) and s(t) as linear combinations of corresponding basis functions. We have (by ignoring truncation errors) f = X P βP , s = X RβR
(3.82)
where XP is a matrix of m × p and XR is a matrix of m × (n − p). Let X = ( X P , X R ) , β = (β P , β R )τ , rank( X P ) = p, rank( X R ) = n − p Then, model 3.81 can be rewritten as Y = X β + e = X P βP + X R βR + e 2 e ∼ (0, σ I )
(3.83)
If rank(X) = n, the least squares estimate of β from model 3.83 is β β = P = ( X τ X )−1 X τ Y β R
(3.84)
XP βP is used to approximate f and XR βR is used to approximate s. By ignoring truncation errors, Using Equation 3.84 and Theorem 3.6, we have
© 2012 by Taylor & Francis Group, LLC
M e t h o D s o F M o D eRn ReG Re s sI o n A n A LysIs
E X β − X β = nσ 2 .
12 5
(3.85)
The left side of Equation 3.85 is independent of whether X τ X has some eigenvalues close to zero or not. It only related to σ2 and the total number of parameters to be estimated in the true signals and systematic errors. What is emphasized is that, for model 3.83, it is XP βP and XR βR that need to be estimated rather than Xβ. It is natural to use X P β P and X R β R as estimates of XP βP and XR βR . Let us analyze their estimation errors. Since X = Xm×n, rank(X) = n, X τ X , X τp X P , X Rτ X R are all nonsingular matrices. Let β P = ( X τp X P )−1 X τp Y τ −1 τ A = (XP XP ) XP XR X RR = X R − X P A τ B −1 = X RR X RR
(3.86)
τ −1 τ C = ( X R X R ) X R X P X PP = X P − X RC τ D −1 = X PP X PP
(3.87)
lemma 3.4 Suppose that rank(X) = n. Then, τ
X X
−1
X Pτ X P = τ X R X P
X Pτ X R ( X τp X P )−1 + ABA τ = − BA τ X Rτ X R
− AB B (3.88)
© 2012 by Taylor & Francis Group, LLC
12 6
M e A suReM en t DAtA M o D eLIn G
Proof Note that B τ = B , X Pτ X P A = X Pτ X R
(3.89)
τ X RR X P A = ( X R − X P A )τ X P A = ( X Rτ − A τ X Pτ ) X P A
= [ X Rτ X P − X Rτ X P ( X Pτ X P )−1 X Pτ X P ] A = 0 So ( X Pτ X P A − X Pτ X R )BA τ = 0
(3.90)
τ X Rτ X R = A τ X Pτ X P A + X RR X RR = X Rτ X P ( X Pτ X P )−1 X Pτ X R + B −1
(3.91) X Rτ X R B = X Pτ X P ( X Pτ X P )−1 X Pτ X R AB + I n − P = X Rτ X p AB + I n − P
(3.92)
Then, Using Equations 3.89 through 3.92, we have X Pτ X P τ X R X P
X Pτ X R ( X τp X P )−1 + ABA τ − BA τ X Rτ X R
I P + ( X Pτ X P A − X Pτ X R )BA τ = τ τ τ ( I n − P + X R X P AB − X R X R B ) A Ip = 0
− AB B −( X Pτ X P ) AB − X Pτ X R B − X Rτ X P AB + X Rτ X R B
0 = In I n − p
Using Lemma 3.4, we can easily prove the following theorem. Theorem 3.10 Consider model 3.83 τ τ τ β P = β P + ( X PP X PP )−1 X PP e ⇒ COV (β P ) = σ 2 ( X PP X PP )−1
(3.93)
© 2012 by Taylor & Francis Group, LLC
12 7
M e t h o D s o F M o D eRn ReG Re s sI o n A n A LysIs
τ τ τ X RR )−1 X RR e ⇒ COV ( β R ) = σ 2 ( X RR X RR )−1 β R = β R + ( X RR
(3.94) Proof Since β P τ −1 τ β = β = ( X X ) X Y R ( X Pτ X P )−1 + ABA τ = − BA τ
− AB X Pτ Y B X Rτ Y
β R = − BA τ X Pτ Y + BX Rτ Y = B( X R − X P A )τ Y τ τ = ( X RR X RR )−1 X RR ( X P βP + X RβR + e )
(3.95)
Note that τ X RR X P = 0, X R = X P A + X RR
Thus, τ τ τ τ X RR )−1 X RR X RR β R + ( X RR X RR )−1 X RR e β R = ( X RR τ = β R + ( X RR X RR )−1 X Rτ R e
Similarly, τ τ β P = β P + ( X PP X PP )−1 X PP e
Using Theorem 3.10, we can easily obtain (left as an exercise) E X P βP − X P β P
2
= σ 2 tr ( X τp X P )( X τpp X PP )−1
(3.96)
E X R βR − X R β R
2
τ = σ 2 tr ( X Rτ X R )( X RR X RR )−1
(3.97)
© 2012 by Taylor & Francis Group, LLC
12 8
M e A suReM en t DAtA M o D eLIn G
Similar to the derivation of Equation 3.91, we have τ X τp X P = X τp X R ( X Rτ X R )−1 X Rτ X P + X PP X PP
(3.98)
Using Equations 3.96 and 3.98, E X P β P − X P β P
2
τ X PP )−1 ] (3.99) = pσ 2 + σ 2 tr[ X τp X R ( X Rτ X R )−1 X Rτ X P ( X PP
Similarly, using Equations 3.97 and 3.91, E X R β R − X R β R
2
τ = (n − p )σ 2 + σ 2 tr[ X Rτ X P ( X Pτ X P )−1 X Pτ X R ( X RR X RR )−1 ]
(3.100) Combining Equations 3.99 and 3.100, we have E X P β P − X P βP
2
+ E X R β R − X R βR
2
≥ nσ 2
(3.101)
It is straightforward to see that only when X Pτ X R = 0 , we have E X β − X β P P P P E X R β R − X R β R E X P β P − X P β P
2
= pσ 2
2
= (n − p )σ 2
2
+ E X R β R − X R β R
(3.102) 2
= nσ 2
This means that only when XP and XR are orthogonal, the estimate is the best. The more XP and XR are correlated the worse the estimates of f and s are. Let us see an example next. Example 3.4 Consider tracking moving subjects with a range finder. Suppose object A moves in a straight line with uniform velocity and its position is x(t) at time t. Object B moves in a straight line with a varying velocity and its position is z(t) at time t where z(t) satisfies z ”(t) + z(t) = 0.
© 2012 by Taylor & Francis Group, LLC
M e t h o D s o F M o D eRn ReG Re s sI o n A n A LysIs
Let yx (ti),yz(ti) (i = 1, 2, …, m) be the observed data of x(ti) and z(ti) at time ti and
yx (ti ) = x(ti ) + a + ex (ti ), i = 1, 2,..., m 2 ex (ti ) ∼ (0, σ ), Eex (ti )ex (t j ) = 0, (i ≠ j )
(3.103)
yz (ti ) = z(ti ) + a + ez (ti ), i = 1, 2,..., m 2 ez (ti ) ∼ (0, σ ), Eez (ti )ez (t j ) = 0, (i ≠ j )
(3.104)
Estimate x(t), z(t) (t1 ≤ t ≤ tm) and the constant systematic error a. Solution
Let
Yx = ( yx (t1 ), yx (t 2 ),… , yx (tm ))τ Yz = ( yz (t1 ), yz (t 2 ),… , yz (tm ))τ ex = (ex (t1 ), ex (t 2 ),… , ex (tm ))τ ez = (ez (t1 ), ez (t 2 ),… , ez (tm ))τ 1 1 X = (XP , XR ) = 1 sin t1 sin t 2 Z = (Z P , Z R ) = sin tm
t1
1 1 1
t2
tm cos t1
cos t 2
cos tm
1 1 1 1
s c β P 0 aP 1 β = = v0 , α = = c 2 βR aR a a
© 2012 by Taylor & Francis Group, LLC
12 9
13 0
M e A suReM en t DAtA M o D eLIn G
where parameter a is the constant systematic error and
β = α = a R R x ( t ) = s0 + v 0 t z(t ) = c1 sin t + c 2 cos t
(3.105)
Using Equations 3.103 and 3.104 and notations above,
Yx = X β + ex , ex ∼ (0, σ 2 I m )
(3.106)
Yz = Z α + ez , ez ∼ (0, σ 2 I m )
(3.107)
Since X R can be expressed by X P, X is not a matrix of full column rank. The constant system error in model (Equations 3.106) cannot be estimated directly. However, estimates of v 0 and s 0 + a can be derived using model (Equations 3.106). When m > >3, model (Equations 3.107) can be used to estimate c 1, c 2, a.
Remark 3.2 Example 3.4 shows that 1. When rank(XP) = p and rank(XR) = n − p, rank(X) = n may not be true. If rank(X) < n, X P β P and X R β R cannot be estimated. When setting up models for true signals and systematic errors, it is better to use two sets of basis functions that are uncorrelated with each other. There should be no Eigenvalue of X τ X close to zero or equal to zero. 2. The systematic error of a measurement instrument (assumed that such a error has the same expression and parameters in various scenarios) can be measured by designing various objects observed. The constant systematic error of a device can be observed from the simple harmonic motion of an object rather than the straight-line uniform motion. 3. When rank(X) < n, try to find the reason of correlation and to estimate some linear combinations of parameters through model adjustment. For example, estimates of v 0 and s0 + a are
© 2012 by Taylor & Francis Group, LLC
M e t h o D s o F M o D eRn ReG Re s sI o n A n A LysIs
131
given in Example 3.4. Such estimates of linear combinations of parameters are very helpful for data processing in practice. If different systematic errors are correlated with each other but are uncorrelated to true signals, they have no influence on the estimation of true signals. 3.4 Variable Selection
Section 3.3 discussed methods of transforming a problem in processing dynamic measurement data into a linear regression model. It concludes that an ideal mathematical model in processing measurement data should have a small error of modeling and a few unknown parameters, and is good for the estimation of true signals and systematic errors. Generally, a linear regression model for processing dynamic measurement data is based on practice experiences, knowledge of functional approximation, and general rules. In many cases, some parameters to be estimated are in fact zero or negligible. The introduction of unnecessary parameters into a model will definitely increase the number of unknown parameters and affect the result of data processing. When using regression analysis to solve problems in processing dynamic measurement data, it is necessary to select an optimal model. For example, if Xβ = x1β1 + x 2β2 + … + xnβn is used to approximate f, and xiβi is close to 0, eliminating xiβi is beneficial to estimate both f and coefficients. Variable selection here means to find out and eliminate some xiβi close to 0 such that the number of unknown parameters in the model is reduced. We have observed by the discussion in Section 3.3 that, under the premise of guaranteeing the model accuracy, the smaller the number of unknown parameters the better. The following example illustrates the role of variable selection well. Example 3.5 Suppose that f(t) ∈ C4[−1, 1], ∙ f (4)∙ ∞ < 24. Let yi = f (ti) + ei, ei ~ N(0, 0.12), i.i.d. be the observed data at time ti = 0.01i − 1, i = 0,1, …, 200. Find a polynomial model and a spline model for f, respectively. If f(t) = a 0 + a1T1(t) + a2T2(t) + a3T3(t) + a4T4(t) + a5T 5(t), (|a 5| < 1), compare the two models.
© 2012 by Taylor & Francis Group, LLC
13 2
M e A suReM en t DAtA M o D eLIn G
Solution
Let Pn(t) be the best polynomial approximation of order n for f(t). Using Theorem 2.10, the number of parameters to be estimated N should be determined so that the mean square root error of truncation errors is at least two magnitude order smaller than that of random errors. Note that ∙ f (4)∙∞ < 24. We have π 4 24 0.1 < 2 N ( N − 1)… ( N − 3) 100 N = min that is, N = 21. On the other hand, let s(t) be the cubic spline approximation of f(t) satisfying (Equation 2.51). Using Theorem 2.2, the number of unknown coefficients to be estimated is determined by (here,
h = 2 /(M − 3)
4 5 0.1 2 ⋅ 24 ⋅ < 100 384 3 M − M = min
that is, M = 11. Let Y = ( y0 , y1 ,… , y 200 )τ xij = T j −1(ti −1 ), i = 1, 2,… , 201; τi = −1 + ( j − 2)h,
j = 1, 2,… , N
j = 1, 2,… , m
ti −1 − τ j − 2 zij = B , i = 1, 2,… , 201; h
j = 1, 2,… ,m
We have two regression models for polynomial coefficients β and spline coefficients α Y = X β + e , e ∼ N(0, 0.01I ) Y = Z α + e , e ∼ N(0, 0.01I ) Note that the number of unknown parameters in the spline model is fewer than that of the polynomial one. From the discussion
© 2012 by Taylor & Francis Group, LLC
13 3
M e t h o D s o F M o D eRn ReG Re s sI o n A n A LysIs
in Section 3.2, results of data processing based on the spline model are better. If function f (t) is known as a polynomial of order 4 when we try to build a model from the data, we can use a polynomial model and N = 5. If it is not known whether function f (t) is a polynomial of order 4 or not, the criteria of variable selection can be used to identify whether xiβi = 0 (i = 6, 7, … , n) or not. If the number of zero or negligible coefficients is determined by criteria of variable selection, the number of parameters to be estimated in a polynomial model is less than that in a spline model. Therefore, polynomial models are better than spline models in this case.
Remark 3.3 Various mathematical models for dynamic measurement data can be derived based on criteria of functional approximation. When these models are compared, it is better to compare corresponding optimal reduced models chosen by the criteria of variable selection. Certain criteria are needed to select an optimal reduced model from a full model. Such criteria are established on practical needs and mathematical theory. Moreover, implementation of such criteria is computationally intensive. Suppose that X = ( X 1 ,…, X n ), β = (β1 ,…, βn )τ X P = ( X 1 ,…, X p ), X R = ( X p + 1 ,…, X n ) β P = (β1 ,…, β p )τ , β R = (β p +1 ,…, βn )τ Let Y = X β + e = X P βP + X R βR + e 2 e ∼ (0, σ I )
(3.108)
Y = X P βP + e 2 e ∼ ( X R βR , σ I )
(3.109)
be the full model and optimal reduced model, respectively.
© 2012 by Taylor & Francis Group, LLC
13 4
M e A suReM en t DAtA M o D eLIn G
Model selection is the same as variable selection (each column of X corresponds to a variable). Next, we discuss variable selection in three steps: consequences of variable selection, criteria of variable selection, and algorithms of variable selection. 3.4.1 Consequences of Variable Selection
Following the notations of Section 3.3, let A = ( X Pτ X P )−1 X Pτ X R
(3.110)
X RR = X R − X P A = ( I − H P ) X R
(3.111)
τ B = ( X RR X RR )−1 , β = ( X τ X )−1 X τ Y , β P = ( X Pτ X P )−1 X Pτ Y
(3.112) σ 2 =
Y − X β m−n
2
, σ 2 =
Y − X P β P m− p
2
(3.113)
β = (β τP , β τR )τ
where n and p are numbers of columns for X and XP, respectively. Using Lemma 3.4, we have the following theorem and its proof is left as an exercise. Theorem 3.11 Under the same assumptions as models (Equations 3.108 and 3.109), we have 1. E(β P ) = β P + Aβ R ; ∧
2. COV ( β P ) = σ 2 ( X Pτ X P )−1 + ABA τ ; 3. COV(β P ) = σ 2 ( X Pτ X P )−1 ; τ τ 4. β R = BX RR Y = β R + BX RR e;
5. COV( β R ) = σ 2 B.
© 2012 by Taylor & Francis Group, LLC
13 5
M e t h o D s o F M o D eRn ReG Re s sI o n A n A LysIs
Theorem 3.11 shows that COV (β P ) ≤ COV (β P ). However, this does not mean β P is a better estimate than β P because β P is a biased estimate of βP and βP is an unbiased estimate. Next, we introduce the mean square error matrix as a rational standard of evaluating parameter estimates.
Definition 3.3 Suppose that θ is an estimate of unknown parameter θ, the mean square error matrix of θ is defined as MSEM(θ ) = E(θ − θ)(θ − θ)τ
(3.114)
To analyze further the consequences of variable selection, we have the following lemma. lemma 3.5 Suppose A is a symmetric, positive square matrix of rank k, x∈R k, a > 0. Then, aA ≥ xx τ ⇔ x τ A −1x ≤ a
(3.115)
Proof When x = 0, Equation 3.115 is obvious. When x ≠ 0, suppose aA ≥ xx τ . We have aA − xx τ as a nonnegative square matrix, namely, ∀y ∈ R k, y τ (aA − xx τ ) y ≥ 0 In particular, let y = A −1 x (since A is positive and A −1 is also positive). Then x τ A −1 (aA − xx τ ) A −1x = ax τ A −1x − x τ A −1x ⋅ x τ A −1x ≥ 0 So ax τ A −1x ≥ (x τ A −1x)2. It follows that a ≥ x τ A −1x (since x ≠ 0 and A −1 is positive, x τ A −1x > 0).
© 2012 by Taylor & Francis Group, LLC
13 6
M e A suReM en t DAtA M o D eLIn G
Conversely, when a ≥ x τ A −1x, then ∀u ∈ R k, 2
(x τ u )2 = (x τ A −1/ 2 )( A 1/ 2u ) ≤ A −1/ 2 x
2
2
A 1/ 2u (Cauchy inequalitty)
= x τ A −1/ 2 A −1/ 2 x ⋅ u τ A 1/ 2 A 1/ 2u = x τ A −1x ⋅ u τ Au ≤ u τ aAu
∀u ∈ R k, we have u τ xx τu ≤ u τ aAu, namely u τ (aA − xx τ )u ≥ 0
(3.116)
Since u is arbitrary, we have aA ≥ xx τ Theorem 3.17 Under the same assumptions as models (Equations 3.108 and 3.109), E X β − X β
2
= nσ 2
2 E X P β P − X β = X RR β R
2
(3.117) + pσ 2
(3.118)
In particular, X RR β R
2
≤ σ 2 ⇔ COV(β R ) = σ 2 B ≥ β R β τR
(3.119)
X RR β R
2
≤ σ 2 ⇒ COV (β P ) ≥ MSEM(β P )
(3.120)
Proof Equation 3.117 has been proven in section 3.2. Since β P = ( X Pτ X P )−1 X Pτ Y = ( X Pτ X P )−1 X Pτ ( X P β P + X R β R + e ) = β P + ( X Pτ X P )−1 X Pτ X R β R + ( X Pτ X P )−1 X Pτ e = β P + Aβ R + ( X Pτ X P )−1 X Pτ e where A = ( X ′P X P )−1 X ′P X R, we have
© 2012 by Taylor & Francis Group, LLC
13 7
M e t h o D s o F M o D eRn ReG Re s sI o n A n A LysIs
X P β P − X β = ( H P − I ) X R β R + H P e = − X RR β R + H P e
(3.121)
Thus, E X P β P − X β
2
= E − X RR β R + H P e
2
= X RR β R
2
+ E(e τ H P e )
= X RR β R
2
+ pσ 2
Equation 3.118 holds. Using Theorem 3.11, we know τ β R = β R + BX RR e,
E β R = β R
So τ COV ( β R ) = E( BX RR ee τ X RR B τ ) = σ 2 BB −1 B τ = σ 2 B τ = σ 2 B τ Since B = ( X RR X RR )−1, using Lemma 3.5, τ σ 2 ≥ β τR X RR X RR β R = β τR B −1β R ⇔ σ 2 B ≥ β R β τR
(3.122)
Using MSEM(β P ) = E(β P − β P )(β P − β P )τ = E( Aβ R + ( X Pτ X P )−1 X Pτ e )( Aβ R + ( X Pτ X P )−1 X Pτ e )τ = Aβ R β τR A τ + ( X Pτ X P )−1 X Pτ E(ee τ ) X P ( X Pτ X P )−1 = Aβ R β τR A τ + σ 2 ( X Pτ X P )−1 ∧
Theorem 3.11 shows that COV(β P ) = σ 2 ABA τ + ( X Pτ X P )−1 . Then, ∧
COV (β P ) ≥ MSEM(β P ) ⇔ σ 2 ABA τ ≥ Aβ R β τR A τ (3.123)
© 2012 by Taylor & Francis Group, LLC
13 8
M e A suReM en t DAtA M o D eLIn G 2
When ∙XRR βR∙ ≤ σ2, β τR B −1β R ≤ σ 2 , using Lemma 3.5, σ 2 B ≥ β R β τR . So σ 2 ABA τ ≥ Aβ R β τR A τ Equation 3.120 is proved. 2 2 Theorem 3.12 shows that when ∙XRR βR∙ = ∙ (I − HP)XRβR ∙ is small (≤σ2), β P (the least squares estimate from the optimal reduced model) is a better estimate of βP than β P . Moreover, X P β P is a better estimate of Xβ than X β since nσ2 ≥ pσ2 + σ2. Therefore, if there exist some variables XR such that (I − HP)XRβR is very small, the reduced model derived from the full model by eliminating XRβR is better than the full model in terms of parameter estimation as well as estimation of true signals. Remark 3.4 1. It is important to point out that it is for the sake of convenience to let XP stand for the first p columns and XR for the last n-p columns. In practice, columns in XP and XR are usually determined by the analysis of engineering background and some mathematical criteria. 2. When comparing regression models based on different basis functions in processing dynamic measurement data, we should compare optimal reduced models rather than full models. The comparison methods are given in Sections 3.1 and 3.2. 3.4.2 Criteria of Variable Selection
Section 3.4.1 shows that it is necessary to eliminate variables that are uncorrelated or have little correlation with the model. The criteria to identify variables to be discarded are needed. There are several criteria of variable selection [1,10]. We only discuss Cp criterion in the following because it has a close relationship with processing dynamic measurement data. If the purpose of variable selection is to find the best estimation of Xβ, − X β 2 = pσ 2 + X β E X Pβ P RR R
© 2012 by Taylor & Francis Group, LLC
2
= min
(3.124)
M e t h o D s o F M o D eRn ReG Re s sI o n A n A LysIs
13 9
is a criterion. Unfortunately, since σ2, β are unknown in Equation 3.124, we cannot select variables directly. However, we can use their estimates 2 Y − X β RSS 2 and β = ( X τ X )−1 X τY σ = = m−n m−n
(3.125)
instead. Using Theorem 3.10, we have τ τ β R = β R + ( X RR X RR )−1 X RR e,
E X RR β R
2
= X RR β R
2
+ (n − p )σ 2
(3.126)
Hence, we construct the following statistics: J P = pσ 2 + X RR β R = X RR β R
2
2
− (n − p )σ 2
+ ( 2 p − n)σ 2
(3.127)
− X β 2 = min holds. If JP reaches the minimal value, then E X P β P Using to the intensive computation of JP, we introduce a statistic that is easy to calculate CP =
Y − X p β P
2
σ 2
+ 2p − m
(3.128)
lemma 3.6 Using Equation 3.128, J C P = P2 = σ
X RR β R σ 2
2
+ ( 2 p − n)
(3.129)
Proof Note that X τ (Y − X β ) = X τY − X τ X ( X τ X )−1 X τY = 0, we have
© 2012 by Taylor & Francis Group, LLC
14 0
M e A suReM en t DAtA M o D eLIn G
X Pτ (Y − X β ) = 0 Y − X pβ P
2
2 = Y − X β + X β − X pβ P
2
(3.130)
Since
β P = ( X Pτ X p )−1 X Pτ Y = ( X Pτ X p )−1 X Pτ (Y − X β + X β ) = ( X Pτ X p )−1 X Pτ ( X p β p + X R β R ) = β p + Aβ R
So
X β − X p β P = X p β P + X R β R − X p β P − X p Aβ R = X R β R − H P X R β R = ( I − H P ) X R β R = X RRβR
and
X β − X p β P
2
= X RR β R
2
Therefore, Y − X p β P
2
= Y − X β + X RR β R 2
2
(3.131)
Using Equations 3.130, 3.125, and 3.128, it is straightforward to get Equation 3.129. To study the statistical properties of CP further, we need the following lemma. lemma 3.7 If H is an idempotent symmetric matrix of rank m (H τ = H, H 2 = H), e ~ N(0, σ 2 Im), the eigenvalues of H can only be 0 or 1. tr( H ) = rank( H )
(3.132)
e τ He χ 2 ( tr( H )) σ2
(3.133)
Proof Since H τ = H, H 2 = H, H = H τH, H is a symmetric nonnegative definite matrix whose eigenvalues are all nonnegative.
© 2012 by Taylor & Francis Group, LLC
M e t h o D s o F M o D eRn ReG Re s sI o n A n A LysIs
141
Since there exists a positive definite matrix Q satisfying H = QDQ τ , D = diag(d 1 , d 2 ,…, d m ) where di (i = 1, 2, . . ., m) are the real eigenvalues of H. Since H 2 = H, we have (QDQ τ )(QDQ τ ) = QDQ τ
that is,
QD 2Q τ = QDQ τ Therefore, D 2 = D, d i2 = d i (i = 1, 2,..., m), which means di is either 0 or 1. Note that tr(UV) = tr(VU); we have tr( H ) = tr(QDQ τ ) = tr( DQ τQ ) =
∑d
i
rank( H ) = rank( D ) Since D = diag(d1, d2, …, dm), and di can only be 0 or 1, Equation 3.132 is proved. Note that e ~ N(0, σ2 Im). Let η = Q τ e. Then, η ~ N(0, σ2 Im). Therefore, e τ He ητ D η = = σ2 σ2
∑d η
2 i
i
σ
2
2 χ ( tr( H ))
lemma 3.8 If e ~ N(0, σ2 Im), then τ
τ X RR ( X RR X RR )−1 X RR e 2 σ
2 2 ∼ X (n − p )
Proof Note that τ τ rank( X RR ) = n − p, H = X RR ( X RR X RR )−1 X RR
© 2012 by Taylor & Francis Group, LLC
14 2
M e A suReM en t DAtA M o D eLIn G
Since H is an idempotent symmetrical matrix, τ τ tr( H ) = tr( X RR ( X RR X RR )−1 X RR )=n− p
Then, using Lemma 3.7, Equation 3.65 holds. lemma 3.9 If e ~ N(0, σ2 Im), then
RSS χ 2 (m − n). 2 σ
Proof Note that Y = Xβ + e RSS = Y − X ( X τ X )−1 X τ Y
2
2
= ( X β + e ) − X ( X τ X )−1 X τ ( X β + e ) = [ I m − X ( X τ X )−1 X τ ]e
2
= e τ [ I m − X ( X τ ′ X )−1 X τ ]e Since Im − X(X τ X)−1 X τ is an idempotent symmetric matrix of rank m−n, Lemma 3.9 follows Lemma 3.7 straightforwardly. Using Lemmas 3.8, 3.9, and 3.2, we can prove Theorem 3.7. The proof is left as an exercise. Statistic CP has the following properties. Theorem 3.13 Under the same assumptions of models 3.109 and 3.110, if e ~ N(0, σ2Im), we have (m − n) X RR β R 2(n − p ) E(C P ) = p + + m−n−2 (m − n − 2)σ 2
2
4 X RR β R (m − n)2 2(n − p ) + Var(C P ) ≥ 2 σ2 (m − n − 2)
© 2012 by Taylor & Francis Group, LLC
(3.134) 2
(3.135)
14 3
M e t h o D s o F M o D eRn ReG Re s sI o n A n A LysIs
Proof Let τ V = X RR ( X RR X RR )
−1 2
τ , α = σ −1 ( X RR X RR )1 / 2 β R , η = σ −1V τ e
Then V τV = I n − p (orthogonal ), η ∼ N(0, I n − p ), α
2
τ X RR β R 1 β τR X RR = 2 X RR β R σ2 σ
=
2
Using Equation 3.94, we have 1 X RR β R σ2
2
=
2 2 1 τ −1 τ β X X X X X e + ( ) = Vα +Vη RR R RR RR RR RR 2 σ
= V (α + η) τ
2
τ
= (α + η) V V (α + η) =
n− p
∑ (α i =1
i
+ ηi )2
n− p X RRβ R ∼ χ n − p, α i2 = χ 2 n − p, σ2 i =1 2
∑
2
By a property of the noncentral χ2 distribution in Appendix 2, we know that 2 1 1 E 2 X RR β R = 2 X RR β R σ σ
2
+n− p
2 4 1 Var 2 X RR β R = 2(n − p ) + 2 X RR β R σ σ
(3.136)
2
(3.137)
RSS Using Lemma 3.9 we know 2 ∼ χ 2 (m − n), the density funcσ tion of χ2(m) is
© 2012 by Taylor & Francis Group, LLC
14 4
M e A suReM en t DAtA M o D eLIn G
− m2 −1 m − x2 m2 −1 2 Γ e x , x > 0 2 f (x ) = 0, x≤0 m −n − σ2 m − n 2 Γ −1 E = 2 2 RSS
=2 =
−
m −n 2
+∞
∫
0
m − n 2 Γ −1 2
−
x
x −1e 2 x
m −n −1 2
m −n −1 2
dx
m − n − 1 Γ 2
1 m−n−2
(3.138)
Similarly, 2
m −n − σ2 m − n E = 2 2 Γ −1 2 RSS
=
∫
+∞
0
−
−2
x 2
x e x
m −n −1 2
1 (m − n − 2)(m − n − 4)
dx
(3.139)
Using Lemma 3.6, X RR β R CP = 2 p − n + σ 2 = 2 p − n + (m − n)
2
X RR β R
2
σ2
σ2 RSS
Using Lemma 3.2 we know that β is independent of RSS. So we have Equations 3.136 and 3.138.
X RR β R E(C p ) = 2 p − n + (m − n)E σ2
2
1 = 2 p − n + (m − n) 2 X RR β R σ
σ2 E RSS 2
1 + n − p . m−n−2
(m − n) X RR β R 2(n − p ) = p+ + m−n−2 (m − n − 2)σ 2
© 2012 by Taylor & Francis Group, LLC
2
14 5
M e t h o D s o F M o D eRn ReG Re s sI o n A n A LysIs
X RR β R Var(C p ) = (m − n) E RSS
2
2 X RR β R − E RSS
=
(m − n) 1 E X RR β R 4 (m − n − 2)(m − n − 4) σ
=
(m − n)2 m − n − 2 1 E X RR β R (m − n − 2) m − n − 4 σ 4
2
X RR β R (m − n)2 Var ≥ σ2 (m − n − 2)2 =
(m − n)2 (m − n − 2)2
4
2
− (m − n)2
4
−
2
2
(E
2
X RR β R
2
)
2
(m − n − 2)2 σ 4
(
1 E X RR β R 4 σ
2
) 2
2 4 2(n − p ) + σ 2 X RR β R
Remark 3.5 1. Theorem 3.13 shows that CP is not an unbiased estimation of 2 E X P β P − X β X RR β R = + p 2 2 σ σ
However, since m >> n in processing dynamic measurement data, we have E(C p ) ≈ p +
X RR β R
2
σ2
(3.140)
2. Furthermore, when || XRRβR ||2 ≤ σ2, E(CP) ≤ p + 1 when XRβR is eliminated. 3. When n − p (the number of variables eliminated) or
X RR β R
2
σ2 is relatively large, Var(CP) is also large. CP can be very different
© 2012 by Taylor & Francis Group, LLC
14 6
M e A suReM en t DAtA M o D eLIn G
from E X P β P − X β
2
σ2.
Therefore, optimal reduced models
2 selected by CP = min and E X P β P − X β = min may be very different. This is a situation that needs particular attention when CP criterion is applied. Reference 11 discusses such issues in detail and proposes some improvements on CP statistic. When all variables in the full model have either very close relation or almost no relation to the model, using CP statistic as a criterion to select an optimal reduced model is the most effective.
In particular, Var(CP) becomes large as n − p increases. Other statistics for variable selection have the same issue. Therefore, it is important to provide a precise full model. If relevant background shows that some variables have no relationship or are negligible to the model, such variables should not be included in the model. In general, the number of variables eliminated, that is, n − p should be <3 or at most 2n . 5, and p ≥ 3 Using Theorem 3.13 and Equation 3.120, ||XRRβR||2 ≤ σ2, we can determine the optimal reduced model by the following criterion. CP criterion: A criterion to select an optimal reduced model 3.109 from the full model 3.108 is C P ≤ p + 1 C = min P
(3.141)
3.4.3 Fast Algorithms to Select Optimal Reduced Regression Model
If there are n variables in a full model 3.108, an optimal reduced model has to be chosen from 2n reduced models. When n is large, the selection is computational intensive and fast algorithms are needed. Detailed analysis of the modeling process is very helpful for the selection of optimal reduced models. Some worthy notes are as follows: 1. Based on engineering background and mathematical knowledge, variables that are empirically identified as having little impact on the model should be removed from the model. For example, when high order polynomials are used to
© 2012 by Taylor & Francis Group, LLC
M e t h o D s o F M o D eRn ReG Re s sI o n A n A LysIs
147
approximate function f (t), that is, f (t) = a0T0(t) + a1T1(t) + … + anTn(t), the high-order term Ti(t) has little impact on the model and should be eliminated. 2. If the optimal reduced model contains p variables, the number of variables eliminated n − p should be small or at most 5 and p ≥ 2n / 3 based on practical requirements and CP criterion. On the other hand, Equation 3.135 shows that Var(CP) becomes large as n − p increases (Var(CP) > 2(n − p)). If n − p is too large, CP criterion is ineffective. 3. When n ≤ 20, an all-subset selection based on the CP criterion (3.141) is computationally feasible. i. Eliminate the redundant variables. The two lemmas given below identify redundant variables. Removing redundant variables reduces the workload of computation in optimal selection of regression models. It can be observed that the elimination process is sequential, is convenient for manual intervention, and thus, can be facilitated easily with engineering analysis. lemma 3.10 Suppose that XR has n − p columns and X RR β R
2
≤ (n − p )σ 2
(3.142)
Then, CP ≤ p
(3.143)
holds for optimal reduced models if n − p variables are eliminated. Proof Using Lemma 3.6, X RRβ R J C P = P2 = σ σ 2
© 2012 by Taylor & Francis Group, LLC
2
+( 2 p − n) ≤ (n − p ) + ( 2 p − n) = p
14 8
M e A suReM en t DAtA M o D eLIn G
lemma 3.11 Let S = (sij)n×m = (X τ X)−1 and β i
2
≤ sii σ 2
(3.144)
If the ith variable is eliminated, the optimal reduced model satisfies CP ≤ n − 1
(3.145)
Proof Without loss of generality, let XR = Xn. Then X RR β R
2
τ −1 X RR = β n2 B −1 = β n2 snn = β n2 X RR
(3.146)
where the third equality of equation (3.146) holds because Lemma 3.4 shows that B = [(Xτ X)−1]nn = snn. 2 2 If β ≤ s σ 2 , then X β ≤ σ 2 . n
nm
RR R
Using Lemma 3.7, n − p = 1, so
CP ≤ p = n − 1 Inequality 3.144 shows that, if there are several variables satisfying Equation 3.144, the jth variable should be eliminated such that CP attains minimum where β j
2
s jj
= min i
β i
sii
2
(3.147)
In summary, we have the following algorithm to eliminate a redundant variable sequentially: 1. {1, 2, . . ., n} ⇒ P , n ⇒ p 2. β P = ( X Pτ X P )−1 X Pτ Y , β P ⇒ β,( X Pτ X P )−1 ⇒ S = ( sij ) p × p
© 2012 by Taylor & Francis Group, LLC
M e t h o D s o F M o D eRn ReG Re s sI o n A n A LysIs
14 9
2 β i 3. Ti = , i ∈ P ,T j = min Ti i ∈P sii 4. When CP\{ j} > p − 1, turn to step (5). Otherwise, let P \{ j } ⇒ P , p − 1 ⇒ p . Then turn to (2)
5. End
ii. Determine essential variables. For convenience, models after eliminating redundant variables are called full models. P, R are sets of subscripts of X P, XR , respectively. lemma 3.12 Let X, be X’s ith column, i ∈ P , i ∈ R, sii be the ith diagonal element of S = (X τ X)−1, and 2 β i Ti = sii σ 2
Then, X RRβ R
2
(3.148)
≥ Ti σ 2 .
Proof We use the method of proof by contradiction. If 2 = sii−1 β i X RR β R < Ti σ
2
that is, τ X RR β R < sii−1 β i β τR X RR
2
(3.149)
Using Lemma 3.5, β R β Rτ < sii−1 β i
2
(X
τ RR
X RR
)
−1
thus, β i
2
2 < sii−1 β i sii = β i
which is self-contradictory.
© 2012 by Taylor & Francis Group, LLC
2
(3.150)
15 0
M e A suReM en t DAtA M o D eLIn G
Theorem 3.14 For full model 3.108, suppose that the number of variables in an optimal reduced model is l and (3.151)
Ti ≥ 2(n − l ) Then, Xi ’s corresponding Ti ’s are essential variables. Proof Since CP =
Y − X pβ P σ 2
2
X RRβ R + 2p − m = σ 2
2
+2 p − n
C p = m − n + 2n − m = n, and Ti ≥ 2(n − l ), i ∈ R when p = n Furthermore, since, Ti ≥ 2(n − l), i ∈ R, we have X RR β R CP = + 2 p − n ≥ Ti + 2 p − n ≥ 2(n − l ) + 2 p − n σ 2 = n + 2( p − l ) > n 2
Therefore, Cp > n holds for reduced models without Xi. Such models cannot be optimal because all optimal models satisfy Cp = min, Cp ≤ p + 1. iii. Regression on all subset of variables We have discussed the elimination of the redundant variables and the determination of the necessary variables. After such an elimination and determination, the number of variable to be determined can be greatly reduced. For convenience, we denote the full model as Y = Z B α B + ZC αC + Z D α D + e , e ∼ N(0, σ 2 I m )
(3.152)
where ZB, ZD are essential and redundant variables. After eliminating redundant variables, model 3.152 becomes
© 2012 by Taylor & Francis Group, LLC
M e t h o D s o F M o D eRn ReG Re s sI o n A n A LysIs
Y = Z B α B + ZC αC + e , e ∼ N(0, σ 2 I m )
151
(3.153)
Obviously, model (Equation 3.153) can take the place of model 3.152 and it has fewer independent variables than model (Equation 3.152). The optimal reduced model from model (Equation 3.153) contains all columns of ZB and some columns of ZC . Only model (Equation 3.153) is considered next. Let Z P = (Z B , Z 2 P ), ZC = (Z 2 P , Z R ), Z = (Z B , ZC ) = (Z P , Z R ), αB α 2P αP , = , = α α αP = α α C α 2P R R Then, model (Equation 3.153) can be rewritten as Y = Z B α B + ZC αC + e = (Z B
αB Z2P ) − Z 2 P α 2 P + (Z 2 P α 2P
α 2P ZR ) +e αR (3.154)
= Z P α P + Z R α R + e , e ∼ N(0, σ 2 I m )
If rank (Z) = n or matrix Z is full column rank, ZB, ZC , Z2P are all full column rank. Let rank(Z B ) = p1 , rank(Z 2 P ) = q
D
rank(Z P ) = rank(Z B ) + rank(Z 2 P ) = p1 + q = p Suppose that the optimal reduced model is Y = Z P α P + e , e ∼ N(Z R α R , σ 2 I m )
(3.155) 2
Let α = (Z τZ )−1 Z τY , α P = (Z Pτ Z P )−1 Z Pτ Y , σ 2 = Y − Z α /(m − n) and
RSSP = Y − Z P α P
© 2012 by Taylor & Francis Group, LLC
2
= Y − Z P (Z τp Z P )−1 Z Pτ Y
2
2
= ( I − H P )Y .
15 2
M e A suReM en t DAtA M o D eLIn G
Then, CP =
RSSP + 2p − m σ 2
(3.156)
Let H B = Z B (Z Bτ Z B )−1 Z Bτ
(3.157)
X P = ( I − H B )Z 2 P , X R = ( I − H B )Z R
(3.158)
βP = α 2 P , βR = α P
(3.159)
Y = ( I − H B )Y , β P = ( X Pτ X P )−1 X Pτ Y
(3.160)
F = (Z Bτ Z B )−1 Z Bτ Z 2 P , G = ( X Pτ X P )−1
(3.161)
Using Lemma 3.4, we have (Z Bτ Z B )−1 + FGF τ (Z Pτ Z P )−1 = −GF τ
− FG G
(3.162)
lemma 3.13 Using Equations 3.157 through 3.161 RSSP = Y − Z P α P
2
= Y − H P Y
2
(3.163)
where H P = X P ( X Pτ X P )−1 X Pτ . Proof Since Z Pτ (Y − H P Y ) = Z Pτ (Y − Z P (Z Pτ Z P )−1 ZPτ Y ) = 0 , we have Y
2
= Y − HPY + HPY
© 2012 by Taylor & Francis Group, LLC
2
= ( I − H P )Y
2
+ HPY
2
15 3
M e t h o D s o F M o D eRn ReG Re s sI o n A n A LysIs
Then, RSSP = Y − Z P α P
2
= ( I − H P )Y
2
= Y
2
− HPY
2
(3.164)
Since (Z Bτ Z B )−1 + FGF τ Z Bτ Y − FGZ 2τP Y = (Z Bτ Z B )−1 Z Bτ Y + FG (Z 2τP H B − Z 2τP )Y = (Z Bτ Z B )−1 Z Bτ Y − FGX Pτ Y
(3.165)
−GFZ Bτ Y + GZ 2τP Y = G (Z 2τP − Z 2τP H B )Y = GX Pτ Y (3.166) H P Y = Z P (Z Pτ Z P )−1 Z Pτ Y = (Z B
(Z Bτ Z B )−1 + FGF τ Z2P ) −GF τ
− FG G
= (Z B
(Z Bτ Z B )−1 Z Bτ Y − FGX Pτ Y Z2P ) GX Pτ Y
Z Bτ Y τ Z 2 P Y
we have = Z B (Z Bτ Z B )−1 Z Bτ Y − Z B FGX Pτ Y + Z 2 P GX Pτ Y = H B Y − Z B FGX Pτ Y + Z 2PP GX Pτ Y = H B Y + X P GX Pτ Y = H B Y + X P ( X Pτ X P )−1 X Pτ Y = H B Y + H X P Y
(3.167)
Using Y
2
= Y − H BY
2
= Y
2
− H BY
2
(3.168)
we have Y − H X P Y Similarly, Y
© 2012 by Taylor & Francis Group, LLC
2
2
2
= Y
= Y − H BY + H BY
2
− H X P Y
2
= Y − H BY
(3.169) 2
+ H BY
2
15 4
M e A suReM en t DAtA M o D eLIn G
Therefore, RSSP = Y
2
− HPY
2
= Y
= Y
2
− H BY
2
− H XP Y
= Y
2
− H XP Y
2
2
− H BY + H XP Y
2
2
= Y − H X P Y
2
(3.170)
because H X P Y
2
H X P ( I − H B )Y
2
= H XP Y − H XP H BY
2
= H XP Y
Note that Y − H P Y RSSP CP = + 2 p − m 2 σ 2 σ
2
+ 2 p1 + 2q − m (3.171)
The full model is Y = ZB α B + Z2 P α 2 P + ZR α R + e
(3.172)
and
(
H B Y = Z B Z Bτ Z B
)
−1
Z Bτ Y = Z B α B + H B Z 2 P α 2 P + H B Z R α R + H B e
Hence, ( I − H B )Y = ( I − H B )Z 2 P α 2 P + ( I − H B )Z R α R + ( I − H B )e that is, Y = ( I − H B )Y = X P α 2 P + X R α R + e X P β P + X R β R + e = X β + e where X = (XP
X R ) = ( I − H B )(Z 2 P
= ( I − H B )e
© 2012 by Taylor & Francis Group, LLC
Z R ) = ( I − H B )ZC , e
M e t h o D s o F M o D eRn ReG Re s sI o n A n A LysIs
15 5
rank( X ) = rank(( I − H B )ZC ) = n − p1 rank( X P ) = rank(( I − H B )Z 2 P ) = rank(Z 2 P ) = q rank( X R ) = rank(( I − H B )Z R ) = rank(Z R ) = n − p1 − q = n − p Let the optimal reduced model from the full model be Y = X P β P + e *
(3.173)
Then we have the following theorem. Theorem 3.15 Selecting an optimal reduced model Y = Z P α P + e from Y = ZBαB + ZCαC + e is equivalent to finding an XP from Y = X P βP + X R βR + (1 − H B )e such that Y − H P Y σ 2
2
+ 2 p1 + 2q − m = min
(3.174)
Therefore, the problem of model selection essentially becomes a problem of calculating residual sum of squares. Since X has n – p1 columns, there are 2n-p1 possible combinations and it is a tedious task to calculate the residual sum of square for each combination. Many scholars have done a lot of research on such issues. The work of Furnival and Wilson [1] introduced next a scanning algorithm to solve linear equations and designed a lexicographical order to lessen computational workload which will be discussed next. Definition 3.4 Suppose A = (aij)m×n, aij ≠ 0. Let bij = aij−1 aij a ji , b ji = , j ≠i bij = a a ii ii bkl = akl − ail aki aii−1 , k ≠ i , l ≠ i B = (bij )m × n
© 2012 by Taylor & Francis Group, LLC
15 6
M e A suReM en t DAtA M o D eLIn G
The transformation from A to B is called an S operation at pivot aii. Let B = Si A. It is easy to prove that SiSi A = A, Si Sj A = SjSi A
Let
N = n − p1 = 1, A = (aij ) N × N
X T Y Y T Y
XT X = T Y X
(3.175)
Theorem 3.16 Suppose that XP consists of columns i1,i2,…,iq in X. Let B = Si1, Si2, . . ., Siq A. Then,
(
)
−1
1. X Pτ X P is the submatrix consisting of entries of B at rows i1, i2, . . ., iq and columns i1,i2,…, iq.
(
2. B = (bij ) N × N ,(bi1N , bi2 N , . . ., biq N )τ = X Pτ X P 2 3. bNN = Y − H X r Y .
)
−1
X Pτ Y .
For proof, see Ref. 1. 3.4.4 Summary
Sections 3.2 and 3.3 discussed modeling issues in processing dynamical measurement data. When we transform a problem in dynamical measurement data processing into a problem in regression analysis, the fewer the number of parameters to be estimated in the regression model the better. The aim of this section is to achieve such a goal. For a given set of basis functions, the number of parameters to be estimated is further reduced by variable selection. Although the discussion in this section focuses on linear regression model only, many criteria only relate to residual sum of squares and the number of parameters to be estimated. Moreover, many nonlinear models can be approximated by linear models through iterative
© 2012 by Taylor & Francis Group, LLC
M e t h o D s o F M o D eRn ReG Re s sI o n A n A LysIs
15 7
processing. Therefore, except the all-subset selection algorithm in 3.4.3, most methods introduced in this section are also applicable to nonlinear models. 3.5 Biased Estimation in Linear Regression Models 3.5.1 Introduction
A lot of study has been done in the linear regression model Y = X β + e , X = X m × n , e ∼ N(0,σ 2 I )
(3.176)
(see Refs. 1–3, 8–17, 18–23). Section 3.2.4 shows that LS estimators are inefficient when matrix X τ X is ill-posed. In such a case, methods of biased estimation are popularly used. Currently, commonly used methods of biased estimation are ridge estimation, generalized ridge estimation, principal component estimation, improved principal component estimation, etc. However, biased estimates need to be applied with practical experience in many cases [1]. In particular, there are cases where estimation results are different for the same model and data when different methods of biased estimation are applied [2]. Because of this, applications of methods of biased estimation are limited. Ref. 8 gives detailed summary and analysis of bias estimation, proposes some new ideas, and provides numerical examples. The following discussion is from Ref. 8. Definition 3.5 Suppose that β , β are unbiased least squares estimates and biased estimate of parameter β in model 3.176, 2 2 ρ = E β − β E β − β
is called the estimation efficiency of a biased estimate β of β. What are the differences and relations among biased estimates? How to select methods of biased estimation? How to improve estimation efficiencies of biased estimates? These are questions to be answered in the application of biased estimates.
© 2012 by Taylor & Francis Group, LLC
15 8
M e A suReM en t DAtA M o D eLIn G
It is a problem worthy of study to construct a scale factor for a linear model such that a biased estimation of the linear model achieves high efficiency. Suppose Q is a known nonsingular matrix. Let Z = XQ , α = Q −1β
(3.177)
Then model 3.176 can be rewritten as Y = Z α + e , e ∼ N(0,σ 2 I )
(3.178)
by If the estimate of α can be obtained by Equation 3.178, say α, Equation 3.177 β = Qα is an estimate of β. If α is the least squares. estimate of α, then, no matter what value Q takes, β = Qα ≡ ( X τ X )−1 X τ Y
If α is a biased estimate of α, β = Qα is dependent on Q. How to choose Q such that β has relatively high estimation efficiency? This section focuses on the construction of biased estimates of compression type, clarifies differences and relations among such biased estimates, and proposes new methods of determining ridge parameters and new concepts of principal components. In order to improve the efficiency of biased estimation, scale factors are used to normalize the models. The construction and effect scale factors are also discussed. 3.5.2 Biased Estimates of Compression Type
Next, we discuss the estimation of regression coefficient β’s in model 3.176. Let X τ X = P ΛP τ , V = XP Λ = diag(λ 1 , λ 2 , . . ., λ n ), P = ( P1 , P2 , . . ., Pn ) τ τ τ −1 τ θ = P β, u = P ( X X ) X e β = ( X τ X )−1 X τ Y , θ = P τ β = θ + u
© 2012 by Taylor & Francis Group, LLC
(3.179)
M e t h o D s o F M o D eRn ReG Re s sI o n A n A LysIs
15 9
where 0 < λ1 ≤ λ2 ≤ … ≤ λn are the eigenvalues of X τ X, and P is an orthogonal matrix. Using the same notations as those in Equation 3.179, the canonical regression model of model 3.176 is Y = V θ + e , V τV = Λ, e ∼ N(0,σ 2 I )
(3.180)
lemma 3.14 For linear models 3.176 and 3.177, u = Λ −1V τ e , u ∼ N(0,σ 2 Λ −1 ) X θi Pi 2 = λ i θi2 2 2 Xβ = V θ =
n
∑ i =1
(3.181)
(3.182)
λ i θi2
E θ i − θi 2 = σ 2 λ i−1 2 2 E λ i θi θi −θi = min E r θ i − θi r λ i θi2 + σ 2
2
(3.183)
For convenience, β is expressed as β=
n
∑θ P i =1
i i
(3.184)
The estimation of β’s becomes the estimation of θ’s. Using Lemma 3.14, we have the following theorem. Theorem 3.17 The ridge, generalized ridge, principal, and improved principal component estimates of the regression coefficients in linear regression model 3.176 can be written in the following form:
© 2012 by Taylor & Francis Group, LLC
16 0
M e A suReM en t DAtA M o D eLIn G
β =
n
∑ k (θ i =1
i
i
+ ui ) Pi
(3.185)
where 0 ≤ ki ≤ 1 (i = 1, 2, . . ., n) are called the “compression coefficients.” In particular, when k (i = 1,2, . . ., n) are kiR =
kiPC
λi λ i θi2 , kiGR = λi + k λ i θi2 + σ 2
0, i ≤ r (λ i θ 2 ≤ σ 2 ) 0, λ i θi2 < σ 2 MPC = = , ki 2 2 2 2 1, λ i θi ≥ σ 1, i > r (λ i θ > σ )
the four biased estimates are obtained, respectively Note that 0 ≤ ki (i = 1,2,…,n). We have β ≤ β
(3.186)
Therefore, estimates of type in Equation 3.185 are called β’s compression-type biased estimation. The difference of four biased estimates in Theorem 3.17 lies in different values of ki ’s. Values of ki ’s in ridge and principal component estimates are determined by λi ‘s. 2 When λi is relatively small, E θi − θi = σ 2 λ i−1 is large. Generalized ridge and improved principal component estimates use λ i θi−1 to determine ki ’s. When λ i θi−1 is relatively small, || XθiPi || is relatively small too. Since Eui2 = σ 2 λ i−1 has a relatively large proportion in θi + ui. Based on the analysis above, the constructions of ki ’s in all four compression-type biased estimation are reasonable. Definition 3.6 When λi θi−1 is relatively large (larger than σ2), β ’s component θiPi makes a large contribution to Xβ and θi takes a large proportion in θi + ui. Therefore, θiPi is called β’s principal component. Remark 3.6 Based on the previous analysis, compression-type biased estimates have a relatively high estimation efficiency when one of the following holds
© 2012 by Taylor & Francis Group, LLC
M e t h o D s o F M o D eRn ReG Re s sI o n A n A LysIs
161
θ12 ≤ θ 22 ≤ ≤ θn2
(3.187)
λ 1θ12 ≤ λ 2 θ 22 ≤ ≤ λ n θn2
(3.188)
In such a case, four biased estimates are consistent. Since λ1 ≤ λ2 ≤ … ≤ λn, it is better when inequality 3.187 holds. Moreover, the more the parameters on the left side of inequality 3.187 are smaller than those on the right side the better. In practice, since σ2 and θi2 (i = 1, 2, . . ., n) are unknown, σ 2 and 2 max{0, θi − σ 2 λ i−1 } are often used as estimates of σ2 and θi2 in calculating ki ’s . When m >> n, σ 2 is a precise estimate of σ2. However, the mean square error of using θi 2 − σ 2 λ i−1 as an estimate of θi2 is relatively large when λi is relatively small. 3.5.3 A New Method to Determine Ridge Parameters
Currently, the mostly widely used methods of biased estimation are ridge estimation and generalized ridge estimation. However, the key to determining ridge parameters is worth further study. Consider the following model: Y = X β + e , e ∼ N(0,σ 2 I )
(3.189)
where X = X m × n , rank( X ) = n, m − n ≥ 19 Let N = {1, 2, . . ., n}, CardM = the number of elements in set M (3.190) L = ( X τ X )1 / 2
(3.191) ∧
© 2012 by Taylor & Francis Group, LLC
α = P τ Lβ = Λ1 / 2 θ, α = P τ Lβ = Λ1 / 2 θ
(3.192)
S = P τ L( X τ X )−1 X τ e = Λ1 / 2u
(3.193)
16 2
M e A suReM en t DAtA M o D eLIn G
lemma 3.15 Suppose that e ~ N(0,σ2 Im). Using the notations in Equations 3.191 through 3.193, we have α = α + S , S ∼ N(0, σ 2 I m ) α
2
2
= Xβ , α − α
σ2
2
= X β
2
(3.195)
2
∼ χ 2 (n)
(3.196)
∼ F (1, m − n), i ∈ N
(3.197)
σ2 α i − α i
α
(3.194)
2
is determined, estimate of β’ s is determined If the estimate of, α, − 1 by β = L P α . The estimation of α is given as follows. Using Lemma 3.15,
{
}
2
P α i − α i < 3σ 2 ≥ 0.9, m − n ≥ 19, i ∈ N E α − α
2
= E(nσ 2 ) = nσ 2
(3.198) (3.199)
The solution α of the extreme value problem (Equation 3.200) is used as an estimate of α. n λ i−1α i2 = min i =1 2 2 α − α = n σ 2 2 α i − α i ≤ 3σ , i ∈ N
∑
The rationale of Equation 3.200 is as follows. Note that
© 2012 by Taylor & Francis Group, LLC
(3.200)
M e t h o D s o F M o D eRn ReG Re s sI o n A n A LysIs
β
2
≡
n
∑λ i =1
16 3
−1 i
α i2
2 2 E β = β + σ 2
n
∑λ i =1
−1 i
Since some λi ’s are small, the least squares estimate is not good 2 because ||β|| is large. It is expected that the length of estimate is as short as possible, which is the rationale of the first equation of Equation 3.200. The rationale of the second equation of Equation 3.200 is straightforward because E α − α
2
2 ) = nσ 2 = E(nσ
The rationale of the third equation of Equation 3.200 is Equation 3.197. Since α
2
= Xβ
2
>> e
2
> nσ 2
and E α
2
= α
2
+ nσ 2
we assume in the following discussion that
∑ α i ∈M
2 i
2n 2 > n σ , ∀M ⊂ N , Card M ≥ 3
(3.201)
lemma 3.16 Under the assumption of Equation 3.200, solutions to the extreme value problems (Equations 3.200 and 3.202) are the same n λ i−1α i2 = min i =1 α − α 2 − nσ 2 ≤ 0 α − α i 2 − 3σ 2 ≤ 0, i ∈ N
∑
© 2012 by Taylor & Francis Group, LLC
(3.202)
16 4
M e A suReM en t DAtA M o D eLIn G
Proof Let
z = {z1 , z2 , . . .,zn + 1 }, r = {r1 , r2 , . . ., rn + 1 } L(α, z, r ) =
n
∑λ i =1
+
−1 i
α i2 + rn + 1 ( α − α
n
∑r ( α i =1
i
i
− α i
2
2
− nσ 2 + zn2+ 1 )
− 3σ 2 + zi2 )
Using the method of Lagrange multiplier, the solution to Equation 3.202 is determined by the following four equations: αi =
ri + rn + 1 α i , i ∈ N λ + ri + rn + 1 −1 i
ri zi = 0, i = 1, 2,, n + 1 2 (α i − α i )2 + zi2 = 3 σ , i ∈ N
α − α
2
+ zn2+ 1 = n σ
2
(3.203) (3.204) (3.205) (3.206)
In order to prove Lemma 3.16, we only need to prove that zn2+ 1 = 0 in Equation 3.206. In fact, if zn2+ 1 ≠ 0, using Equation 3.204 rn+1 = 0, we have 2
i )2 = αi , i = 1, 2,, n (α i − α 1 + λ i ri Equations 3.201, 3.204, and 3.205 contradict Equation 3.206. Note that the objective function of extreme value problem (Equation 3.207) is strictly convex and the feasible region is also convex. Then, using Lemma 3.16 and Kuhn–Tucker theorem [30], we have the following theorem. Theorem 3.18 Under the assumption of Equation 3.201, the unique solution to extreme value problem (Equation 3.200) can be expressed as
© 2012 by Taylor & Francis Group, LLC
M e t h o D s o F M o D eRn ReG Re s sI o n A n A LysIs
α i =
ri + rn + 1 α i , i ∈ N λ + ri + rn + 1 −1 i
16 5
(3.207)
where r = {r 1, r 2, . . ., rn+1} is determined by the following equations: 2 i α ri ≥ 0, ≤ 3σ 2 , i ∈ N 1 + λ i (ri + rn + 1 ) 2 i α − 3σ 2 = 0, i ∈ N ri 1 + λ i (ri + rn + 1 ) 2 n i α rn + 1 ≥ 0, = nσ 2 λ 1 ( ) + + r r + 1 i i n i =1
(3.208)
∑
Note that the proof of Theorem 3.18 provides a process of finding r = {r 1, r 2, . . ., rn+1 }. Step 1. Solve equation
n
∑b i =1
i
= n σ 2 by the method of one-dimensional search,
and find rn+1 where 2 α i 2 3 σ 2 , 1 + λ r > 3 σ i n+1 bi = 2 2 α i α i 2 , ≤ 3σ 1 + λ i rn + 1 1 + λ i rn + 1
Step 2. After rn+1 is obtained, solve r = {r 1, r 2, . . ., rn+1 }, where 2 α i ≤ 3α 2 0, 1 + λ i rn + 1 ri = 2 2 αi −1 i α > 3σ 2 − 1 λ i − rn + 1 , 3σ 2 1 + λ i rn + 1
© 2012 by Taylor & Francis Group, LLC
16 6
M E A SUREM EN T DATA M O D ELIN G
Remark 3.7 Let Ri = (rn+1 + ri)−1 (i = 1, 2, . . ., n). Using Equations 3.191 and 3.192, β = L−1P∼α =
n
∑λ i =1
i
λi (θi + ui )Pi + Ri
(3.209)
β ’s in Equation 3.209 are the same as the generalized ridge estimate βGR’s in Theorem 3.17 and {R1, R 2, . . ., Rn} are the ridge parameters of the generalized ridge estimation. Equation 3.209 provides a new method of determining ridge parameters. Reference 1 recommends several methods of determining ridge parameters and all of them use (θi + ui )2 − σ 2 λi−1 2 2 as an estimate of θ or β − σ 2 i
n
∑λ i =1
−1 i
2
as an estimate of β . When
X τ X has some eigenvalues close to zero, these estimates in Reference 1 have relative large mean square errors. In contrast, determining ridge parameters using extreme value problem (3.200) does not depend on whether X τ X is ill-posed or not. Such a method is satisfactory in terms of numerical results shown in Section 3.5.5. 3.5.4 Scale Factors
Remark 3.6 shows that biased estimates have relatively high efficiencies when Equation 3.187 or 3.188 holds. A method of normalizing models using a scale factor Q such that Equations 3.187 and 3.188 hold is given next. First, we construct a proper scale factor Q. Let Z τZ = RDR τ ,
U = ZR
D = diag(d 1 , d 2 , , d n ) R = ( R1 , R2 , , Rn ) r = R τ α,
c = R τ (Z τZ )−1 Z τ e
α = (Z τZ )−1 Z τ Y ,
r = R τ α
where 0 < d1 ≤ d2 ≤ … ≤ dn are the eigenvalues of Zτ Z, and R is an orthogonal matrix.
© 2012 by Taylor & Francis Group, LLC
M e t h o D s o F M o D eRn ReG Re s sI o n A n A LysIs
16 7
Remark 3.6 shows that estimates from model 3.178 have relatively high estimation efficiencies if the following inequalities hold. r12 ≤ r22 ≤ ≤ rn2
(3.210)
lemma 3.17 Suppose that ||β|| ≠ 0, rank(X) = n, L = (Xτ X)1/2. There exists an orthogonal matrix W such that ||Lβ||−1 Lβ is the nth column of W. Theorem 3.19 Suppose that 0 < η1 ≤ η2 ≤ . . . ≤ ηn ≤ 10−5 ηn are n given positive numbers and L and W are matrices in Lemma 3.17. Let Q = L −1Wdiag(η1, η2, . . ., ηn in Equation 3.177. Then Z τZ = diag( η12 , η22 ,…, ηn2 ) = D
(3.211)
R = I , I is a unit matrix
(3.212)
c = D −1 / 2W τ L−1 X τ e , e ∼ N(0, σ 2 D −1 )
(3.213)
r1 = r2 = = rn − 1 = 0, rn = ηn−1 Lβ
(3.214)
r = r + c = D −1 / 2W τ L−1 X τ Y
(3.215)
Proof Equations 3.211 and 3.212 hold by the constructions of Z, L, Q. Note that L2 = X τ X , Z τZ = Q τ L2Q r = α = Q −1β = D −1 / 2W τ Lβ W τ Lβ = Lβ (0,…, 0, 1)τ Equations 3.213 through 3.215 hold. Equation 3.214 in Theorem 3.19 shows that, when a proper matrix Q is chosen, Equation 3.210 holds.
© 2012 by Taylor & Francis Group, LLC
16 8
M e A suReM en t DAtA M o D eLIn G
Suppose that model 3.178 has been normalized by scale factor Q. Then, by Equation 3.178, the compression-type biased estimate of α’s is determined as follows: α =
n
∑ k (r + c )R i =1
i
i
i
(3.216)
i
where the four methods of constructing ki ’s in Theorem 3.17 provide four compression-type biased estimates of α’s. Since 0 < η12 ≤ ≤ ηn2 − 1 ≤ 10 −5 ηn2 , we have 0, i ≤ n − 1 kiR ≈ kiGR ≈ knPC = knMPC = 1, i = n
(3.217)
Equations 3.216 and 3.217 show that the four compression-type biased estimates are basically the same after model 3.178 is normalized by scale factor Q. Theorems 3.20 Suppose that model 3.178 is normalized from model 3.176 by scale factor Q in Theorem 3.178 and α = (rn + cn )Rn , β = Qα . Then E X β − X β
2
= σ2
(3.218)
2 2 −2 E β − β = β X β σ 2
(3.219)
Proof Since Q is a nonsingular matrix, Xβ = Zα = Zr, X β = Z α , and E X β − X β
2
= E Z α − Z α
2
= E diag( η1, η2 ,…, ηn )(α − α ) = E ηn cn
© 2012 by Taylor & Francis Group, LLC
2
= σ2
2
M e t h o D s o F M o D eRn ReG Re s sI o n A n A LysIs
16 9
2 2 E β − β = E Q(α − α )
= E L−1Wn ηn cn = E L−1 Lβ = β
2
Lβ
= β
2
Xβ
−2
−1
2
Lβηn cn
2
σ2
−2
σ2
Remark 3.8 Recall that ||Xβ||2 >> ||β||2. Theorem 3.20 shows that using a scale factor Q to normalize the original model really improves efficiencies of biased estimates. we have By comparing β with the least squares estimate β, E X β − X β
2
= σ 2 << nσ 2
2 2 −2 E β − β = β X β σ 2 << σ 2
∑λ
−1 i
2 = E β − β
This shows that biased estimate β with a scale factor is very efficient in estimating both β and X β. Biased estimation methods have been widely applied in processing dynamical measurement data. Unfortunately, since parameter β is unknown in practical applications, it is hard to get an exact scale factor Q in Theorem 3.19. However, there is additional information on parameter β in many practical applications. Such additional information is useful to construct a second-best scale factor Q. Two common cases are given below. case 1: Suppose that β ~ N(0, σ2G). Let Q = G1/2. Then α = Q −1β, α ∼ N(0, σ 2 I ), r ∼ N(0, σ 2 I ) Er12 = Er12 = = Ern2 = σ 2
© 2012 by Taylor & Francis Group, LLC
(3.220)
17 0
M e A suReM en t DAtA M o D eLIn G
Using Remark 3.6, a biased estimate has a relatively high efficiency when it is from model (3.221)
Y = Z α + e , e ∼ N(0, σ 2 I ) case 2: If |βi| ≤ σi | (i = 1, 2, . . ., n) is known, then Q = diag(σ 1 , σ 2 ,…, σ n ) as a scale factor. 3.5.5 Numerical Examples
Three examples are given to demonstrate the roles the methods of determining ridge parameters and scale factors play. Example 3.6 We compare methods of estimating ridge parameters proposed in this section and those recommended by Ref. 1 and 3. Suppose β is the true value of the regression coefficient in model 3.176.
X τ X = P ΛP τ , Λ = diag(λ 1 , λ 2 , . . ., λ n ), θ = P τβ where P is an orthogonal matrix. Sections 3.5.2 and 3.5.3 show that estimating β is equivalent to estimating θ. All methods of biased estimation discussed in this section try to estimate θ accurately. Methods of biased estimation are used only when Xτ X has eigenvalues that are close to 0. In such a case, all estimation methods of ridge parameters in Ref. 1 and 3 have relative large mean square errors (Remark 3.7). The simulations next show advantages of estimation method of ridge parameters proposed in this section. Using Equations 3.191 and 3.193, 1
1
1
α = Λ 2 θ = Λ 2 (θ + u ) = Λ 2 θ + S , S ∼ N(0, σ 2 I )
(3.222)
For two groups of θ’s, generate 100 group S’s by taking σ2 = 1. Using ( j) Equation 3.222, 100 groups of θ’s are gained (θ , j = 1, 2, . . ., 100). Several estimates of θ are obtained by using θ ’s as follows:
© 2012 by Taylor & Francis Group, LLC
M e t h o D s o F M o D eRn ReG Re s sI o n A n A LysIs
171
1. Ridge estimation (its ridge parameter R is determined by the Madonald–Galarneau method [1]) θiR =
λi θi , λi + R
i = 1, 2,…, n
2. Generalized ridge estimation (its ridge parameter is determined by the Hemmerle–Brantle method [1,3]) 2
WR i
θ
2 λ θ i − σ = i , i = 1, 2,…, n λi θi2
3. Generalized ridge estimation (its ridge parameter is determined by the method in Section 3.5.3) θWi =
λi
λi + (rn + 1 + ri )
−1
θ i , i = 1, 2,…, n
The true values of λi ’s and θi ’s are given in Table 3.1 (the two group values of θ are θ(1) and θ(2)). Table 3.1 Eigenvalues and Canonical Regression Coefficients i 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
© 2012 by Taylor & Francis Group, LLC
λ1
θi (1)
θi (2)
0.22D − 02 0.484D − 0.2 0.106D − 01 0.234D − 01 0.101D + 02 0.101D + 02 0.102D + 02 0.105D + 02 0.112D + 02 0.127D + 02 0.158D + 02 0.299D + 02 0.383D + 02 0.722D + 02 0.147D + 03 0.311D + 03 0.672D + 03 0.147D + 04
3.081 1.228 2.143 −3.210 0.780 3.104 −2.691 1.499 0.366 1.944 0.202 0.944 −3.137 0.599 −0.803 0.004 5.997 1.770
−1.590 −0.399 2.515 0.132 −0.474 1.055 −0.896 −2.306 2.784 1.637 1.724 −4.276 −2.227 0.822 2.837 1.907 2.005 −0.219
17 2
M e A suReM en t DAtA M o D eLIn G
In order to evaluate different methods objectively, define that E
R
1 = 100
EB =
1 100
EW =
1 100
E
LS
1 = 100
100
∑ θ − θ
( j )R
2
θ − θ ( j ) B
2
j =1
100
∑ j =1
100
∑ θ − θ
2
( j )W
j =1
100
∑
θ − θ ( j )
2
j =1
where θ( j ) R, θ ( j )WB , θ ( j )W , and θ ( j ) are the jth simulated outcome of θ( j) from three ridge estimates (Madonald–Galarneau method, Hemmerle–Brantle method, and our method) and the least squares estimate. Table 3.2 shows that our method of estimating ridge parameters proposed in this section is much more superior to other two classical methods. The reason is that our method uses extreme value problem (3.200) to determine ridge parameters, which properly uses the fact that S ~ N(0, σ2 In), while the other two methods only use the fact that Si ~ N(0, σ2) or equivalently ui ∼ N(0, σ 2 λ i−1 ). It may be difficult to discuss the statistical properties of this new generalized ridge estimation without its explicit expression, but its potential wide application could be demonstrated by simulation. In practice, based on the parameters’ approximate ranges, we could compare the different methods’ efficiency by a large quantity of simulated computation (the theory foundation can be referred to in Ref. 24). Actually, the efficiency of biased estimation is also related to the different location of parameter β in parameter space. In simulation Table 3.2 θ(1) θ(2)
© 2012 by Taylor & Francis Group, LLC
Average mean square Errors of Estimates ER
E|| B
EW
ELS
0.2966D + 3 0.2645D + 3
0.2626D + 3 0.2317D + 3
0.3742D + 2 0.1876D + 2
0.7678D + 3 0.7688D + 3
173
M e t h o D s o F M o D eRn ReG Re s sI o n A n A LysIs
comparison, we should choose different values of θ for different groups and compare the average mean square errors. A large number of tests show that the new method in this section will bring one magnitude order less error than the least squares estimation and ridge estimation. Example 3.7 Consider the following linear regression model: Y = X β + e , e ∼ N(0, σ 2 I ) m = 24, n = 5, σ = 1, X = (xij )24 × 5
(3.223)
xij = 5 × 10 −10 i −2 [i + j 2 + 5 sin(ij )]8 The 24 elements of the error vector e simulated by computer are sequentially listed as follows: 1.552 0.299 −0.842
−1.345 −0.402 −0.108
−0.750 0.002 0.891
0.183 2.999 1.263
0.972 0.885 0.837
0.101 0.036 −0.407
−1.568 −0.297 1.925
0.472 −0.923 −0.772
Since this is a simulation, parameter β is known. We use the scale factor Q in Theorem 3.8; λi (i = 1, 2, . . . , 5) are eigenvalues of X τ X, and β = (β1, β2, . . . , β5) are true values of regression coefficients. β and β R are, respectively, the least squares and ridge estimates of β from model 3.222. β R = ( X τ X + RI )−1 X τY The ridge parameter R is determined by ||βR|| = ||β||. Let α* be the ridge estimate of α = Q−1β in the normalized model. Then, β* = (Q τ X τ XQ + R1 I)−1Q τ X τY whose ridge parameter R1 is determined by α ∗ = Q −1β , β∗ = Qα ∗ ⋅ β is the generalized ridge estimate from model 3.176 using our method. The results of the simulation are given in Table 3.3.
Table 3.3
simulation Results
i
βI
β i
βiR
β*i
β i
λI
1 2 3 4 5
−0.5597 −5.5001 1.1887 −0.7175 0.4183
1.815 −6.650 1.203 −0.772 0.430
−0.553 −1.514 0.230 −0.680 0.413
−0.553 −.5.573 1.122 −0.718 0.414
−0.453 −4.496 0.958 −0.665 0.414
0.831e − 10 0.675e + 00 0.367e + 01 0.349e + 03 0.187e + 05
© 2012 by Taylor & Francis Group, LLC
174
M e A suReM en t DAtA M o D eLIn G
Table 3.3 shows that a scale factor has great influence on biased estimation. The ridge estimate from a normalized model using a scale factor is much better than that from a model without being normalized by a scale factor. Example 3.8 Consider Example 3.2. Suppose that f (t ) = sin t + e t + (1 + 5t 2 )−1 , t ∈ [−1, 1] The error vector is simulated by {ei} i.i.d. ~ N(0, 0.01). Let y(ti ) = f (ti ) + ei , i = 0, 1, . . . , 200 P (t ) =
32
∑ β T j =1
j
j −i
(t )
32
P (t ) =
∑ β T j =1
j −1
(t )
where P (t) is a polynomial determined by the method in Example 3.2 such that 1
∫
1 − t 2 ΩP ′′ (t )Ω2 dt ≤ 20
−1
The coefficients (β 1 , β 2 , . . . , β 32 ) of P(t) are determined by the extreme value problem 200
2 Ωyi − P (t )Ω = min i =0 1 2 2 ′′ 1 − t ΩP (t )Ω dt ≤ 20 −1
∑ ∫
(3.224)
Then, coefficients of the polynomial obtained are the biased estimates of coefficients (β1, β2, . . . , β32) of the best approximation polynomial P * (t ) =
32
∑β T j =1
j
j −1
(t ).
From the precision perspective of data processing, P(t) is a better approximation of f(t) than P (t ).
© 2012 by Taylor & Francis Group, LLC
M e t h o D s o F M o D eRn ReG Re s sI o n A n A LysIs
17 5
3.6 The Method of Point-by-Point Elimination for Outliers 3.6.1 Introduction
The method of least squares has been widely used in parameter estimation of the following linear regression model: Yi = X i β + ei , Ee = 0, Ee e = σ 2 δ , i j ij i
i = 1, 2, . . . , T
i , j = 1, 2, . . . , T
due to its theoretical perfection and result richfulness. However, modern robust statistics show that the influence function of βLS is an unbounded function and its breakdown point ε*(βLS) = 0 . This means that when LS estimation is used, no outlier is allowed in the observational data set. Otherwise, LS estimates are not reliable. However, engineering practices show that outliers are unavoidable. Huber [20] proposed the following M estimation:
∑ ρ(Y − X β) = min i
i
(3.225)
where ρ is a continuous convex function. M estimation is one of the most popular methods in robust estimation. When ρ(τ) = τ2, M estimation becomes the least squares estimation. Huber expected to obtain a robust estimate through the construction of ρ. The questions are: How to construct a proper convex function ρ? How to analyze the estimation error of a nonlinear estimate? Furthermore, M estimates can only be obtained by iterative processes [18] and it is hard to obtain initial values [22]. Thus, it is important to identify and eliminate outliers before estimating parameter β. Reference 14 proposes the criteria of identifying and eliminating outliers and discusses their misidentification probabilities. Next, we introduce these criteria given in Ref. 14. Consider the following model: Yi = X i β + ei , i ∈ M 2 {ei , i ∈ M }i.i.d. ∼ N(0, σ ) Y = X β+ η , j ∈N j j j { η j , j ∈ N }i.i.d. ∼ N(0, kσ 2 ) Eei ηi = 0, i ∈ M , j ∈ N
© 2012 by Taylor & Francis Group, LLC
(3.226)
(3.227) (3.228) (3.229)
(3.230)
17 6
M e A suReM en t DAtA M o D eLIn G
where Xi is a row vector of dimension P, β is a column vector of dimension P, β is the vector of unknown parameters, and M and N are two unknown index sets. M = m, N = n, m + n = T M ∩ N = Φ, M ∪ N = {1, 2, . . . , T } A =
∑X X , i ∈M
i
τ i
B =
A >B
∑X X j ∈N
j
τ j
A is a nonsingular matrix β = ( A + B )−1
T
∑X Y i =1
τ i i
= ( X τ X )−1 X τ Y
Reference 14 constructed a criterion of low misidentification probability to identify index sets M and N by introducing the descending residual of least squares estimates. It also presented a corresponding fast algorithm. Reference 14 addressed two well-known examples of modeling pollution data in the literature. It carried out a simulation study at the same time and compared this with Huber’s M estimation. 3.6.2 Derivation of Criteria
Suppose that the corresponding design matrixes of index sets M and N are X* and X** in models (Equations 3.226 through 3.230). The dependent variables are Y* and Y**, Y = (Y*, Y**), X = (X*, X**) and then models (Equations 3.226 through 3.230) can be expressed as the following equations: Y = X β + e , e ∼ N(0, σ 2 I ) * m * 2 Y* * = X * *β + η, η ∼ N(0, kσ I n ) τ Ee η = 0
© 2012 by Taylor & Francis Group, LLC
(3.231)
(3.232) (3.233)
M e t h o D s o F M o D eRn ReG Re s sI o n A n A LysIs
17 7
For convenience, let X(t) be the remaining matrix where the ith row of X has been eliminated, Y(t) be the remaining vector where the tth element of Y has been eliminated, and β (t ) = ( X τ (t ) X (t ))−1 X (t )τ Y (t ). Let H = ( hij )T ×T = X ( X τ X )−1 X τ hi = hi (i = 1, 2, . . . , T )
It is easy to prove that τ
H = H, H = H, 2
T
∑h j =1
2 ij
= hij = hi
(3.234)
δ t2 1 − ht
(3.235)
lemma 3.18 Suppose δ t = Yt − X t β , (0 ≤ t ≤ T ) ; then Y − X β
2
2 = Y (t ) − X (t ) β (t ) +
Proof Suppose that t = 1. Using Ref. 1, it is known that (proof is left as an exercise) ( X τ X )−1 δ 1 X 1τ β − β (1) = 1 − h1 Y − X β
2
= δ 12 + Y (1) − X (1)β
Y (1) − X (1)β
(3.236)
2
= Y (1) − X (1)β (1) X (1)(β − β (1))
© 2012 by Taylor & Francis Group, LLC
2
2
=
2
2 + X (1)(β − β (1))
h1 δ 12 1 − h1
178
M e A suReM en t DAtA M o D eLIn G
Combining the four equations above leads to the following results:
Y − X β
2
= δ 12 + =
h1 δ 12 + Y (1) − X (1)β (1) 1 − h1
δ 12 + Y (1) − X (1)β (1) 1 − h1
2
2
Let δ t = Yt − X t β , ζ t =
δ t2 , (t = 1, 2, . . . , T ) 1 − ht
The following study shows that the larger the value of ξ the larger the probability that t ∈ N. For convenience, let h = min hii . 1≤ t ≤T
lemma 3.19 2 ( k − 1)σ 2 hii2 , σ + 1 − ht i ∈N Eζ t = 2 kσ 2 − ( k − 1)σ hii2 , 1 − ht i ∈M
∑ ∑
t ∈M t ∈N
(3.237)
In particular, ( k − 1)σ 2 ht 2 , ≤ σ + 1 − ht Eζ t 2 ≥ kσ 2 − ( k − 1)σ ht , 1 − ht
t ∈M t ∈N
(3.238)
Proof Suppose that t = 1. Under the assumptions of Equations 3.231 through 3.233, t ∈ M, we have
© 2012 by Taylor & Francis Group, LLC
M e t h o D s o F M o D eRn ReG Re s sI o n A n A LysIs
δ 1 = e1 − X 1 ( X τ X )−1 X 1τ e1 − X 1 ( X τ X )−1 − X 1 ( X τ X )−1 = (1 − h1 )e1 −
T
m
179
∑X e i=2
τ i i
∑Xe
τ j j
j =m +1 m
∑ i=2
h1i ei −
T
∑he
j =m +1
j j
Using Equation 3.234, Eδ = (1 − 2h1 ) + 2 1
h12j σ 2 i =1 j =m +1 T h12j σ 2 = (1 − h1 )σ 2 + ( k − 1) j = m + 1 T
∑
h + ( k − 1) 2 1i
T
∑
∑
Using the above equation and ζ1 = Eζ1 = σ 2 +
δ 12 , we have 1 − h1
( k − 1)σ 2 1 − h1
∑h i ∈N
2 1i
h1 ≤σ + ( k − 1)σ 2 1 − h1 2
For other t ∈ M, the first equations of Equations 3.237 and 3.238 can be proved similarly. The second equations of Equations 3.237 and 3.238 can also be proved similarly. Theorem 3.21 If h <
1 , k > 1, then 4
Eζi < Eζ j , i ∈ M ,
© 2012 by Taylor & Francis Group, LLC
j ∈N
(3.239)
18 0
M e A suReM en t DAtA M o D eLIn G
Proof 1 h 1 , we have 0 ≤ ≤ . Using Equation 3.238, 4 1−h 3 when i ∈ M and j ∈ N, we have
Since 0 ≤ h ≤
( k − 1)σ 2 , i ∈M 3 ( k − 1)σ 2 , j ∈N Eζi ≤ kσ 2 − 3 Eζi ≤ σ 2 +
So, when k > 1, Equation 3.239 holds. Remark 3.9 Theorem 3.21 clearly shows that it is possible and effective to identify outliers by comparing the value of ξi. Next, we will discuss misidentification probability. Suppose that a1, a2 are a given positive number and α is a real number,
α 2 < a22 a1−2 , a3 = a12 C = 2 αa1
a22 − α 2 a12 αa12 a22
It is easy to prove that C
−1
a12 = 2 αa1
αa12 a22
−1
a22 1 2 = D = 2 a1 a3 −α
−α 1
(3.240)
lemma 3.20 Suppose that u is a normal random vector of dimension 2, u = (u1, u2)τ , Eu = 0, COV(u) = C, a2 > a1 > 0, then
© 2012 by Taylor & Francis Group, LLC
181
M e t h o D s o F M o D eRn ReG Re s sI o n A n A LysIs
P{u < u12 } = 2 2
a1 a22 − α 2 a12 1 arctg π a22 − a12
(3.241)
Proof −
Using Equation 3.240, C −1 = D, and [det(C )] then
1 2
1
= [det( D )] 2 = 1/a1a3 ,
P{u22 < u12 } =
1 2πa1a3
a 2 a −2 x 2 − 2αxy + y 2 exp − 2 1 dx dy 2 2 a 3 y 2≤ x 2
=
1 2πa1a3
x2 z2 exp − 2 − 2 dx dz 2a3 2a1 | z + ax |≤| x |
=
1 2πa1a3
∫∫
∫∫
∫∫
a3a1−1tgθ + α ≤ 1
r2 exp − a1a3rdrdθ 2
=
a1 1 arctg(1 − α ) + a3 π
a1 arctg(1 + α ) a 3
=
a1a3−1 1 arctg π 1 − a12 a3−2 (1 − α 2 )
=
a a 2 − α 2 a12 1 arctg 1 22 π a2 − a12
Equation 3.241 is proved. Using Lemma 3.20, it is easy to prove the following theorem. Theorem 3.22 Suppose that h ≤
1 , k > 1. Then 4
P{ζ j > ζi } ≤
© 2012 by Taylor & Francis Group, LLC
Eζ j Eζi 1 arctg , ( j ∈ N , i ∈ M ) (3.242) Eζ j − Eζi π
18 2
M e A suReM en t DAtA M o D eLIn G
Theorem 3.22 shows that when h(1 − h)−1 is relatively small (less than 1/4 or the smaller the better) and k is relatively large (k > 10), Eξj >> Eξi and P{ξj > ξi}( j ∈ N, i ∈ M) is relatively small. When ξi is relatively large, t ∈ N. The smaller an h is the easier Eξj >> Eξi holds. Now we discuss the method of point-by-point elimination of outliers. This method first identifies the relatively large ξi and then eliminates the corresponding equation from Equations 3.231 through 3.233. The process is repeated. One advantage of this method is to eliminate outliers that are easy to be identified first. As the process of elimination proceeds, n/T becomes small and outliers that are hard to be identified become relatively easy to be identified. Next we discuss the decrease of residual sum of squares after eliminating outliers from Equations 3.231 through 3.233. Theorem 3.23 Let β* be the least squares estimate of β in Equation 3.231: R =
T
∑ i =1
(Yi − X i β )2 , R * =
T
∑ (Y − X β ) i
i =1
* 2
i
Then ER = (T − p )σ 2 +
∑ (1 − h )(k − 1)σ j
j ∈N
2
(3.243) (3.244)
ER * = (m − p )σ 2 Proof
It is straightforward to show that Equation 3.244 holds. Note that
∑h , i ∈ M − ( k − 1)σ ∑ h , j ∈ M
Eδ i2 = (1 − hi )σ 2 + ( k − 1)σ 2 Eδ 2j = (1 − h j )kσ 2
© 2012 by Taylor & Francis Group, LLC
j ∈N 2
i ∈N
2 ij
2 ij
18 3
M e t h o D s o F M o D eRn ReG Re s sI o n A n A LysIs
Then
ER =
∑ Eδ + ∑ Eδ = ∑ (1 − h )σ +∑ (1 − h )kσ i ∈M
2 i
j ∈N
2 j
i ∈M
i
2
j ∈N
j
2
Equation 3.243 is proved. From Theorem 3.23 we see that when hj ≤ 1/4( j ∈ N) and k is relatively large,
ER /ER * >> (T − p )/(m − p ) The residual sum of squares is reduced a lot after eliminating n outliers and their corresponding equations in models to 3.231 through 3.233. On careful reading, it may be found out that the main idea of the elimination process is to eliminate the data point corresponding to the maximal ξi and the equation corresponding to the maximal ξi. When there are outliers in data, the method will eliminate outliers step by step. However, if there are no stopping criteria, the method will eliminate normal data, which is not good, after all outliers have been eliminated. Therefore, it is necessary to construct a stopping rule to protect normal data from being eliminated. Consider the following model: Y = X β + e , e ∼ N(0, σ 2 I m )
(3.245)
where X = Xm× p, rank (X) = p, and Y is a vector of observed data. For convenience, let H = ( hij )m × m = X ( X τ X )−1 X τ δ = (δ 1 , δ 2 , . . . , δ m )τ = ( I − H )Y = ( I − H )e
ζ = (ζ1 , ζ 2 , . . . , ζm )τ ζi = (1 − hi )−1 δ i2 , 1 ≤ i ≤ m Obviously, the stopping rule we construct should guarantee that normal data of model (3.245) will not be eliminated. To lay out the stopping rule, we need the following lemmas.
© 2012 by Taylor & Francis Group, LLC
18 4
M e A suReM en t DAtA M o D eLIn G
lemma 3.21 Suppose that u1, u2, …, ui are independent of each other and have the same distribution. u1 ~ N(0,σ2), α ∈ (0,1). Let I (l , α ) = P {u12 > α(u12 + u 22 + + ul2 )} Then I (l , α ) =
∫
π 2
0
−1
sin l − 2 ϕdϕ
∫
arcsin 1 − α
0
sin l − 2 ϕ dϕ (3.246)
Proof Using the probability distribution function of a normal random vector and the spherical coordinate transformation of I multiple integrations, we get the folowing: 1 I (l , α ) = 2π
∫
2π
0
l
∫
+∞
0
r2 exp − r l − 1dr 2 2
sin l − 3 ϕ 2 dϕ 2 …
∫
2π
0
∫
arcsin 1 − α
0
sin l − 2 ϕ 1 dϕ 1
sin ϕ l − 2 d ϕ l − 2
Note that I(I,0) = 1, I (l , α ) = I ( l , 0)
∫
sin l − 2 ϕ dϕ π 2 sin l − 2 ϕ dϕ 0
arcsin 1 − α
0
∫
Equation 3.246 is proved. lemma 3.22 Under the same assumption as model 3.245,
{
P ζi > α δ
© 2012 by Taylor & Francis Group, LLC
2
} = I (m − p,α),
i = 1, 2, . . . , m
(3.247)
M e t h o D s o F M o D eRn ReG Re s sI o n A n A LysIs
18 5
Proof Note that ( I − H )2 = ( I − H ), rank ( I − H ) = m − p and ( I − H )τ = ( I − H ) There exists a diagonal matrix D = diag(1, …, 1, 0, …, 0) and an orthogonal matrix P = (P 1, P 2) (the number of rows of P 1, P 1 are both m and the number of columns of P 1, P 2 is m − p and p) such that ( I − H ) = PDP τ = P1P1τ
(3.248)
Let ε = P1τ e . Since e ~ N(0,σ2 Im), it is easy to know that e ~ N(0,σ2 Im−p), and δ = ( I − H )Y = ( I − H )e = P1P τ e = P1e
(3.249)
Let Ri (i = 1, 2, …, m) be the ith row of P 1. Then, using Equation 3.248 and the definition of ξi, it is easy to know that Ri
2
1 − hii
Ri = 1, ζi = 1 − hii
ε
2
(3.250)
Using the first equation of Equation 3.250, there exists an orthogonal matrix Q whose elements in a row are R i / 1 − hii . Let g = Qε. Since ε ~ N(0,σ2 Im−p) and Q is an orthogonal matrix, using the second equation of 3.250, it is easy to know that g ∼ N(0, σ 2 I m − p ), ζi = g 12
(3.251)
Using Equation 3.250 and g = Qε, we have δ
2
= ε
2
= g
2
(3.252)
Finally, combining Equations 3.251 and 3.252, and Lemma 3.21, Equation 3.247 holds.
© 2012 by Taylor & Francis Group, LLC
18 6
M e A suReM en t DAtA M o D eLIn G
Theorem 3.24 Under the same assumption as model 3.245,
{
P max ζi > α δ 1≤ i ≤ m
2
} ≥ 1 − mI (m − p, α)
(3.253)
Proof By Bonferroni inequality, 2
2
2
P{ξ1 ≤ α δ , ξ 2 ≤ α δ , . . . , ξm ≤ α δ } ≥ 1 −
m
∑ P{ζ i =1
2
i
≤αδ }
The inequality above combined with Lemma 3.22 shows that Equation 3.253 holds. From Theorem 3.24 and the expression of I(m − p,α) we see that when parameter α is large, the probability of normal data being eliminated is small. But if the value of α is too large, criterion max ξi ≤ α∙δ ∙2 will keep outliers in the data. Therefore, α should not be too large. Note that E δ
2
= (m − p )σ 2 , ζi σ −2 ∼ χ 2 (1)
So ξi ≤ 7.29σ2 with a large probability. For keeping normal data, α = 7.29 /m − p is good. For a given α, the probability that normal data are eliminated is less than mI(m − p,α) using Equation 3.253. Now, let us go back to models 3.231 through 3.233. Suppose that Y = (Y*τ , Y*τ* )τ = (Y1 , Y2 , …, Ym + n )τ X = ( X *τ , X *τ* )τ = ( X 1 , X 2 , …, X m + n )τ
ε = (eτ ,ητ)τ H = X(X τ X )−1 X τ . Then models 3.231 through 3.233 can be rewritten as Y = X β + ε, Eε = 0
© 2012 by Taylor & Francis Group, LLC
(3.254)
M e t h o D s o F M o D eRn ReG Re s sI o n A n A LysIs
18 7
criterion 3.3 Consider linear regression model 3.254. Define that δ = ( I − H )Y and ζ t = (1 − ht )−1 δ t2
(t = 1, 2, …, m + n)
When ξi = max ξi > α∙ δ ∙2 , Yi is regarded as an outlier where α is determined by Equation 3.246 and (m + n)I(m − n − p,α) ≈ 0.05. Remark 3.10 Criterion 3.3 is derived from Theorems 3.21 and 3.22, which both require h < 1 / 4, k >> 1. To demonstrate Criterion 3.3 is widely applicable, it is necessary to show h < 1/4, k >> 1 for models in processing common dynamic measurement data. Concretely, m is the sample size of normal data, n is the number of outliers. It is common to have m >> n, namely the proportion of outliers is small. p is the number of unknown parameters; we also have m >> p, namely the sample size is much larger than the unknown parameters. Since we have H = (hij)T ×T = X(X τ X)−1 X τ , T = m + n, hi = hii (i = 1, 2, …,T ); H 2 = H , H τ = H ,
T
∑h j =1
2 ij
= hii = hi ; h = min hii 1≤ t ≤T
And X is a matrix with dimension T × p; the trace of H is as follows: tr(H) = tr[X(X τ X)−1 X τ] = tr[(X τ X(X τ X )−1] = tr(Ip) = p, the sum the diagonal elements is
T
∑h i =1
ii
= p.
Since T = m + n >> p, it is
p 1 < . 1≤ t ≤T 4 T On the other hand, from Equations 3.227 and 3.229, k is variance ratio between outliers and normal data, it could be regarded as k >> 1. Therefore, Criterion 3.3 is widely applicable.
easy to deduce h = min hii ≤
3.6.3 Numerical Examples
The following numerical examples illustrate the applicability of Criterion 3.3 well:
© 2012 by Taylor & Francis Group, LLC
18 8
M e A suReM en t DAtA M o D eLIn G
Table 3.4 Elements of e-Generated by Computer −1.549 −0.204 0.837 −0.491 1.818
−0.337 2.156 0.405 −0.052 −0.416
−0.924 0.211 1.358 −1.095 0.337
−0.293 0.746 −0.672 1.373 0.089
−0.647 1.124 −0.116 1.766 −0.463
0.633 0.162 0.346 −0.990 −0.573
Example 3.9 This example provides results from a simulation study where X = (xij)30×4, β = (10, 10, 10, 10)τ , e ∼ N(0, I 30 ) 10ei , i = 1, 8, 15, 22, 29 εi = e , other i
xi 1 = sin(1 + 0.5i ), xi 2 = cos(1 + 0.5i ) xi 3 = ln(1 + 0.5i ), xi 4 = (1 + 0.5i )2 Y = ( y1 , y 2 , …, y30 )τ = X β + ε where (y1,y 8,y15,y 22,y 29) are outliers. Table 3.5 shows estimation results. The last column of Table 3.5 2 contains residual sum of squares ∙δ∙ , true values of β are given in the first row, and the second row has corresponding estimates obtained by Huber’s M estimation where initial values of β’s are their true values. The third row contains least squares estimates using all the Table 3.5
True value Huber — 8 1 15 22 29 21 26
© 2012 by Taylor & Francis Group, LLC
Comparison Table of Estimates 2
β1
β2
β3
β4
∙δ∙
10.000 10.179 7.752 9.032 10.388 9.376 10.198 10.161 10.187 10.027
10.000 10.588 11.351 11.003 11.085 11.682 10.715 10.470 10.612 10.712
10.000 10.205 11.786 10.822 11.194 10.376 10.310 10.110 10.104 10.118
10.000 9.996 9.982 9.994 9.987 9.999 9.993 9.999 10.000 10.001
— 19.52 1126 726.1 430.6 215.4 28.58 11.15 8.186 5.563
M e t h o D s o F M o D eRn ReG Re s sI o n A n A LysIs
observations. The remaining rows are least squares estimates based on data eliminating (y8), (y8,y1), (y8,y1,y15), (y8,y1,y15,y22), ( y8,y1,y15, y22,y29), ( y8,y1,y15,y22,y29,y21), respectively. The order of eliminating observations is determined by repeatedly applying Criterion 3.3. Using Criterion 3.3, five outliers are eliminated. If we continue the elimination process, Criterion 3.3 is violated. The reduction of residual sum of squares is small and estimates deteriorate. Table 3.5 shows clearly that 1. Eliminating outliers and their corresponding regression equations sharply reduces the residual sum of squares of a linear regression model. 2. Eliminating outliers greatly improve parameter estimates. The more the residual sum of squares is reduced the greater parameter estimates improve. 3. The residual sum of squares is reduced even when a normal observation is eliminated. However, the magnitude of reduction is small. Normal observations can be kept by choosing a proper α in Criterion 3.3. For example, α = 7.29(m + n − p)−1. 4. The least squares estimates after eliminating all five outliers are much better than Huber’s M estimates. Example 3.10 Table 3.6 lists annual inflation rate of commodity price in Taiwan from 1940 to 1947. Consider a linear regression model between rate yi and year Xi yi = β0 + β1 X i + εi , i = 1, 2, …, 8
It is easy to find out that inflation rate at 1947 is an outlier by Criterion 3.3. The fitting straight lines with and without the outlier are y (x ) = −47.49 + 1.18x y(x ) = −1.253 + 0.07754x Table 3.6 Year xi Rate yi
© 2012 by Taylor & Francis Group, LLC
Annual Inflation Rate of Commodity Price
40 1.62
41 1.63
42 1.90
43 2.64
44 2.05
45 2.13
46 1.94
47 15.5
18 9
19 0
M e A suReM en t DAtA M o D eLIn G
The latter is more realistic and its reasonableness is shown in Table 3.7 where δi = yi − y (xi ),
δ i = yi − y (xi )
Only δ 8 is relatively large and all other δ i ’s are small. Satisfactory results are also obtained when a linear regression model is carried out for the number of international long distance calls from 1950 to 1973 in Belgium [19] and outliers are identified by Criterion 3.3.
3.7 Efficiency of Parameter Estimation in Linear Regression Models 3.7.1 Introduction
Consider the linear regression model Y = X β + e,
e ∼ N(0, K )
(3.255)
where X is a known matrix of m×n, β is a vector of regression coefficients, and K is a positive matrix. If K is known, the model can be rewritten as K −1 / 2 Y = ( K −1 / 2 X )β + K −1 / 2 e , K −1 / 2 e ∼ N(0, I ) (3.256) And, by Gauss–Markov theorem, the best linear unbiased estimator of β is
β = ( X τ K −1 X )−1 X τ K −1Y
(3.257)
In practice, K is usually unknown or cannot be exactly determined. Hence, it is impossible to get β from Equation 3.257. We can only use β = ( X τ X )−1 X τ Y
(3.258)
as an estimate of β. It is not difficult to show that both β and β are linear unbiased estimators of β, that is, Table 3.7 Comparison of Residuals Year xi δi δi
40 +1.92
41 +0.74
42 −0.17
43 −0.61
44 −2.38
45 −3.48
46 −4.85
47 +7.85
−0.14
−0.21
−0.01
+0.65
−0.01
−0.01
−0.27
+13.26
© 2012 by Taylor & Francis Group, LLC
191
M e t h o D s o F M o D eRn ReG Re s sI o n A n A LysIs
Eβ = β, and
Eβ = β
(3.259)
Var(C τβ ) ≥ Var(C τβ )
(3.260)
It is natural to ask the following questions. How to measure the How much is β worse than β? difference between β and β? Next, we explore the difference between β and β in a simple example. Example 3.11 Suppose that yi = β + ei, ei ~ N(0,ki), i = 1, 2, …, m, where ei ’s are independent of each other. Then,
Y = ( y1 , …, ym )τ , X = (1, …, 1)τ , K = diag( k1 , …, km ) (3.261) If ki is known, then β =
(∑ k ) ∑ k −1 i
−1
y , and we have
−1 i i
2 E β −β 2
( )
1 1 = E ki−1 yi − β = E ∑ ki−1( yi − β) ∑ 2 ∑ k −1 1 − i ∑ ki
(
1
=
(∑ ki ) = ( ∑ ki ) −1
−1
2
∑ ki−2 E ( yi − β)2 =
−1
1
(∑ ki ) −1
2
)
2
∑ ki−2 ⋅ ki (3.262)
1 However, if ki is unknown, then β = ∑ yi , and we have m 2 E β − β
1 = E m
∑
© 2012 by Taylor & Francis Group, LLC
2
1 yi − β = 2 m
∑ E( y
i
− β) 2 =
1 m2
∑k
i
(3.263)
19 2
M e A suReM en t DAtA M o D eLIn G
By the Holder inequality: m = 2
∑
2
1 ki ⋅ ≤ ki
∑k ⋅ ∑ k i
1 i
So
E β − β =
1 m2
2
∑
ki ≥
1
∑k ∑k
−1 i
i
⋅
∑
ki =
(∑ k ) −1 i
−1
2 = E β −β
(3.264) The equality holds only when k1 = k 2 = … = km. Now we analyze the difference between Equations 3.262 and 3.263. For two given positive integers a,b, let k1 =
σ2 , k2 = = km −1 = σ 2 , km = b σ 2 a
Then 2 E β − β =
(∑ ) ki−1
−1
1 1 1 a = 2 + 2 ++ 2 + σ bσ 2 σ σ
−1
−1
σ2 1 = a + m − 2 + σ2 < m−2 b 2 1 E β − β = 2 m
∑k
i
=
1 1 2 + m − 2 + b σ m2 a
lim E β − β 2 = +∞ a → 0+ 2 lim E β − β = +∞ b → +∞
(3.265)
2 2 For the two cases in Equation 3.265, E β − β /E β −β → +∞ , which means that β is much worse than β as an estimate of β. On the other hand, when a → 0 +, b → +∞, y1 and ym are both outliers that should be eliminated before data processing. In order to identify the estimate quality of β and β , the following two concepts in estimation efficiency are introduced.
© 2012 by Taylor & Francis Group, LLC
M e t h o D s o F M o D eRn ReG Re s sI o n A n A LysIs
19 3
Definition 3.7 Consider the linear regression model (Equation 3.255). Let β = ( X τ X )−1 X τY . We define the estimation efficiency of β as det (COV (β )) e1 (β ) = det COV (β )
(3.266)
COV(β ) = ( X τ K −1 X )−1 σ 2
(3.267)
COV(β ) = ( X τ X )−1 X τ KX ( X τ X )−1 σ 2
(3.268)
(
)
where
Definition 3.8 Consider the linear regression model 3.265. Let β = ( X τ X )−1 X τ Y ; the estimation efficiency of c τβ is defined as e 2 (c τ β ) =
Var(c τ β ) Var(c τ β )
(3.269)
where Var(c τ β ) = c τ ( X τ K −1 X )−1 c σ 2
(3.270)
Var(cτ β ) = c τ ( X τ X )−1 X τ KX ( X τ X )−1 c σ 2
(3.271)
It is easy to prove that 0 ≤ e1 (β ) ≤ 1, 0 ≤ e 2 (cτ β ) ≤ 1 Obviously, the more the estimation efficiency ei is close to 1, the more it is appropriate to use β as an estimate of β. It is also feasible to
© 2012 by Taylor & Francis Group, LLC
19 4
M e A suReM en t DAtA M o D eLIn G
evaluate how much is lost in using β to estimate β through two expressions of estimation efficiency introduced above. 3.7.2 Efficiency of Parameter Estimation in Linear Regression Models with One Variable
Suppose that a linear regression model with one variable is y = xβ + e , e ∼ N(0, K )
(3.272)
where y = ( y1 , . . ., ym )τ , x = (x1 , . . ., xm )τ , e = (e1 , . . ., em )τ Now we discuss the estimation efficiency of regression coefficient β in the model. Let β = (x τ K −1x )−1 x τ K −1 y
(3.273)
β = (x τ x )−1 x τ y
(3.274)
e1 (β ) = e 2 (c τβ ) = A
(3.275)
It is easy to show that
where A = (x τ K −1x )(x τ x )−2 x τ Kx
−1
(3.276)
Since K is a symmetrical positive definite matrix, there exists an orthogonal matrix P such that K = P ΛP τ , Λ = diag(λ 1 , . . ., λ m ), where 0 < λ1 ≤ … ≤ λm. Note that x ≠ 0. Let v =
© 2012 by Taylor & Francis Group, LLC
1 τ
x x
x , u = P τv
19 5
M e t h o D s o F M o D eRn ReG Re s sI o n A n A LysIs
Then, A = (x τ K −1x )(x τ x )−2 x τ Kx
−1
= ( x τ xv τ K −1 x τ xv )(x τ x )−2 ( x τ xv τ K x τ x v ) = v τ K −1vv τ Kv
−1
= (u τ Λ −1u )(u τ Λu ) =
Since
∑λ
∑u
2 i
1 −1 2 i i
u ⋅
−1
= u τ P τ P Λ −1P τ Puu τ P τ P ΛP τ Pu
−1
−1
∑λ u
2 i i
= u τu = v τ P τ Pv =
prove that
1 x τx
xτ
1 x τx
x = 1, it is easy to
(∑ λ u )(∑ λ u ) ≤ (λ4λ+ λλ ) −1 2 i i
2 i i
1
m
2
1 m
Therefore, A ≥
4 λ 1λ m (λ 1 + λ m )2
(3.277)
When λ1 = λm, k = λ1I, then β = β and the estimation efficiency attains its maximum A = 1. Remark 3.11 The inequality (3.277) provides a lower bound for estimation efficiency. The exact estimation efficiency can be calculated using Equation (3.276) and may be much larger than the lower bound.
© 2012 by Taylor & Francis Group, LLC
19 6
M e A suReM en t DAtA M o D eLIn G
Example 3.12 Analyze the estimation efficiency of regression coefficients in the following model y1 1 y = 1 β + e , e ∼ N(0, K ) 2 where 1 K = −a
−a , |a| < 1 1
Solution
A simple calculation shows that 1 1 1 β = (x τ x )−1 x τ y = x τ y = x τ (xβ + e ) = β + x τ e 2 2 2 2 E(β − β )
( )
1 1 1 1 1 1 = E x τ e = x τ Kx = −a 2 4 4
−a 1 1 − a = 1 1 2
β = (x τ K −1x )−1 x τ K −1 y = (x τ K −1x )−1 x τ K −1(xβ + e ) = β + (x τ ( K −1x )−1 x τ K −1e −β)2 E(β = E((x τ K −1x )−1 x τ K −1e )2 = (x τ K −1x )−1 x τ K −1KK −1x(x τ K −1x )−1 = (x τ K −1x )−1 =
1−a 2
Therefore, E(β −β)2 A = e1 (β ) = =1 E(β − β)2
© 2012 by Taylor & Francis Group, LLC
19 7
M e t h o D s o F M o D eRn ReG Re s sI o n A n A LysIs
On the other hand, since 1 −a
−a 1 1 = (1 − a ) 1 1 1
1 −a
−a 1 1 = (1 + a) −1 1 −1
so
λ 1 = 1− | a|, λ 2 = 1 + | a | The lower bound of estimation efficiency is
4 λ 1λ m 4(1 − a 2 ) = = 1 − a2 2 4 (λ 1 + λ m ) When a → 1, the lower bound approaches to 0. However, the true estimation efficiency is 1. We see that there is indeed a difference between the lower bound of estimation efficiency and its true value.
3.7.3 Efficiency of Parameter Estimation in Multiple Linear Regression Models
1. General conclusions Similar to the linear regression model with a single variable, it is not difficult to derive the lower boundary of estimation efficiency for a multiple regression model. Theorem 3.25 Suppose that K is a positive matrix, 0 < λ1 ≤ … ≤ λm are eigenvalues of K, rank(Xm×n) = n, and m > 2n. Then e1 (β ) ≥ regardless of X.
© 2012 by Taylor & Francis Group, LLC
n
∏ (λ i =1
4λi λm − i + 1 2 i + λm − i + 1 )
(3.278)
19 8
M e A suReM en t DAtA M o D eLIn G
Theorem 3.26 Under the same assumption as Theorem 3.25, for any given c, a constant vector of dimension n, we have e 2 (c τβ ) ≥
4 λ 1λ m (λ 1 + λ m )2
(3.279)
Remark 3.12 1. Theorems 3.25 and 3.26 provide two types of lower bounds of estimation efficiency. In the case when the ratio of the largest eigenvalue of K to the smallest eigenvalue is large, two lower bounds are close to zero and there may be a large difference between the exact estimation efficiencies and their corresponding bounds. Example 3.11 is such an example. 2. The lower bounds in Theorems 3.25 and 3.26 only depend on the eigenvalues of K. Note that K is unknown in practice and so are its eigenvalues. In contrast, X is a known matrix but the two bounds do not use any information of X. Therefore, two theorems only provide coarse measurements of estimation efficiency. 2. Simulation methods for efficiency analysis Consider the linear regression model Y = X β + η, η ∼ N(0, K )
(3.280)
where X is a nonstochastic matrix of full column rank of m × n, and X is known, ηi = g (ti )ei
(3.281)
{ei} is a stationary time series with mean zero. Using Remarks 3.11 and 3.12, the bounds of estimation efficiency in Theorems 3.25 and 3.26 do not use any information of the design matrix X. The following simulation uses all the data in X.
© 2012 by Taylor & Francis Group, LLC
19 9
M e t h o D s o F M o D eRn ReG Re s sI o n A n A LysIs
Using the characteristics of a stationary time series, we have
Then,
Eei e j = r| j − i | , i , j = 1, 2, . . ., n
(3.282)
Eη = 0 Eηi η j = g (ti ) g (t j )r| j − i | K = (Eη η ) i j m ×m
(3.283)
From the discussion in Chapter 2, let g (t ) =
N
t − Tj −2 H
∑ a B j =1
j
(3.284)
where (a1,a2, … , aN) is a vector of coefficients to be determined, tm − t1 T2 = t1 ,TN − 1 = tm , H = N − 3 T j = T2 + ( j − 2)H , j = 1,…, H Since an MA(q) model can approximate a stationary time series well (see Chapter 4), let ei = b 0 ei + b1ei−1 + … + beei−q; we have b j b0 + b j + 1b1 + + bq bq − j , ri = 0,
0≤ j ≤q
q < j ≤m
(3.285)
Note that efficiencies in Definitions 3.7 and 3.8 are invariant if K is replaced by aK (a is any positive constant). It is unnecessary to consider σ ε2 in calculation of rj. By combining Equations 3.283 through 3.285, matrix K can be generated by the following method where R[0,1] represents the uniform distribution on [0,1]. 1. Simulate {ai }iN= 1 i.i.d. ∼ R[0, 1] q {bi }i = 1 i.i.d. ∼ R[0, 1]
© 2012 by Taylor & Francis Group, LLC
200
M e A suReM en t DAtA M o D eLIn G
2. Get g(t) and r from Equations 3.284 and 3.285. 3. Generate matrix K using Equation 3.283. Note that, if {ηi} is a stationary time series, it is unnecessary to simulate g(t). Let g(t) ≡ 1 in Equation 3.283. In general, 4 ≤ N ≤ 10 and 5 ≤ q ≤ 15. When m is small, N and q should be small. If m is large, N and q should be large. The simulation method is as follows: Step 1. Simulate matrix K. Step 2. Calculate e1 (β ) and e 2 (β ) using Equations 3.266 and 3.269, respectively. Step 3 Repeat Step 1 and Step 2 100 times, then calculate the minimal values of e1 (β )s and e 2 (β )s, respectively. Let the minimum efficiency bounds be e1* (β ) and e 2* (β ), respectively. Remark 3.13 The simulation method is simple and intuitive. It is a practical and effective method. In general, it is enough to calculate 100 iterations and the theoretical justification is given in Ref. 25. In fact, the lower bounds obtained by the simulation method are more close to their true values than those calculated from Theorems 3.25 and 3.26. This simulation method is applicable not only to linear regression models but also to nonlinear regression models. It should be pointed out that, in Chapter 6, β is always used in processing radar measurement data instead of β . The reason is that simulations show that the estimation efficiency of β is close to 1 and the calculation of β is much more complicated than that of β . It seems that the computational workload of simulation method is large. In fact, for n = 20, m = 200, the computation can be finished in a very short time. 3.8 Methods of Nonlinear Regression Analysis 3.8.1 Models of Nonlinear Regression Analysis
In Section 2.5, a series of empirical formulas was introduced. Many of them could be expressed in the form y = f ( x , θ) + ε (3.286)
© 2012 by Taylor & Francis Group, LLC
2 01
M e t h o D s o F M o D eRn ReG Re s sI o n A n A LysIs
where x is the independent variable, y is the dependent variable, f is a function with a known expression, θ is an unknown parameter, and ε is the sum of many random errors such as modeling errors, random errors, etc. For a value of x, the corresponding y is obtained by yi = f (xi , θ) + εi , i = 1, 2, . . ., m
(3.287)
Let us take Table 2.4 as an example. The relationship between the volume increase of the molten steel ladle and the number of times it is used can be written in the form of Equation 3.287. Such a nonlinear regression model is generated by empirical equations. Next, we discuss a class of nonlinear regression models in processing measurement data. Example 3.13 Suppose that the orbit parameters of a spacecraft
X (t ) = (x(t ), y(t ), z(t ), x (t ), y (t ), z (t ))τ satisfies a nonlinear differential equation
dX (t ) = F (t , X (t )) dt
(3.288)
where F(t,X) is an known function of variables t and X. Given an initial value X(t 0) = η, for any t > t 0, X(t) can always be calculated through a numerical method. Therefore, a problem of processing measurement data can be converted into a problem in estimation of an initial value η. Suppose that there are three range-measured radars tracking this spacecraft. The observed data at the time ti > t 0 are y j (ti ) =
( x ( t i ) − x j ) 2 + ( y ( t i ) − y j ) 2 + ( z( t i ) − z j ) 2 + a j + e j (ti ), i = 1, . . ., m; j = 1, 2, 3
(3.289)
where (xj,yj,zj) ( j = 1,2,3) are three radar stations, e j (ti ) ∼ N(0, σ 2j ), and ej(ti) are independent of each other. a1,a2,a3 are
constant systematic errors to be estimated. Construct a mathematical model to estimate the unknown parameter
β = ( η1 , η2 , . . ., η6 , a1 , a2 , a3 )τ
© 2012 by Taylor & Francis Group, LLC
(3.290)
202
M e A suReM en t DAtA M o D eLIn G
Solution
By using methods in Section 2.4.3, we can write the orbit parameter X(ti) at time t 1 as X (ti ) = Wi ( η)
(3.291)
where Wi (i = 1, . . ., m) is a known function whose expression is determined by Equation 3.288 and the Runge–Kutta method. Then, using Equations 3.289 and 3.291, we have Y = g ( η) + Ua + e
(3.292)
where g(η) is completely determined by the arithmetic root function of Equation 3.287 and the expression of Wi(η)
Y = ( y1 (t1 ), . . ., y1 (tm ), y 2 (t1 ), . . ., y 2 (tm ), y3 (t1 ), . . .,, y3 (tm ))τ e = (e1 (t1 ), . . ., e1 (tm ), e 2 (t1 ), . . ., e 2 (tm ), e3 (t1 ), . . ., e3 (tm ))τ 1 1 0 U = 0 0 0
0 0 1 1 0 0
0 0 0 , a = (a , a , a )τ 1 2 3 0 1 1 3m × 3
By applying Equation 3.290, Equation 3.292 could be rewritten as Y = f (β) + e
(3.293)
This is a nonlinear regression model to estimate β.
From the discussion above, nonlinear regression models are commonly used in studying both empirical formulas and measurement data processing.
© 2012 by Taylor & Francis Group, LLC
M e t h o D s o F M o D eRn ReG Re s sI o n A n A LysIs
203
3.8.2 Methods of Parameter Estimation
1. Least squares estimation The primary issue for a nonlinear regression model Y = f (β) + e , e ∼ N(0, σ 2 I )
(3.294)
is to give relatively accurate estimates of β and σ2. The research of nonlinear regression models starts relatively late. Currently, the majority of the research is on least squares estimation. It is necessary to note that LS estimation in nonlinear regression models is different from that in linear regression models. i. The least squares estimate in a nonlinear model is biased, while the LS estimate in a linear model is unbiased. ii. The least squares estimate in a nonlinear model does not have an explicit expression and can only be calculated by optimal algorithms. In contrast, the LS estimate in a linear model has an explicit expression. A brief introduction to methods of the least squares estimation in a nonlinear model is introduced as follows and a detailed discussion is given in Ref. 4. Let e(θ) = Y − f (θ) ( when θ = β, e(β) = e ) ∂f V (θ) = t (t = 1, 2, . . ., m; j = 1, 2, . . ., n) ∂θ j ∂2 f t W ( θ) = (t = 1, 2, . . ., m; j = 1, 2, . . ., n) ∂ θi ∂ θ j S ( θ) = Y − f ( θ)
2
τ
= (Y − f (θ)) (Y − f (θ)) =
∂S ∂S ∂S ∇S ( θ ) = , , . . ., ∂ θ1 ∂ θ 2 ∂θn θ ∈ Θ, Θ is a known compact set of Rn.
© 2012 by Taylor & Francis Group, LLC
m
∑( y k =1
k
− f k (θ))2
204
M e A suReM en t DAtA M o D eLIn G
lemma 3.23 In model 3.294, suppose that f(θ) ∈ C 1(Θ), β is the least squares estimate of β, that is, S (β ) = min S (θ)
(3.295)
θ ∈Θ
Then, the residual e = Y − f (β ) satisfies eτ V (β ) = 0
(3.296)
Proof Since f(θ) ∈ C 1(Θ), S(θ) ∈ C 1(Θ). Note that S(θ) attains its minimal value at β . β must be a stable point and we have 0 = ∇S (β ) =
m
∑ (−2)( y k =1
k
− f k (β ))∇f k (θ)
θ =β
= −2 e τ V (β )
Theorem 3.27 Suppose that Θ is a compact subset of Rn, f(θ) ∈ C(Θ). Then there exists a β ∈ Θ such that S (β ) = inf S (θ) θ ∈Θ
Proof See Ref. 4. Theorem 3.27 Shows that the least squares estimate exists. Note that there are no constraints on e. If e ~ N(0,σ2 I), then we have the following theorem.
© 2012 by Taylor & Francis Group, LLC
M e t h o D s o F M o D eRn ReG Re s sI o n A n A LysIs
205
Theorem 3.28 In model 3.294, if e ~ N(0,σ2 I), the maximum likelihood estimate and the least squares estimate of β are the same. The maximum likelihood estimation of σ2 is 1 1 σ 2 = S (β MLE ) = e MLE m m
2
Proof Since e ~ N(0,σ2 I), we have Y ∼ N( f (β), σ 2 I ) The probability function of Y is m 1 exp − 2πσ
( y k − f k (β))2 k =1 2σ 2 m
∑
The logarithmic likelihood function is l (θ, Y ) = −m log
(
m
)
∑( y
1 2πσ − 2σ 2
k =1
k
− f k (β))2
(3.297)
Then l (θ, Y ) = max ⇔ S (θ) =
m
∑( y k =1
k
− f k (β))2 = min
that is, β MLE = β LS After differentiating Equation 3.297 with respect to σ, we have
© 2012 by Taylor & Francis Group, LLC
206
M e A suReM en t DAtA M o D eLIn G
∂l m 1 = − + 3 S ( θ) ∂σ σ σ Let ∂l /∂σ = 0. Then 1 1 σ 2 = S (β MLE ) = eMLE m m
2
2. Methods of getting the least squares estimates We will discuss methods of computing the least squares estimate β . Theorem 3.29 Suppose that Θ is a strictly convex area of R*. S(θ) is a strictly convex function. There exists a unique least squares estimate β such that S (β ) = min S (θ) θ ∈Θ
(3.298)
Proof The existence of LS estimate is proved in Theorem 3.27. The uniqueness is justified using proof by contradiction. Suppose that there are two least squares estimates β 1 , β 2 such that S (β 1 ) = S (β 2 ) = min S (θ) θ ∈Θ
∀α ∈ (0,1), since S(θ) is a strictly convex function, Θ is a strictly convex area, then αβ 1 + (1 − α) β 2 ∈ Θ, and S (αβ 1 + (1 − α ) β 2 ) < αS (β 1 ) + (1 − α )S (β 2 ) = S (β 1 ).
(3.299)
However, inequality 3.299 contradicts S (β 1 ) = min S (θ). θ ∈Θ How can we choose a proper Θ such that S(θ) is strictly convex on Θ in practice? Justifying the strict convexity directly is complex. We look for equivalent conditions of strict convexity next. By the definition of convex function, let A (θ) = ( ∂ 2S /∂θi ∂θ j )n × n , it is easy to prove that, if A(θ) is a positive definite matrix, S(θ) is a strictly convex function.
© 2012 by Taylor & Francis Group, LLC
M e t h o D s o F M o D eRn ReG Re s sI o n A n A LysIs
207
Note that m
∂f ∂S = −2 [ yt − f t (θ)] ∂θ t ∂ θi i =1
∑
A (θ) = 2V (θ)τV (θ) − 2
m
∑ k =1
e k ( θ)
∂2 f k ∂ θi ∂ θ j
(3.300) n ×n
Suppose that V(θ) is a matrix of full column rank. Then, V τ (θ)V (θ) naturally is a positive definite matrix. If we assume further that, for each k, ek(θ) is very small or matrix (δ2f k /δθiδθj)n×n is very small, then A(θ) ≈ 2V τ (θ)V (θ), where A(θ) is a positive matrix. Now we introduce the Gauss–Newton method: Given θ( 0 ) , θ( 0 ) ∈Θ (i + 1) θ = θ(i ) + [V (θ(i ) )τV (θ(i ) )]−1V (θ(i ) )τ [Y − f (θ(i ) )]
(3.301)
lemma 3.24 If Equation 3.294 represents a linear regression model, then using the Gauss–Newton method, from any initial value, we obtain the least squares estimate of parameter β after one iteration, that is, ∧
β = θ(1) Proof When Equation (3.294) represents a linear regression model, f ( θ) = X θ Then V(θ) = X θ. By the Gauss–Newton iteration equation θ(1) = θ( 0 ) + ( X τ X )−1 X τ [Y − f (θ( 0 ) )] = θ( 0 ) + ( X τ X )−1 X τ [Y − X θ( 0 ) ] = ( X τ X )−1 X τ Y = β
© 2012 by Taylor & Francis Group, LLC
(3.302)
208
M e A suReM en t DAtA M o D eLIn G
which is just the least squares estimate of β for the linear regression model. Lemma 3.24 shows that the Gauss–Newton method is very effective in solving LS estimates from a linear model. In particular, the Gauss–Newton method is a very good method for nonlinear regression models that are approximately linear (or the norm of δ2f k / δθiδθj)n×n is very small). It has little dependence on initial values and converges fast. Now we introduce the improved Gauss–Newton method that is applicable for general nonlinear regression models. lemma 3.25 Suppose that ∂S ∂S ∂S ∇S ( θ ) = , , . . ., ≠0 ∂ θ1 ∂ θ 2 ∂θn D(θ) = [V τ (θ)V (θ)]−1V (θ)τ e(θ) There exists a positive value λ* such that S[θ + λD(θ)] < S(θ), ∀λ ∈ (0, λ*)
(3.303)
Proof Note that q(λ ) = S[θ + λD(θ)] then q ′( 0 ) = ∇ S ( θ ) D ( θ ) = −2e(θ)τV (θ)[V (θ)τV (θ)]−1V (θ)τ e(θ) < 0 Thus, there exists a positive value λ* that is sufficiently small such that
© 2012 by Taylor & Francis Group, LLC
M e t h o D s o F M o D eRn ReG Re s sI o n A n A LysIs
209
q ′(λ ) < 0, ∀λ ∈ (0, λ * ] Therefore, Equation 3.303 holds. Lemma 3.25 shows that after θ(i) is obtained, a proper λi can be chosen for θ(i + 1) = θ(i ) + λi D(θ(i ) ) such that S (θ(i + 1) ) < S (θ(i ) ) Obviously, if V(θ(i)) e(θ(i))τ = 0, then D (i ) = D(θ(i ) ) = 0 and S(θ(i+1)) = S(θ(i)). ∇S(θ(i)) = 0 Since θ(i) is already a stable point, it will not change in further iterations. Theorem 3.30 Suppose that Θ is a bounded convex set in Rn, the initial value θ(0) ∈ Θ/∂Θ, and 1. V (θ) ∈ C (Θ) is a matrix of full column rank. 2. S (θ( 0 ) ) < S = inf S (θ) (minimal point is not on the θ∈δΘ boundary). 3. There are no θ*,θ** such that S(θ*) = S(θ**), ΔS(θ*) = ΔS(θ**) = 0 simultaneously. Let {θ(i)} be a series where D (i − 1) = V (θ(i − 1) )τV (θ(i − 1) ) −1 V (θ(i − 1) )′ τ Y − f (θ(i − 1) ) ( i − 1) ( i − 1) ( i − 1) ( i − 1) S ( θ + λ D ) = inf S ( θ + λ D ) i −1 0≤ λ ≤1 θ(i ) = θ(i − 1) + λ D (i − 1) i −1 Then {θ(i)} satisfies 1. All θ(i) ∈ Θ\δΘ 2. lim θ(i ) = θ* ∈ Θ \ ∂Θ, and ΔS(θ*) = 0 i →+∞
© 2012 by Taylor & Francis Group, LLC
210
M e A suReM en t DAtA M o D eLIn G
Proof By the construction of {θ(i)}, we know that {S(θ(i))} is a sequence that is monotonically decreasing and bounded below (the lower bound is 0). Then S (θ(1) ) ≤ S (θ( 0 ) ) < S = inf S (θ) θ ∈∂Θ
Thus, θ(i) (i = 1,2,3,. . .) are interior points of Θ. Since {S(θ(i))} is monotonically decreasing and bounded below by 0,
lim S (θ( i ) ) = S ∗
i →+∞
(3.304)
Note that {θ(i)} ⊂ Θ (Θ is a bounded convex set). {θ(i)} has a convergent subsequence. For convenience, let the convergent subsequence be {θ(i)} and its limit be θ*. Using Equation 3.304, we know that lim θ(i ) = θ * ,
lim S (θ(i ) ) = S (θ * ) = S *
l → +∞
(3.305)
Next, we prove ∇S(θ * ) = 0
(3.306)
Let D * = [V ′(θ * )V (θ * )]−1V (θ * )τ e(θ * ).q(λ ) = S[θ * + λD(θ * )] since ∇S (θ * ) = −2e(θ * )τV (θ * ) If Equation 3.306 does not hold, then D * ≠ 0, and we have q ′(0) = ∇S (θ * )D(θ * ) = −2e(θ * )τV (θ * )[V (θ * )τV (θ * )]−1V (θ * )τ e(θ * ) < 0
(3.307) Since θ* is an interior point of Θ, from Equation 3.307, there exists a θ**, θ** = θ* + λ*D *, λ* is a positive number and is sufficiently small. such that S (θ * * ) < S (θ * )
© 2012 by Taylor & Francis Group, LLC
M e t h o D s o F M o D eRn ReG Re s sI o n A n A LysIs
211
which contradicts the construction of {θ(i)} and Equation 3.304. Therefore, Equation 3.306 holds. Suppose that there is another subsequence {θ(i)} whose limit is θΔ . Then it can be proved similarly that S (θ ∆ ) = S * .∇S (θ ∆ ) = 0
Using Equations 3.305 and 3.306 we know that S (θ * ) = S * .∇S (θ * ) = 0
(3.308)
(3.309) After combining Equations 3.308 and 3.309 with condition (3) in Theorem 3.30, we know that θ* and θΔ are the same. Therefore, {θ(i)} cannot have two subsequences with different limits. The iterations in Theorem 3.30 form the improved Gauss–Newton method. The sequence {θ(i)} obtained from such a method converges and its limit is a stable point. Further procedures are needed to check whether the stable point is a minimum point. If S(θ) is a strictly convex function on Θ, then the stable point must be the minimal point of S(θ) and it is the least squares estimate. In order for S(θ) to be strictly convex on Θ, A(θ) need to be a positive definite matrix, and in order for A(θ) to be positive definite, we can use the following method of reducing e(θ) = Y − f(θ) = f(β) − f(θ) + e. 1. Find an initial value θ(0) that is close to the true regression coefficient β (or make f(β) − f(θ(0)) as small as possible). 2. Reduce the influence of error vector e to the solving process as much as possible.
Method (1) is well known while method (2) has to be carried out according to specific cases. For a general problem in data processing, observed data are y(ti ) = f (ti , β) + e(ti ), {e(ti )}i.i.d. ~ N(0, σ 2 )
(3.310)
For many nonlinear regression models that we use, observed data are sampled at various time points. Let y( t1 ) y(t ) 2 , Y = y(tm )
© 2012 by Taylor & Francis Group, LLC
e(t1 ) f ( t 1 , β) e(t ) f ( t , β) 2 , e = 2 f (β) = e(tm ) f (tm , β)
(3.311)
212
M e A suReM en t DAtA M o D eLIn G
Then, the nonlinear regression model to estimate β is Y = f (β) + e , e ~ N(0, σ 2 I m )
(3.312)
In many practices of data processing, f(t,β) is a continuously differentiable function of t of high order and some information of highorder derivatives of f(t,β) to t is known. Therefore, we can express f(t,β) by an approximate linear model f ( t , β) =
N
∑ a ψ (t ) j =1
j
(3.313)
j
where {ψ1(t),ψ2(t), . . ., ψN (t)} are linearly independent basis functions and can be determined by the method in Chapter 2. Using Equations 3.310 through 3.313, we have Y = X α + e , e ~ N(0, σ 2 I m )
(3.314)
From Equation 3.314 we obtain the estimate of f(β) = Xα. From the discussion of Section 3.3, we have X α = X ( X τ X )−1 X τ Y E X α LS − f (β)
2
= N σ 2 mσ 2 = E Y − f (β)
2
(3.315)
LS is much closer to f (β) than Y. The nonlinear least squares estiXα mate from model X α LS = f (β) + η can be used as the initial value of
iterations. Suppose that β is the true value of regression coefficients in nonlinear regression model 3.294. We have the following theorem. Theorem 3.31
Suppose that β is the least squares estimate of nonlinear regression model 3.294, then β − β ≈ [V (β )τV (β )]−1V (β )τ e
© 2012 by Taylor & Francis Group, LLC
(3.316)
213
M e t h o D s o F M o D eRn ReG Re s sI o n A n A LysIs
E(β − β)(β − β)τ ≈ σ 2 [V (β )τV (β )]−1
(3.317)
Proof See Ref. 4 for details. 3.9 Additional Information 3.9.1 Sources of Additional Information
Consider regression models Y = X β + e , e ~ (0, σ 2 I )
(3.318)
Y = f (β) + e , e ~ (0, σ 2 I )
(3.319)
The regression coefficients in models 3.318 and 3.319 depend on models and experimental data Y. In many practical applications, there are other ways to obtain relevant information of β. Such information is very useful for the estimation of β. This kind of information is called additional information. The three examples given next show that additional information has many sources and formats. Example 3.14 The vector of error coefficients of guidance instruments in ballistic missiles C (C is a vector of dimension n in which each element is an error coefficient of a guidance instrument) has the following relationship with speed difference ΔW between telemetry and tracking radar records:
∆W = SC + e , e ~ (0, K )
(3.320)
where S is a known environment function matrix determined by telemetry measurement data, e is a vector of measurement errors, and K is a known symmetric positive definite matrix. An estimate of C can be obtained from model 3.320. Due to the collinearity in S, the efficiency of the estimate is not good.
© 2012 by Taylor & Francis Group, LLC
214
M e A suReM en t DAtA M o D eLIn G
An effective way is to obtain additional information on C by carrying out ground rocket tests.
DC = DC + D ε, ε ~ (0, Σ )
(3.321)
where D is a diagonal matrix whose diagonal elements are either 0 or 1 (the ith diagonal element is 1 if a ground test is carried out for the ith coefficient; otherwise the element is 0), ∑ = diag(σ12 , σ 22 , . . ., σn2 ). Obviously, we can suppose that
Eεe τ = 0
(3.322)
We can get a much better estimate of C by combining Equations 3.320 through 3.322. Example 3.15 Suppose that a particle moves along a straight line. At time t its position is x(t) such that
x ( 4 ) (t ) ≤ 0.5, 0 ≤ t ≤ 100
∫
100
0
2
x ( 2 ) (t ) dt ≤ 10
(3.323)
(3.324)
The particle is tracked. The observed positions at t i = 0, i are denoted by yi(t = 1,2,. . .,100),
yi = x(ti ) + ei , {ei } ~ N(0, 1)
(3.325)
Estimate x(t) (0 ≤ t ≤ 100). Solution It is not enough to use observed data only to estimate x(t). Equations 3.323 and 3.324 are both additional information of x(t). Using Equation 3.323, we can set up the following spline model to estimate x(t).
© 2012 by Taylor & Francis Group, LLC
M e t h o D s o F M o D eRn ReG Re s sI o n A n A LysIs 103
x(t ) =
∑ b B( t − 2 + j ) j =1
j
(3.326)
Using Theorem 2.22, there is a group (b1,b2, ..., b103) such that 103
max x(t ) −
0 ≤ t ≤ 100
∑ b B( t − 2 + j ) j =1
j
5 * 0.5 * 14 384 1 σ = 0.00651 < = 100 100
<
Let β = (b1 , b2 ,…, b103 ) X = (xij )1000 *103 xij = B(ti − 2 + j ) Y = ( y1 , y 2 ,…, y1000 ) e = (e1 , e 2 ,…, e1000 ) Then, by using Equation 3.325 we have a linear regression model Y = X β + e , e ~ N(0, I )
(3.327)
Using Equation 3.327 we have an estimate of β
β = ( X τ X )−1 XY
(3.328)
The estimate of x(t) is 103
x (t ) =
∑ b B(t − 2 + j ) j
j =1
(3.329)
On careful reading, it may be found out that the additional information in Equation 3.324 has not been used yet. In fact, using Equations 3.326 and 3.324 we can get the additional information of regression coefficient vector β.
© 2012 by Taylor & Francis Group, LLC
215
216
M e A suReM en t DAtA M o D eLIn G 103
∑a b b
ij i j
i =1
≤ 10 or β τ A β ≤ 10
(3.330)
where A = (aij)103•103, aij =
∫
100
B ′′(t − 2 + i )B ′′(t − 2 + j )dt
0
Equation 3.330 is the additional information of regression coefficients β in linear regression model 3.327. Therefore, using Equation 3.330 can improve the estimate of β. Example 3.16 Consider calibrating systematic radar errors using optical data. For example, we track an aircraft using R − R systems. Let X (t ) = (x(t ), y( y ), z(t ), x(t ), y (t ), z(t ))τ be orbit parameters at time t, and (xj,yj,zj) be the station position of the jth R − R system. Then the measurement data at time t are R (t ) = (x − x )2 + ( y − y )2 + (z − z )2 + a + b t + e(t ) j j j j j j (x − x j )x + ( y − y j ) y + (z − z j )z + b j + δ( t ) R j (t ) = R j (t ) (3.331) where aj + bjt and bj are the systematic errors of the jth R − R system, and e(t) and δ(t) are random errors. The orbit parameters X(t) at time t can be expressed by N x(t ) = q j ψ j (t ), x(t ) = j =1 N q j + N ψ j (t ), x(t ) = y(t ) = j =1 N z(t ) = q j + 2 N ψ j (t ), x(t ) = j =1
∑
N
∑ q ψ ′ (t ) j =1 N
∑
∑q
∑
∑q
j =1 N
j =1
j
j
j+N
ψ ′j (t )
j +2N
(3.332)
ψ ′j (t )
where (ψt (t),. . .,ψN(t)) is a set of known basis functions, such as polynomial basis or spline basis in Chapter 6.
© 2012 by Taylor & Francis Group, LLC
M e t h o D s o F M o D eRn ReG Re s sI o n A n A LysIs
217
In order to estimate systematic errors, the comparison method of optical measurement data is used. We can use three theodolites to get the information y − yk E k (t ) = arctg + c k + ζ k (t ) (x − x k )2 + (z − zk )2 z − zk + A0 + d k + ηk (t ), k = 1, 2, 3 Ak (t ) = arctg x − x k
(3.333)
Combining the measurement data from four R − R Radar systems at times t 1, t 2, . . ., tm, we obtain a nonlinear regression model of β = (q1,q2, . . . ., q3N, a1,. . .a4, b1, . . ., b4). Obviously, if we can accurately estimate β, we can get accurate estimates of orbit parameters and systematic errors. Unfortunately, since the model is ill-conditioned, β cannot be accurately estimated only on the basis of this model. If measurement data from three theodolites are used as additional information, we can construct a nonlinear regression model of β and (c 1,c 2,c 3,d1,d2,d3) to provide good estimates of β and (c 1,c 2,c 3,d1,d2,d3). Note that such additional information is not only related to β but also to new parameters to be estimated. This kind of additional information is called effective additional information that is helpful to the estimation of β. When β is known, the new parameters can be accurately estimated from the effective additional information. We introduce only three types of additional information above. There are various types of additional information and they are interpreted and constructed based on specific situations.
3.9.2 Applications of Additional Information
There are many types of additional information and the methods of applying these additional information vary. Using additional information is helpful to improve the quality of parameter estimation. The additional information in Examples 3.14 and 3.15 can be combined with the regression equation of measurement data. Example 3.16 increases the dimension of parameters to be estimated. Next, we discuss an application of additional information commonly used in data processing.
© 2012 by Taylor & Francis Group, LLC
218
M e A suReM en t DAtA M o D eLIn G
Consider the linear regression model Y = X β + e , e ~ N(0, σ 2 I )
(3.334)
Additional information of regression coefficients β is β τ Aβ ≤ ν
(3.335)
where X is a full-rank matrix of m × n, β is a parameter vector of dimension n to be estimated, A is a known semipositive definite (or positive definite) symmetric matrix, and ν is a known positive number. Consider the extreme value problem y − X β 2 = min τ β Aβ ≤ ν
(3.336)
and its solution is used as an estimate of β. Intuitively, the solution to extreme value problem 3.336 has used additional information of inequality 3.335 and should be better than the least squares estimate without constraints. Now we try to solve the extreme value problem 3.336. For convenience, we consider the following problem: y − X β 2 = min τ β ( A + δI )β ≤ ν
(3.337)
where δ is an extremely small and positive number such that A + δI a positive matrix. Then, there exists a nonsingular matrix L such that A + δI = Lτ L Since L τL is symmetric positive definite matrix, there exists a positive matrix Q such that Q τ ( L−1 ) X τ XL−1Q = Λ = diag(λ 1 , λ 2 ,…, λ m ) A + δI = Lτ L
© 2012 by Taylor & Francis Group, LLC
(3.338)
M e t h o D s o F M o D eRn ReG Re s sI o n A n A LysIs
219
Let V = XL−1Q τ α = Q Lβ τ −1 τ α = (V V ) V Y τ τ −1 β = ( X X ) X Y
(3.339)
Note that X β = V α, X β = V α V τ (Y − V α ) = V τ ( I − V (V τV )−1V τ )Y = 0 Then y − Xβ
2
2
= y −Vα
= y − V α
2
= y − X β
2
2 + V α − V α
+
n
∑ λ (α − α ) i =1
i
i
i
2
(3.340)
and β τ ( A + δI )β = β τ Lτ Lβ = (β τ LτQ )(Q τ Lβ) = α τ α
(3.341)
Since ||y − X β|| is a fixed number, Equations 3.340 and 3.341 are both known. The extreme value problem 3.337 can be rewritten as 2
n 2 ∑ λ i (α i − α i ) = min i =1 α2≤ν 2
(3.342)
Theorem 3.32 Suppose that α = (α 1 , α 2 ,… , α n ) is a solution to the extreme value 2 problem 3.342. When α 2 ≤ ν, α = α . Otherwise
© 2012 by Taylor & Francis Group, LLC
220
M e A suReM en t DAtA M o D eLIn G
α i = 2 where 0 < λ < λn α 2 equation:
λi α i , i = 1, 2,…, n λi + λ
(3.343)
ν , λ is only determined by the following n
∑ i =1
2
λi α = ν λi + λ
(3.344)
Proof Solving the extreme value problem 3.342 is the same as finding the minimal point of a strictly convex function at the strictly closed convex area 2
B = {α | α 2 ≤ v } 2
2 When α 2 ≤ ν, the conclusion holds naturally. Suppose that α 2 ≤ ν Let
F ( α ) = D ( α ) + λ( α
2 2
− v + x2 )
By the method of slack variables in optimal computing [30], we know that the extreme point should satisfy that ∂F ∂α = 0, i = 1, 2,…, n ∂F i =0 ∂x2 α 2 = ν and λi αi = λ + λ αi i λx = 0 2 n λi αi − ν = 0 f (λ) = λ + λ i =1 i
∑
© 2012 by Taylor & Francis Group, LLC
(3.345)
M e t h o D s o F M o D eRn ReG Re s sI o n A n A LysIs
2 21
Since α 2 ≤ ν, we know that f(0) > 0. Combining f(0) > 0 with 2 Equation 3.345 we know that λ ≠ 0. We prove λ > 0 next by the method of proof by contradiction. Suppose that λ* < 0, α i* =
λi α i , i = 1, 2,…, n λi + λ *
α * = (α1* , α *2 ,… , αn* )τ is the minimum of D(α) in the area B. Let
α * * = (α1* * , α *2* ,…, αn* * )τ α i* * =
λi α i , i = 1, 2,…, n λi − λ *
Since λi (i = 1,2,. . .,n) are all positive, we know that α i* *
2
2
< α i* , i = 1, 2,…, n
So α* ∈ B ⇒ α** ∈ B On the other hand, by simple comparison, we know that D (α * * ) < D (α * ) which contradicts the fact that α* is the minimal point of D(α) in the area B. So λ > 0. Note that f (λ ) < 0, f (0) > 0, Then we have 0 < λ < λn α 3.344.
© 2012 by Taylor & Francis Group, LLC
∀λ > 0 λ n α f v 2 2
2 2
<0
ν , λ only determined by Equation
222
M e A suReM en t DAtA M o D eLIn G
Using Equation 3.339,
L−1Qα = L−1Q( Λ + λI )−1 Λα = L−1Q( Λ + λI )−1V τ Y = L−1Q( Λ + λI )−1Q τ ( L−1 )τ X τ Y = L−1 (Q τ )−1 ( Λ + λI )−1Q −1 ( Lτ )−1 X τ Y = ( LτQ ΛQ τ L + λLτQQ τ L)−1 X τ Y = [ X τ X + λ( A + δI )]−1 X τ Y
Since the extreme value problem considered is Equation 3.336 rather than Equation 3.337, matrix Xτ X must be positive definite. lim[ X τ X + λ( A + δI )]−1 X τY = ( X τ X + λA )−1 X τY δ→0
The solution to extreme value problem 3.336 is β = ( X τ X + λA )−1 X τ Y
(3.346)
that is, using β of Equation 3.346 as an estimate of β. Using β as the estimate of β is better than using β . The estimate given by Equation 3.346 is actually a ridge estimate. Since λ is related to Y, such an estimate is a nonlinear estimate. In summary, we have the following theorem. Theorem 3.33 Suppose that the solution to the extreme value problem 3.336 is
~ ββ = ( X τ X + λA )−1 X τ Y
When β τ A β ≤ v , λ = 0, otherwise β τ A β 0 < λ < [tr ( X τ X )] ⋅ ν 2 1 A 2 ( X τ X + λA )−1 X τ Y ) = ν 2 The proof is left as an exercise.
© 2012 by Taylor & Francis Group, LLC
M e t h o D s o F M o D eRn ReG Re s sI o n A n A LysIs
223
EXERciSE 3
1. Use polynomial basis functions to give an estimate of f(t) similar to Equation 3.9 in Example 3.1 and compare coefficients in polynomial basis and spline basis. 2. Prove Theorem 3.1. 3. Prove Theorem 3.2. 4. Prove Theorem 3.5. 5. Suppose X = X m ×n = ( X p , X R ) , β = (β τp , β Rτ )
τ
Y = X β + e = X P βP + X RβR + e β = ( X τ X )−1 X τ Y
(
β P = X Pτ X P
)
−1
X Pτ Y
show that RSSP = Y − X P β P
2
2 ≥ RSS = Y − X β .
6. Suppose ξ ~ N (0, Gn×n) G is a symmetrical positive matrix with rank n. Prove that ξτG −1ξ ~ χ2 (n). 7. Suppose X = X m ×n = ( X P , X R ), rank( X ) = n, rank( X P ) = p, rank( X R ) = n − p X PP = X P − X R ( X Rτ X R )−1 X Rτ X P X RR = X R − X P ( X Pτ X P )−1 X Pτ X R
(
Y = X P β P + X R β R + e , e ~ 0, σ 2 I
( β
τ P
)
(
, β τR = β = X τ X
show that
)
−1
)
X τY
(
)
(
)
i. E X P β P − X P β P
2
τ = σ 2 tr X Pτ X P ( X PP X PP )−1.
ii. E X Rβ R − X Rβ R
2
τ = σ 2 tr X Rτ X R ( X RR X RR )−1.
8. Prove Theorem 3.11. 9. Prove Theorem 3.7.
© 2012 by Taylor & Francis Group, LLC
224
M e A suReM en t DAtA M o D eLIn G
10. On the same assumption of Exercise (7), let α 2 =
Y − X β
2
m−n
, β = ( X Pτ X P )−1 X Pτ Y
Prove that 2
2 i. E X P β P − X β = X RRβ R + pσ . 2
ii. E X RR β R 2 + ( 2 p − n)σ 2 = X RRβ R 2 + pσ 2 . 11. Consider the model 1 Y = 1 1
1 1 2 + e , e ~ (0, I 3 ) 1 1
Suppose 1 1 1 X P = 1 , X P = 2 , X P = 1 1 1 1
1 2 1
2
In the three cases, calculate E X P β P − X β separately. Which one is the best model for X β? 12. Find examples such that the order determined by function approximation rules is too high. If we select some proper basis functions, we can use the variable selection method to reduce the number of unknown parameters. 13. Suppose Y4×1 = θ4×1 + e4×1. Here, e ~ N (0, σ2I) and θ1 + θ2 + θ3 + θ4 = 0. Prove that the statistic F of the hypothesis test H:θ1 = θ3 is 2(Y1 − Y3 )2 (Y1 + Y2 + Y3 + Y4 )2 14. Prove that C P ≤ p ⇔ F =
© 2012 by Taylor & Francis Group, LLC
m − n RSSP − RSS ⋅ ≤1 n− p RSS
M e t h o D s o F M o D eRn ReG Re s sI o n A n A LysIs
225
15. Suppose X = Xm×n is a known matrix, rank(x) = n, m >> n, P is an orthogonal matrix composed by the eigenvectors of X τ X, which means that Xτ X = Pτ ΛP, Λ = diag(λ1 , λ 2 , . . ., λn ), 0 < λ1 ≤ λ 2 ≤ . . . ≤ λn . Let θ = P τβ, V = XP = (V1 ,V2 , . . ., Vn ) Consider model Y = X β + e = V θ + e , e ~ N(0, σ 2 I ) .
Let β = ( X τ X )−1 X τY , θ = P τβ Prove that i. θ i is the unique uniformly minimum variance unbiased estimate of all the linear unbiased estimations for θi. ii. Var( θ i ) = σ 2λ 1−1 iii. E β − β = σ 2 tr( X τ X )−1 = σ 2 2
n
∑λ i =1
−1 i
2 = E θ − θ .
iv. E X β − X β = E V θ − V θ = nσ 2 2
v. V θ − V θ =
2
∑ V θ − V θ = ∑ λ (θ − θ ) . i
i
i i
i
i
i
2
vi. Suppose θi2 is already known. Let θi , θ i = 0,
θi2 ≥ σ 2 λ i−1 θi2 < σ 2 λ i−1
Then E(θ i − θi )2 < E (θ i − θi )2 . vii. Suppose that k = (k1, k 2, . . ., kn)τ , σ2 and θi2 (i = 1, 2,…, n) are known, k * = ( k1* , k2* , . . ., kn* ), ki* =
λ i θi2 , (i = 1, 2, . . ., n) λ i θi2 + σ 2
θ = ( k1θ 1 , k2 θ 2 , . . ., kn θ n )τ, θ * = ( k1* θ 1 , k2* θ 2 , . . ., kn* θ n )τ
© 2012 by Taylor & Francis Group, LLC
226
M e A suReM en t DAtA M o D eLIn G
Then, E θ* − θ
2
2
= minn E θ − θ , E V θ * − X β k ∈R
2
= minn E V θ − X β
2
k ∈R
16. Suppose X1 is the first row of matrix X . X (1) is the submatrix of X after eliminating the first row. Prove that X (1)τ X (1)
−1
= ( X τ X )−1 +
( X τ X )−1 X 1 X 1τ ( X τ X )−1 1 − h1
and also prove Equation 3.236. 17. Prove Theorem 3.33.
References
1. Xiru Chen, Songgui Wang. The Principle, Method and Application of Modern Regress Analysis. Hefei: Anhui Education Press, 1987 (in Chinese). 2. Xiru Chen, Guijing Chen, Qiguang Wu, Lincheng Zhao. The Estimate Theory of Parameters in Linear Models. Beijing: Science Press,1985 (in Chinese). 3. Song Gui Wang. The Theory and Application of Linear Models. Hefei: Anhui Education Press, 1987 (in Chinese). 4. Bocheng Wei. Modern Nonlinear Regress Analysis. Nanjing: Publishing House of Southeast University, 1989 (in Chinese). 5. Peter J. Huber. Robust Statistics. New York: John Wiely & Sons, 1980. 6. David A. Rakovsky. Nonlinear Regression Modeling. New York: Marcel Dekker, 1983. 7. Gorge Arthur Frede Seber. Linear Regression Analysis. New York: John Wiley & Sons, 1977. 8. Zhengming Wang. Biased estimation on linear regress model. Journal of Systems Science and Mathematical Sciences, 1995, 15(4): 319–328 (in Chinese). 9. Zhengming Wang. Improved principal components estimate of regression coefficient. Mathematics in Practice and Theory, 1990, 20(1): 60–63 (in Chinese). 10. Zhengming Wang. A practicable method and algorithm of data processing. Journal of National University of Defense Technology, 1994, 16(3): 122–127 (in Chinese). 11. Guangchang Zheng. Ridge estimate in general Gauss Markov models. Acta Mathematicae Applicatae Sinica, 1986, 9(4): 420–431 (in Chinese). 12. Zhengming Wang. A fast algorithm for choosing the optimal regress model. Journal of Asreonautics, 1992, 13(3): 14–20 (in Chinese).
© 2012 by Taylor & Francis Group, LLC
M e t h o D s o F M o D eRn ReG Re s sI o n A n A LysIs
227
13. Zhengming Wang. Improved eigenvalue estimate. Applied Mathematics, 1990, 3(2): 85–88 (in Chinese). 14. Zhengming Wang. A new rule and algorithm for independent variable choosing. Journal of National University of Defense Technology, 1990,12(3): 98–101 (in Chinese). 15. Wang Zhengming. Smoothing noisy data of solutions of perturbation differential equations with splines. Information and Systems, Proceeding on ICIS, Vol. 2, IAP, 1991 (in Chinese). 16. Wang Zhengming. A new estimate method to regression parameters. Modeling, Simulaiton and Control, Proceedings on AMSE MSC’ 92, Vol. 1, USTC Press, 1993 (in Chinese). 17. Yu Sha, Zhengming Wang. New method and algorithm of optimal regression modeling. Statistics and Applied Probability, 1994, 6(4): 425–437 (in Chinese). 18. Victor J. Yohai. High breakdown—point and high efficiency robust estimate for regression, The Annals of Statistics, 1987, 15(20): 642–656. 19. Peter J. Huber. Robust estimation of a location parameter, The Annals of Mathematical, Statistics, 1964, 35: 73–101. 20. Robert G. Staudte, Simon J. Sheather. Robust Estimation and Testing. Wiley: New York, 1990. 21. Peizhang Jia. A new method of obtaining the robust initial estimate in linear models. Control Theory and Applications, 1992, 9(2): 141–147. 22. Hu Yang, Songhui Wang. Condition number spectrum norm and estimate accuracy. Chinese Journal of Applied Probability and Statistics, 1991, 7(4): 337–343 (in Chinese). 23. Yu Sha, Yi Wu, Zhengming Wang, Mengda Wu, Guangxian Cheng, Liren Wu. Introduction to Ballistic Missile Accuracy Analysis. Changsha: National University of Defense Technology Press, 1995 (in Chinese). 24. Xuansan Cai. Optimal and Optimal Control. Beijing: Tsinghua University Press, 1982 (in Chinese). 25. Dianne P. O’Leary. Robust regression computation using iteratively reweighted least squares. SIAM. MAA, 1990, 11(3): 466–480. 26. Songgui Wang. Adaptive shrunken prediction of finite populations. Chinese Science Bulletin, 1990, 35(1): 804–806 (in Chinese). 27. Songgui Wang. Coefficients of generalized correlation and estimate efficiency. Chinese Science Bulletin, 1985, 30(19): 1521–1524 (in Chinese). 28. Zhongji Xu. Monte Carlo Method. Shanghai: Shanghai Scientific and Technical Publishers, 1987 (in Chinese). 29. Zhengming Wang, Baozhi Wang. The method of point-by-point elimination for outliers in linear regression model. Mathematics in Practice and Theory, 1997, 27(3): 266–274 (in Chinese).
© 2012 by Taylor & Francis Group, LLC
4 m E Thods of TimE s EriEs a nalysis
4.1 Introduction to Time Series 4.1.1 Time Series and Random Process
A time series is a series of data arranged in the order of “time.” It is usually denoted as {e(t), t ∈ T }, where e(t) is the observation at time t. For instance, values of a stock at 10 a.m. each day recorded in the chronological order form a time series. Your daily blood pressure at 7 a.m. every morning is also a time series. A time series contains the information of both “order” and “value” [1,2]. If {e(t), t ∈ T } is deterministic, it is called a deterministic time series. If {e(t), t ∈ T } is a random sequence following a certain probability distribution, it is called a random time series, or simply a time series. Let (Ω, F, P) be a probability space, T a set of parameters, and e(t ) = e(t , ω) : T × Ω → R If e(t, ω) is a random variable over (Ω, F, P) for any t ∈ T , we say {e(t), t ∈ T } is a random process over (Ω, F, P). Usually, the parameter set T considered is a discrete set and the time series {e(t), t ∈ T } is a discrete random process. 4.1.2 Time Series Analysis
The nature of a time series is characterized by the correlation between adjacent observed data. Time series analysis is a technique of analyzing such dependence. More specifically, time series analysis analyzes 229
© 2012 by Taylor & Francis Group, LLC
230
M e A suReM en t DAtA M o D eLIn G
the stochastic dynamic model of time series data and applies it to prediction, control, and other important fields [3,4]. Time series analysis is a data processing method. Its key techniques are parametric modeling, parameter estimation, and realization of algorithms. Its main tasks are as follows [5,6]: 1. To analyze the statistical regularity of time series data and make inferences on the properties of the physical system that produces the sequence; to identify the systematic law of determining the input–output model; and then to present the effects of any given input on the output of the system 2. To establish or fit dynamic mathematical models and estimate the parameters according to the analysis of statistical regularities of the time series; to extract information from the time series; to forecast future behaviors of the time series with specific prediction accuracy; and to carry out simulations 3. To use the established mathematical model to describe and evaluate the impact of unusual interferences on the characteristics of the time series such as the impact of the increase of bank interest rate or the change of real estate policy on the stock market 4. To design control schemes to compensate for the potential bias between the system output values and the ideal ones through the input sequence adjustments 4.2 Stationary Time Series Models 4.2.1 Stationary Random Processes
If the moment functions of all orders of a random process are invariant to the initial time points, the process is a strictly stationary process. If the first and the second moment functions are independent of the initial time points, the process is stationary in a wide sense. In other words, let µ t = Eet , rt , s = COV (et , e s ) (autocovariance function) ∆
for random process {et, t ∈ T }. If μt = μ, rt ,s = rt − s = rk , then {et, t ∈ T } is a stationary process in a wide sense. Since only stationary processes in
© 2012 by Taylor & Francis Group, LLC
2 31
M e t h o D s o F tIM e seRIe s A n A LysIs
a wide sense will be discussed in this chapter, we will simply refer stationary processes in a wide sense as stationary processes. Autocorrelation functions of a stationary process are defined as ρk = rk / r0 , k = 0, ± 1, ± 2, … Autocovariance functions and autocorrelation functions of a stationary process have the following properties: 1. r0 =∆ σ 2 = Var(et ), ρ0 = 1 2. |rk| ≤ r 0, |ρk| ≤ 1, ∀k In fact, for any random variables x, y and any real number α, we have Var(x + αy ) = Var(x ) + 2α COV (x , y ) + α 2 Var( y ) ≥ 0 The discriminant of the quadratic form of α must be nonnegative, that is, Δ = COV2(x, y) − Var(x)Var(y) ≤ 0. Thus, rk2 ≤ r02 . 3. rk = r−k, ρk = ρ−k, ∀k, that is, autocovariance and autocorrelation functions are even functions. 4. Let r0 r 1 Γm = rm −1
r1 r0 rm − 2
rm −1 1 ρ rm − 2 , V = 1 m r0 ρm −1
ρ1 1 ρm − 2
ρm −1 ρm − 2 1
Γm, Vm are Toeplitz matrices because each descending diagonal from the left to the right in these matrices is constant. Γm, Vm are nonnegative definite. In fact, since the sufficient and necessary condition for |Γm| = 0 (or equivalently |Vm| = 0) is that et, et+1, ..., et+m−1 are linearly dependent and the probability that et, et+1, ..., et+m−1 are linearly dependent is generally zero, Γm, Vm are positive definite with a probability of one.
© 2012 by Taylor & Francis Group, LLC
232
M e A suReM en t DAtA M o D eLIn G
Definition 4.1 For a random series {et, t ∈ T }, if 1. μt = Eet = 0, ∀t ∈ T , then it is a zero-mean series 2. rt,s = COV(et, es) = 0, ∀t ≠ s, then it is a white noise series 3. For any t 1, . . ., tm ∈T, the joint distribution of et1 , …, etm is a normal distribution, then it is a Gaussian series 4.2.2 Autoregressive Models
Autoregressive (AR) models are the most commonly used and most convenient time series models [7]. Definition 4.2 A zero-mean series {et} is a p-order autoregressive series, denoted by AR(p), if et can be expressed as e t = φ1 e t − 1 + + φ p e t − p + ε t
(4.1)
where ϕ1, . . ., ϕp are autoregressive coefficients, and {εt} is a white noise series (in this chapter, we assume that {εt} is zero-mean Gaussian white noise), and for every t, Var( ε t ) =∆ σ ε2 , Eet − k ε t = 0 , for any k > 0. Let B be a backward shift operator, that is, Bet = et−1, B2et = et−2, . . ., and the autoregressive coefficient polynomial be Φ p ( B ) = 1 − φ1 B − − φ p B p
(4.2)
The AR(p) model can be written as Φ p ( B )et = ε t
(4.3)
Next we will discuss conditions for an AR(p) series to be stationary. Let us start with two simple examples.
© 2012 by Taylor & Francis Group, LLC
M e t h o D s o F tIM e seRIe s A n A LysIs
Example 4.1 AR(1) Model: et = ϕ1et−1 + εt. Since Eet = 0, COV(et, et−k) = Eetet−k. If {et} is a stationary series, then rk = Eet et − k = Eet − k (ϕ 1et −1 + ε t ) = ϕ 1Eet −1et − k + Eet − k ε t k≠0
= φ1rk −1 = = φ1kr0
ρk = φ1k
( ∀k ≠ 0 )
Recall that |ρk| < 1(∀k ≠ 0), we have |ϕ1| < 1. Conversely, it can be shown that if |ϕ1| < 1, then {et} is stationary. Example 4.2 AR(2) Model: et = ϕ1et−1 + ϕ2et−2 + εt. If {et} is stationary, then rk = Eet et − k = ϕ 1Eet −1et − k + ϕ 2Eet − 2et − k + Eet − k ε t Therefore, r0 r1 r 2 rk
= φ1r1 + φ2r2 + σε2 ,
k=0
= φ1r1 + φ2r0 ,
k=2
= φ1r0 + φ2r1 ,
= φ1rk −1 + φ2rk − 2 ,
k=1
k>2
Let r 0 = σ2, then σ 2 (1 − φ1ρ1 − φ2ρ2 ) = σε2 , ρ1 = φ1 + φ2ρ1 , ρ = φ ρ + φ , 1 1 2 2 ρk = φ1ρk−−1 + φ2ρk − 2 ,
k=0 k=1
k=2
k>2
By solving these equations, we have ρ1 =
φ1 φ12 , ρ2 = + φ2 1 − φ2 1 − φ2
σ2 =
σ ε2 (1 − φ 2 ) (1 + φ 2 )(1 − φ1 − φ 2 )(1 + φ1 − φ 2 )
and for k > 2, ρk can be calculated recursively.
© 2012 by Taylor & Francis Group, LLC
233
234
M e A suReM en t DAtA M o D eLIn G
Hence, if we know model parameters ϕ1, ϕ2, σε , the statistical features of {et} can be derived. If {et} is stationary, then |ρk| < 1 (k ≠ 0). However, this condition is not easy to use in practice. We study the condition from another perspective as follows: Let the two roots of Φ(B) = 1 − ϕ1B − ϕ2B2 = 0 be λ 1−1 , λ −21. By the root–coefficient relationship, λ 1 + λ 2 = φ1 , λ 1 λ 2 = − φ 2 and ρk =
−(λ 22 − 1)λ 1k + 1 + (λ 12 − 1)λ k2 + 1 (λ 1 − λ 2 )(λ 1λ 2 + 1)
( k ≠ 0)
It can be shown that if |ρk| < 1 (k ≠ 0), then |λ1| < 1, |λ2| < 1, that is, both roots of the following equation Φ( B ) = 1 − φ1 B − φ 2 B 2 = (1 − λ 1 B )(1 − λ 2 B ) = 0 are outside of the unit circle, or equivalently, |ϕ2| < 1, ϕ2 ± ϕ1 < 1. Conversely, it can be proved that if |λ1| < 1, |λ2| < 1, both roots of Φ(B) = 1 − ϕ1B − ϕ2B2 = 0 are outside of the unit circle and {et} is stationary. In general, a necessary and sufficient condition for an AR(p) model to be stationary is that all the roots of Φp(B) = 0 are outside of the unit circle. All the parameters ϕ = (ϕ1, . . ., ϕp) that satisfy the stationary condition form a subset in Rp space. Such a subset is called the stationary domain of AR(p). The most commonly used method to verify conditions of a stationary domain is Jury criterion (see Ref. 1). Consider a general AR(p) model e t = φ1 e t − 1 + + φ p e t − p + ε t
(4.4)
By multiplying Equation 4.4 by et−k on both sides and taking the expectation, we get Eetet−k = ϕ1Eet−1et−k + . . . + ϕpEet−pet−k + Eεtet−k. Let k = 0, 1, 2, . . . , p respectively, then σ2 (1 − ϕ1 ρ1− . . . −ϕp ρp ) = σε2
© 2012 by Taylor & Francis Group, LLC
M e t h o D s o F tIM e seRIe s A n A LysIs
235
ρ1 = ϕ1 + ϕ2 ρ1 + . . . + ϕp ρp−1 ρ2 = ϕ1 ρ1 + ϕ2 + . . . + ϕp ρp−2 ... ρp = ϕ1 ρp−1 + ϕ2 ρp−2 + . . . + ϕp, and when k > p, we get ρk = ϕ1 ρk−1 + ϕ2 ρk−2 + . . . + ϕk−p. Let b = (ρ1 , …, ρ p ) τ , φ = (φ1 , … , φ p ) τ , 1 Vp = ρ p −1
ρ1
ρp−2
ρ p −1 1
We have V pφ = b
(4.5)
Equation 4.5 is called the Yule–Walker equation. For a stationary AR(p) model, Vp is positive definite, so φ = V p−1b . This shows that model parameters are determined by autocorrelation functions (or autocovariance functions) and vice versa. On the other hand, an AR(p) series satisfies the difference equation Φ p ( B )et = ε t For instance, AR(1) series satisfies Φ1(B)et = εt, that is, (1 − ϕ1B) et = εt, so ∞
et = (1 − φ1 B )−1 ε t =
∑ k=0
∞
φ1k B k ε t =
∑φ ε k=0
In general, for Φ p ( B )et = ε t let
∞
Θ( B ) = Φ −p1 ( B ) =
© 2012 by Taylor & Francis Group, LLC
∑θ B k=0
k
k
k 1 t −k
236
M e A suReM en t DAtA M o D eLIn G
Then ∞
−1 p
et = Φ ( B )ε t =
∞
∑θ B ε = ∑θ ε k=0
k
k t
k=0
(4.6)
k t −k
Equation 4.6 is the transfer form of an AR(p) series, where θk is determined by Θ( B )Φ p ( B ) ≡ 1 Since the roots of Φp(B) = 0 are all outside of the unit circle, it can be proved that |θ k | ≤ c1e − c 2k , k ≥ 0 . The transfer form of the AR(p) series illustrates that AR(p) model has long memory; hence {θk} is called the memory function. Figure 4.1 presents autocorrelation functions of the following AR models: 1. et = 0.8et−1 + εt 2. et = −0.8et−1 + εt (1)
1 0 –1
0 0
5 Lag
10
(2)
1
0
5 Lag
10
(3)
5 Lag
10
(5)
–1
0
5 Lag
10
(6)
1
0
Figure 4.1
0
0
1
–1
–1
1
0 –1
(4)
1
0 0
5 Lag
10
–1
0
The autocorrelation functions of AR models.
© 2012 by Taylor & Francis Group, LLC
5 Lag
10
M e t h o D s o F tIM e seRIe s A n A LysIs
237
3. et = 0.6et−1 + 0.2et−2 + εt 4. et = −0.6et−1 + 0.2et−2 + εt 5. et = 0.75et−1 − 0.5et−2 + εt 6. et = −0.8et−1 − 0.6et−2 + εt 4.2.3 Moving Average Model
Definition 4.3 A zero-mean series {et} is a q-order moving average (MA) series MA(q), if e t = ε t − θ1 ε t − 1 − − θ q ε t − q
(4.7)
where εt is a white noise. {et} can also be expressed as et = Θq ( B )ε t where Θ q ( B ) = 1 − θ1 B − − θ q B q
(4.8)
By definition, et is a linear combination of the current value of {εt} and its past q values. In other words, εt can only affect the future q values of et. Consequently, two et ’s are independent if they are q intervals or more than q intervals apart. Example 4.3 MA(1): et = εt − θ1εt−1. Since rk = Eet et − k = E(εt − θ1εt −1 )(εt − k − θ1εt − k −1 ) (1 + θ12 )σε2 , = −θ1σε2 , 0, ρ0 = 1, ρ1 = r1 r0 =
© 2012 by Taylor & Francis Group, LLC
k=0
k=1
k≥2
− θ1 , ρk = rk r0 = 0 ( k ≥ 2) 1 + θ12
238
M e A suReM en t DAtA M o D eLIn G
Example 4.4 MA(2): et = εt − θ1εt−1 − θ2εt−2. Since rk = Eetet−k = E(εt − θ1εt−1 − θ2εt−2)(εt−k − θ1εt−k−1 − θ2εt−k−2) (1 + θ12 + θ 22 )σ ε2 , 2 ( −θ1 + θ1θ 2 )σ ε , = −θ 2σ ε2 , 0,
ρ0 = 1, ρ1 = r1 /r0 = ρ2 = r2 /r0 =
k=0
k=1
k=2 k>2
−θ1 + θ1θ 2 1 + θ12 + θ 22
−θ 2 , ρk = 0 ( k > 2) 1 + θ12 + θ 22
Similarly, for a general MA(q), we have q 1 + θi2 σε2 , i =1 rk = ( −θk + θk +1θ1 + + θq θq − k )σε2 , 0,
∑
k=0 1≤k≤q
(4.9)
k>q
and 1, −θ k + θ k + 1θ1 + + θq θq − k , q ρk = + θi2 1 i =1 0,
∑
k=0 1≤ k≤q
(4.10)
k>q
It can be straightforwardly observed that MA(q) is a stationary series. The autocovariance functions or autocorrelation functions of an MA(q) have the property of q-step truncation, that is,
© 2012 by Taylor & Francis Group, LLC
239
M e t h o D s o F tIM e seRIe s A n A LysIs
autocovariance functions or the autocorrelation functions vanish after q steps. The following theorem shows that the converse is also true.
Theorem 4.1 An autocovariance function series {rk} is the autocovariance function series of some MA(q) series if and only if {rk} has the property of q-step truncation. Equation 4.9 shows that {rk} of an MA(q) is determined by model parameters and vice versa. If {rk} is known, linear iteration method or Newton–Raphson algorithm is commonly used to obtain model parameters θ = (θ1, . . ., θq)τ (see Section 4.3.3). Definition 4.4 If all the roots of a polynomial with real coefficients Θ q ( B ) = 1 − θ1 B − − θ q B q = 0
(4.11)
are outside of the unit circle, the MA(q) series is invertible. Such a condition is called the invertible condition. A set of parameters θ = (θ1 ,…, θq )τ satisfying the invertible condition is called the invertible domain. Theorem 4.2 If the autocovariance function series {rk} is truncated after q steps and satisfies q
∑r e k = −q
k
ikλ
> 0, − π ≤ λ ≤ π,
then among the solutions to Equation 4.9, there exists a unique set of solution θ1, . . . , θq , which makes et = Θq ( B )ε t an invertible MA(q) series.
© 2012 by Taylor & Francis Group, LLC
24 0
M e A suReM en t DAtA M o D eLIn G
Figure 4.2 presents the autocorrelation functions of the following MA models: 1. et = εt + 0.8εt−1 2. et = εt − 0.8εt−1 3. et = εt + 1.4εt−1 + 0.6εt−2 4. et = εt + 0.8εt−1 + 0.5εt−2 5. et = εt + 0.5εt−1 − 0.2εt−2 6. et = εt − 0.4εt−1 − 0.2εt−2 4.2.4 ARMA(p,q) Model
Definition 4.5 A zero-mean series {et} is an autoregressive moving average series ARMA(p,q), if e t = φ 1 e t − 1 + + φ p e t − p + ε t − θ1 ε t − 1 − − θ q ε t − q (1)
1 0 –1
0 0
5 Lag
10
(2)
1
0
5 Lag
10
(5)
0 0
5 Lag
10
(3)
1
–1
0
5 Lag
10
(6)
1
0
0
–1 0
Figure 4.2
–1
1
0 –1
(4)
1
5 Lag
10
–1
0
The autocorrelation functions of mA models.
© 2012 by Taylor & Francis Group, LLC
5 Lag
10
(4.12)
M e t h o D s o F tIM e seRIe s A n A LysIs
2 41
where {εt} is a white noise series, and Eεt = 0, Eet−kεt = 0, ∀k > 0, for all t. Specifically, ARMA(p,0) = AR(p), ARMA(0,q) = MA(q). Let Φ p ( B ) = 1 − φ1 B − − φ p B p , Θ q ( B ) = 1 − θ1 B − − θ q B q
(4.13)
Then ARMA(p,q) model can be rewritten as Φ p ( B )et = Θq ( B )ε t
(4.14)
When all the roots of Φp(B) = 0, Θq(B) = 0 are outside of the unit circle, the ARMA(p,q) model is stationary and invertible. Theorem 4.3 Let {et} be a stationary time series and Φp(B)et = Θq(B)εt, where {εt} is a white noise series. i. If all the roots of Φp(B) = 0 are outside of the unit circle, then ∞
et =
∑c ε k=0
k t −k
and there exist μ1, μ2 > 0 such that c k ≤ µ1e − kµ 2 ( transfer form ) ii. If all the roots of Θq(B) = 0 are outside of the unit circle, then ∞
εt =
∑d e k=0
k t −k
and there exist r 1, r 2 > 0 such that d k ≤ r1e − kr2 ( inverse form )
© 2012 by Taylor & Francis Group, LLC
242
M e A suReM en t DAtA M o D eLIn G
Proof We prove (i) only and the proof of (ii) is similar. Because all the roots of Φp(B) = 0 are outside of the unit circle, there exists δ > 0, such that Φ −1 p ( B ) is analytic in |B| < 1 + δ; hence −1 Φ p ( B )Θq ( B ) is analytic in the domain of |B| < 1 + δ and thus can be expanded to convergent power series. Let ∞
Φ −p1 ( B )Θq ( B ) =
∑c B , k
k
k=0
B < 1 + δ.
Therefore, ∞
et = Φ −p1 ( B )Θq ( B )ε t =
∑ k=0
∞
ck B k εt =
∑c ε k=0
k t −k
.
From the properties of solutions to the difference equation (see Appendix 1 in Ref. 2), it can be shown that there exist μ1, μ2 > 0 such that |c k | ≤ µ1e − kµ 2 . Remark The equation in (i) is the Wold expansion of {et} and ck is the Wold coefficient. Theorem 4.4 Let ϕ0 = −1, θ0 = −1, rk = COV(et, et−k). Then for a stationary invertible ARMA(p,q) series {et}, there exist i. p
p
∑∑φ φ r i =0 j =0
i
j i− j
=σ
2 ε
q
∑θ j =0
2 j
( k = 0)
ii. p
p
∑∑φ φ r i =0 j =0
© 2012 by Taylor & Francis Group, LLC
i
j i− j +k
=σ
2 ε
q−k
∑θ θ j =0
j
j +k
(0 < k ≤ q )
M e t h o D s o F tIM e seRIe s A n A LysIs
24 3
iii. p
∑φ r j =0
j k− j
= 0 (k > q )
Proof Since et = ϕ1et−1 + ⋅ ⋅ ⋅ + ϕpet−p + εt − θ1εt−1 − ⋅ ⋅ ⋅ − θqεt−q, by multiplying et−k on both sides and taking the expectation operation, we have Eet et − k = ϕ1Eet − 1et − k + + ϕ p Eet − p et − k + Eε t et − k − θ1Eε t − 1et − k − − θq Eε t − q et − k 1. If k > q, then rk = ϕ1rk−1 + ⋅ ⋅ ⋅ + ϕprk−p, that is, ∑ pj = 0 φ j rk − j = 0. 2. If k ≤ q, let wt = εt − θ1εt−1 − ⋅ ⋅ ⋅ − θqεt−q, then Φp(B)et = wt, which means that {wt} is an MA(q) series. Let the autocovariance function be rkw , then q σ ε2 θ 2j , j =0 q−k = 2 θ jθ j +k , σ ε j =0 0,
∑
rkw
k =0
∑
0< k≤q k >q
On the other hand, w k
r
(
= E Φ p ( B )et Φ p ( B )et − k
=
p
p
∑ ∑ φ φ Ee j =0 i =0
j
i
e
)
t − j t − k −i
p p j = E − ϕ j B et − ϕi B i e t − k j = 0 i = 0
∑
=
p
∑
p
∑∑φ φ r j =0 i =0
j
i i− j +k
Let k = 0 or 0 < k ≤ q. We have (i) or (ii) respectively.
© 2012 by Taylor & Francis Group, LLC
24 4
M e A suReM en t DAtA M o D eLIn G
Remark Theorem 4.4 demonstrates that if a given time series {et} follows an ARMA(p,q) model and all parameters φi , θ j , σ ε2 are known, then its statistical features such as the first- and second-order moments can be easily obtained. On the other hand, if we know a sample of {et} and want to establish an ARMA(p,q) model, parameters φi , θ j , σ ε2 can be determined by estimates r k from the observed data of {et}. Such a computation is nonlinear. (1) ϕ1 = –0.5; θ1 = +0.5
1 0.5
0.5
0
0
–0.5
–0.5
–1
0
2
4
Lag
6
8
–1
10
(2) ϕ1 = +0.2; θ1 = +0.6
1
0.5
0
0
–0.5
–0.5 0
2
4
Lag
6
0
2
8
–1
10
4
0
2
4
(3) ϕ1 = +0.6; θ1 = +0.2
1 0.5 0 –0.5 –1
Figure 4.3
0
2
4
Lag
6
8
The autocorrelation functions of ARmA(1,1) models.
© 2012 by Taylor & Francis Group, LLC
Lag
6
8
10
(5) ϕ1 = +0.7; θ1 = –0.3
1
0.5
–1
(4) ϕ1 = –0.6; θ1 = –0.2
1
10
Lag
6
8
10
24 5
M e t h o D s o F tIM e seRIe s A n A LysIs
Figure 4.3 presents the autocorrelation functions of some ARMA(1,1) models. 4.2.5 Partial Correlation Function of a Stationary Model
The autocorrelation function series {rk} for an MA(q) model has the property of truncation after q steps, while this is not true for an AR(p) or ARMA(p,q) model. The autocorrelation functions of the latter models do not have the truncation features (sometimes called the drag property). Consequently, we cannot separate AR and ARMA models from their autocorrelation functions. For this purpose, partial correlation functions are introduced. Definition 4.6 Suppose {et} is a stationary time series, the k-order partial correlation function is defined as the conditional correlation function of et, et−k given et−1, . . ., et−k+1: ϕ kk = ρet et − k |et −1 ,…,et − k + 1 =
E(et et − k | et − 1 , … ,et − k + 1 ) Var(et | et − 1 , … ,et − k + 1 )
(4.15)
The partial correlation function excludes the interference of other variables. However, it is difficult to be calculated in practice. Another perspective is given as follows. Suppose {et} as a stationary series. Consider the linear least squares estimation of et based on et−1, . . ., et−k. Let E et −
2
β j et − j = min E et − β1 , … ,βk j =1 k
∑
β j et − j j =1 k
∑
2
Then et =
k
∑ β e j =1
j
t− j
+ εt
By multiplying et−k on both sides and taking the conditional expectation of et−1, . . ., et−k+1 we have
© 2012 by Taylor & Francis Group, LLC
24 6
M e A suReM en t DAtA M o D eLIn G
E(et et − k | et − 1 , … , et − k + 1 ) = β 1E(et − 1et − k | et − 1 , … , et − k + 1 ) + + β k E(et − k et − k | et − 1 , …, et − k + 1 ) + E( ε t et − k | et − 1 , …, et − k + 1 ) = β k Var(et − k | et − 1 ,…, et − k + 1 ) Thus, φ kk = β k This means that the k-order partial correlation function is actually the coefficient of the last term when we use a linear least squares method to estimate et as a k-order AR model. Recall that J = E et −
2
k β j et − j = r0 + 2 βi β j r i − j − 2 j =1 i, j =1 k
∑
∑
k
∑β r j =1
j j
Since J reaches its minimum at β , (∂J)/(∂βi) = 0 and k
∑β r j =1
j |i − j |
= ri , i = 1, 2,…, k
which is 1 ρ 1 ρk − 1
ρ1
1
ρk − 2
ρk − 1 β1 ρ1 ρk − 2 β 2 ρ 2 = 1 β k ρk
That is to say, β is the solution to the equation V kβ = b k and denote φkj = β j . Thus, we conclude that, for an AR(p) model, ϕkk = 0, for any k > p, and ϕpj = ϕj(1 ≤ j ≤ p), which means that ϕkk has the property of truncation after p steps.
© 2012 by Taylor & Francis Group, LLC
2 47
M e t h o D s o F tIM e seRIe s A n A LysIs
In fact, when k > p, since β is the solution to Vkβ = bk, 1 1 ρ1 φ kk = β k = k V ρk − 1
ρ1
1
ρk − 2
ρk − 2
ρ1
ρk − 3
ρ2
ρ1
ρk
Recall that, for an AR(p) model, ρk =
p
∑φ ρ j
j =1
k− j
( k ≥ 1)
Therefore, 1
φ kk =
ρ1
1 ρ1 Vk ρk − 1
1
p
ρk − 2
∑φ ρ
ρk − 3
∑φ ρ
j =1 p
j =1
ρk − 2
ρ1
j 1− j
j
2− j
=0
p
∑φ ρ j =1
j
k− j
The determinant above is zero since the last column of the matrix is a linear combination of the previous p columns. Theorem 4.5 (The recursive expression of partial correlation function.) Let ϕk = (ϕk1, ..., ϕkk)τ, where ϕkk is the k-order partial correlation function. Then φ00 = 1, φ11 = ρ1 , φ k + 1,k + 1 =
k
∑ρ j =0
k + 1− j
φ kj
k
∑ρ φ j =0
j
kj
(φ k 0 = −1, k = 1, 2, ...)
φ k + 1, j = φ kj − φ k + 1,k + 1φ k ,k + 1− j , 1 ≤ j ≤ k
© 2012 by Taylor & Francis Group, LLC
24 8
M e A suReM en t DAtA M o D eLIn G
Proof Let αk = (ρk, . . ., ρ1)τ, βk = (ϕk+1,1, . . ., ϕk+1,k)τ, and bk = (ρ1, . . ., ρk)τ. Then V k + 1φ k + 1 = b k + 1 that is, Vk τ α k
α k βk b k = 1 φ k + 1,k + 1 ρk + 1
( )
Thus, k k k −1 k k V k β k + α k φ k + 1, k + 1 = b ⇒ β = (V ) ( b − α φ k + 1, k + 1 ) k τ k α β + φ k + 1,k + 1 = ρk + 1 τ k ⇒φ (V k )−1 (b k − α k φ k + 1,k + 1 ) k + 1, k + 1 = ρk + 1 − α
( )
( )
Let Γk = 1
1 k×k
Then
(Γ ) = (Γ ) k
τ
k
−1
= Γ k , Γ kV k Γ k = V k , α k = Γ k b k
Therefore,
( ) ( ) − (b ) Γ φ + (b ) φ φ τ
τ
φ k + 1,k + 1 = ρk + 1 − b k Γ k (V k )−1 b k + b k Γ k (V k )−1 Γ k b k φ k + 1,k + 1 = ρk + 1
© 2012 by Taylor & Francis Group, LLC
k
τ
k k
k
τ
k
k + 1, k + 1
24 9
M e t h o D s o F tIM e seRIe s A n A LysIs
that is,
( ) 1 − (b ) φ τ
φ k + 1,k + 1 =
ρk + 1 − b k Γ k φ k k
τ
k
=
k
∑φ ρ j =0
kj
k
k + 1− j
∑φ ρ j =0
kj
j
Recall that β k = (V k )−1 (b k − α k φ k + 1,k + 1 ) = φ k − (Γ kV k Γ k )−1 Γ k b k φ k + 1,k + 1 = φ k − Γ k (V k )−1 b k φ k + 1,k + 1 = φ k − Γ k φ k φ k + 1,k + 1 We have φ k + 1, j = φ kj − φ k + 1,k + 1φ k ,k + 1 − j , 1 ≤ j ≤ k Theorem 4.6 A zero-mean stationary series {et} is a stationary AR( p) series if and only if its partial correlation functions have the property of p-step truncation. Proof The necessity has already been proved. The sufficiency is given as follows. Suppose a zero-mean stationary series {et} has a partial correlation function ϕkk with p-step truncation, that is, φ kk = 0, k > p Let {rk} be the autocovariance function series of {et} and ϕ = (ϕ1, . . . , ϕp)τ be the solutions to the Yule–Walker equation V pφ p = b p Then φ pj = φ j ,
© 2012 by Taylor & Francis Group, LLC
j = 1, 2, ..., p
250
M e A suReM en t DAtA M o D eLIn G
Let εt = et − ϕ1et−1 − . . . − ϕpet−p. Using the p-step truncation property of the partial correlation function and its recursive formulas, we have ϕp+1,j = ϕp,j = ϕj, j = 1,2, . . ., p ϕp+2,j = ϕp+1, j = ϕj, j = 1,2, . . ., p + 1 ... ϕp+s,j = ϕp+s−1,j = ϕj, j = 1,2, . . ., p + s − 1
Let k = p + s − 1 (s > 0). Since ϕp+s,p+s = 0, we have rp + s =
p + s −1
∑r j =1
p+s− j
φ p + s − 1, j =
p
∑r j =1
p+s− j
φj
that is Φ p ( B )rp + s = 0 or Φ p ( B )rk = 0 ( k > p ) Again, by the Yule–Walker equation, we have rk =
p
∑r j =1
k− j
φj
(1 ≤ k ≤ p )
that is, Φ p ( B )rk = 0 (1 ≤ k ≤ p ) This means, for all k ≠ 0, there exists Φ p ( B )rk = 0 For t > s,
Eε t e s = E(et − ϕ 1et − 1 − − ϕ p et − p )e s = rt − s −
p
∑ϕ r j =1
© 2012 by Taylor & Francis Group, LLC
j t −s− j
= Φ p ( B )rt − s = 0 (t > s )
2 51
M e t h o D s o F tIM e seRIe s A n A LysIs
and Eε t ε t + k = EΦ p ( B )et Φ p ( B )et + k
= Φ p ( B )rk −
= E et −
p
∑ φ Φ (B)r j
j =1
p
k+ j
ϕ j et − j Φ p ( B )et + k j =1 p
∑
= 0 ( ∀k ≠ 0 )
Therefore, {εt} is a white noise series, and hence makes {et} an AR(p) series. Table 4.1 lists the main properties of AR, MA, and ARMA models. 4.3 Parameter Estimation of Stationary Time Series Models 4.3.1 Estimation of Autocovariance Functions and Autocorrelation Functions
Suppose a zero-mean stationary series {et} has a group of observations e1, e2, . . ., en. The estimation of the autocovariance function {rk} has two main forms: 1 r = n−k * k
n−k
∑e e t =1
t t +k
(4.16)
, ρ k* = r k* / r 0*
Table 4.1 main Properties of AR, mA, and ARmA models moDELs
AR
mA
ARmA
PRoPERTIEs
Invertible condition
Φ(B)et = εt All the roots of Φ(u) = 0 are outside of the unit circle None
Transfer form Inverse from Autocorrelation function Partial correlation function
et = Φ−1(B)εt εt = Φ(B)et Drag Truncation
model equation stationary condition
et = Θ(B)εt None
Φ(B)et = Θ(B)εt All the roots of Φ(u) = 0 are outside of the unit circle
All the roots of Θ (u) = 0 are outside of the unit circle et = Θ(B)εt εt = Θ−1(B)et Truncation Drag
All the roots of Θ(u) = 0 are outside of the unit circle et = Φ−1(B)Θ(B)εt εt = Θ−1(B)Φ(B)et Drag Drag
Note: The “drag” means that the tail part of the series is not always zero, namely, it does not possess the property of truncation.
© 2012 by Taylor & Francis Group, LLC
252
M e A suReM en t DAtA M o D eLIn G n−k
∑e e
1 r k = n
t =1
t t +k
(4.17)
, ρ k = r k / r 0
Both of the two forms have the following properties: n−k 1. E r k* = rk , E r k = rk → rk (n → ∞) n 2. Let * r 0 r * * τ n = 1 r * n −1
r 1*
r 0*
r n* − 2
r n* − 1 r n* − 2 , τ n = * r 0
r 0 r 1 r n − 1
r 1
r 0
r n − 2
r n − 1 r n − 2 . r 0
Then τ n is nonnegative definite but τ n* is not necessarily the case. In fact, let e1 0 Aτ = 0
e2
e1
0
en
0
en − 1
en
e1
e2
0 0 . en
1 AA τ and τ n is nonnegative definite. n The above property shows that although rk* is unbiased, its covariance matrix is not necessarily nonnegative definite. Therefore, the biased estimate r k is the one usually used, and {rk } is called the sample autocovariance function and {ρ k } the sample autocorrelation function. The following two theorems describe how close the estimate rk is to the true value of rk. Then τ n =
Theorem 4.7 (Bartlett formulas)
∞
∑
1 1 1. COV(r k , r k + v ) = (rm rm + v + rm + k + v rm − k ) + ο . n n m = −∞
© 2012 by Taylor & Francis Group, LLC
253
M e t h o D s o F tIM e seRIe s A n A LysIs ∞
∑
1 1 2. Var(r k ) = (rm2 + rm + k rm − k ) + ο . n n m = −∞ ∞
∑
1 3. COV(ρ k , ρ k + v ) = (ρmρm + v + ρm + k + vρm − k + 2ρkρk + vρm2 n m = −∞ 1 − 2(ρkρm − k − v + ρk + vρmρm − k )) + ο . n
∞
∑
1 4. Var(ρ k ) = (ρm2 + ρm + k ρm − k + 2ρk2ρm2 − 4ρk ρm ρm − k ) n m = −∞ 1 + ο . n
Theorem 4.8 If {et} are independent and have the identical distribution of N(0, σ ε2 ), then n (r 0 − r0 , r 1 − r1 ,…, r k − rk ) ∼ N(0,G )
(4.18)
n (ρ 0 − ρ0 , ρ 1 − ρ1 ,…, ρ k − ρk ) ∼ N(0, R )
(4.19)
asymptotically where G, R are the covariance matrices of {r k } and {ρ k } given in Theorem 4.7. The following part of this section will focus on the parameter estimation of three classic time series models. 4.3.2 Parameter Estimation of AR(p) Models
Let e1, e 2, ..., en be a sample of an AR(p) series {et} and r k be the estimate of its autocovariance function. By the Yule–Walker equation V φ = b ,
4.3.2.1 Moment Estimation of Parameters in AR Models
φ = V
© 2012 by Taylor & Francis Group, LLC
−1
b
(4.20)
25 4
M e A suReM en t DAtA M o D eLIn G
where 1 V = ρ1 ρ p −1
ρ 1
1
ρ p − 2
ρ p − 1 ρ φ 1 1 ρ p − 2 , b = , φ = φ ρ p p 1
Therefore, we have σ ε2 = r0 − φ 1 r 1 − − φ p r p = σ e2 (1 − φ 1 ρ 1 − − φ p ρ p ) (4.21) where σ e2 = r 0. 4.3.2.2 Least Squares Estimation of Parameters in AR Models
e p τ Y = ( e p + 1 , … , en ) , X = e n −1
Let
e1 en − p
φ = (φ1 ,…, φ p )τ , ε = ( ε p +1 ,…, εn )τ Then, the AR(p) series can be written as the following linear model: Y = Xφ + ε The least squares estimate of ϕ is φ = ( X τ X )−1 X τ Y Y − X φ Rss Y τ Y − Y τ X ( X τ X )−1 X τ Y σ ε2 = = = n− p n− p n− p
(4.22) (4.23)
Remark 1. If k > p, the moment estimates of parameters in an AR(p) model satisfy
© 2012 by Taylor & Francis Group, LLC
M e t h o D s o F tIM e seRIe s A n A LysIs
1 ρ 1 ρk − 1
ρ 1
1
ρ k − 2
ρ p − 1 ρ p − 2 ρ k − p
255
ρ φ 1 1 = φ p ρ k
or V φ = b in short, where 1 V = ρ1 ρk − 1
ρ 1
1
ρ k − 2
ρ p − 1 φ ρ 1 1 ρ p − 2 , φ = , and b = . φ p ρ k ρ k − p
This is a linear model whose least squares estimates of parameters are
φ = (V τV )−1V τ b It can be seen that, for different k, the estimate φ is different. The key question is how to determine k. 2. It is difficult to compare the two estimation methods introduced above in terms of the superiority of one over the other. Results from such a comparison vary from case to case [8]. 4.3.3 Parameter Estimation of MA(q) Models
Autocovariance functions {rk} of an MA(q) model satisfy the following equations: r0 = σ ε2 (1 + θ12 + + θq2 ) 2 rk = σ ε ( −θ k + θ k + 1θ1 + + θq θq − k ), k = 1, 2,…, q
© 2012 by Taylor & Francis Group, LLC
256
M e A suReM en t DAtA M o D eLIn G
Substituting rk ’s in these equations with rk yields
2 2 2 σ ε (1 + θ 1 + + θ q ) = r 0 2 σ ε ( − θ k + θ k + 1 θ1 + + θq θq − k ) = r k , k = 1, 2,…, q
(4.24)
These are nonlinear equations of θ 1 ,…, θ q . Numerical solutions can be obtained in two ways. 4.3.3.1 Linear Iteration Method
Equation 4.24 can be rewritten as
2 2 2 σ ε = r 0 (1 + θ 1 + + θ q ) q −k θ k = θ j θ k + j − r k σ ε2 j =1
∑
Given the initial values of θ (10 ),…, θ (q0 ), the iteration scheme is carried out as follows. q 2 ( i ) σ ε = r 0 θ 2j (i ) j =1 , i = 0, 1, 2,… q −k (i + 1) θ (ji ) θ (ji+) k − r k σ (εi ) θ k = j =1
∑
∑
The computation stops when the difference between solutions from two iteration steps is negligible, that is, less than a threshold given. The θ’s from the last step are approximate parameter estimates. 4.3.3.2 Newton–Raphson Algorithm
4.24 as
© 2012 by Taylor & Francis Group, LLC
We can also arrange Equation
r 0 = σ ε2 + (θ 1 σ ε )2 + + (θ q σ ε )2 r k = − σ ε (σ ε θ k ) + (σ ε θ k + 1 )(σ ε θ 1 ) + + (σ ε θ q )(σ ε θ q − k ), k = 1, 2,…, q
257
M e t h o D s o F tIM e seRIe s A n A LysIs
Let zk = σ ε θ k , k = 1, 2,…, q , z0 = − σ ε . The above equations can be written in the form of 2 2 2 f 0 (z0 , z1 ,…, zq ) = z0 + z1 + + zq − r 0 = 0 f (z , z ,…, z ) = z z + z z + + z z − r = 0, k q 0 k 1 k +1 q q−k k 0 1 k = 1,…, q Let z = (z 0, z1, . . ., zq )τ , f (z) = (f0, f1, . . ., fq)τ . These equations reduce to f (z) = 0. The iteration formula of Newton–Raphson algorithm is z ( k + 1) = z ( k ) − ∇f (z ( k ) )
−1
f (z ( k ) )
where ∇f is the gradient matrix of f, that is, ∂f 0 ∂z0 ∂f 1 ∇f = ∂z0 ∂f q ∂z0 z0 z 1 = zq
∂f 0 ∂z1 ∂f 1 ∂z1
∂f q
∂z1 z1
z2
∂f 0 ∂zq 2z0 ∂f 1 z1 ∂zq = z ∂f q q ∂zq
zq z0 0 +
z1
z0
2z1
z0 + z2
0
zq zq − 1 z0
and finally we will get
( )
2
σ ε2 = z0( k0 ) ,
© 2012 by Taylor & Francis Group, LLC
θ k = zk( k0 ) ( −z0( k0 ) ) , k = 1, 2,…, q
2zq zq − 1 z0
258
M e A suReM en t DAtA M o D eLIn G
4.3.4 Parameter Estimation of ARMA(p,q) Models
4.3.4.1 Moment
Estimation
ARMA(p,q) series {et} satisfy
Autocovariance
functions
of
an
rq + 1 = φ1rq + φ 2 rq − 1 + + φ p rq − p + 1 rq + 2 = φ1rq + 1 + φ 2 rq + + φ p rq − p rq + p = φ1rq + p − 1 + φ 2 rq + p − 2 + + φ p rq Substituting rk with r k, we have r q + 1 = φ 1 r q + φ 2 r q − 1 + + φ p r q − p + 1 r q + 2 = φ r q + 1 + φ r q + + φ r q − p p 1 2 , r q + p = φ 1 r q + p − 1 + φ 2 r q + p − 2 + + φ p r q
from which a solution of φ = (φ 1 ,…, φ p ) can be obtained. Let wt = εt − θ1εt−1 − . . . − θqεt−q. {wt} is an MA(q) series, Φp(B) et = wt, and
(
rkw = Ewt wt + k = E Φ p ( B )e t Φ p ( B )e t + k p = E − ϕ je t− j j = 0
∑
=
)
p ϕi e t + k − i − i = 0
∑
p
∑φ φ r i
i, j =0
j k + j −i
Therefore, we have w k
r =
© 2012 by Taylor & Francis Group, LLC
p
∑ φ φ r
i, j =0
i
j
k + j −i
, k = 0, 1,…, q , φ 0 = −1.
M e t h o D s o F tIM e seRIe s A n A LysIs
259
On the other hand, since {wt} is an MA(q) series, q r w0 = σ ε2 1 + θ 2j j =1 . q−k w 2 θ j θ j + k , k = 1, 2,…, q r k = σ ε − θ k + j =1
∑
∑
θ j , σ ε2 can be obtained by either linear iteration method or Newton–Raphson algorithm.
Let β = (ϕ1, . . ., ϕp, −θ1, . . ., −θq) , xt = (et−1, . . ., et−p, εt−1, . . ., εt−q) . Then an ARMA(p,q) model can be written as 4.3.4.2 Nonlinear Least Squares Estimation τ
τ
et = xt β + ε t
(4.25)
It seems that there is a linear relationship between et and β. However, since the xt contains unknown εt−k (k = 1, . . ., q) and are determined recursively by ε s = e s − φ1 e s − 1 − − φ p e s − p + θ1 ε s − 1 + + θ q ε s − q as
Equation 4.25 is actually a nonlinear model of β. It can be expressed et = f t (β) + ε t
Let Y = (ep+1, . . ., en)τ, F(β) = ( fp+1 (β), . . ., fn (β))τ, ε = (εp+1, . . ., εn)τ. The equation above can be written as Y = F (β) + ε whose residual sum of squares (RSS) is Q(β) = (Y − F(β))τ (Y − F(β)). From the perspective of optimization theory, the least squares estimation is to gain the minimum of an objective function Q(β). Various iteration algorithms in optimization theory can be applied to obtain the nonlinear least squares estimation. The Gauss–Newton iteration method is used in the following.
© 2012 by Taylor & Francis Group, LLC
260
M e A suReM en t DAtA M o D eLIn G
By the first-order Taylor expansion F(β) F (β) ≈ F (β( 0 ) ) + ∇F (β( 0 ) )(β − β( 0 ) ) where β(0) is an initial value of β. It follows that Q(β) = (Y − F (β))τ (Y − F (β)) ≈ (Y − F (β( 0 ) ) − ∇F (β( 0 ) )(β − β( 0 ) ))τ (Y − F (β( 0 ) ) − ∇F (β( 0 ) )(β − β( 0 ) )). Let
∂Q(β) = 0. Then ∂β −1
β − β( 0 ) = ∇F (β( 0 ) )τ ∇F (β( 0 ) ) ∇F (β( 0 ) )τ (Y − F (β( 0 ) )) The iteration formula is −1
β( k + 1) = β( k ) + ∇F (β( k ) )τ ∇F (β( k ) ) ∇F (β( k ) )τ (Y − F (β( k ) )) (4.26) The iteration steps are as follows. 1. Let ε−s = 0, ∀s > 0. 2. Given an initial value of β(0), which is usually the moment estimate of β, calculate F(β(0)) and ∇F(β(0)). −1
3. β(1) = β( 0 ) + ∇F (β( 0 ) )τ ∇F (β( 0 ) ) ∇F (β( 0 ) )τ (Y − F (β( 0 ) )). 4. If β(1) − β( 0 ) < c , where c is a threshold given in advance, then
β = β(1) , σ ε2 =
Q(β ) n− p−q
and the iteration is over; otherwise, let β(1) ⇒ β( 0 ) and return to step (2). 4.4 Tests of Observational Data from a Time Series
As is pointed out by George E. P. Box et al. (1987), “Essentially, all models are wrong, but some are useful,” there is never a model
© 2012 by Taylor & Francis Group, LLC
2 61
M e t h o D s o F tIM e seRIe s A n A LysIs
that can describe the objective reality absolutely right. Therefore, with the premise of sufficient data, it becomes very necessary to use statistical tests to diagnose the rationality of the model currently used. When we deal with observational data {x1, x2, . . ., xn} from a time series, we usually assume that {xt} satisfies certain assumptions such as normality, stationarity, etc. Hypothesis tests need to be done statistically on whether these assumptions are valid. In this section, tests of time series data are mainly focused on normality, independence, and stationarity. 4.4.1 Normality Test
Normality is generally tested by carrying out the Pearson χ2 goodness of fit [9]. Here, we introduce another method: the skewness and kurtosis test [10]. For observational data {x1, x2, . . ., xn}, four statistics are usually used to characterize the probability density function of the population. 1. Mean: x =
1 n
n
∑x k =1
1 2. Variance: s = n 2
k
n
∑ (x k =1
k
− x )2
3. Standard skewness coefficient: g 1 = 4. Standard kurtosis coefficient: g 2 =
1 6n n 24
n
xk − x s k =1
∑
1 n
3
4 xk − x − 3 s k =1 n
∑
The standard skewness coefficient g 1 reflects the asymmetry of the probability density function of the data series {xt} around the mean value, while the standard kurtosis coefficient g2 shows the kurtosis difference between the probability density function and the standard normal density.
© 2012 by Taylor & Francis Group, LLC
262
M e A suReM en t DAtA M o D eLIn G
Theorem 4.9 If {x1, x2, . . ., xn} is from a normal distribution, then 1. g 1 → N(0,1) in distribution 2. g 2 → N(0,1) in distribution The normality of {x1, x2, . . ., xn} can be tested according to the following steps: 1. Calculate x , s, g 1 , g 2 from {x1, x2, . . ., xn}. 2. Given a significance level α, we get α/2 percentile z α/2 by checking the standard normal distribution table. 3. If |g 1| ≤ z α/2, |g 2|≤ z α/2, then {x1, x2, . . ., xn} is from a normal population at a confidence level of 1 − α; otherwise the data are not from a normal distribution. Remark The skewness–kurtosis test usually requires the sample size n ≥ 20. 4.4.2 Independence Test
The routine independence tests are only applicable to data from static measurements. An independence test is introduced for data from dynamic measurements. Suppose y t is the fitting value or estimate of yt. Let ε t = yt − y t, then the independence of {yt} is inherited from the independence of {ε t }. If {ε t } is an independent normal time series, then ρ1 = ρ2 = . . . = ρk = 0. From Theorem 4.8, we have ( n ρ 1 , …, n ρ k ) → N(0, I k )
Consider a statistic Q =n
© 2012 by Taylor & Francis Group, LLC
k
∑ ρ r =1
2 r
M e t h o D s o F tIM e seRIe s A n A LysIs
263
Testing whether { ε t } is an independent normal series is the same as testing if Q follows a χ2 distribution with a degree of freedom k. In other words, let the null hypothesis be H 0 : { ε t , 1 ≤ t ≤ n} is an independent normal series. Then, given a significance level α, we find χα2 ( k) from the χ2 dis2 tribution table and calculate the value of the statistics Q. If Q ≤ χα ( k ), H0 is accepted and measurement data {yt, 1 ≤ t ≤ n} are independent; otherwise, H0 is rejected and the data are dependent. Remarks 1. The independence test is based on the assumption that { ε t } is a normal series. For a nonnormal series, there are no very efficient methods so far to test its independence. 2. The selection of k is generally difficult. In practical processing, it is generally taken as k ≈ n/10. 3. For the statistic Q = n
k
∑ ρ , Ref. 11 points out that, with the r =1
2 r
null hypothesis H0, the χ2 distribution cannot provide an adequate approximation for the distribution of statistic Q. The value of Q is smaller than it is expected under the χ2 distribution. Therefore, a modified form is provided as k
∑ n − r ρ
Q = n(n + 2)
r =1
1
2 r
>Q
n + 2 > 1 . n−r
4.4.3 Stationarity Test: Reverse Method
A stationarity test method is introduced here based on the reverse number. The principle is that if y1, y2, . . ., yn are stationary, there should be no remarkable difference among the mean values or variance values of its subseries. The steps of the test are as follows [12,13]: 1. Divide {yt, 1 ≤ t ≤ n} into k subseries with equal number of M data points, where kM ≤ n, k > 10 and discard the leftover, that is,
© 2012 by Taylor & Francis Group, LLC
264
M e A suReM en t DAtA M o D eLIn G
y11 y12 y1M ………… y k 1 y k 2 … y kM 2. Calculate the mean and variance of each subseries: 1 yi = M
M
∑ j =1
1 yij , s = M 2 i
M
∑( y j =1
ij
− yi )2 , i = 1, 2, …, k
3. Let 1, aij = 0,
when i < j ,
1, bij = 0,
when i < j , si > s j
and A =
yi > y j
otherwise
otherwise
∑a ,B = ∑b i< j
ij
i< j
ij
Then, A, B are actually the total number of reverse orders in { yi } and {si}. For example, if we divide {yt} into seven subseries, the mean values of which are 2.3, 3.1, 2.3, 2.5, 4.1, 3.8, 3.5 it is straightforward to observe that A = 15. 4. Theoretically, it can be proved that if {yt} is stationary, then for
{
}
a large M (M > 5), { yi : 1 ≤ i ≤ k} , si2 : 1 ≤ i ≤ k should
approximately be independent and have the same distribution, and 1 EA = EB = k( k − 1) 4 Var( A ) = Var( B ) =
© 2012 by Taylor & Francis Group, LLC
1 k( 2k 2 + 3k − 5) 72
M e t h o D s o F tIM e seRIe s A n A LysIs
265
Therefore, the statistics u =
v =
A + 1 2 − 1 4 k( k − 1) k( 2k 2 + 3k − 5) 72 B + 1 2 − 1 4 k( k − 1) k( 2k 2 + 3k − 5) 72
→ N(0, 1)
→ N(0, 1)
The steps of testing the stationarity of {yt, 1 ≤ t ≤ n} can be summarized as follows. 4.4.3.1 Testing the Mean Stationarity
1. Divide {yt, 1 ≤ t ≤ n} into k sections of M data points each; calculate the mean { yi : 1 ≤ i ≤ k }. 2. Calculate A and the statistic u. 3. For a given significance level α, find z α / 2 in the standard normal distribution table. 4. If |u| ≤ z α / 2, then the data series is considered to be stationary at the confidence level of 1 − α, otherwise it is considered to be nonstationary. 4.4.3.2 Testing the Variance Stationarity The steps are similar to the steps above. The only difference is to substitute A with B and u with v.
Remarks 1. When A, B are large, it shows that the mean and variance have a trend of increasing, while when A, B are small, it demonstrates an opposite trend. Therefore, the reverse test method is more sensitive for a time series with monotonic trend. 2. The results of the reverse order test rely heavily on the order of data occurrence. When the data show a symmetrical distribution, the reverse method is inefficient in the sense that the probability of nondetecting is high. Therefore, the usual practice is when {yt} has passed the stationarity test using the
© 2012 by Taylor & Francis Group, LLC
266
M e A suReM en t DAtA M o D eLIn G
reverse order method, we divide the series data into three sections again and exchange the middle section of the data with the last section. This can change a symmetric pattern into a monotone type. The reverse test is performed once again. If the modified series passes the stationarity test too, the original series will be considered as a stationary series, otherwise it is nonstationary. 4.5 Modeling Stationary Time Series
In general, prior to time series modeling, the data should be plotted first so that the sequence is visualized. After that the stationarity test should be carried out. If the time series is stationary, the modeling methods in this section can be applied; otherwise, the series is nonstationary and models in Section 4.6 should be considered. For stationary time series modeling, three aspects are taken into account: model selection, model order determination, and model testing. 4.5.1 Model Selection: Box–Jenkins Method
Theoretically, the unique feature of MA models is that ρk has the property of truncation, while the unique character of AR models is that ϕkk has the property of truncation. Thus, ρk and ϕkk can theoretically discriminate AR, MA, and ARMA models strictly. However, we can only obtain the observational values of {yt} in practice. Hence, only estimates of ρ k , φkk can be obtained. These estimates may not have the strict properties of truncation. We can only take tails of estimates as zero under some certain precision [14]. Using Theorems 4.7 and 4.8, if {yt} is an MA(q) series and when n is large, there is q 1 ρ k → N 0, 1 + 2 ρ m2 n m =1
∑
and ρ k ( k > q ) are independent of each other.
© 2012 by Taylor & Francis Group, LLC
(k > q )
M e t h o D s o F tIM e seRIe s A n A LysIs
267
Therefore, testing whether ρ k is zero with a given confidence level 1 − α reduces to testing whether {ρ k , k = q + 1, …, q + M } fall into the interval of −zα , zα with the probability of 1 − α, where M is 1
q
∑
2i ⋅ zα / 2 , and z α/2 1+ 2 ρ n i =1 is the α/2 percentile of the standard normal distribution. If the above test fails for q = 1, 2, . . ., q0 − 1 but passes for q = q0, then {yt} is considered to be an MA(q0) series. If {yt} fail the test even for a larger q, then calculate φ kk. Similarly, it can also be proved that for an AR(p) model, when k > p and n is large enough, then {φ kk , k > p } are independent of each other, and generally taken as n or n/10, zα =
φ kk → N( 0, 1/n). Therefore, the question of whether φ kk is zero with a given confidence level 1 − α will turn into the issue to test whether {φ kk , k = p + 1, …, p + M } fall into the interval of −zα , zα with the 1 ⋅ zα / 2, z α/2 is the α/2 percentile probability of 1 − α, where zα = n of the standard normal distribution, and M is generally taken as n or n/10. If the above test fails for p = 1, 2, . . ., p0 − 1 but passes for p = p0, then {yt} is preliminarily considered to be an AR(p0) series. If the test shows that neither ρ k nor φ kk has the property of truncation, then {yt} is taken as an ARMA series, but the order of the model cannot be determined yet.
4.5.2 AIC Criterion for Model Order Determination
The Japanese statistician Hirotugu Akaike proposed the AIC criterion in 1974 [15] from the perspective of information theory to describe the applicability and complexity of a model as the order determination criterion, known as Akaike Information Criterion. The general form is: AIC = −2 . (the maximum likelihood function for the estimated model) + 2 . (the number of parameters in the model).
© 2012 by Taylor & Francis Group, LLC
268
M e A suReM en t DAtA M o D eLIn G
4.5.2.1 AIC for AR Models
AR(k) model. Let
Suppose time series {yt} is described by an 2
AIC( k ) = log σ ε ( k ) +
2k , k = 0, 1, 2, . . ., P n
(4.27)
2 where σ ε ( k ) stands for an estimate of σ ε2, which is the variance of the white noise series {εt} in the AR(k) model of {yt}, and P is the upper limit of the order, which is usually set by practical experiences and the preliminary recognition of AR models in the previous section. The p satisfying AIC( p ) = min AIC( k ) can be determined as the 0≤k≤P estimate of the model’s order. A subtle observation of the right side of Equation 4.27 reveals the following fact. The first term shows the fitting degree or approximation accuracy of the model to the observational data, and it will decrease with the increase of k, while the second term reflects the number of parameters and it will increase when k becomes larger. In this sense, the AIC has adequately balanced the two aspects of fitting precision and parameter numbers.
Generalizing results in AIC for AR models provides a criterion for ARMA(p,q) models. Similarly, we have
4.5.2.2 AIC for MA and ARMA Models
2
AIC( k, j ) = log σ ε ( k, j ) +
2( k + j ) , k, j = 0, 1, 2, …, Q (4.28) n
2
where σ ε ( k, j ) represents an estimate of σ ε2 , which is the variance of the white noise series {εt} in the ARMA(p,q) model of {yt}, and Q is the common upper limit of p, q. Take AIC( p, q ) = min AIC( k, j ) 0 ≤ k , j ≤Q
The AIC for MA models is a special case of Equation 4.28 at k = 0 [15,16]. 4.5.3 Model Testing
The AR, MA, and ARMA models for fitting a stationary series {yt} have been established according to the model selection and order
© 2012 by Taylor & Francis Group, LLC
M e t h o D s o F tIM e seRIe s A n A LysIs
269
determination methods described above. The goodness of fit should be tested through practical applications and engineering background. However, model testing should be done using mathematical theory–statistical hypothesis testing before any engineering application. Huge saving in manpower, material and financial resources, as well as research time can be achieved from model testing. 4.5.3.1 AR Models Testing
that is,
If time series {yt} fit an AR(p) model,
yt = φ 1 yt − 1 + + φ p yt − p + ε t then ε t = yt − φ 1 yt − 1 − − φ p yt − p , t = p + 1, p + 2, … , n should be a sample of a white noise series {εt}. We can verify whether { ε t } is an independent sample according to the method of independence test in Section 4.4.2. If { ε t } is an independent sample, then {yt} is an AR(p) series, otherwise {yt} is not an AR(p) series. 4.5.3.2 MA Models Testing
that is,
If time series {yt} fits an MA(q) model,
yt = ε t − θ 1 ε t − 1 − − θ q ε t − q then ε t = yt + θ 1 ε t − 1 + + θ q ε t − q , t = q + 1, …, n should be a sample of a white noise series {εt}, where we can set the initial values ε 0 = ε 1 = = ε q = 0. Therefore, we can determine if {yt} is an MA(q) series based on the independent test of { ε t , q + 1 ≤ t ≤ n}. The testing method is the same as those for AR and MA models. Here we take 4.5.3.3 ARMA Models Testing
ε t = yt − φ 1 yt − 1 − − φ p yt − p + θ 1 ε t − 1 + + θ q ε t − q , t = q + 1, … , n
© 2012 by Taylor & Francis Group, LLC
270
M e A suReM en t DAtA M o D eLIn G
We verify whether {yt} is an ARMA(p, q) series based on the independent test of { ε t , q + 1 ≤ t ≤ n} . Remark If fitting models fail the corresponding tests, other methods of data analysis should be carried out. These include the piecewise fitting, nonstationary model fitting, etc. The modeling steps of a stationary time series is given in Figure 4.4. 4.6 Nonstationary Time Series 4.6.1 Nonstationarity of Time Series
The time series {yt} obtained from actual data of dynamic measurements is often nonstationary. Such a nonstationarity is mainly reflected in the nonstationarity of mean or variance. The variance nonstationarity are mainly due to status changing or precision scope exceeding of the measuring equipment and apparatus. In such a situation, the prime approaches are
4.6.1.1 Processing Variance Nonstationarity
1. Divide the observational data into a number of sections with stationary variances. Input stationary observational data y1, y2, …, yn
Stationarity test
^ ^ Model selection with {ρ k}, {ϕkk}
Model parameters estimation, AIC order determination
Model testing
Figure 4.4
The modeling steps of a stationary time series.
© 2012 by Taylor & Francis Group, LLC
M e t h o D s o F tIM e seRIe s A n A LysIs
2 71
2. Transform the series into a stationary one. The generally used method is power transformation, that is, zt = ytβ , β ≥ 0; when β = 0, define ytβ = log yt . 4.6.1.2 Processing Mean Nonstationarity
a mean nonstationary series:
The following model describes
yt = f (t ) + et
(4.29)
where et is a zero-mean stationary series that can be expressed by an ARMA model. Eyt = f(t) represents the mean value change with time and it can be approximated by polynomial, exponential, or trigonometric functions. In this section, we will focus on the discussion of dealing with time series with stationary variances and nonstationary mean values. Usually there are two ways. One is to remove the tendency term f(t) by using the difference method. This approach can generally be used based on the fact that f(t) can be expressed with lower-order polynomial. Thus, the overall characteristics of f(t) + et can be obtained; the other is to establish models for f(t) and et separately and estimate model parameters simultaneously. The statistical characteristics of et and the estimate of f(t) can be obtained, respectively. 4.6.2 ARIMA Model
4.6.2.1 Definition of ARIMA Model
Definition 4.7 Suppose yt is a nonstationary time series; if there exists a positive integer d, such that ∇ d y t = wt , t ≥ d + 1
(4.30)
∑
d
d d ( −1)k C dk B k where wt is an ARMA(p, q) series, ∇ = (1 − B ) = k=0 is a d-order difference operator, then yt is called a d-order AutoRegressiveIntegrated-Moving-Average model, denoted as ARIMA(p, d, q) [18]. Take ARIMA(p,2,q) as an example. Assume that yt = β0 + β1t + et, that is, with a linear trend term, where et is a stationary zero-mean
© 2012 by Taylor & Francis Group, LLC
272
M e A suReM en t DAtA M o D eLIn G
series; then ∇yt = yt − yt − 1 = β1t + et − β1 (t − 1) − et − 1 = β1 + et − et − 1 , ∇ 2 yt = ∇yt − ∇yt − 1 = (β1 + et − et − 1 ) − (β1 + et − 1 − et − 2 ) ∆
= e t − 2e t − 1 + e t − 2 = wt Therefore, y t = y1 + ( y 2 − y1 ) + + ( y t − y t − 1 ) = y1 +
t
∑ ∇y j =2
j
On the other hand, ∇y t = ∇y 2 + ( ∇y 3 − ∇y 2 ) + + ( ∇y t − ∇y t − 1 ) = ∇y 2 + = ∇y 2 +
t
∑w k=3
t
∑∇ y k=3
2
k
k
Thus, y t = y1 + ∇y 2 + ∇y 2 + j =3 t
∑
= y1 + (t − 2)∇y 2 +
= y1 + (t − 2)∇y 2 + = y1 + (t − 2)∇y 2 +
t
∇ 2 yk k=3 j
∑ j
∑∑w j =3 k=3 t
k
t
∑∑w k=3 j =k
k
t
∑ (t − k + 1)w k=3
k
The reason for the model to be called an integrated ARMA model is that yt can be represented by the summation operations of the initial values y1, y2 and the differenced data wt(t ≥ 3).
© 2012 by Taylor & Francis Group, LLC
M e t h o D s o F tIM e seRIe s A n A LysIs
2 73
By the definition and mathematical induction method, there is yt =
d −1
∑C i =0
i t − d +i −1
i
∇ yd +
t −d
∑C j =1
d −1 t − j −1
wd + j , t ≥ d + 1 (4.31)
Hence, ARIMA(p, d, q) model can be represented by the summation operations of the initial values y1, y2, . . ., yd of yt and the differenced data wt(t ≥ d + 1). It can be seen from the above discussion that ARIMA models generally fit with the time series data with polynomial trend term. In general, if the trend item is an m-order polynomial, then d = m + 1. In general, there are three ways to find terms of nonstationary trends in time series data: (1) by studying engineering or physical background of the data, (2) by plotting of data points, and (3) by using statistical tests. If data {yt} consist of trend terms, the ARIMA model can be used to fit the data. The concrete steps are:
4.6.2.2 ARIMA Model Fitting for Time Series Data
1. Compute ∇i yt, i = 1, 2, . . ., and test the stationarity of ∇i yt, usually by the statistical test, i = 1, 2, . . ..
2. If ∇ d yt is stationary when i = d , then use AR, MA, or ARMA to model ∇ d yt , and thus obtain ARIMA( p , d , q ). 3. Use Equation 4.31 to obtain the fitted value y t of yt. 4.6.3 RARMA Model
The difference method in ARIMA models is a popular one in modern data processing, but it has some disadvantages. 1. The difference method breaks the original noise characteristics, so that the statistical properties of the noise become complicated. For instance, if we assume {et} in an ARIMA model is an AR (1) series, that is, et = ϕ1et−1 + εt, then after the firstorder difference there exists ∆
zt = et − et − 1 = (φ1et − 1 + ε t ) − (φ1et − 2 + ε t − 1 ) = φ1zt − 1 + ε t − ε t − 1
© 2012 by Taylor & Francis Group, LLC
2 74
M e A suReM en t DAtA M o D eLIn G
which is an ARMA(1, 1) series; after the second-order difference, it follows that ∆
wt = zt − zt − 1 = (φ1zt − 1 + ε t − ε t − 1 ) − (φ1zt − 2 + ε t − 1 − ε t − 2 ) = φ1wt − 1 + ε t − 2ε t − 1 + ε t − 2 which is an ARMA(1, 2) series. Generally, after d-order difference, wt is an ARMA(1, d) series. 2. Since the statistical characteristics obtained after the processing of ARIMA models are not the same as the original observation noise statistics, we cannot obtain the noise features of the measurement system. For example, for an ARIMA (p, 2, q) model, wt = ∇ 2 y t = e t − 2e t − 1 + e t − 1 = ∇ 2 e t Similar to the expression of yt above, we have et = e1 + (t − 2)(e 2 − e1 ) +
t
∑ (t − k + 1)w k=3
k
If the initial values of e1, e 2 are known, then the fitted data e t of et can be obtained. However, et is usually unobservable and we cannot obtain the values of e1, e 2. In practice, it is usually assumed that e 1 = e 2 = 0 . Therefore, e t =
t
∑ (t − k + 1)w k=3
k
et − e t = e1 + (t − 2)(e 2 − e1 ) = (3 − t )e1 + (t − 2)e 2 and Var(et − e t ) = (3 − t )2 Var(e1 ) + (t − 2) 2 Var(e 2 ) + 2(3 − t )(t − 2)COV (e1 , e 2 )
© 2012 by Taylor & Francis Group, LLC
M e t h o D s o F tIM e seRIe s A n A LysIs
275
= ( 2r0 − 2r1 )t 2 − (10r0 − 2r1 )t + 13r0 − 12r1
(
→ + ∞(t → +∞) since r0 ≥ r1
)
Thus, the model of noise et in the original observed data cannot be obtained by the difference method. 3. Some functions can be accurately approximated by polynomials, but their derivatives cannot be well approximated by lower-order polynomials. Higher-order differences of such functions are very different from those of polynomials. When Eyt = μt is such a function, the residual data obtained by the difference method cannot be considered to be stationary. Therefore, difference method is mainly applied in fitting overall data yt. When it comes to studying features of μt, et, the difference method does not work well. A good way of dealing with μt, et is to establish a RARMA model. In practice, μt is usually a continuously differentiable function of t. Based on functional approximation theory, μt can represented by µt =
N
∑ β ψ (t ) j =1
j
j
where {ψ1(t), . . ., ψN(t)} is a set of linearly independent basis functions; then N = β j ψ j (t ) + et y t j =1 e isa zero-mean stationary ARMA model t
∑
This is a linear regression model with ARMA noise and is called a RARMA (regression ARMA) model. Specially, when et is an AR or MA series, such a model is called a RAR or RMA model.
© 2012 by Taylor & Francis Group, LLC
276
M e A suReM en t DAtA M o D eLIn G
4.6.4 PAR Model
Particularly, in a RARMA and et is a zero-mean AR(p) series, the
4.6.4.1 Model and Parameter Estimation
model, when ψj(t) = t RARMA model is
j−1
r βi t i + e t yt = i =0 Φ ( e ) = ε t p t
∑
where εt is a zero-mean Gaussian white noise series, Var( ε t ) = σ ε2 . The model is called a PAR (polynomial AR) model [19]. Let β = (β0, β1, . . ., βr)τ, ϕ0 = −1. Then r
∑ β Φ (B)t
Φ p ( B ) yt =
i =0
i
p
i
+ Φ p ( B )et =
p
r
∑ β ∑ (−φ )(t − j ) i =0
i
i
j
j =0
+ εt
βi t i + ( −φ j ) Cil t i − l ( − j )l + ε t l =0 i =0 j =1 r i p i l i − l βi t + Ci t ( −φ j )( − j )l + ε t = j = 1 i =0 l =0 r i Cil t i − l bl + ε t βi = l =0 i =0
=
p
r
i
∑
∑
∑
∑
∑
∑
∑ ∑
where bl =
p
∑ (−φ )(− j ) , j =1
j
l
1 ≤ l ≤ r , b0 = 1 +
p
∑ ( − φ ). j =1
j
Let C 00 b0 0 A =
© 2012 by Taylor & Francis Group, LLC
C11b1
C10 b0
C 22 b2 C 21b1
Crr−−11br − 1
Crr−−12br − 2 Cr0− 1b0
Crr br Crr − 1br − 1 =∆ A (φ) Cr1b1 Cr0 b0
M e t h o D s o F tIM e seRIe s A n A LysIs
277
and ∆
∆
Aβ = a = (a0 , a1 , …, ar )τ , T = (1, t , …, t r )τ
(4.32)
Then Φ p ( B ) yt = ( Aβ)τ T + ε t = a0 + a1t + + ar t r + ε t which is a polynomial linear regression model of a. Let y t = Φ p ( B ) yt , y = ( y p + 1 , …, y n )τ , 1 X = 1
p+1
n
( p + 1)r nr
and ε = (εp+1, . . ., εn)τ. Then y = Xa + ε which is a linear regression model of a. When n − p ≥ r + 1, X τ X is positive definite. The least squares estimate of a is a = ( X τ X )−1 X τ y and the residual sum of squares is RSS = y τ ( I − H ) y , H = X ( X τ X )−1 X τ Let y p M = y n −1
© 2012 by Taylor & Francis Group, LLC
y1 , yn − p
y φ p +1 1 y = , φ = y n φp
2 78
M e A suReM en t DAtA M o D eLIn G
Then y = y − M φ and RSS = RSS(φ) = ( y − M φ)τ ( I − H )( y − M φ)
(4.33)
In practice, X, M, y are known. By solving the extreme value problem minRSS(φ), we have φ
M τ ( I − H )M φ = M τ ( I − H ) y By the theory of linear equations, a sufficient condition for the equation to have a unique solution is that span(X) and span(M) are linearly independent. Since span(M) is a linear subspace spanned by random vectors, span(X) is linearly independent of span(M). The equation has a unique solution: φ = ( M τ ( I − H )M ) −1M τ ( I − H ) y Therefore, a = ( X τ X )−1 X τ ( y − M φ ) β = A −1 (φ ) a 2
σ ε = RSS(φ )/(n − p − r ) Note that the procedure of estimating φ given above actually involves getting minimum values by least squares residuals twice. Therefore, the estimate φ is called a two-step least squares estimate of ϕ. 4.6.4.2 PAR Model Fitting
reveals that
A detailed analysis of Equation 4.33
RSS = RSS(φ, p, r )
© 2012 by Taylor & Francis Group, LLC
M e t h o D s o F tIM e seRIe s A n A LysIs
2 79
Therefore, the criterion of AIC can be used to determine the order of r, p. Let 2
AIC(r , p ) = log σ ε +
2(r + p ) n
2
where σ ε = RSS(φ , r , p )/(n − r − p ) and AIC(r , p ) = min AIC(r , p ) 0 ≤ r ≤ r0 0 ≤ p ≤ p0
In fact, φ can also be obtained simultaneously in the procedure of determining orders, which provides a, β . Thus, the fitting model for {yt} is r β i t i + et y t = i =0 et = φ 1 et − 1 + + φ p et − p + ε t
∑
Let ε t = yt −
r
∑ β t i =0
i
i
− φ 1 et − 1 − − φ p et − p
Then { ε t } should be a sample from white noise series if the above PAR model is correct. Hence, according to the independence test method in Section 4.4.2, if the test is passed, then the fitting model is accurate. Note that in the model parameter estimation performed previously, there exists
4.6.4.3 Further Discussions
yt =
© 2012 by Taylor & Francis Group, LLC
r
∑β t j =0
j
j
+ et
280
M e A suReM en t DAtA M o D eLIn G
and Φ p ( B ) yt =
r
∑a t j =0
j
j
+ εt
The smooth conduct of the parameter estimation of the PAR model is mainly due to the totally same form of the two equations. Generally, if yt = x(t )β + et , Φ p ( B ) yt = z(t )α + ε t where x(t) = (x 1(t), . . ., xr(t))τ, z(t) = (z1(t), . . ., zs(t))τ. When (x(p + 1), . . ., x(n))r×(n−p) is a full column rank matrix, and (z(p + 1), . . ., z(n))s×(n−p) is also a full column rank matrix, then the signal x(t)β is called a form-keeping type signal under the effect of Φp(B), or autoregressive form-keeping type signal. Specifically, if z(t) = x(t), then it is called a completely form-keeping type signal [20]. A few examples of form-keeping signals are given as follows. Suppose x(t)β = β0 + β1t + ⋅ ⋅ ⋅ + βr t r; from the previous discussion we have 4.6.4.3.1 Polynomial Signals
Φ p ( B )x(t )β = a0 + a1t + + ar t r that is, Φ p ( B )x(t )β = x(t )a where a = A(ϕ)β. Therefore, x(t)β is a (completely) form-keeping signal. 4.6.4.3.2 Trigonometric Function Signals
1. Cosine signal: Suppose x(t )β = β1 cos ω1t + β 2 cos ω 2 t + + βr cos ω r t (ωi ≠ ω j , i ≠ j )
© 2012 by Taylor & Francis Group, LLC
2 81
M e t h o D s o F tIM e seRIe s A n A LysIs
Then β k Φ p ( B )cos ω k t = βk 1 − Φ p ( B )x(t )β = k =1 k =1 r
r
∑
∑
p
∑ i =1
φi B i cos ω k t
β k cos ω k t − φi cos ω k (t − i ) k =1 i =1 p r β k 1 − φi cos ω k i cos ω k t = i =1 k =1 p + βk − φi sin ω k i sin ω k t = z(t )α i = 1
=
p
r
∑
∑
∑
∑
∑
where z(t ) = (cos ω1t , sin ω1t , cos ω 2 t , sin ω 2 t , …, cos ω r t , sin ω r t ) p −β1 φi cos ω1i , −β1 i =0 α = α(φ, β) = p φi cos ω r i , −βr −βr i =0
φi sin ω1i ,…, i =1 p φi sin ω r i i =1 p
∑
∑
∑
∑
τ
Since the components of z(t) are trigonometric basis, then the design matrix composed by it is full column rank. Therefore, cosine signals are form-keeping signals. 2. Sinusoidal signal: Suppose x(t )β = β1 sin ω1t + β 2 sin ω 2 t + + βr sin ω r t (ωi ≠ ω j , i ≠ j ) Similarly to (1), there is Φ p ( B )x(t )β = z(t )α where z(t) is the same as in (1), and
© 2012 by Taylor & Francis Group, LLC
282
M e A suReM en t DAtA M o D eLIn G
β1 α = α(φ, β) = βr
p
∑ φ sin ω i, i
i =1
1
p
∑ φ sin ω i, i
i =1
r
− β1
p
∑ i =0
− βr
p
∑ i =0
φi cos ω1i ,…, φi cos ω r i
τ
Therefore, sinusoidal signals are also form-keeping signals. 3. Combinational signals: Suppose x(t )β =
∑
r k =1
(β 2 k − 1 cos ω k t + β 2 k sin ω k t ) (ωi ≠ ω j , i ≠ j )
By straightforward calculation, we have Φ p ( B )x(t )β = x(t )α where p p −β1 φi cos ω1i + β 2 φi sin ω1i , …, i =0 i =1 α = α(φ, β) = p p φi cos ω r i + β 2r φi sin ω r i −β 2r − 1 i =0 i =1
∑
∑
∑
τ
∑
Therefore, combinational signals of this kind are (completely) form-keeping signals. 4. The linear combination signals formed by the three types of signals listed above are also form-keeping signals. With similar method as the one used in PAR models, the estimates of ϕ, β in form-keeping signals can also be given. 4.6.5 Parameter Estimation of RAR Model
We discussed earlier about the parameter estimation problem of several special RAR models. Now we present the parameter estimation for general RAR models.
© 2012 by Taylor & Francis Group, LLC
283
M e t h o D s o F tIM e seRIe s A n A LysIs
Suppose r β j ψ j (t ) + et yt = , t = 1, 2, …, n j =1 Φ ( B )e = ε t t p
∑
and y = ( y 1 , … , yn ) τ , e = ( e1 , … , en ) τ , ε = ( ε 1 , … , ε n ) τ X (1) ψ (1) 1 X = = … X (n) ψ (n) 1
ψ r (1) ψ r (n)
Then the RAR model can be expressed as Y = X β + e 2 Φ p ( B )e = ε, Eε = 0, COV ( ε) = σ I
(4.34)
Before estimating the parameters of (β, ϕ), we need to prove the following lemma. lemma 4.1 Suppose ϕ is a set of real number parameters, and 1 −
p
∑φ j =1
j
≠ 0, f (t )
is a real value continuous function, which is not identically zero. Then, there is
Φ p ( B ) f (t ) = f (t ) − φ1 f (t − 1) − − φ p f (t − p ) ≠ 0
© 2012 by Taylor & Francis Group, LLC
28 4
M e A suReM en t DAtA M o D eLIn G
Proof (proof by contradiction)
p
If Φp(B)f(t) = 0, that is,
∑φ j =0
f (t − j ) = 0 , then because it is a
j
p-order difference equation, f(t) must be a nonzero algebraic polynomial whose order is not more than p − 1. Therefore, we can assume that f (t ) =
k
∑β t j
j =1
j −1
, k ≤ p, and β1 ≠ 0
It follows that Φ p ( B ) f (t ) = =
k
∑α t j =1
j
j −1
=
p
∑ β Φ (B)t j =1
j
p
j −1
βj − φi ( t − i ) j − 1 i = 0 j =1 p
∑
p
∑
Comparing the constant term of two polynomials above, we have p α 1 = β1 − φi = β 1 1 − i =0
∑
p
∑ i =1
φi ≠ 0
Thus, Φ p ( B ) f (t ) ≠ 0 which contradicts the assumption that Φp (B)f(t) = 0. Let Φ (B) y Φ ( B )e p p +1 p p +1 Yφ = , eφ = Φ (B) y Φ ( B )e n n p p
© 2012 by Taylor & Francis Group, LLC
M e t h o D s o F tIM e seRIe s A n A LysIs
285
Φ ( B ) X ( p + 1) p Xφ = Φ ( B ) X (n) p X ( p + 1) − φ X ( p ) − − φ X (1) 1 p = X (n) − φ X (n − 1) − − φ X (n − p ) 1 p (n − p ) × r where X(p) = (ψ1(p), . . ., ψr(p)). Then, from the original model Y = X β + e Φ p ( B )e = ε we have Yϕ = X ϕ β + eϕ 2 Eeϕ = 0, COV (eϕ ) = σ I n − p
(4.35)
where Xϕ is a matrix of full column rank. This is due to the fact that ∀β ≠ 0,
f (t ) =
r
∑ β ψ (t ) ≠ 0 j
j =1
j
and using Lemma 4.1 it can be seen that Φp(B)f(t) ≠ 0, that is, r
r
∑ β Φ ( B )ψ ( t ) = ∑ β X j =1
j
p
j
j =1
j
( j) φ
≠0
where X φ( j ) is the jth column of Xϕ. Theorem 4.10 Suppose (β, ϕ) is the true value of the parameters in model 4.35, (α, ρ) is a set of constant vector that has the same dimension as (β, ϕ), and
© 2012 by Taylor & Francis Group, LLC
286
M e A suReM en t DAtA M o D eLIn G
p
∑φ j =0
j
≠ 0,
p
∑ρ j =0
j
≠ 0(ρ0 = φ0 = −1), β − α
2
+ φ−ρ
2
>0
then there holds 2
E Yρ − X ρα
> E Yϕ − X ϕ β
2
Proof From Lemma 4.1 and the conditions, Xϕ , Xρ are matrices of full column rank and e ρ is a zero-mean random vector. E Yρ − X ρα
2
= E Yρ − X ρβ + X ρ (β − α )
2
= E eρ + X ρ (β − α )
2
= X ρ (β − α ) + E eϕ + (eρ − eϕ ) = X ρ (β − α ) + E εi + i = p +1 2
2
= X ρ (β − α ) +
n
∑ n
2
(ϕ j − ρ j )ei − j j =1 p
∑
p
n
∑ Eε + ∑ ∑ (ϕ 2 i
i = p +1
2
i = p +1 j , k =1
j
2
− ρ j )(ϕ j − ρ j )r k − j
2
= X ρ (β − α ) + (n − p )σ ε2 + (n − p )(φ − ρ) τ R(φ − ρ) As long as α = β and ρ = ϕ are not true at the same time, we have 2
X ρ (β − α ) + (n − p )(φ − ρ)τ R(φ − ρ) > 0 Because E Yϕ − X ϕ β
2
= E eϕ
E Yρ − X ρα
© 2012 by Taylor & Francis Group, LLC
2
2
= (n − p )σ ε2
> E Yϕ − X ϕ β
2
M e t h o D s o F tIM e seRIe s A n A LysIs
287
Based on this theorem, the problem of parameter estimation in model 4.35 becomes the extreme value problem of E Yϕ − X ϕ β
2
= min
To solve the above minimum value problem, the two-step least squares method can be applied as Yφ − X φ ( X φτ X φ )−1 X φτ Yφ
2
= min ⇒ φ *
β * = ( X φτ* X φ* )−1 X φτ* Y φ* 4.6.6 Parameter Estimation of RMA Model
Suppose Y = X β + e e = Θq ( B )ε Then Ee = 0, COV(e) = K = (σij)n×n, where 0, σ ij = r , i − j
i− j >q
i− j ≤q
If K is known, from the linear regression theory (see Section 3.7), βLS = (X τ K −1 X)−1 X τ K −1 Y is the uniformly minimum variance linear unbiased estimate of β. Therefore, if K as the estimate of K can be
−1
−1
obtained, then β = ( X τ K X )−1 X τ K Y can be used as the estimate of β. While estimating K, the only thing to do is to give the estimate of r = (r 0, r 1, …, rq)τ . Thus, we should first give the estimate of r. Let H = X ( X τ X )−1 X τ , I − H = A = (aij ) ξ = AY = ( I − H )( X β + e ) = ( I − X ( X τ X )−1 X τ ) X β + ( I − H )e = ( I − H )e
© 2012 by Taylor & Francis Group, LLC
288
M e A suReM en t DAtA M o D eLIn G
It can be seen that the value of ξ can be given from the observational data and X, and it is also irrelevant with ξ or β. The following will estimate r based on the data of ξ. Let s = (n + 1)( k − 1) −
k ( k − 1) + (l − k + 1), 1 ≤ k ≤ l ≤ n 2
g s = ξ k ξ l , g = g 1 , g 2 ,…, g n(n + 1)
τ
2
zsj =
n− j +1
∑ (a a i =1
ki l ,i + j − 1
s = 1, 2, …,
+ ak ,i + j − 1ali )
n(n + 1) , 2
j = 1, 2, …, q , Z = (zsj )
It can be verified that Eg s =
q +1
∑z r j =1
sj j − 1
Thus, we have g = Zr + η, Eη = 0 It can also be proved that, if rank(X) = N, n ≥ N + q + 1, r = (Z τZ )−1 Z τ g is a linear unbiased estimator of r. 4.6.7 Parameter Estimation of RARMA Model
Suppose [21] Y = X β + e , t = 1, 2, …, n, X = X n × N , rank( X ) = N Φ p ( B )et = Θq ( B )ε t COV(e ) = K = (r i − j )n × n , r = (r0 , r1 , …, rn − 1 )τ
© 2012 by Taylor & Francis Group, LLC
M e t h o D s o F tIM e seRIe s A n A LysIs
289
As in the derivation of the RMA model, we have g = Zr + η, Eη = 0 However, Z = Z(n(n+1)/2)×n and Z τZ is a singular matrix. We cannot use the above equations to estimate r directly. It can be proved that if m < n − N is large enough (e.g., m ≥ p + q) such that n
∑zr
j =m +1
j j −1
2
≈0
we have g = Z p r p + η, Eη = 0 where Zp = (z1, . . ., zm), rp = (r0, . . ., rm−1)τ, and Z τp Z p is a positive definite matrix. Therefore, r p = (Z τp Z p )−1 Z τp g (when m < n − N is large enough) Note that, for an ARMA(p, q) model, r j = φ1r j − 1 + + φ p r j − p
( j > q)
Let j = q + 1, . . ., q + p, then from r q + 1 , …, r q + p , we can obtain φ1 , … , φ p . Again, from rj = ϕ1 rj−1 + . . . + ϕp rj−p, j = m + 1, . . ., n − 1, we have
j = m + 1, …, n − 1
−1
r j = φ1 r j − 1 + + φ p r j − p ,
−1
Therefore, K = (r i − j ) , and β = ( X τ K X )−1 X τ K
© 2012 by Taylor & Francis Group, LLC
Y.
290
M e A suReM en t DAtA M o D eLIn G
4.7 Mathematical Modeling of CW Radar Measurement Noise
Continuous wave (CW) radar is the main equipment for spacecraft tracking and measurement. The measurement principle can be found in Chapter 6. From the analysis of Chapter 6, we know that the measurement data yt from the flight vehicle tracked by a CW radar satisfy the following model: y t = s( t ) + e t where et is the measurement noise, and the true signal s(t) can be approximated precisely by a quadratic polynomial; then there is 3
yt =
∑β t j =1
j
j −1
+ et , t1 ≤ t ≤ t 2
Our mission here is to analyze the statistical characteristics of the unobservable noise {et, t = 1, . . ., n} based on the observational data {yt, t = 1, . . ., n} [22–24]. According to the discussions of the previous sections, the flow of CW radar measurement noise mathematical modeling can be carried out as shown in Figure 4.5. Based on three sets of CW radar data (I, II, and III), Tables 4.2 through 4.4 illustrate the results of modeling and analysis according to the above flow. From Section 4.4.3, we know that if there is still tendency in data, then |u| ≥ 2 (significance level α = 0.05). The u values in Table 4.2 show that the residuals are stationary. In order to make the modeling flow more clear, we take the S direction of data set I as an example. Figure 4.6 illustrates the autocorrelation function (upper figure) and the partial correlation function The figure tells us that its autocorrelation func(lower figure) of ∆S. tion has drag property, while the partial correlation function is truncated after two steps. Therefore, the data satisfy an AR(2) model. Figures 4.7 and 4.8 show the random error of the measurement data in the S direction and the residual ε(t) after the data fitted by a PAR model. According to the independence test in Section 4.4.2, we can easily see this as an independent process; furthermore, Figure 4.9 presents the probability density histogram of the data, which forms a
© 2012 by Taylor & Francis Group, LLC
M e t h o D s o F tIM e seRIe s A n A LysIs
2 91
Measurement data after pre-processing and systematic error elimination
Quadratic polynomial fitting
Stationarity test of the fitting residuals
Preliminary recognition of random error models
AR model
MA model
ARMA model
PAR model of the measurement data and parameter estimation
RMA model of the measurement data and parameter estimation
RARMA model of the measurement data and parameter estimation
Figure 4.5
The mathematical modeling flow of the CW radar measurement noise.
Table 4.2 stationarity Test of the Fitting Residuals VALUE oF U
ΔS
ΔP
ΔQ
∆S
∆P
∆Q
I II III
−0.45 0.56 0.027
0.17 0.31 −0.21
0.34 0.088 −0.37
0.089 0.093 0.48
0.091 0.44 −0.38
0.40 0.09 0.26
Table 4.3 Error models
I II III
© 2012 by Taylor & Francis Group, LLC
ΔS
ΔP
ΔQ
∆S
∆P
∆Q
AR(1) AR(1) AR(1)
AR(1) AR(1) AR(1)
AR(2) AR(1) AR(1)
AR(2) AR(2) AR(2)
AR(2) AR(2) AR(2)
AR(2) AR(2) AR(2)
292
M e A suReM en t DAtA M o D eLIn G
Table 4.4 Parameter Estimation of the Error models ΔP
ΔQ
∆S
∆P
∆Q
I
φ1 = 0.96
φ1 = 0.94
II
φ1 = 0.94
φ1 = 0.65
φ1 = 1.14 φ2 = −0.36 φ1 = 0.67
III
φ1 = 0.75
φ1 = 0.57
φ1 = 0.54
φ1 = 1.46 φ2 = −0.67 φ1 = 1.17 φ2 = −0.48 φ1 = 1.41 φ2 = −0.60
φ1 = 1.30 φ2 = −0.56 φ1 = 1.34 φ2 = −0.42 φ1 = 1.51 φ2 = −0.64
φ1 = 1.23 φ2 = −0.61 φ1 = 1.47 φ2 = −0.85 φ1 = 1.39 φ2 = −0.68
Autocorrelation function
ΔS
1
Partial correlation function
–1
0
20
k
1
–1
0
20
k
Figure 4.6 The autocorrelation function (upper figure) and the partial correlation function (lower figure) of ∆S.
εt
0.04
–0.04
0
Figure 4.7
k
The random error in the S direction.
© 2012 by Taylor & Francis Group, LLC
200
293
M e t h o D s o F tIM e seRIe s A n A LysIs
εt
0.02
–0.02
0
Figure 4.8
198
k
The residual ε(t) in the AR(2) model in the S direction.
normal process, and this can be also seen by the test conclusion from Section 4.4.1. In summary, the measurement noise of CW radars can be well described by zero-mean stationary normal AR(1) or AR(2) models. EXERciSE 4
1. Suppose that et is an AR(2) process, and take φ1 and φ2 as a. φ1 = 0.1950, φ2 = −0.95 b. φ1 = 1.5955, φ2 = −0.95 c. φ1 = 1.9114, φ2 = −0.95
50
40
30
20
10
0
–0.0088 –0.0067 –0.0046 –0.0025 –0.0004
Figure 4.9
0.0017
0.0037
0.0058
The histogram of ε(t) in the AR(2) model in the S direction.
© 2012 by Taylor & Francis Group, LLC
0.0079
0.0100
0.0121
294
M e A suReM en t DAtA M o D eLIn G
Answer the following questions based on each set of the above parameters: a. Calculate the eigenvalues of the two-dimensional correlation matrix of et. b. What is the variance of εt in order to make the variance of et to be 1? 2. Depict the stationary condition for an AR(2) process with the ranges of ρ1, ρ2, and draw the stationary region figure with ρ1, ρ2 as the coordinates. 3. Calculate the corresponding ρr, r = 0,1, . . ., 10 when the φ1, φ2 in an AR(2) model take the following three sets of values: a. φ1 = 0.6, φ2 = −0.2 b. φ1 = −0.6, φ2 = −0.2 c. φ1 = −0.8, φ2 = 0.6 4. Suppose there is an AR(2) process et = et−1 − 0.5et−2 + εt, where εt is a zero-mean white noise with the variance 0.5. a. Write the corresponding Yule–Walker equation and solve ρ1 and ρ2. b. Calculate the variance of et. 5. Suppose there is an ARMA(2,1) process et − φ1et−1 − φ2et−2 = εt − θ1εt−1. Represent σ e2 and ρk, k ≥ 1 with the parameters. 6. Derive the stationary conditions of an ARMA(3, m) model expressed with φ1, φ2, φ3. 7. Suppose the time series yt is composed of an AR(p) process et and a white noise process ηt, that is, yt = et + ηt e t = ϕ 1e t − 1 + + ϕ p e t − p + ε t where the white noises εt and ηt are independent. Prove that yt can be expressed by the following ARMA(p, p) model yt −
© 2012 by Taylor & Francis Group, LLC
p
∑ϕ y i =1
i
t −i
=
p
∑ψ ζ i =1
i t −i
+ st
M e t h o D s o F tIM e seRIe s A n A LysIs
295
where st is a white noise; determine the MA parameter ψi and the variance of st. 8. Derive the partial correlation expression φkk of the MA(1) model et = εt + θεt−1 (−1 < θ < 1), and verify that it is attenuated. 9. For a stationary series {xt}, suppose that μt = Ext, and let 1 x = n
n
∑x t =1
t
Calculate Εx and Var(x ). 10. Prove the property (1) of rk and rk∗ in Section 4.3.1. 11. Derive the moment estimation expressions of the parameters and the variance in the MA(1) model. 12. Derive the moment estimation expressions of the parameters and the variance in the ARMA(1,1) model. 13. Prove that in the stationarity test in Section 4.4.3, the following equations hold: EA =
VarA =
1 k( k − 1) 4
1 k( 2k 2 + 3k − 5) 72
14. Suppose that the fractionation workshop in a chemical plant has the production quantity per day in a certain period of time as follows: 3.23 3.57 2.44 3.88 5.38 3.26 1.89 1.58 2.29 1.31 0.99 2.02 1.00 2.31 2.46 4.66 5.27 3.78 2.74 3.39 3.51 3.53 4.03 5.15 5.76 5.38 4.76 3.30 3.58 4.25 3.75 4.15 5.25 5.18 4.32 5.65 4.94 5.00 5.62 5.30 4.07 2.32 3.62 4.42 3.32 1.02 2.79 4.58 4.74 3.99 3.22 3.21 3.95 3.31 Establish a model and analyze with the Box–Jenkins method. 15. Suppose yt = a 0 + a1t + et, where et is a stationary series. Prove that the sample autocorrelation function of {yt, t = 1, 2, . . ., n} is a constant sequence close to 1.
© 2012 by Taylor & Francis Group, LLC
296
M e A suReM en t DAtA M o D eLIn G 4
16. Prove that the combined signal X (t )β = β 5e t − β6 t 3 is a form-keeping type signal.
References
∑ β cos ω t + k =1
k
k
1. Hongzhi An, Zhaoguo Chen, Jinguan Du, Yimin Pan. Time Series Analysis with Applications. Beijing: Science Press, 1983 (in Chinese). 2. Hongzhi An. Time Series Analysis. Shanghai: East China Normal University Press, 1992 (in Chinese). 3. Weiqin Yang and Lan Gu. Time Series Analysis and Dynamic Data Modeling. Beijing: Beijing Institute of Technology Press, 1988 (in Chinese). 4. Bendat Julius Samuel, Piersol Allan Gerald. Random Data: Analysis and Measurement Procedures. New York: Wiley-Interscience, 1971. 5. Sudhakar M. Pandit, Shien-Ming Wu. Time Series and System Analysis with Applications. New York: John Wiley and Sons, 1983. 6. George E. P. Box, Gwilym M. Jenkins. Time Series Analysis Forecasting and Control. Revised edition. San Francisco: Holden-Day, 1976. 7. Hong-Zhi An, Chen Zhao-Guo, E. J. Hannan. Autocorrelation, autoregression and autoregressive approximation. The Annals of Statistics, 1982, 10(3): 926–936. 8. Lan Gu, Hongzhi An. The fine structure and statistical analysis of autoregressive models. Acta Mathematicae Applicatae Sinica, 1985, 8(4): 433– 455 (in Chinese). 9. Hannan E. J., Rissanen J. The recursive estimation of mixed autoregressive-moving average order. Biometrika, 1982, 69(1): 81–94. 10. Xiaoyun Liang. Normality test. Mathematics in Practice and Theory, 1988, 18(1): 45–50 (in Chinese). 11. Ljung G. M., Box G. E. P. On a measure of lack of fit in time series models. Biometrika, 1978(65), 297–303. 12. Gang Zhou. Research on stationarity test methods. Master thesis of National University of Defense Technology, 1994 (in Chinese). 13. Yi Wu, Dongyun Yi. Study on data pre-processing of stationarity test for measurement errors. Journal of National University of Defense Technology, 1996, 18(2): 130–134 (in Chinese). 14. Walter Vandaele. Applied Time Series and Box-Jenkins Models. New York: Academic Press, 1993. 15. Akaike Hirotugu. A new look at the statistical model identification. IEEE Transactions on Automatic Control, 1974, 19(6): 716–723. 16. Rui Lv. Strategies for model order determination in time series analysis. Journal of National University of Defense Technology, 1988, 10(4): 97–106, 120 (in Chinese). 17. Wenquan Zhang. A model order determination method. System Engineering, 1989, 7(2): 6–11, 70 (in Chinese).
© 2012 by Taylor & Francis Group, LLC
M e t h o D s o F tIM e seRIe s A n A LysIs
297
18. Dongyun Yi. A new parameter’s estimation method and applications of extended stationary mixed regressive models. Mathematics in Practice and Theory, 1995, 25(3): 1–4 (in Chinese). 19. Dongyun Yi, Zhengming Wang. Parameter estimation of polynomial signal and AR noise model. Acta Electronica Sinica, 1995, 23(6): 84–87,90 (in Chinese). 20. Dongyun Yi, Zhengyu Jiang. Parameter estimation of form-keeping type signal combination models under the effect of auto-regressive operators. Acta Electronica Sinica, 1997, 25(7): 12–15 (in Chinese). 21. Zhengming Wang, Dongyun Yi. Parameter recognition of system models with ARMA noise, Control Theory & Applications, 1996, 13(4): 471–476 (in Chinese). 22. Peitai Pan, Jinyou Xiao, Fazhong Zhu. Statistical analysis of tracking equipment errors. Journal of Spacecraft TT&C Technology, 1987, 6(1): 59–66 (in Chinese). 23. Lisheng Liu, Chunhua Tian. Statistical characteristics of observational data using least squares fitting residuals. Journal of Spacecraft TT&C Technology, 1985, 4(2): 24–31 (in Chinese). 24. Zhiming Du, Chunpu Xing. Statistical analysis of variances using variable difference method. Journal of Spacecraft TT&C Technology, 1985, 4(2): 19–24 (in Chinese).
© 2012 by Taylor & Francis Group, LLC
5 d iscrE TE -TimE K alman filTEr
5.1 Introduction
The Kalman filter introduced in this chapter mainly serves in processing real-time data. In the fields of missile and space technology, we often need to deal with the issues of command, displays, and security control that are inseparable from the Kalman filter. Actually, the Kalman filtering method can be regarded as the application of the Gauss–Markov theorem to processing dynamic measurement and real-time data. The realization of the Kalman filtering method involves steps such as model establishment, formulas applying, and expected results analysis, among which the most crucial part is the establishment of appropriate equations of state and measurement models. For instance, let us consider a real-time flight-tracking scenario of a flying target. Suppose the target moves with a uniform velocity. The position parameters are Zk = (X(tk),(Y(tk),(Z(tk))τ , tk−tk−1 = h. Then, position parameters and velocity of the target should satisfy the following equations:
Z k = X k − 1 + hZ k − 1 Z k = Z k − 1
(5.1)
where Z k = ( X (t k ), Y (t k ), Z (t k ))τ is the velocity parameter. In practice, the model is as follows because objective environmental factors have to be considered. Z k = Z k − 1 + hZ k − 1 + Wk − 1,1 Z k = Z k − 1 + Wk − 1, 2
(5.2)
299
© 2012 by Taylor & Francis Group, LLC
300
M e A suReM en t DAtA M o D eLIn G
where Wk,1, Wk,2 are three-dimensional random vectors that reflect the impact of environmental factors. In order to determine the flight status of a target, a tracking system is used to obtain measurement data that have a relationship with the real-time location Zk of Yk : Yk = Z k + V k
(5.3)
where V k is the measurement noise. Let X k = (Z kτ , Z kτ )τ . Equations 5.2 and 5.3 can be rewritten as I3 X k = 0 Yk = ( I 3
hI 3 + Wk − 1 X I 3 k − 1 0) X k + V k
(5.4a) (5.4b)
where Wk − 1 = (Wkτ− 1,1 ,Wkτ− 1, 2 )τ . Equations 5.4a and 5.4b form a linear stochastic system. Equation 5.4a is the state equation of the system and it depicts the objective evolving rules that the system state follows. Wk is the state noise. Equation 5.4b is the measurement equation of the system. It describes the information provided by external observations and can also be understood as constraints applied to the state of the system. A filter eliminates as much the interference as possible from the measurement information received and separates the true signal needed. The question we considered is how to determine the state parameters of the system in real time using the measurement information Yk at the moment tk. That is, how to estimate the random vector Xk of the state from the random vector Yk of the measurement. The differences between methods of parameter estimation discussed here and those in Chapters 3 and 4 are as follows. 1. The parameters estimated here are random vectors rather than constant vectors. 2. The state equation can be employed. 3. The estimation has to be performed in real time. A general form of a linear stochastic system is as follows.
© 2012 by Taylor & Francis Group, LLC
D Is C Re t e-tIM e K A L M A n FILt eR
3 01
X k = Φ k ,k − 1 X k − 1 + Γ k − 1Wk − 1 Yk = H k X k + Vk
(5.5a) (5.5b)
where Xk is the state vector, Yk is the measurement vector, Φk,k−1 is the state transition matrix, Hk is the measurement matrix, Wk is the state noise, V k is the measurement noise, and Γk is the interference matrix. The aim of Kalman filter is to get real-time estimate of Xk from Yk by Equation 5.5. The following is a brief historical review of the development of Kalman filtering. Norbert Wiener, a famous mathematician and also known as the founder of Cybernetics, started the research on filtering theory for stationary processes in his 1949 [1] article. However, with the rapid development of space technology, Wiener’s filtering method could not meet the needs of real-time signal processing due to its computational complexity, high memory capacity, and limitation of handling stationary processes only [2,3]. In 1960, by introducing state variables, Kalman [4] successfully proposed a recursive method that calculates the estimated state vector at a new time based on the measurement data at the new time and the estimated state vector at its previous time. The method broke through drawbacks of the Wiener filter. Nowadays, contents of a Kalman filter are very rich after half a century’s development. It has been widely used in various fields such as aerospace, aviation, navigation, communication, control, signal processing, and national economy. Due to space limitations, this chapter introduces a Kalman filter briefly and focuses on Kalman filter problems with colored noise. For a more comprehensive understanding of the Kalman filter, see Refs. 5–14. 5.2 Random Vector and Estimation 5.2.1 Random Vector and Its Process
5.2.1.1 Mean Vector and Variance Matrix
Definition 5.1 Suppose X = (X1, X 2,. . ., Xn)τ is a random vector of n dimensions. Its mean vector is
© 2012 by Taylor & Francis Group, LLC
302
M e A suReM en t DAtA M o D eLIn G
EX = (EX 1 , EX 2 ,…, EX n )τ where EX i =
∫
+∞
−∞
∫
+∞
−∞
xi f (x1 ,…, xn )dx1 …dxn .
The variance matrix of X is Var( X ) = E[( X − EX )( X − EX )τ ] +∞
=
∫ (x − EX )(x − EX )
τ
f (x )dx
−∞
If X, Y are random vectors of n,m dimensions, respectively, and their joint probability density function is f(x, y), then their covariance matrix is COV ( X , Y ) = E[( X − EX )(Y − EY )τ ] +∞
=
∫ (x − EX )( y − EY )
τ
f (x , y )dx
−∞
The covariance matrix has the following properties: 1. COV(X, Y) = COV(Y, X)τ . 2. E(AX + BY) = A(EX) + B(EY), where A, B are r × n, r × m-dimensional nonrandom matrices, respectively. 3. E(AX)(BY)τ = A(E(XYτ))Bτ , where A, B are r × n, s × m-dimensional nonrandom matrices, respectively. 4. E(XYτ) = COV(X, Y) + (EX)(EY)τ . Definition 5.2 Two random vectors X, Y are independent if f(x,y) = f X (x)f Y (y). X, Y are uncorrelated if E(XYτ) = (EX)(EY)τ , that is, COV(X, Y) = 0.
© 2012 by Taylor & Francis Group, LLC
D Is C Re t e-tIM e K A L M A n FILt eR
303
5.2.1.2 Conditional Mean Vector and Conditional Variance Matrix
Definition 5.3 Suppose X, Y are random vectors, the conditional mean vector (also called conditional expectation) of X, given the condition of Y = y, is +∞
E( X | y ) = E( X | Y = y ) =
∫ xf (x | y)dx
−∞
and the conditional variance matrix is Var( X | y ) = E{[( X − E( X | y ))( X − E( X | y ))τ ] | y } +∞
=
∫ (x − E(x | y))(x − E(x | y))
τ
f (x | y )dx
−∞
where f(x | y) is the conditional probability density f(x | y) = f(x,y)/f Y (y). Depending on the value of random vector Y, the conditional expectation E(X | y) = E(X | Y = y) is a random vector and is usually denoted as E(X | Y). The conditional expectation possessed the following properties: 1. E(E(X | Y)) = EX. Proof xf (x | y )dx f Y ( y )dy −∞ −∞ +∞ +∞
E(E( X | Y )) =
∫ ∫
+∞ +∞
=
∫ ∫ xf (x | y) f
+∞ +∞ Y
( y )dxdy =
−∞ −∞
xf (x , y )dy dx = −∞ −∞ +∞ +∞
=
© 2012 by Taylor & Francis Group, LLC
∫ ∫
∫ ∫ xf (x, y)dxdy
−∞ −∞ +∞
∫ xf (x)dx = EX
−∞
304
M e A suReM en t DAtA M o D eLIn G
2. If X, Y are independent, then E(X | Y) = EX. 3. For any function of Y, f(Y), we have E(Xf(Y) | Y) = f(Y)E(X | Y). 4. E(X1 + X 2 | Y) = E(X1 | Y) + E(X 2 | Y). 5. E[E(X | Y) | Y] = E(X | Y). 5.2.1.3 Vector Random Process
Definition 5.4 A vector random process is a series of random vectors {Xt, t ∈ T} where T is a set of parameters. In this chapter, T = {0, ± 1, ± 2,. . .} and a vector random process is simply referred to as a random process. Definition 5.5 Suppose tk ∈ T, k = 1,2, . . ., N, then the joint probability distribution function of n-dimensional random vectors X1, X2,. . ., XN is F ( x1 , x 2 ,… , x N ) = P ( X 1 < x1 , X 2 < x 2 ,… , X N < x N ) where x1,x2,. . .,xN are n-dimensional vectors. Definition 5.6 1. The mean function of a random process {Xt,t ∈ T} is defined as µ t = EX t , t ∈ T 2. The autocovariance function is defined as COV ( X t1 , X t 2 ) = E[( X t1 − µ t1 )( X t 2 − µ t 2 )τ ], ∀t1 , t 2 ∈ T 3. The cross-covariance function of two random processes {Xt,t ∈ T} and {Yt,t ∈ T} is defined as COV ( X t1 , Yt 2 ) = E[( X t1 − EX t1 )(Yt 2 − EYt 2 )τ ], ∀t1 , t 2 ∈ T
© 2012 by Taylor & Francis Group, LLC
D Is C Re t e-tIM e K A L M A n FILt eR
305
Definition 5.7 For an n-dimensional vector random process {Xt, t ∈ T}, 1. If the mean function μt = 0, ∀t ∈ T, then the process is a zeromean-value process. 2. If the autocovariance function COV( X t1 , X t 2 ) = 0, ∀t1 ≠ t 2 ∈T , then the process is a white noise process. 3. If f(x) as the joint probability density of X t1 , X t 2 ,…, X tm is normal, ∀t1,t 2,. . .,tm ∈ T, then the process is a Gaussian process. 4. If conditions (1) and (2) are satisfied, then the process is called a zero-mean-value white noise process; If conditions (2) and (3) are satisfied, then the process is a Gaussian white noise; If conditions (1), (2), and (3) are all satisfied, then the process is called a zero-mean-value Gaussian white noise. 5.2.2 Estimate of the State Vector
Different methods of state vector estimation will be derived if different criteria are applied to determine the estimate of the state vector. There are many estimation methods such as minimum mean square error estimation (MMSEE), maximum posterior estimation, maximum likelihood estimation, least squares estimation, and so on. We introduce only the minimum mean square error estimation here, since it is very convenient in practical application. It does not involve the distribution of a random vector, and it requires only the existence of the first and second-order moments. Denote X (Y ) as the estimate of the state vector X; the error of the estimation is X = X − X (Y );then the mean square error matrix is 5.2.2.1 Minimum Mean Square Error Estimate
τ ) = E( X − X (Y ))( X − X (Y ))τ E( XX +∞ +∞
=
∫ ∫ (x − X (Y ))(x − X (Y ))
−∞ −∞
© 2012 by Taylor & Francis Group, LLC
τ
f (x , y )dxdy
306
M e A suReM en t DAtA M o D eLIn G
+∞ = (x − X (Y ))(x − X (Y ))τ f (x | y )dx f Y ( y )dy −∞ −∞ +∞
∫ ∫
To get the minimum of the mean square error matrix, that is, τ ) = min , X (Y ) has to satisfy E( XX +∞
∫ (x − X (Y ))(x − X (Y ))
τ
f (x | y )dx = min
(5.6)
−∞
Theorem 5.1 Let X MV (Y ) be a solution to the minimum problem 5.6. Then X MV (Y ) = E( X | Y ). Proof +∞
∫ (x − X (Y ))(x − X (Y ))
τ
f (x | y )dx
−∞
+∞
=
∫ (x − E( X | y) + E( X | y) − X (Y ))(x − E( X | y) + E( X | y)
−∞
− X (Y ))τ f (x | y )dx +∞
=
∫ (x − E( X | y))(x − E( X | y))
τ
f (x | y )dx
−∞
+∞
+
∫ (E( X | y) − X (Y ))(E( X | y) − X (Y ))
τ
f (x | y )dx
−∞
+∞
+
∫ (x − E( X | y))(E( X | y) − X (Y ))
τ
f (x | y )dx
τ
f (x | y )dx
−∞ +∞
+
∫ (E( X | y) − X (Y ))(x − E( X | y))
−∞
© 2012 by Taylor & Francis Group, LLC
307
D Is C Re t e-tIM e K A L M A n FILt eR
Note that the third term on the right-hand side of the above equation is +∞
∫ (x − E( X | y)(E( X | y − X (Y ))
τ
f (x | y )dx
−∞
+∞
∫ (x − E( X | y) f (x | y)dx(E( X | y − X (Y ))
=
τ
−∞
= (E( X | y ) − E( X | y ))(E( X | y − X (Y ))τ = 0 Similarly, the fourth term also equals to 0. Therefore, we have +∞
∫ (x − X (Y ))(x − X (Y ))
τ
f (x | y )dx
−∞
+∞
=
∫ (x − E( X | y)(x − E( X | y)
τ
f (x | y )dx
−∞
+∞
+
∫ (E( X | y − X (Y ))(E( X | y − X (Y ))
τ
f (x | y )dx
−∞
+∞
≥
∫ (x − E( X | y)(x − E( X | y)
τ
f (x | y )dx
−∞
The equality holds if and only if X (Y ) = E( X | Y ). Hence, X MV (Y ) = E( X | Y ).
Note that E( X MV (Y )) = E(E( X | Y )) = EX . X MV (Y ) is an unbiased estimate of X with the minimum mean square error matrix as E( X − X MV (Y ))( X − X MV (Y ))τ +∞
=
∫ (x − E( X | y)(x − E( X | y)
−∞
© 2012 by Taylor & Francis Group, LLC
τ
f (x | y )dx = Var( X | Y )
308
M e A suReM en t DAtA M o D eLIn G
5.2.2.2 Linear Minimum Mean Square Error Estimate (LMMSEE) It follows that the MMSEE of X based on Y is exactly the conditional expectation E(X | Y). However, the calculation of E(X | Y) often involves the conditional probability density f(x | y) and E(X | Y) is usually a nonlinear random function of Y. Consequently, it is not easy to carry out the calculation. We now discuss an estimate based on the linear function of Y, that is, to find
X L = α + BY
(5.7)
such that the mean square error matrix satisfies E( X − X L (Y ))( X − X L (Y ))τ = min
(5.8)
Theorem 5.2 The linear function of Y with a form of Equation 5.7 that satisfies condition 5.8 is X L = EX + COV ( X , Y )(Var(Y ))−1 (Y − EY ) Proof Suppose X L = α + BY . Then E( X − X L (Y ))( X − X L (Y ))τ = E( X − α + BY )( X − α + BY ) τ = E(( X − EX ) − (α − EX + BY ))(( X − EX ) − (α − EX + BY )) τ = E(( X − EX ) − (α − EX + BEY ) − B(Y − EY )) (( X − EX ) − (α − EX + BEY ) − B(Y − EY )) τ Let b = α − EX + BEY. Then
© 2012 by Taylor & Francis Group, LLC
D Is C Re t e-tIM e K A L M A n FILt eR
309
E( X − X L (Y ))( X − X L (Y ))τ = E( X − EX )( X − EX )τ − E( X − EX )b τ − E( X − EX )(Y − EY )τ B τ − bE( X − EX )τ + bb τ + bE(Y − EY )τ B τ − BE(Y − EY )( X − EX )τ + BE(Y − EY )b τ + BE(Y − EY )E(Y − EY )τ B τ = Var( X ) − COV ( X , Y )B τ + bb τ − BCOV (Y , X ) + BVar(Y )B τ = bb τ + B − COV ( X , Y )Var −1 (Y ) Var(Y ) B − COV ( X , Y )Var −1 (Y )
τ
+ Var( X ) − COV ( X , Y )Var −1 (Y )COV (Y , X ) Note that Var(X)−COV(X,Y)Var−1(Y)COV(Y,X) is independent of α and B. We have E( X − X L (Y ))( X − X L (Y ))τ = min ⇔ b = 0, B = COV ( X , Y )Var −1 (Y ), B = COV(X,Y)Var−1(Y), and α = EX−BEY = EX−COV(X,Y)Var−1(Y)EY. Therefore, we have X L = EX + COV( X , Y )
(Var(Y ))−1 (Y − EY ). Note:
1. Since Ε X L = ΕX + COV(X ,Y )(Var(Y ))−1Ε(Y − EY ) = EX , X L is an unbiased estimate of X and its mean square error matrix is E( X − X L (Y ))( X − X L (Y ))τ = Var( X ) − COV ( X , Y )Var −1 (Y )COV (Y , X ) 2. It is observed that X L is dependent on the first- and secondorder moments of X, Y rather than their distributions. Thus, X L can be easily calculated.
© 2012 by Taylor & Francis Group, LLC
310
M e A suReM en t DAtA M o D eLIn G
5.2.2.3 The Relation between MMSEE and LMMSEE
following inequality holds
Generally, the
E( X − X L (Y ))( X − X L (Y ))τ ≥ E( X − X MV (Y ))( X − X MV (Y ))τ The next theorem shows that equality holds when the joint distribution of X and Y is normal. Theorem 5.3 If the joint distribution of X, Y is normal distribution, then X MV = X L . The proof is omitted. Note that the two estimators are different in general. 5.3 Discrete-Time Kalman Filter
Assume that we have a measurement series Y1,. . .,Yk and Y ( k) = (Y1τ ,…, Ykτ ). Let X j |k be the estimate of the state vector Xj at time j given the information of Y(k). The estimation error is X j |k = X j − X j |k and the mean square error matrix is P j |k = E( X j |k X τj |k ). Conventionally, X j |k is called filtering when j = k, X j |k is called predicting when j > k, and X j |k is called smoothing when j < k. A linear filtering means that X k|k is a linear function of Y(k).
5.3.1 Orthogonal Projection
There are many ways to derive the formulas of a Kalman filter. We discuss only the method of orthogonal projection in this book. The definition of orthogonal projection is introduced next. Definition 5.8 Suppose X, Y are n, m-dimensional random vectors with the first and second moments, respectively. If there exists an n-dimensional vector Z such that 1. There are an n-dimensional nonrandom vector a and an n × mdimensional matrix B such that Z = a + BY (linearity)
© 2012 by Taylor & Francis Group, LLC
D Is C Re t e-tIM e K A L M A n FILt eR
311
2. EZ = EX (unbiasedness) 3. E(X−Z)Yτ = 0 (orthogonality) Vector Z is called the orthogonal projection of X on Y and is denoted by ( X | Y ). Z =E lemma 5.1 Suppose that X, Y are random vectors with the first and second ( X | Y ) is the unique linear minimum moments, respectively. Then E mean square error estimate (LMMSEE) of X based on Y, that is, ( X | Y ) = EX + COV ( X , Y )Var −1 (Y )(Y − EY ) E
(5.9)
Proof It is easy to verify that X L is the LMMSEE of X based on Y and has the three properties of an orthogonal projection. Therefore, the orthogonal projection exists. The following discussion proves that the ( X | Y ) is X L . orthogonal projection E By the definition of an orthogonal projection, we have Z = a + BY, and EZ = EX = a + BEY. Therefore, a = EX − BEY , Z = EX + B(Y − EY ) Furthermore, 0 = E( X − Z )Y τ = E{[( X − EX ) − B(Y − EY )]Y τ } = E{[( X − EX ) − B(Y − EY )](Y − EY )τ } = COV ( X , Y ) − BVar(Y ) Hence B = COV ( X , Y )Var −1 (Y ) and Z = EX + COV ( X , Y )Var −1 (Y )(Y − EY ) = X L
© 2012 by Taylor & Francis Group, LLC
312
M e A suReM en t DAtA M o D eLIn G
As a result of Lemma 5.1, we have the following lemma. lemma 5.2 Suppose that X, Y are random vectors with the first and second moments, respectively, and A is a nonrandom matrix. Then ( AX | Y ) = A E (X | Y ) E Theorem 5.4 Suppose that X, Y, y are three random vectors with the first and sec ( X | Y ), y = y − E ( y | Y ). ond moments. Let Y* = (Yτ,yτ)τ, X = X − E Then
(X | Y * ) = E (X | Y ) + E ( X | y ) E −1 τ ( X | Y ) + E( Xy τ ) E( yy =E ) y
Proof By the unbiasedness of an orthogonal projection, we have EX = Ey = 0. By Lemma 5.1, we have ( X | y ) = EX + COV ( X , y )Var −1 ( y )( y − Ey ) E −1
τ τ ) E( yy = E( Xy ) y
which is the second equality of the conclusion. Next, we show that −1 τ (X | Y * ) = E ( X | Y ) + E( Xy τ ) E( yy y E )
It is sufficient to show that the right-hand side of the equation above, (X | Y ) + E ( X | y ), has the three properties of an denoted by W = E
orthogonal projection of X on Y *.
© 2012 by Taylor & Francis Group, LLC
313
D Is C Re t e-tIM e K A L M A n FILt eR
1. Linearity: ( y | Y ) are both linear functions of Y, W can ( X | Y ) and E Since E ( X | Y ) = a + B Y. be expressed as a linear function of Y *. In fact, let E 1
Then
1
( X | y ) = a + B y = a + B ( y − E ( y | Y )) E 2 2 2 2 = a2 + B2 y − B2 (a3 + B3Y ) = a2 − B2 a3 + B2 y − B2 B3Y Hence, (X | Y ) + E ( X | y ) = a + a − B a + ( B − B B Y ) + B y W =E 1 2 2 3 1 2 3 2 ∆
= a + BY * where a = a1 + a2 − B2a3, 2. Unbiasedness:
B = (B1 − B2B3
B2).
( X | Y )) + E(E ( X | y )) = EX + EX = EX EW = E(E 3. Orthogonality: By definition, we have ( X | Y ))Y τ ] = 0, and τ = E[( X − E EXY ( y | Y ))Y τ ] = 0 τ = E[( y − E EyY ( y | Y ) is a linear function of Y. We have Note that E ( y | Y ))τ ] = E[ y (E ( y | Y ))τ ] = 0, E[ X (E ( y | Y ))τ ] = E( Xy τ ) = E( Xy τ ) + E[ X (E τ ) E( Xy ( y | Y ))τ ] = E( yy τ ) τ ) = E( yy τ ) + E[ y (E E( yy
© 2012 by Taylor & Francis Group, LLC
314
M e A suReM en t DAtA M o D eLIn G
Therefore, E[( X − W )Y * τ ] = (E[( X − W )Y τ ], E[( X − W ) y τ ]) ( X | Y ) − E( Xy τ )(E( yy τ )−1 y ]Y τ } E[( X − W )Y τ ] = E{[ X − E τ ) − E( Xy τ )(E( yy τ )−1 E( yY τ ) = 0 = E( XY ( X | Y ) − E( Xy τ )(E( yy τ )−1 y ]Y τ } E[( X − W )Y τ ] = E{[ X − E τ ) − E( Xy τ )(E( yy τ )−1 E( yy τ) = E( Xy τ ) − E( Xy τ )(E( yy τ )−1 E( yy τ ) = 0 = E( Xy Consequently, we have E[( X − W )Y * τ ] = 0 Figure 5.1 illustrates the geometric interpretation of Theorem 5.4. 5.3.2 The Formula of Kalman Filter
Consider a linear random system X k = Φ k ,k − 1 X k − 1 + Γ k − 1Wk − 1 Yk = H k X k + Vk
(5.10) (5.11)
where EWk = 0, COV(Wk,Wj) = Qkδkj,EVk =0, COV(Vk,Vj) = R kδkj, COV(Wk,Vj) = 0. Such assumptions of the system noise are usually X ~
X
Y
Eˆ (X |Y )
~
Eˆ (X |Y*)
(Vary~)–1 y~ y~
Figure 5.1
orthogonal projection geometry.
© 2012 by Taylor & Francis Group, LLC
~
~
E(Xyτ)(Vary )–1 y
y
D Is C Re t e-tIM e K A L M A n FILt eR
315
called assumptions of a standard system noise. Moreover, assume that the initial states have the following statistical properties: EX 0 = µ 0 , Var( X 0 ) = P0 , COV ( X 0 ,Wk ) = COV ( X 0 ,Vk ) = 0 Theorem 5.5 For an n-dimensional dynamic system 5.10 and an m-dimensional measurement system 5.11 with assumptions of a standard system noise, the optimal linear filter of Xk, denoted by X k, can be calculated through the following equations recursively: • Filter formula X k = X k|k − 1 + K k (Yk − Y k|k − 1 ) = Φ k ,k − 1 X k − 1 + K k (Yk − H k Φ k ,k − 1 X k − 1 ), X 0 = X 0
(5.12)
• Gain matrix K k = Pk|k − 1 H kτ ( H k Pk|k − 1 H kτ + Rk )−1 • Mean square error matrix of prediction error Pk|k − 1 = Φ k ,k − 1Pk − 1Φ kτ ,k − 1 + Γ k − 1Qk − 1Γ τk − 1 • Mean square error matrix of filter error Pk = ( I − K k H k )Pk|k − 1 , P0 = Var( X 0 ) The physical meaning of filter formula 5.12 is that the filtered value is equal to the predicted value X k|k −1 plus a term of correction which is the product of the prediction error and the gain matrix. The flow of a Kalman filter is as follows (see Figure 5.2). Note: 1. In the process of deriving the recursive formulas of a Kalman filter, the filtered value X k is essentially the linear minimum mean square error estimation of the state vector Xk based on
© 2012 by Taylor & Francis Group, LLC
316
M e A suReM en t DAtA M o D eLIn G
Initial values X0, P0, k = 0
k=k+1
^
^
One step prediction Xk|k – 1 = Φk.k – 1Xk – 1
Mean square error matrix of prediction error Pk|k – 1 = Φk.k – 1Pk – 1 Φτk.k – 1 + Γk – 1Qk – 1Γτk – 1
Filter gain Kk = Pk|k – 1 Hτk (Hk Pk|k – 1 Hτk + Rk)–1 ^
^
^
Optimal filter value Xk = Φk.k – 1Xk – 1 + Kk (Yk – HkXk|k – 1)
Mean square error matrix of filter error Pk = (I – KkHk)Pk|k – 1
Figure 5.2
The flow of a Kalman filter.
measurements Y1,. . .,Yk. The mean square error matrix of the filter error Pk is the minimum mean square error matrix among all linear estimators. 2. From the recursive formulas of a Kalman filter, it is observed that the gain matrix Kk, the prediction error matrix Pk|k −1, and the mean square error matrix of filter error Pk can be computed offline. This improves the online speed of computing. 3. By introducing the state equation, the recursive formulas of a Kalman filter calculate the estimate of the state vector at a new time point according to new measurement data and an estimate of the state vector at a previous time point without restoring all measurement data in the past. Not only can we achieve a real-time state estimation by Kalman filtering but we can also predict the dynamic motion at the next time point to achieve real-time control. It is these advantages that make the method of Kalman filter be widely applied in various fields.
© 2012 by Taylor & Francis Group, LLC
D Is C Re t e-tIM e K A L M A n FILt eR
5.3.3 Examples Example 5.1 Suppose there is a linear random system X k = X k −1 + Wk −1 Yk = X k + Vk where Xk, Yk, W k, V k are random variables, {W k}, {V k} are noise, and X0 is the initial state satisfying the standard system assumptions. Derive the recursive formulas of a Kalman filter for estimating the system state. Solution
Assume that the initial value of state is X 0 = EX 0 = µ0 and the initial value of the mean square error matrix is P 0 = Var(X0). Then, by noticing that Φk,k−1 = 1,Γk−1 = 1,Hk = 1, the recursive formulas of a Kalman filter are X k|k −1 = Φ k , k −1 X k −1 = X k −1
Pk|k −1 = Φ k , k −1P k −1 Φ kτ , k −1 + Γ k −1Qk −1Γ kτ −1 = P k −1 +Q where COV(W k, Wj) = Qδkj. K k = Pk|k −1H kτ ( H k Pk|k −1H kτ + Rk )−1 = ( Pk −1 + Q )( Pk −1 + Q + R )−1 where COV(V k, Vj) = Rδkj. X k = Φ k , k −1 X k −1 + K k (Yk − H k Φ k , k −1 X k −1 ) = X k −1 + K k (Yk − X k −1 ) = (1 − K k ) X k −1 + K kYk Pk = ( I − K k H k )Pk|k −1 Pk −1 + Q R( Pk −1 + Q ) = 1 − ( Pk −1 + Q ) = Pk −1 + Q + R Pk−1 + Q + R Tables 5.1 through 5.3 list calculation results of Kalman filters based on various initial values.
© 2012 by Taylor & Francis Group, LLC
317
318
M e A suReM en t DAtA M o D eLIn G
Table 5.1 Calculation of Kalman Filtering When X0 = 0,P0 = 100,Q = 12 ,R = 12 yK
KK
PK
Xk
9.95 6.98 12.52 7.54 9.36
.99 .67 .62 .62 .62
.99 .67 .62 .62 .62
9.86 7.94 10.80 8.78 9.14
Table 5.2
xK(REAL VALUE) 11.60 8.79 11.15 9.05 9.82
2 2 Calculation of Kalman Filtering When X0 = 0,P0 = 100,Q = 5 ,R = 1
yK
KK
PK
Xk
xK (REAL VALUE)
16.35 2.13 17.10 3.76
.99 .96 .96 .96
.99 .96 .96 .96
16.22 2.65 16.56 4.23
17.99 3.94 15.73 5.27
Table 5.3
Calculation of Kalman Filtering When X0 = 0, P0 = 100,Q = 12 , R = 52
yK
KK
PK
3.37 −.27 18.00 1.50 7.51 11.21 1.00 2.85 5.44 11.26 10.24 10.02 7.90 9.06 10.09 17.91 13.28
.80 .46 .33 .27 .24 .22 .20 .20 .19 .19 .19 .18 .18 .18 .18 .18 .18
20.04 11.42 8.30 6.78 5.93 5.43 5.11 4.91 4.78 4.70 4.64 4.60 4.58 4.56 4.55 4.54 4.54
© 2012 by Taylor & Francis Group, LLC
Xk 2.70 1.34 6.87 5.42 5.91 7.06 5.82 5.24 5.28 6.40 7.11 7.65 7.69 7.94 8.33 10.07 10.66
xK (REAL VALUE) 11.60 8.79 11.15 9.05 9.82 9.81 9.50 10.44 8.82 10.32 9.03 11.55 9.23 7.23 9.17 13.29 11.02
D Is C Re t e-tIM e K A L M A n FILt eR
Remarks:
319
1. It is observed that filtering gain Kk decreases as the variance of measurement error R increases. Intuitively, when measurement noise increases, the gain should be decreased to reduce the influence of measurement noise. On the contrary, the fact that Kk increases as the variance of state noise Q increases indicates that the gain matrix should be large in order to strengthen the role of measurement information and to weaken the impact of the state estimate when the system state noise increases. 2. It is worth to point out that calculation results based on different initial values of X 0 , P0 do not change much. This is the evidence that a Kalman filter relies little on initial values. This feature is called the stability of a Kalman filter. Example 5.2 (The Kalman filter of radar tracking data) Suppose a set of radar equipments is tracking a flying target in real time. At a time tk, the range R(tk) between the radar and the tracked target is observed as the measurement data Y(tk), that is, Y ( t k ) = R( t k ) + V ( t k ) where V(tk) is the measurement noise. Assume that EV(tk) = 0, Var(V(tk)) = R. Derive a real-time filtered value of R(tk) from Y(tk). Solution
In a short period, R(t) can be expressed by a third-order polynomial 3
R( t ) =
∑b t j =0
j
j
Let R k = R(tk). Since R(t) is a polynomial of third order, the fourth difference of {R k} is zero, that is, ∇4 R k = 0. Expanding this equation yields Rk = 4 Rk −1 − 6Rk − 2 + 4 Rk − 3 − Rk − 4
© 2012 by Taylor & Francis Group, LLC
320
M e A suReM en t DAtA M o D eLIn G
Let 4 Rk 1 R k −1 ,Φ = Xk = 0 Rk − 2 R k − 3 0
−6
4
0
0
1
0
0
1
−1 0 , H = (1, 0, 0, 0) 0 0
Yk = Y (t k ),Vk = V (t k ) Then X k = ΦX k − 1 Yk = HX k + Vk
( with Γ k −1 = 0,Wk −1 = 0,Qk = 0).
Given initial conditions X 0 = ( R3 , R2 , R1 , R0 )τ = (Y3 , Y2 , Y1 , Y0 )τ , Var( X 0 ) = P0 = R the recursive formulas of a Kalman filter are as follows: X k|k −1 = Φ X k −1 Pk|k −1 = Φ k , k −1P k −1 Φ kτ , k −1 + Γ k −1Qk −1Γ τk −1 = ΦP k −1 Φ τ K k = Pk|k −1H kτ ( H k Pk|k −1H kτ + Rk )−1 = Pk|k −1H τ ( HPk|k −1H τ + R )−1 X k = Φ X k −1 + K k (Yk − H Φ X k −1 ) Pk = ( I − K k H k )Pk|k −1 = ( I − K k H )ΦPk −1Φ τ The method in this example has been successfully employed in the field of economy analysis in recent years [15]. Example 5.3 (Applications of a Kalman filter in AR modeling) Suppose the observation data Yk come from a stationary AR(p) model. Let Yl = 0, l < 0. We have Yk = φ1Yk −1 + + φ pYk − p + ε k = Y ( k )τ φ( k ) + ε k
© 2012 by Taylor & Francis Group, LLC
D Is C Re t e-tIM e K A L M A n FILt eR
3 21
where Y(k) = (yk−1, . . ., yk−p)τ, ϕ(k) = (ϕ1, . . ., ϕp)τ, εk is a zeromean-value white noise series and Var(ε k ) = σε2 . Obviously, ϕ(k + 1) = ϕ(k). If we treat ϕ(k) as a state vector in Rp, then φ( k + 1) = φ( k ) τ Yk = Y ( k ) φ( k ) + ε k which corresponds to Φk,k−1 = I,Γk−1 = W k−1 = 0,Hk = Y(k)τ , V k = εk. By the recursive formulas of a Kalman filter, we have φ k|k −1 = Φ k , k −1 φ ( k − 1) = φ ( k − 1) Pk|k −1 = Φ k , k −1P k −1 Φ kτ , k −1 + Γ k −1Qk −1Γ τk −1 = P k −1 K k = Pk −1Y ( k )(Y ( k )τ Pk −1Y ( k ) + σ ε2 )−1 = φ ( k ) = φ ( k − 1) + K k (Yk − Y ( k )τ φ ( k − 1))
Pk −1Y ( k ) Y (kk )τ Pk −1Y ( k ) + σ ε2
Pk = ( I − K kY ( k )τ )Pk −1 The method above can also be generalized to the nonstationary AR(p) modeling case where the state equation should be modified.
5.4 Kalman Filter with Colored Noise
In the previous section we have obtained formulas of a Kalman filter under the condition that state and measurement noises are both zeromean white noises. However, in many practical applications, the state noise or measurement noise is not a white noise sequence, but a zeromean stationary colored noise sequence (or correlated sequence). In such a case, the recursive formulas of the Kalman filter derived in Section 5.3 cannot be applied. Otherwise, any application of those formulas usually results in larger errors. 5.4.1 Kalman Filter with Colored State Noise
Suppose the state equation is X k = Φ k ,k − 1 X k − 1 + Γ k − 1Wk − 1
© 2012 by Taylor & Francis Group, LLC
322
M e A suReM en t DAtA M o D eLIn G
where the state noise {Wk} is a p-dimensional AR(1) series with the form as W k = ϕW k − 1 + ε k The measurement equation is Yk = H k X k + V k where φ is a known p × p-dimensional autoregression coefficient matrix, {Vk}, {εk} are independent m-dimensional and p-dimensional zero-mean white noise series, respectively. Let Xk Φ k ,k − 1 Zk = , ψ k ,k − 1 = Wk 0
0 Γk −1 ,B = ,H * = Hk ϕ k I k
(
0
)
then Z k = ψ k ,k − 1Z k − 1 + Bk ε k * Yk = H k Z k + Vk This is the standard form of a white noise. Therefore, the Kalman formulas can be applied directly. The technique used here of converting a colored-noise model into a white-noise model is called the expanding-dimension method of state vectors. 5.4.2 Kalman Filtering with Colored Measurement Noise
Suppose the state equation is X k = Φ k ,k − 1 X k − 1 + Γ k − 1Wk − 1
(5.13)
and the measurement equation is Yk = H k X k + V k
(5.14)
V k = ϕV k − 1 + ε k
(5.15)
where V k satisfies
© 2012 by Taylor & Francis Group, LLC
323
D Is C Re t e-tIM e K A L M A n FILt eR
and {Wk}, {εk} are independent zero-mean white noise series. Let Xk Φ k ,k − 1 X = , Φ *k ,k − 1 = Vk 0 * k
(
0 ,H * = Hk ϕ k
I
)
Wk − 1 0 * W , = I k − 1 ε k
Γk −1 Γ *k − 1 = 0
Then, Equations 5.13 and 5.15 can be combined as follows. X k Φ k ,k − 1 = Vk 0
0 X k −1 Γk −1 + ϕ Vk − 1 0
0 Wk − 1 I ε k
that is, X k* = Φ k* ,k − 1 X k* − 1 + Γ *k − 1Wk*− 1
(5.16)
Equation 5.14 can be rewritten as Xk I = H k* X k* Vk
(
)
Yk = H k
(5.17)
and Wk W jτ EW W = E ε k + 1 * k
(
* j
Qk δ k , j = 0
ε
τ j +1
)
EWkW jτ = 0
0 Eε
τ k +1 j +1
ε
σ I δ k , j 0
2 ε
Therefore, {Wk* } is a white noise series. It follows that the system composed of Equations 5.16 and 5.17 is a standard system that can be estimated by the recursive formulas of a Kalman filter in Section 5.3. Again, the trick here is the expanding-dimension method of state vectors. The method can be further generalized to the situation when measurement noises follow an AR(p) model.
© 2012 by Taylor & Francis Group, LLC
324
M e A suReM en t DAtA M o D eLIn G
Suppose X = Φ k , k − 1 X k − 1 + Γ k − 1W k − 1 k Yk = H k X k + Vk Vk = ϕ 1Vk − 1 + + ϕ pVk − p + ε k When p = 2, the above system can be rewritten as X k Φ k ,k − 1 V = 0 k V 0 k −1
(
Yk = H k
0 X k −1 Γk −1 ϕ 2 Vk − 1 + 0 0 V 0 k−2
0 ϕ1 I
I
0 Wk − 1 I ε k 0
X k 0 Vk V k −1
)
The original system has been transformed to a standard one to which the normal recursive formulas of a Kalman filter can be applied. In general, we can similarly utilize the expanding-dimension method of state vectors to turn a nonstandard system to a standard one and apply the normal Kalman filter formulas when p > 2. 5.4.3 Kalman Filtering with Both Colored State Noise and Measurement Noise
Suppose
where
X k = Φ k ,k − 1 X k − 1 + Γ k − 1Wk − 1 Yk = H k X k + Vk Wk = ϕ wWk − 1 + ε k Vk = ϕ vVk − 1 + ηk
and {εk}, {ηk} are independent zero-mean white noise series.
© 2012 by Taylor & Francis Group, LLC
325
D Is C Re t e-tIM e K A L M A n FILt eR
Let X Φ k k ,k − 1 * * X k = Wk , Φ k ,k − 1 = 0 0 V k 0 B = I 0
Γk −1 ϕw 0
0 0 ϕv
0 εk 0 ,ζ k = ηk I
Then, the state equation is X k Φ k ,k − 1 W = 0 k V 0 k
Γk −1 ϕw 0
0 X k −1 0 0 Wk − 1 + I ϕv V 0 k −1
0 ε k 0 ηk I
that is, X k* = Φ k* ,k − 1 X k* − 1 + Bζ k The measurement equation is Yk = H k* X k* where H k* = ( H k 0 I ),{ζ k } is a zero-mean white noise series. Thus the original system is converted to the standard one and the normal Kalman filter works. Similarly, we can derive formulas of a Kalman filter using the expanding-dimension method of state vectors when both state noise and measurement noise follow higher-order AR(p)(p > 1) models. 5.5 Divergence of Kalman Filter
Theoretically, state estimation from a Kalman filter becomes more accurate as the number of observational data increases. Sometimes the computed variance may become stable, and differences between
© 2012 by Taylor & Francis Group, LLC
326
M e A suReM en t DAtA M o D eLIn G
the state estimates obtained from the filter and the observed states are far beyond the scope defined by the computed variance. Moreover, the variance of estimate errors may tend to infinity. This phenomenon is called filter divergence and is also known as data saturation. The main reasons of filter divergence can be attributed to the following three aspects. 1. The mathematical model is not consistent with its corresponding physical system due to the lack of understanding to the physical system. Mathematical models based on any physical system are usually complex. Improper handling of model simplification (e.g., linearization of nonlinear systems) always results in explicit errors. Such errors contribute to the discrepancy between the assumed statistical properties of dynamic noises and the true ones. 2. The lack of understanding to measurement noises leads to improper selection of measurement noise models. 3. Due to limitations of computer word length, the accumulation of calculation errors contributes to the differences between computed values and actual values. An example is given to illustrate that the improper selection of the dynamic noise model or the measurement noise model leads to filter divergence. Example 5.4 Suppose that when a balloon is still, its state and measurement equations are X k + 1 = X k Y = X + V k +1 k +1 k +1
( 5.18)
( 5.19)
When it rises at a constant velocity α, the equations are X k + 1 = X k + α Y = X + α +V k +1 k k +1
( 5.20)
( 5.21)
where the measurement noise {V k} is a zero-mean white noise series and it is independent of X0. Var(V k) = σ2, Var(X0) = P 0. If
© 2012 by Taylor & Francis Group, LLC
327
D Is C Re t e-tIM e K A L M A n FILt eR
state equation 5.18 that describes the static state of the balloon is used to estimate its rising state, what is the result? AnAlYSiS
When the balloon is still, we have Φ k + 1, k = 1, H k + 1 = 1,
Qk = 0, Tk = 0.
Rk = σ 2 ,
The one-step optimal predicted estimate of the state is X k + 1 k = X k and the variance of the one-step predicted estimate error is Pk + 1 k = Pk The Kalman gain factor is K k + 1 = Pk ( Pk + σ 2 )−1 By Kalman filter formulas Pk+−11 = P0−1 + ( k + 1)σ −2 the variance of the optimal filter error is Pk+ 1 = σ 2 P0 [σ 2 + ( k + 1)P0 ]−1 and the gain factor is K k+ 1 = P0 [σ 2 + ( k + 1)P0 ]−1 From problem 8 of Exercise 5, X K + 1 = σ 2 [σ 2 + ( k + 1)P0 ]−1 X 0 + P0 [σ 2 + ( k + 1)P0 ]−1
© 2012 by Taylor & Francis Group, LLC
k +1
∑Y i =1
i
(5.22)
328
M e A suReM en t DAtA M o D eLIn G
If the balloon rises at a constant velocity, then the observation will be Yk∗+ 1 = X 0 + ( k + 1)α + Vk + 1 and k +1
∑ i =1
Yi∗ =
k +1
∑(X i =1
0
+ i α + Vi )
Substituting Yi * in Equation 5.22 with Yi * , we have X k + 1 = σ 2 [σ 2 + ( k + 1)P0 ]−1 X 0 + P0 [σ 2 + ( k + 1)P0 ]−1 ( k + 1)( k + 2) ( k + 1) X 0 + α+ 2
Vi i =0 k +1
∑
(5.23)
Equation 5.23 shows the estimated balloon height obtained from a wrong state equation but a correct measurement equation. If any of the statistical properties of the initial state is unknown, we can only take P 0 = ∞. Thus, from Equation 5.23, k+2 α+ X k + 1 = X 0 + 2
1 k+1
k +1
∑V i =0
i
Therefore, the true Xk+1 and the filtered value X k +1 have the difference of 1 k X k + 1 = X k + 1 − X k + 1 = α − 2 k+1
k +1
∑V i =0
i
Note that k EX k + 1 = α → ∞ (k → ∞) 2 and Pk+ 1 =
k2 2 σ2 α + → ∞ (k → ∞) 4 k+1
The two equations above show that a filter divergence occurs. The computational results are shown in Table 5.4.
© 2012 by Taylor & Francis Group, LLC
329
D Is C Re t e-tIM e K A L M A n FILt eR
Table 5.4
Calculation Results of Kalman Filtering Based on X0 = 0, P0 = 100,Q = 0, R = 1
yK
KK
PK
Xk
8.35 8.19 11.37 8.49 9.54 10.28 8.3 9.33 10.19 10.24 9.69 9.73 10.37 10.18 10.92 10.45 11.6
0.99 0.5 0.33 0.25 0.2 0.17 0.14 0.11 0.1 0.09 0.08 0.08 0.07 0.07 0.06 0.06 0.06
0.99 0.5 0.33 0.25 0.2 0.17 0.14 0.11 0.1 0.09 0.08 0.08 0.07 0.07 0.06 0.06 0.06
8.27 8.23 9.27 9.08 9.17 9.35 9.2 9.14 9.24 9.33 9.36 9.39 9.46 9.51 9.6 9.65 9.76
Xk − X k
xK (REAL VALUE)
(REAL VALUE)
15 20 25 30 35 45 50 55 60 65 70 75 80 85 90 95 100
− 6.73 − 11.77 − 15.73 − 20.92 − 25.83 − 35.8 − 40.89 − 45.86 − 50.76 − 55.67 − 60.64 − 65.61 − 70.54 − 75.49 − 80.4 − 85.35 − 90.24
From the theoretical derivation and the computation of this example, we see that K k+1 decreases rapidly with the increase of k. This means that the gain factors that are used to calibrate the next step filtering by multiplying new observational data are decreasing quickly, the roles of new observations in the filtering calibration are weakened rapidly, and data saturation forms. On the other hand, model inaccuracy plays an increasingly important role in filtering and eventually causes the divergence. The next example shows that filter divergence also occurs when the dynamic equation model is accurate, but features of measurement noises are inappropriate. Example 5.5 Consider a linear random system X k = X k −1 + Wk Y = X + V k k k
© 2012 by Taylor & Francis Group, LLC
( 5.24) ( 5.25)
330
M e A suReM en t DAtA M o D eLIn G
Suppose that the measurement noise is a dependent series (5.26)
V k = ϕV k − 1 + ε k
where {W k} is a zero-mean white noise series, VarW k = Q = 1; {V k} is a zero-mean stationary AR(1) series that is independent of {W k}, and φ = 0.98, Varεk = R = 22. 1. If we treat {V k} as a zero-mean white noise and perform Kalman filter (VarV k = 1), then Figure 5.3 is the plot of recursive estimates of the filtering error variance Pk(k = 1, 2, . . ., 20) and Figure 5.4 shows differences between filter values of the system state and their true values, ΔXk(k = 1, 2, . . ., 150). It is observed that, although variances of filtering errors Pk are around 0.62, actual filtering errors are far beyond its range and filter divergence occurs.
Pk
1
0.5
0
0
4
6
12
16
20
k
Figure 5.3
Variances of Kalman filter errors (I).
ΔXk
40
0
–40 0
Figure 5.4
k
Kalman filter errors (I).
© 2012 by Taylor & Francis Group, LLC
150
3 31
D Is C Re t e-tIM e K A L M A n FILt eR
2. Next we use the expanding-dimension method in Section 5.4.2 to carry out Kalman filter based on Equations 5.24 through 5.26. Figure 5.5 is the plot of Pk(k = 1, 2, . . ., 40) and Figure 5.6 is the plot of ΔXk(k = 1, 2, . . ., 400). Figures 5.5 and 5.6 show that Pk → 4.457, which is consistent with the trend of filter errors ΔXk. These are correct filtering results. Typical approaches to correct filter divergence are as follows. 1. Making gain coefficients not to decrease after some steps. For instance, taking 1/k, k ≤ M Kk = 1/M , k > M in Example 5.4.
Pk
14
7
0
0
Figure 5.5
5
10
15
20 k
25
30
35
40
Variances of Kalman filter errors (II).
ΔXk
13
0
–13
0
Figure 5.6
k
Kalman filter errors (II).
© 2012 by Taylor & Francis Group, LLC
400
332
M e A suReM en t DAtA M o D eLIn G
2. Increasing the weight of new observations and reducing the impact of old observational data. This method is called fading-memory filter. 3. Intuitively, when the mathematical model is a simplification of a complex one, the approximation is only valid in a relatively short period of time. Therefore, obsolete observations should not be used as the basis of estimating the current state. For example, if X k +1 is the one to be estimated, it is reasonable to use only N observed points close to Y k+1 and discard all observations before Y k−N+2 . This method is called limited memory filter. 4. Getting the model as accurately as possible. For example, use the following model X k + 1 1 1 x k = α k + 1 0 1 α k Y = (1 0) x k + 1 + V ( k ) α k +1 k +1 in Example 5.4 instead and take the dependence of noises into account in Example 5.5 when the expandingdimension method is applied. 5. Reducing filter errors by improving filter design through adopting the adaptive filtering method or constantly estimating and revising statistical features of the unknown or uncertain parameters and noise in the recursive filter process based on the observed data. 6. If the system itself is nonlinear, then it is better to use methods of nonlinear filter.
5.6 Kalman Filter with Noises of Unknown Statistical Characteristics
The recursive formulas of the Kalman filter derived above are based on the assumption that statistical characteristics of the dynamic and measurement noises are known. However, the statistical characteristics of noise are often unknown in practice. It is inconvenient to apply the Kalman filter without knowing the statistical characteristics of
© 2012 by Taylor & Francis Group, LLC
D Is C Re t e-tIM e K A L M A n FILt eR
333
noises and it often causes the divergence of the Kalman filter. Detailed elaboration is given below. 5.6.1 Selection of Correlation Matrix Q k of the Dynamic Noise
For an ordinary dynamic system such as tracking a nonmaneuvering target, it is desirable to take Qk as a constant matrix Q where Q can be calculated and determined by simulations in advance. In the process of determining Q using simulated data, the main consideration is the filter error series. At the beginning, if Q is too small, the filter error series will diverge; As Q increases up to a certain value of Q *, the trend of the filter error will be around a steady state that gives the smallest steady-state error; if Q keeps increasing over Q *, the trend of the filter error will fluctuate around a steady state, but the error increases with Q. Therefore, it is better to choose Q slightly larger than Q *. Adaptive filtering techniques should be employed for maneuvering targets. Currently, there is not a very effective way in such applications. Finding an effective way of filtering maneuvering targets is an active research area of the Kalman filter. Those interested in this area should read Refs. 7 and 8. 5.6.2 Extracting Statistical Features of Measurement Noises
Unlike dynamic noises, statistical characteristics of measurement noises cannot be obtained by simulations. In general, there are two resources for statistical information of measurement noises. One is the accuracy of measurement instruments (or accuracy appraisal); the other is the prior knowledge of processing measurement data such as results from previous data processing. These two cannot accurately reflect statistical characteristics of measurement noises due to changes in measurement environments. Therefore, statistical characteristics measurement noises are often unknown or only partially known. In Section 5.4.3, we have discussed the most widely used Kalman filter with AR(p) measurement noises. In order to obtain optimal recursive formulas of the Kalman filter, parameters in an AR(p) model and the variance of white noises of its driving sequence must be known. Unfortunately, none of them are known in practice. To address this issue, we present an adaptive algorithm to recursively estimate parameters in an AR(p) model and the variance of white noises based on
© 2012 by Taylor & Francis Group, LLC
334
M e A suReM en t DAtA M o D eLIn G
measurement data. Such an algorithm provides statistical characteristics of a real-time series of measurement noises for the Kalman filter. Here we use a radar tracking system with continuous waves as a background to illustrate the procedure. Suppose a radar tracking system with continuous waves, say, MISTRAM (MISsile TRAjectory Measurement) is used for tracking and measuring the trace of a ballistic missile. At each time point there are six measurements, namely, distance sum S(t), distance differences P(t),Q(t), and corresponding change rates S(t ), P (t ),Q (t ). Assume that the six measurements are uncorrelated with each other but there is a correlation in each time series and correlations within these time series can be modeled by AR(1) and AR(2) (see Section 4.7 for details). Based on the characteristic analysis of six measurements, distance measurements can be approximated accurately by cubic polynomials and velocity elements can be expressed precisely by corresponding quadratic polynomials in a limited time period. Consider distance sum S(t) as an example. The statistical features of noises in other measurements can be derived similarly. Let Yk be the measurement data of S(tk) at time tk. Then, measurement equations are Yk − N + j = S ( t k − N + j ) + e k − N + j
(5.27)
4
S (t k − N + j ) =
∑ α (k)t i =1
i
i −1 k−N + j
(5.28)
k ≥ N, j = 1, 2, . . ., N, where N, a fixed positive integer, is the number of data points the model 5.28 processes. The value of N needs to be determined in advance. N is selected such that the truncation error of approximating a true signal S(t) with a cubic polynomial is negligible (usually an order of magnitude smaller than a random error). Simulations and practical applications show that N is generally between 100 and 200 or the sampling rate as 20 points/second for ballistic missiles. In order to maintain the accuracy of polynomial approximation to a true signal, our approach is to move forward with a fixed fitting length of N data points. This technique is usually known as the limited memory method. The idea of moving approximation can be realized by letting
© 2012 by Taylor & Francis Group, LLC
335
D Is C Re t e-tIM e K A L M A n FILt eR
t k − N + j = t j = j ∆t , j = 1, 2,…, N where Δt is the sampling interval. Now models 5.27 and 5.28 can be combined into 4
Yk − N + j =
∑ α (k)t i
i =1
i −1 j
+ e k − N + j , j = 1, 2,…, N
(5.29)
where measurement noise ek−N+i is a zero-mean AR(p) series such that e k − N + j = ϕ1 ( k)e k − N + j − 1 + + ϕ p ( k )e k − N + j − p + ε k − N + j (5.30) where {εk−N+j} is a zero-mean series of white noise, Var ( ε k − N + j ) = σ ε2 ( k). Although models 5.29 and 5.30 are similar to the PAR models mentioned in Section 4.6, the difference is that polynomial coefficients {ai (k),i = 1,2,3,4} and autoregressive parameters {ϕ 1 ( k),…, ϕ p ( k), σ ε2 ( k)} are all time-varying here. Thus, models 5.29 and 5.30 are called timevarying polynomial AR (PAR) models [16–18]. For fixed intervals {tk−N+j, …, tk}, similar to the derivation in Section 4.6.3, executing Φ(B) = 1 − φ1B − ⋅ ⋅ ⋅ − φpBp simultaneously on both sides of model 5.29, we have 4
Φ(B)Yk − N + j =
∑ a (k)(∆t ) i =1
i −1
i
Φ(B) j i − 1 + ε k − N + j
4
=
∑ a ( k) j i =1
i
i −1
+ εk − N + j
where a(k) = (a1 (k), . . ., a4 (k))τ satisfies a( k) = A (ϕ( k)a ∗ ( k)) and a *(k) = (a1(k), a2(k)Δt, a3(k)(Δt)2, a4(k)(Δt)3, A(φ(k)) is similar to Equation 4.32. Note that the difference is b0 = 1 +
p
∑ (1 − ϕ (k)), j =1
© 2012 by Taylor & Francis Group, LLC
j
bl =
p
∑ (−ϕ (k)) (− j ) j =1
j
l
for 1 ≤ l ≤ 4
336
M e A suReM en t DAtA M o D eLIn G
Let Y Y k − N +1 k−N YN ( k) = , M ( k − N , k − p) = Y Y k k −1 1 1 U = 1
1
1
2
2
2
N
N2
Yk − N + 1− p Yk − p
1 ϕ 1 ( k) 3 2 ( k) = , ϕ N 3 ϕ p ( k)
and H = U(U τ U)−1 U τ . Then, the normal equations of two-step least squares estimation can be derived as follows: M ( k − N , k − p )τ ( I − H )M ( k − N , k − p ) ϕ ( k) = M ( k − N , k − p )τ ( I − H )Y N ( k)
(5.31)
When the (k + 1)th data point Yk+1 is obtained, the measurement equation is 4
Yk + 1 − N + j =
∑ α (k + 1)t i =1
i
i −1 j
+ ek + 1− N + j
e k + 1 − N + j = ϕ 1 ( k + 1)e k + 1 − N + j + + ϕ p ( k + 1)e k + 1 − N + j − p + ε k + 1 − N + j , j = 1, 2,…, N Let Y k + 1− N + 1 YN ( k + 1) = , Y k +1
© 2012 by Taylor & Francis Group, LLC
ϕ 1 (k + 1) ϕ (k + 1) = ϕ p (k + 1)
(5.32)
(5.33)
D Is C Re t e-tIM e K A L M A n FILt eR
Y k + 1− N M (k + 1 − N , k + 1 − p) = Y k
337
Yk + 1− N + 1− p Yk + 1− p
Then, normal equations of the two-step least squares estimation for models 5.32 and 5.33 can be similarly derived as follows. M ( k + 1 − N , k + 1 − p )τ ( I − H ) M ( k + 1 − N , k + 1 − p ) ϕ ( k + 1) = M ( k + 1 − N , k + 1 − p )τ ( I − H )YN ( k + 1)
(5.34)
To get adaptive estimates of AR(p) parameters, it is required to derive recursive relations between φ(k + 1) and φ(k) from Equations 5.31 and 5.34 [18,19]. Let Y k + 1− j − N + 1 YN ( k + 1 − j ) = j = 0, 1,…, N Y k + 1− j and Z k + 1− j − N + 1 ZN (k + 1 − j ) = = ( I − H )YN ( k + 1 − j ) Z k + 1− j A N ( k + 1) = (Z N ( k ), Z N ( k − 1),…, Z N (k − p )) = (( I − H )Y N ( k ),( I − H )Y N ( k + 1),…, ( I − H )YN ( k − p )) = ( I − H )M ( k + 1 − N , k + 1 − p ) It follows that Equation 5.34 can be rewritten as A Nτ ( k + 1) A N ( k + 1) ϕ ( k + 1) = A Nτ ( k + 1)Z N ( k + 1)
© 2012 by Taylor & Francis Group, LLC
(5.35)
338
M e A suReM en t DAtA M o D eLIn G
In order to get the recursive expression from Equation 5.35, we need to recursively solve the AR(p) parameters. Note that Zk − N + 1 Z k−N +2 A N ( k + 1) = Zk
Zk − N
Zk − N + 1
Zk − 1
Zk − N − p + 2 Zk − N − p + 1 Z k + 1 − p
is an N × p-dimensional matrix, Zk − N Z k − N +1 A N ( k) = Z k − 1
Zk − N − 1
…
Zk − N
…
Zk − 2
…
Zk − N − p + 1 Zk − N − p + 2 Z k − p
and let b( k + 2 − i ) = (Z k + 1 − i , Z k + 1 − N − j + 1 ,…, Z k + 1 − i − p + 1 )τ , i = 0, 1,…, N + 1 Z N + 1 ( k − j + 1) = (Z k + 1 − N − j , Z k + 1 − N − j + 1 ,…, Z k + 1 − j )τ , j = 0, 1, 2,…, p Zk − N Z k − N +1 A N ( k + 1) = Z k
Zk − N − 1
…
Zk − N
…
Zk − 1
…
Zk − N − p + 1 Zk − N − p + 2 Z k − p + 1
= (Z N + 1 ( k), Z N + 1 ( k − 1),…, Z N + 1 ( k − p + 1)) b τ ( k − N + 1) τ b ( k − N + 2) = τ b ( k + 1)
© 2012 by Taylor & Francis Group, LLC
D Is C Re t e-tIM e K A L M A n FILt eR
339
Consider the relationship between solutions to the following three normal equations. A Nτ ( k) A N ( k) ϕ ( k) = A Nτ ( k)Z N ( k)
(5.36)
A Nτ ( k + 1) A N ( k + 1) ϕ ( k + 1) = A Nτ ( k + 1)Z N ( k + 1) (5.37) A Nτ + 1 ( k + 1) A N + 1 ( k + 1) ϕ N + 1 ( k + 1) = A Nτ + 1 ( k + 1)Z N + 1 ( k + 1)
(5.38)
From problem 9 in Exercise 5, recursive relations between parameters in Equations 5.36 and 5.38 are τ ϕ N + 1 ( k + 1) = ϕ( k) + K k + 1 [Z k + 1 − b ( k + 1) ϕ( k)] τ −1 K k + 1 = Pk b( k + 1)[1 + b ( k + 1)Pk b( k + 1)] Pk + 1 = [ I − K k + 1b τ ( k + 1)]Pk
(5.39)
where Pk = [ A Nτ ( k) A N ( k)]−1 . The idea of obtaining the recursive relation between ϕ ( k + 1) and ϕ ( k ) is to find the recursive relation between ϕ ( k + 1) and ϕ N + 1 ( k + 1) in Equations 5. 5.38 and 5.37 and the recursive relation between ϕ N + 1 ( k + 1) and ϕ ( k) in Equation 5.39. From Equation 5.37 ϕ ( k + 1) = [ A Nτ ( k + 1) A N ( k + 1)]−1 A Nτ ( k + 1)Z N ( k + 1) = PN ( k + 1) A Nτ ( k + 1)Z N ( k + 1)
(5.40)
where PN ( k + 1) = [ A Nτ ( k + 1) A N ( k + 1)]−1 . Note that, if we let A Nτ ( k + 1) be 0 b τ ( k − N + 2) τ b ( k + 1)
© 2012 by Taylor & Francis Group, LLC
340
M e A suReM en t DAtA M o D eLIn G
and Z N (k + 1) be 0 Z k − N − p + 2 Z k + 1 the value of ϕ ( k + 1) in Equation 5.40 will not be affected. First, calculate PN ( k + 1) b τ ( k − N + 1) 0 τ = [ A N + 1 ( k + 1) − (b( k − N + 1), 0,…, 0)] Ak + 1 − 0
−1
= {[ A Nτ + 1 ( k + 1) A N + 1 ( k + 1) − b( k − N + 1)b τ ( k − N + 1) −b( k − N + 1)b τ ( k − N + 1) + b( k − N + 1)b τ ( k − N + 1)]}−1 = [ A Nτ + 1 ( k + 1) A N + 1 ( k + 1) − b( k − N + 1)b τ ( k − N + 1)]−1 = [Pk−+11 − b( k − N + 1)b τ ( k − N + 1)]−1 and utilize the matrix inversion formula ( A + BC τ )−1 = A −1 − A −1 B( I + C τ A −1 B )−1C τ A −1 we have PN ( k + 1) = Pk + 1 + Pk + 1b( k − N + 1)[ I − b τ ( k − N + 1) Pk + 1b( k − N + 1)]−1 b τ ( k − N + 1)τ Pk + 1
© 2012 by Taylor & Francis Group, LLC
(5.41)
D Is C Re t e-tIM e K A L M A n FILt eR
3 41
= Pk + 1 + K k∗+ 1 ( N + 1)b τ ( k − N + 1)Pk + 1 = [ I + K k∗+ 1 ( N + 1)b τ ( k − N + 1)]Pk + 1 where K K∗ + 1 = Pk + 1b( k − N + 1) [ I − b τ ( k − N + 1)Pk + 1b( k − N + 1)]−1
(5.42)
Second, calculate A Nτ ( k + 1)Z N ( k + 1) Zk − N + 1 0 τ = ( A N + 1 ( k + 1) − (b( k − N + 1), 0,…, 0)) Z N + 1 ( k + 1) − 0 τ = A N + 1 ( k + 1)Z N + 1 ( k + 1) − b(kk − N + 1)Z k − N + 1 and we have ϕ ( k + 1) = PN ( k + 1) A Nτ ( k + 1)Z N ( k + 1) = [ I + K k∗+ 1b τ ( k − N + 1)]Pk + 1 [ A Nτ + 1 ( k + 1)Z N + 1 ( k + 1) − b( k − N + 1)Z k − N + 1 ] = [ I + K k∗+ 1b τ ( k − N + 1)][ϕ N + 1 ( k + 1) − Pk + 1b( k − N + 1)Z k − N + 1 ] = [ϕ N + 1 ( k + 1) + K k∗+ 1b τ ( k − N + 1) ϕ N + 1 ( k + 1) − [ I + K k∗+ 1b τ ( k − N + 1)]Pk + 1b( k − N + 1)Zk − N + 1
© 2012 by Taylor & Francis Group, LLC
3 42
M e A suReM en t DAtA M o D eLIn G
Note that [ I + K k∗+ 1b τ ( k − N + 1)]Pk + 1b( k − N + 1) = Pk + 1b( k − N + 1) + K k∗+ 1b τ ( k − N + 1)Pk + 1b( k − N + 1) = Pk + 1b( k − N + 1) + K k∗+ 1 − K k∗+ 1 + K k∗+ 1b τ ( k − N + 1)Pk + 1b( k − N + 1) = Pk + 1b( k − N + 1) + K k∗+ 1 − K k∗+ 1 [ I − b τ ( k − N + 1)Pk + 1b( k − N + 1)] = Pk + 1b( k − N + 1) + K k∗+ 1 − Pk + 1b( k − N + 1) = K k∗+ 1 Therefore, ϕ ( k + 1) = ϕ N + 1 ( k + 1) + K k∗+ 1b τ ( k − N + 1) ϕ N + 1 ( k + 1)K k∗+ 1Z k − N + 1 (5.43) = ϕ N + 1 ( k + 1) − K k∗+ 1 [Z k − N + 1 − b τ ( k − N + 1) ϕ N + 1 ( k + 1)]
By synthesizing Equations 5.39, 5.42, and 5.43, we have the following recursive equations: 1. Kk+1 = Pkb(k + 1)[1 + bτ (k + 1)Pkb(k + 1)]−1. τ 2. ϕ N + 1 ( k + 1) = ϕ ( k) + K k + 1 [Z k + 1 − b ( k + 1) ϕ ( k)]. 3. Pk+1 = [I − Kk+1bτ (k + 1)]Pk. 4. K k∗+ 1 = Pk + 1b( k − N + 1) [ I − b τ ( k − N + 1)Pk + 1b( k − N + 1)]−1 . ∗ 5. ϕ ( k + 1) = ϕ N + 1 ( k + 1) − K k + 1 [Z k − N + 1
− b τ ( k − N + 1) ϕ N + 1 ( k + 1)]
© 2012 by Taylor & Francis Group, LLC
343
D Is C Re t e-tIM e K A L M A n FILt eR
From these recursive formulas we know that, once the initial estimates ϕ ( k0 ), Pk0 are given, where k 0 ≥ N, we can recursively solve the estimates of autoregressive parameters. In general, we can take ϕ ( k0 ) = 0, Pk0 = µI , where μ is a large positive number. It is easy to verify that σ ε2 ( k + 1) satisfies the following recursive relation. σ ε2 ( k + 1) = σ ε2 ( k) + Yk + 1 − − Yk − N + 1 −
p
∑
p
i =1
∑ i =1
Yk − i + 1ϕ i ( k + 1)
Yk − N + 1 − i ϕ i ( k )
2
2
(5.44)
We have obtained all recursive formulas for estimating unknown parameters in an AR(p) model. Therefore, we can provide real-time AR(p) statistical characteristics of measurement noises required in the Kalman filter recursion. EXERciSE 5 1. Prove that a linear minimum variance estimation has the following properties: a. The estimation error X = X − X L is independent of L
the measurement Y. b. The estimation error X L is independent of the estimated parameter X L . 2. Suppose that an n-dimensional random vector X and an m-dimensional measurement vector Y satisfy Y = HX + V
(5.45)
where H is an m × n-dimensional constant value matrix, V is an m-dimensional measurement noise, and EV = 0, VarV = R, EXVτ = 0, EX = μ, Var(X) = P. Equation 5.45 can be regarded as a measurement system at stationary status. Find the linear minimum variance estimation X L for X based on Y and its corresponding mean square error matrix E(X L X Lτ ).
© 2012 by Taylor & Francis Group, LLC
344
M e A suReM en t DAtA M o D eLIn G
3. Suppose that {Wk, k ≥ 0} is a Gaussian white noise series and independent of X0. Var(WK) = g, and EX0 = 0, Var(X0) = P 0. The random variable sequence {Xk,k ≥ 0} satisfies X k + 1 = aX k + Wk Find the recursive formula of Pk = E(Xk − EXk)2 and calculate lim Pk . k →∞
4. Suppose the state equation and the measurement equation of a system are X k + 1 = aX k + Wk
Yk + 1 = X k + 1 + V k + 1 where X, Y, W, V are random variables and we have the standard assumption VarWK = Qk = β VarVK = Rk = γ EX 0 = 0, P0 = VarX 0 = C Given observations Y1 = 4, Y2 = 2, calculate X 2 and P 2. 5. What will the Kalman filter algorithm be if there are several observational vectors from the same system? 6. Consider a one-dimensional linear system X k +1 = X k Yk = X k + V k EVk = 0, VarVK = σ 2 , EV0 = 0, VarX 0 = µ 2 Prove that X k = X k − 1 +
µ2 (Yk − X k − 1 ) 2 2 σ + kµ
and X k → C ( k → ∞) , where C is a certain constant. 7. Consider the motion equation of a free falling object Z(t ) = − g , ∀t ≥ 0
© 2012 by Taylor & Francis Group, LLC
D Is C Re t e-tIM e K A L M A n FILt eR
345
where g is the acceleration of gravity. Suppose X (t ) = (Z (t ), Z (t ))τ represents the position Z(t) and the velocity Z (t ) of the object at time t. Then the equation above can be expressed as a state equation 0 X (t ) = 0
1 0 X ( t ) − , ∀t ≥ 0 0 g
For the state equation above, take τ = tk+1 − tk = 1s as the sampling interval, then the corresponding discrete-state equation should be 0 X k +1 = 0
1 1 g 2 , ∀k ≥ 0 X − k 0 g
(5.46)
Suppose we are observing the position of the free falling object and the measurement equation is
( 0) X
Yk + 1 = 1
k +1
+ V k + 1 , ∀k ≥ 0
(5.47)
where Vk is a white noise series, EVk = 0, VarVK = σ 2 Try to use the orthogonal projection method to derive the recursive formulas of the Kalman filter for linear systems 5.46 and 5.47. 8. Verify Equation 5.23 in Example 5.4. 9. Verify the recursive relation in Equation 5.39.
References
1. Norbert Wiener. Extrapolation, Interpolation and Smoothing of Stationary Time Series with Engineering Application. New York, NY: Tech. Press of M.I.T. and John Wiley & Sons, 1949. 2. Youwei Zhang. Theoretical Derivation of Wiener and Kalman Filters. Beijing: People’s Education Press, 1980 (in Chinese).
© 2012 by Taylor & Francis Group, LLC
346
M e A suReM en t DAtA M o D eLIn G
3. Peter S. Maybeck. Stochastic Models, Estimation, and Control. New York, NY: Academic Press, 1979, 1982, Vols. 1–3. 4. Kalman Rudolf Emil. A new approach to linear filtering and prediction theory. Transactions of the ASME. Series D, Journal of Basic Engineering, 1960, 82: 35–45. 5. Wenyao Song, Ya Zhang. Kalman Filter. Beijing: Science Press, 1991 (in Chinese). 6. Peizhang Gu, Zhengtao Zhu. Optimal Estimation with Applications. Beijing: Science Press, 1984 (in Chinese). 7. Charles K. Chui, Guanrong Chen. Kalman Filtering with Real-Time Application. 2nd ed. Berlin Heidelberg: Springer-Verlag, 1991. 8. Guanrong Chen. Approximate Kalman Filtering. Singapore: World Scientific Publishing Co. Inc., 1993. 9. Keigo Watanabe. Adaptive Estimation and Control. Englewood Cliffs, NJ: Prentice-Hall, 1993. 10. Simon Haykin. Adaptive Filter Theory. 2nd ed. Upper Saddle River, NJ: Prentice-Hall Inc, 1991. 11. Youmin Zhang, Guanzhong Dai, Hongcai Zhang. Development on Kalman filter algorithms. Control Theory and Applications, 1995, 12(5): 529–538 (in Chinese). 12. Kalman R. E., Bucy R. S. New results in linear filtering and prediction theory. Transactions of the ASME. Series D, Journal of Basic Engineering, 1961, 83: 95–108. 13. Ignani M. B. Separated bias Kalman estimator with bias state noise. IEEE Transactions on Automatic Control, 1990, AC-35(3): 338–341. 14. Kai Gu, Weifeng Tian. Optimal Estimation Theory and Its Applications in Navigation. Shanghai: Shanghai Jiaotong University Press, 1990 (in Chinese). 15. A. C. Harvey, S. Peters. Estimating procedures for structural time series models. Journal of Forecasting, 1990, 9(2): 89–108. 16. Dongyun Yi, Zhengming Wang. Parameter estimation of polynomial signal and AR noise model. Acta Electronica Sinica, 1995, 23(6): 84–44 (in Chinese). 17. Zhengming Wang, Dongyun Yi. Parameter recognition of system models with ARMA noise. Control Theory and Applications, 1996, 13(4): 471–475 (in Chinese). 18. Dongyun Yi, Zhengming Wang. Parameter recognition of time-varying AR models. Control Theory and Applications, 1999, 16(5): 733–735, 738 (in Chinese). 19. Wei Shi, Wenwu Chen, Qinfen He. Introduction to Self-Adaptive Control. Nanjing: Southeastern University Press, 1990 (in Chinese). 20. Shixian Pan. Spectrum Estimation and Self-adaptive Filtering. Beijing: Beihang University Press, 1991 (in Chinese).
© 2012 by Taylor & Francis Group, LLC
6 P ro cEs sin g d aTa from r adar m E asurEmEnTs
6.1 Introduction
The effectiveness of a spacecraft is closely related to its precision. A little improvement in precision will greatly enhance the overall performance of the spacecraft. High-precision data from radar measurements are extremely important for the appraisal and analysis of the guidance system, power system, reentry system and other subsystems of a spacecraft. This chapter focuses on the mathematical processing of data from space radar measurements. Processing methods are closely related to measurement equipments and trajectory characteristics of a spacecraft tracked. Since radars of continuous waves are commonly used equipments and data processing methods of other equipments have many similarities to those of radars of continuous waves, this chapter discusses processing methods based on radars of continuous waves. Measurement data from a continuous-wave radar system tracking a spacecraft include three kinds of errors: random errors, systematic errors, and gross errors. In general, gross errors can be detected based on multistation, multitime tracking data and their engineering background by using routine methods of data processing (see Sections 1.5.2 and 3.6 for details). Random errors have been discussed in Chapter 4. In this chapter, we will mainly discuss estimation methods of systematic errors. 6.1.1 Space Measurements
Space measurements are divided into tracking and telemetry measurements [1,2]. Tracking measurements are carried out by optical or radio equipments. Optical measurements are measured by cinetheodolites 3 47
© 2012 by Taylor & Francis Group, LLC
348
M e A suReM en t DAtA M o D eLIn G
(or theodolites) that make images of targets on films through filming cameras using visible light, laser or infrared, and so on. Theodolites take pictures of spacecrafts with necessary cooperative targets installed. The elevation and azimuth angles of a theodolite reflect the position of a spacecraft. Ballistic parameters can be obtained through interactions of multiple theodolites. Radio measurements are acquired by radars on the ground. Various ballistic parameters of states are obtained by processing data from terrestrial radio waves that are received and returned by a responder on the missile. Telemetry measurements can get the inner parameters of a spacecraft. They are measured and encoded by sensors on the spacecraft and signals are sent to the ground by transmitters. Remote sensing stations are set up on the ground and various parameters are obtained through demodulating signals received by ground receivers.
6.1.2 Tracking Measurements and Trajectory Determination Principle
The optical equipment for tracking spacecrafts, called a cinetheodolite, is a combination system of “filming camera” and “theodolite.” It can take pictures for a flying spacecraft and record the elevation (E) and azimuth (A) angles of the theodolite at the moment of imaging to determine the position of the spacecraft and track its trajectory. Since a spacecraft’s position is three-dimensional (3-D), one cinetheodolite cannot determine the position parameters of the spacecraft. Two or more cinetheodolites are needed to determine the 3-D spatial position of the spacecraft. Suppose two cinetheodolites are positioned at (x 1, y 1, z1) and (x 2 , y 2 , z 2), respectively. They take pictures of the missile at the same time and get the elevations of E1, E2 and azimuths of A1, A 2 respectively. Let (x, y, z) be the position of a spacecraft at time t, A 0 be the launching azimuth, that is, the orientation angle of the axis ox of the launching reference frame oxyz. From Equation 6.1 we can see that an error on any of the 11 measurements on A 0, A1, A2, E1, E2, (x1, y1, z1), and (x2, y2, z2) will lead to an error in the solution of the trajectory parameters (x, y, z). 6.1.2.1 Optical Measurements
© 2012 by Taylor & Francis Group, LLC
DAtA F R o M R A DA R M e A suReM en t s
tg E = ( y − y )/ (x − x )2 + (z − z )2 1 1 1 1 tg( A1 − A0 ) = (z − z1 )/(x − x1 ) 2 2 tg E 2 = ( y − y 2 )/ (x − x 2 ) + (z − z2 ) tg( A 2 − A0 ) = (z − z2 )/(x − x 2 )
349
(6.1)
Equation 6.1 has four equations. If three sets are used, there will be six equations. Only three parameters are to be estimated, so there is redundant information. Nonlinear least-squares method can be used to estimate the three trajectory parameters. Optical measurement equipments have good reliability and high precision. They are widely used since they are easy to use and can measure at close distances. Although radio measurements are quite developed nowadays, optical measurements are not discarded but rather become indispensable in shooting range measurements and their equipment diagnoses, especially in the initial phase of trajectory tests. However, optical measurement equipments also have fatal weakness. First, they require relatively strict weather conditions. They can capture high-quality photos only in clear sky and cloudless conditions. Second, they cannot directly measure the velocity of a target. Velocity data can only be obtained by means of the differential operations of position data. Therefore, the precision of velocity measurements is not high. In many cases, equipments of optical and radar measurements are combined. In order to construct an independent measurement system using optical devices, laser light sources are installed on cinetheodolites. Distances are calculated by using the laser light waves returned from the cooperative targets on the spacecraft. This is the laser ranging. Therefore, a laser cinetheodolite can measure angles (A, E) as well as ranges (R). Just a set of laser cinetheodolite is sufficient for determining totally the spacecraft spatial position (x, y, z). That is, R = ( x − x ) 2 + ( y − y ) 2 + (z − z ) 2 0 0 0 2 2 tg E = ( y − y0 )/ (x − x0 ) + (z − z0 ) tg( A − A0 ) = (z − z0 )/(x − x0 )
© 2012 by Taylor & Francis Group, LLC
(6.2)
350
M e A suReM en t DAtA M o D eLIn G
where (x 0, y 0, z 0) are the coordinates of a cinetheodolite in the launching reference frame, A 0 is the launching azimuth. From Equation 6.2 we can see that an error in any of the seven measurement elements R, A, E, A 0, (x 0, y 0, z 0) will lead to an error in the trajectory parameters solution. A device called ballistic camera is developed to achieve more precise trajectory measurements and to identify or calibrate optical and radar measurement systems. The ballistic camera uses stars as the background reference of calibration. It captures the stars and the observed moving object in a photographic image board and obtains high-precision measurements through image interpretation and processing. These data can be used as calibration standards of precision identification in tracking measurement systems [3]. Similar to the fact that wave refraction errors arise when radio waves travel in the air, the refraction produced by the propagation of light in the air affects the precision of angular measurements. A continuous wave radar (CW radar) transmits carrier frequency signals with a fixed frequency. It applies the Doppler effect to measure velocity and uses phase change to get position. This type of radar can measure the velocity directly with a high precision and works in all weather conditions. It is one of main high-precision measurement equipments for tracking. We introduce CW radar measurements in the following discussion. Figure 6.1 displays a set of CW radar measurement systems. Carrier frequency signals with a fixed frequency are transmitted from station T and received by stations R, P, and Q. By Doppler principle, we can get tracking data of distance sum, distance difference, and their change rates. Based on these data we can determine the position and velocity of a spacecraft. Suppose the site position of the transmitting station is (xT , yT , zT), and the receiving stations are at (xi, yi, zi)(i = R, P, Q), the trajectory parameters at time t are 6.1.2.2 Radar Measurements
X (t ) = (x(t ), y(t ), z(t ), x(t ), y (t ), z(t ))τ
© 2012 by Taylor & Francis Group, LLC
DAtA F R o M R A DA R M e A suReM en t s
Receiving station (R)
3 51
Transmitting station (T)
Master station Receiving station (P) Slave station 1 Receiving station (Q) Slave station 2
Figure 6.1
sketch of the mIsTRAm CW radar measurement system.
Let R j (t ) =
2
2
x(t ) − x j + y(t ) − y j + z(t ) − z j
2
(6.3)
x(t ) − x j x(t ) + y(t ) − y j y (t ) + z(t ) − z j z(t ) (6.4) R j = R j (t ) The MISTRAM system mainly measures the following six quantities: S (t ) = RT (t ) + RR (t ) P (t ) = RR (t ) − RP (t ) Q(t ) = RR (t ) − RQ (t ) S (t ) = RT (t ) + RR (t ) P (t ) = RR (t ) − RP (t ) Q (t ) = R R (t ) − RQ (t )
© 2012 by Taylor & Francis Group, LLC
(6.5)
352
M e A suReM en t DAtA M o D eLIn G
By the measurement elements (S(t ), P (t ),Q (t ), S(t ), P (t ),Q (t ))τ obtained at time t, we can solve for the indirect measurement data X (t ) = (x(t ), y (t ), z(t ), x(t ), y (t ), z(t ))τ of the trajectory parameters X(t) at time t. Apparently, errors of S(t ), P (t ),Q (t ) and measurement errors of the station positions (xj, yj, zj) ( j = T, R, P, Q) will introduce errors into the measurement data (x(t ), y (t ), z(t )) for the trajectory position parameters. And the errors of S(t ), P (t ),Q (t ), S(t ), P (t ),Q (t ) and the measurement errors of the station positions (xj, yj, zj) ( j = T, R, P, Q) will also lead to errors in the trajectory velocity parameters. Table 6.1 shows magnitude ranges of the MISTRAM system measurement elements [2,3]. By using Equation 6.5 to solve the trajectory parameters, the error would be magnified several times to 10 times. The concrete magnification relation is dependent on the geometry between the site positions and the spacecraft position. As for this point, we can see from Table 6.2. Table 6.2 shows how the error on each measurement element influences the trajectory parameters. From this table we can see that the measurement errors of S, P ,Q do not affect x, y, z, while the measurement errors of S, P, Q can have impact on every component of the six trajectory parameters. Generally, if the measurement elements ∆S , ∆P ,…, ∆Q become several times larger or smaller, then the induced errors of the trajectory parameters will also increase or decrease several times. (Strictly speaking, the error propagation is a
Table 6.1 The measurement Errors of the mIsTRAm system mEAsUREmENT ELEmENTs oF mIsTRAm S
0.010(m)
P Q
0.002(m)
0.13(m)
0.005(m)
0.25(m)
S P Q
© 2012 by Taylor & Francis Group, LLC
RANDom ERRoRs
sYsTEmATIC ERRoRs UNCALIBRATED 2.12(m)
0.0013(m/s)
–
0.0002(m/s)
–
0.0002(m/s)
–
353
DAtA F R o M R A DA R M e A suReM en t s
Table 6.2 The Trajectory Error Caused by the mIsTRAm system measurement Errors
∆x
∆y
∆z
Δx
ΔY
Δz
ΔS = 0.01(m) ΔP = 0.01(m) ΔQ = 0.01(m) ∆S = 0.006(m /s)
0.12 0.008 0.006
0.13 0.01 0.009
0.09 0.003 0.002
0.005 0.004 0.003
0.008 0.007 0.006
0.001 0.001 0.002
–
–
–
0.005
0.009
0.003
∆P = 0.0006(m /s) ∆Q = 0.0006(m /s)
–
–
–
0.008
0.009
0.002
–
–
–
0.008
0.008
0.002
nonlinear relationship; however, due to the errors being usually small in quantity, we can regard it as a linear relationship.) 6.1.3 Precision Appraisal and Calibration of Measurement Equipments
Appraisal of measurement equipments is mainly to determine the accuracy of the measurement system when a variety of similar measurements need to be done. Due to the demand of space measurements, the central task of appraisal is to judge whether the measurement equipment has achieved the specifications and requirements. In addition, the classification and analysis of various errors are also needed, and ultimately, appraisal needs to separate the random errors and systematic errors. We should point out that the most effective method of precision appraisal is to measure the same object simultaneously with the evaluated equipment and an equipment of higher precision and then compare the results. The higher-precision equipment nowadays is mainly the star-oriented trajectory camera. Because the precision appraisal is carried out for dynamic measurements, we should also consider the characteristics of the measured object [4]. We also need a moving target, which simulates the motion of a spacecraft. This simulation can be realized by aircrafts designed particularly for the tests, or be treated as by-product of the spacecrafts test flight. Aircraft tests can provide a lot of valuable information. But they cannot properly simulate the distance and the velocity, which is the main characteristic of missiles and satellites [5]. Another limitation of 6.1.3.1 Precision Appraisal
© 2012 by Taylor & Francis Group, LLC
354
M e A suReM en t DAtA M o D eLIn G
aircraft tests is the application regions. For example, aircraft testing is not suitable for the measurement precision appraisal of tracking ships. Moreover, the cost of the aircraft tests is rather high. Another way for precision appraisal is to take the appraisal of measurement equipment as a by-product of flight tests. The main difficulty of this scheme is that due to various reasons, only a very small number of spacecrafts install flashing beacons for tracking by trajectory cameras. Consequently, it is difficult to obtain large amount of spacecraft flight test data. Another difficulty with this approach is that the flash can be seen only after the termination of the combustion in the engine; while the tracking and measurement results after the combustion termination cannot be used to reconstruct the flight status during dynamic phase. It is crucial for the precision appraisal of spacecraft guidance system to acquire high-precision data of dynamic phase. Based on the analysis and engineering practices above, it is not enough to rely only on high-precision equipment and aircraft or missile flight tests. For the appraisal calibration and of space measuring equipment, we also need effective mathematical methods [6–12]. Calibration is carried out immediately after the completion of appraisal. Its central task is to find the causes of systematic errors identified in the appraisal process and to eliminate these systematic errors. The purpose of calibration is to use appropriate means to correct errors on each observation channel in order to improve the measurement accuracy. The appraisal work provides raw data available for the calibration. In general, a tracking system achieves its technical specifications only after calibration. After thorough calibration, its precision will be greatly improved. The key to precision appraisal and calibration is error modeling. The simplest error model is the constant systematic error model [13], while practical error models are much more complex. The appraisal and calibration of tracking systems is usually fulfilled by several (or one of the) methods as follows.
6.1.3.2 Precision Calibration
1. Theoretical error analysis: This method is mainly used for engineering design and is essential for the design of ideal tracking systems. It is not applied in testing and approval of tracking systems, which are usually tested with real measurement data.
© 2012 by Taylor & Francis Group, LLC
DAtA F R o M R A DA R M e A suReM en t s
355
2. Special laboratory tests for individual components and overall tests: For some tracking systems, it is feasible to achieve satisfactory calibration results by testing the individual components of the system through special designed laboratory tests. The calibration of film theodolites and radar base center eccentricity are good examples. By experience, we know that in a rather long period, the eccentricity error is stable, which means that the laboratory calibration is credible. The value of laboratory calibration for individual components is that it can reduce the total number of parameters to be considered during the overall system tests. We should point out that even in the best case, laboratory tests can only provide part of the tracking system appraisal results, and is generally not a substitute for a comprehensive ground test. 3. Multiple observations: From Section 6.1.2, we know that using a laser theodolite or a MISTRAM CW system is enough to determine the trajectory parameters of a spacecraft [14]. This is true for the case of measurement error-free. In the case of measurement error existence, we can use multiple observations, such as two sets or more of MISTRAM systems to track a spacecraft at the same time. Such observation with “redundant” information is called multiple observations. Here, “redundant” refers to the excess with the assumption that there is no measurement error; while not redundant with the existence of measurement errors. By using multiple observations, we can perform the mutual appraisal and mutual calibration between measurement equipments. 4. Comparison with higher-precision measurement systems: This approach has been discussed in the previous introduction to the significance of appraisal. An example for this method is to use high-precision trajectory cameras to calibrate relatively lower-precision CW radar systems. 5. Measurement data modeling and method of parameter estimation: This is to establish the mathematical model for measurement data, to describe a period of trajectory parameters and systematic errors with a few parameters, and then use parameter estimation method to estimate the systematic errors, which are then removed to achieve the calibration.
© 2012 by Taylor & Francis Group, LLC
356
M e A suReM en t DAtA M o D eLIn G
This idea is the focus of this book and will be illustrated in detail in the following sections. 6.1.4 Systematic Error Model of CW Radar
In order to estimate the systematic error accurately, we must first represent it as parameter model. Here, we introduce the commonly used matched systematic error model. In practical problems, we also need to consider unmatched systematic error, which will be described in Section 6.4.2. The following is an error model for MISTRAM systems. Position Systematic Error Model for MISTRAM System ∆S = a1 + a2 t + a3tS + a4S + a5S + a6 (csc ET + csc ER ) + a7 (csc 3 ET + csc 3 ER )
where a1: phase shift a2t: first-order phase drift a3tS: first-order frequency drift a4S: frequency shift a5S: time shift a6(csc ET + csc ER): first-order refraction a7 (csc3 ET + csc3 ER): second-order refraction ∆P = a8 + a9t + a10 tP + a11P + a12 P + a13 (csc ER − csc EP ) + a14 (csc 3 ER − csc 3 EP ) + a15S ∆Q = a16 + a17 t + a18tQ + a19Q + a20Q + a21 (csc ER − csc EQ ) + a22 (csc 3 ER − cssc 3 EQ ) + a23S where a 8, a16: phase shift a9t, a17t: first-order phase drift a10tP, a18tQ: first-order frequency drift a11P, a19Q: frequency shift a12 P , a20Q : time shift a13 (csc ER – csc EP), a21 (csc ER – csc EQ): first-order refraction
© 2012 by Taylor & Francis Group, LLC
DAtA F R o M R A DA R M e A suReM en t s
357
a14 (csc3 ER – csc3 EP), a22 (csc3 ER – csc3 EQ): second-order refraction a15S , a23S : time delay Possible constraints: a5 = a12 = a20, a15 = a23 = 0 where E j = E j (t ), j = T , P ,Q , R
tg E j =
y(t ) − y j x(t ) − x j
2
+ z( t ) − z j
2
Velocity Systematic Error Model for MISTRAM System d ∆S (t ) dt d ∆P = ∆P (t ) dt d ∆Q = ∆Q(t ) dt ∆S =
In the error models above, errors of station sites are neglected. If we are not sure enough of the accuracy of station sites, then site errors should also be considered [2]. In addition, for tracking where the responder takes initiative cooperation, the frequency shift and the firstorder frequency drift in ΔS and ∆S can both be ignored, and the time shift is also avoidable. It should be noted that, in process of selecting error models, we should not only include all necessary errors, but also exclude the ones with small influences. This is because the more parameters to be estimated a model has, the more ill-conditioned the model is [7,8,12,15,16]. Methods in Section 3.4 can be used to select models of systematic errors. 6.1.5 Mathematical Processing for Radar Measurement Data
The main work of space radar data processing is to estimate accurately and correct measurement systematic errors, and to reveal the statistical characteristics of random errors, thus finally achieve high-precision measurement data consistent with the engineering background. For measurement random errors, the main task is to analyze their statistical properties. For CW radar measurement data, to deal with
© 2012 by Taylor & Francis Group, LLC
358
M e A suReM en t DAtA M o D eLIn G
random errors is to reveal the statistical properties of random errors on distance sum, distance difference, and their change rates. This work has been carried out in detailed discussion in Chapter 4. The most important thing for processing radar measurement data is the estimation and correction of systematic errors. The rule is rather simple, systematic errors are much larger than random errors by experiences. Thus, estimating and eliminating systematic errors is the core mission of space tracking data processing. The traditional method of processing systematic errors is the EMBET method [18]. The main idea of the EMBET method is to estimate systematic errors and trajectory parameters by using various measurement elements at the same time. Now, we briefly introduce the EMBET method. A detailed discussion can be found in Section 6.6. Suppose at time t we have p measurement elements: y1(t), y2(t), …, yp(t). Let X = X (t ) = (x , y , z, x , y , z )τ = (x(t ), y(t ), z(t ), x(t ), y (t ), z(t ))τ be the trajectory parameters at time t and X ∗ (t ) = (x ∗ , y ∗ , z ∗ ,
x ∗ , y ∗ , z ∗ )τ = (x ∗ (t ), y ∗ (t ), z ∗ (t ), x ∗ (t ), y ∗ (t ), z ∗ (t ))τ be the nominal trajectory. Assume the systematic errors to be estimated as α = (a1, a2, …, aq)τ, while the number of measurement elements p > q + 6. The model of the systematic errors is linear, so y1 (t ) = f 1 (x , y , z, x , y , z ) + u1α + ε1 (t ) y 2 (t ) = f 2 (x , y , z, x , y , z ) + u2 α + ε 2 (t ) y p (t ) = f p (x , y , z, x , y , z ) + u p α + ε p (t ) Let y1 ( t ) y 2 (t ) , Y (t ) = y p (t )
© 2012 by Taylor & Francis Group, LLC
f 1( X ) f 2(X ) , f (X ) = f p ( X )
(6.6)
DAtA F R o M R A DA R M e A suReM en t s
359
ε1 (t ) u1 ε 2 (t ) u 2 U = , ε = ε p (t ) u p By Equation 6.5 we have Y (t ) = f ( X ) + U α + ε
(6.7)
where Y is the measurement data vector, ε is the measurement random error vector, f (X) is the true signal, and Uα is the measurement systematic error. From the nominal trajectory X* at time t, model (6.7) can be rewritten as Y = f ( X )* + ∇f ( X * )( X − X * ) + U α + ε or X − X * Y − f ( X ) = (∇f ( X ),U ) +ε α *
*
(6.8)
By the linear regression model (6.8) we can get the estimation of X – X * and α, thus get the estimation of trajectory parameters X and the systematic error parameters α. This is single-point EMBET method. If we combine data at multiple time points to process, then it will form the multipoint EMBET method. From the theoretical analysis and simulation results in Section 6.6, we can see that the precision of the EMBET methods is comparatively low. Owing to the drawbacks of traditional methods, it is an urgent task to find a new way of estimating systematic errors and trajectory parameters. This chapter focuses on methods of estimating systematic errors and trajectory parameters. The core idea is to convert the data processing problem into a parameter estimation issue in a nonlinear model with fewer estimated parameters and smaller modeling errors. To reduce the parameters to be estimated and to eliminate the ill-conditioned
© 2012 by Taylor & Francis Group, LLC
360
M e A suReM en t DAtA M o D eLIn G
situations of the model, we do not consider the issue by modeling at each time point; we consider data for a period of time. The use of polynomials or polynomial splines to represent trajectory parameters will reduce the number of estimated parameters dramatically since the amount of data is huge. The reasons of doing this are on one hand, the number of the parameters to be estimated is decreased; on the other, it can take full advantage of matching relationships between trajectory parameters and thus greatly alleviate the ill-conditioned degree of the model. The systematic errors we consider fall into three categories: constant systematic errors, time delay errors, and slow drift errors of distance and velocity measurements. The application of matching relationships between the trajectory parameters makes measurement data processed by the method proposed in this chapter more consistent with their engineering rationalities. The traditional EMBET methods do not employ the useful information provided by matching principle (Theoretical analysis and simulations show that the information can effectively improve the estimation precision of trajectory parameters and systematic errors.), so the data processed cannot meet the matching relations well. To translate the data processing problem into a parameter estimation issue, we need some theoretical bases such as the parametric representation method for estimated functions, time-series analysis method, and parameter estimation theories of modern linear or nonlinear regression analysis. See Chapters 2 through 4 and references for more details. There is another prominent feature of the method proposed in this chapter. Its realization does not require nominal trajectory. The traditional EMBET method requires nominal trajectory and the precision of the estimation depends heavily on the precision of the nominal trajectory. The feature of the method described in this chapter not only brings convenience to the estimation, but also improves the precision significantly. The description approach of the trajectory parameters is to keep the parameters to be estimated as few as possible in the premise of ensuring the approximation precision of trajectory parameters. According to this principle, we usually use polynomials and
© 2012 by Taylor & Francis Group, LLC
DAtA F R o M R A DA R M e A suReM en t s
3 61
polynomial splines to express the trajectory parameters during the boost phase, while describe the parameters by equations during the free flight phase. As for the classification of systematic errors, we propose the concepts of matched and unmatched systematic errors, and also give the corresponding estimation methods. Since there are measurement data for both the distance sum (or difference) and its change rate, and the true signals are matched, it is beneficial to the accurate estimation of systematic errors by distinguishing systematic errors by the matching principle. 6.2 Parametric Representation of the Trajectory
It is a distinguishing feature of this book to convert the data processing problem into a parameter estimation issue. To improve the efficiency of the estimation, we need to build a matching, accurate nonlinear model with fewer parameters and also make it easy to separate the systematic errors and the trajectory parameters. To use relatively fewer parameters to represent the trajectory in a period of time has at least two advantages. 1. To alleviate the ill-conditiond of the model, and improve the estimation precision of the trajectory parameters (we will prove it in Section 6.6). 2. To combine the data of a certain time interval and do concentrated process, which makes the results satisfy the matching principle and match the engineering background better. The method below can represent the numerous trajectory parameters of a certain period of time with very few parameters to be estimated. 6.2.1 Equation Representation of Trajectory
Let (x(t), y(t), z(t)) be the position of a spacecraft at time t in the launching reference frame of oxyz. Recall that x = x(t), y = y(t), z = z(t) describes a spatial curve. Take the x direction as an example, x(t ), x(t ), x(t ) represents the position, velocity and acceleration in the x direction, respectively. The motion equation for the spacecraft is
© 2012 by Taylor & Francis Group, LLC
362
M e A suReM en t DAtA M o D eLIn G
x x d dt y = y z z x d y = P + R + FC + mg + Fk − mat − mak dt z
(6.9)
(6.10)
where P is the thrust force, R is the aerodynamic force, Fc is the control force, mg is the gravitational force, Fk is the additional Coriolis force, −mat is the centrifugal force, and −mak is the Coriolis inertial force. Generally, the right-hand side of Equation 6.10 is a known function of X (t ) = (x , y , z, x , y , z )τ , and the above equations can be simplified as dx(t ) = F (t , X (t )) dt
(6.11)
As long as the expression of F is accurate, Equation 6.11 is accurate [5]. If the equations are accurate and the initial values at time t 1 are known, then by dX (t ) dt = F (t , X (t )) X (t1 ) = η
(6.12)
we can solve X(t 2), X(t 3), …, X(tm). If η is unknown, then we can use the data at t 1, t 2 , …, tm to estimate η. Obviously, there are only six parameters to be estimated in η, while there are 6m parameters at all m time points. By using the trajectory equations we can reduce the estimation of 6m parameters into six, and the estimation efficiency can be improved significantly [16]. The real situation is that during the boost phase, the trajectory Equations 6.11 cannot be very accurate due to the inaccurate
© 2012 by Taylor & Francis Group, LLC
DAtA F R o M R A DA R M e A suReM en t s
363
knowledge of the thrust and the control forces. However, we also need to mention that the data during the free flight phase can be well described by the equations, since the free flying spacecraft is only affected by the three forces mg , −mat , −mak , which are all clearly known as functions of X(t) with known expressions. We will discuss this later in detail in Section 6.7. 6.2.2 Polynomial Representation of Trajectory
Expressing trajectory parameters by polynomials is a traditional method [1]. The theoretical basis for this is that according to the trajectory equations and the analysis of the various forces in the equations, the fourth-order derivatives x(4)(t), y(4)(t), z(4)(t) all have small absolute values. And based on the discussions in Chapter 2, in a period of not so long time, the trajectory parameters can be expressed in three-order polynomials, that is, x(t ) = a1 + a2 t + a3 t 2 + a4 t 3 y(t ) = a5 + a6 t + a7 t 2 + a8 t 3 z(t ) = a + a t + a t 2 + a t 3 12 9 10 11 2 x(t ) = a2 + 2a3t + 3a4 t y (t ) = a6 + 2a7 t + 3a8 t 2 z(t ) = a10 + 2a11t + 3a12 t 2
(6.13)
which shows that in a not-so-long interval, only 12 parameters are needed to represent the trajectory parameters. The truncation error of approximating the trajectory parameters with three-order polynomials can be given by the following theorem. Theorem 6.1 If f (t) ∈ C4 [T1, T2] and |f (4) (t)| ≤ δ, then there exists a three-order polynomial P (t) such that
© 2012 by Taylor & Francis Group, LLC
364
M e A suReM en t DAtA M o D eLIn G
π 4 (T2 − T1 )4 δ 30720
(6.14)
π 3 (T2 − T1 )3 f (t ) − P (t ) ≤ δ 1536
(6.15)
f (t ) − P (t ) ≤
Proof By the application of Theorem 2.11, and taking n = k = 4 we can get the above conclusions. In practice, the interval here generally refers to 5–10 s. If the time interval is too long, then the trajectory may be not well expressed by three-order polynomials; while for a short interval, the number of parameters cannot be reduced to a good extent. The simulation results below show the approximation precision of the trajectory by polynomials. Example 6.1 We take the theoretical trajectory parameters at the 290-th second as the initial values, and use trajectory equations to generate trajectory data of 10s. Use polynomials to fit into the data, and denote Px (t), Py (t), Pz (t) as the fitted polynomials. The simulation shows that max Px (t ) − x(t ) = 0.154 D − 3
290 ≤ t ≤ 300
max Py (t ) − y(t ) = 0.818D − 3
290 ≤ t ≤ 300
max Pz (t ) − z(t ) = 0.366D − 5
290 ≤ t ≤ 300
max Px (t ) − x(t ) = 0.209D − 4
290 ≤ t ≤ 300
max Py (t ) − y (t ) = 0.177 D − 3
290 ≤ t ≤ 300
max Pz (t ) − z(t ) = 0.266D − 3
290 ≤ t ≤ 300
The simulation above demonstrates that using polynomials to approximate trajectory parameters can give very satisfactory effects. The application of polynomials to represent trajectory has very important significances in data processing problems. It can be applied not only in the estimation of trajectory parameters and
© 2012 by Taylor & Francis Group, LLC
365
DAtA F R o M R A DA R M e A suReM en t s
the systematic errors, but also in the moving average method of polynomials and the detection of abnormal data.
6.2.3 Matching Principle
Let (x(t ), y(t ), z(t ), x(t ), y (t ), z(t ))τ be trajectory parameters, then dy(t ) dx(t ) dz(t ) = x(t ), = y (t ), = z(t ) dt dt dt
(6.16)
Equation 6.16 is actually the matching principle for trajectory parameters. However, this simple but practical principle is often ignored in some conventional data processing methods. Example 6.2 Generate the true trajectory parameters by simulation, and generate the true values of (S , P ,Q , S, P ,Q ) . Then add in the random errors and the systematic errors. Use the point-by-point method to solve and get the measurement data for the trajectory parameters with errors. Use six polynomials (U, V, W, u, v, w) to fit the data of (x (ti ), y(ti ), z (ti ), x (ti ), y (ti ), z(ti )) (i = 1, 2, …, m) respectively, where U, V, W are three-order polynomials and u, v, w are two-order polynomials. Obviously, there should be dU (t ) dt − u(t ) = 0 dV (t ) − v(t ) = 0 dt dW (t ) − w(t ) = 0 dt However, the fact is not like this. We take ti = 290 + 0.05i (i = 1, 2, …, 200), and use 10 s real data to test the equations above, and the results turn out to be that dU (t ) dt − u(t ) ≈ 0.0661 dV (t ) − v(t ) ≈ 0.0207 dt dW (t ) − w(t ) ≈ 0.0153 dt
© 2012 by Taylor & Francis Group, LLC
366
M e A suReM en t DAtA M o D eLIn G
Obviously, this group of data does not meet the matching principle. It is because the point-by-point approach does not apply the fact that the trajectory parameters are continuously differentiable functions of time t and the matching principle in Equation 6.16 is not used. The above analysis shows that trajectory parameters of different instants were treated as independent, and this has introduced too many parameters to be estimated, and cannot guarantee that the matching of the trajectory parameters estimation results. We need to pay attention that due to the existence of measurement errors, point-by-point (one-by-one sampling time) data processing method cannot make the processed data to meet the matching principle. And this is a very negative factor for improving the precision of the data processing. We can process a relatively long time of observational data at one time. This is convenient for using the matching principle of Equation 6.16 [13,15,19–22]. If the polynomials P (t) and u (t) are the fitting functions for x (t) and x(t ) respectively, then we have u(t ) = P (t ) . Equation 6.13 is actually the expression of the three-order polynomials used to fit the trajectory parameters. For spline fitting, there is similar conclusion, that is to say, if S(t) is the spline fitting for x(t), then S(t ) is the spline fitting for x(t ). This is also the case for the y and z directions. The two main advantages of applying the matching principles are the following: 1. There is no unmatched phenomenon for processed data. 2. The number of estimated parameters is reduced and the precision of data processing can be improved. We need to mention that the EMBET methods estimate the systematic error and the trajectory parameters point-by-point. Both single-point and multipoint method EMBET methods treat trajectory parameters at different time as independent parameters to be estimated, thus the processed data usually do not meet the matching relationship. 6.2.4 Spline Representation of Trajectory
Functions expressed by polynomials only have favorable features locally. We can only process 5–10 s of data while using polynomials to
© 2012 by Taylor & Francis Group, LLC
367
DAtA F R o M R A DA R M e A suReM en t s
approximate the trajectory. When we estimate the constant systematic error, 5–10 s of data are not enough to completely solve the illconditioned problem. Another drawback of the polynomial representation is that if the data are separately processed in various intervals by polynomials, then there is a connection problem for the data processing results at the junctions of different time intervals. Discontinuous and unsmooth phenomena would occur. During the boost phase and the reentry phase, we cannot apply the approach of describing the trajectory parameters with equations and initial values to reduce the estimated parameters due to the unclear knowledge of the forces on the spacecraft. The reasons mentioned above lead to the introduction of the polynomial spline to approximate the trajectory. Because of the characteristics of trajectories, we use three-order polynomial splines. In Section 6.2.2, we have mentioned that x(t), y(t), z(t) are all fourorder continuous differentiable functions, and x(4) (t), y(4) (t), z(4) (t) have small absolute values. Such kind of functions can be well depicted by spline functions. Theorem 6.2 Suppose f (t) ∈ C4 [T2, T N −1], S(t) is the interpolation spline function which satisfies S(T2 ) = f (T2 ), S(TN − 1 ) = f (TN − 1 ) S (T j ) = f (T j ), j = 2, 3,…, N − 1
(6.17)
5 4 4 f (t ) − S (t ) ≤ 384 f (t ) ∞ ⋅h f (t ) − S(t ) ≤ 1 f 4 (t ) ∞ ⋅h 3 24
(6.18)
Then
where h = (TN − 1 − T2 )/( N − 3), f ( 4 ) (t ) = ∞
© 2012 by Taylor & Francis Group, LLC
max
T2 ≤ t ≤ TN − 1
f ( 4 ) (t )
368
M e A suReM en t DAtA M o D eLIn G
Proof See Ref. 9. Theorem 6.2 tells us that as long as f 4 (t ) ∞ is not too large, then we can choose an adequate h such that S(t) and S(t ) can simultaneously approximate f (t) and f (t ), respectively. The measurement data processing is realized by parameter estimation. Therefore, we would like to write S(t) in a parametric form. Denote j = 1, 2,…, N
T j = T2 + ( j − 2)h ,
0, τ ≥ 2 3 (6.19) B( τ) = τ 2 − τ 2 + 2 3 , τ ≤ 1 3 2 − τ 6 + τ − 2 τ + 4 3 , 1 < τ < 2
( ) ( )
Theorem 6.3 There exists a unique set of coefficients (b1, b2, …, bN) such that S (t ) =
N
t − Tj h
∑ b B j =1
j
(6.20)
which satisfies the condition in Equation 6.17. Proof See Section 2.3. By the application of Theorems 6.2 and 6.3, we can convert the estimation problem of trajectory parameters into the spline coefficients estimation. The trajectory parameters can be written in the following form:
© 2012 by Taylor & Francis Group, LLC
DAtA F R o M R A DA R M e A suReM en t s
x ( t ) = y(t ) = z(t ) = x(t ) = y (t ) = z(t ) =
N
369
t − Tj h
∑ a B j =1
j
N
∑a j =1
j +N
t − Tj B h
N
∑a j =1
j + 2N
t − Tj B h
N
a j t − Tj h j =1
∑ h B
(6.21)
N
a j+N t − Tj B h h j =1
∑ N
a j +2N t − T j B h h j =1
∑
The spline function representation of trajectories can overcome the deficiency of the polynomial representation. The use of cubic spline representation for trajectory parameters has the characteristics below. 1. Between the adjacent two spline nodes, the cubic spline is a three-order polynomial. 2. At the spline nodes, the cubic spline is continuous and has continuous first and second-order derivatives. This is consistent with the engineering background. 3. By the representation, we can process together the measurement data of a long period of time interval, and establish the model for estimating the trajectory parameters and the systematic errors. Because of the long time interval and the existence of large amount of data, it is very helpful to alleviate the ill-conditioned problem of regression models. The theoretical analysis and simulation calculations in Section 6.6 will fully illustrate this point. 4. As long as reasonable spline nodes are selected, the precision of the trajectory approximation can be fully guaranteed. The denser the nodes are (with a smaller h), the
© 2012 by Taylor & Francis Group, LLC
370
M e A suReM en t DAtA M o D eLIn G
better the approximation effect is [23]. Of course, too close nodes will increase substantially the number of parameters to be estimated, so that the effect of parameter estimation will be inf luenced largely by the random errors. Simulation results show that the taking h = 5 (seconds) is fairly appropriate. The usage of spline representation of trajectories is mainly used in the boost phase data processing. Example 6.3 Use the trajectory equations in the free flight phase to simulate 30 s trajectory data, and then use cubic spline to fit the trajectory parameters. The fitting precision is shown in Table 6.3. From Table 6.3 we can see that it has very high precision by using spline to describe the trajectory in free flight phase. It also has relatively high precision to depict the boost phase. We also need to point out that besides the common spline representation, the application of spline wavelet [24], which combines the time domain and frequency domain, can also give very satisfactory results.
6.3 Trajectory Calculation
The MISTRAM system is the most basic one among the trajectory tracking systems. We categorize those measurement systems, which have the same positioning mathematical principles, also as MISTRAM systems. And the mathematical processing methods for Table 6.3 Error of spline Fitting the Trajectory (h = 5 s) TImE DIRECTIoN x(t) – Sx(t) y(t) – Sy(t) z(t) – Sz(t) x(t ) − S (t )
290.00
305.00
320.00
0.164D-5 0.870D-5 –0.689D-8
0.525D-4 0.234D-3 –0.579D-5
0.455D-4 0.194D-3 –0.585D-5
–0.975D-4
–0.121D-4
–0.202D-4
y(t ) − Sy (t )
0.484D-5
–0.572D-4
–0.168D-3
z(t ) − Sz (t )
0.932D-6
0.985D-7
–0.423D-6
x
© 2012 by Taylor & Francis Group, LLC
DAtA F R o M R A DA R M e A suReM en t s
3 71
measurement data from such systems are also the same. In this section, we first discuss the approach of calculating the trajectory parameters of a single moment by using the single-epoch MISTRAM system measurement element data. On this basis, the polynomial characteristics and the matching principle of trajectory parameters are applied to give the nonlinear method of calculating the trajectory. These methods have the characteristics of high precision and easy calculation, while no standard trajectory is needed, thus make them very practical. 6.3.1 Mathematical Method for MISTRAM System Trajectory Determination [14]
6.3.1.1 Problem Introduction MISTRAM systems are commonly used radar tracking measurement systems. The principle is to measure three different distances and their change rates data, and then to use mathematical methods to calculate the solution of trajectory parameters. The following two mathematical problems need to be addressed in the applications.
1. How can we solve the trajectory parameters from the observational data? 2. How does the observational error propagate? The function relation between the measurement data and the trajectory parameters is nonlinear. The traditional method is to find a nominal trajectory first, and do the first-order Taylor expansion near the nominal trajectory to get the approximate linear relationship between them. After that, by solving linear equations the trajectory parameters are determined. This method has two significant disadvantages: One is that a fairly high-precision nominal trajectory is required; the other is that truncation error is generated during the linearization process. Therefore, more effective ways are demanded to solve trajectory parameters determination problem. The following presents a novel trajectory determination method, which does not require nominal trajectory. The approach just needs simple computation, and the precision is high with no truncation error. The error propagation relationship is also clear.
© 2012 by Taylor & Francis Group, LLC
372
M e A suReM en t DAtA M o D eLIn G
6.3.1.2 Mathematical Model for the MISTRAM System Measurement Data Denote the measurement elements of a MISTRAM system as
S (t ), P (t ),Q(t ), S(t ), P (t ),Q (t ). Take
b1 (t ) = S (t ) = RT (t ) + RR (t ) b2 (t ) = S (t ) − P (t ) = RT (t ) + RP (t ) b3 (t ) = S (t ) − Q(t ) = RT (t ) + RQ (t ) b1 (t ) = S(t ) = RT (t ) + R R (t ) b2 (t ) = S(t ) − P (t ) = RT (t ) + R P (t ) b3 (t ) = S(t ) − Q (t ) = RT (t ) + RQ (t ) With the notations above, the mathematical model of the measurement data b1 (t ), b2 (t ), b3 (t ), b1 (t ), b2 (t ), b3 (t ) obtained by the MISTRAM system at time t is bi (t ) = bi (t ) + εi (t ), i = 1, 2, 3 bi (t ) = bi (t ) + δ i (t ), where ε1, ε2, ε3, δ1, δ2, δ3 is the measurement error. 6.3.1.3 Mathematical Method for Trajectory Determination
After getting
the measurement data b1 (t ), b2 (t ),…, b3 (t ) at time t, we need to use these data to solve the trajectory parameters of the spacecraft. For convenience, the following notations are introduced: x − x T R A = xT − x P x − x Q T
yT − y R yT − y P
yT − yQ
zT − zR zT − zP zT − zQ
x 2 + y 2 + z2 − x 2 − y 2 − z2 T T T R R R 2 2 2 2 2 2 C = xT + yT + zT − x P − y P − zP 2 2 2 2 2 2 xT + yT + zT − xQ − yQ − zQ
© 2012 by Taylor & Francis Group, LLC
DAtA F R o M R A DA R M e A suReM en t s
x − xT x − xR + R RR T x − x x − xP T B = + RP RT x − xT x − xQ + RQ RT
y − yT y − yR + RT RR y − yP y − yT + RT RP y − yQ y − yT + RT RQ
3 73
z − zT z − zR + RT RR z − zT z − zP + RT RP z − zT z − zQ + RT RQ
Owing to the stations distribution features, both A and B are nonsingular matrices. Define the vector value functions α = (α1, α2, α3)τ , β = (β1, β2, β3)τ of the vector b = (b1, b2, b3)τ as follows: α = A −1b β=
1 −1 1 A C + A −1 (b12 , b22 , b32 )τ − (xT , yT , zT )τ 2 2
(6.22) (6.23)
With these notations, we have Theorem 6.4 If the following equations b = R + R T R 1 b2 = RT + R p b3 = RT + RQ
(6.24)
have solutions (x, y, z)τ , then the solution must satisfy the following equation: 2 A (x , y , z )τ = C + (b12 , b22 , b32 )τ − 2RT ⋅ (b1 , b2 , b3 )τ Proof We leave the proof as an exercise.
© 2012 by Taylor & Francis Group, LLC
(6.25)
3 74
M e A suReM en t DAtA M o D eLIn G
Note that RT2 = (x , y , z )τ − (xT , yT , zT )τ
2
1 −1 1 A C + A −1 (b12 , b22 , b32 )τ − (xT , yT , zT )τ − RT A −1 (b1 , b2 , b3 )τ 2 2 2 = β − RT α
2
=
= β
2
2
− 2RT β τ α + α RT2 ,
that is, (α
2
− 1)RT2 − 2β τ αRT + β 2
RT =
2
2
βτ α ± (βτ α)2 + β − α ⋅ β α
2
(6.26)
=0 2
−1
(6.27)
Theorem 6.5 The trajectory parameters of the MISTRAM tracked spacecraft can be given by the following two equations: b 2 x b 1 1 y = 1 A −1C + 1 A −1 b 2 − R A −1 b T 2 2 2 2 2 b3 z b3
(6.28)
b x 1 y = B −1 b 2 b3 z
(6.29)
where RT is given by Equation 6.27. Proof By Equation 6.25 and the nonsingularity of the matrix A we can get Equation 6.28. Since (b1 , b2 , b3 )τ = B ⋅ (x , y , z )τ , and by the nonsingularity of the matrix B, we can get Equation 6.29.
© 2012 by Taylor & Francis Group, LLC
375
DAtA F R o M R A DA R M e A suReM en t s
Remark 6.1 By the definition of b and the engineering background, we can know that the solution to Equation 6.24 uniquely exists, thus there should be no less than 1 of the positive number RT given by Equation 6.27. If there is just one RT given by Equation 6.27, then substituting RT into Equation 6.28 will yield the solution needed; or if there are two RT given by Equation 6.27, then substitute them into Equation 6.28 will yield two sets of solutions (x, y, z)τ . Substitute them into Equation 6.24 and by the uniqueness of the solution and the engineering background, we can determine the solution we need. Thus, we solve the τ B−1 and furthermore (x , y , z ) . 6.3.1.4 Error Propagation Relationship In practical applications, true
values of (b1 , b2 , b3 , b1 , b2 , b3 )τ are unknown. Instead we know the observed values of (b1 , b2 ,…, b3 )τ with errors. Note that the observational errors are always small in quantity, so when we substitute (b1 , b2 ,…, b3 )τ with (b1 , b2 ,…, b3 )τ , the existence of the solution to Equations 6.24 and 6.26 will not be affected. That ensures that we can use the observational data with errors to solve the “errorcontaminated” trajectory parameters point by point. In the following part, we probe into the propagation discipline of the observational error. Suppose the measurement error (ε1, ε2, ε3)τ of (b1, b2, b3)τ induces the error (Δx, Δy,Δz)τ on (x, y, z)τ, then we have the following conclusion. Theorem 6.6 With the notations above, the error propagation from (b1, b2, …, b3)τ to (x, y, z)τ obeys the following discipline: ( ∆x , ∆y , ∆z )τ = B −1 ⋅ (ε1 , ε 2 , ε3 )τ
(6.30)
Proof Suppose R j =
(x + ∆x − x j )2 + ( y + ∆y − y j )2 + (z + ∆z − z j )2
from Equation 6.24, we have
© 2012 by Taylor & Francis Group, LLC
376
M e A suReM en t DAtA M o D eLIn G
b + ε RT + 1 1 b + ε = R + 2 T 2 b3 + ε 3 RT +
R P RQ R R
(6.31)
Denote H(Rj) as the Hesse matrix of Rj. For spacecrafts, we note that Rj = ( j = P, Q, R, T) is relatively large, then we get H ( R j ) ≈ 0, j = P ,Q , R,T Thus by Equations 6.24, 6.22 and Taylor expansion formulas we get (ε1 , ε 2 , ε3 )τ ≈ B ⋅ ( ∆x , ∆y , ∆z )τ from which the nonsingularity of the matrix B, we can get Equation 6.30. Meanwhile, we can also get the relationship between the observational errors (ε1, ε2, ε3, δ1, δ2, δ3)τ of (b1 , b2 , b3 , b1 , b2 , b3 )τ and the error τ ( ∆x , ∆y , ∆z )τ of (x , y , z ) . We also leave it as an exercise. 6.3.2 Nonlinear Regression Analysis Method for Trajectory Determination
A set of MISTRAM radar measurement system tracks a spacecraft. At each instant t, six observational data are obtained as follows: 6.3.2.1 Introduction
u1 (t ) = S (t ) + ε1 (t ),
u4 (t ) = S(t ) + ε 4 (t )
u1 (t ) = P (t ) + ε 2 (t ),
u5 (t ) = P (t ) + ε 5 (t )
u3 (t ) = Q(t ) + ε 3 (t ),
u6 (t ) = Q (t ) + ε 6 (t )
In Section 6.3.1, we discussed the point-by-point method for solving the trajectory parameters. The method has some drawbacks. First, since B −1 is relatively big, the pointwise method cannot avoid the enlargement effect of the observational noise, thus the solution has large errors; second, the parameters to be estimated are numerous. Take 10 s as an example, if there are 20 data in every second,
© 2012 by Taylor & Francis Group, LLC
DAtA F R o M R A DA R M e A suReM en t s
377
then we need to solve 6 × 200 trajectory parameters, and the processing has to be performed in two steps, which need a rather long time; third, the precision of this method relies on the precision of the provided nominal trajectory; last but not the least, the data processed by this method cannot ensure to satisfy the matching principle. In the following, we fuse the measurement data of a period of time to process together. We employ the polynomial characteristics of trajectory parameters and start directly from the measurement data to build a nonlinear regression analysis model for the trajectory parameter estimation. Then we use the nonlinear regression analysis method to estimate the trajectory parameters. We combine the two steps in the traditional approach into one, reduce the number of parameters to be estimated, eliminate the amplification characteristics of measurement noise propagation, and get rid of the dependence on the nominal trajectory. The simulation results show that this method provides apparent precision improvement than the point-by-point solution method.
Suppose F ( X (t )) = (S (t ), P (t ),Q(t ), S(t ) X (t )) = (S (t ), P (t ),Q(t ), S(t ), P (t ),Q (t )) is the measurement elements vector at time t, U (t) = (u1 (t), u 2 (t), …, u 6 (t)) τ is the observational data of F (X(t)). Note that F (X(t)) is the nonlinear vector function of X (t), consequently, U(t) and X (t) satisfy the following nonlinear model: 6.3.2.2 Mathematical Model Establishment τ
U (t ) = F ( X (t )) + ε(t )
(6.32)
where ε(t) = (ε1(t), ε2(t), …, ε6(t))τ is the observational error which obeys normal distribution. Suppose there are observational data {U(t), 1 ≤ t ≤ T} of totally T epochs, then expand Equation 6.32 with time will get the nonlinear regression model as follows: U = F ( X ) + ε, ε ∼ N (0, K ) where U = (U (1)τ ,U ( 2)τ ,…,U (T )τ )τ
© 2012 by Taylor & Francis Group, LLC
(6.33)
3 78
M e A suReM en t DAtA M o D eLIn G
F ( X ) = ( F ( X (1))τ , F ( X ( 2))τ ,…, F ( X (T ))τ )τ ε = ( ε(1)τ , ε( 2)τ ,…, ε(T )τ )τ are all 6T-dimensional vectors. Assume that the measurement elements here are all independent from each other, and K only involves the temporal correlation of the measurement data. Refer to Chapter 4 for the temporal correlation of K. After obtaining K, do the normalization processing to Equation 6.33 as −1
−1
−1
−1
K 2U = K 2 F ( X ) + K 2 ε, K 2 ε ∼ N (0, I )
(6.34)
There are 6T parameters to be estimated in model 6.34 as X = (X (1) , X (2)τ, …, X (T)τ)τ , which are very hard to solve directly. With regard to this, we will apply the polynomial characteristics of the trajectory parameters. Except for the inter-stage separation phase, the trajectory parameters for a ballistic spacecraft have very small fourth-order derivative. From Theorem 6.4, the truncation error can be neglected in a certain period of time. The length of the time interval is related to the magnitude of the above fourth-order derivative. And based on the engineering background and the data processing experiences, it is usually taken as 5–10 s. Substituting Equations 6.22 and 6.13 into Equation 6.32 yields τ
U (t ) = F (t , β) + ε(t )
(6.35)
where β = (α 1 , α 2 ,…, α 12 )τ . −
1
K 2U = K
−
1 2
F (β) + K
−
1 2
ε, K
−
1 2
ε ∼ N (0, I )
(6.36)
where F (β) = ( F (1, β)τ , F ( 2, β)τ ,…, F (T , β)τ )τ Compare models 6.36 and 6.34, we can see that the number of the estimated parameters dramatically reduces from 6T to 12.
© 2012 by Taylor & Francis Group, LLC
DAtA F R o M R A DA R M e A suReM en t s
3 79
To summarize the discussions above, we have established the nonlinear regression model equation 6.36 to describe the relationship between the measurement data U and the trajectory parameters polynomial coefficients β. For convenience, we assume that K = I. Now the parameter estimation of the nonlinear regression model (6.36) can be converted into the optimization problem as 6.3.2.3 Algorithm and Error Analysis
U − F (β)
2
= min
(6.37)
Q(β) =∆ (U – F(β))τ (U – F(β)) is called the cost function. The solution β* to Equation 6.37 is called optimal solution. To identify particularly, in the following part, we take β(k) as the kth iterative estimation of β instead of the component of β. Note that at a certain initial point β(0), Q(β) has the gradient ∇Q(β(0)) = − 2V (β(0))τ (U – F (β(0))), where V (β(0)) = ∇F(β(0)) is the gradient matrix of F (β) at β(0). From the theory of nonlinear programming [5], the optimal solution to Equation 6.37 can be calculated by the following iteration: β( k + 1) = β( k ) + α kV (β( k ) )τ (U − F (β( k ) ))
(6.38)
Different choices of αk will lead to different algorithms in nonlinear programming theory. Now we pick the Gauss–Newton method, that is to take αk = (V(β(k))τ V(β(k)))−1, hence (V(β(k))τ V(β(k)))−1 V(β(k))τ (Y – F(β(k))) is the Gauss–Newton direction of Q(β) at β(k). Now we have the iterative formula of β(k) as β( k + 1) = β( k ) + (V (β( k ) )τV (β( k ) )−1V (β( k ) )τ (U − F (β( k ) )) (6.39)
In Equation 6.37, if F (β) is a linear function of β, then the optimal solution to Equation 6.37 is the least-squares estimation of the usual linear regression model. For Equation 6.38, starting from any initial value, it will get the uniformly minimum variance unbiased estimate of β by just one iteration. After obtaining the estimate β* of the parameter β, substitute it into Equation 6.13 we will get the solution X * (t ) = (x(t )* , y(t )* , z( X * (t ) = (x(t )* , y(t )* , z(t )* , x(t )* , y (t )* , z(t )* )τ of the trajectory parameters.
© 2012 by Taylor & Francis Group, LLC
380
M e A suReM en t DAtA M o D eLIn G
Based on the discussions above, we now acquire the nonlinear iteration algorithm for solving the trajectory parameters, abbreviated as Algorithm 1. Algorithm 1 The improved Gauss–Newton method [8] Step 1: Normalize model 6.34 according to Equation 6.36; Step 2: Give the initial value β(0), which is obtained by fitting the “error-contaminated” trajectory parameters solved by the measurement elements with errors; give the constants λ > 0, δ > 0; Step 3: Calculate β(1) = β(0) + λ (V(β(0)) τ V(β(0))) −1 V(β(0)) τ (U – F(β (0))); Step 4: Calculate Q(β(0)), Q(β(1)). If Q(β(1)) ≤ Q(β(0)) ≤ δ, then turn to Step 5; if Q(β(1)) > Q(β(0)), then let λ = λ / 2 and turn to Step 3; Step 5: For a given δ, if Q(β(0)) – Q(β(1)) ≤ δ, then β* = β(1); otherwise, let β(0) = β(1) and reset λ into the value given in Step 2, return to Step 3; Step 6: Substitute β* into Equation 6.13 as polynomial coefficients to calculate X *(t). In Algorithm 1, δ is called tolerance error, Step 5 is the stop criterion and λ is called convergence factor. Step 4 ensures that each iteration makes the cost function value to decrease. For the proof of the convergence of the algorithm see [11]. From the nonlinear regression theory (see Chapter 3 or [8]), we know that the nonlinear model (6.36) has the following approximate error formula: ∆β ≈ (V (β)τ (β))−1V (β)τ ε
(6.40)
Hence from Equation 6.13, we have the approximate deviation and variance formulas for X *(t)
© 2012 by Taylor & Francis Group, LLC
E( X * (t ) − X (t )) ≈ 0
(6.41)
VarX * (t ) ≈ W (t )τ (V (β)τV (β))−1W (t )
(6.42)
3 81
DAtA F R o M R A DA R M e A suReM en t s
where 1 0 0 W (t )= 1 0 0
t
t2
t3
0
0
0
0
0
0
2t 0
0
0
0
0
0
0
0
1
t
t
2
3
0
0
0
0
0
0
0
0
1
3t 2
3t 2
t
t2
0
0
0
0
0
0
0
0
0
0
1
2t
0
0
0
0
0
0
0
0
0
0
1
2t
0
t
0 0 t3 0 0 3t 2
We use the third-order polynomials to approximate the trajectory parameters in all three directions. In practical problems, sometimes it will give better approximation effect while using the second-order polynomials in some direction. To this end, the AIC criterion is given below to further adjust the polynomial order (see Chapter 4 or [7]), as shown in Table 6.4. AIC( px* , p *y , pz* ) = min AIC( px , p y , pz ) where px, py, pz represent the polynomial order in the x, y, z directions, respectively. AIC( px , p y , pz ) = 1og Q( px , p y , pz ; β * ) −
2( p1 + p2 + p3 ) T
Q( px , p y , pz ; β * ) = U − F (β * px , p y , pz )
2
β*px, py, pz is the p1 + p2 + p3-dimensional polynomial coefficients vector approximated by px, py, pz-order polynomials in the x, y, z directions and estimated by Algorithm 1. The AIC criterion takes a comprehensive consideration of both the magnitude of the cost function value and the number of the polynomial coefficients. Fill the Table 6.4 Polynomial order Determined by AIC Criterion px py pz
3 3 3
© 2012 by Taylor & Francis Group, LLC
3 3 2
3 2 3
3 2 2
2 2 2
2 2 3
2 3 2
2 3 3
382
M e A suReM en t DAtA M o D eLIn G
values of AIC(px, py, pz) into Table 6.4 and pick the px* , p *y , pz* making AIC(px, py, pz) minimal and take them as the finally determined polynomial orders. Assume that the MISTRAM system is tracking a spacecraft. The observational noises of the six measurement elements obtained are white noises, with the standard root variance [2,3] are 0.12 m, 0.009 m, 0.009 m, 0.009 m/s, 0.0006 m/s, and 0.0006 m/s, respectively. Suppose that the MISTRAM system has the sampling rate of 20 points/s. Take a period of 10 s with 200 points of observational data. In Algorithm 1, δ = 10− 2, λ = 1. The nominal trajectory is provided by adding 100 m to the simulated true trajectory. To specify, the Error I in Table 6.6 is the error in the nominal trajectory used in Table 6.5. The initial value β(0) is obtained by fitting the nominal trajectory with three-order polynomial. The calculation results are listed in Tables 6.5 and 6.6, where 6.3.3.4 Simulation Calculation Results
1 σ = 188 2 x
200
∑ (x (i ) − x(i ))
1 E ∆x = 200
*
2
i =1
200
∑ (x (i ) − x(i )) *
i =1
and the others are similar. Table 6.5
Comparison between the method in This section and the Traditional method σy
σz
σ x
σ y
0.15D-2
0.10D-2
0.14D-2
0.25D-3
0.18D-3
0.28D-1
0.31D-1
0.14D-1
0.21D-2
0.17D-2
0.20D-1
0.23D-1
0.99D-2
0.10D-2
0.94D-3
EΔx
EΔy
EΔz
E ∆x
E ∆y
E ∆z
0.31D-1
0.10D-2
0.84D-3
0.98D-4
0.75D-5
0.43D-5
0.37D-1
0.14D-2
0.86D-3
0.98D-4
0.73D-5
0.46D-5
0.32D-1
0.11D-2
0.83D-3
0.96D-4
0.73D-5
0.44D-5
σx Algorithm I 0.35D-1 Point-wise method 0.66D + 0 (before smoothing) Point-wise method 0.48D + 0 (after smoothing) Arithmetic I Point-wise method (before smoothing) Point-wise method (after smoothing)
© 2012 by Taylor & Francis Group, LLC
σ z
383
DAtA F R o M R A DA R M e A suReM en t s
Table 6.6 The Influence of Nominal Trajectory Precision on the Algorithm Results
Value of cost function Iteration times
ERRoR I
ERRoR II (10 TImEs oF ERRoR I)
ERRoR III (50 TImEs oF ERRoR I)
41.37422
41.3954
41.48346
3
5
8
The results in Table 6.5 show that compared to the traditional point-by-point method, the method above has almost the same mean values of the trajectory parameters calculation error. The mean values are close to zero, which is consistent with Equation 6.41. However, the variances of the error are obviously smaller. Therefore, this method has significantly improved the precision of trajectory parameters determination. It can also be seen from Table 6.6 that this method has greatly reduced the dependence on the nominal trajectory precision. Actually the determination precision is not really affected by the nominal trajectory; here we refer the dependence just in terms of the initial values. And the number of iterations also decreases. The above analysis and calculations show that this is a practical and effective method for solving high-precision trajectory parameters. It should be noted that the method is also applicable for other measurement systems and multistation data processing. 6.4 Composite Model of Systematic Error and Trajectory Parameters 6.4.1 Measurement Data Models
The space radar measurement data are mainly composed of three parts: true signal, systematic error, and random error. In Chapter 4, we discussed the statistical features of random errors, while in this chapter we will focus on the estimation methods of systematic errors and true signals. The mathematical models of the measurement data are very essential. For convenience, we take the MISTRAM system as an example. At time t, the MISTRAM system has six measurement data, which can be expressed as
© 2012 by Taylor & Francis Group, LLC
384
M e A suReM en t DAtA M o D eLIn G
S(t ) = S (t ) + CS (t ) + εS (t ) P (t ) = P (t ) + C P (t ) + ε P (t ) Q (t ) = Q(t ) + C (t ) + ε (t ) Q Q S (t ) = S (t ) + DS (t ) + δS (t ) P (t ) = P (t ) + DP (t ) + δ P (t ) Q (t ) = Q (t ) + D (t ) + δ (t ) Q Q
(6.43)
where εS, εP, εQ, δS, δP, δQ are the measurement random errors, whose statistical features can be studied according to Chapter 4; CS, CP, CQ, DS, DP, DQ are systematic errors; S , P ,Q , S, P ,Q are the true signals. If the measured values on the left-hand side of Equation 6.43 do not have random errors or systematic errors, then we can accurately solve for the true values of the trajectory parameters inversely by the six measurement elements. Unfortunately, the fact is these two measurement errors do exist, and the systematic error is fairly large. Thus, by the way in Equation 6.43 we can only get the estimated values of the trajectory parameters. And the estimated values calculated by this method have some differences from the true values of the trajectory. The systematic error is much greater than the random error, and it cannot be compensated, which makes it a main error source affecting the data processing precision. To estimate and correct the system error is a very important work. What needs special attention is that the MISTRAM system can get distance sum (or difference) and its change rate data. From the six formulas in Equation 6.43 we can see that there are matching relations between the true signals in the first and the fourth, the second and the fifth, the third and the sixth expressions. Therefore, the matching principle can be applied. This is very useful for data processing. 6.4.2 Matched Systematic Error and Unmatched Systematic Error
In the previous part, we talked about the matching relationship between the true signals in the measurement data of distance sum and its change rate. What is the case for systematic errors?
© 2012 by Taylor & Francis Group, LLC
DAtA F R o M R A DA R M e A suReM en t s
385
We consider the data processing problem at times t 1, t 2, …, tm in the following part. Decompose the systematic errors of S, P, Q in the three directions as follows, respectively: CS (t ) = CSD (t ) + CSC (t ) CSD (t ) = DS (t ),CSC (t1 ) = 0
(6.44)
C P (t ) = C PD (t ) + C PC (t ) C PD (t ) = DP (t ),C PC (t1 ) = 0
(6.45)
CQ (t ) = CQD (t ) + CQC (t ) CQD (t ) = DQ (t ),CQC (t1 ) = 0
(6.46)
S (t ) = S (t ) + C (t ) SD * P* (t ) = P (t ) + C PD (t ) Q* (t ) = Q(t ) + CQD (t )
(6.47)
Denote
Then model 6.43 can be rewritten as S(t ) = S* (t ) + CSC (t ) + εS (t ) P (t ) = P* (t ) + C PC (t ) + ε P (t ) Q (t ) = Q (t ) + C (t ) + ε (t ) QC Q * S (t ) = S* (t ) + δ s (t ) P (t ) = P* (t ) + δ P (t ) Q (t ) = Q (t ) + δ (t ) * Q
© 2012 by Taylor & Francis Group, LLC
(6.48)
386
M e A suReM en t DAtA M o D eLIn G
where CSC (t1 ) = C PC (t1 ) = CQC (t1 ) = 0
(6.49)
Definition 6.1 We call CSD (t), CPD (t), CQC (t) as matched systematic errors, and CSC (t), CPC (t), CQC (t) as unmatched systematic errors. The causes for unmatched systematic errors are complex [13,16,20– 22,25]. It is mainly because of the different mechanics and principles for measuring the distance and its change rate. Real data analysis and calculations show that the unmatched systematic errors do exist. The unmatched systematic errors are relatively easier to be identified and estimated than the matching systematic errors. The matched systematic errors we need to worry about are two types: one is the constant systematic error; the other is the systematic error, which is proportional to the time t. That is to say, C (t ) = a + b t , D (t ) = C (t ) = b S S S SD S SD C PD (t ) = aP + bP t , DP (t ) = C PD (t ) = bP CQD (t ) = aQ + bQ t , DQ (t ) = CQD (t ) = bQ
(6.50)
The reason for considering these two types matched systematic errors mainly lies in: 1. These two types of systematic errors make the main part of the matching systematic errors. 2. The estimation method for other matching systematic errors is the same as the method for these two types. For a MISTRAM system, there are only six parameters to be estimated in the matching systematic error model in Equation 6.50. For multistation measurement data, which have more measurement elements such as the distance sum and its change rate, the systematic error parameters to be estimated will generally increase in the corresponding number.
© 2012 by Taylor & Francis Group, LLC
DAtA F R o M R A DA R M e A suReM en t s
387
6.4.3 Summary
Section 6.2 discusses the use of a small number of parameters to be estimated (the spline coefficients, polynomial coefficients, the initial values of the trajectory equations, etc.) to represent the trajectory precisely. After representing the trajectory X(t) of a period of time with a few parameters, the CW radar measurement elements such as the distance sum, distance difference, and their change rates can also be expressed by the corresponding parameters. Similarly, the systematic errors to be estimated can also be represented by a small number of parameters according to the model in Section 6.1.4. In short, through the parametric representation of the trajectory parameters and the systematic errors, the true signals and the systematic errors to be estimated in the measurement data model can be converted into a parameter estimation problem. There are several things we should keep in mind when we express the systematic errors and the trajectory parameters with a few parameters. 1. We should ensure that the representation accuracy, that is, to ensure that the modeling error caused by the representation can be neglected. 2. We should ensure that after the parameterization, the composite model which includes both the parameters of systematic errors and the estimated coefficients of the trajectory parameters, can separate the true signals and the systematic errors easily. 3. We should ensure that after the parameterization, the modeling residual is small, the fitting effect is good and the estimation precision is high. 6.5 Time Alignment of CW Radar Multistation Tracking Data 6.5.1 Introduction
The basic working principle of continuous wave radar (CW radar) is that carrier frequency signals with a fixed frequency are transmitted from the transmitter, and the velocity is measured by the Doppler
© 2012 by Taylor & Francis Group, LLC
388
M e A suReM en t DAtA M o D eLIn G
effect while the position is determined using time delay. This type of radars can measure velocity directly with high precision, while does not require many environmental conditions. MISTRAM system is a typical CW radar system. It consists of one transmitter and three receivers. In a practical CW radar measurement system, multiple receivers are often introduced in order to improve the precision of the measurement data. Such a system is called a multistation system. The benefits of a multistation system are reflected in improving the trajectory determination precision and detecting the abnormal data. There is a common time alignment problem for both MISTRAM systems and other multistation systems [19,20]. To estimate and revise the time misalignment quantity error is a crucial issue for improving the precision of trajectory parameters estimation. The following part of this section is organized as follows. First, the measurement principle of CW radars is introduced and analyzed. It will come to the conclusion that only the time misalignment of the receiver stations can have an impact on the observational data. On this basis, a nonlinear regression analysis model is established to estimate trajectory parameters and the time misalignment quantity between each receiver station. To form this model, the features are applied that the trajectory parameters can be expressed by a polynomial. And the matching relation between trajectory parameters is also used. The model is erected for the measurement data of a period of time such as 10 s or so. After that, modern nonlinear regression analysis is used to give the calculation method of parameter estimation and the error with it. Theoretical analysis and simulations show that the trajectory parameters and the time misalignment quantity between stations have high estimation precisions by this method. 6.5.2 Velocity Measurement Mechanism of CW Radars
In the following, we will consider the issue in the launching reference frame. The observation point is the responder onboard the missile. Denote the spacecraft’s trajectory parameters are (x(t ), y(t ), z(t ), x(t ), y (t ), (x(t ), y(t ), z(t ), x(t ), y (t ), z(t )) at the time epoch t, and the site position for the
© 2012 by Taylor & Francis Group, LLC
DAtA F R o M R A DA R M e A suReM en t s
389
transmitting station and receiving stations are (xT , yT , zT) and (xR , yR , zR), respectively. Doppler effect shows that the measured oscillation frequency at the observation point will change when the radio wave propagation distance between the observation point and the oscillation source changes, and the change rate of the frequency is proportional to that of the distance. In this situation, the oscillator is the ground transmitter, which sends radio waves with a fixed frequency; and the observation point is the responder onboard the missile. When the responder receives the radio waves emitted from the ground transmitter, its frequency will change. The change in frequency is called Doppler frequency shift. Denote s(t ) = RT (t ) + RR (t ) and f 1 is the fixed frequency of the transmitting station emission, f 2 (t) is the frequency received by the on-board responder, then f 2 (t ) = f 1 1 − RT (t )c −1 where c is the speed of light. The responder converts the received frequency f 2(t) into f 3(t). f 3(t) = nf 2(t), where n is a known positive constant called the conversion ration. The frequency f 3(t) is transmitted back to the ground by the responder, and the frequency received by the ground station R is f 4 (t ) = f 3 (t ) ⋅ 1 − R R (t )c −1
(6.51)
Taking into account that there is some distance between the spacecraft and the measurement station, time delay is produced from the T station to the responder with frequency f 1 and from the responder to the R station with frequency f 3. Therefore, f 3 at the time t will reach the R station at time t + (R R(t)) / c. Thus Equation 6.51 should be modified into f 4 (t +
© 2012 by Taylor & Francis Group, LLC
RR (t ) ) = f 3 (t ) ⋅ 1 − R R (t )c −1 c
(6.52)
390
M e A suReM en t DAtA M o D eLIn G
Based on the discussion above, we have 2 1 R (t ) nf 1 − f 4 t + R = nf 1 c −1 s(t ) − nf 1c −2 s(t ) 4 c
2 1 + nf 1c −2 R R (t ) − RT (t ) 4
(6.53)
Usually the T station is not far away from the R station. When the target is far from the measurement station, R R (t ) and RT (t ) will be very close. Since c 2 is a rather large number, thus 2 1 nf 1c −2 R R (t ) − RT (t ) ≈ 0 4
nf 1 − f 4 (t + RR (t )c −1 ) = nf 1c −1 s(t ) −
2 1 nf 1c −2 s(t ) 4
(6.54)
Because ( s(t )c −1 )2 and its derivatives are small, there is f4 (t ) ≈ −nf 1c −1s (t )
(6.55)
Hence, from the four expressions above, we get nf 1 − f 4 (t ) = nf 1c −1 s(t ) −
2 1 1 nf 1c −2 s(t ) + nf 1c −2 s(t )s (t ) 4 2
(6.56)
Among the three terms on the right-hand side of Equation 6.56, the latter two terms are relatively smaller than the first term. Consequently, nf 1 − f 4 (t ) ≈ nf 1c −1 s(t ) . Thus, we can obtain the rough estimation for s(t ) as s* (t ) = (nf 1 )−1 c nf 1 − f 4 (t )
(6.57)
Furthermore, from Equation 6.56 we can have the precise estimation of s(t ) is s(t ) = s* (t ) +
2 1 1 s* (t ) − s* (t )s* (t ) 4c 2c
(6.58)
Actually, real calculation result shows that the values of the latter two terms on the right-hand side of Equation 6.58 are both less than
© 2012 by Taylor & Francis Group, LLC
DAtA F R o M R A DA R M e A suReM en t s
3 91
0.05, which makes the most functional part as the first term. Here, s* (t ), s* (t ) are the numerical integration and numerical differentiation of s* (t ), respectively. Similarly, s(t ), s (t ) are also the numerical integration and numerical differentiation of s(t ). Now we come to the discussion of the time alignment issue. If the station R takes time t + τ as time t by mistake, then it will deal with f4 (t + τ) as f4 (t), and the amount derived from Equation 6.58 is actually s(t + τ) . Since τ is small, then s(t + τ) ≈ s(t ) + τs (t ) s(t + τ) ≈ s(t ) + τs(t ) Based on the discussion above, since the frequency of the transmitting station is constant, the observational data precision will only be affected by the time misalignment between receiving stations, while has nothing to do with the misalignment between the time of transmitting station and the standard time. 6.5.3 Mathematical Model of the Multistation Measurement Data
Now we are looking at how to establish a mathematical model for multistation measurement data, which are measured by the CW radar as the distance sums and their change rates. Only receiving stations are considered. Denote Ri is the cause for the inter-station time misalignment, Tij ( j = 1, 2, …, li) are the transmitting stations corresponding to the Ri station. Then the CW radar measures sij (t) ( j = 1, 2, …, li ; i = 1, 2, …, r), and sij (t )( j = 1, 2,…, l i ; i = 1, 2,… n). Here we have n ≥ r, sij (t) stands for the sum of the distance from Tij station to the target and the one from Ri station to the target at time t. Based on the discussion in Section 6.4.3, we can assume that a polynomial can be used to precisely approximate the trajectory parameters in a certain period of time like less than 10 s (see Equation 6.13). Thus both the distance sum S(t) and its change rate S(t ) are functions of polynomial coefficients (a 1, a 2 , …, a 12) and time t.
© 2012 by Taylor & Francis Group, LLC
392
M e A suReM en t DAtA M o D eLIn G
Denote S(t) and S(t ) are both mli -dimensional vector, si 1 (t1 ) si 1 (t1 ) si 1 (tm ) si 1 (tm ) Si (t ) = , Si (t ) = sili (t1 ) sili (t1 ) sil (tm ) i sili (tm ) and take Yi (t) and Zi (t) as the observational data S(t) and S(t ) of the station Ri, respectively. Yi (t ) = Si (t + τi ) + ei , Z (t ) = S (t + τ ) + g , j j j j
i = 1, 2,…, r
j = 1, 2,…, n
(6.59)
where ei and g j are observational errors (vectors). Note that τi is the time misalignment quantity, and τi is small, Si (t + τi ) = Si (t ) + τi Si (t ) Si (t + τi ) = Si (t ) + τi Si (t )
(6.60)
For convenience, we might assume that ei ∼ N (0, σ i2 I mli ), g j ∼ N (0, θ 2j I ml ), j COV (ei , g j ) = 0
COV (e k , el ) = 0,( k ≠ l ),
COV ( g k , g l ) = 0,( k ≠ l ),
(6.61)
Since the site positions of the Ri station and the Tij stations ( j = 1, 2, …, li, i = 1, 2, …,n) are known, and the missile trajectory can be expressed in polynomial, then the only things we need to estimate is the polynomial coefficients of the trajectory parameters and the time misalignment quantity τi (i = 1, 2, …,n) of the Ri station. Denote
© 2012 by Taylor & Francis Group, LLC
DAtA F R o M R A DA R M e A suReM en t s
393
σ 1−1Y1 (t ) σ 1−1e1 σ −1Y (t ) σ −1e r r r r Y = −1 , e = −1 , θ1 Z1 ( t ) θ1 g 1 −1 −1 θn Zn (t ) θn g n a1 σ1-1 [S1 (t ) + τ1S1 (t )] a12 σ -1 [S (t ) + τ S (t )] r r 1 r β = , F (β) = -1 τ1 θ1 [S1 (t ) + τ1S1 (t )] -1 θn [Sn (t ) + τnSn (t )] τn then by Equations 6.59 through Equation 6.61 and the notations above, we can get the nonlinear regression analysis model of the estimated parameters as Y = F (β) + e , e ∼ N (0, I )
(6.62)
Model 6.62 is a typical nonlinear regression analysis model. As a consequence, we have converted the problem of estimating the time misalignment quantity and the trajectory parameters into an issue of parameter estimation of a nonlinear regression model. 6.5.4 Solving Method and Error Analysis 2
Denote U (β) ≡ Y − F (β) , for model 6.62, the least-squares esti∧ mator β is usually taken as the estimation of β, that is, 2 2 Y − F (β ) = min Y − F (β) β
© 2012 by Taylor & Francis Group, LLC
(6.63)
39 4
M e A suReM en t DAtA M o D eLIn G
To solve the extreme value problem in Equation 6.63, we can use the Gauss–Newton method [8], which is expressed in the following iterative formulas to get the least-squares estimation of β. Given the initial value β( 0 ) in advance (k) Vk = ∇F (β ) β( k + 1) = β( k ) + (VkτVk )−1Vkτ Y − F (β( k ) )
(6.64)
Whether the iterative formulas in (6.64) are convergent and how is the convergence rate mainly depend on the structure of F(β) and the selection of the initial value. As for the question we are concerned, the structure of F(β) is pretty good as its nonlinearity is not too strong. And the initial value can be determined by the approach below: Since τi = (i = 1, 2, …,n) is small, then for this part of the β components, we can take 0 as the initial value. The initial values for the polynomial coefficients (a1, a2, …,a12) can be determined as following. First pick up the measurement elements of the MISTRAM system from the multistation measurements, and solve the rough estimation (x(ti ), y (ti ), z(ti ), x(ti ), y (ti ), z(ti )) (i = 1, 2,…, m) of the trajectory parameters (see Section 2.2.1), then use linear regression model x(t1 ) 1 x(tm ) 1 = x(t1 ) 0 x(tm ) 0
t1
t12
tm
tm2
1 1
2t 1
2t m
t13 a 1 3 a 2 tm +η 3t12 a3 a4 3tm2
to solve the estimation of (a1, a2, a3, a4); similarly, determine the other coefficients (a5, a6, …, a12) of the trajectory parameter polynomial. Take the polynomial coefficient estimations obtained with the above method as the initial values of the polynomial coefficient components corresponding to β.
© 2012 by Taylor & Francis Group, LLC
DAtA F R o M R A DA R M e A suReM en t s
395
Now we discuss the convergence of the iterative formula (6.64). The following analysis shows that for the β estimation problem based on the nonlinear regression model (6.62), as long as the initial value β(0) is given appropriately, the iterative formula (6.64) can converge to the ∧ unique minimum point β . Denote β = (β1 ,…, β12 , β12 + 1 ,…, β12 + n )τ = (a1 ,…, a12 , τ1 ,…, τn )τ Y = ( y1 , y 2 ,…, y M )τ F (β) = ( F1 (β), F2 (β),…, FM (β))τ From the knowledge of convex analysis [11], if we can verify that there exists a convex region D which consists the initial value β (0) and the true values of the estimated parameters, and U(β) is the strictly convex function in D, then U(β) must have a unique minimum value ∧ ∧ β in D, and the iterative formula (6.64) will definitely converge to β . Now we will illustrate that in the neighborhood of the true values of the estimated parameters, the second-order derivative matrix of U(β) is positive definite, thus U(β) is a strictly convex function. Apparently, M
∂F (β) ∂U (β) = 2 Fi (β) − yi i ∂β j ∂β j i =1
∑ M
∂ 2U =2 ∂β j ∂β k i =1
∑
∂Fi ∂Fi ∂ 2 Fi + ( Fi − yi ) ∂β j ∂β k ∂β j ∂β k
Note that comparing to
(6.65)
2 ∂ 2U , [Fi (β) − yi ] ∂ Fi is a small ∂β j ∂β k ∂β j ∂β k
quantity which can nearly to be neglected. In fact, ∂ 2 Fi = 0, 12 < j , k ≤ 12 + n ∂β j ∂β k
(6.66)
On the other hand, when j ≤ 12 or k ≤ 12, ∂Fi ∂Fi ∂ 2 Fi >> . ∂β j ∂β k ∂β j ∂β k
© 2012 by Taylor & Francis Group, LLC
(6.67)
396
M e A suReM en t DAtA M o D eLIn G
For simplicity, we only discuss the case of i = 1 and 1 ≤ j, k ≤ 4. F1 (β) = σ 1−1 S11 (t1 ) + τ1S11 (t1 ) Denote (x1, y1, z1) and (x2, y2, z2) as the site positions of the station R1 and T11 respectively, and R1, R 2 as RR1 (t ) and RT11 (t ). Take (x , y ,…, z ) as the trajectory parameter at time t, S11 (t ) = R1 + R2 , S 11 = R 1 + R 2 ; Rl =
( x − x l ) 2 + ( y − y l ) 2 + ( z − zl ) 2
ul = (x − xl )x + ( y − yl ) y + (z − zl )z υ l = ( y − y l ) 2 + ( z − zl ) 2 u R l = l , l = 1, 2 Rl R x + R1x 2 x3 = 2 1 R1 + R2 Suppose σ1 = 1; then ∂F1(β) = ∂β j
2
∑ l =1
∂ 2 F1 (β) = ∂β j ∂β k
1 τu τ x τ (x − xl )( j − 1)t1j − 2 − 1 31 (x − x1 )t1j −1 + 1 t1j −1 + 1 Rl Rl Rl Rl
2
∑ l =1
υ t1 + τ1 ( j + k − 2) t1j + k − 3 l3 + t1j − 1 Rl
2
∑ l =1
∂ ∂β k
1 ∂F1 (β) j + k − 2 1 + ≈ t1 + τ1 ( j − 1)t1j − 2 (x − x3 ) R1 R2 ∂β j υ2 υ2 ∂ 2 F1 ≈ t1j + k − 2 + τ1 ( j + k − 2)t1j + k − 3 13 + 23 ∂β j ∂β k R2 R1 As 0≤
© 2012 by Taylor & Francis Group, LLC
υ1 1 υ 1 ≤ , 0 ≤ 23 ≤ 3 R1 R2 R1 R2
x − xl R 3 l
397
DAtA F R o M R A DA R M e A suReM en t s
therefore, ∂F1 (β) ∂F1 (β) ∂ 2 F1 ∂β j ∂β k ∂β j ∂β k
−1
≥
R1R2 ( R1 + R2 )2 (x − x3 )2 (6.68) V1R23 + V2 R32
When R1 and R 2 are relatively large, the magnitude levels of R1, R 2 and (R1 + R 2)/2 are equivalent, and those of υ1 and υ2 are also equivalent to those of R12 , R22 , (x − x3 )2 and ((R1 + R 2)/2)2. On the righthand side of Equation 6.68, the numerator is at the order of R6 and denominator is at the order of R 5. Since R is usually over several 10,000 m, we can get Equation 6.67 by Equation 6.68. The correctness of Equation 6.67 was validated by the real measurement data as well as numerous simulation calculations. Furthermore, by model 6.62 we can know that when the value of β is near the true value of the estimated parameters, the value of |Fi (β) – yi| is not large, thus M
∂ 2U ∂Fi ∂Fi ≈ 2 ∂β j ∂β k ∂β j ∂β k i =1
∑
(6.69)
To sum up the discussions above, the entries of the second-order derivative matrix of U(β) and those of the positive-definite matrix ∇F(β)τ ∇F(β) have very close values, therefore, the second-order derivate matrix of U(β) is positive definite, and U(β) is a strictly convex function in D. The improved Gauss–Newton method will converge. Based on the real calculation experience, generally the value of β(k) gets stable after about 10 times of iterations, thus β(10) can be taken as ∧ β . Based on the corresponding conclusion of Section 3.8, we have ∧
∧
E (β− β)(β− β)τ ≈ (V τV )−1 Example 6.4 We use the trajectory equation to generate trajectory data of 10 s [5], then approximate the data by a polynomial. The approximation effect is good, as the RMS of the three directions x, y, z is less than 10− 5, and the RMS of the x , y , z directions is less than 10 − 7. Take the equation-produced trajectory data as true values to generate the data of Si(t) and Si (t ). In total, 12 receiving
© 2012 by Taylor & Francis Group, LLC
398
M e A suReM en t DAtA M o D eLIn G
stations and 4 transmitting stations are used. Random errors are added (σi = 0.02, θj = 0.002, i = 1, 2,…, 5; j = 1, 2,…, 12), and then simulate the time misalignment quantities τi (i = 1, 2,…, 12). Take the polynomial coefficients fitted to the trajectory data before adding the random errors as the true values for the trajectory coefficients. And take the simulated time misalignment τi as the true value for the misalignment. Use the iterative solution to the nonlinear regression model as the least-squares estima∧ tion β for β. It takes only 8 times of iteration before it stabilizes, ∧ V = V8 , β = β(8) . Table 6.7 presents the estimation error of β. We can see from Table 6.7 that the method in this section can actually acquire relatively high precision. We also apply the method to the calculation of real measurement data, and get the results consistent with actual situation. To be particularly noted is that, whether there exists the time misalignment or not, the application of the method above shows higher precision to estimate the trajectory parameters than both the point-wise solving method and the EMBET method.
6.5.5 Time Alignment between the Distance Sum and Its Change Rate
The distance from a CW radar to a target is achieved by measuring the time delay Δ(t) between transmitting and receiving signals, that is, S(t) = RT (t) + R R(t) = cΔ(t), where c is the speed of light. On the other hand, the CW radar measures the speed of the target using the Doppler effect produced by the moving object, see Section 6.5.1 for the principle.
Table 6.7 Estimation Error of β i
βi − βˆ i
i
βi − βˆ i
i
βi − βˆ i
1 2 3 4 5 6 7 8
0.179D-2 –0.468D-3 0.521D-4 –0.221D-5 0.130D-2 0.847D-3 –0.222D-3 0.626D-5
9 10 11 12 13 14 15 16
0.218D-2 0.312D-3 –0.206D-4 0.196D-5 0.608D-5 0.110D-4 0.789D-5 –0.730D-5
17 18 19 20 21 22 23 24
0.646D-5 –0.101D-4 0.817D-5 –0.219D-5 0.677D-5 0.338D-5 –0.351D-5 –0.608D-5
© 2012 by Taylor & Francis Group, LLC
DAtA F R o M R A DA R M e A suReM en t s
399
Because the measurement data of the distance and its change rate come from different measurement channels [19], they have factors of time misalignment, which cannot be ignored by the users who demand high-precision measurement data. Denote S(t ) = S (t + τ) + e(t ) S(t ) = S(t ) + g (t )
(6.70)
For the sake of time alignment, we firstly express the trajectory with equal interval spline. Take the measurement data at times t 1, t 2, …,tm into account, generally ti + 1 – ti = 0.05 second. Denote tm − t1 ,T2 = t1 , TM − 1 = tm M −3 T j = T2 + ( j − 2)h , j = 1, 2,…, M h =
Select an appropriate h, then both S(t) and S(t ) can be expressed by spline functions as S (t ) = S (t ) =
M
t − Tj h
∑ b B j =1
j
M
t − T j −1 b j B h h j =1
∑
(6.71)
Assume e = (e(t ), e(t ),…, e(t ))τ ∼ N (0, σ 2 I ) 1 2 m m τ 2 g = ( g (t1 ), g (t 2 ),…, g (tm )) ∼ N (0, θ I m ) Eeg τ = 0 and denote Y = (σ −1S(t1 ),…, σ −1S(tm ), θ −1S(t1 ),…, θ −1S(tm ))τ ξ = (σ −1e τ , θ −1 g τ )τ
© 2012 by Taylor & Francis Group, LLC
(6.72)
400
M e A suReM en t DAtA M o D eLIn G
X (t ) = (xij )2m × M , β = (b1 , b2 ,…, bM )τ −1 t i + τ − T j xij = σ B , h i = 1,…, m; j = 1,…, M t T − i j x −1 −1 , = θ ⋅ h ⋅ B i + m, j h Summarizing Equations 6.70 through 6.72, we get Y = X ( τ)β + ξ
(6.73)
Formerly we have proved that X(0) is a column full-rank matrix. When |τ| < h, it can be easily verified that X(τ) is column full rank; Suppose that τ is known, then immediately we can get the LS-estimation of β ∧
(6.74)
β = ( X ( τ)τ X ( τ))−1 X ( τ)τ Y The RMS is F ( τ) = Y − X ( τ)( X ( τ)τ X ( τ))−1 X ( τ)τ Y
2
(6.75) ∧
Obviously F(τ) is a one-variable function of τ. We can get τ by onedimensional searching method, which is to solve ∧
F ( τ) = min F ( τ)
(6.76)
τ
∧
After obtaining τ∧ , substitute it into Equation 6.74 to get β . In order to estimate the time misalignment τ precisely, generally we use the measurement data of a relatively longer period such as 60 s and take h = 5–10 s. Remark 6.2 Two kinds of time delay error estimation methods are described above. The method in Section 6.5.5 is suitable for time delay error estimation between the distance sum (or distance difference) and
© 2012 by Taylor & Francis Group, LLC
DAtA F R o M R A DA R M e A suReM en t s
4 01
its corresponding change rate; while the method in Sections 6.5.2 through 6.5.4 is applicable to the delay error estimation between the data from different measurement stations. 6.6 Estimation for Constant Systematic Error of CW Radars
This section considers the case that three sets of MISTRAM systems track a spacecraft simultaneously, assuming that there are constant systematic errors on distance sum and distance difference while their change rates measurements are error free. The traditional method for constant systematic error estimation is the EMBET method [2]. The following discussion starts firstly from the EMBET method, and then studies nonlinear regression analysis model application with its parameter estimation theory to estimate the trajectory parameters and the constant systematic error. We represent the trajectory of a long time with very few parameters to be estimated and establish a nonlinear regression model. By using nonlinear parameter estimation method we solve this constant systematic error estimation problem. Theoretical proof is also presented to show that the method here [13] is superior to the traditional EMBET method. 6.6.1 Mathematical Model of Measurement Data
A CW radar performs trajectory tracking by measuring the distance sum, distance difference, and their change rates. Denote oxyz as the launching reference frame, X (t ) = (x(t ), y(t ), z(t ), x(t ), y (t ), z(t ))τ , as the trajectory parameters, (xij, yij, zij)(i = 1, 2, 3; j = 1, 2, 3, 4) as the site position of each station. If not specified in the following part, X = (x , y , z, x , y , z )τ represents the trajectory parameters at time t. Take R ( t ) = ( x − x ) 2 + ( y − y ) 2 + (z − z ) 2 ij ij ij ij (x − xij )x + ( y − yij ) y + (z − zij )z Rij (t ) = Rij (t ) Suppose that three sets of MISTRAM systems are used to track the same target. Denote
© 2012 by Taylor & Francis Group, LLC
402
M e A suReM en t DAtA M o D eLIn G
S (t ) = R (t ) + R (t ), i1 i2 i Pi (t ) = Ri 2 (t ) − Ri 3 (t ), i = 1, 2, 3 Qi (t ) = Ri 2 (t ) − Ri 4 (t ),
(6.77)
Assume that the measurements of distance sum and distance difference only include constant systematic errors, while the change rates of them do not consist of any systematic errors. Then the measurement data acquired by the ith (i = 1, 2, 3) set of MISTRAM system at time t can be expressed as Si (t ) = Si (t ) + ai 1 + εi 1 (t ) Pi (t ) = Pi (t ) + ai 2 + εi 2 (t ) Q (t ) = Q (t ) + a + ε (t ) i3 i3 i i Si (t ) = Si (t ) + δ i 1 (t ) Pi (t ) = Pi (t ) + δ i 2 (t ) Q (t ) = Q (t ) + δ (t ) i i3 i
(6.78)
where aij (i, j = 1, 2, 3) are the measurement constant systematic errors. Based on the features of random errors, we can assume that {εij(t)} and {δij(t)} are white-noise time series with the mean as 0 and variance as σ ij2 and θij2 , respectively. From model 6.78 we can see that for any fixed sampling instant t, every set of the MISTRAM systems has six measurement elements, and the estimated parameters at time t are the trajectory parameters X and the constant systematic errors a = (a11, a12, a13, a21, a22, a23, a31, a32, a33)τ . There are 15 parameters totally. Denote −1 −1 −1 −1 Yt = (σ 11 S1 ,…, σ 33 Q3 , θ11 S1 ,…, θ33 Q 3 )τ , −1 −1 −1 −1 ξ t = (σ11 ε11 ,…, σ 33 ε 33 , θ11 δ 11 ,…, θ33 δ 33 )τ ,
© 2012 by Taylor & Francis Group, LLC
DAtA F R o M R A DA R M e A suReM en t s
403
B −1 −1 B = diag(σ11 ,…, σ 33 ), U = 0 −1 −1 −1 −1 τ F ( X (t )) = (σ 11 S1 ,…, σ 33 S3 , θ11 S1 ,…, θ33 Q3 )
With the notations above, we can obtain the mathematical model of the measurement data at time t as Yt = F ( X (t )) + Ua + ξ t , ξ t ∼ (0, I )
(6.79)
6.6.2 EMBET Method Analysis
Suppose that we have gained the nominal trajectory of the tracked spacecraft as X* (t), which is sufficiently close to the true trajectory X(t). Denote ∆X (t ) = X (t ) − X ∗ (t ) then ∆X (t ) Yt − F ( X ∗ (t )) = (∇F ( X ∗ (t ),U ) + ξt a
(6.80)
The single-point EMBET method, which uses the data of only one instant, is to estimate ΔX(t) and a using model 6.80. After getting the estimate of ΔX(t), the estimate of X(t) can be obtained by applying the formula X(t) = X*(t) + Δ X(t). If the matrix Vt = (∇F(X*(t)), U) is column full rank, then from model 6.80 we can get the least-squares estimation of ΔX(t) and a. ∆X (t ) τ −1 τ a = (Vt Vt ) Vt (Yt − F ( X ∗ (t ))) However, VtτVt is generally an ill-conditioned matrix with some of the eigenvalues very close to 0.
© 2012 by Taylor & Francis Group, LLC
404
M e A suReM en t DAtA M o D eLIn G
If the systematic error is constant during a period of time, then Y − F ( X (1)) ∇F ( X (1)) ∗ ∗ 1 = Ym − F ( X ∗ (m ))
∇F ( X ∗ (m))
∆X (1) U ξ 1 + ∆X (m) ξm U a
(6.81) By using model 6.81 the estimate of X(1), …, X(m) and a can be given, which forms the multipoint EMBET method. This method involves a large-scale linear regression model, which has 6m + q estimated parameters, where q is the number of the systematic errors parameters to be estimated, and m is the number of the sampling instant points. For convenience, we simplify model 6.81 as ∆X Y = V∗ + ξ, ξ ∼ (0, I ) a
(6.82)
The drawbacks of the EMBET method mainly lie in the several aspects below. 1. The model in Equation 6.82 is just an approximate model, from which the estimates of ΔX and a are biased; and when X *(t) deviates from X(t), the modeling error is rather large, and the estimation deviation will be pretty big. 2. The design matrix V* is seriously ill conditioned. This is primarily due to the fact the multicollinearity between ∇F(X(t)) and U. And with the increase of the number m of the sampling instant points, this multicollinearity cannot be effectively alleviated. It is easy to prove that ∆X COV = (V∗τV∗ )−1 a
(6.83)
Since the serious multicollinearity between the columns of V*, (V∗τV∗ ) has several eigenvalues very close to 0. In this case the diagonal entries of (V∗τV∗ )−1 are large, and the variances for
© 2012 by Taylor & Francis Group, LLC
DAtA F R o M R A DA R M e A suReM en t s
405
the trajectory parameters and systematic error estimates are very large. 3. Model (6.82) is a huge linear model with 6m + q estimated parameters. When m > 50, the number of the estimated parameters is over 300 and the computation burden is rather heavy. 6.6.3 Nonlinear Modeling Method
Based on the discussions from the previous sections, we can conclude that the trajectory parameters can be treated as the cubic continuously differentiable function of the time t. It can be expressed by a polynomial or a polynomial spline. Consider the data processing problem at times t 1, t 2, …, tm: Yi = F ( X (ti )) + Ua + ξ, i = 1, 2, ..., m
(6.84)
Usually m is big, and there are 20 sampling points in 1 s. In a case that data of 30 s are processed continuously, there are totally m = 601 sampling points. Then we need to estimate 6m + 9 = 3615 parameters from Equation 6.84, with six trajectory parameters for each instant and nine systematic error parameters. Therefore, the primary problem is to reduce the number of the estimated parameters. Based on the discussions above, there exists a set of basis functions (ψ1(t), ψ2(t), …, ψN (t)) (e.g., the polynomial basis functions or polynomial spline basis functions), such that x ( t ) = y(t ) = z(t ) =
N
∑ b ψ (t ), j
j =1
j
N
∑b j =1 N
∑ j =1
j+N
x(t ) =
N
∑ b ψ (t ) j
j =1
ψ j (t ), y (t ) =
b j + 2 N ψ j (t ), z(t ) =
j
N
∑b j =1
j+N
ψ j (t )
N
∑b j =1
j +2N
(6.85)
ψ j (t )
Thus, X (ti ) = (x(ti ) y(ti ), z(ti ), x(ti ) y (ti ), z(ti ))τ = Ai b
© 2012 by Taylor & Francis Group, LLC
(6.86)
406
M e A suReM en t DAtA M o D eLIn G
where b = (b1 , b2 ,…, b3 N )τ ψ 1 (ti ),…, ψ N (ti ) Ai = ψ 1 (ti ),…, ψ N (ti )
ψ 1 (ti ),…, ψ N (ti )
ψ 1 (ti ),…, ψ N (ti )
ψ 1 (ti ),…, ψ N (ti ) ψ 1 (ti ),…, ψ N (ti )
Obviously, Ai is a 6 × (3N) matrix. From Equations 6.84 and 6.85 there is Y F ( A b) 1 1 = + Ym F ( Am b )
U ξ 1 a + U ξm
(6.87)
Model 6.87 can be rewritten in a simpler way as (6.88)
Y = f (b ) + Wa + ξ, ξ ∼ (0, I ) where Y F ( A b) 1 1 Y = , f (b ) = Ym F ( Am b ) U ξ 1 W = ,ξ = U ξm
From model 6.88 we can estimate the parameters b and a. The commonly used method is LS estimation [8], which is to determine them by using 2 Y − f (b ) − W a = min Y − f (b ) − Wa b ,a
© 2012 by Taylor & Francis Group, LLC
2
DAtA F R o M R A DA R M e A suReM en t s
407
Denote ∧
Z∗ = (∇f (b ),W ) then by Ref. 8 and Equation 6.22, we can get ∧ ∧ b b b E ≈ , COV = (Z∗τZ∗ )−1 a∧ a a∧
(6.89)
Based on Equation 6.86, we can obtain the trajectory parameters, the systematic errors and their estimates are respectively as X (t 1 ) A1 = X (tm ) Am 0 a ∧ X (t 1 ) A1 = A m ∧ X (tm ) 0 a
0 b 0 a I
0 ∧ b 0 ∧ a I
(6.90)
In the following part, we will denote the parameters estimates with ∧ and ~ on the top to represent those obtained from the nonlinear method in this book and the EMBET method, respectively. To compare the two methods above, we assume that the nominal trajectory of EMBET is the one given in the second expression in Equation 6.90. A 1 A A = , K = 0 Am
© 2012 by Taylor & Francis Group, LLC
0 I
408
M e A suReM en t DAtA M o D eLIn G
∧ ∇F ( X (t1 )) V∗ =
∧
∇F ( X (tm ))
U U
According to the derivation rule of composite functions, we have Z∗ = V∗ K
(6.91)
From Equations 6.89 and 6.90, there is X (t 1 ) X (t 1 ) E ≈ , X (tm ) X (tm ) a a X (t 1 ) = K ( K τV∗τV∗ K )−1 K τ . COV X (tm ) a
(6.92)
In the case that the nominal trajectory is fairly close to the true one, the mean value and the variance of the systematic error and the trajectory parameters given by the EMBET method are respectively X (t 1 ) X (t 1 ) X (t 1 ) τ −1 E ∧ ≈ ≈ (V∗ V∗ ) . , COV ∧ X (tm ) X (tm ) X (tm ) ∧ a∧ ∧ a a
© 2012 by Taylor & Francis Group, LLC
(6.93)
409
DAtA F R o M R A DA R M e A suReM en t s
To compare Equations 6.92 and 6.93, some theoretical preparations are needed. Denote P = ( P1 , P2 ) where P is an orthogonal matrix, B1 B =PV VP = B21 τ
τ ∗ ∗
B12 P1τV∗τV∗ P1 = B2 P2τV∗τV∗ P1
P1τV∗τV∗ P2 P2τV∗τV∗ P2
B3 = B2 − B21 B1−1 B12 lemma 6.1 If B is a symmetric positive definite matrix, then B1, B2, B3 are all symmetric positive definite matrices, and B
−1
B1−1 + B1−1 B12 B3−1 B21 B1−1 = − B3−1 B21 B1−1
− B1−1 B12 B3−1 B3−1
Proof See Appendix 1. Theorem 6.7 Suppose that V* and K are both column full-rank matrices, and the number of the rows of K is bigger than the one of the columns, then X (t 1 ) X (t 1 ) ≤ COV COV X (tm ) X (tm ) a a
© 2012 by Taylor & Francis Group, LLC
(6.94)
410
M e A suReM en t DAtA M o D eLIn G
E a − a
2
+
m
∑ E X(t ) − X (t ) i
i =1
≤ E a − a
2
+
2
i
m
∑ E X (t ) − X (t ) i =1
i
i
2
(6.95)
Proof From Equations 6.92 and 6.93, we only need to prove that K ( K τV∗τV∗ K )−1 K τ ≤ (V∗τV∗ ) −1
(6.96)
tr K ( K τV∗τV∗ K )−1 K τ < tr(V∗τV∗ )−1
(6.97)
Since the matrix K is a column full-rank one with more rows than columns, there should exist orthogonal matrices P and Q which have the same number of rows and columns as K respectively, and a nonsingular diagonal square matrix D with the same number of columns as K, such that D K = P Q = P1 DQ 0
(6.98)
where P = (P 1, P 2), P 1 has the same number of columns as K. By Equation 6.98, we have K ( K τV∗τV∗ K )−1 K τ = P1 ( P1τV∗τV∗ P1 )−1 P1τ Hence based on Lemma 6.1, there is tr K ( K τV∗τV∗ K )−1 K τ = tr P1 ( P1τV∗τV∗ P1 )−1 P1τ = tr( P1τV∗τV∗ P1 )−1 P1τ P1 = tr( P1τV∗τV∗ P1 )−1
© 2012 by Taylor & Francis Group, LLC
(6.99)
411
DAtA F R o M R A DA R M e A suReM en t s
= tr( B1−1 ) < tr( B −1 ) = tr( P τV∗τV∗ P )−1 = tr P τ (V∗τV∗ )−1 P = tr(V∗τV∗ )−1 which is Equation 6.97. The following is to prove Equation 6.96. Note that (V∗τV∗ )−1 = P ( P τV∗V∗ P )−1 P τ therefore we only need to prove that for any nonzero real value vector y, there holds y τ P1 ( P1τV∗τV∗ P1 )−1 P1τ y ≤ y τ P ( P τV∗τV∗ P )−1 P τ y
(6.100)
Denote P1τ y z1 z = = Pτ y = τ P y z2 2 By following the notations of Lemma 6.1, to prove Equation 6.100 is reduced to proving that (6.101)
z1τ B1−1z1 ≤ z τ B −1z z is a nonzero real value vector. In fact, from Lemma 6.1 we know that τ
−1
τ 1
−1 1 1
−
1 2
−1 1 1
−
1 2
z B z = z B z + B3 B21 B z − B3 z2
2
≥ z1τ B1−1z1
from which we get Equation 6.101. Theorem 6.7 shows that as long as in matrix A, the number of columns is fewer than that of rows, that is, 3N < 6m, then the method described above is certain to lead to better results than the traditional EMBET method.
© 2012 by Taylor & Francis Group, LLC
412
M e A suReM en t DAtA M o D eLIn G
It is necessary to point out that in practical applications V∗τV∗ may have some eigenvalues very close to 0. The minimum eigenvalue could be smaller than 10− 18. Consequently, E a − a
2
+
m
∑ E X (t ) − X (t ) i
i =1
2
i
= tr(V∗τV∗ )−1
is a large quantity. However, as long as the matrix A is chosen adequately, namely, the basis function ψj is well chosen, then we can make E a − a
2
+
m
∑ E X (t ) − X (t ) i =1
i
2
i
= tr ( K V∗τV∗ K )−1 K τ K τ
fairly small. 6.6.4 Algorithm and Numerical Examples
Model 6.88 is a nonlinear regression model about the estimated parameters β = (bτ , aτ)τ . The LS estimation to get the parameter β can be Gauss–Newton method. Given the initial values b ( 0 ) and a ( 0 ) Z k = (∇f (b ( k ) ),W ) ( k + 1) b = (Z kτZ k )−1 Z kτ Y − f (b ( k ) ) − Wa ( k ) ( k + 1) a
(6.102)
The initial value b(0) can be obtained based on Equation 6.85 by fitting into the trajectory solved by the method in Section 6.6, assuming that there is no systematic error; the initial value a(0) can be taken as zero vector, and (k) ∇ F ( X ( t 1 ) ) A1 (6.103) ∇f ( b ( k ) ) = (k) ∇F ( X (tm ) ) Am Since ∇2 f is small, the convergence can be assured, and the convergence of the iteration formulas can be proved by the similar way as in Section 6.6.
© 2012 by Taylor & Francis Group, LLC
DAtA F R o M R A DA R M e A suReM en t s
413
In practice, the selection of the basis function ψj (t) ( j = 1, 2, …, N ) is essential. Because x(t), y(t), z(t) is a quadratic continuous differentiable function of the time t, we can use spline function to approximate the trajectory parameters very well during the stable flight phase of the spacecraft. As a result, we can choose cubic standard B-spline as the basis functions. The concrete steps are as follows: T2 = t1 , TN − 1 = tm , h = (TN − 1 − T2 )/( N − 3) T j = T2 + ( j − 2)h ,
j = 1, 2,…, N
t − Tj 1 t − Tj ψ j (t ) = B , ψ j (t ) = B h h h The experiences from real calculations show that when taking h = 5 s, the results are pretty good in the sense of both keeping the approximation precision and assuring small number of estimated parameters. Example 6.5 We use the trajectory equation to generate 601 measurement data points lasting 30 s (from the 51-st second to the 81-st second),
(x(ti ), y (ti ), z(ti ).x(ti ), y (ti ), z(ti ))τ . Take h = 5 s, N = 9. Use the measurement data above to fit into the spline coefficients (b1, b2, …, b27)τ , which are taken as the true values of the trajectory parameters spline coefficients. The trajectory parameters are x ( t ) = y(t ) = z(t ) =
9
t − Tj 1 b j B3 , x(t ) = 5 5 j =1
∑ 9
t − Tj b j + 9 B3 , 5 j =1
∑ 9
y (t ) =
9
j
j =1
1 5
t − Tj 1 b j + 18 B3 , z(t ) = 5 5 j =1
∑
t − Tj 5
∑ b B 3
9
∑b j =1
j +9
t − Tj B 3 5
9
∑b j =1
j + 18
t − Tj B 3 5
51 ≤ t ≤ 81, T2 = 51, TN −1 = 81, h = 5, N = 9 T j = T2 + ( j − 2)h, j = 1, 2,… , 9
© 2012 by Taylor & Francis Group, LLC
414
M e A suReM en t DAtA M o D eLIn G
After the true trajectory parameters are generated, we then produce the measurement elements data of three MISTRAM systems based on the site positions and Equation 6.78; random errors and systematic errors are added into the distance sum and distance difference data; and only random errors are added into the change rates of them. The standard deviations of the random errors are [2] Distance sum 0.12, change rate of distance sum 0.006, Distance difference 0.009, change rate of distance difference 0.0006.
At each moment, there are nine measurement data of the distance sum or difference, and nine data of their change rates, thus 18 measurement elements for every sampling instant. In real measurement, some of the data of the distance sum or difference cannot be used. Therefore, we also abandon some of them in simulated calculations. We use the measurement elements of only two distance sums, two differences and all the nine change rates. As a consequence, there are four estimated constant systematic errors. In real calculation, it takes only three times of iteration to converge to the estimated values. For the case in this example, either the single-point or the multipoint EMBET method cannot lead to satisfying results. From Tables 6.8 and 6.9 we can see that good results are obtained by using the method proposed in this chapter. The estimation reaches satisfying precision for both the systematic errors and the trajectory parameters. Since the method above can give the estimate of the spline coefficients b, it can present the trajectory parameters precisely. Take the x-direction as an example. 9 t − Tj ∧ x∧(t ) − x(t ) = (b j − b j )B3 h j =1 9 ∧ t − Tj ∧ 1 x ( t ) − x ( t ) = (b j − b j )B 3 h h j =1
∑
(6.104)
∑
From the property of the standard B-spline, we know that ∧ ∧ when b j − b j ( j = 1, 2,…, 9) is small, x∧(t ) − x(t ), x − x(t ) are also small. This is due to the fact that 1. t − Tj t − Tj 2 5 ≤ , h = 5; ≤ , B 3 B3 h 3 h 8
© 2012 by Taylor & Francis Group, LLC
415
DAtA F R o M R A DA R M e A suReM en t s
Table 6.8 The Trajectory Parameters spline Coefficients and Their Estimation Errors i
bi
bˆi − bi
bi +9
bˆi + 9 − bi + 9
1 2 3 4 5 6 7 8 9
–10 –21 –15 –13 –9 73 200 400 660
–0.83D-03 –0.844D-05 0.276D-03 0.104D-03 0.321D-04 0.224D-04 0.151D-03 –0.156D-04 0.902D-03
8780 10580 12690 15710 17980 21040 25610 29650 35170
–0.821D-02 0.147D-02 0.215D-03 –0.102D-02 –0.702D-03 0.489D-03 0.354D-03 0.326D-04 –0.103D-02
bˆi +18 − bi +18
bi +18 30 45 62 80 100 120 145 170 200
0.237D-02 0.102D-02 0.265D-03 0.299D-03 –0.371D-03 0.401D-03 0.819D-04 –0.109D-03 0.908D-04
Table 6.9 The Constant systematic Errors aS1
2.1
aˆS 1 − aS 1
0.430D-2
aP1
4.1
aˆP 1 − aP 1
0.333D-3
aQ1
3.1
aˆQ 1 − aQ 1
0.533D-3
aS2
2.1
aˆS 2 − aS 2
0.429D-3
2. When t is fixed, the two summation expressions in Equation 6.104 actually involve the summation of only three terms.
6.6.5 Conclusions
To estimate constant systematic errors is a very challenging task. Most of the previous work was focusing on the nominal trajectory to make it more precise, or emphasizing on the use of prior information of systematic errors. The effects of these practices are not ideal. In the aspect of reducing the multicollinearity of the matrix V*, the EMBET method was often applied. The way to avoid the computational difficulties was to extract small amount of sampling points out of a long time period. But this has little effect on solving the ill-conditioned problem. In Section 6.6.2, we have pointed out that for the EMBET method there are three main disadvantages, which can all be overcome by the method in this section. 1. This method uses the nonlinear model with fewer parameters instead of the nominal trajectory. It does not require the
© 2012 by Taylor & Francis Group, LLC
416
M e A suReM en t DAtA M o D eLIn G
iteration initial values to be very close to the true values. And the convergence speed is very fast, ranging from three to four times of iteration. This technique can avoid the modeling error introduced by the nominal trajectory inaccuracy. 2. The number of the estimated parameters is small. In Example 6.5, there are 3610 parameters to be estimated using the EMBET method, while the method in this section only needs to estimate 31 parameters. The calculation burden is greatly reduced. 3. The method has fundamentally solved the ill-conditioned problem. Due to fewer parameters, the matrix Z∗τZ∗ (not like V∗τV∗) is a low-order matrix and not an ill-conditioned one. As for the case in Example 6.5, Z∗τZ∗ is a 31-order matrix, while τ τ V∗τV∗ is a 3010-order matrix. Note that V∗τV∗ and B = P V∗ V∗ P ττ ττ ττ Z∗∗Z Z∗∗ == BB11 == PP11VV∗∗VV∗∗PP11 . have the same eigenvalues, while Z This shows that Z∗τZ∗ is only a principal minor of P τV∗τV∗ P . The ill-conditioned degree of a certain lower-order principal minor of a higher-order matrix is definitely less than that of the whole matrix. Generally speaking, the lower the order of the principal minor is, the less the ill-conditioned degree is. In the processing of Example 6.5, Z∗τZ∗ is just a 31-order principal minor of the 3010-order matrix P τV∗τV∗ P . To be emphasized specifically, the selection of basis functions is very important. We must ensure that the modeling error is small, and at the same time the ill-conditioned degree of the matrix Z∗τZ∗ is reduced. 6.7 Systematic Error Estimation for the Free Flight Phase
The previous sections introduced the method of using polynomial or polynomial spline to describe the trajectory parameters during a period of time. The applications of this method in the data processing for the boost phase have shown great success in the aspects of trajectory solution, the time misalignment quantity estimation and the constant systematic errors estimation. In essence, the approach to express a trajectory of a period with polynomial or polynomial spline is actually to establish a nonlinear parameter estimation model with fewer estimated parameters, less illconditioned degree and smaller modeling errors. This method can also be applied to the measurement data processing of the free flight
© 2012 by Taylor & Francis Group, LLC
DAtA F R o M R A DA R M e A suReM en t s
417
phase in principle. Computing experience has shown that due to the relatively small acceleration x, y , z in the free flight phase, the change in x , y , z is not so big; and in particular, because of the increase of the distance from the aircraft to the station, the change of R, R is not so apparent as that of the boost phase. Consequently, the ill-conditioned problem cannot be resolved as in the boost phase. Further work is needed for solving the ill-conditioned problem of the parameter estimation model in the free flight phase data processing. The following part of resolving the ill-conditioned issue is still based on the idea of reducing the number of estimated parameters. Fortunately, for the data processing in the free flight phase, we can establish a better parameter estimation model. This is because we can take advantage of the trajectory equations of the free flight phase, since the free-flying spacecraft is only affected several accurately expressed forces such as gravity, Coriolis force, centrifugal force, and so on. 6.7.1 Trajectory Equations in the Free Flight Phase
We still consider the issue in the launching reference frame. We have the equations in this frame. (For the derivation of the equations, see [5].) Denote R0 = 6371110 m (the radius of the Earth); f M = 3.98620 × 1014 m 3 /s 2 ; ω = 7.29211 × 10− 5 rad/s; μ = Jf M a2 = 0.262977 × 1026 m5/s2 (oblateness constant); a, b are the long and short semi-axes of the Earth ellipsoid, respectively; a = 6378140 m; e is the first eccentricity, H0 is the geodetic height of the launching point; A 0 is the launching azimuth, B 0 is the latitude of the launching point a−b 1 a2 − b2 = , e2 = a 298.257 a2 ϕ 0 = tg −1 (1 − e 2 )tgB0 , µ 0 = B0 − ϕ 0
© 2012 by Taylor & Francis Group, LLC
418
M e A suReM en t DAtA M o D eLIn G
R − R0 sin µ 0 cos A0 υx R0 cos µ 0 Rυy = , R sin µ sin A 0 0 Rυz 0 ω cos B cos A x 0 0 ω = ω B sin 0 y − cos B0 sin A0 ω z r =
r x + R x υx r = y + R ; υy y r z + R υz z
(x + Rυx ) 2 + ( y + Rυy )2 + (z + Rυz ) 2
(x + Rυx )ω x + ( y + Rυy )ω y + (z + Rυz )ω z rω f Ja 2 g r = − M2 1 + 2 (1 − 5 sin 2 ϕ ) r r
sin ϕ =
gω = −
2 fM 2 2µ Ja sin ϕ = − sin ϕ r4 r4
The components form of the implicated acceleration in the launching reference frame is a 2 2 ex ωx − ω a = ω ω ey x y a ω x ωz ez
ωx ω y
ω 2y − ω 2 ω y ωz
ω x ω z x + Rυx ω y ω z y + Rυy ω z2 − ω 2 z + R υz
The components form of the Coriolis acceleration in the launching reference frame is a kz 0 a = 2ω z ky − 2 ω a y kx
© 2012 by Taylor & Francis Group, LLC
−2ω z 0
2ω x
2ω y x −2ω x y 0 z
DAtA F R o M R A DA R M e A suReM en t s
419
Thus we get the trajectory equations in the free flight phase as [16] x x d dt y = y z z x + R ω x υx x g g d y = r y + Rυy + ω ω y − ω dt r z z + Rυz ω z
a ex a − ey aez
a kz a ky akz
(6.105)
Take X = (x , y , z, x , y , z,)τ , then Equation 6.105 can be rewritten in the following concise form: dX (t ) = F (t , X (t )) dt
(6.106)
If the initial value of the trajectory is known as X(t 1) = η, then the trajectory parameters at time ti (i = 2, 3, …, m) can be completely determined by the trajectory equation. The accuracy of the trajectory equation is the most concerned issue when applying the equation to the measurement data processing. The error sources of the trajectory equation mainly come from: 1. Ignoring the impact of the atmosphere, which may bring some error. 2. Physical constant error. The previously mentioned trajectory equation requires many constants, which may have errors. 3. Gravity anomaly error. The error of gravity acceleration is usually in the range of ±0.02 m/s2. It is because of these errors that we should make efforts to get the equation as accurate as possible when we use it to process the measurement data. A lot of simulation calculations show that it is very convenient to solve it with numerical method by taking the step interval h = 0.005 ~ 0.05 with high precision.
© 2012 by Taylor & Francis Group, LLC
420
M e A suReM en t DAtA M o D eLIn G
The following is to solve the initial value problem: dX (t ) dt = F (t , X ) X (t1 ) = a
(6.107)
Fix h, and use the Runge–Kutta formula 1 X i + 1 = X i + ( K 1 + 2K 2 + 2K 3 + K 4 ) 6 K 1 = hF (ti , X i ) 1 h K 2 = hF ti + , X i + K 1 2 2 h 1 K 3 = hF ti + , X i + K 2 2 2 K 4 = hF (ti + h , X i + K 3 ) We have performed this solution with computer programs. Comparing the results of fixing h = 0.0001 and h = 0.05 will get very close values, which shows that this method has high precision. This can actually be proved in theory too. We only need to use the expression of F(t, X(t)) in Equation 6.106 and some of the known theoretical conclusions [12]. For any given initial value a, the values of X(t2), …, X(tm) can be uniquely determined from the initial value problem Equation 6.107. In other words, after the trajectory equation is known, the trajectory parameters at each moment are the known implicit functions of a. X(ti) = Xi(a), and Xi(a) is a function of a with known expression, i = 1, 2, …, m. 6.7.2 Nonlinear Model of the Measurement Data
Suppose that at time ti we have measured M distance sums (or differences), N change rates of the distance sums (or differences); assume that there are constant systematic errors on the measured distance sums (or differences), and no systematic errors on their change rates. Since the distance sums, differences and their corresponding change rates at time ti are all functions of X (ti), therefore they can all be expressed as the functions of the initial value a via the trajectory equation. Based on the discussion above, the measurement data at time ti can be represented as
© 2012 by Taylor & Francis Group, LLC
DAtA F R o M R A DA R M e A suReM en t s
Y1 (ti ) = f 1 (ti , a ) + b1 + ε1 (ti ) YM (ti ) = f M (ti , a ) + bM + ε1 (ti ) P1 (ti ) = f 1 (ti , a ) + δ 1 (ti ) PN (ti ) = fN (ti , a ) + δ N (ti )
4 21
(6.108)
where b1, b2, …, bM are constant systematic errors; f i (ti, a) is the known function of a and the expression can be completely determined after ti is given. ∞ ∞ are zero-mean normal Assume that {εi (t j )} j =1 and {δi (t j )} j =1 2 white noise series with the variance as σ i and θi2 respectively. σ 1−1Y1 (t1 ) σ −1Y (t ) 1 1 m −1 σ M YM (t1 ) σ −M1 YM (tm ) Y = −1 θ1 P1 (t1 ) −1 θ1 P1 (tm ) −1 θ N PN (t1 ) θ −N1PN (tm )
© 2012 by Taylor & Francis Group, LLC
,
σ1−1 f 1 (t1 , a ) σ −1 f (t , a ) 1 1 m −1 σ M f M (t1 , a ) σ −M1 f M (tm , a ) f (a ) = −1 θ1 f1 (t1 , a ) −1 θ1 f 1 ( t m , a ) −1 θ N fN (t1 , a ) θ −N1 fN (tm , a )
422
M e A suReM en t DAtA M o D eLIn G
σ 1−1 σ −1 1 Z = 0 0 0 0
σ1−1ε1 (t1 ) −1 σ ε (t ) 1 1 m −1 σ −M1 σ M ε M (t1 ) σ −M1 ε M (tm ) σ −M1 , ξ = −1 0 θ1 δ 1 ( t 1 ) −1 0 θ1 δ 1 ( t m ) θ −N1δ N (t1 ) 0 1 − 0 θ N δ N (tm )
With the notations above, we can obtain the following model with the measurement data of m moments: Y = f (a ) + Zb + ξ
(6.109)
where Y is a (M + N)m-dimensional vector, f (a) is a (M + N)mdimensional vector function of a, Z is a known (M + N)m × Mdimensional matrix, ξ ~ (0, I). Now we discuss the estimation of parameters a, b from the model (6.109). The estimate of b is the estimated value of the constant systematic error. And after the estimate of a as a∧ is given, from the initial value problem ∧ d X (t ) = F (t , X∧ (t )) dt ∧ ∧ X (t1 ) = a
(6.110)
we can obtain the estimated trajectory parameters values ∧ X (ti )(i = 1, 2,…, m) at time ti (i = 1, 2,…, m).
© 2012 by Taylor & Francis Group, LLC
DAtA F R o M R A DA R M e A suReM en t s
423
6.7.3 Parameter Estimation Method
The usual approach to estimate the parameters a and b from the model ∧ (6.109) is least-squares method, that is, to solve a∧, b such that 2 Y − f (a) − Zb = min Y − f (a ) − Zb
(6.111)
2
a ,b
To solve the extreme value problem (6.111), we can use directly the Gauss–Newton method just like in the last sections. We now introduce a more efficient algorithm. Denote H = Z (Z τZ )−1 Z τ , Y = ( I − H )Y G (a ) = ( I − H ) f (a ), η = ( I − H )ξ Note that ( I − H )Zb = Z − Z (Z τZ )−1 Z τZ b = 0 from Equation 6.109 we can get Y = G (a ) + η
(6.112)
The theorem below indicates a simple approach to solve the extreme value problem (6.111). Theorem 6.8 ∧
If a∧, b are the solutions to the extreme value problem (6.111), then Y − G (a)
2
= min Y − G (a )
2
(6.113)
a
∧
(6.114)
b = (Z τZ )−1 Z τ (Y − f (a∧)) Proof First, we prove that ( I − H )(Y − f (a))
© 2012 by Taylor & Francis Group, LLC
2
= Y − f (a) − Zb
2
(6.115)
424
M e A suReM en t DAtA M o D eLIn G
Define b ∗ = (Z τZ )−1 Z τ (Y − f (a∧)) ∧
then based on the definition of a∧, b in Equation 6.111 we know that Y − f (a) − Zb ∗
2
≥ min Y − f (a ) − Zb
2
(6.116)
2
(6.117)
a ,b
2 = Y − f (a) − Zb
While by the definition of b * we get 2 Y − f (a) − Zb
≥ min Y − f (a ) − Zb b
= Y − f (a) − Zb *
2
Synthesize Equations 6.116 and 6.117 we get to Equation 6.115. Now let us prove Equation 6.113. Denote that a is the solution to the following extreme value problem: 2 2 Y − G (a ) = min Y − G (a ) a
(6.118)
then from Equation 6.114 we can get 2 2 Y − G (a ) ≤ Y − G (a )
Define b = (Z τZ )−1 Z τ (Y − f (a )) , then
© 2012 by Taylor & Francis Group, LLC
(6.119)
DAtA F R o M R A DA R M e A suReM en t s
425
2 Y − G (a )
= ( I − H )Y − ( I − H ) f (a ) = Y − f (a ) − Zb
2
2
≥ min Y − f (a ) − Zb
2
(6.120)
a ,b
= Y − f (a ) − Zb
2
2 = Y − G (a ) .
By the two expressions above we come to 2 2 Y − G (a ) = Y − G (a)
which is Equation 6.113. By the definition of b ∗ , b , Equation 6.114 holds. Theorem 6.8 demonstrates that instead of solving the extreme value problem (6.111), a∧ can be acquired by solving the extreme value prob∧ lem Equation 6.113, and b can be obtained by Equation 6.114. And to solve the problem of Equation 6.113 is easier than to solve (6.111). In order to solve the extreme value problem (6.113), we can also use Gauss–Newton method. Numerical derivatives can be used to get ∇G(a), ∇G ( a ) =
1 (G (a + he1 ) − G (a ),…,G (a + he6 ) − G (a )) h
where G(a) is a vector value function, and ei is the unit coordinate vector. The iteration formulas are Given the initial value a ( 0 ) , ( k + 1) −1 = a ( k ) + ∇G (a ( k ) )τ ∇G (a ( k ) ) ∇G (a ( k ) )τ (Y − G (a ( k ) )) a
© 2012 by Taylor & Francis Group, LLC
426
M e A suReM en t DAtA M o D eLIn G
6.7.4 Numerical Example and Analysis
Run simulation to generate the trajectory parameters and the measurement elements of a set of MISTRAM system for 50 s. In the three directions of S, P, Q add the constant systematic errors of 4 (m), −5 (m) and −3 (m). The velocity measurements have no systematic error. By employing the trajectory equation the estimations are 3.738 (m), –4.997 (m), –3.006 (m). The result is satisfactory. For the data processing problem described in model 6.108, we need not only do theoretical analysis but also simulations to know the specific variety of mathematical precision of the results. We use the processed real data as the true values for the simulation. Experience has shown that by using the measurement data from a single station with 50 s or more, we can get good results for estimating constant systematic errors by utilizing the trajectory equations. For the case of two MISTRAM systems, 30 s of data are enough to achieve good results. 6.8 Estimation of Slow Drift Error in Range Rate Measurement
In the CW radar tracking there exists in the range rate measurement a kind of slow drift error, which seriously affects the precision of observational data. In this section, we build the nonlinear model by use of spline function, which is used to estimate the drift error and trajectory parameters [22]. Furthermore, we give a selection criterion of spline nodes, which is applied in the spline representations of both the trajectory and the drift error. Finally, we present the estimate of the drift error and its estimation precision. We apply the method to the data processing of practical launch vehicle tracking and obtain satisfactory results, which are in good agreement with the engineering analysis. 6.8.1 Mathematical Model of Measurement Data
We consider the trajectory tracking problem in the launching reference frame. Denote X (t ) = (x(t ), y(t ), z(t ), x(t ), y (t ), z(t ))τ as the trajectory parameters of the spacecraft in this frame at time t. Use a
© 2012 by Taylor & Francis Group, LLC
DAtA F R o M R A DA R M e A suReM en t s
427
MISTRAM system to perform the tracking, and at time ti (i = 1, 2, …, m), we acquire the measurement data as follows: S(ti ) = S (ti ) + ε1 (ti ) P (ti ) = P (ti ) + ε 2 (ti ) Q (t ) = Q(t ) + ε (t ) i 3 i i S (ti ) = S (ti ) + ∆ S (ti ) + ε 4 (ti ) P (ti ) = P (ti ) + ∆ P (ti ) + ε 5 (ti ) Q (t ) = Q (t ) + ∆ (t ) + ε (t ) i Q i 6 i i
(6.121)
where ∆S (t), ∆P (t), ∆Q (t) are the slow drift errors of the three range rates tracking data respectively. ε is the random error of the measurement. Assume that ε j (ti ) ∼ N (0, σ 2j ), i = 1, 2…, m, j = 1, 2,…, 6 (6.122) 2 2 E ( t ) ( t ) = 0 , when ( i − l ) ( j q ) + − ≠ 0 ε ε j i q l Usually ti+1 – ti = 0.05, we can assume that t 1 = 0, tm = k. Take N = k/h + 3. Supposing 0 = t 1 ≤ t ≤ tm = k, the trajectory parameters in the smooth flight phase can be expressed by spline functions as [22] u(t ) = au 1 + au 2 t + au 3t 2 + au 4 t 3 N + auj (t − T j )3+ , k=5 u = x, y, z 2 u(t ) = au 2 + 2au 3t + 3au 4 t N + 3auj (t − T j )+2 , k=5
∑
(6.123)
∑
Similarly, the drift error can also be represented by spline functions as
© 2012 by Taylor & Francis Group, LLC
428
M e A suReM en t DAtA M o D eLIn G
∆V (t ) = βV 1 + βV 2 t + βV 3t + βV 4 t + 2
3
N
∑β j =5
Vj
(t − TVj )3+ ,V = S , P ,Q (6.124)
Given the spline nodes, the trajectory parameters and the drift errors are determined once the coefficients of the spline in Equations 6.123 and 6.124 are known. Once ti is fixed, σ1−1S (ti ), σ −21P (ti ),…, σ 6−1Q (ti ), are all nonlinear functions with known expressions of the spline coefficients (ax 1 ,…axN , a y 1 ,…, a yN , az 1 ,…, azN ). Consider m moments jointly, denoted as vector values functions F 1(a),…, F6(a), and take Y1 = σ −1 (S(t1 ),…, S(tm ))τ …, Y6 = σ −6 1 (Q (t1 ),…,Q (tm ))τ e j = σ −j 1 ( ε j (t1 ),…, ε j (tm ))τ ,
j = 1, 2,…, 6
Y1 F1 (a ) Y2 F2 (a ) Y3 F3 (a ) = + Y4 F4 (a ) Y F ( a ) 5 5 Y F ( a ) 6 6
e1 e 2 e3 e4 e 5 e 6
0 0 0 + WS βS W β P P W β Q Q
(6.125)
where 1 1 WV = 1
© 2012 by Taylor & Francis Group, LLC
t1
t12
t13
(t1 − TV 5 )3+
t2
t 22
t 23
(t 2 − TV 5 )3+
tm
tm2
tm3
(tm − TV 5 )3+
(t1 − TVNV )3+ (t 2 − TVNV )3+ (tm − TVNV )3+
DAtA F R o M R A DA R M e A suReM en t s
429
βV = (βV 1 , βV 2 ,..., βVNV )τ , V = S , P ,Q The model in Equation 6.125 can also be rewritten as Y = F (a ) + W β + e , e ∼ N (0, I )
(6.126)
By model 6.126 we can get the estimations for the trajectory spline coefficients a and the slow drift spline coefficients β. The key is how to reduce the spline nodes to achieve the highest possible estimation precision. 6.8.2 Selection of the Spline Nodes
We classify the selection of spline nodes as a variable selection issue in linear regression models. And then by probing into the variable selection criterion, we give the criterion of spline nodes selection. Suppose that t 1 = 0, tm = k, where k/24 is an integer. Based on the discussions in [25], during the interval of [t 1, tm], S(t), P(t), Q(t), x(t), y(t), z(t) can be expressed with isometric-node spline functions of the same partition. When the truncation error of x(t ), y(t ),…, z(t ) expressed by spline is small, the truncation error of the expressed S (t ), P (t ),…,Q (t ) is also small. Based on the characteristics of the trajectory and the drift error, in the interval of [t 1, tm], S (t ), S(t ) and ΔS (t) can all be represented by isometric-node splines as S (t ) = aS 1 + aS 2 t + aS 3t + aS 4 t + 2
S(t ) = aS 2 + 2aS 3t + 3aS 4 t 2 +
3
∑a j =5
k+3
∑ 3a j =5
∆ S ( t ) = βS 1 + βS 2 t + βS 3 t 2 + βS 4 t 3 +
© 2012 by Taylor & Francis Group, LLC
k+3
Sj
Sj
(t − j + 4)3+
(t − j + 4)+2
k+3
∑β j =5
Sj
(t − j + 4)3+
(6.127)
(6.128)
(6.129)
430
M e A suReM en t DAtA M o D eLIn G
According to the features of the trajectory and the drift error, the truncation error can be neglected when the node interval is taken as 1 s or less. Denote a = (aS 1 , aS 2 ,…, aS ,k + 3 )τ , β = (βS 1 , βS 2 ,…, βS ,k + 3 )τ 1 X1 = 1 0 X2 = 0
t1
t12
t13
(t1 − 1)3+
tm
t
2 m
3 m
(tm − 1)
1
2t 1
3t12
2t m
2 m
1
t
3t
(t1 − k + 1)3+ 3 (tm − k + 1)+
3 +
3(t1 − 1)+2
3(tm − 1)
2 +
3(t1 − k + 1)+2 3(tm − k + 1)+2
We can establish the linear regression model based on the measurement data. Y1 σ 1−1 X 1 = −1 Y4 σ 4 X 2
a e1 + σ 4−1 X 1 β e 2 0
The model can be further simplified as Y = Xa + Zβ + e , e ∼ N (0, I )
(6.130)
Note that ti+1 – t 1 = 0.05, m = 20k + 1, then it is easy to verify that (X, Z) is a column full-rank matrix. Thus, we get the LS estimation of the spline coefficients by a X τ X = τ β Z X
X τZ Z τZ
−1
X τY Z τY
(6.131)
Actually, the estimation of the trajectory parameters and the drift errors can be reduced to the estimation of Xa and Zβ. By Equation 6.131 and the method in Chapter 3, we can prove that ∧
a∧ = a + ( X Tτ X T )−1 X Tτ e , β = β + (ZTτ ZT )−1 ZTτ e
© 2012 by Taylor & Francis Group, LLC
(6.132)
4 31
DAtA F R o M R A DA R M e A suReM en t s
where X T = X − Z (Z τZ )−1 Z τ X , ZT = Z − X ( X τ X )−1 X τZ Hence, we have the following conclusion. lemma 6.2 Suppose e ~ N(0, I), then E ( Xa − Xa
2
− Zβ ) + Zβ 2
= tr X τ X ( X Tτ X T )−1 + Z τZ (ZTτ ZT )−1
(6.133)
If there are indeed some unnecessary spline nodes, then we consider the subsets N and L of the set {1, 2, …, k - 1}. Take the basis numbers of the sets N and L are n and l respectively, then Y = X P aP + Z P β P + e , e ∼ N (0, I )
(6.134)
where XP, ZP are matrices composed by part of the columns of matrices X, Z. The form of XP is similar to X, except for the number of the corresponding spline inner nodes is N; and it is the same with the relationship between ZP and Z. In the same way, aP, βP are the vectors forming by the corresponding components in α, β. If the spline function with fewer inner nodes can represent the true signal and the drift error, then X P aP = Xa, Z P β P = Zβ
(6.135)
Denote X PT = X P − Z P (Z Pτ Z P )−1 Z Pτ X P , Z PT = Z P − X P ( X P X P )−1 X Pτ Z P , aP X Pτ X P = τ β P Z P X P
© 2012 by Taylor & Francis Group, LLC
X ZP Z Pτ Z P τ P
−1
Z Pτ Y Z Pτ Y
(6.136)
432
M e A suReM en t DAtA M o D eLIn G
From Equation 6.134 and Lemma 6.2, we have E( X P aP − Xa
2
2
+ Z P β P − Zβ )
τ τ X PT )−1 + Z Pτ Z P (Z PT Z PT )−1 = tr X Pτ X P ( X PT
(6.137)
The question is: which expression is better? Theorem 6.9 Suppose that Equation 6.134 holds, then the following facts are true simultaneously: E X P aP − Xa 2 < E Xa − Xa 2 2 − Zβ 2 E Z P β P − Zβ < E Zβ
(6.138) (6.139)
Proof The two expressions above are similar. We will only verify Equation 6.138. As a matter of fact, from Lemma 6.2, we only need to prove that τ tr X Pτ X P ( X PT X PT )−1 < tr X τ X ( X Tτ X T )−1
Denote X = ( X P , X R ), H P = Z P (Z Pτ Z P )−1 Z Pτ X PT = ( I − H P ) X P , X RT = ( I − H P ) X R τ τ X RR = X RT − X PT ( X PT X PT )−1 X PT X PT τ D = X RR X RR
© 2012 by Taylor & Francis Group, LLC
(6.140)
DAtA F R o M R A DA R M e A suReM en t s
A11 A = A 21
A12 ∆ X Pτ X P = A 22 X Rτ X P
A11 B = A 21
τ X PT A12 ∆ X PT = τ A 22 X RT X PT
433
X Pτ X R X Rτ X R τ X PT X RT τ X RT X RT
−1 Proving Equation 6.138 is equivalent to proving that tr( A11 B11 ) < tr( AB −1 ) −1 tr( A11 B ) < tr( AB ) . From Appendix 1 we have −1 11
B
−1
−1 − B11 B12 D −1 D −1
−1 −1 −1 B11 + B11 B12 D −1 B21 B11 = −1 − D −1 B21 B11
−1 −1 τ −1 tr( AB −1 ) = tr( A11 B11 ) + tr( A11 B11 B12 D −1 B12 B11 ) −1 + tr( A 22 D −1 ) − 2tr( A 21 B11 B12 D −1 ) −
1
−
1
Denote U 1 = X P B −1B12 D 2 ,U 2 = X R D 2 , then −1 tr( AB −1 ) = tr( A11 B11 ) + tr (U 1 − U 2 )(U 1 − U 2 )τ
(6.141)
From Equation 6.141 we can instantly get Equation 6.140. From the two lemmas above we can see that as long as the truncation error of the spline expressed S (t ), S(t ), ∆ S (t ) can be neglected, then the fewer the spline nodes are, the better the result is. Now we come to the question how to ensure that the truncation error is small enough. Theorem 6.10 Suppose that the complete model (6.130) and the selected model (6.134) are both correct, the 2(k + 3)-dimensional vector (aτ , β τ )τ and the n + l -dimensional vector (aPτ , β τP )τ are the LS estimates of the parameters 2 in the two models above. Take G = ( Y − Xa − Zβ )/(m − 2k − 6), then
© 2012 by Taylor & Francis Group, LLC
434
M e A suReM en t DAtA M o D eLIn G
Xa + Zβ − X P aP − Z P β P
2
( 2k + 6 − n − l )G
∼ F ( 2k + 6 − n − l , m − 2k − 6) (6.142)
Remark 6.3 From Theorem 6.10, for a given level of significance α, when Xa +Z β − X P aP − Z P β P
2
< FαG ( 2k + 6 − n − l )
(6.143)
holds, then model 6.134 is correct. Since m – 2k – 6 is large, usually we take α = 0.05, Fα = 1.6. The function of Equation 6.142 is that it forms a judgment criterion for the correctness of Equation 6.134 under the premise that the model in Equation 6.130 holds. criterion 6.1 In the premise that model 6.130 is correct, the optimal selected model (6.134) can be determined by the following extreme value problem τ τ tr X Pτ X P ( X PT X PT )−1 + Z Pτ Z P (Z PT Z PT )−1 = min (6.144) Xa +Zβ − X P aP − Z P β P ≤ FαG ( 2k + 6 − n − l )
To solve the extreme value problem (6.144) directly is rather difficult. In the following part, we will simplify the problem into several steps based on the engineering background.
Step 1: Solve a , β, Χ a , Z β,G . Step 2: Fix ZP = Z, and give a positive integer h, T j = ( j − 4 )h ,
j = 5, 6,…,
k −1 h
Then XP should be composed from part of the columns of X. According to the characteristics of the trajectory, we only need to compare the three cases that h = 4, 6, 8 seconds under Criterion 6.1.
© 2012 by Taylor & Francis Group, LLC
DAtA F R o M R A DA R M e A suReM en t s
435
Step 3: Fix XP with the number of columns as n = k + 3. Select ZP h from the following complete model: Y = X P aP + Zβ + e , e ∼ N (0, I )
(6.145)
Denote aP X Pτ X P = τ β Z X P
X Pτ Z Z τZ
−1
X Pτ Y Z τY
Y = X P aP + Z β ZP can be selected from Equation 6.145 based on the criterion below τ τ tr X Pτ X P ( X PT X PT )−1 + Z Pτ Z P (Z PT Z PT )−1 = min (6.146) 2 Y − X P aP − Z P β P ≤ FαG ( 2k + 6 − n − l )
where l is the columns number of ZP. The selection of ZP can be based on stepwise regression method. Define a n + (k + 3) + 1-order square matrix X τ X P P E = ZP X P τ Y XP
X Pτ Z Z τZ Y τZ
X Pτ Y Z τY Y τY
Step 4: Determine ZP by using the method in [14]. Define Si as the scanning operation which takes the ith diagonal entry of the operated matrix as the pivot. Follow the steps to extract several columns from the matrix Z to form ZP. Z has k + 3 columns and ZP has l columns. 1. 2.
© 2012 by Taylor & Francis Group, LLC
{1, 2,…, k + 3} → L, C = (Cij ) =
∏S i ∈L
i +n
S1 …Sn E
436
3.
M e A suReM en t DAtA M o D eLIn G
Tj =
Cn + j ,n + k + 4
GCn + j ,n + j
2
,
j ∈ L;Ti = min T j j ∈L
4. If Ti > Fα, then stop; otherwise, L \ {i} → L, Si+n C → C, return to (3). From the steps above the index set L of ZP is determined. Once XP and ZP are determined, the spline nodes of S(t) and ΔS (t) are determined; by the same approach, we can determine the spline nodes of P(t) and ΔP (t), Q(t) and ΔQ (t). Take the set of nodes with the shortest intervals as the common spline nodes for S, P, Q and also the common nodes for x(t), y(t), z(t). The spline nodes for ΔS (t), ΔP (t), ΔQ (t), are different from each other. 6.8.3 Estimation of the Slow Drift Errors
For simplicity, we still denote the aP and βP in Section 6.8.2 as a, β. After we derive the models similar to the model in Equation 6.134 in all the three directions of S, P, Q, we actually obtain the optimal models for estimating the drift errors and the trajectory parameters. The procedures to estimate the drift errors are as follows. 1. Based on the features of the trajectory, the spline inner nodes of the true signal are taken as equal intervals h = 4, 6, or 8 s T j = ( j − 4 )h ,
j = 5, 6,…, kh −1 − 1
2. The nodes of the spline for expressing the drift errors ΔS , ΔP , ΔQ are determined by the method in Section 6.8.2 as (TS 5 ,TS 6 ,…,TSN S ) (TP 5 ,TP 6 ,…,TPN P ) (TQ 5 ,TQ 6 ,…,TQNQ ) 3. The initial values of β are taken as 0;
© 2012 by Taylor & Francis Group, LLC
437
DAtA F R o M R A DA R M e A suReM en t s
4. We can get the estimate of S , P ,Q , S, P ,Q , and by the method in [21], we can obtain the estimate of X(t), from which we can represent X(t) by a spline using the inner nodes given in (1). The initial estimate of a can be acquired. To summarize, we get model 6.127 from (1) and (2), and get the initial values of the estimated coefficients in model 6.130 from (3) and (4). Thus we can obtain the estimate of a, β from the iterative formulas below (for theoretical derivation, see Ref. 8): a ( 0 ) , β( 0 ) given by ( 3) and ( 4); V j = (∇F (a ( j ) ),W ), a ( j + 1) a ( j ) ( j) ( j) τ τ β( j + 1) = β( j ) + (V j V j )V j (Y − F (a ) − W β ). Since F(a) is not highly nonlinear, the iteration converges quickly with generally 6–8 times. Suppose the limit point of the above iterative formulas is from [8] we have τ
a∗ − a a∗ − a ( ∇F ( a ∗ ) τ ∇F ( a ∗ ) = E ∗ τ ∗ W ∇F (a ) β − β β ∗ − β
∇F ( a ∗ ) τ W W τW
a* β* ,
then
−1
(6.147)
Denote HF = ∇F (a *)(∇F (a *)τ ∇F (a *))−1 ∇F (a *)τ , by Equation 6.147 there is E(β∗ − β)(β∗ − β)τ = D W τ ( I − H F )W where D is a NS + N P + NQ -order square matrix, and D has NS , N P, NQ -order principal minor DS , DP, DQ. The following uses V to denote S, P, Q, then ∗
∗
E(βV − βV )(βV − βV )τ = DV Denote the drift error as ∆V = ( ∆V (t1 ), ∆V (t 2 ),…, ∆V (tm ))τ
© 2012 by Taylor & Francis Group, LLC
438
M e A suReM en t DAtA M o D eLIn G
then the estimate of the drift error and the estimation errors are respectively ∆V∗ = WV βV∗ E( ∆V∗ − ∆V )( ∆V∗ − ∆V )τ = WV DV WVτ E( ∆V∗ − ∆V )τ ( ∆V∗ − ∆V ) = dV tr( DV WVτWV ) Simulation shows that when the offsets σ j ( j = S , P ,Q , S, P ,Q ) are 0.12, 0.009, 0.009, 0.006, 0.0006, 0.0006 (m/s) [2], the RMS of the estimated error of the drifted S, P ,Q in three directions are 0.000924, 0.000924, 0.000923 (m/s). The precision is satisfactory. We also use the method in this section to estimate the slow drift errors in the range rate tracking data of a CW radar system. The time period is 25 s. The drift error in the S direction is illustrated in Figure 6.2. The directions of P,Q are similar with much smaller amplitudes. The results above are consistent with engineering background. The precision of the measurement data is remarkably improved after removing the slow drift error. 6.9 Summary of Radar Measurement Data Processing 6.9.1 Data Processing Procedures
Space radar measurement data processing firstly goes through the procedures called front-end processing (e.g., interpretation, translation, removing abnormal data, time correction, atmospheric refraction
ΔV
0.04
0
–0.04
0
Figure 6.2
100
200 j
300
Error of the slow drift error in the range rate measurement.
© 2012 by Taylor & Francis Group, LLC
400
DAtA F R o M R A DA R M e A suReM en t s
439
correction, etc.). In the following part of this section, we mainly focus on the data processing work after the front-end data processing. Many factors, such as the measurement device itself, weather conditions, phase separation and the surveyors, are likely to produce several abnormal data or even segments of them. Sometimes observations in some period of time may be lost. Therefore, the first thing to do is to analyze abnormal data. The identification and reconstruction of individual abnormal data can follow the method below. The observational data of the distance sum, difference and their change rates at time ti can be expressed as
6.9.1.1 Analysis of Abnormal Data
yi = f (ti ) + ei , i = 1, 2,…, m
(6.148)
where yi is the observational data, f (ti) is the sum of the true signal and the systematic error, and ei is the measurement random error or its sum with the gross error. We express f (t) in the form of a polynomial or polynomial spline, that is, f (t ) =
n
∑ β ψ (t ) i =1
j
j
(6.149)
where (ψ1(t),…, ψn(t)) is a set of known linearly independent basis functions. Denote Y = ( y1 , y 2 ,…, y m )τ X = (xij )m × n , xij = ψ j (t ) β = (β1 , β 2 ,…, βn )τ e = (e1 , e 2 ,…, em )τ Then from Equations 6.148 and 6.149, we can get the linear regression model Y = Xβ + e
© 2012 by Taylor & Francis Group, LLC
(6.150)
440
M e A suReM en t DAtA M o D eLIn G
For model 6.150 and by the method in Section 3.6, we can find the abnormal data Yj( j ∈ N). Remove these abnormal data and the corresponding regression equations, and take Y(N) as the vector after excluding the abnormal data components from Y; X(N), e(N) represent the matrix and the vector after excluding the rows in X and e corresponding to the abnormal data. By Equation 6.150 there is Y ( N ) = X ( N )β + e( N )
(6.151)
From Equation 6.151, we can get the estimate of β as ∧
−1
β( N ) = X ( N )τ X ( N ) X ( N )τ Y ( N )
(6.152)
Based on the estimate of β, w can use ∧
Y j = Xj β,
j ∈N
(6.153)
as the observational data of f(tj), that is, use Yj instead of Yj as the observational data at time tj. 6.9.1.2 Analysis of the Measurement Principle and the Measurement Data The mathematical processing results of measurement data are
closely related to the mathematical model applied. Different mathematical models or different mathematical processing methods will lead to different data processing results. To establish an accurate model with few parameters for the measurement data and make it benefit to estimating the trajectory parameters and the systematic errors, it is very necessary to perform a comprehensive analysis to the measurement data, equipment, measured target, and environmental conditions. The analysis mainly includes the following aspects.
1. Analyze the trajectory parameters (x(t ), y(t ), z(t ), x(t ), y (t ), z(t )) , y(t ), z(t ), x(t ), y (t ), z(t )) . We can see from the engineering background that except for the period of inter-phase separation, x(t), y(t), z(t) should be a two-order (or more) continuously differentiable function. What’s more, the position parameters and velocity parameters are matched by the derivative relationship.
© 2012 by Taylor & Francis Group, LLC
DAtA F R o M R A DA R M e A suReM en t s
4 41
2. Due to the matching relationship between the trajectory parameters, there are strict derivative matching relations between the distance sums and the corresponding change rates, as well as the distance differences and their change rates. And the distance sums and differences are also twoorder (or more) continuously differentiable functions of time t except for the period of inter-phase separation. 3. Analyze the measurement principles. The transmitting station sets out a fixed frequency and measures the range rates using Doppler effect. The measurement data are only affected by the time misalignment from the receiving station. At the same time, we should also take the time misalignment problem into account between the measured distances and their change rates. Analysis of the measurement principle should also consider the constant systematic error of the distance measurements, the frequency drift errors of the distance change rate measurements, and so on. 4. Analyze the environmental conditions, for instance, the error of the trajectory parameters introduced by the site position errors of stations, the error of the refraction correction caused by unclear atmospheric features, the error by the inter-phase separation, and so on. 5. The errors of computation formulas, including the errors induced by the speed of light, approximation of π using limited decimals, and other relevant physical constants. To approximate a nonlinear formula with a linear one also brings errors. 6. The personnel errors due to operation tiredness, physical habits, and so forth. Measurement data modeling is an important prerequisite for successful data processing. The purpose of measurement data modeling is to convert the measurement data mathematical processing problem into the parameter estimation issue. By effectively applying the parameter estimation theory and methods, we can expect the precision improvement of the data processing results.
6.9.1.3 Measurement Data Modeling
© 2012 by Taylor & Francis Group, LLC
4 42
M e A suReM en t DAtA M o D eLIn G
From the point view of modern parameter estimation theories and methods, the measurement data modeling should pay attention to the following aspects. 1. The model should be as accurate (i.e., complete and correct) as possible. Try to avoid the modeling error and make it small enough to be neglected. 2. The model establishment should make (independent) estimated parameters as few as possible, including the parameters to describe the trajectory, systematic errors and those to embody the statistical features of random errors. Models with fewer parameters can help to improve the estimation precision and to identify abnormal data. 3. The established model should make it easy to separate the systematic errors and the trajectory parameters. Only the models, which can ensure the precision of both the system errors and the trajectory parameters, are good models. 4. Use linear or nonlinear models for the situation it needs. The avoidance of using nonlinear models is not generally wise. 6.9.1.4 Estimation of Statistical Features of Random Errors In a smooth flight phase of a not long period of time, the random errors of the spacecraft tracked by a radar can well be described by AR(2) series, and it works for both the distance and velocity measurements. Based on these characteristics, we can express the measurement element of the distance or velocity as the following form:
yi = f (ti ) + ei , {ei } is a stationary AR ( 2) series
(6.154)
Here, f (t) is the sum of the true signal and the systematic error. If we approximate f (t) with a polynomial with a period of about 5 s, we will get yi = a1 + a2 t + + ar t r − 1 + ei
(6.155)
which is a PAR model. Based on the method given in Section 4.6.4 we can give the parameter estimation of the time series {ei } and further
© 2012 by Taylor & Francis Group, LLC
DAtA F R o M R A DA R M e A suReM en t s
443
get the statistical features of random errors. There are also other types of models and methods for analyzing the statistical features of random errors, see Chapter 4. 6.9.1.5 Estimation of True Signal and Systematic Error Since the true signals are matched, it is beneficial for estimating precisely both the true signal and the systematic error to divide the systematic error into matched systematic error and unmatched one. The unmatched systematic error should be estimates and removed in advance. It is required to give the estimation error at the same time of giving the estimation result. For example, it is necessary to present estimation errors of all direction trajectory parameters, those of the total systematic error and the systematic error parameters. To give the estimation error includes the calculation formulas of the estimation error, the simulation results and so on.
Different measurement data models and different data processing methods lead to different data processing results. In addition to the strict and scientific selection of mathematical method, it is very necessary to perform engineering analysis to the results of the measurement data processing. We need to answer some questions through engineering analysis such as
6.9.1.6 Engineering Analysis for Data Processing Results
1. Whether the mathematical model for the measurement data is reasonable? 2. Whether the error sources identified are consistent with the actual situation? 3. Whether the trajectory parameters are consistent with the actual flight situation? 4. Whether the found abnormal data are accurate? 5. Whether the data precision and the actual device accuracy are the same? 6. What kind of models and methods has the data processing results that match the reality best?
© 2012 by Taylor & Francis Group, LLC
444
M e A suReM en t DAtA M o D eLIn G
6.9.2 Basic Conclusions
The CW radar data processing practice shows that: 1. For each measurement element of the CW radar, that is, distance sum and its change rate, distance difference and its change rate, their measurement random error can be described by stationary normal zero-mean AR (2) series. 2. From Theorem 6.6 we know that the more faraway the spacecraft is from the observing station, the greater the error magnification ratio for the measurement element error to the launching reference system conversion will be. 3. The traditional EMBET methods have very prominent problems of ill-conditioned model and modeling error, and the effect of the systematic estimation is not ideal. 4. The applications of matching principle, reduced parameters, nonlinear model, and composite time series models are effective to estimate trajectory parameters, the systematic error and the statistical properties of random errors. 5. Since the trajectory parameters have strong polynomial characteristics, the distance sum and distance difference with the corresponding change rates also have strong polynomial characteristics. It is for this reason that the estimation for some matching systematic error, which can be expressed in lower order polynomial, is rather difficult. Those systematic errors, which can be resolved by engineering methods, should be solved with those methods. At the same time, we should try to use multistation information and optical measurement equipment to reduce the ill-conditioned problem. 6. It is appropriate to represent the trajectory parameters of the boost phase with spline functions; while for the free flight phase, it is recommended to express them with precise trajectory equations. This will not only ensure the precision, but also reduce number of the parameters to be estimated. 7. For the free flight phase, only the measurement elements of one set of MISTRAM system are required to estimate the trajectory parameters and the constant systematic error, provided the exactly precise trajectory equation could be given.
© 2012 by Taylor & Francis Group, LLC
DAtA F R o M R A DA R M e A suReM en t s
445
8. It is possible to apply mathematical methods to estimate and correct systematic error and to reveal the statistical characteristics of random error. But we should pay attention to make the model more accurate and compact by utilizing the engineering background of the trajectory parameters, measurement equipment and environmental conditions. Only in this way the precision could be effectively improved and the data processing results could be more consistent with engineering practice by the mathematical treatment of measurement data. EXERciSE 6 1. Explain the differences and the relations between precision appraisal and precision calibration of the radar measurement system. 2. In the mathematical method for the MISTRAM trajectory determination, it is required to solve a quadratic equation. What is the approach to remove the unneeded one between the two solutions? 3. Try to use the regression analysis theory to illustrate that multimeasured information is beneficial to the systematic error estimation. 4. Make an example to show that reducing the number of estimated parameters is beneficial to estimate precisely the systematic errors and the trajectory parameters provided that the modeling error is small enough to be neglected. 5. Consider tracking a certain satellite with both a laser cinetheodolite and a MISTRAM continuous wave system. Suppose that the laser cinetheodolite has a constant systematic error a1 on the range measurement, and the MISTRAM system has constant systematic errors a2 , a 3 and a 4 on S, P, Q respectively. Suppose that both measurements have random errors. Try to use the measurement data of a period of time to establish the mathematical model and the estimation method of estimating the satellite orbit parameters and the constant systematic errors a1, a2 , a 3, a 4. Give the expressions for the estimation methods, and do simulation calculations.
© 2012 by Taylor & Francis Group, LLC
446
M e A suReM en t DAtA M o D eLIn G
6. Suppose tracking the same spacecraft with three sets of MISTRAM system at different site positions. Each measurement element of each MISTRAM system is considered to have phase shift and the first-order phase drift. No other system errors are taken into account. Try to use the measurement data of a period of time to establish the mathematical model and the estimation method of estimating the systematic errors and the trajectory parameters. Give the expressions for the estimation methods, and do simulation calculations. 7. Through the establishment and analysis of the mathematical model of measurement data and the method of parameter estimation, prove that, for a continuous wave radar system, the more systematic error sources are considered, the more difficult the systematic error estimation will be. 8. Suppose that random errors of the six measurements of the MISTRAM system at the same time obey normal distributions with zero-mean and the mean square error σ S2 , σ P2 , σQ2 , σ S2 , σ P2 , σQ2 respectively. The six random errors are independent. Consider the circumstances that these measurement errors are white noise or stationary AR(1) series. Analyze the efficiency loss problem when we deal with the zero-mean stationary AR(1) series as a zero-mean white-noise series while using polynomial approximation for the trajectory parameters. 9. Prove Theorem 6.4. 10. Using the proof method in Theorem 6.6 to prove that the velocity measurement errors ( ∆x , ∆y , ∆z ) introduced by (ε1, ε2, ε3, δ1, δ2, δ3)τ satisfies the following relation: ∆x δ b 1 1 −1 −1 −1 ∆ y ≈ B − B ⋅ Z ⋅ B b δ 2 2 ∆z δ 3 b3 where Z = (Zij ) , Zij = grad( Bij )B −1 ( ε1 , ε 2 , ε 3 )τ
© 2012 by Taylor & Francis Group, LLC
DAtA F R o M R A DA R M e A suReM en t s
References
4 47
1. Cheneynyck B Φ. Statistical Processing Fundamentals of Radio Tracking Trajectory Measurement. Beijing: Astronautic Publishing House, 1987. 2. Charles L, Carroll J. High accuracy trajectory determination, future space program and impact on range and network development, AAS Science and Technology Series, 15, 1967. 3. AD602799, Investigation of Feasibility of Self-Calibration of Tracking System, 29 May 1964. 4. Jingtian Xiang et al. Dynamic and Static Data Processing: Time Series and Statistics Analysis. Beijing: China Meteorological Press, 1991 (in Chinese). 5. Peiran Gu, Kejun Chen, Li He. Long-range Rocket Ballistics. Changsha: National University of Defense Technology Press, 1993 (in Chinese). 6. Songgui Wang. Linear Models: Theories and Applications. Hefei: Anhui Education Press, 1987 (in Chinese). 7. Xiru Chen, Songgui Wang. Contemporary Regression Analysis: Principles, Methods and Applications. Hefei: Anhui Education Press, 1987 (in Chinese). 8. Bocheng Wei. Contemporary Nonlinear Regression Analysis. Nanjing: Southeastern University Press, 1989 (in Chinese). 9. Shengfu Wang. Spline Functions with Applications. Xi’an: Northwestern Polytechnical University Press, 1989 (in Chinese). 10. Cheney E W. Introduction to Approximation Theory. Shanghai: Shanghai Science and Technology Press, 1979. 11. Rubiao Xie, Peiqing Jiang. Nonlinear Numerical Analysis. Shanghai: Shanghai Jiaotong University Press, 1984 (in Chinese). 12. Mathematics Department, Nanjing University. Numerical Methods for Ordinary Partial Equations. Beijing: Science Press, 1979 (in Chinese). 13. Zhengming Wang. Constant systematic error estimation of continuous wave radar systems. Chinese Space Science and Technology, 1996(1) (in Chinese). 14. Zhengming Wang, Haiyin Zhou. Mathematical method for trajectory determination of the MISTRAM systems. Chinese Space Science and Technology, 1994(4) (in Chinese). 15. Zhengming Wang. A Fast algorithm of selecting optimal regression models. Journal of Astronautics, 1992(2) (in Chinese). 16. Zhengming Wang, Dongyun Yi. Systematic error estimation for the free flight phase. Chinese Space Science and Technology, 1996(6) (in Chinese). 17. Zhengming Wang. On the biased estimation of linear regression models. Journal of Systems Science and Mathematical Sciences, 1995(4) (in Chinese). 18. Brown D C, The Error Model Best Estimation of Trajectory, AD602799, 1964. 19. Zhengming Wang, Dongyun Yi. Time alignment of distance sum and its change rate. Missiles and Space Vehicles, 1996(2) (in Chinese).
© 2012 by Taylor & Francis Group, LLC
448
M e A suReM en t DAtA M o D eLIn G
20. Zhengming Wang. Time unifying and orbit estimation of multi-station tracking data of continuous wave radar system. Chinese Space Science and Technology, 1995(3) (in Chinese). 21. Zhengming Wang, Dongyun Yi. A New method and algorithm of solving trajectory parameters. Journal of Astronautics, 1996(2) (in Chinese). 22. Zhengming Wang, Dongyun Yi. Estimation of slow drift error on the distance change rate. Journal of Astronautics, 1997(4) (in Chinese). 23. Zhengming Wang, Dongyun Yi. On the spline fitting of measurement data. Annuals of Hunan Mathematics, 1995(2) (in Chinese). 24. Jintai Cui. Introduction to Wavelet Analysis. Xi’an: Xi’an Jiaotong University Press, 1995 (in Chinese). 25. Zhengming Wang, Haiyin Zhou. Mathematical processing of distances and change rates tracking data. Chinese Space Science and Technology, 1994(3) (in Chinese).
© 2012 by Taylor & Francis Group, LLC
7 P rEcisE o rBiT d E TErminaTi on of lEo s aTElliTEs B asEd on d ual-frEquEn cy gPs
7.1 Introduction
With the development of space technology, the precision of orbit determination for low earth orbit (LEO) satellites has been continuously improved. The orbit determination at hundred-meter level of precision in the 1960s, which was enhanced to meter level in the 1970s and decimeter level in the 1980s, has currently reached the precision of a centimeter level. The evolution of tracking and measurement techniques provides a solid foundation for the improvement of precision in orbit determination of satellites. Nowadays, tools commonly used for tracking orbits of satellites in the world include global positioning system (GPS), satellite laser ranging system (SLR), Doppler orbitography and radiopositioning integrated by satellite (DORIS), etc. SLR measures the distance from a laser tracking station to a satellite. It can achieve high accuracy of measurement because of the relatively small atmospheric effect on the laser pulse transmission. With all advances in modern tracking technology, the current measurement accuracy of SLR has reached 5–10 mm. SLR can provide precise, unambiguous distance observations from tracking stations to a satellite. By the fusion of these observations and the satellite orbit dynamic model, we can determine the satellite orbit. The main limitations of SLR systems are the sparse geographical distribution of tracking stations and the demanding weather conditions. Presently, there are more than 40 laser ranging stations located in the northern hemisphere, while there are fewer stations in the southern hemisphere [1]. DORIS is a one-way Doppler measurement system from the ground to a satellite. The system was established by CNES (Centre 449
© 2012 by Taylor & Francis Group, LLC
450
M e A suReM en t DAtA M o D eLIn G
National d’Etudes Spariales), IGN (Institut Géographique National), and GRGS (Le Groupe de Recherche de Géodésie Spatiale). It is among the major LEO satellite tracking systems in the world [2]. This system uses a series of ground beacon station to broadcast continuous, full-direction dual-frequency signals at 2036.24 and 401.25 MHz. Through the Doppler frequency shift measurements by the onboard DORIS receiver, the distance change rate of the satellite to the beacons can be calculated with the average accuracy of 0.3 mm/s. The DORIS tracking system has more than 50 stations, which form a more even global distribution. GPS is a space-based satellite navigation system approved by the U.S. Department of Defense. It is developed jointly by the navy, air force, and land forces [3]. The space segment of the GPS system was designed to have 24 satellites that are operating in circular orbits at the height of 20,200 km. The 24 satellites are evenly distributed in six orbital planes with four satellites each, and the orbital planes are separated by 60° (Figure 7.1). Spaceborne GPS applies the SST (satellite-to-satellite tracking) approach to determine the precise LEO orbit by utilizing the uniformly distributed GPS constellation. Compared to the measurement systems using ground satellite tracking stations, the navigation satellite system possesses advantages such as all-weather conditions,
Figure 7.1
The spatial distribution of GPs constellation.
© 2012 by Taylor & Francis Group, LLC
P R E C i S E O R B i T D E T E R M i N AT i O N O F S AT E L L i T E S
4 51
good continuity, high precision, low cost, and broad time–space coverage. With the development of the spaceborne dual-frequency GPS observation technology and the enhancement of data processing level, more and more satellites are equipped with dual-frequency GPS receivers. It has gradually become a research focus and development trend in the international field of aerospace to determine the LEO satellite orbit by means of spaceborne dual-frequency GPS. It dates back to the early 1980s when the research first started on using spaceborne dual-frequency GPS to determine the precise LEO satellite orbit. The precision of orbit determination also rises with the improvement of observation accuracy and the refinement of the satellite orbit dynamic model. In particular, through the TOPEX satellite precise orbit determination experiment, JPL (Jet Propulsion Laboratory) and CSR (The University of Texas at Austin, Center for Space Research) determined TOPEX satellite orbit by using dualfrequency GPS observations, and the radial precision has reached 3 cm. It adequately shows the feasibility and precision of the use of dual-frequency GPS satellite for LEO satellite orbit determination, as well as its potential for future development [4]. Currently, spaceborne dual-frequency GPS has made gratifying achievements on the applications of TOPEX, CHAMP, GRACE, and other satellites. However, with the development of future hardware, the rising demands of scientific research, and engineering applications to the satellite orbit precision, there are still some issues worthy of discussion and research in the spaceborne GPS precise orbit determination techniques, such as small cycle slip detection in phase data, the calculation of the nonconservative forces in the dynamic orbit model, and the compensation of unknown perturbations. This chapter focuses on the mathematical processing methods related to the dual-frequency GPS-based LEO satellite precise orbit determination technology, including mainly the spaceborne dualfrequency GPS observational data preprocessing method and the zero-difference reduced dynamic orbit determination method. 7.2 Spaceborne Dual-Frequency GPS Data Preprocessing
In the spaceborne dual-frequency GPS dynamic measurement process, the observational data often contain outliers due to accidental
© 2012 by Taylor & Francis Group, LLC
452
M e A suReM en t DAtA M o D eLIn G
influences from a variety of tracking channels. The presence of outliers can seriously affect the precision and reliability of results in orbit determination. The data preprocessing thus becomes a premise for the spaceborne dual-frequency GPS precise orbit determination. The objects of GPS data preprocessing are pseudocode and carrier phase, including outlier removal and cycle slip detection. The focus is usually on the latter, since numerical experiments have shown that artificial increase or decrease of a cycle slip will influence the positioning at decimeter level [5]. 7.2.1 Basic Observation Equations
Dual-frequency GPS observations include carrier phase data L1, L2; pseudocode data P 1, P 2; and CA. The basic observation equations can be expressed as C Aj = ρ j + c (δt r − δt j ) + I j + εCj A P1 j = ρ j + c (δt r − δt j ) + I j + ε Pj1
P2j = ρ j + c (δt r − δt j ) + α ⋅ I j + ε Pj2 j 1
j
j
j 1
j
L = ρ + c ( δt r − δt ) + λ 1 N − I + ε
j L1
(7.1)
L2j = ρ j + c (δt r − δt j ) + λ 2 N 2j − α ⋅ I j + ε Lj2 where ρ j is the true geometric distance from the receiver antenna phase center to the transmitting antenna phase center of GPS satellite j, δtr is the receiver clock error, δt j is the GPS satellite clock error, α is the factor describing the relationship between the ionospheric delay and the two frequencies, and α = f 12 / f 22 . I j is the ionospheric delay at frequency f 1, λ1 and λ2 are carrier wavelengths, and N 1j and N 2j are zero-difference phase integer ambiguity. ε j is the sum of the observation errors, including multipath errors, thermal measurement noise, channel delay, electronic signal interference, etc., and ∙ represents any type of observation. 7.2.2 Pseudocode Outliers Removal
Several methods are combined to improve the reliability of outlier removal. These are the threshold method of signal-to-noise ratio, the
© 2012 by Taylor & Francis Group, LLC
P R E C i S E O R B i T D E T E R M i N AT i O N O F S AT E L L i T E S
453
threshold method of ionospheric delay, the fitting residual method of ionospheric delay, and the method of monitoring receiver autonomous integrity [6,7]. The observational data often contain the signal-to-noise ratio data S1 and S2 of different frequency carriers. In GPS receivers, the signal-to-noise ratio is generally defined as [8] 7.2.2.1 Threshold Method of Signal-to-Noise Ratio
SNR 1/ 2 S1/ 2 = 20 log 10 2
(7.2)
By setting the signal-to-noise ratio threshold, we can directly remove a part of the observational data with rather low signal-to-noise ratio. 7.2.2.2 Threshold Method of Ionospheric Delay Dual-frequency P codes can be used to construct the ionospheric residual combination PI j to eliminate the impact of the geometric distance and the clock offset and extract the ionospheric propagation delay between the GPS satellites and the user satellite
PI j =
f 22 ⋅ ( P1 j − P2j ) 2 2 f 2 − f1
(7.3)
The magnitude of the ionospheric delay depends on the number of electrons in the propagation path of GPS signals, while the electron density distribution is related to the altitude of the user satellite, which is also used as the basis to set the threshold Imax of the ionospheric delay in the ionospheric delay threshold method. By judging whether PI j is beyond tolerance, the outliers are removed. As the ionospheric delay varies in a large range, this method can only be used to eliminate some of the relatively large data outliers in the observed pseudocode. The ionospheric delay has the characteristics of changing slowly with time. In a continuous tracking process, we can further remove the pseudocode outliers according to the magnitude of the polynomial fitting residuals to the ionospheric delay.
7.2.2.3 Fitting Residual Method of Ionospheric Delay
© 2012 by Taylor & Francis Group, LLC
454
M e A suReM en t DAtA M o D eLIn G
The Vondrak filter employs the sum of the square of the third-order differences to reflect the degree of smoothing and the smoothing factor to control the approximation and the smoothness of the fitting curve. It is able to smoothen the observational data reasonably without the knowledge of the form of the fitting function. It has impressive filtering performance and has been applied extensively and effectively in astronomy, surveying, mapping, and other fields. For the measurement series (xi , yi′), the idea of smoothing in the Vondrak filter is to minimize the following formula (7.4)
Q = F + λ 2S → min
where λ2 is a constant to control the impact of the goodness of fit F and smoothness S on the filter. F and S are generally defined as 1 F = N −3
N
∑ i =1
pi ( yi − yi′)2 , S =
1 s−r
s
∫ | ϕ(x) | dx 2
r
(7.5)
where N is the number of observation data, yi = φ(xi), φ; is the smoothing curve, yi is the Vondrak filtering value to be estimated, and pi is the weight of the observational data. The Vondrak filter uses a cubic Lagrange polynomial Li(x) to represent the smoothing curve between the two internal points (xi+1, yi+1) and (xi+2, yi+2): 3 ( x − xi + j ) ⋅ yi + k j = 0, j ≠ k (xi + k − xi + j ) k=0 3
Li (x ) =
∑ ∏
i (x ) = L
1 6 ⋅ yi + k j = 0, j ≠ k (xi + k − xi + j ) k=0 3
3
∑ ∏
(7.6)
where L(x) is only defined within [x2, xN−1], so s and r are defined as s = xN−1 , r = x2. S =
© 2012 by Taylor & Francis Group, LLC
N −3
∑ (a y + b y i =1
i i
i i +1
+ c i yi + 2 + d i yi + 3 )
2
(7.7)
P R E C i S E O R B i T D E T E R M i N AT i O N O F S AT E L L i T E S
455
where ai = bi =
6 xi + 2 − xi + 1 (xi − xi + 1 )(xi − xi + 2 )(xi − xi + 3 )
( xi + 1
6 xi + 2 − xi + 1 − xi )(xi + 1 − xi + 2 )(xi + 1 − xi + 3 )
( xi + 2
6 xi + 2 − xi + 1 − xi )(xi + 2 − xi + 1 )(xi + 2 − xi + 3 )
( xi + 3
6 xi + 2 − xi + 1 − xi )(xi + 3 − xi + 1 )(xi + 3 − xi + 2 )
ci = di =
According to Equation 7.4, the necessary condition for yi to be the solution for the Vondrak filter is ∂Q ∂F ∂S = + λ2 = 0, i = 1, 2,…, N ∂yi ∂yi ∂ yi
(7.8)
Suppose λ ≠ 0. Let ε = 1/λ2. Then, from Equation 7.8 we can get an N-order linear equations of yi: 3
∑A k = −3
ki
y k + i = Bi yi′, i = 1, 2,…, N
(7.9)
where A−3i = ai−3di−3, A−2i = ai−2 ci−2 + bi−3 di−3, A−1i = ai−1 bi−1 + bi−2 ci−2 + ci−3 di−3, A0i = εpi /( N − 3) + ai2 + bi2− 1 + ci2− 2 + d i2− 3 , A1i = aibi + bi−1 ci−1 + ci−2 di−2 , A 2i = aici + bi−1 di−1, A 3i = aidi , Bi = εpi /(N − 3). When m ≤ 0 or m ≥ N − 2, we have am = bm = cm = dm = 0. Therefore, when k + i ≤ 0 or k + i ≥ N + 1, Aki = 0, and A is a 7*7 matrix of diagonal band-type. ε is the smoothing factor whose selection is related to the signal-varying frequency and the error magnitude. The smaller the ε is, the smoother the curve will be. When ε → 0, the smoothing curve will be a very smooth parabola and when ε → ∞, we get an interpolation curve of the raw observational data.
© 2012 by Taylor & Francis Group, LLC
456
M e A suReM en t DAtA M o D eLIn G
By expressing the observation model in vectors, we obtain (7.10)
Y′ = Y + e
where Y ′ is the observational data vector, Y is the corresponding real data vector, and e is the observation error vector. Suppose ei ∼N (0, σ i2 ). Let the observation weight be pi = 1 /σ i2 . Denote Y as the Vondrak filter estimation of Y, then 1 Y = a r g mi n Y N −3
N
∑ i =1
2 yi − yi′ 2 + λ S σ i
(7.11)
We define the estimated residual function r as r (Y ) = Y − Y ′
(7.12)
We can detect the gross error based on the magnitude of the estimated residuals r (Y ), but only if the estimated Y is reliable. If the observational data contain gross error, then the estimation results near the gross error will all be affected; thus, the estimated Y is not reliable and the location of the gross error cannot be accurately identified by the magnitude of the estimated residuals. We can solve this problem by introducing a penalty function to the original Vondrak filter estimation model 7.11 and getting a robust Vondrak filter model. The new model can give the robust estimate Y M of the original signal Y when it is contaminated by the gross error. Furthermore, the gross error detection can be carried out based on the magnitude of the estimated residuals r (Y M ) [6,9]. The penalty function ρ can be defined as 1, ρ(r ) = (ρ1 ,. . ., ρN )T , ρi = 0,
ri ≤ c 0 ⋅ σi ri > c 0 ⋅ σi
, ri = yi − yi′ (7.13)
where c 0 is a constant and is usually taken as 2 ~ 4. We define the index set as ν(Y ) = {i : ρ( yi − yi′ ) = 0}, ν(Y ) = {i : ρ( yi − yi′ ) = 1} (7.14)
© 2012 by Taylor & Francis Group, LLC
P R E C i S E O R B i T D E T E R M i N AT i O N O F S AT E L L i T E S
4 57
thus, we can obtain the robust Vondrak filter estimation of the original signal Y: 1 Y M = arg min Y N −3
N
∑ i =1
2 yi − yi′ 2 + S ρi ⋅ λ σi
(7.15)
In the solution process of model 7.15, an iterative algorithm is required. The specific steps are as follows: Step 1. Let k = 0, v(0) = ϕ (empty set). By substituting ρ(0) ≡ 1 into Equation 7.15, we can calculate the initial point Y (M0 ) . Step 2. Calculate the estimated residual r ( k ) = Y (k-1) − Y ′, the M
penalty factor ρ(k), and the index set v(k) of the k th iteration. Compare the index sets v(k) and v(k−1). Step 3. If v(k) = v(k−1), terminate the iteration, and v(k) is the corresponding gross errors index set. Step 4. If v(k) ≠ v(k−1), substitute ρ(k) into Equation 7.15, and solve Y (Mk ) . Let k = k + 1 and return to Step 2.
In practice, the parameter σ can be determined in the following two ways. One way is the rule of thumb, and σ does not change during the iteration; the other way is to calculate it self-adaptively based on the estimated residual r(Y ) of the robust Vondrak filter, and that will continuously update σ in the iteration process. The estimate of σ(k) by the MAD (mean absolute deviation) method in the kth iteration is σ
(k)
=
med ni = 1 (| ri( k )|) 0.6745
(7.16)
where the med function takes the middle of the sorted elements of the series. The MAD method performs better when the observation noise is close to normal distribution. In addition, σ(k) can also be estimated using the following formula:
σ( k ) =
∑ (r ( k − 1)
i ∉ν
n
(k) 2 i
)
(7.17)
We now take the gross error detection test by examining a set of data for the CHAMP satellite on February 3, 2006. First, we extract
© 2012 by Taylor & Francis Group, LLC
458
M e A suReM en t DAtA M o D eLIn G
the receiver continuous tracking data to a certain GPS satellite and construct the dual-frequency P codes ionospheric combination. Second, by using the robust Vondrak filter, we give the robust estimation of the ionospheric delay and then eliminate the tendency term of the ionospheric delay and extract the fitting residuals. Finally, we detect the gross errors by judging whether the magnitude of the residual exceeds the threshold. Figures 7.2 and 7.3 show the gross error detection results for a particular continuous tracking arc from GPS satellite PRN10 and PRN18, respectively. It can be seen that there is no gross errors in the data from Satellite PRN10, and the algorithm can converge after the first iteration; the data from PRN18 contain two gross errors and the algorithm converges after three iterations. The above results can validate the effectiveness of the robust Vondrak filter. The three methods discussed above employ the observation information from only one single channel while they do not make use of information from multiple channels. Thus, we can further use the receiver autonomous integrity monitoring (RAIM) method to test the consistency of measurements from multiple channels of one satellite [10,11]. Assume
7.2.2.4 Method of Monitoring Receiver Autonomous Integrity
Dual-frequency ionosphere P code observations/m
12
Observations Robust Vondrak filter fit curve
11 10
Figure 7.2
9
When iteration step is 1, computation converges
8 7 6 5 4
0
500
1000 Time/s
1500
Gross error detection of dual-frequency P codes from the GPs satellite PRN10.
© 2012 by Taylor & Francis Group, LLC
P R E C i S E O R B i T D E T E R M i N AT i O N O F S AT E L L i T E S
Dual-frequency ionosphere P code observations/m
25
Observations Robust Vondrak filter fit curve Gross errors detection result
20
15
When iteration step is 3, computation converges
10
5
0
Figure 7.3
459
0
500
1000 Time/s
1500
Gross error detection of dual-frequency P codes from the GPs satellite PRN18.
that at the observation time t, the linearized GPS observation equations can be expressed as Z (t ) = G(t ) ⋅ X (t ) + ε(t )
(7.18)
where Z is the difference between the pseudocode observed and estimated values; G is the observation geometry matrix; X is the user state vector to be estimated, including the three position components and the receiver clock bias; and ε is the observation noise vector, and also assume that COV( ε) = σ ⋅ I n . The least squares estimation for X is X = (GT G )−1GT Z
(7.19)
The estimated residual for the pseudocode can be expressed as ω = [ I n − G(GT G )−1GT ] ⋅ Z
(7.20)
Define the statistical testing indicator as T =
© 2012 by Taylor & Francis Group, LLC
ωT ω n−4
(7.21)
460
M e A suReM en t DAtA M o D eLIn G
where T is the estimation of the pseudocode observation noise standard deviation σ. The value of T will be significantly larger when there is failure in one of the receiver observation channel. The RAIM algorithm completes the self-test of the channel failure based on this feature. This algorithm includes two functions: fault detection and troubleshooting. The fault detection function detects whether there is a fault channel by judging if statistics T is beyond the tolerance. At least five satellites should be seen to perform the fault detection. We apply the RAIM algorithm to process the ionosphere-free pseudocode combination P IF of the CHAMP satellite. Figure 7.4 shows the changes of the statistic variable T, whose RMS is this time about 1 m, and it is desirable to pick the fault detection threshold as 3 m (3 σ). The troubleshooting function further finds out which channel has failed. If the statistics T is over the predefined threshold, then the channels need to be investigated one by one to identify the main fault channel. If necessary, the treatment can be repeated to remove multiple fault channels at one instant. Troubleshooting requires at least six visible satellites. Figure 7.5 shows the result of fault channel detection. It shows that after troubleshooting, the T value reduces significantly. Among the channels, PRN11, PRN20, PRN21, and PRN24 are faultintensive. This is because some channels of the CHAMP satellite receiver occasionally has about 15 m code deviation, which will lead Statistic variable T of RAM algorithm/m
10
Figure 7.4 2006.
9 8 7 6 5 4 3 2 1 0
0
1
2
3
4
5
Time (104 s)
6
7
8
The statistic variable T of RAIm algorithm for the CHAmP satellite on February 2,
© 2012 by Taylor & Francis Group, LLC
4 61
Statistic variable T/m
PRN number of fault satellite
P R E C i S E O R B i T D E T E R M i N AT i O N O F S AT E L L i T E S
30 PRN24
PRN21
20
PRN24 PRN20 PRN11
10 0
0
12
1
2
3
4
5
6
7
8
On the epoch when fault channels happen On the epoch when fault channels have been detected and deleted
9 6 3 0
0
Figure 7.5
1
2
3
4
Time/104 s
5
6
7
8
The result of fault channel detection for the CHAmP satellite on February 2, 2006.
to the same magnitude of deviation to the all three types of code observations simultaneously. The deviation cannot be recognized only by a single channel data preprocessing [12]. This kind of deviation is mainly caused by the CHAMP satellite receiver software, and it happens twice on an average in the daily 400 continuous tracking arcs [6,12]. 7.2.3 Carrier Phase Outliers Removal and Cycle Slip Detection
High-precision GPS applications need to use the phase measurement information, but there exists the impact of the cycle slips and ambiguities. In the high dynamic environment of the spaceborne GPS processing, the probability of the occurrence of cycle slips of the receiver is larger than on the ground-based ones, and the speedy movement of the small satellite makes it more difficult to detect cycle slips. During static measurements on the ground, the receiver can lock the tracking to the same GPS satellite with longer continuous time period, and the signal changes slowly, which leads to easier cycle slip detection. However, in the high dynamic spaceborne GPS
© 2012 by Taylor & Francis Group, LLC
462
M e A suReM en t DAtA M o D eLIn G
environment, the flight of the small satellite leads to a shorter lock time. For example, the ground-based measurements can last multiple hours continuously, while for the CHAMP satellite, the average tracking time is only 30 min. The signal changes rapidly and cycle slip detection becomes challenging. Especially, the detection of small cycle slips is a hot research issue in the application of spaceborne GPS. At present, for small cycle slips lower than five cycles, the higherdegree difference method, the polynomial fitting method described in Ref. 13, and other methods are not as applicable to the spaceborne environment as to the ground static measurement [14]. To improve the reliability of cycle slip detection, we can combine the application of several methods, such as the M–W (Melbourne– Wuebbena) combination epoch difference method, the ionospherefree ambiguity epoch difference method, the cumulative sum method, etc. [6,7]. The M–W combination is acquired by creating a difference between the wide-lane phase and the narrow-lane pseudocode:
7.2.3.1 M–W Combination Epoch Difference Method
N wj = N 1j − N 2j +
j j f 1ε Pj1 + f 2 ε Pj2 1 f 1ε L1 − f 2 ε L2 − λw f1 − f 2 f1 + f 2
(7.22)
where λw is the wide-lane wavelength, λw = c/( f 1 − f 2) ≈ 86.2 cm. The M–W combination eliminates the impact of the geometric distance, clock bias, and ionospheric delay, and it also increases the wavelength and becomes insensitive to the pseudocode noise. Consequently, it is commonly used in cycle slip detection. During one continuous tracking of the receiver to a certain GPS satellite j, if the carrier cycle slip does not occur, then N wj should remain unchanged. The discontinuity of N wj can be exploited to identify outliers and cycle slips in the carrier phases [13]. Set the threshold Tslip and calculate the epoch difference of N wj between adjacent moments. Then, at instant ti , the cycle slip detection conditions are N wj (ti ) − N wj (ti − 1 ) ≥ Tslip
© 2012 by Taylor & Francis Group, LLC
and
N wj (ti + 1 ) − N wj (ti ) < Tslip
(7.23)
P R E C i S E O R B i T D E T E R M i N AT i O N O F S AT E L L i T E S
463
And the outlier detection conditions are N wj (ti ) − N wj (ti − 1 ) ≥ Tslip and N wj (ti + 1 ) − N wj (ti ) ≥ Tslip
(7.24)
Suppose the magnitude of the observation noise of the dualfrequency P codes is σ(P). By ignoring the impact of the phase observation error, the following equation gives the transfer relationship from the pseudocode observation error to the magnitude of the N wj error:
( )
( )
1
j j 2 2 2 2 2 1 f 1 ⋅ σ P1 + f 2 ⋅ σ P2 j σ( N w ) = ≈ 0.83 × σ( P ) (7.25) 2 λ w ( f1 + f 2 )
It shows that to some extent the M–W combination has a suppressing effect on the pseudocode observation noise, but it is still influenced by the pseudocode observation accuracy. The capability of this method for small cycle slip detection is still limited. Assuming that the combination precision σ( N wj ) is one cycle, the corresponding detection level of 4σ is four cycles. If the observational data on L1 and L2 contain cycle slips of the same size at the same time, then the M–W combination method cannot work. The ionosphere-free combination is another combination often used in satellite orbit determination. Due to the greater impact of this type of cycle slips on data processing, we need to add a step of cycle slip detection based on the ionosphere-free combination other than the M–W combination method. Initially, we calculate the ionosphere-free ambiguity λ IF AIFj by j of the pseudocode using the ionosphere-free combination PIFj and LIF and the phase: 7.2.3.2 Ionosphere-Free Ambiguity Epoch Difference Method
j λ IF AIFj = LIF − PIFj − ε LjIF + ε PjIF
(7.26)
Then the discontinuity of λ IF AIFj can be used to identify outliers and cycle slips of the carrier phase. As the measurement noise of λ IF AIFj is about four times the magnitude of the M–W combination
© 2012 by Taylor & Francis Group, LLC
464
M e A suReM en t DAtA M o D eLIn G
value, this method can only be used to detect larger systematic deviation. 1
f 2 ⋅ σ 2 ( P1 j ) + f 22 ⋅ σ 2 ( P2j ) 2 σ(λ IF AIFj ) = 1 ≈ 3 × σ( P ) ( f 1 − f 2 )2
(7.27)
Owing to the slight impact of small cycle slips on the M–W combination offset, it is difficult to be detected only by the epoch difference between adjacent moments. Therefore, the idea of CUSUM (cumulative sum) algorithm is introduced [7,15,16]. In addition to the M–W combination, the “Q statistic” is constructed to eliminate the effect of unknown M–W combination mean value. And through the accumulation of Q statistic, the small offset quantities of a number of epochs after the cycle slip occurrence are added up in order to amplify its effect and to improve the sensitivity of small cycle slip detection. The construction of Q statistic is [17]: 7.2.3.3 Cumulative Sum Method
(i)
Qi = i − 1 N wj (ti ) =
1 i
i
∑ k =1
1/ 2
j
[ N wj (ti ) − N w (ti − 1 )], i = 2, 3,…, n
N wj (t k ), i = 1, 2,…, n
(7.28)
When a cycle slip occurs, the process mean values of the M–W combination will shift. Assume that the mean of the original process begins to have a shift δσ = μ1 – μ0 from the m + 1-th observations, then the
new statistics Qi will have a shift of mm+1 δσ at m + 1. We can use Qi as the cumulative sum to detect the process mean value continuously. Ci+ = max{0,Qi + Ci+− 1 − K } Ci− = max{0, −Qi + Ci−− 1 − K }, i = 2, 3, . . ., n
(7.29)
The initial values are C1+ = 0 and C1− = 0. Since we do not know whether the cycle slip is positive or negative, the bilateral CUSUM test should be performed in the detection. The upper unilateral
© 2012 by Taylor & Francis Group, LLC
P R E C i S E O R B i T D E T E R M i N AT i O N O F S AT E L L i T E S
465
cumulative sum is used to inspect the increase of the series mean, while the lower cumulative sum can detect the decrease of the mean. As long as Ci+ > H (C k+ ≤ H , k = 1, 2, . . ., i − 1) or Ci− > H (C k− ≤ H , k = 1, 2, ..., i − 1) , the alarm is triggered and a cycle slip is supposed to happen. Denote the instant with the last zero cumulative sum as τ; then τ + 1 is considered to be the possible starting point of the shift in the process. We can say that cycle slip has happened at an epoch within the interval of [τ + 1, i], and τ + 1 is the most likely instant. The steps to detect small cycle slips with the CUSUM method are as follows: Step 1. Calculate iteratively the standard deviation σ of the M–W combination series by using the epoch difference of the series. Step 2. Determine the parameters for the CUSUM algorithm: K = δσ/2, H = 5σ. Step 3. Construct the Q statistic based on the M–W combination series. Step 4. Detect the small cycle slips by using the CUSUM test statistics constructed by the Q statistic. We use the CHAMP satellite data to verify the effectiveness of the CUSUM method to detect small cycle slips. Figure 7.6 is the M–W combination series N w2 (1),…, N w2 (173) of GPS satellite PRN02 for the continuous tracking arc from 9:27:00 to 9:55:40. We add an artificial cycle slip with the magnitude of one cycle at the 100th epoch, and then a new M–W combination series can be obtained as N w2 (1),…, N w2 (173) . After calculation of the standard deviation of the M–W combination series, we get σ ≈ 0.348. Calculate the parameters K = δσ/2 = 1/2, H = 5σ ≈ 1.740. Then calculate the Q statistic Qi and the upper unilateral cumulative sum Ci+. Figure 7.7 shows the results of cycle slip detection. We can see that the upper unilateral cumulative sum at the + 103rd epoch C103 = 1.899 > H and it sends a warning signal (shown in the figure as “Δ,” and 2—normal, −2—alarm). The last cumulative sum Ci+ = 0 before the alarm is at the epoch of 99, which means a cycle slip happens somewhere between 100 and 103, and the most likely situation is that it occurs at the 100th epoch.
© 2012 by Taylor & Francis Group, LLC
466
M e A suReM en t DAtA M o D eLIn G
–28709979 –28709979.5 –28709980 –28709980.5 –28709981 –28709981.5 –28709982
Figure 7.6
22
0
20
40
60
80
100
120
140
160
180
The m–W combination series of the GPs satellite PRN02 for the CHAmP satellite.
Q statistic variable The upper side cumulative-sum Alarm signal
20 18 16 14 12 10 8 6 4 2 0 –2
0
Figure 7.7
20
40
60
80
100
120
140
160
180
The cycle slip detection result for the CHAmP satellite using the CUsUm method.
7.2.4 Data Preprocessing Flow
Step 1. Apply the signal-to-noise ratio threshold method. Set the SNR threshold to exclude the low SNR observational data. Step 2. Apply the ionospheric delay threshold method. Set the threshold of ionospheric delay Imax, by judging if PI j is beyond
© 2012 by Taylor & Francis Group, LLC
P R E C i S E O R B i T D E T E R M i N AT i O N O F S AT E L L i T E S
467
the tolerance to remove some relatively large outliers in the pseudocode data. Step 3. Apply the ionospheric delay fitting residual method. Fit the ionospheric delay by using the robust Vondrak filter, and remove some relatively small outliers in the pseudocode data according to the magnitude of the fitting residuals. Step 4. Apply the RAIM algorithm. Examine the consistency among the multichannel pseudocode data at the same moments, identify the fault channel, and output the results of the single point geometric positioning. Step 5. Perform the Lagrange interpolation to the positioning results of adjacent moments and calculate the matching velocity. Calculate the observational elevation angles and Azimuth angles of visible GPS satellites. Step 6. Apply the M–W combination epoch difference method and the ionosphere-free ambiguity epoch difference method to detect the relatively large cycle slips in the phase data. Step 7. Apply the CUSUM method to detect the relatively small cycle slips in the phase data. Step 8. Output the observational data files after preprocessing, including the raw data, the preprocessing flags, the user satellite approximate orbital positions and velocities, the single point positioning geometric precision factors, and the lines of sight information. 7.3 Orbit Determination by Zero-Difference Reduced Dynamics
Compared with the double-difference method, the zero-difference method does not need observations from ground tracking stations to construct double-difference observational equations, but needs the support of accurate positions of GPS satellites and clock correction data [18]. The method has been successfully verified in CHAMP, GRACE, and other LEO satellite applications. In recent years, with the improvement of the precision of GPS ephemeris, the zero-difference method has been able to achieve precision comparable with the double-difference method, and has been widely studied and applied. Depending on whether the information of orbit dynamic force model is used, methods of orbit determination can be divided into
© 2012 by Taylor & Francis Group, LLC
468
M e A suReM en t DAtA M o D eLIn G
kinematic methods and dynamic methods. The spaceborne dualfrequency GPS has the advantages of full coverage, continuity, etc. It attains not only kinematic orbits [5,19] but also dynamic orbits [20]. In contrast, SLR and DORIS can only be used for dynamic orbit determination in general. Kinematic methods determine orbit positions of satellites through the geometric intersection principle using pseudorange and phase observations. Compared with dynamic methods of orbit determination, kinematic methods of orbit determination do not have constraints from models of orbit mechanics and cannot predict orbits. However, it does not have many parameters in orbit dynamic models to solve and does not have many numerical integrals to compute either. Therefore, the computational workload of kinematic methods is small and their speed is relatively fast. Kinematic methods of orbit determination are particularly sensitive to erroneous measurements, unfavorable viewing geometry, and data outages, which sometimes restrict their value in practice. Dynamic methods of orbit determination, in contrast, constrain the position estimates of satellites using models of orbit mechanics. This smoothens measurement data at various time points and improves the precision of orbit determination. Moreover, orbit integrals can cross accidental gaps in tracking data and predict orbits through extrapolation. Dynamic methods of orbit determination take full consideration of various perturbation models of forces during the flying process of satellites. However, it is currently difficult to model all perturbations that LEO satellites have. Satellites at different altitudes or in different shapes bear different perturbations. The computation of perturbations like solar radiation pressure and atmospheric resistance is complicated. In order to overcome such disadvantages, Thomas et al. [21] integrated merits of both kinematic and dynamic methods and proposed the method of reduced dynamics. The method of reduced dynamics solves many parameters in empirical acceleration using globally continuous coverage and precise GPS observations. It reflects subtle fluctuations of errors in dynamic models in terms of empirical acceleration by reducing the dependence of the method of orbit determination on the precision of dynamic models. Currently, the method of reduced dynamics is adopted in most orbit determination of LEO using spaceborne dual-frequency GPS [22−24].
© 2012 by Taylor & Francis Group, LLC
P R E C i S E O R B i T D E T E R M i N AT i O N O F S AT E L L i T E S
469
7.3.1 Observational Equations and Error Correction
Observational equations use pseudocodes, and dual-frequency ionosphere-free (subscript “IF”) combination observations are always adopted. PIFj =
f 12 f 22 j P − P j = ρ j + c (δt r − δt j ) + ε PjIF 2 2 1 2 2 2 f1 − f 2 f1 − f 2
f 12 f 22 j L = 2 L − 2 L2j = ρ j + c (δt r − δt j ) + λ IF AIFj + ε Lj IF 2 1 2 f1 − f 2 f1 − f 2 j IF
(7.30) where P IF is an observation of ionosphere-free combinational pseudocodes, LIF is an observation of ionosphere-free combinational phase, f i is the ith carrier frequency, ρ j is the true geometric distance from an LEO satellite to the jth GPS satellite, δtr is receiver clock correction, δt j is GPS satellite clock correction, λIF AIF is the fuzziness of an ionosphere-free combinational phase, and εPIF and εLIF are sums of observational errors of two combinations and modeling errors. Signals received by a receiver at time t are actually sent by the jth GPS satellite at time t – τ j, where τ j is the propagation delay of signals. The geometric range is ρ j (ti , τ j ) = r (ti ) − rGj (ti − τ j )
(7.31)
j j where r(ti) is the position of an LEO satellite and rG (ti − τ ) is the position of a GPS satellite, and τ j can be obtained by iterative calculations. The kth iterative equation is τ kj = ρ j t , τ kj − 1 /c, where c is the speed of light. When the initial value τ 0j = 0 , the iteration converges under the condition of τkj − τkj −1 ≤ 1e − 7. The iteration process usually converges after two steps. The International GNSS Service (IGS) currently provides customers GPS data of precise ephemeris and clock corrections. The precision is better than 5 cm where the interval of clock error products is 30 s and the ephemeris interval is 15 min. Linear interpolations can be used for clock error products and 8-order moving Lagrange interpolations can be used for precise ephemeris
© 2012 by Taylor & Francis Group, LLC
47 0
M e A suReM en t DAtA M o D eLIn G
products. IGS precise ephemerides and clock errors can be downloaded from http://igscb.jpl.nasa.gov/components/ prods_cb.html. 7.3.1.1 Relativity Adjustments
∆ρrel = −
The relativity adjustment formula is
2 GM ⋅ a 2 ⋅ e ⋅ sin E = − (x ⋅ x + y ⋅ y + z ⋅ z ) c c
(7.32)
where GM is the constant of Earth’s gravity, c is the speed of light, a is the semimajor axis of a satellite’s orbit, e is the orbit eccentricity, and E is the orbit partial anomaly. For GPS satellites, the relativity influence can sometimes be around 15 m and cannot be ignored (see Figure 7.8). Although the relativity influence is around 0.5 m for CHAMP satellites (see Figure 7.9), it is ignored in orbit determination. The relativity adjustment of an LEO satellite is absorbed by parameter estimation in clock corrections of a receiver. The reference point of ephemeris of a GPS satellite is the mass center of the satellite. The reference point of the observational data is the phase center of transmitting antennas. Therefore, it is necessary to correct offset for the phase center of transmitting antennas (Table 7.1). The antenna center offsets, x, y, z, are defined in the GPS satellite body coordinate system (see Figure 7.10) [25], where the z-axis of the
7.3.1.2 Antenna Offset Corrections for GPS Satellites
Relativity correction/m
20
GPS satellite PRN02
10
0
–10
–20
0
1
2
3
4 Time/104
Figure 7.8
Relativity adjustments for GPs satellites.
© 2012 by Taylor & Francis Group, LLC
5 s
6
7
8
P R E C i S E O R B i T D E T E R M i N AT i O N O F S AT E L L i T E S
CHAMP satellite
0.6 Relativity correction/m
471
0.4 0.2 0 –0.2 –0.4 –0.6 –0.8
Figure 7.9
0
1
2
3
4
5
Time/104 s
6
7
8
Relativity adjustments for CHAmP satellites.
Table 7.1 Antenna Phase Center offset for GPs satellites in Their Body Coordinate system Block II/IIA Block IIR
x/m
y/m
z/m
0.279 0.000
0.000 0.000
1.023 0.000
satellite always points toward the center of the Earth, the y-axis is perpendicular to the vector between the Sun and the GPS satellite, and the x-axis completes the right-handed coordinate system. During the motion of the satellites, the z-axis is always pointing to the Earth, and the y-axis shall be kept perpendicular to the solar vector. The solar panel can be rotated around its axis to keep it perpendicular to the ray of the Sun for optimal collection of solar energy. x
y
z
Figure 7.10 The definition of body coordinate system for GPs satellite.
© 2012 by Taylor & Francis Group, LLC
47 2
M e A suReM en t DAtA M o D eLIn G
The axis vector of the GPS satellite body coordinate system, ex, ey, ez, can be calculated in the J2000 Earth inertial coordination from the following formula ez = −
rG e × rs , ey = z , ex = e y × ez rG ez × rs
(7.33)
where rG is the position of the GPS satellite and rs is the position of the Sun provided by JPL solar system ephemeris. Then, the vector of antenna offset correction for a GPS satellite is ∆rG = x ⋅ ex + y ⋅ e y + z ⋅ ez 7.3.1.3 Antenna Offsets for LEO Satellites
by antenna offset of an LEO satellite is ∆ρant =
(7.34)
The range correction caused
rG − r ⋅ M ⋅ ∆rbody rG − r
(7.35)
where r and rG respectively denote positions of an LEO satellite and a GPS satellite in the J2000 Earth inertial coordinate, and Δr body is the vector of antenna offset of an LEO satellite in body coordinate system (Table 7.2). MJ2000 is the rotation matrix from a body coordinate to a J2000 Earth inertial coordinate and can be calculated by attitude quaternion q q = = q1 q 4
q2
q3
T
q4
Table 7.2 Antenna Phase Center offset for CHAmP and GRACE satellites in the Body Coordinate system CHAmP GRACE-A GRACE-B
© 2012 by Taylor & Francis Group, LLC
x/m
y/m
z/m
−1.4880 0.0004 0.0006
0.0000 −0.0004 −0.0008
−0.3928 −0.4140 −0.4143
P R E C i S E O R B i T D E T E R M i N AT i O N O F S AT E L L i T E S
473
where q1, q2, q3 are vector components, and q4 is the scalar quantity component for the quaternion. Matrix MJ2000 can be expressed as M J 2000 = q 2 − q 2 − q 2 + q 2 2 3 4 1 2 (q1q 2 − q3q 4 ) 2 (q1q3 + q 2 q 4 )
2 (q1q 2 + q3q 4 )
−q12 + q 22 − q32 + q 42 2 (q 2 q3 − q1q 4 )
2 (q1q3 − q 2 q 4 )
T
2 (q 2 q3 + q1q 4 ) −q12 − q 22 + q32 + q 42
(7.36) 7.3.2 Parameter Estimation of Orbit Models
At the same time of introducing constraints of high-precision dynamic models, the reduced dynamic method of orbit determination utilizes three empirical acceleration components to compensate perturbations that cannot be modeled by deterministic models in orbit dynamics and estimates these components as well as other state variables as unknown parameters. Since the atmospheric density and solar activity are difficult to be modeled accurately, coefficients of atmospheric drag and solar radiation pressure are estimated in the solution process. The method of least squares estimation is adopted in the solution process. Now, parameters to be estimated include cδti, the receiver clock correction at time point ti; bj, the fuzzy parameter of ionosphere-free phase of a GPS satellite at each continuous tracking arc; and a satellite’s six-dimensional initial position and velocity vectors, r (t 0 ) y0 = y ( t 0 ) = ν(t 0 )
(7.37)
a solar pressure coefficient, CR; an atmospheric drag coefficient, CD; and na + 1 piecewise linear empirical accelerations aj . When t is in the jth interval [t 0 + j ⋅ τ, t 0 + ( j + 1) ⋅ τ] for j = 0, 1, 2, . . ., n, empirical acceleration vector a(ti) can be written as [26] a(ti ) =
© 2012 by Taylor & Francis Group, LLC
t 0 + ( j + 1) ⋅ τ − t t − t0 − j ⋅ τ aj + a j +1 τ τ
(7.38)
474
M e A suReM en t DAtA M o D eLIn G
Consider classifying parameters to be estimated into three groups: nT parameters of receiver clock correction, T = (c δt 0 ;…; c δtn T − 1 )T
(7.39)
nY parameters of orbit models, Y = ( y0T ;CR ;C D ; a0 ;…; ana )T
(7.40)
and nB fuzzy parameters of ionosphere-free phase, B = (b0 ;…; bnB − 1 )T
(7.41)
If parameters of orbit models Y can be calculated, the orbit state of a satellite at time ti, say y(ti), can be obtained by integrating orbital motion equations. Let hi (T, Y, B) be observational equations at time t and the linear expansion of estimation parameters at initial values (T *, Y *, B *) be T = T ∗ + ∆T , Y = Y ∗ + ∆Y , B = B ∗ + ∆B
(7.42)
then the LS estimates of ΔY, ΔT, and Δ B are ∂h ∗ ∂ T , Y ∗ , B ∗
(
T
)
∂h W ∗ ∂ T , Y ∗ , B ∗
∂h = ∗ ∂ T , Y ∗ , B ∗
(
(
)
∆T ∆Y ∆B
T
)
W ( z − h(T ∗ , Y ∗ , B ∗ ))
(7.43)
where z is a vector of observational data and W = Qz−1 is the inverse of observed weight matrix. If both phase and pseudocode observations are equally weighed, the role of phase observations is reduced in the process of solving parameters and high precise information on phases cannot be used. However, since phase observations are fuzzy, the structure of parameter estimation is unstable if only phase observations are used. Here is a weighing strategy. Phase and pseudocode
© 2012 by Taylor & Francis Group, LLC
P R E C i S E O R B i T D E T E R M i N AT i O N O F S AT E L L i T E S
47 5
observations with elevation angles less than 5° weigh 0 and do not participate in orbit determination; phase observations with elevation angles larger than 5° weigh 40 while pseudocode observations with elevation angles larger than 5° weigh 1. Let hi j be the observational equation of the jth GPS at time ti. Then, the partial derivative of hi j with respect to the vector of clock offsets T is ∂hi j = (00 ;…; 0i − 1 ; 1i ; 0i + 1 ;…; 0n T − 1 )T ∂T
(7.44)
the partial derivatives of hi j with respect to the vector of fuzzy parameters B are ∂hi j = (00 ;…; 0 k − 1 ; 1k ; 0 k + 1 ;…; 0nB − 1 )T ∂B
(7.45)
where k corresponds the position of fuzzy parameter; the partial derivatives of hi j with respect to the vector of orbit model parameters are ∂h j T ∂h j ∂h j ∂h j ∂hi j ∂hi j i i i = i ; ; ; ; ; ; 0m + 1 ;; 0na − 1 ∂Y ∂C R ∂C D ∂a 0 ∂ am ∂y0
T
(7.46) If the vector of orbit model parameters to be estimated is divided into two parts, one part is the vector of initial state parameters, say y 0, and the other is the vector of remaining orbit model parameters, say P, then ∂hi j / ∂y0 ∂hi j = j ∂Y ∂hi / ∂P
(7.47)
The partial derivatives of hi j with respect to the vector of current position and velocity of satellite orbit are ∂hi j = (e j (ti )1 × 3 , 01 × 3 )T ∂y( t i )
© 2012 by Taylor & Francis Group, LLC
(7.48)
47 6
M e A suReM en t DAtA M o D eLIn G
where e j is the vector of line sight from the LEO satellite to the jth GPS satellite. By the chain rule of derivatives for composed functions, we have ∂hi j ∂y(ti ) ∂y ( t ) ⋅ ∂y ∂hi j 0 i = j ∂ y t ( ∂ h ∂Y i ) i ∂y(t ) ⋅ ∂P i
(7.49)
where partial derivatives ∂y(ti)/∂y 0 and ∂y(ti)/∂P can be obtained by integrating variation equations of orbits. If the design matrix is (HT, HY, HB), the least squares estimate can be written as H T WH T T T H Y WH T T H B WH T
H TT WH Y
H YT WH Y H BT WH Y
H TT WH B H YT WH B H BT WH B
∆T ∆Y ∆B
H TT W (z − h(T ∗ , Y ∗ , B ∗ )) T ∗ ∗ ∗ = H Y W (z − h(T , Y , B )) T ∗ ∗ ∗ H B W (z − h(T , Y , B ))
(7.50)
which can be further reduced to N TT N YT N BT
N TY N YY N BY
N TB ∆T nT N YB ∆Y = nY N BB ∆B nB
(7.51)
Although the dimension of NTT is relatively large, NTT is a diagonal matrix [8] and the inverse of NTT can be easily obtained (see Figure 7.11). If the parameters of the orbit model, Y, and those of fuzziness, B, are combined into X, Equation 7.51 becomes N TT N XT
© 2012 by Taylor & Francis Group, LLC
N TX ∆T nT = N XX ∆ X nX
(7.52)
47 7
P R E C i S E O R B i T D E T E R M i N AT i O N O F S AT E L L i T E S
NTT
NTY
NYY
NTB
NYB
NYT NBT
Figure 7.11
NBY
NBB
The structure of coefficient matrix for reduced dynamic orbit determination.
Therefore, the least squares estimates of ΔX and ΔT are −1 −1 N TX )−1 (nX − N XT N TT nT ) ∆ X = ( N XX − N XT N TT
∆ T = N
−1 TT
(nT − N TX ∆ X )
(7.53)
After having obtained the updates for the initial estimates, the newly obtained values are now used as initial values for a second run. Multiple Gauss–Newton iterations of this kind are required to cope with the nonlinearity of the reduced dynamic estimation problem, and convergence is typically achieved within three to four iterations. See Figure 7.12 for the flowchart of iteration estimation for dynamic orbit model parameters. 7.3.3 Dynamic Orbit Models and Parameter Selections
In the geocentric inertial coordinate system, the dynamic orbit equation of a satellite can be written as follows: r = −
© 2012 by Taylor & Francis Group, LLC
GM r + a(t , r , r, P ) r3
(7.54)
478
M e A suReM en t DAtA M o D eLIn G Initial dynamic orbit parameters of LEO Y* = (y0T; CR; CD; a0; …; an –1) = (y0T; 0; 0; 0; …; 0) a
Tide perturbations Third body gravity Atmospheric drag Solar radiation Relativity Empirical acceleration
Update orbit parameters Y* = Y * + ΔY
Dynamic force model
Two body Non-spheric perturbations
No
Observations ~ z
Integration of orbit motion equation
y(ti)
Observations minus computation z~– h(Y*)
Integration of orbit variational equation
∂y(ti) ∂Y
Normal equation coefficient ∂hi/∂Y
Observation equation
∂hi ∂y(ti)
The updates for orbit parameters ΔY
Converge? Yes End
Figure 7.12
The flowchart of iteration estimation for dynamic orbit model parameters.
r are where GM is the gravitational constant of the Earth; r, r , and position, velocity, and acceleration of a satellite, respectively; P is the vector of dynamic orbit parameters; and a is the sum of all perturbation accelerations other than the two-body center acceleration of gravity. a = aNS + aNB + a TD + aD + aSR + aRL + aRTN
(7.55)
where aNS is the acceleration of the Earth nonspherical perturbation; aNB is the third body gravitational perturbation acceleration, including the Sun and the Moon; a TD is the acceleration of tidal perturbation, mainly consisting of solid tide and the ocean tide; aD is the atmospheric drag perturbation acceleration; aSR is the acceleration of solar pressure perturbation; aRL is the acceleration of relativity perturbation; and aRTN is the empirical acceleration. In the geocentric inertial coordinate system, the Earth nonspherical gravitational potential V can be represented in the form of spherical harmonics [27,28]
7.3.3.1 Earth Nonspherical Perturbation
V (r , ϕ, λ) = GM r l = 2 ∞
l RE r Plm ( sin ϕ ) × C lm cos mλ + Slm sin mλ m =0 l
∑∑
(7.56)
© 2012 by Taylor & Francis Group, LLC
P R E C i S E O R B i T D E T E R M i N AT i O N O F S AT E L L i T E S
479
where Plm is the fully normalized associated Legendre functions; R E is the mean radius of the Earth; r is the radius distance from the satellite to the center of the Earth; and λ and φ are the geocentric longitude and latitude of the satellite, respectively. C lm and Slm are the spherical harmonic coefficients corresponding to Plm. GM RE C l ,0 Pl ,0 (sin ϕ) r l = 2 r l
∞
V =
∑
∞
+
l RE ) × [ C cos m λ + S sin m λ ] P (sin ϕ) lm lm lm r m =1 l
∑∑ l =2
(7.57)
Then, the acceleration of the Earth nonspherical perturbation aNSB can be expressed as
aNSB
∂ V / ∂ x ∂V / ∂r ∂V = = ∂V / ∂y = M ⋅ ∂V / ∂ϕ ∂r ∂V / ∂z ∂V / ∂λ
(7.58)
where x r y M = r z r
− −
xz
r 2 x2 + y2 yz
r 2 x2 + y2 x2 + y2 r2
∞ ∂V GM =− 2 (n + 1) ⋅ Cn,0 r r n= 2
∑
∞
+
n
∑ ∑ (n + 1) ⋅ P n = 2 m =1
© 2012 by Taylor & Francis Group, LLC
nm
y x + y2 x 2 2 x + y 0
−
2
(7.59)
n
R ⋅ E ⋅ Pn,0 (sin ϕ ) r
(sin ϕ ) ⋅ Tnm
(7.60)
480
M e A suReM en t DAtA M o D eLIn G
∂V = ϕ GM C n ,0 r n= 2 ∞
∑
R ∂Pn,0 (sin ϕ ) + ⋅ E ⋅ r ∂ϕ n
∞ ∂V GM = r n= 2 λ
n
∑∑ n = 2 m =1
∂Pnm (sin ϕ ) ⋅ Tnm ∂ϕ (7.61)
∂Tnm ∂λ
(7.62)
R = E Cnm cos mλ + Snm sin mλ r
(7.63)
n
∑∑
Tnm
∞
m =1
Pnm (sin ϕ ) ⋅
n
n
∂Tnm R = m E Snm cos mλ − Cnm sin mλ r ∂λ
(7.64)
The partial derivative of aNSB with respect to position vector r is given by
∂aNSB ∂r
∂ 2V ∂x ∂x 2 ∂V = ∂y ∂x ∂ 2V ∂z ∂x
∂ 2V ∂x ∂y ∂ 2V ∂y ∂y ∂ 2V ∂z ∂y
∂ 2V ∂x ∂z ∂ 2V ∂y ∂z ∂ 2V ∂z ∂z
(7.65)
The gravitational effect of the Sun, the Moon, and other planets on the satellite motion is called the third body gravity. The third body gravitational perturbation acceleration aNB can be expressed as
7.3.3.2 Third Body Gravitational Perturbations
r′ ∆ aNB = −GM ′ 3 + 3 ∆ r′
(7.66)
where GM′ is the gravitational constant, r′ represents the position vector of the third body, such as the Sun and the Moon, Δ = r – r′,
© 2012 by Taylor & Francis Group, LLC
P R E C i S E O R B i T D E T E R M i N AT i O N O F S AT E L L i T E S
4 81
and r represents the position vector of the satellite. The partial derivatives of acceleration aNB with respect to the satellite position vector r are represented by
∂aNB ∂r
3∆ x2 ∆2 − 1 GM ′ 3∆ x ∆ y = ∆3 ∆ 2 3∆ x ∆ z ∆2
3∆ x ∆ y ∆ 3∆
2 2 y 2
−1 ∆ 3∆ y ∆ z ∆2
3∆ x ∆ z ∆2 3∆ y ∆ z ∆22 3∆ z − 1 2 ∆
(7.67)
Tide perturbations mainly include solid tides and ocean tides. More details are given in IERS Conventions 1996 [29].
7.3.3.3 Tide Perturbations
Since the atmospheric effect is significant at the LEO altitude, the atmospheric drag force is the main perturbation. The atmospheric drag perturbation acceleration aD can be expressed as 7.3.3.4 Atmospheric Drag Forces
1 A aD = − C D D ρDvr νr m 2
(7.68)
where CD is the atmospheric drag coefficient, which is an unknown parameter to be estimated in the orbit determination procedure; AD is the area of the surface where the satellite flying against the atmosphere moves; m is the mass of the satellite; ρD is the atmospheric density at satellite position; vr represents the relative velocity of the satellite with respect to the atmosphere and based on the assumption that the atmosphere is rotating with the Earth, and vr can be expressed in the J2000 inertial reference frame as 0 νr = ν − ω ⊗ r = ν − ω 0
© 2012 by Taylor & Francis Group, LLC
−ω 0 0
0 0 ⋅ r = ν − Ω ⋅ r 0
(7.69)
482
M e A suReM en t DAtA M o D eLIn G
where v is the satellite velocity vector, ω = [0, 0, ωE], and ωE is the Earth’s angular velocity. The partial derivatives of acceleration aD with respect to vector r, v and the drag coefficient C D are represented by ∂a D ∂a ∂(vr νr ) 1 A = D = − C D D ρD ⋅ ∂ν ∂ νr ∂ νr 2 m ∂a D 1 A ∂ρ = − C D D v r νr ⋅ D ∂r ∂r 2 m
(7.70)
T
∂ (v r ν r ) ∂ ν r a 1 A ⋅ = D − C D D ρD ⋅ m ∂ νr ∂r ρD 2
T
∂a ∂ρ ⋅ D − D ⋅Ω ∂r ∂ν (7.71)
∂ (v r ν r ) ν ∂a D 1 AD = − ρDvr νr ; = r νr′ + vr ⋅ I ∂C D 2 m ∂ νr vr
(7.72)
where I is a 3 × 3 identity matrix. 7.3.3.5 Solar Radiation Pressures The simple “cannonball” model is generally used, and then the solar radiation pressure perturbation acceleration generated by the direct light from the Sun is expressed as
aSR = − F ⋅ ρSR ⋅
AS ⋅ CR m
2
A ∆ ⋅ u ⋅ S ∆S ∆S
(7.73)
where F is shadow factor, 0 ≤ F ≤ 1; ρSR = 4.5604 × 10− 6 N/m; AS is the area of the surface where the satellite faces the Sun; m is the mass of the satellite; CR is a scaling factor for the solar radiation pressure that is relevant to the material of exposed satellite surface, and is an unknown parameter to be estimated in the orbit determination procedure; Au = 1.49597870 × 1011 m is an astronomical unit; ΔS = r S – r represents the vector from the satellite to the Sun; and r S is the Sun position vector. The partial derivatives of acceleration aSR with respect
© 2012 by Taylor & Francis Group, LLC
P R E C i S E O R B i T D E T E R M i N AT i O N O F S AT E L L i T E S
483
to the Sun position vector r and the solar radiation scaling factor CD are given by ∂aSR = ∂r
− F ⋅ CR ⋅ ρSR
2 3∆Sx 2 −1 ∆S 2 3∆Sx ∆Sy A A ⋅ S ⋅ u3 ⋅ m ∆S ∆2 3∆ S∆ Sx Sz ∆S2
∂aSR A = − F ⋅ ρSR ⋅ S m ∂C R
3∆Sx ∆Sy
∆ 2 3∆Sy
2 S
−1 ∆S2 3∆Sy ∆Sz ∆S2
2 ∆S 2 3∆Sz − 1 ∆S2 3∆Sx ∆Sz ∆S2 3∆Sy ∆Sz
2
A ∆ ⋅ u ⋅ S ∆S ∆S
(7.74) (7.75)
Only the Schwarzschild item is considered, and the acceleration of relativity perturbation aRL can be expressed as 7.3.3.6 Relativity Perturbations
aRL =
GM c 2r 3
( )
GM ⋅ 4 − v 2 ⋅ r + 4ν ⋅ r ⋅ ν r
(7.76)
where r, v are satellite orbit position vector and velocity vector, and the partial derivatives of acceleration aRL with respect to vector r and v are represented by 2
∂aRL 4 GM 4 GM i , j ) = − 2 3 ri r j + 2 3 vi v j ( ∂r c r c r
(7.77)
∂aRL 2 GM 4 GM i , j ) = − 2 3 ri v j + 2 3 vi r j ( ∂ν c r c r
(7.78)
Empirical force is used for compensating deficiencies in the applied dynamical models, and the empirical acceleration vector aRTN can be expressed as
7.3.3.7 Empirical Forces
aRTN = aR eR + aT e T + aNeN
© 2012 by Taylor & Francis Group, LLC
(7.79)
484
M e A suReM en t DAtA M o D eLIn G
where eR, e T, and eN are unit vector and denote the satellite radial, along-track, and cross-track flying direction, respectively. eR =
r ×ν r , e T = N × R , eN = |r| | r × ν|
(7.80)
The empirical accelerations are considered to be piecewise linear in predefined subintervals, and the entire data arc is divided into na intervals of equal duration τ and an independent set of empirical acceleration parameters (aR, aT, aN)j is estimated for the entire data arc, j = 0,1,. . .,na. Intervals of 1800s duration have been selected in this chapter. Since the radial direction is dynamically coupled with the alongtrack direction, it is not modeled in the orbit determination process to avoid the ill-conditioning of the system [30]. Therefore, only the along-track and cross-track components of empirical accelerations are estimated together with other parameters. The dynamic orbit models and parameters selection is listed in Table 7.3, where the Earth rotation parameter file can be downloaded from http://hpiers. obspm.fr//iers//eop//eopc04_05, the NOAA solar flux data can be downloaded from http://sgd.ngdc.noaa.gov/sgd/jsp/solarindex.jsp, and the geomagnetic activity index file can be downloaded from http://spidr.ngdc.noaa.gov/spidr/dataset.do. 7.3.3.8 Dynamic Orbit Models and Parameter Selections
Table 7.3 Dynamic orbit models and Parameters ITEm static gravity field solid earth tide Polar tide ocean tide Third body gravity solar radiation pressure Atmospheric drag Relativity Precession Nutation Earth orientation solar ephemerides
© 2012 by Taylor & Francis Group, LLC
DEsCRIPTIoN GGm02C 150 × 150 IERs96, 4 × 4 IERs96 CsR4.0 sun and moon Cannonball model, conical earth shadow, CR is estimated Jacchia 71 density model [NoAA solar flux (daily) and geomagnetic activity (3 hourly)], CD is estimated schwarzschild IAU1976 IAU1980 + EoPC correction EoPC04 JPL DE405
P R E C i S E O R B i T D E T E R M i N AT i O N O F S AT E L L i T E S
485
The coordinate systems are selected J2000 inertial reference frame, and the time systems are selected geodynamics time TDT. The orbit integration algorithm uses the Adams–Cowell multistep integration method. The orbit determination process is typically conducted in single-day (24 h) data batches [27,31]. 7.3.4 Re-Editing Observational Data
For spaceborne dual-frequency GPS observation preprocessing methods introduced in Section 7.2, only observation information was used, and restricted to the precision of pseudocode observations, the smaller gross errors and cycle slip detection became difficult. In order to improve the precision and reliability of orbit determination results, this section describes the GPS observation editing method, which is based on a prior reference orbit, IGS GPS ephemeris, and clock corrections data, reediting the raw observation data to further detect the remaining smaller gross errors and cycle slips [6,8]. At each epoch, the receiver position vector can be obtained precisely from the prior dynamic orbit. With n ≥ 2 observed ionosphere-free pseudoranges, the GPS receiver clock offset value is determined by
7.3.4.1 Re-Editing Pseudocode Data
1 c δt r ( t j ) = n
n
∑ (P i =1
i IF ,r
(t j ) − (ρir (t j ) − c δt i (t j )))
(7.81)
Then the associated residuals can be calculated by resri (t j ) = PIFi ,r (t j ) − (ρir (t j ) + c δt r (t j ) − c δt i (t j ))
(7.82)
Whenever the standard deviation of these residuals exceeds a predefined threshold (here taken as 2.5 m), the code observation that contributes the dominating error is identified and removed from the set of observations. If necessary, the process is repeated to reject multiple outliers at the same epoch. The method of reediting phase data is used for further detection of cycle slips. Since the carrier phase biases
7.3.4.2 Re-Editing Phase Data
© 2012 by Taylor & Francis Group, LLC
486
M e A suReM en t DAtA M o D eLIn G
are constant over time, a sudden jump can be detected by examining time-differenced carrier phase measurements between two consecutive measurement epochs, tj and tj−1. Instead of the receiver clock offset, the time difference of two consecutive clock offsets is determined. From the set of n ≥ 2 observations, an estimation of the time-differenced receiver clock offset is given 1 c δt r ( t j − 1 , t j ) = n
n
∑ (L i =1
i IF ,r
(t j − 1 , t j ) − (ρir (t j − 1 , t j ) − cδt i (t j − 1 , t j ))) (7.83)
and the associated receiver clock offset residuals of two consecutive measurement epochs tj , tj−1 can be calculated by resri (t j − 1 , t j ) = LiIF ,r (t j − 1 , t j ) − (ρir (t j − 1 , t j ) + c δt r (t j − 1 , t j ) − c δt i (t j − 1 , t j ))
(7.84)
whenever the standard deviation of these residuals exceeds a predefined threshold (here taken as 0.1 m), the carrier phase observation that contributes the dominating error is assumed to have experienced a cycle slip and is removed from the set of observations. Phase observation editing provides a robust cycle slip detection method. Although errors exist in the prior dynamic orbit position, most errors can be absorbed by receiver clock estimation, and the remaining small errors that are not absorbed can be compensated for by the standard deviation threshold. 7.3.5 The Flow of Zero-Difference Reduced Dynamic Orbit Determination
1. Preprocessing observations, giving the preprocessed observations and the geometric single point positioning results with pseudocode observations. 2. Fitting discrete positions in short arc with dynamic orbit model, giving estimations of initial dynamic orbit parameters. 3. Giving the coarse precise orbit product with zero-difference reduced dynamic method of orbit determination.
© 2012 by Taylor & Francis Group, LLC
P R E C i S E O R B i T D E T E R M i N AT i O N O F S AT E L L i T E S GPS ephemeris and clock data GPS antenna phase center offset
487
Earth orientation parameters TAI-UTC leap seconds
Observations preprocessing (include single point positioning) GPS observations LEO attitude data LEO antenna phase center offset
Preprocessed observations Coarse orbit position
Discrete positions fiting in short arc with dynamic orbit model
Earth gravity field Solar ephemerides Solar flux Geomagnetic index
Initial dynamic orbit parameters
Zero-difference reduced dynamic orbit determination
Medium precise orbit
Data editing and zero-difference reduced dynamic orbit determination
High precise orbit
Figure 7.13
The flowchart of zero-difference reduced dynamic orbit determination.
4. Reediting the raw observation data to further detect the remaining smaller gross errors and cycle slips. After that, reimproving estimations of orbit parameters with zerodifference reduced dynamic method of orbit determination, and giving the highly precise orbit product (Figure 7.13). 7.3.6 Analysis of Results from Orbit Determination
We develop the zero-difference reduced dynamic orbit determination program. Over the period from January 29 to February 4, 2006 (7 days) dual-frequency GPS observations from CHAMP, GRACE-A, and GRACE-B satellites were processed, and the RMS value of the O–C phase residual obtained from reduced dynamic orbit determination was only 1 cm or so (see Figures 7.14 through 7.16), which reflects the consistency of the applied models with the GPS observation data. The dual-frequency GPS observation data, attitude data, and science orbit of CHAMP, GRACE-A, and GRACE-B satellites are given by the GFZ center.
© 2012 by Taylor & Francis Group, LLC
488
M e A suReM en t DAtA M o D eLIn G
0.1 0.08
RMS = 0.0096 m
Phase O-C residual/m
0.06 0.04 0.02 0
–0.02 –0.04 –0.06 –0.08 –0.1 0
1
2
3
4 5 Time/104 s
6
7
8
Figure 7.14 The phase o–C residual of reduced dynamic orbit determination for CHAmP on February 2, 2006.
On comparison of GFZ science orbit and reduced dynamic orbit determination result for CHAMP, the RMS in R, T, and N position component are 3.23, 5.39, and 4.87 cm, and the RMS in three dimension is 7.99 cm (see Figures 7.17 and 7.18). On comparison of GFZ science orbit and reduced dynamic orbit determination results for GRACE-A, the RMS in R, T, and N 0.1 RMS = 0.0133 m
0.08 Phase O-C residual/m
0.06 0.04 0.02 0
–0.02 –0.04 –0.06 –0.08 –0.1 0
1
2
3
4 5 Time/104 s
6
7
8
Figure 7.15 The phase o–C residual of reduced dynamic orbit determination for GRACE-A on February 2, 2006.
© 2012 by Taylor & Francis Group, LLC
P R E C i S E O R B i T D E T E R M i N AT i O N O F S AT E L L i T E S
489
0.1 RMS = 0.0123 m
0.08
Phase O-C residual/m
0.06 0.04 0.02 0
–0.02
–0.04 –0.06 –0.08 –0.1
0
1
2
3
4 5 Time/104 s
6
7
8
Figure 7.16 The phase o–C residual of reduced dynamic orbit determination for GRACE-B on February 2, 2006.
0.2 RMS = 0.0316 m
R (m)
0.1 0 –0.1 –0.2
0
1
2
3
0.2
5
6
7
8
6
7
8
6
7
8
RMS = 0.0489 m
0.1 T (m)
4
0 –0.1 –0.2
0
1
2
3
N (m)
0.2
4
5
RMS = 0.0461 m
0.1 0 –0.1 –0.2
0
1
2
3
4 5 T/104 s
Figure 7.17 Comparison between GFz science orbit and reduced dynamic orbit determination result for CHAmP on February 2, 2006.
© 2012 by Taylor & Francis Group, LLC
490
M e A suReM en t DAtA M o D eLIn G
10 8.69 cm 8
8.19 cm
Average RMS = 7.99 cm 8.65 cm 8.05 cm 7.43 cm
R T N RMS
7.79 cm
7.11 cm
6
4
2
0
2006-1-29
Figure 7.18 science orbit.
2006-1-30
2006-1-31
2006-2-1
2006-2-2
2006-2-3
2006-2-4
Daily mean square root of the CHAmP orbit position error when compared to GFz
position component are 2.14, 3.53, and 4.27 cm, and the RMS in three dimension is 5.95 cm (see Figures 7.19 and 7.20). On comparison of GFZ science orbit and reduced dynamic orbit determination results for GRACE-B, the RMS in R, T, and N position components are 2.02, 3.12, and 4.03 cm, and the RMS in three dimension is 5.49 cm (see Figures 7.21 and 7.22). The direction of T and N in dynamic orbit determination corresponds to the direction of atmospheric drag force, while it is difficult to model the atmosphere density accurately, so orbit determination error in the direction of T and N is larger than in the R direction. As the process of orbit determination using dynamic orbit model smoothing, the precision of orbit speed estimation is generally higher. On comparison of GFZ science orbit velocity and reduced dynamic orbit determination result for GRACE-A, the RMS in R, T, and N position component are 0.072, 0.074, and 0.045 mm/s, and the RMS in three dimension is 0.11 mm/s (see Figure 7.23). EXERciSE 7
1. In order to improve dual-frequency GPS observation preprocessing, many methods are often combined. Explain the distinction and relationship among different outliers and cycle slip detection methods.
© 2012 by Taylor & Francis Group, LLC
P R E C i S E O R B i T D E T E R M i N AT i O N O F S AT E L L i T E S
4 91
R (m)
0.2 RMS = 0.0222 m
0.1 0
–0.1 –0.2
0
1
2
3
4
5
6
7
8
6
7
8
6
7
8
0.2 RMS = 0.0341 m
T (m)
0.1 0
–0.1 –0.2
0
1
2
3
4
0.2
RMS = 0.0358 m
0.1 N (m)
5
0
–0.1 –0.2
0
1
2
3
4
Time/104 s
5
Figure 7.19 Comparison between GFz science orbit and reduced dynamic orbit determination result for GRACE-A on February 2, 2006.
10
R T N RMS
Average RMS = 5.95 cm 8 6.95 cm 6.20 cm
6
6.16 cm
5.71 cm
5.41 cm
5.56 cm
5.69 cm
4
2
0
2006-1-29
Figure 7.20 science orbit.
2006-1-30
2006-1-31
2006-2-1
2006-2-2
2006-2-3
2006-2-4
Daily mean square root of the GRACE-A orbit position error when compared to GFz
© 2012 by Taylor & Francis Group, LLC
492
M e A suReM en t DAtA M o D eLIn G
R (m)
0.2 RMS = 0.0224 m
0.1 0
–0.1 –0.2
0
1
2
3
4
T (m)
0.2
5
6
7
8
6
7
8
6
7
8
RMS = 0.0267 m
0.1 0
–0.1
–0.2
0
1
2
3
4
N (m)
0.2
5
RMS = 0.0335 m
0.1 0
–0.1 –0.2
0
1
2
3
4
T/104 s
5
Figure 7.21 Comparison between GFz science orbit and reduced dynamic orbit determination result for GRACE-B on February 2, 2006.
8 7
6.19 cm
6
6.67 cm
R T N RMS
Average RMS = 5.49 cm
5.38 cm
5
5.65 cm 4.81 cm
4.83 cm
4.88 cm
4 3 2 1 0
2006-1-29
Figure 7.22 science orbit.
2006-1-30
2006-1-31
2006-2-1
2006-2-2
2006-2-3
2006-2-4
Daily mean square root of the GRACE-B orbit position error when compared to GFz
© 2012 by Taylor & Francis Group, LLC
P R E C i S E O R B i T D E T E R M i N AT i O N O F S AT E L L i T E S
493
× 10–4
R (m/s)
5
RMS = 0.000072 m/s
0 –5
T (m/s)
0 × 10–4 5
1
2
3
4
5
6
7
8
6
7
8
6
7
8
RMS = 0.000074 m/s
0
–5 0 × 10–4
1
2
3
T (m/s)
5
4
5
RMS = 0.000045 m/s
0
–5 0
1
2
3
4 5 Time/104 s
Figure 7.23 Comparison between GFz science orbit velocity and reduced dynamic orbit determination result for GRACE-B on February 1, 2006.
2. Implement the Vondrak filtering algorithm, fitting the following measurement signals x − 0.35 x − 0.8 yi = 1.5 ⋅ φ i − φ i + ei 0.15 0.04 where φ (u ) =
−u 2 1 exp , xi ∈0, 1 , ei ∼ N (0, 0.052 ), 2 2π
i = 1,…, 100. 3. Explain why at least five visible satellites are needed in the RAIM fault detection algorithm, and why at least six visible satellites are needed in the RAIM fault exclusion algorithm? Implement the RAIM algorithm and perform a consistency
© 2012 by Taylor & Francis Group, LLC
494
M e A suReM en t DAtA M o D eLIn G
test with multiple-channel pseudocode observations from the CHAMP satellite. 4. Analyze and state the relationship between cycle slip detection capability of epoch difference M–W combination method and the size of the observation noise. Implement the epoch difference M–W algorithm and detect phase cycle slips for the CHAMP satellite. 5. Download the post science orbit of GRACE satellites from the GFZ center, implement the observation reediting algorithm, and detect outliers and cycle slips for GRACE satellites.
References 1. 2. 3. 4.
5.
6. 7. 8. 9. 10. 11. 12. 13.
The International Laser Ranging Service, http://ilrs.gsfc.nasa.gov. International DORIS Service, http://ids-doris.org. International GNSS Service, http://igscb.jpl.nasa.gov. Bertiger W. I., Bar-Sever Y. E., Christensen E. J., Davis E. S., Guinn J. R., Haines B. J., Ibanez-Meier R. W. et al. GPS precise tracking of Topex/Poseidon: Results and implications. Journal of Geophysical Research, 1994, 99(12): 24449–24464. Zuoya Zheng. Study and software implementation of GPS data pre-processing and onboard GPS kinematic Orbit determination. PhD dissertation, Shanghai: Shanghai Astronomical Observatory, Chinese Academy of Sciences, 2004 (in Chinese). Defeng Gu. The spatial states measurement and estimation of distributed InSAR satellite system. PhD dissertation, Changsha: National University of Defense Technology, 2009 (in Chinese). Yuwang Lai. Study and software implementation of satellite-borne dual frequency GPS data pre-processing. Master thesis, Changsha: National University of Defense Technology, 2009 (in Chinese). Remco Kroes. Precise relative positioning of formation flying spacecraft using GPS. PhD dissertation, Delft, the Netherlands, 2006. Defeng Gu, Lei Sun, Dongyun Yi. Robust Vondrak filter design and its application in gross error detection. Acta Armamentar, 2008, 29(6): 696– 700 (in Chinese). Jinzhong Mi, Yulin Li. Research on RAIM algorithm. Bulletin of Surveying and Mapping, 2001, 3: 7–9 (in Chinese). Yongchao Wang, Zhigang Huang. Research on receiver autonomous integrity monitoring augmented with improved clock bias model. Acta Electronica Sinica, 2007, 35(6): 1084–1088 (in Chinese). Oliver Montenbruck, Remco Kroes. In-flight performance analysis of the CHAMP blackJack GPS receiver. GPS Solutions, 2003, 7: 74–86. Jiyu Liu. GPS Satellite Navigation/Positioning Theories and Methods. Beijing: Science Press, 2007; pp. 319–329 (in Chinese).
© 2012 by Taylor & Francis Group, LLC
P R E C i S E O R B i T D E T E R M i N AT i O N O F S AT E L L i T E S
495
14. Tae-Suk Bae, Jay H. Kwon, Grejner-Brzezinska Dorota A. Data Screening and Quality Analysis for Kinermatic Orbit Determination of CHAMP Satellite. ION Technical Meeting, San Diego, 2002. 15. Quesenberry C. P. SPC Q charts for start-up processes and short or long runs. Journal of Quality Technology, 1991, 23(3): 213–224. 16. Quesenberry C. P. On properties of Q charts for variables. Journal of Quality Technology, 1995, 27(3): 184–203. 17. Xiaolong Pu. The SPC chart with parameters unknown. Chinese Journal of Applied Probability and Statistics, 2001, 17(4): 437–447 (in Chinese). 18. A. Jäggi, U. Hugentobler, H. Bock, G. Beutler. Precise orbit determination for GRACE using undifferenced or doubly differenced GPS data. Advances in Space Research, 2007, 39: 1612–1619. 19. Jiancheng Li, Shoujian Zhang, Xiancai Zou, Weiping Jiang. Precise orbit determination for GRACE with zero-difference kinematic method. Chinese Science Bulletin, 2009, 54(6): 2355–2362 (in Chinese). 20. Dongju Peng, Bin Wu. Zero-differenced and single-differenced precise orbit determination for LEO using GPS. Chinese Science Bulletin, 2007, 52(6): 715–719 (in Chinese). 21. Thomas P. Yunck, Sien-Chong Wu, Jiun-Tsong Wu, Catherine L. Thornton. Precise tracking of remote sensing satellites with the Global Positioning System. IEEE Transactions on Geoscience and Remote Sensing, 1999, 28(1): 108–116. 22. Švehla D., Rothacher M. Kinematic and reduced-dynamic precise orbit determination of low earth orbiters. Advances in Geosciences, 2003, 1: 47–56. 23. Oliver Montenbruck, Tom van Helleputte, Remco Kroes, Eberhard Gill. Reduced dynamic orbit determination using GPS code and carrier measurements. Aerospace Science and Technology, 2005, 9: 261–271. 24. Zhigui Kang, Byron Tapley, Srinivas Bettadpur, John Ries, Peter Nagel, Rick Pastor. Precise orbit determination for the GRACE mission using only GPS data. Journal of Geodesy, 2006, 80: 322–331. 25. Guochang Xu. GPS—Theory, Algorithms and Applications. Heidelberg: Springer Verlag, 2003; p. 97. 26. A. Jäggi, U. Hugentobler, G. Beutler. Pseudo-stochastic orbit modeling technique for low-Earth orbiters. Journal of Geodesy, 2006, 80: 47–60. 27. Jishen Li. Satellite Precise Orbit Determination. Beijing: Press of PLA, 1995; p. 40 (in Chinese). 28. Lin Liu. Orbit Theory of Spacecraft. Beijing: National Defense Industry Press, 2000 (in Chinese). 29. Dennis D. McCarthy. IERS conventions (1996). IERS Technical Note 21, Paris: Observatoire de Paris, 1996, 20–39. 30. Tianyi Huang. Adams-Cowel integrator with a first sum. Acta Astronomica Sinica, 1992, 33(4): 413–419 (in Chinese). 31. Tae-Suk Bae. Near real-time precise orbit determination pf low earth orbit satellites using an optimal GPS triple-differencing technique. PhD dissertation, America: Ohio State University, 2006.
© 2012 by Taylor & Francis Group, LLC
Appendix 1: Matrix Formulas in Common Use Here, we will briefly introduce the matrix formulas commonly used in this book. A1.1 Trace of a Matrix
Definition A1.1 Suppose A is an n × n matrix. The trace of A is defined as
tr( A ) =
n
∑a i =1
ii
Theorem A1.1 Suppose A and B are both n × n matrices and k is a constant. Then we have
tr( A τ ) = tr( A )
(A1.1) 497
© 2012 by Taylor & Francis Group, LLC
498
A P P en D I X 1: M At RI X F o RMuL A s
tr( A + B ) = tr( A ) + tr( B )
(A1.2)
tr( kA ) = ktr( A )
(A1.3)
tr( AB ) = tr( BA )
(A1.4)
tr( A τ B ) = tr( AB τ )
(A1.5)
A1.2 Inverse of a Block Matrix
Theorem A1.2 Suppose A and B are two block triangle block matrices of the order (n + m) × (n + m) A11 A = 0
A12 A11 B = A 22 A 21
0 A 22
where A11 is an invertible matrix of the order n × n while A22 is an invertible matrix of the order m × m. Then, A and B are both invertible, and A
B
−1
−1
−1 A11 = 0
−1 −1 − A11 A12 A 22 −1 A 22
−1 A11 = −1 −1 − A 22 A 21 A11
0 −1 A 22
Theorem A1.3 Suppose A is a block matrix of the order n + m
A11 A = A 21
© 2012 by Taylor & Francis Group, LLC
A12 A 22
(A1.6)
(A1.7)
A P P en D I X 1: M At RI X F o RMuL A s
499
where A11 is an invertible matrix of the order n × n while A22 is an invertible matrix of the order m × m. Define A11 = A11 − A12 A 22−1 A 21 and −1 A22 = A 22 − A 21 A11 A12 . If Ã11 and Ã22 are both invertible, then A is invertible | A | = | A11 | | A 22 | = | A22 | | A11 |
A
−1
−1 −1 −1 −1 A11 A12 A22 A 21 A11 + A11 = −1 A 21 A1−11 − A22
−1 −1 A12 A22 − A11 −1 A22
(A1.8)
(A1.9)
and
A
−1
−1 A11 = −1 −1 − A 22 A 21 A11
−1 A 22
−1 −1 − A11 A12 A 22 −1 −1 −1 A12 A 22 + A 22 A 21 A11
(A1.10)
note: Formulas (9) and (10) are both inverse formulas for block matrices. Because the inverse of a matrix is unique, the corresponding blocks of formulas (9) and (10) must be equal. Then, if we compare the top left blocks of the matrices in the two formulas, we will get a commonly used matrix inverse formula: −1 −1 −1 −1 ( A11 − A12 A 22 A 21 )−1 = A11 + A11 A12 ( A 22 − A 21 A11 A12 )−1 A 21 A1−11
(A1.11) Similarly, by comparing the top right blocks, we get another formula: −1 −1 −1 −1 A11 A12 ( A 22 − A 21 A11 A12 )−1 = ( A11 − A12 A 22 A 21 )−1 A12 A 22
(A1.12) The two above formulas are established for any square matrix A11 of the order n, A22 of the order m, A12 of the order n × m, and A21 of the order m × n, only if all the inverses of matrices in the two above formulas exist.
© 2012 by Taylor & Francis Group, LLC
500
A P P en D I X 1: M At RI X F o RMuL A s
A1.3 Positive Definite Character of a Matrix
Definition A1.2
Suppose A is a symmetric square matrix of order n. If

x^τ A x ≥ 0, ∀x ∈ R^n

then A is said to be nonnegative definite, denoted as A ≥ 0. If

x^τ A x > 0, ∀x ∈ R^n, x ≠ 0

then A is said to be positive definite, denoted as A > 0. If

−x^τ A x > 0, ∀x ∈ R^n, x ≠ 0

then A is said to be negative definite, denoted as A < 0. If A is neither positive definite nor negative definite, then we call A indefinite.

Theorem A1.4
Suppose A and B are symmetric square matrices of order n, and α is a real number. Then

A > 0, B > 0 ⇒ A + B > 0    (A1.13)

A > 0, α > 0 ⇒ αA > 0    (A1.14)

A > 0 ⇒ A is invertible and A^{-1} > 0    (A1.15)

A ≥ B ⇒ A − B ≥ 0    (A1.16)

A ≥ 0 ⇒ tr(A) ≥ 0 and A ≤ [tr(A)]I    (A1.17)
Theorem A1.5
If A_{n×n} > 0, then there exists an orthogonal matrix P of order n such that

A = P Λ P^τ, \quad Λ = diag(λ_1, λ_2, ..., λ_n)    (A1.18)

where 0 < λ_1 ≤ λ_2 ≤ ··· ≤ λ_n, and λ_i (i = 1, 2, ..., n) are the eigenvalues of matrix A.

A1.4 Idempotent Matrix
Definition A1.3
If a square matrix A has the property that A² = A, then A is called an idempotent matrix.

Theorem A1.6
If A² = A, then the eigenvalues of A can only be 0 or 1, and

rank(A) = tr(A)    (A1.19)
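A familiar instance is the least squares hat matrix H = X(X^τ X)^{-1} X^τ, which is idempotent by construction. The sketch below (an illustration added here, not taken from the book) checks A² = A, the 0/1 eigenvalues, and (A1.19) numerically:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((6, 3))
# hat matrix of least squares: projects onto the column space of X
H = X @ np.linalg.inv(X.T @ X) @ X.T

assert np.allclose(H @ H, H)  # A^2 = A
eigvals = np.linalg.eigvalsh((H + H.T) / 2)
# every eigenvalue is (numerically) either 0 or 1
assert np.all((np.abs(eigvals) < 1e-8) | (np.abs(eigvals - 1) < 1e-8))
# (A1.19): rank equals trace
assert np.isclose(np.trace(H), np.linalg.matrix_rank(H))
print("rank(H) = tr(H) =", round(np.trace(H)))
```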
A1.5 Derivative of a Quadratic Form
Theorem A1.7
Suppose x is a variable vector of dimension n, a is a constant vector of dimension n × 1, and A is a constant square matrix of order n. Then

∂(a^τ x)/∂x = ∂(x^τ a)/∂x = a    (A1.20)

∂(x^τ A x)/∂x = (A + A^τ) x    (A1.21)

Specially, if A = A^τ, then

∂(x^τ A x)/∂x = 2 A x    (A1.22)
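Formula (A1.21) can be checked against a central-difference approximation of the gradient. A minimal sketch with an arbitrary random A and x:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5
A = rng.standard_normal((n, n))
x = rng.standard_normal(n)

analytic = (A + A.T) @ x  # gradient from (A1.21)
eps = 1e-6
# central differences of the quadratic form x' A x along each coordinate
numeric = np.array([
    ((x + eps * e) @ A @ (x + eps * e) - (x - eps * e) @ A @ (x - eps * e)) / (2 * eps)
    for e in np.eye(n)
])
assert np.allclose(analytic, numeric, atol=1e-5)
print("gradient formula (A1.21) verified")
```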
Appendix 2: Distributions in Common Use

Here, we will briefly introduce some commonly used distributions in this book (namely the normal distribution, χ²-distribution, t-distribution, and F-distribution). For more details, refer to classical textbooks of probability theory.

A2.1 χ²-Distribution
Definition A2.1
If X1, X2, ..., Xn are independent standard normal random variables, the distribution of the random variable χ²_n defined by

χ²_n = \sum_{i=1}^{n} X_i²    (A2.1)

is called the χ²-distribution with n degrees of freedom.

Theorem A2.1
The probability density function (PDF) of the random variable χ²_n defined by formula (A2.1) is

χ²(x; n) = \frac{1}{2^{n/2} Γ(n/2)} e^{−x/2} x^{(n/2)−1} when x > 0, and χ²(x; n) = 0 when x ≤ 0    (A2.2)

where n is the degrees of freedom; it indicates the number of free variables in (A2.1).

Theorem A2.2
Suppose X ~ χ²(n). Then
1. The characteristic function of X is φ(t) = (1 − 2it)^{−n/2}.
2. The expected value and variance of X are EX = n, Var(X) = 2n.

Theorem A2.3
Suppose X1, X2, ..., Xn are independent standard normal random variables, and

Q_1 + ··· + Q_k = \sum_{i=1}^{n} X_i²    (A2.3)

where Qi (i = 1, 2, ..., k) is a nonnegative quadratic form of (X1, X2, ..., Xn) with rank ni. Then Q1, Q2, ..., Qk are independent and each Qi has a χ²-distribution with ni degrees of freedom if and only if n1 + n2 + ··· + nk = n. See Figure A2.1.
Figure A2.1 The PDF of the χ²-distribution (shown in the original for n = 1, 4, 10, 20).
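Definition A2.1 translates directly into simulation: summing squares of standard normals and comparing the sample moments with Theorem A2.2. A minimal sketch (sample size and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 10
# (A2.1): sum of squares of n independent standard normals
samples = (rng.standard_normal((200_000, n)) ** 2).sum(axis=1)

# Theorem A2.2: EX = n, Var(X) = 2n
print(samples.mean())  # close to 10
print(samples.var())   # close to 20
```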
A2.2 Noncentral χ2-Distribution
Definition A2.2
Suppose X1, X2, ..., Xn are independent normally distributed random variables with EXi = μi, Var(Xi) = σ², i = 1, 2, ..., n, and not all μi (i = 1, 2, ..., n) equal to 0. Define

Y = \sum_{i=1}^{n} \frac{X_i²}{σ²}    (A2.4)

Then Y is called a noncentral χ² random variable, and its distribution is called the noncentral χ²-distribution. It has two parameters: n, which specifies the degrees of freedom, and the noncentrality parameter δ = \sum_{i=1}^{n} μ_i²/σ². We denote it as Y ~ χ²(n, δ).

The PDF of the random variable χ²(n, δ) is

χ²(x; n, δ) = e^{−δ/2} \sum_{m=0}^{∞} \frac{(δ/2)^m}{m!} χ²(x; 2m + n)    (A2.5)

where χ²(x; 2m + n) stands for the PDF defined by (A2.2) with 2m + n degrees of freedom. From formula (A2.5) we can see that the PDF of the noncentral χ²-distribution has two parameters: δ and n. If δ = 0, formula (A2.5) reduces to formula (A2.2). Thus, the χ²-distribution defined by formula (A2.2) is sometimes called the central χ²-distribution.

The characteristic function of χ²(n, δ) is

φ(t) = (1 − 2it)^{−n/2} \exp\left( \frac{iδt}{1 − 2it} \right)    (A2.6)

The expected value and variance of χ²(n, δ) are

EX = n + δ, \quad Var(X) = 2n + 4δ    (A2.7)

If X1, X2, ..., Xk are independent and Xi ~ χ²(ni, δi), i = 1, 2, ..., k, then

\sum_{i=1}^{k} X_i \sim χ²\left( \sum_{i=1}^{k} n_i, \sum_{i=1}^{k} δ_i \right)
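The mean and variance in (A2.7) are easy to confirm by simulating (A2.4) directly; the μi, σ, and sample size below are arbitrary illustration choices:

```python
import numpy as np

rng = np.random.default_rng(5)
n, sigma = 8, 2.0
mu = np.linspace(0.5, 2.0, n)        # EX_i = mu_i, not all zero
delta = np.sum(mu**2) / sigma**2     # noncentrality parameter

X = rng.normal(mu, sigma, size=(200_000, n))
Y = (X**2).sum(axis=1) / sigma**2    # (A2.4)

print(Y.mean(), n + delta)           # (A2.7): EY = n + delta
print(Y.var(), 2*n + 4*delta)        # (A2.7): Var(Y) = 2n + 4*delta
```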
A2.3 t-Distribution
Definition A2.3
If X ~ N(0, 1), Y ~ χ²(n), and X and Y are independent, then the distribution of the random variable

T = \frac{X}{\sqrt{Y/n}}    (A2.8)

is called the t-distribution with n degrees of freedom, denoted as T ~ t(n).

Theorem A2.4
The PDF of the t-distribution defined by formula (A2.8) is

t(x; n) = \frac{Γ((n + 1)/2)}{Γ(n/2)\sqrt{nπ}} \left( 1 + \frac{x²}{n} \right)^{−(n+1)/2}    (A2.9)

See Figure A2.2.

Figure A2.2 The PDF of Student's distribution (shown in the original for n = 1, 5, 10, ∞).
Corollary A2.1
If X ~ N(μ, σ²), Y/σ² ~ χ²(n), and X and Y are independent, then

T = \frac{X − μ}{\sqrt{Y/n}} \sim t(n)    (A2.10)

As n approaches infinity, the t-distribution tends to the standard normal distribution. In fact,

\lim_{n→∞} \left( 1 + \frac{x²}{n} \right)^{−(n+1)/2} = e^{−x²/2}

But for small values of n, the difference between the t-distribution and the normal distribution is significant, and

P\{|T| ≥ t_0\} ≥ P\{|X| ≥ t_0\}

where X ~ N(0, 1). That is to say, the t-distribution puts more probability in its tails than the normal distribution.

Theorem A2.5
If X ~ t(n), n > 1, then for r < n, EX^r exists and

EX^r = 0 when r is odd, and EX^r = n^{r/2} \frac{Γ((r + 1)/2) Γ((n − r)/2)}{Γ(1/2) Γ(n/2)} when r is even    (A2.11)

Corollary A2.2
If X ~ t(n), n > 2, then

E(X) = 0, \quad Var(X) = \frac{n}{n − 2}
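Corollary A2.2 can be checked by building T from its definition (A2.8); a sketch with n = 6, where the sample variance should be close to 6/4 = 1.5:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 6
Z = rng.standard_normal(200_000)
Y = (rng.standard_normal((200_000, n)) ** 2).sum(axis=1)  # chi-square(n)
T = Z / np.sqrt(Y / n)                                    # (A2.8)

print(T.mean())              # close to 0
print(T.var(), n / (n - 2))  # corollary A2.2: 1.5 for n = 6
```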
A2.4 F-Distribution
Definition A2.4
Suppose that X and Y are independent χ²-distributed random variables with m and n degrees of freedom, respectively. The distribution of the random variable

F = \frac{X/m}{Y/n} = \frac{X}{Y} \cdot \frac{n}{m}    (A2.12)

is called the F-distribution with (m, n) degrees of freedom and is denoted as F ~ F(m, n).

Theorem A2.6
The PDF of the F-distribution defined by formula (A2.12) is

f(x; m, n) = \frac{Γ((m + n)/2)}{Γ(m/2) Γ(n/2)} \left( \frac{m}{n} \right)^{m/2} x^{(m/2)−1} \left( 1 + \frac{m}{n} x \right)^{−(m+n)/2} when x > 0, and f(x; m, n) = 0 when x ≤ 0    (A2.13)

Corollary A2.3
If X/σ² ~ χ²(m), Y/σ² ~ χ²(n), and X and Y are independent, then

F = \frac{X}{Y} \cdot \frac{n}{m} \sim F(m, n)

Corollary A2.4
If X ~ F(m, n), then 1/X ~ F(n, m). See Figure A2.3.

Figure A2.3 The PDF of the F-distribution (shown in the original for m = 10 and n = 4, 10, 50, ∞).

Theorem A2.7
If X ~ F(m, n) and r > 0, then
EX^r = \left( \frac{n}{m} \right)^r \frac{Γ((m/2) + r) Γ((n/2) − r)}{Γ(m/2) Γ(n/2)}, \quad 2r < n    (A2.14)

Specially,

EX = \frac{n}{n − 2}, \; n > 2; \qquad Var(X) = \frac{n²(2m + 2n − 4)}{m(n − 2)²(n − 4)}, \; n > 4    (A2.15)
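As with the other distributions, (A2.15) can be verified by simulating (A2.12) directly; m, n, and the sample size below are arbitrary (n > 4 is needed for the variance to exist):

```python
import numpy as np

rng = np.random.default_rng(7)
m, n = 5, 12
X = (rng.standard_normal((200_000, m)) ** 2).sum(axis=1)  # chi-square(m)
Y = (rng.standard_normal((200_000, n)) ** 2).sum(axis=1)  # chi-square(n)
F = (X / m) / (Y / n)                                     # (A2.12)

print(F.mean(), n / (n - 2))                              # (A2.15) mean
print(F.var(), n**2 * (2*m + 2*n - 4) / (m * (n - 2)**2 * (n - 4)))
```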
Table A2.1 lists the cumulative distribution function of a random variable X which has the standard normal distribution,

Φ(x) = \frac{1}{\sqrt{2π}} \int_{−∞}^{x} e^{−t²/2} \, dt
TABLE A2.1 Cumulative Distribution Function for the Normal Distribution

  x     Φ(x)        x     Φ(x)        x     Φ(x)
  0.00  0.500000    1.05  0.853141    2.10  0.982136
  0.05  0.519939    1.10  0.864334    2.15  0.984222
  0.10  0.539828    1.15  0.874928    2.20  0.986097
  0.15  0.559616    1.20  0.884930    2.25  0.987776
  0.20  0.579260    1.25  0.894350    2.30  0.989276
  0.25  0.598706    1.30  0.903200    2.35  0.990613
  0.30  0.617911    1.35  0.911492    2.40  0.991802
  0.35  0.636831    1.40  0.919243    2.45  0.992857
  0.40  0.655422    1.45  0.926471    2.50  0.993790
  0.45  0.673645    1.50  0.933193    2.55  0.994614
  0.50  0.691463    1.55  0.939429    2.60  0.995339
  0.55  0.708840    1.60  0.945201    2.65  0.995975
  0.60  0.725747    1.65  0.950528    2.70  0.996533
  0.65  0.742154    1.70  0.955434    2.75  0.997020
  0.70  0.758036    1.75  0.959941    2.80  0.997445
  0.75  0.773373    1.80  0.964070    2.85  0.997814
  0.80  0.788145    1.85  0.967843    2.90  0.998134
  0.85  0.802338    1.90  0.971283    2.95  0.998411
  0.90  0.815940    1.95  0.974412    3.00  0.998650
  0.95  0.828944    2.00  0.977250    4.00  0.999968
  1.00  0.841345    2.05  0.979818    5.00  0.999997
Table A2.2 lists the fractile χ²_{n,α} of the χ²-distribution with n degrees of freedom,

P(χ²_n > χ²_{n,α}) = α

Table A2.3 lists the fractile t_{n,α} of Student's distribution with n degrees of freedom,

P(t(n) > t_{n,α}) = α

Table A2.4 lists the fractile F_{m,n,α} of the F-distribution; each cell gives F_{m,n,0.05} followed by F_{m,n,0.01} (written F_{0.05}/F_{0.01}),

P(F(m, n) > F_{m,n,α}) = α

where m and n are the degrees of freedom of the numerator and denominator, respectively.
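If SciPy is available, the tabulated fractiles and CDF values can be reproduced with the stats module; ppf is the inverse CDF, so the upper-α fractile is ppf(1 − α). A small sketch (the particular table entries checked are arbitrary):

```python
from scipy import stats

# Upper-tail fractiles: P(X > x_alpha) = alpha, so x_alpha = ppf(1 - alpha)
print(stats.chi2.ppf(1 - 0.05, df=10))  # ~18.307, cf. Table A2.2 (n=10, alpha=0.05)
print(stats.t.ppf(1 - 0.025, df=10))    # ~2.228,  cf. Table A2.3 (n=10, alpha=0.025)
print(stats.f.ppf(1 - 0.05, 5, 10))     # ~3.33,   cf. Table A2.4 (m=5, n=10)
print(stats.norm.cdf(1.00))             # ~0.841345, cf. Table A2.1
```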
TABLE A2.2 χ²-Distribution: Fractiles χ²_{n,α}

  n   α=0.99    0.98      0.95     0.90    0.80    0.70    0.50    0.30    0.20    0.10    0.05    0.02    0.01
  1   0.000157  0.000628  0.00393  0.0158  0.0642  0.148   0.455   1.074   1.642   2.706   3.841   5.412   6.638
  2   0.0201    0.0404    0.103    0.211   0.446   0.714   1.386   2.408   3.219   4.605   5.991   7.842   9.210
  3   0.115     0.185     0.352    0.584   1.005   1.424   2.336   3.665   4.642   6.251   7.815   9.837   11.341
  4   0.297     0.429     0.711    1.064   1.649   2.195   3.357   4.878   5.989   7.779   9.488   11.668  13.277
  5   0.554     0.752     1.145    1.610   2.343   3.000   4.351   6.064   7.289   9.236   11.070  13.388  15.086
  6   0.872     1.134     1.635    2.204   3.070   3.828   5.348   7.231   8.558   10.645  12.592  15.033  16.812
  7   1.239     1.564     2.167    2.833   3.822   4.671   6.364   8.383   9.803   12.017  14.067  16.622  18.475
  8   1.646     2.032     2.733    3.490   4.594   5.527   7.344   9.524   11.030  13.362  15.507  18.168  20.090
  9   2.088     2.532     3.325    4.168   5.380   6.393   8.343   10.656  12.242  14.684  16.919  19.679  21.666
 10   2.558     3.059     3.940    4.865   6.179   7.267   9.342   11.781  13.442  15.987  18.307  21.161  23.209
 11   3.053     3.609     4.575    5.578   6.989   8.148   10.341  12.899  14.631  17.275  19.675  22.618  24.725
 12   3.571     4.178     5.226    6.304   7.807   9.034   11.340  14.011  15.812  18.549  21.026  24.054  26.217
 13   4.107     4.765     5.892    7.042   8.634   9.926   12.340  15.119  16.985  19.812  22.362  25.472  27.688
 14   4.660     5.368     6.571    7.790   9.467   10.821  13.339  16.222  18.151  21.064  23.685  26.873  29.141
 15   5.229     5.985     7.261    8.547   10.307  11.721  14.339  17.322  19.311  22.307  24.996  28.259  30.578
 16   5.812     6.614     7.962    9.312   11.152  12.624  15.338  18.418  20.465  23.542  26.296  29.633  32.000
 17   6.408     7.255     8.672    10.085  12.002  13.531  16.338  19.511  21.615  24.769  27.587  30.995  33.409
 18   7.015     7.906     9.390    10.865  12.857  14.440  17.338  20.601  22.760  25.989  28.869  32.346  34.805
 19   7.633     8.567     10.117   11.651  13.716  15.352  18.338  21.689  23.900  27.204  30.144  33.687  36.191
 20   8.260     9.237     10.851   12.443  14.578  16.266  19.337  22.775  25.038  28.412  31.410  35.020  37.566
 21   8.897     9.915     11.591   13.240  15.445  17.182  20.337  23.858  26.171  29.615  32.671  36.343  38.932
 22   9.542     10.600    12.338   14.041  16.314  18.101  21.337  24.939  27.301  30.813  33.924  37.659  40.289
 23   10.196    11.293    13.091   14.838  17.187  19.021  22.337  26.018  28.429  32.007  35.172  38.968  41.638
 24   10.856    11.992    13.848   15.659  18.062  19.943  23.337  27.095  29.553  33.196  36.415  40.270  42.980
 25   11.524    12.697    14.611   16.473  18.940  20.867  24.337  28.172  30.675  34.382  37.652  41.566  44.314
 26   12.198    13.409    15.379   17.292  19.820  21.792  25.336  29.246  31.795  35.563  38.885  42.856  45.642
 27   12.879    14.125    16.151   18.114  20.703  22.719  26.336  30.319  32.912  36.741  40.113  44.140  46.963
 28   13.565    14.847    16.928   18.939  21.588  23.647  27.336  31.391  34.027  37.916  41.337  45.419  48.278
 29   14.256    15.574    17.708   19.768  22.475  24.577  28.336  32.461  35.139  39.087  42.557  46.693  49.588
 30   14.953    16.306    18.493   20.599  23.364  25.508  29.336  33.530  36.250  40.256  43.773  47.962  50.892
TABLE A2.3 t-Distribution: Fractiles t_{n,α}

  n    α=0.10  0.05   0.025   0.01   0.005
  1    3.078   6.314  12.706  31.82  63.66
  2    1.886   2.920  4.303   6.965  9.926
  3    1.638   2.353  3.182   4.541  5.841
  4    1.533   2.132  2.776   3.747  4.604
  5    1.476   2.015  2.571   3.365  4.032
  6    1.440   1.943  2.447   3.143  3.708
  7    1.415   1.895  2.365   2.998  3.499
  8    1.397   1.860  2.306   2.896  3.355
  9    1.383   1.833  2.262   2.821  3.250
 10    1.372   1.812  2.228   2.764  3.169
 11    1.363   1.796  2.201   2.718  3.106
 12    1.356   1.782  2.179   2.681  3.055
 13    1.350   1.771  2.160   2.650  3.012
 14    1.345   1.761  2.145   2.624  2.977
 15    1.341   1.753  2.131   2.602  2.947
 16    1.337   1.746  2.120   2.583  2.921
 17    1.333   1.740  2.110   2.567  2.898
 18    1.330   1.734  2.101   2.552  2.878
 19    1.328   1.729  2.093   2.539  2.861
 20    1.325   1.725  2.086   2.528  2.845
 21    1.323   1.721  2.080   2.518  2.831
 22    1.321   1.717  2.074   2.508  2.819
 23    1.319   1.714  2.069   2.500  2.807
 24    1.318   1.711  2.064   2.492  2.797
 25    1.316   1.708  2.060   2.485  2.787
 26    1.315   1.706  2.056   2.479  2.779
 27    1.314   1.703  2.052   2.473  2.771
 28    1.313   1.701  2.048   2.467  2.763
 29    1.311   1.699  2.045   2.462  2.756
 30    1.310   1.697  2.042   2.457  2.750
 40    1.303   1.684  2.021   2.423  2.704
 60    1.296   1.671  2.000   2.390  2.660
120    1.289   1.658  1.980   2.358  2.617
  ∞    1.282   1.645  1.960   2.326  2.576
TABLE A2.4 F-Distribution: Fractiles F_{m,n,α}
(each cell gives F_{m,n,0.05}/F_{m,n,0.01}; m = numerator, n = denominator degrees of freedom)

  n\m    1            2            3            4            5            6            7            20           200          ∞
    2    18.51/98.49  19.00/99.01  19.16/99.17  19.25/99.25  19.30/99.30  19.33/99.33  19.36/99.34  19.44/99.45  19.49/99.49  19.50/99.50
    3    10.13/34.12  9.55/30.81   9.28/29.46   9.12/28.71   9.01/28.24   8.94/27.91   8.88/27.67   8.66/26.69   8.54/26.18   8.53/26.12
    4    7.71/21.20   6.94/18.00   6.59/16.69   6.39/15.98   6.26/15.52   6.16/15.21   6.09/14.98   5.80/14.02   5.65/13.52   5.63/13.46
    5    6.61/16.26   5.79/13.27   5.41/12.06   5.19/11.39   5.05/10.97   4.95/10.67   4.88/10.45   4.56/9.55    4.38/9.07    4.36/9.02
    6    5.99/13.74   5.14/10.92   4.76/9.78    4.53/9.15    4.39/8.75    4.28/8.47    4.21/8.26    3.87/7.39    3.69/6.94    3.67/6.88
    7    5.59/12.25   4.74/9.55    4.35/8.45    4.12/7.85    3.97/7.46    3.87/7.19    3.79/7.00    3.44/6.15    3.25/5.70    3.23/5.65
    8    5.32/11.26   4.46/8.65    4.07/7.59    3.84/7.01    3.69/6.63    3.58/6.37    3.50/6.19    3.15/5.36    2.96/4.91    2.93/4.86
    9    5.12/10.56   4.26/8.02    3.86/6.99    3.63/6.42    3.48/6.06    3.37/5.80    3.29/5.62    2.93/4.80    2.73/4.36    2.71/4.31
   10    4.96/10.04   4.10/7.56    3.71/6.55    3.48/5.99    3.33/5.64    3.22/5.39    3.14/5.21    2.77/4.41    2.56/3.96    2.54/3.91
   11    4.84/9.65    3.98/7.20    3.59/6.22    3.36/5.67    3.20/5.32    3.09/5.07    3.01/4.88    2.65/4.10    2.42/3.66    2.40/3.60
   20    4.43/8.10    3.49/5.85    3.10/4.94    2.87/4.43    2.71/4.10    2.60/3.87    2.52/3.71    2.12/2.94    1.87/2.47    1.84/2.42
  200    3.89/6.76    3.04/4.71    2.65/3.88    2.41/3.41    2.26/3.11    2.14/2.90    2.05/2.73    1.62/1.97    1.26/1.39    1.19/1.28
 1000    3.85/6.66    3.00/4.62    2.61/3.80    2.38/3.34    2.22/3.04    2.10/2.82    2.02/2.66    1.58/1.89    1.19/1.28    1.08/1.11
    ∞    3.84/6.64    2.99/4.60    2.60/3.78    2.37/3.32    2.21/3.02    2.09/2.80    2.01/2.64    1.57/1.87    1.17/1.25    1.00/1.00
Index A Additional information, 213. See also Nonlinear regression analysis; Linear regression analysis applications, 217–222 effective, 217 of regression coefficients, 216, 218 for second-best scale factor construction, 169–170 sources, 213–217 AIC. See Akaike information criterion (AIC) Akaike information criterion (AIC), 267 for AR models, 268 for MA and ARMA models, 268 polynomial order, 381 uses, 279, 381 AR models. See Autoregressive models (AR models)
AR2 time series. See Autoregressive time series (AR2 time series) ARIMA model. See AutoRegressive-Integrated-Moving-Average model (ARIMA model) ARMA. See Autoregressive moving average series (ARMA) Autocorrelation function, 292. See also Partial correlation function AR models, 236–237 ARMA models, 244–245 CW radar measurement noise, 292 estimation, 251, 252 MA model, 240 stationary processes, 231 Autocovariance function, 304 ARMA series, 258 estimation, 251, 252 MA model, 238–239, 255 stationary random processes, 230–231
Autoregressive models (AR models), 8, 232 AIC criterion, 268 autocorrelation functions, 236–237 least squares estimation, 254 memory function, 236 model testing, 269 moment estimation, 253 properties, 251 residual in, 293 stationary conditions, 233–234 transfer form, 236 Yule–Walker equation, 235 zero-mean series, 232 Autoregressive moving average series (ARMA), 240, 241–242, 258 AIC criterion, 268 autocorrelation functions, 244 Gauss–Newton iteration method, 259–260 model testing, 269–270 moment estimation, 258–259 nonlinear least squares estimation, 259 properties, 251 stationary invertible, 242–243 Autoregressive time series (AR2 time series), 8 AutoRegressive-Integrated-MovingAverage model (ARIMA model), 271 fitting for time series data, 273 integrated, 272 B B splines, standard, 73. See also Cubic splines properties, 73–75 values, 75 Ballistic camera, 350 Bartlett formulas, 252–253
Bessel estimator, 11 Best approximation degrees, 46–48 of induced functions, 44–46 polynomials, 43–44 Biased estimation average MSEs, 172 canonical regression coefficients, 171 compression type, 158–161, 168 eigenvalues, 171 in linear regression models, 157–158 Bonferroni inequality, 186 Box–Jenkins method, 266–267 C Carrier phase outliers removal and cycle slip detection, 461 cumulative sum method, 464–466 ionosphere-free ambiguity epoch difference method, 463–464 M–W combination epoch difference method, 462–463 Center for Space Research (CSR), 451 Centre National d’Etudes Spariales (CNES), 450 CHAMP satellite, 451 antenna phase center offset, 472 code deviation, 460, 461 cycle slip detection result, 466 fault channel detection, 461 M–W combination series, 466 phase O–C residual, 488 relativity adjustments, 471 Chebyshev polynomials, 49 bases, 54–55, 58–59 properties, 49–50 Cinetheodolite, 348 coordinates, 349–350
laser, 349 optical measurements, 347–448 CNES. See Centre National d’Etudes Spariales (CNES) Combinational signals, 282 Conditional expectation. See Mean vector—conditional Constant systematic errors, 18, 415 EMBET method, 401 estimation, 367, 401 for precision appraisal and calibration, 354 Continuous wave radar (CW radar), 290, 350 error analysis, 393 estimation error, 398–401 Gauss–Newton method, 394–398 multiple receivers, 388 multistation measurement data model, 391–393 systematic error model, 356–357 time alignment, 398 velocity measurement mechanism, 388–391 working principle, 387 Continuous wave radar, systematic error estimation, 401 algorithm and numerical examples, 412–415 EMBET method analysis, 403–405 measurement data model, 401–403 nonlinear modeling method, 405–412 Continuous wave radar measurement, 350 autocorrelation function, 292 error models, 291 errors, 352 mathematical modeling, 290, 291 MISTRAM system, 351–353 noise, 291
parameter estimation, 292 partial correlation function, 292 random error, 292 residual in AR model, 293 stationarity test, 291 trajectory parameters, 350, 351 Contrast method, 19 Covariance matrix for high precision measurements, 30 properties, 302 random vector, 302 CSR. See Center for Space Research (CSR) Cubic splines, 62. See also B splines, standard function properties, 65–73 interpolation, 75, 76–78 theorems, 62–64 Cumulative distribution function, 509–510 Cumulative sum (CUSUM), 464–465 CUSUM. See Cumulative sum (CUSUM) CW radar. See Continuous wave radar (CW radar) Cycle slip detection, 461–462. See also Cumulative sum (CUSUM) conditions, 462 reediting phase data use, 485–486 results, 466 D 3-D. See Three-dimension (3-D) Data, 1 measurement, 1, 2 preprocessing flow, 466, 467 processing, 33 re-editing phase, 485–486 re-editing pseudocode, 485
520
In D e X
Divergence, Kalman filter, 325–326 dynamic noise model selection, 326–329 measurement noise model selection, 329–332 Domain, invertible, 239 Doppler frequency shift, 389 Doppler orbitography and radiopositioning integrated by satellite (DORIS), 449–450 DORIS. See Doppler orbitography and radiopositioning integrated by satellite (DORIS) Drift error estimation procedure, 426, 436–438 in range rate measurement, 438 by spline functions, 427–428 Dynamic noise correlation matrix selection, 333 model selection, 326–329 E Earth nonspherical perturbation, 478 acceleration of, 479–480 gravitational potential, 478–479 Empirical accelerations, 484 Empirical formulas, 85–86 from experience, 87–88 mechanical type, 88–89 progressive type, 89–90 from scientific laws, 86–87 Engineering analysis, 2 for data processing results, 443 in redundant variable reduction, 147 Environmental error, 6–7 Equation representation of trajectory, 361–363
Error distribution impact, 30 environmental, 6–7 equipment, 6 estimation, 398 human, 7 matched systematic, 384, 385 methodological, 6 models, 291 negligent, 8 in nominal trajectory, 382 postulate, 6 propagation relationship, 375 random, 7–8 slow drift error, 438 spline fitting, 370 synthesis, 28 systematic, 7 trajectory, 353 unmatched systematic, 384, 385 Error models, 291 parameter estimation, 292 position systematic, 356 systematic, 354, 356 velocity systematic, 357 Error theory error synthesis, 28–33 measurement, 1–4 measurement error, 5–8 negligent errors, 22–28 random error, 8–17 systematic errors, 17–22 Expanding-dimension method, 322 F F-distribution, 508–509, 510, 514–515 Flying target linear stochastic system, 300–301 position parameters, 299 velocity, 299 Form-keeping type signal, 280
combinational signals, 282 completely, 280 linear combination signals, 282 polynomial signals, 280 sinusoidal signal, 281–282 trigonometric function signals, 280–281 Free flight phase systematic error estimation, 416–425 Coriolis acceleration, 418 implicated acceleration, 418 nonlinear model of the measurement data, 420–422 numerical example and analysis, 426 parameter estimation method, 422–425 trajectory equations, 417–420 Functional errors, 31 functional systematic errors, 31 random errors, 31–33 G Gaussian law of random error, 10 Gauss–Markov assumption, 98 theorem, 99–101, 190 Gauss–Newton method, 207, 394–398 improved, 380–382 iteration equation, 207–208 iteration method, 259–260 least squares estimate, 207 for nonlinear regression models, 208–213 Global positioning system (GPS), 449, 450–451. See also Spaceborne dual-frequency GPS antenna offset corrections for GPS satellites, 470–472
spatial distribution of, 450 GPS. See Global positioning system (GPS) GRACE satellite, 451 antenna phase center offset, 472 GRGS. See Le Groupe de Recherche de Géodésie Spatiale (GRGS) Grubbs criterion, 27 H Holder inequality, 70, 192 Hollidany model, 88 Human error, 7 I Idempotent matrix, 501 IGN. See Institut Géographique National (IGN) Independence test, 262–263 Institut Géographique National (IGN), 450 Interpolation polynomial bases, 50 approximation errors, 55–56 lemmas in mathematical analysis, 56–57 theorems, 50–54 Inverse of block matrix, 498–499 Invertible condition, 239 J Jet Propulsion Laboratory (JPL), 451 JPL. See Jet Propulsion Laboratory (JPL) K Kalman filter correlation matrix selection, 333 divergence, 325–332
522
In D e X
errors, 330, 331 measurement noise, 322–324 with noises, 324–325, 332–333 state noise, 321–322 statistical feature extraction, 333–343 Kalman filter, discrete-time, 299, 310 applications in AR modeling, 320–321 calculations, 318 flow, 315, 316 formulas, 314, 315 orthogonal projection, 310–314 radar tracking data, 319–320 recursive formulas, 317, 320 review of development, 301 L Lagrange polynomial, cubic, 454 Le Groupe de Recherche de Géodésie Spatiale (GRGS), 450 Least squares estimate, 99, 114 LEO. See Low earth orbit (LEO) Limit error, 15, 16 Limited memory method, 334 Linear combination signals, 282 Linear equation general solutions, 83 homogeneous, 82 N-order, 455 Linear iteration method, 256 Linear minimum mean square error estimate (LMMSEE), 308–309 discrete-time Kalman filter, 311–312 MMSEE, 310 Linear regression analysis. See also Nonlinear regression analysis
hypothesis tests on regression coefficients, 104–109 interval estimates of parameters, 109–114 least squares estimates and, 114–118 multicollinearity, 118 point estimates of parameters, 98–104 Linear regression model, 95–96 biased estimation in, 157–158 compound models for signals, 124–131 compression type estimation, 158–161 dynamic measurement data, 119–124, 131 numerical examples, 170–174 optimization, 118–119 parameter estimation efficiency, 190–200 ridge parameter determination, 161–166 scale factors, 166–170 Linear stochastic system, 300–301 Linearity, 313 LMMSEE. See Linear minimum mean square error estimate (LMMSEE) Low earth orbit (LEO), 449 antenna offsets for LEO satellites, 472–473 relativity adjustment, 470 spaceborne GPS, 450 M MA model. See Moving average model (MA model) MAD. See Mean absolute deviation (MAD) Matching principle, 365–366 advantages, 366
observational data use, 366 trajectory parameters, 365 Matrix, positive definite using Cholesky decomposition, 101 half-positive, 104 symmetrical, 194 Matrix formulas, 497 cumulative distribution function, 509–510 F-Distribution, 508–509, 510, 514–515 idempotent matrix, 501 inverse of block matrix, 498–499 noncentral χ2-distribution, 505–506 positive definite character of matrix, 500 probability density function, 503–504 quadratic form derivative, 501 t-distribution, 506–508, 510, 513 trace of matrix, 497 χ2-distribution, 503–505, 510, 511–512 Mean, 261. See also Standard deviation arithmetic, 10–11 estimation, 12–13 nonstationarity, 271 square error matrix, 315 stationarity, 265 vector, 301–302 Mean absolute deviation (MAD), 457 Mean square error (MSE), 115 Mean square error matrix (MSEM), 135, 310 filter error, 315, 315 LMMSEE, 308 minimum, 305 prediction error, 315 Mean vector conditional, 303, 304
random vector, 301 Measurement, 2–3 combines, 3–4 direct, 3 dynamic objects, 5 equal precision, 4 exercises, 34–36 indirect, 3 practice, 1–2 precision index, 15–17 static measurement data, 33–34 static objects, 5 unequal precision, 4 Measurement data, 1, 2, 3, 10. See also Continuous wave radar measurement from continuous-wave radar system tracking, 347 dynamic, 119–120 mathematical model, 401–403, 426–429 modeling, 441–442 negligent errors, 23 nonlinear model, 420–422 quality, 8 static, 33–34 Measurement error, 6. See also Negligent error; Systematic error; Random error in distance, 13 error classification, 7–8, 9 measurement data quality, 8 processing methods, 9 source, 6, 9 Measurement noise Kalman filtering, 324–325 mathematical modeling, 290, 291 model selection, 329–332 statistical feature extraction, 333–343 Measurement uncertainty, 28–29 estimation, 29 propagation, 29–30
524
In D e X
Melbourne-Wuebbena (MW), 462, 466 Methodological error, 6 Minimum mean square error estimation (MMSEE), 305–307 Minimum variance unbiased estimator, uniformly, 102 MISsile TRAjectory Measurement (MISTRAM), 334, 351, 271, 388 measurement data, 383 measurement errors, 352 parameter estimation, 386 position systematic error model, 356–357 trajectory error, 353 velocity systematic error model, 357 MISsile TRAjectory Measurement system trajectory determination, 370, 371–372 error propagation relationship, 375–376 mathematical method, 372–375 nonlinear regression analysis method, 376–383 MISTRAM. See MISsile TRAjectory Measurement (MISTRAM) MMSEE. See Minimum mean square error estimation (MMSEE) Model testing, 268–269 AR models, 269 ARMA models, 269–270 MA models, 269 Moving average model (MA model), 237, 238, 255–256 AIC criterion, 268 autocorrelation functions, 240 autocovariance function series, 239
invertible domain and condition, 239 linear iteration method, 256 model testing, 269 Newton–Raphson algorithm, 256–257 properties, 251 zero-mean series, 237 MSE. See Mean square error (MSE) MSEM. See Mean square error matrix (MSEM) MW. See Melbourne-Wuebbena (MW) N Negligent error, 8, 22. See also Systematic error; Measurement error; Random error avoidance, 23 causes, 23 Grubbs criterion, 27 Romannovschi criterion, 23–26 Newton’s law, 86 Newton–Raphson algorithm, 256–257 Noncentral χ2-distribution, 505–506 Nonlinear modeling method, 405 eigenvalue, 412 EMBET nominal trajectory, 407–408 LS estimation, 406–407 systematic error, 408, 409 trajectory parameters, 408, 409 Nonlinear regression analysis. See also Linear regression analysis algorithm and error analysis, 379 improved Gauss–Newton method, 380–382 mathematical model establishment, 377–379
models, 200–201 parameter estimation methods, 203–213 in processing measurement data, 201–202 simulation calculation results, 382–383 trajectory determination, 376–377 Nonstationarity, 270 mean, 271 variance, 270, 271 Nonstationary time series. See also Stationary time series models ARIMA model, 271–273 nonstationarity, 270–271 PAR model, 276–282 RAR model parameter estimation, 282–287 RARMA model, 273–275, 288–289 RMA model parameter estimation, 287–288 Normal distribution cumulative distribution function, 510 density function, 14 random error, 13, 15 t-distribution, 507 Normality test, 261–262 O ODE. See Ordinary differential equation (ODE) Optical measurements, 347–348 ballistic camera use, 350 equipment use, 349 laser cinetheodolite, 349–350 laser ranging, 349 wave refraction errors, 350 Optimal reduced model, 133
Orbit determination of LEO satellites, 449. See also Spaceborne dual-frequency GPS antenna offsets, 470–473 atmospheric drag forces, 481–482 coefficient matrix for, 477 dynamic orbit models, 477–478, 484–485 earth nonspherical perturbation, 478–480 empirical forces, 483–484 exercise, 490, 491, 492, 493–494 iteration estimation flowchart, 478 kinematic methods of, 468 least squares estimate, 476 observational equations, 469 parameter estimation, 473–477 re-editing observational data, 485–486 relativity, 470, 483 result analysis, 487–490, 491, 492, 493 solar radiation pressures, 482–483 third body gravitational perturbations, 480–481 tide perturbations, 481 by zero-difference reduced dynamics, 467, 486–487 Ordinary differential equation (ODE), 80–81. See also Polynomial representations of functions; Spline representations of functions linear solutions, 81–82 nonlinear solutions, 83–85 Orthogonal projection, 310–311 geometric interpretation, 314 LMMSEE, 311–312 properties, 312–314 Orthogonality Chebyshev polynomials, 50 Kalman filter, 313–314
526
P PAR model. See Polynomial autoregressive model (PAR model) Parameter estimation AR models, 253–254 ARMA models, 258 efficiency, 190, 194, 197 error models, 292 Gauss–Markov theorem, 190 Gauss–Newton method, 207–213 Holder inequality, 192 least squares method, 175, 203–206 in linear regression models, 190, 191, 194–200 MA models, 255–256 orbit models, 473 RAR model, 282–287 RARMA model, 288–289 RMA model, 287–288 simulation method, 200 stationary time series models, 251 Parameters interval estimates, 109–114 least squares estimates, 114–118 multicollinearity, 118 point estimates, 98–104 Parametric representation of functions, 40 Partial correlation function, 245. See also Autocorrelation function for AR model, 246–247 CW radar measurement noise, 292 k-order, 245 linear least squares estimation, 245–246 p-step truncation, 249–251 recursive expression, 247–249 PDF. See Probability density function (PDF)
Periodic systematic errors, 18 Peters estimator, 12 Point-by-point elimination method, 175–176 criteria derivation, 176–180 misidentification probability, 180–181 numerical examples, 187–190 for outliers, 182–187 Polynomial approximation. See also Limited memory method; Weierstrass theorem disadvantages, 60 functional approximation, 61 required parameter, 81 Polynomial autoregressive model (PAR model), 276 discussions, 279–280 fitting, 278–279 and parameter estimation, 276 RSS, 277–278 time-varying, 335 Polynomial representation of trajectory, 363–365 Polynomial representations of functions, 40–41. See also Spline representations of functions basis representations, 59 basis selection importance, 48–49 Chebyshev polynomials, 49–55 induced functions approximation, 44–46 interpolation polynomial bases, 50–57 polynomial approximation, 43–44, 46–48, 60 Weierstrass theorem, 41–43 Polynomial signals, 280 Positive definite character of matrix, 500 Precision index limit error, 16–17
of mean, 16 root-mean-squared deviation, 10 single measurement, 15 Probability density function (PDF), 503–504 F-distribution, 508, 509 random variable, 503, 505 t-distribution, 506, 507 χ2-distribution, 505 Processing dynamic measurement data linear model derivation, 100 mathematical models, 133 problem transformation, 93 and regression model, 119–124, 131 Q Quadratic form derivative, 501 R Radar measurement data processing, 347, 438, 444–446. See also Free flight phase systematic error estimation; MISTRAM; Slow drift error estimation analysis of abnormal data, 439–441 data processing procedures, 438 engineering analysis, 443 exercise, 445–446 measurement data modeling, 441–442 optical measurements, 348–350 precision appraisal, 353–354 precision calibration, 354–356 radar measurements, 350–353 random errors, 357–358, 442–443 space measurements, 347–348 systematic errors, 356, 358–360, 383, 384, 401, 443
time alignment, 387 tracking measurements, 348 trajectory determination, 348–349, 370 trajectory parameter, 359, 360–361, 383, 384 true signal, 443 Radio measurements, 348 RAIM. See Receiver autonomous integrity monitoring (RAIM) Random error, 7–8. See also Measurement error; Negligent error; Systematic error distributions, 13–15 functional, 31 gaussian law, 10 in independent measurements, 8, 9 numerical characteristics, 10–13 postulate, 9–10 precision index of measurement, 15–17 radar measurement noise, 292 Random time series. See Time series Random vector, 301 conditional mean vector, 303, 304 conditional variance matrix, 303 covariance matrix, 302 mean vector, 301 state vector estimation, 305–310 variance matrix, 302 vector random process, 304–305 RAR model. See Regression autoregressive model (RAR model) RARMA model. See Regression autoregressive moving average model (RARMA model) Receiver clock offset value, 485 multiple, 388
528
In D e X
Receiver autonomous integrity monitoring (RAIM), 458–461. See also Statistical testing indicator statistic variable T of, 460 Re-editing observational data, 485–486 Regression analysis, 93 measurement data processing relationship, 93–97 regression coefficient vector, 97–98 Regression autoregressive model (RAR model), 275, 283 parameter estimation, 282–287 Regression autoregressive moving average model (RARMA model), 273–275 parameter estimation, 288–289 Regression coefficient canonical, 171 estimation, 158–160 hypothesis tests on, 104–109 vector, 97–98 Regression model, optimal reduced essential variable determination, 149–150 fast algorithms, 146–147 redundant variable elimination, 147–149 selection, 155–156 variable subset, 150–155 Regression moving average model (RMA model), 275 parameter estimation, 287–288 Relativity adjustment formula, 470 Residual analysis, 19 in regression models, 21 residual, 20 Residual sum of squares (RSS), 102, 259 PAR model, 277–278 Ridge parameter
determination, 161–164 using Lagrange multiplier method, 164 solution to extreme value problem, 164–166 RMA model. See Regression moving average model (RMA model) Romannovschi criterion, 23–26 RSS. See Residual sum of squares (RSS) S Satellite laser ranging system (SLR), 449 advantages, 449 limitation, 468 Satellite-to-satellite tracking (SST), 450 Scale factors, 166–170 linear model, 158 proper, 166 Signals combinational, 282 linear combination, 282 polynomial, 280 sinusoidal, 281–282 trigonometric function, 280–281 Sinusoidal signal, 281–282 Slow drift error estimation, 426, 436–438 spline node selection, 429–436 mathematical model, 426–429 SLR. See Satellite laser ranging system (SLR) Space measurements, 347–348 Spaceborne dual-frequency GPS, 450–451 advantage, 468 carrier phase outliers removal and cycle slip detection, 461–466
data preprocessing, 451, 466–467 gross error detection, 458, 459 ionospheric delay, 453–458 observation equations, 452, 459 pseudocode, 452, 459 RAIM, 458–461 re-editing observation data, 485 signal-to-noise ratio, 453 statistical testing indicator, 459–460 Spaceflight tracking system, 88 Spline representation of trajectory, 366–370 Spline representations of functions, 61. See also Polynomial representations of functions bases, 78–79 cubic splines, 62–73, 75–78 standard B splines, 73–75 SST. See Satellite-to-satellite tracking (SST) Standard deviation, 10, 11, 15 estimation, 11–13 State vector, 305 expanding-dimension method, 322 LMMSEE, 308–309, 310 MMSEE, 305–307 Stationarity test, 263–365 of fitting residuals, 291 mean stationarity, 265 variance stationarity, 265–266 Stationary random processes, 230 autocorrelation functions, 231 autocovariance function, 230–231 Stationary time series, 7 Stationary time series modeling, 266 AIC criterion, 267–268 Box–Jenkins method, 266–367 model testing, 268–270 modeling steps, 270 Stationary time series models. See also Nonstationary time series
529
AR models, 232–237 AR(p) model parameter estimation, 253–254 ARMA model, 240, 241–244 ARMA(p, q) model parameter estimation, 258–260 autocorrelation function estimation, 252, 253 autocovariance function estimation, 251, 252 MA model, 237–240 MA(q) model parameter estimation, 255–257 parameter estimation, 251 partial correlation function, 245–251 stationary random processes, 230–231 Statistical testing indicator, 459–460 Student’s distribution. See t-distribution Systematic error, 7, 17, 383. See also Negligent error; Measurement error; Random error; Trajectory parameter categories, 360 causes, 17–18 elimination, 21–22 EMBET method, 358–359 estimation, 359–360 functional, 31 identification, 19 linearly, 18 matched systematic error, 384–386 measurement data models, 383 MISTRAM system, 383, 384 reduction, 21 residuals, 19–20 spaceflight tracking system, 88 unmatched systematic error, 386 variation rules, 18–19
Systematic error estimation for free flight phase, 416 Coriolis acceleration, 418 implicated acceleration, 418 nonlinear model, 420–422 numerical example and analysis, 426 parameter estimation method, 422–425 trajectory equations, 417–420 T t-distribution, 506–508, 510, 513 Telemetry measurements, 348 Third body gravitational perturbations acceleration, 480–481 Three-dimension (3-D), 348 Time series analysis, 229–230 deterministic, 229 discrete random process, 229 independence test, 262–263 noise modeling, 290 nonstationary time series, 270 normality test, 261–262 observational data tests, 260 parameter estimation, 251 stationarity test, 263–266 stationary time series, 230, 266 tasks, 230 TOPEX satellite, 451 Trace of matrix, 497 Tracking measurements, 348 optical measurements, 348–350 radar measurements, 350–353 trajectory determination, 348–349 Tracking system calibration, 354–356 comparison, 355 error analysis, 354
error model, 88 flight status determination, 300 laboratory tests, 355 measurement data modeling, 355 multiple observations, 355 parameter estimation method, 355–356 precision appraisal, 353–354 precision calibration, 354 Trajectory calculation, 370. See also MISsile TRAjectory Measurement system trajectory determination Trajectory parameter. See also Systematic error data processing problem translation, 360–361 EMBET method, 358, 366 equation representation, 361–363 error in, 350 estimation, 359, 360 matching principle, 365–366 measurement data models, 383 MISTRAM system, 355, 383, 384 nonlinear least-squares method, 349 parametric representation, 361 polynomials, 360–361, 363–365 radar measurements, 350, 351 spline fitting error, 370 spline representation, 366–370 Trigonometric function signals, 280–281 U Unbiased linear estimate, 99 Unbiasedness, 313 of orthogonal projection, 312 Uncertain systematic errors, 18
V Variable selection, 131. See also Linear regression analysis; Measurement data—dynamic consequences, 134–138 criteria, 138–146 fast algorithms, 146–156 model selection, 134 role, 131–133 Variance matrix conditional, 303 random vector, 302 Velocity measurement, 388 Doppler frequency shift, 389 errors, 446 frequency transmission, 389–391 precision of, 349
time alignment issue, 391 Vondrak filter, 454–458 W Weierstrass theorem, 41–43 Weighted least squares estimator, 101 X χ2-distribution, 503–505, 510 Y Yule–Walker equation, 235 Z Zero-difference reduced dynamic orbit determination, 486–487