Prognostics and Health Management of Electronics

Author: Michael G. Pecht

Michael G. Pecht CALCE Electronic Products and Systems University of Maryland

WILEY

A JOHN WILEY & SONS, INC., PUBLICATION

Copyright © 2008 by John Wiley & Sons, Inc. All rights reserved. Published by John Wiley & Sons, Inc., Hoboken, New Jersey. Published simultaneously in Canada.

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permission.

Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002. Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic formats. For more information about Wiley products, visit our web site at www.wiley.com.

Library of Congress Cataloging-in-Publication Data:

Pecht, Michael. Prognostics and health management of electronics / Michael G. Pecht. p. cm. Includes bibliographical references and index. ISBN 978-0-470-27802-4 (cloth) 1. Electronic systems--Maintenance and repair. I. Title. TK7870.P39 2008 621.381028'8--dc22 2008027978

Printed in the United States of America

10 9 8 7 6 5 4 3 2 1

Contents

Preface
Acknowledgements
Acronyms

Chapter 1 Introduction
  1.1 Reliability and Prognostics
  1.2 PHM for Electronics
  1.3 PHM Concepts and Methods
    1.3.1 Fuses and Canaries
    1.3.2 Monitoring and Reasoning of Failure Precursors
    1.3.3 Monitoring Environmental and Usage Profiles for Damage Modeling
  1.4 Implementation of PHM for System of Systems
  1.5 Summary

Chapter 2 Sensor Systems for PHM
  2.1 Sensor and Sensing Principles
    2.1.1 Thermal Sensors
    2.1.2 Electrical Sensors
    2.1.3 Mechanical Sensors
    2.1.4 Humidity Sensors
    2.1.5 Biosensors
    2.1.6 Chemical Sensors
    2.1.7 Optical Sensors
    2.1.8 Magnetic Sensors
  2.2 Sensor Systems for PHM
    2.2.1 Parameters to Be Monitored
    2.2.2 Sensor System Performance
    2.2.3 Physical Attributes of Sensor Systems
    2.2.4 Functional Attributes of Sensor Systems
    2.2.5 Cost
    2.2.6 Reliability
    2.2.7 Availability
  2.3 Sensor Selection
  2.4 Examples of Sensor Systems for PHM Implementation
  2.5 Emerging Trends in Sensor Technology for PHM

Chapter 3 Data-Driven Approaches for PHM
  3.1 Introduction
  3.2 Parametric Statistical Methods
    3.2.1 Likelihood Ratio Test
    3.2.2 Maximum Likelihood Estimation
    3.2.3 Neyman-Pearson Criterion
    3.2.4 Expectation Maximization
    3.2.5 Minimum Mean Square Error Estimation
    3.2.6 Maximum A Posteriori Estimation
    3.2.7 Rao-Blackwell Estimation
    3.2.8 Cramer-Rao Lower Bound
  3.3 Nonparametric Statistical Methods
    3.3.1 Nearest Neighbor-Based Classification
    3.3.2 Parzen Window (or Kernel Density Estimation)
    3.3.3 Wilcoxon Rank-Sum Test
    3.3.4 Kolmogorov-Smirnov Test
    3.3.5 Chi Square Test
  3.4 Machine Learning Techniques
  3.5 Supervised Classification
    3.5.1 Discriminative Approach
    3.5.2 Generative Approach
  3.6 Unsupervised Classification
    3.6.1 Discriminative Approach
    3.6.2 Generative Approach
  3.7 Summary

Chapter 4 Physics-of-Failure Approach to PHM
  4.1 PoF-Based PHM Methodology
  4.2 Hardware Configuration
  4.3 Loads
  4.4 Failure Modes, Mechanisms, and Effects Analysis
  4.5 Stress Analysis
  4.6 Reliability Assessment and Remaining-Life Predictions
  4.7 Outputs from PoF Based PHM

Chapter 5 The Economics of PHM
  5.1 Return on Investment
    5.1.1 PHM ROI Analyses
    5.1.2 Financial Costs
  5.2 PHM Cost-Modeling Terminology and Definitions
  5.3 PHM Implementation Costs
    5.3.1 Nonrecurring Costs
    5.3.2 Recurring Costs
    5.3.3 Infrastructure Costs
    5.3.4 Nonmonetary Considerations and Maintenance Culture
  5.4 Cost Avoidance
    5.4.1 Maintenance Planning Cost Avoidance
    5.4.2 Discrete Event Simulation Maintenance Planning Model
    5.4.3 Fixed-Schedule Maintenance Interval
    5.4.4 Precursor to Failure Monitoring
    5.4.5 LRU-Independent Methods
    5.4.6 Discrete Event Simulation Implementation Details
    5.4.7 Operational Profile
  5.5 Example PHM Cost Analysis
    5.5.1 Single-Socket Model Results
    5.5.2 Multiple-Socket Model Results
    5.5.3 Example Business Case Construction
  5.6 Summary

Chapter 6 PHM Roadmap: Challenges and Opportunities
  6.1 Introduction
  6.2 Roadmap Classifications
    6.2.1 PHM: Component Level
    6.2.2 PHM for Integrated Circuits and Gate Devices
    6.2.3 High-Power Switching Electronics
    6.2.4 Built-in Prognostics for Components and Circuit Boards
    6.2.5 Electronics/Electro-Optical Prognostics for Tactical Sensor Systems
    6.2.6 Interconnect Prognostics
    6.2.7 PHM as Mitigation of Reliability Risks
    6.2.8 PHM in Supply Chain Management and Product Maintenance
  6.3 PHM at System Level
    6.3.1 Legacy Systems
    6.3.2 Environmental and Operational Monitoring
    6.3.3 LRU to Device Level
    6.3.4 Dynamic Reconfiguration
    6.3.5 System Power Management and PHM
    6.3.6 PHM as Knowledge Infrastructure for System Development
    6.3.7 Prognostics for Software
  6.4 Methodology Development
    6.4.1 Best Algorithms
    6.4.2 Approaches to Training
    6.4.3 Verification and Validation
    6.4.4 Long-Term PHM Studies
  6.5 Nontechnical Barriers
    6.5.1 Cost, ROI, Business Case Development
    6.5.2 Liability and Litigation
    6.5.3 Role of Standards Organizations

Appendix A Commercially Available Sensor Systems for PHM
Appendix B PHM in Industry, Academia, and Government
Appendix C Journals and Conference Proceedings Related to PHM
Index

Preface

Prognostics is the process of predicting the future reliability of a product by assessing the extent of deviation or degradation of the product from its expected normal operating conditions. Health management systems are programs that respond in a preemptive and opportunistic manner to the anticipation of failures. There is a growing interest among industry, government, and academia in monitoring the ongoing reliability, or health, and predicting the remaining life of electronic products and systems, because most complex systems today contain significant electronics content.

Approaches to implement prognostics in electronic products and systems include using expendable devices, such as canaries and fuses that fail earlier than the host product; monitoring and trending of parameters that are precursors to failure; and modeling accumulated damage (e.g., physics of failure) based on system exposure to life-cycle loads and operating conditions.

If one can assess the extent of deviation or degradation of a system in its application environment and predict remaining life or the probability of success of a future event, the information can be used to meet the following powerful objectives:

- Provide advanced warning of system failures
- Enable condition-based (predictive) maintenance
- Obtain knowledge of load history for future design, qualification, and root cause analysis
- Increase system availability through an extension of maintenance cycles and/or timely repair actions
- Lower life-cycle costs of equipment from reductions in inspection costs, downtime, and inventory
- Reduce the occurrence of intermittents and no fault founds (NFF)

At present, there are many organizations conducting research and development into prognostics and even more that wish to implement it in their products and systems. However, research on prognostics and health management (PHM) for electronics has been fragmented, and until now there has been no single reference that describes the work being conducted.

To address this, this book discusses the activities of the major players in the prognostics field, including companies, academia, and government organizations. It also discusses the available sensors that are used for prognostics, the parameters that can be monitored, the functions and principles of these sensors, implementation techniques, and guidelines for sensor selection. The prognostics models and algorithms currently in use are also discussed. The book provides an overview of the implementation costs, including recurring, nonrecurring, and infrastructure costs, and the cost avoidance possible with PHM. A roadmap is then presented to show the challenges and opportunities for research and development of PHM.

Chapter 1 provides a basic understanding of PHM and the techniques being developed to enable prognostics for electronic products and systems. The general approaches for PHM of electronics include (1) the use of fuses and canary devices; (2) monitoring and trending of failure precursors; and (3) monitoring environmental and usage loads for damage modeling. Examples are given to demonstrate each of the general approaches. Steps for implementing an effective PHM strategy for a complete product or system are presented.

Chapter 2 presents the state of the art in sensor systems for in situ health and usage monitoring. Advances in the areas of sensor fabrication, microprocessors, compact nonvolatile memory, battery technology, and wireless telemetry have led to novel sensor systems that can be used for in situ life-cycle monitoring of electronic products and systems. Characteristics of state-of-the-art sensor systems, including on-board power management features, on-board memory, embedded signal processing software, wireless data transmission, low size and weight, high reliability, and low cost, are presented. Select state-of-the-art, commercially available sensor systems are included along with their performance characteristics. A final section on emerging trends in sensor system technology is presented.

Chapter 3 discusses the various data-driven models and algorithms that can be utilized for prognostics and health management. The discussion covers statistical, usage-based, state estimation, and general pattern recognition models and algorithms.

Chapter 4 discusses the physics-of-failure-based prognostics approach. This approach permits the assessment of system reliability under its actual application conditions by integrating sensor data with models that enable in situ assessment of the deviation or degradation of a product from an expected normal operating condition. A formal implementation procedure, which includes failure modes, mechanisms, and effects analysis, data reduction and feature extraction from the life-cycle loads, and damage accumulation, is presented.

Chapter 5 presents the economics of PHM. This chapter provides an overview of the implementation costs and the cost avoidance possible with PHM. Implementation costs, including recurring, nonrecurring, and infrastructure costs, are discussed. Maintenance planning is described and an example return-on-investment analysis is performed.

Chapter 6 presents the challenges and opportunities for research and development in PHM of electronics. Included are recommendations on the essential next steps for continued advancement of PHM technologies. A PHM technology roadmap is then provided.

It is acknowledged that the field of PHM is evolving rapidly. Furthermore, due to the large amount of published work in PHM, any assessment inevitably leaves out some organizations and topics that we either were not aware of or did not consider relevant in the context of this book.

Acknowledgments

This research could not have been conducted without the help of many companies, universities, and government organizations. The organizations that contributed to this book include:

Companies: Aeronautical Radio Incorporated; BAE Systems; Boeing; EADS; Emerson; Expert Microsystems; General Dynamics; General Electric; General Motors; GMA Industries; Honeywell; Impact Technologies; Intelligent Automation, Inc.; Invocon; JR Dynamics; Lansmont; Lockheed Martin; Microstrain; Northrop Grumman; Qualtech Systems, Inc.; Raytheon; Ridgetop Group; Rockwell Automation; Sentinent Corporation; Scientific Monitoring; Smartsignal; Smiths Aerospace; Sun Microsystems; VEXTEC Corporation

Universities: Auburn University; Beihang University, China; Georgia Institute of Technology; University of California at Los Angeles; University of Maryland; University of Tennessee; University of North Carolina; Vanderbilt University

Government: NASA Glenn Research Center; Sandia National Laboratories; U.S. Air Force Research Laboratory; U.S. Army Materiel Systems Analysis Activity; U.S. Army Research Laboratory VTD; U.S. Navy

The following researchers have made special contributions to this book as part of their work and studies at CALCE-University of Maryland or in cooperation with CALCE faculty:

Prof. Michael Pecht has an M.S. in Electrical Engineering and an M.S. and Ph.D. in Engineering Mechanics from the University of Wisconsin at Madison. He is a Professional Engineer, an IEEE Fellow, and an ASME Fellow. He served as chief editor of the IEEE Transactions on Reliability for eight years and on the advisory board of IEEE Spectrum. He is chief editor for Microelectronics Reliability and an associate editor for the IEEE Transactions on Components and Packaging Technology. He is the founder of the CALCE Center at the University of Maryland, College Park, where he is also a Chair Professor in Mechanical Engineering and a Professor in Applied Mathematics. He has written more than twenty books on electronic products development, use, and supply chain management and over 400 technical articles. He has been leading a research team in the area of prognostics for the past ten years, and has now formed the CALCE Prognostics and Health Management Consortium at the University of Maryland. He has consulted for over 50 major international electronics companies, providing expertise in strategic planning, design, test, prognostics, IP, and risk assessment of electronic products and systems. He was awarded the highest reliability honor, the IEEE Reliability Society's Lifetime Achievement Award, in 2008. He has previously received the European Micro and Nano-Reliability Award for outstanding contributions to reliability research, the 3M Research Award for electronics packaging, and the IMAPS William D. Ashman Memorial Achievement Award for his contributions in electronics reliability analysis.

Michael Azarian is a Research Scientist at CALCE. He holds a Ph.D. in Materials Science and Engineering from Carnegie Mellon University, an M.S. in Metallurgical Engineering and Materials Science from Carnegie Mellon, and a B.S. in Chemical Engineering from Princeton University. His research interests include failure mechanisms in electronic components and assemblies and sensor systems for prognostics. He has contributed to Chapter 2.

Shunfeng Cheng received a B.S. and M.S. in Mechanical Engineering from Huazhong University of Science and Technology, China. He is currently a Ph.D. student and graduate research assistant at CALCE, University of Maryland. His research interests include wireless sensors and algorithms for prognostics and health management of electronics. He has contributed to Chapter 2.

Kiri Feldman received a B.S. in Mechanical Engineering and a B.A. in History from the University of Maryland in 2006. She received an M.S. in Mechanical Engineering from the University of Maryland in 2008. She is a member of the American Society of Mechanical Engineers and of the National Defense Industrial Association. She contributed to Chapter 5.

Jie Gu received a B.S. in Mechanical Engineering from the University of Science and Technology in China, and an M.S. in Mechanical Engineering from the National University of Singapore. He is an IEEE student member, and currently a Ph.D. student in Mechanical Engineering at the University of Maryland, in the area of prognostics and health management for electronics. He contributed to Chapters 1 and 4.

Craig Hershey has a B.S. in Electrical Engineering from the Pennsylvania State University and a Master's degree in Systems Engineering from the University of Maryland. Mr. Hershey has seven years of experience in electronics reliability and for the past few years has been developing prognostics and condition-based maintenance (CBM) for military ground vehicle systems at the U.S. Army Materiel Systems Analysis Activity. His job responsibilities include CBM system development, and hardware maintenance and installation. He also provides technical guidance and expertise in electronics reliability, reliability improvement methods, and physics-of-failure analyses. He was actively involved in the review of this book.

Andrew Hess is a CALCE consultant, renowned for his work in fixed and rotary wing health monitoring and recognized as the father of naval aviation propulsion diagnostics. He has been the leading advocate for health monitoring in the Navy and has been instrumental in the development of every Navy aircraft application. Mr. Hess managed the development and integration of the prognostic and health management (PHM) system for the Joint Strike Fighter program from his position as Air System PHM IPT Lead. He has been acting on the advisory Red Team for the DARPA Prognosis programs for Propulsion and Structures. He is a NAVAIR senior engineering fellow and has authored over 30 technical papers. He was actively involved in the review of this book.

Rubyca Jaai received a B.S. in Mechanical Engineering from Visvesvaraya Technological University in India. She is currently pursuing her M.S. in Mechanical Engineering at the University of Maryland, College Park. Her research interests include electronic prognostics and health management. She contributed to Chapter 3.

Sachin Kumar received a B.S. in Metallurgical Engineering from the Bihar Institute of Technology and an M.Tech. in Reliability Engineering from the Indian Institute of Technology, Kharagpur. He is currently pursuing a Ph.D. in Mechanical Engineering at the University of Maryland, College Park. His research interests include reliability, electronic system prognostics, and health and usage monitoring systems. He has contributed to Chapter 3.

Sony Mathew is currently a member of the research faculty at CALCE. He completed his M.S. in Mechanical Engineering at the University of Maryland. His research areas include prognostics and health management of electronics, reliability of electronic products, and the tin whisker phenomenon. He is now pursuing his Ph.D. in Mechanical Engineering at the University of Maryland. He holds a bachelor's degree in Mechanical Engineering and a master's degree in Business Administration from Pune University, India. He has contributed to Chapters 1 and 4 and the appendices. He was actively involved in the review of this book.

Hyunseok Oh received a B.S. in Mechanical Engineering from Korea University and an M.S. in Mechanical Engineering from the Korea Advanced Institute of Science and Technology. He is currently pursuing a Ph.D. in Mechanical Engineering at the University of Maryland, College Park. His research interests include electronic prognostics and health management. He has contributed to Chapters 2 and 3.

Peter Sandborn is a Professor in the CALCE Electronic Products and Systems Center at the University of Maryland. Dr. Sandborn's group develops life-cycle cost models and tools for complex systems including PHM, technology obsolescence management, design refresh planning, and electronic part management. He is the author of over 150 technical publications and several books on electronic packaging and electronic systems cost analysis. He has contributed to Chapter 5.

Vasilis Sotiris received a B.S. in Aerospace Engineering from Rutgers University and an M.S. in Mechanical Engineering from Columbia University. He worked as a systems engineer for Lockheed Martin Corporation, concentrating on software development projects for the Federal Aviation Administration. He is currently pursuing a Ph.D. in Applied Mathematics at the University of Maryland. His research is in the field of applied statistics and computational mathematics related to diagnostics and prognostics. He has contributed to Chapter 3.

Myra Torres has over 18 years of experience in component reliability engineering and product risk assessment at Sun Microsystems. She managed the electronic component and interconnect technology team as well as a centralized signal integrity group at Sun. Her experience in PHM stems from market demands for reliable and available systems, increasing technology complexity of electronics, and the limitations of conventional reliability methods. At CALCE, Myra served as Assistant Director for PHM research and was involved with PHM methodologies and implementations for electronic systems. She has contributed to Chapter 6.

Brian Tuchband received a B.S. in Mechanical Engineering from the University of Delaware and an M.S. in Mechanical Engineering from the University of Maryland, where he conducted research in PHM. He is currently employed by The Boeing Company in Ridley Park, Pennsylvania. His research interests include challenges and opportunities for prognostics, health and usage monitoring systems, and predictive maintenance for military electronic systems. He contributed to Chapter 1 and helped develop the appendices.

Nikhil Vichare received a B.S. in Production Engineering from the University of Mumbai, India, an M.S. in Industrial Engineering from the State University of New York at Binghamton, and a Ph.D. in Mechanical Engineering from the University of Maryland. He is currently employed by Dell, Inc. in Austin, Texas, and works on notebook computer reliability. His research interests include reliability, electronic system prognostics, and health and usage monitoring systems. He contributed to Chapter 1 and helped develop the appendices.

Acronyms

ACARS     Aircraft Communications and Reporting System
ADIP      Army Diagnostic Improvement Program
AEW&C     Airborne Early Warning & Control
AFRL      Air Force Research Laboratory
AHM       Airplane Health Management
AIT       Automatic Identification Technology
AL        Autonomic Logistics
ALIS      Autonomic Logistics Information System
AME       Automated Maintenance Environment
AMSAA     Army Materiel Systems Analysis Activity
AOC       Airline Operational Control
ASIGS     Aircraft Structural Integrity Ground Station
AVPHM     Air Vehicle Prognostics and Health Manager
BAA       Broad Agency Announcements
BIT       Built-in Test
C2MS      Corrosion & Corrosivity Monitoring System
CAA       Civil Aviation Authority
CALCE     Center for Advanced Life Cycle Engineering
CBM       Condition-Based Maintenance
CDF       Common Data Format
CFRS      Computerized Fault Reporting System
CMAC      Cerebellar Model Arithmetic Computer
CMMS      Computerized Maintenance Management System
CNST      Center for Naval Shipbuilding Technology
CSTH      Continuous System Telemetry Harness
DARPA     Defense Advanced Research Projects Agency
DoD       Department of Defense
DOE       Department of Energy
DTPS      Drive Train Prognostics System
EFV       Expeditionary Fighting Vehicle
EHDUR     Engine Health Diagnostics Using Radar
EOTS      Electrical Opto Targeting System
EPRI      Electric Power Research Institute
EPSC      Electronic Products and Systems Center
FCS       Future Combat System
FFT       Fast Fourier Transform
FIRST     F/A-18E/F Integrated Readiness Support Teaming
FOQA      Flight Operations Quality Assurance
FUMS      Flight Usage Management Software
GPS       Global Positioning System
HMS       Health Management System
HUMS      Health and Usage Monitoring System
I2C       Inter Integrated Circuit
ICAS      Integrated Condition Assessment System
IETM      Interactive Electronic Technical Manual
IMIS      Integrated Maintenance Information System
iMP       “intelligent” Medium Power
IPHM      Integrated Prognostics and Health Management
IVHM      Integrated Vehicle Health Management
JAHUMS    Joint Advanced Health and Usage Monitoring System
JITM      Just-in-Time Maintenance
JSF       Joint Strike Fighter
LCM       Life Consumption Monitoring
MEMS      Microelectromechanical System
MOD       Ministry of Defense
MPROS     Machinery Prognostics System
MSET      Multivariate State Estimation Technique
MTE       Molecular Test Equipment
NASA      National Aeronautics and Space Administration
NAVAIR    Naval Air Systems Command
NTF       No-Trouble-Found
ODBC      Open Database Connectivity
ONR       Office of Naval Research
PADHM     Prognostics, Advanced Diagnostics, and Health Management
PBL       Performance-Based Logistics
PCA       Principal Component Analysis
PEDS      Prognostic Enhancements to Diagnostic Systems
PFAD      Predictive Failures and Advanced Diagnostics
PHM       Prognostic Health Management
PHMC      Prognostics and Health Management Consortium
PMA       Portable Maintenance Aid
ProDAPS   Probabilistic Diagnostic and Prognostic System
PSMRS     Platform Soldier Mission Readiness System
PTM       Predictive Trend Monitoring
RASCAL    Rotorcraft Aircrew Systems Concepts Airborne Laboratory
RCFIS     Reconfigurable Control and Fault Identification System
REDI-PRO  Real-Time Engine Diagnostics-Prognostics
RFID      Radio Frequency Identification
RUL       Remaining Useful Life
SAMS      Sensor-Based Aircraft Maintenance Support
SBIR      Small Business Innovation Research
SBM       Similarity-Based Modeling
SCADA     Supervisory Control and Data Acquisition
SDCC      System Dynamics Characterization and Control
SIPS      Structural Integrity Prognosis System
SMPS      Switch-Mode Power Supply
SPOT      Small Programmable Object Technology
SPRT      Sequential Probability Ratio Test
TEDANN    Turbine Engine Diagnostics Using Artificial Neural Networks
TIG       Technology Interest Group
TSMD      Time Stress Measurement Device
UAV       Unmanned Aerial Vehicle
USAF      United States Air Force
WRA       Weapon Replaceable Assembly

Chapter 1 Introduction

As a result of intense global competition, companies are considering novel approaches to enhance the operational efficiency of their products. For many products and systems, high in-service reliability can be a means to ensure customer satisfaction. In addition, global competitive demands for increased warranties and the severe liability of product failures are encouraging manufacturers to improve field reliability and operational availability,¹ and to provide knowledge of in-service use and of life-cycle operational and environmental conditions.

Interest has been growing in monitoring the ongoing health of products and systems in order to provide advance warning of failure and assist in administration and logistics. Here, health is defined as the extent of degradation or deviation from an expected normal condition. Prognostics is the prediction of the future state of health based on current and historical health conditions [1].

Electronics are integral to the functionality of most systems today, and their reliability is often critical for system reliability [2]. This chapter provides a basic understanding of prognostics and health monitoring of products and systems and the techniques being developed to enable prognostics for electronic systems.

1.1 Reliability and Prognostics

Reliability is the ability of a product or system to perform as intended (i.e., without failure and within specified performance limits) for a specified time, in its life-cycle environment. Traditional reliability prediction methods for electronic products include Mil-HDBK-217 [3], 217-PLUS, Telcordia [4], PRISM [5], and FIDES [6]. These methods rely on the collection of failure data and generally assume the components of the system have failure rates (most often assumed to be constant) that can be modified by independent “modifiers” to account for various quality, operating, and environmental conditions. There are numerous well-documented concerns with this type of modeling approach [7-10]. The general consensus is that these handbooks should never be used, because they are inaccurate for predicting actual field failures and provide highly misleading predictions, which can result in poor designs and logistics decisions [8][11].
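To make the criticized modeling form concrete, the sketch below shows the general shape of a handbook-style calculation: a constant base failure rate per part, scaled by independent modifiers, with the part rates summed for a series system. The multiplicative combination, the series assumption, every numeric value, and the function names are illustrative assumptions, not figures or procedures taken from any of the handbooks cited above.

```python
import math

def part_failure_rate(base_rate, modifiers):
    """Scale a constant base failure rate by independent modifiers.

    Assumption: modifiers combine multiplicatively, in the style of
    handbook adjustment factors for quality, environment, and so on.
    """
    rate = base_rate
    for factor in modifiers.values():
        rate *= factor
    return rate

def system_reliability(part_rates, hours):
    """Series system with constant failure rates: R(t) = exp(-sum(rates) * t)."""
    return math.exp(-sum(part_rates) * hours)

# Hypothetical parts; rates are in failures per hour and are made up.
parts = [
    part_failure_rate(2.0e-7, {"quality": 1.0, "environment": 4.0, "temperature": 1.5}),
    part_failure_rate(5.0e-8, {"quality": 2.0, "environment": 4.0}),
]

total_rate = sum(parts)                          # failures per hour
mtbf_hours = 1.0 / total_rate                    # mean time between failures
r_one_year = system_reliability(parts, 8760.0)   # one year of continuous use

print(f"total failure rate: {total_rate:.2e} per hour")
print(f"MTBF: {mtbf_hours:.0f} hours")
print(f"one-year reliability: {r_one_year:.4f}")
```

Nothing in this calculation depends on the stresses, materials, or usage history of a specific unit, which is one way to see why such predictions can diverge from field behavior.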

¹ Operational availability is defined as the degree (expressed as a decimal between 0 and 1, or the percentage equivalent) to which a piece of equipment or system can be expected to work properly when required. Operational availability is often calculated by dividing uptime by the sum of uptime and downtime.

The traditional handbook method for the reliability prediction of electronics started with Mil-HDBK-217A, published in 1965. In this handbook, there was only a single point failure rate for all monolithic integrated circuits, regardless of the stresses, the materials, or the architecture. Mil-HDBK-217B was published in 1973, with the RCA/Boeing models simplified by the U.S. Air Force to follow a statistical exponential (constant failure rate) distribution. Since then, all the updates were mostly “band-aids” for a modeling approach that was proven to be flawed [12]. In 1987-1990, the Center for Advanced Life Cycle Engineering (CALCE) at the University of Maryland was awarded a contract to update Mil-HDBK-217. It was concluded that this handbook should be canceled and the use of this type of modeling approach discouraged.

In 1998, the Institute of Electrical and Electronics Engineers (IEEE) 1413 standard, “IEEE Standard Methodology for Reliability Prediction and Assessment for Electronic Systems and Equipment,” was approved to provide guidance on the appropriate elements of a reliability prediction [13]. A companion guidebook, IEEE 1413.1, “IEEE Guide for Selecting and Using Reliability Predictions Based on IEEE 1413,” provides information and an assessment of the common methods of reliability prediction for a given application [14]. The guidebook shows that Mil-HDBK-217 is flawed. It also discusses the advantages of reliability prediction methods that use stress and damage physics-of-failure (PoF) techniques.

The PoF approach and design-for-reliability (DfR) methods have been developed by CALCE [15] with the support of industry, government, and other universities. PoF is an approach that utilizes knowledge of a product’s life-cycle loading and failure mechanisms to perform reliability modeling, design, and assessment. The approach is based on the identification of potential failure modes, failure mechanisms, and failure sites for the product as a function of its life-cycle loading conditions. The stress at each failure site is obtained as a function of both the loading conditions and the product geometry and material properties. Damage models are then used to determine fault generation and propagation.

Prognostics and health management (PHM) is a method that permits the assessment of the reliability of a product (or system) under its actual application conditions. When PHM is combined with PoF models, it is possible to make continuously updated predictions based on the actual environmental and operational conditions. PHM techniques combine sensing, recording, and interpretation of environmental, operational, and performance-related parameters to indicate a system’s health. PHM can be implemented through the use of various techniques to sense and interpret the parameters indicative of:

- Performance degradation, such as deviation of operating parameters from their expected values
- Physical or electrical degradation, such as material cracking, corrosion, interfacial delamination, or increases in electrical resistance or threshold voltage
- Changes in a life-cycle profile, such as usage duration and frequency, ambient temperature and humidity, vibration, and shock

The framework for prognostics is shown in Figure 1.1. Performance data from various levels of an electronic product or system can be monitored in situ and analyzed using prognostic algorithms. Different implementation approaches can be adopted individually or in combination. These approaches will be discussed in subsequent sections. Ultimately, the objective is to predict the advent of failure in terms of a distribution of remaining life, level of degradation, or probability of mission survival.


Figure 1.1: Framework for prognostics and health management.

1.2 PHM for Electronics

Most products and systems contain significant electronics content to provide needed functionality and performance. If one can assess the extent of deviation or degradation from an expected normal operating condition for electronics, this information can be used to meet several powerful goals, which include (1) providing advanced warning of failures; (2) minimizing unscheduled maintenance, extending maintenance cycles, and maintaining effectiveness through timely repair actions; (3) reducing the life-cycle cost of equipment by decreasing inspection costs, downtime, and inventory; and (4) improving qualification and assisting in the design and logistical support of fielded and future systems [1]. In other words, since electronics are playing an increasingly large role in providing operational capabilities for today’s products and systems, prognostic techniques have become highly desirable.

Some of the first efforts in diagnostic health monitoring of electronics involved the use of a built-in test (BIT), defined as an on-board hardware-software diagnostic means to identify and locate faults. A BIT can consist of error detection and correction circuits, totally self-checking circuits, and self-verification circuits [1]. Two types of BIT concepts are employed in electronic systems: interruptive BIT (I-BIT) and continuous BIT (C-BIT). The concept behind I-BIT is that normal equipment operation is suspended during BIT operation. The concept behind C-BIT is that equipment is monitored continuously and automatically without affecting normal operation.

Several studies [16, 17] conducted on the use of BIT for fault identification and diagnostics showed that BIT can be prone to false alarms and can result in unnecessary costly replacement, requalification, delayed shipping, and loss of system availability. BIT concepts are still being developed to reduce the occurrence of spurious failure indications. However, there is also reason to believe that many of the failures actually occurred but were intermittent in nature [18]. The persistence of such issues over the years is perhaps because the use of BIT has been restricted to low-volume systems. Thus, BIT has generally not been designed to provide prognostics or remaining useful life due to accumulated damage or progression of faults. Rather, it has served primarily as a diagnostic tool.


PHM has also emerged as one of the key enablers for achieving efficient system-level maintenance and lowering life-cycle costs in military systems. In November 2002, the U.S. Deputy Under Secretary of Defense for Logistics and Materiel Readiness released a policy called condition-based maintenance plus (CBM+). CBM+ represents an effort to shift unscheduled corrective equipment maintenance of new and legacy systems to preventive and predictive approaches that schedule maintenance based upon the evidence of need. A 2005 survey of 11 CBM programs highlighted “electronics prognostics” as one of the most needed maintenance-related features or applications without regard for cost [19], a view also shared by the avionics industry [20]. The Department of Defense 5000.2 policy document on defense acquisition states that “program managers shall optimize operational readiness through affordable, integrated, embedded diagnostics and prognostics, embedded training and testing, serialized item management, automatic identification technology, and iterative technology refreshment” [18]. Thus, a prognostics capability has become a requirement for any system sold to the U.S. Department of Defense.

PHM is also emerging as a high-priority issue in space applications. NASA’s Ames Research Center (ARC) in California is focused on conducting fundamental research in the field of integrated systems health management (ISHM). ARC is involved in the design of health management systems, selection and optimization of sensors, in situ monitoring, data analysis, prognostics, and diagnostics. The Prognostics Center of Excellence at ARC develops algorithms to predict the remaining life of NASA’s systems and subsystems. ARC’s current prognostics projects involve power semiconductor devices (investigation of the effects of aging on power semiconductor components, identification of failure precursors to build a PoF model, and development of algorithms for end-of-life prediction), batteries (algorithms for battery prognosis), flight actuators (PoF modeling and development of algorithms for estimation of remaining life), solid rocket motor failure prediction, and aircraft wiring health management [21].

In addition to in-service reliability assessment and maintenance, health monitoring can also be effectively used to support product take-back and end-of-life decisions. Product take-back indicates the responsibility of manufacturers for their products over the entire life cycle, including disposal. The motivation driving product take-back is the concept of extended producer responsibility (EPR) for post-consumer electronic waste [22]. The objective of EPR is to make manufacturers and distributors financially responsible for their products when they are no longer needed. End-of-life product recovery strategies include repair, refurbishing, remanufacturing, reuse of components, material recycling, and disposal.

One of the challenges in end-of-life decision making is to determine whether product lines can be extended, whether any components could be reused, and what subset should be disposed of in order to minimize system costs [23]. Several interdependent issues must be considered concurrently to properly determine the optimum component reuse ratio, including assembly/disassembly costs and any defects introduced by the process, product degradation incurred in the original life cycle, and the waste stream associated with the life cycle. Among these factors, the estimate of the degradation of the product in its original life cycle could be the most uncertain input to end-of-life decisions. This estimate could be effectively carried out using health monitoring, with knowledge of the entire history of the product’s life cycle.

Scheidt et al. [24] proposed the development of special electrical ports, referred to as green ports, to retrieve product usage data that could assist in the recycling and reuse of electronic products. Klausner et al. [25, 26] proposed the use of an integrated electronic data log (EDL) for recording parameters indicative of product degradation. The EDL was implemented on electric motors to increase the reuse of motors. In another study [27], domestic appliances were monitored for collecting usage data by means of electronic units fitted on the appliances. This work introduced the life cycle data acquisition unit, which can be used for data collection and also for diagnostics and servicing. Middendorf et al. [28] suggested developing life information modules to record the cycle conditions of products for reliability assessment, product refurbishing, and reuse.

Designers often establish the usable life of products and warranties based on extrapolating accelerated test results to assumed usage rates and life-cycle conditions. These assumptions may be based on worst-case scenarios of various parameters composing the end-user environment. Thus, if the assumed conditions and actual use conditions are the same, the product would last for the designed time, as shown in Figure 1.2a. However, this is rarely true, and usage and environmental conditions could vary significantly from those assumed. For example, consider products equipped with life consumption monitoring systems for providing in situ assessment of remaining life. In this situation, even if the product is used at a higher usage rate and in harsh conditions, it can still avoid unscheduled maintenance and catastrophic failure, maintain safety, and ultimately save cost. These are typically the motivational factors for use of health monitoring or life consumption monitoring, as shown in Figure 1.2b.

One of the vital inputs in making end-of-life decisions is the estimate of degradation and the remaining life of the product. Figure 1.2c illustrates a scenario in which a working product is returned at the end of its designed life. Using the health monitors installed within the product, the reusable life can be assessed. Unlike testing conducted after the product is returned, this estimate can be made without having to disassemble the product. Ultimately, depending on other factors such as the cost of the product, demand for spares, and cost and yield in assembly and disassembly, the manufacturer can choose to reuse or dispose.

Figure 1.2: Application of health monitoring for product reuse. (Recovered panel titles: (a) usage as per design; (c) less severe usage than intended design. Recovered axis labels: failure, designed severity, time, design life.)

1.3 PHM Concepts and Methods

The general PHM methodology is shown in Figure 1.3 [29]. The first step involves a virtual life assessment, where design data, expected life-cycle conditions, failure modes, mechanisms, and effects analysis (FMMEA), and PoF models are the inputs to obtain a reliability (virtual life) assessment. Based on the virtual life assessment, it is possible to prioritize the critical failure modes and failure mechanisms. The existing sensor data, bus monitor data, and maintenance and inspection records can also be used to identify the abnormal conditions and parameters. Based on this information, the monitoring parameters and sensor locations for PHM can be determined.


Based on the collected operational and environmental data, the health status of the products can be assessed. Damage can also be calculated from the PoF models to obtain the remaining life. Then PHM information can be used for maintenance forecasting and decisions that minimize life-cycle costs, or maximize availability or some other utility function.

Figure 1.3: CALCE PHM methodology.

The different approaches to prognostics and the state of research in electronics PHM are presented here. Three current approaches include (1) the use of fuses and canary devices; (2) monitoring and reasoning of failure precursors; and (3) monitoring environmental and usage loading for PoF-based stress and damage modeling.

1.3.1 Fuses and Canaries

Expendable devices, such as fuses and canaries, have been a traditional method of protection for structures and electrical power systems. Fuses and circuit breakers are examples of elements used in electronic products to sense excessive current drain and to disconnect power. Fuses within circuits safeguard parts against voltage transients or excessive power dissipation and protect power supplies from shorted parts. For example, thermostats can be used to sense critical temperature limiting conditions and to shut down the product, or a part of the system, until the temperature returns to normal. In some products, self-checking circuitry can also be incorporated to sense abnormal conditions and to make adjustments to restore normal conditions or to activate switching means to compensate for a malfunction [30]. The word “canary” is derived from one of coal mining’s earliest systems for warning of the presence of hazardous gas using the canary bird. Because the canary is more sensitive to hazardous gases than humans, the death or sickening of the canary was an indication to the miners to get out of the shaft. The canary thus provided an effective early warning of catastrophic failure that was easy to interpret. The same approach has been employed in prognostic health monitoring. Canary devices mounted on the actual product can also be used to provide advance warning of failure due to specific wearout failure mechanisms.


Mishra et al. [31] studied the applicability of semiconductor-level health monitors by using pre-calibrated cells (circuits) located on the same chip with the actual circuitry. The prognostics cell approach, known as Sentinel Semiconductor™ technology, has been commercialized to provide an early warning sentinel for upcoming device failures [32]. The prognostic cells are available for 0.35-µm, 0.25-µm, and 0.18-µm complementary metal-oxide-semiconductor (CMOS) processes; the power consumption is approximately 600 µW. The cell size is typically 800 µm² at the 0.25-µm process size. Currently, prognostic cells are available for semiconductor failure mechanisms such as electrostatic discharge (ESD), hot carrier, metal migration, dielectric breakdown, and radiation effects.

The time to failure of prognostic canaries can be precalibrated with respect to the time to failure of the actual product. Because of their location, these canaries contain and experience substantially similar dependencies as does the actual product. The stresses that contribute to degradation of the circuit include voltage, current, temperature, humidity, and radiation. Since the operational stresses are the same, the damage rate is expected to be the same for both circuits. However, the prognostic canary is designed to fail faster through increased stress on the canary structure by means of scaling.

Scaling can be achieved by a controlled increase of the stress (e.g., current density) inside the canaries. With the same amount of current passing through both circuits, if the cross-sectional area of the current-carrying paths in the canary is decreased, a higher current density is achieved. Further control in current density can be achieved by increasing the voltage level applied to the canaries. A combination of both of these techniques can also be used. Higher current density leads to higher internal (joule) heating, causing greater stress on the canaries. When a current of higher density passes through the canaries, they are expected to fail faster than the actual circuit [31].

Figure 1.4 shows the failure distribution of the actual product and the canary health monitors. Under the same environmental and operational loading conditions, the canary health monitors wear out faster to indicate the impending failure of the actual product. Canaries can be calibrated to provide sufficient advance warning of failure (prognostic distance) to enable appropriate maintenance and replacement activities. This point can be adjusted to some other early indication level. Multiple trigger points can also be provided using multiple canaries spaced over the bathtub curve.
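The scaling argument can be illustrated with a short numerical sketch. It is only an illustration under stated assumptions: the power-law life model (time to failure proportional to current density raised to a negative exponent), the exponent value, the geometry, and the host time to failure are all hypothetical and are not taken from the studies cited here.

```python
def current_density(current_a, area_m2):
    """Current density J = I / A in A/m^2."""
    if area_m2 <= 0:
        raise ValueError("cross-sectional area must be positive")
    return current_a / area_m2


def acceleration_factor(j_canary, j_host, exponent=2.0):
    """Assumed power-law life model: time to failure ~ J**(-exponent)."""
    return (j_canary / j_host) ** exponent


def prognostic_distance(host_ttf_h, accel):
    """Warning time between canary failure and expected host failure."""
    canary_ttf_h = host_ttf_h / accel
    return host_ttf_h - canary_ttf_h


# Illustrative (hypothetical) values, not measured data.
current = 1e-3           # same current through both paths (A)
host_area = 1.0e-12      # host conductor cross-section (m^2)
canary_area = 0.5e-12    # canary path narrowed to half the area

j_host = current_density(current, host_area)
j_canary = current_density(current, canary_area)
accel = acceleration_factor(j_canary, j_host)
warning_h = prognostic_distance(host_ttf_h=10_000.0, accel=accel)
```

Halving the cross-sectional area doubles the current density; under the assumed exponent, the canary then fails well ahead of the host, and the gap between the two failure times is the prognostic distance.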

Figure 1.4: Advanced warning of failure using canary structures. (Curves shown: failure probability density distribution for canary health monitors; failure probability density distribution for actual product.)


Goodman et al. [33] used a prognostic canary to monitor time-dependent dielectric breakdown (TDDB) of the metal-oxide-semiconductor field-effect transistor (MOSFET) on the integrated circuits. The prognostic canary was accelerated to failure under certain environmental conditions. Acceleration of the breakdown of an oxide could be achieved by applying a voltage higher than the supply voltage to increase the electric field across the oxide. When the prognostic canary failed, a certain fraction of the circuit lifetime was used up. The fraction of consumed circuit life was dependent on the amount of overvoltage applied and could be estimated from the known distribution of failure times.

The extension of this approach to board-level failures was proposed by Anderson et al. [34], who created canary components (located on the same printed circuit board) that include the same mechanisms that lead to failure in actual components. Anderson et al. identified two prospective failure mechanisms: (1) low cycle fatigue of solder joints, assessed by monitoring solder joints on and within the canary package, and (2) corrosion monitoring, using circuits that are susceptible to corrosion. The environmental degradation of these canaries was assessed using accelerated testing, and degradation levels were calibrated and correlated to actual failure levels of the main system. The corrosion test device included electrical circuitry susceptible to various corrosion-induced mechanisms. Impedance spectroscopy was proposed for identifying changes in the circuits by measuring the magnitude and phase angle of impedance as a function of frequency. The change in impedance characteristics can be correlated to indicate specific degradation mechanisms.

There remain unanswered questions with the use of fuses and canaries for PHM. For example, if a canary monitoring a circuit is replaced, what is the impact when the product is re-energized? What protective architectures are appropriate for postrepair operations? What maintenance guidance must be documented and followed when fail-safe protective architectures have or have not been included? The canary approach is also difficult to implement in legacy systems because it may require requalification of the entire system with the canary module. Also, the integration of fuses and canaries with the host electronic system could be an issue with respect to real estate on semiconductors and boards. Finally, the company must ensure that the additional cost of implementing PHM can be recovered through increased operational and maintenance efficiencies.

1.3.2 Monitoring and Reasoning of Failure Precursors

A failure precursor is a data event or trend that signifies impending failure. A precursor indication is usually a change in a measurable variable that can be associated with subsequent failure. For example, a shift in the output voltage of a power supply might suggest impending failure due to a damaged feedback regulator and opto-isolator circuitry. Failures can then be predicted by using causal relationships between measured variables that can be correlated with subsequent failure and for PoF.

A first step in failure precursor PHM is to select the life-cycle parameters to be monitored. Parameters can be identified based on factors that are crucial for safety, that are likely to cause catastrophic failures, that are essential for mission completeness, or that can result in long downtimes. Selection can also be based on knowledge of the critical parameters established by past experience, field failure data on similar products, and qualification testing. More systematic methods, such as FMMEA [35], can also be used to determine parameters that need to be monitored.

Pecht et al. [36] proposed several measurable parameters that can be used as failure precursors for electronic products, including switching power supplies, cables and connectors, CMOS integrated circuits (ICs), and voltage-controlled high-frequency oscillators (see Table 1.1).

Table 1.1: Potential Failure Precursors for Electronics [36]

| Electronic Subsystem | Failure Precursor |
|---|---|
| Switching power supply | Direct-current (DC) output (voltage and current levels); ripple; pulse width duty cycle; efficiency; feedback (voltage and current levels); leakage current; radio frequency (RF) noise |
| Cables and connectors | Impedance changes; physical damage; high-energy dielectric breakdown |
| CMOS IC | Supply leakage current; supply current variation; operating signature; current noise; logic-level variations |
| Voltage-controlled oscillator | Output frequency; power loss; efficiency; phase distortion; noise |
| Field effect transistor | Gate leakage current/resistance; drain-source leakage current/resistance |
| Ceramic chip capacitor | Leakage current/resistance; dissipation factor; RF noise |
| General purpose diode | Reverse leakage current; forward voltage drop; thermal resistance; power dissipation; RF noise |
| Electrolytic capacitor | Leakage current/resistance; dissipation factor; RF noise |
| RF power amplifier | Voltage standing wave ratio (VSWR); power dissipation; leakage current |

In general, to implement a precursor reasoning-based PHM system, it is necessary to identify the precursor variables for monitoring and then develop a reasoning algorithm to correlate the change in the precursor variable with the impending failure. This characterization is typically performed by measuring the precursor variable under an expected or accelerated usage profile. Depending on the characterization, a model is developed, typically a parametric curve fit, neural network, Bayesian network, or time-series trending of a precursor signal. This approach assumes that there are one or more expected usage profiles that are predictable and can be simulated, often in a laboratory setup. In some products the usage profiles are predictable, but this is not always true.

For a fielded product with highly varying usage profiles, an unexpected change in the usage profile could result in a different (noncharacterized) change in the precursor signal. If the precursor reasoning model is not characterized to factor in the uncertainty in life-cycle usage and environmental profiles, it may provide false alarms. Additionally, it may not always be possible to characterize the precursor signals under all possible usage scenarios (assuming they are known and can be simulated). Thus, the characterization and model development process can often be time consuming and costly and may not always work.

There are many examples of the monitoring and trending of failure precursors to assess health and product reliability. Some key studies are presented below.

Smith and Campbell [37] developed a quiescent current monitor (QCM) that can detect elevated Iddq current in real time during operation.² The QCM performed leakage current measurements on every transition of the system clock to get maximum coverage of the IC in real time. Pecuh et al. [38] and Xue and Walker [39] proposed a low-power built-in current monitor for CMOS devices. In the Pecuh et al. study, the current monitor was developed and tested on a series of inverters for simulating open and short faults. Both fault types were successfully detected, and operational speeds of up to 100 MHz were achieved with negligible effect on the performance of the circuit under test. The current sensor developed by Xue and Walker enabled Iddq monitoring at a resolution level of 10 pA. The system translated the current level into a digital signal with scan chain readout. This concept was verified by fabrication on a test chip.

GMA Industries [40-42] proposed embedding molecular test equipment (MTE) within ICs to enable them to continuously test themselves during normal operation and to provide a visual indication that they have failed. The molecular test equipment could be fabricated and embedded within the individual IC in the chip substrate. The molecular-sized sensor "sea of needles" could be used to measure voltage, current, and other electrical parameters, as well as sense changes in the chemical structure of integrated circuits that are indicative of pending or actual circuit failure. This research focuses on the development of specialized doping techniques for carbon nanotubes to form the basic structure comprising the sensors. The integration of these sensors within conventional IC circuit devices, as well as the use of molecular wires for the interconnection of sensor networks, is an important factor in this research. However, no product or prototype has been developed to date.

Kanniche and Mamat-Ibrahim [43] developed an algorithm for health monitoring of voltage source inverters with pulse width modulation. The algorithm was designed to detect and identify transistor open-circuit faults and intermittent misfiring faults occurring in electronic drives. The mathematical foundations of the algorithm were based on discrete wavelet transform (DWT) and fuzzy logic (FL). Current waveforms were monitored and continuously analyzed using DWT to identify faults that may occur due to constant stress, voltage swings, rapid speed variations, frequent stop/start-ups, and constant overloads. After fault detection, “if-then” fuzzy rules were used for very large scale integrated (VLSI) fault diagnosis to pinpoint the faulty device. The algorithm was demonstrated to detect certain intermittent faults under laboratory experimental conditions.

² The power supply current (Idd) can be defined by two elements: Iddq, the quiescent current, and Iddt, the transient or dynamic current. Iddq is the leakage current drawn by the CMOS circuit when it is in a stable (quiescent) state, and Iddt is the supply current produced by circuits under test during a transition period after the input has been applied. It has been reported that Iddq has the potential for detecting defects such as bridging, opens, and parasitic transistor defects. Operational and environmental stresses, such as temperature, voltage, and radiation, can quickly degrade previously undetected faults and increase the leakage current (Iddq). There is extensive literature on Iddq testing, but little has been done on using Iddq for in situ PHM. Monitoring Iddq has been more popular than monitoring Iddt [37-39].

Self-monitoring analysis and reporting technology (SMART), currently employed in select computing equipment for hard disk drives (HDD), is another example of precursor monitoring [44-45]. HDD operating parameters, including the flying height of the head, error counts, variations in spin time, temperature, and data transfer rates, are monitored to provide advance warning of failures (see Table 1.2). This is achieved through an interface between the computer’s start-up program (BIOS) and the HDD.

Table 1.2: Monitoring Parameters Based on Reliability Concerns in Hard Drives

Reliability Issues
• Head assembly: crack on head; head contamination or resonance; bad connection to electronics module
• Motors/bearings: motor failure; worn bearing; excessive run-out; no spin
• Electronic module: circuit/chip failure; interconnection/solder joint failure; bad connection to drive or bus
• Media: scratch/defects; retries; bad servo; ECC corrections

Parameters Monitored
• Head flying height: A downward trend in flying height will often precede a head crash.
• Error checking and correction (ECC) use and error counts: The number of errors encountered by the drive, even if corrected internally, often signals problems developing with the drive.
• Spin-up time: Changes in spin-up time can reflect problems with the spindle motor.
• Temperature: Increases in drive temperature often signal spindle motor problems.
• Data throughput: Reduction in the transfer rate of data can signal various internal problems.
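As a rough illustration of time-series trending of a precursor signal, the following minimal sketch fits a straight line to a monitored parameter and extrapolates it to a failure threshold. The readings, the threshold, and the choice of a linear trend are hypothetical assumptions for illustration; they do not describe how SMART or any cited study processes its data.

```python
def fit_line(times, values):
    """Ordinary least-squares fit of values = slope * t + intercept."""
    n = len(times)
    if n < 2 or n != len(values):
        raise ValueError("need at least two matched samples")
    t_mean = sum(times) / n
    v_mean = sum(values) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("time samples must not all be equal")
    sxy = sum((t - t_mean) * (v - v_mean) for t, v in zip(times, values))
    slope = sxy / sxx
    return slope, v_mean - slope * t_mean


def time_to_threshold(times, values, threshold):
    """Extrapolate the fitted trend to the failure threshold.

    Returns the remaining time after the last sample, or None if the
    trend is flat or the projected crossing is not in the future.
    """
    slope, intercept = fit_line(times, values)
    if slope == 0:
        return None
    t_cross = (threshold - intercept) / slope
    remaining = t_cross - times[-1]
    return remaining if remaining > 0 else None


# Hypothetical flying-height readings (nm) logged every 100 h.
hours = [0, 100, 200, 300, 400]
height_nm = [20.0, 19.5, 19.0, 18.5, 18.0]
remaining_h = time_to_threshold(hours, height_nm, threshold=15.0)
```

A real precursor model would also need to carry the uncertainty in the fit and in the usage profile, since an extrapolation from one characterized profile can give false alarms under a different one.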

Systems for early fault detection and failure prediction are being developed using variables such as current, voltage, and temperature continuously monitored at various locations inside the system. Sun Microsystems refers to this approach as continuous system telemetry harnesses [46]. Along with sensor information, soft performance parameters such as loads, throughputs, queue lengths, and bit error rates are tracked. Prior to PHM implementation, characterization is conducted by monitoring the signals of different variables to establish a multivariate state estimation technique (MSET) model of the “healthy” systems. Once the healthy model is established using these data, it is used to predict the signal of a particular variable based on learned correlations among all variables [47]. Based on the expected variability in the value of a particular variable during application, a sequential probability ratio test (SPRT) is constructed. During actual monitoring, SPRT is used to detect deviations of the actual signal from the expected signal based on distributions (and not on a single threshold value) [48, 49]. The expected signal is generated in real time based on learned correlations during characterization (see Figure 1.5). A new signal of residuals is generated, which is the arithmetic difference of the actual and expected time-series signal values. These differences are used as input to the SPRT model, which continuously analyzes the deviations and provides an alarm if the deviations are of concern [47]. The monitored data are analyzed to provide alarms based on leading indicators of failure and enable use of monitored signals for fault diagnosis, root cause analysis, and analysis of faults due to software aging [50].
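The residual test can be sketched with Wald's classical SPRT for a mean shift in Gaussian residuals. This is a minimal sketch under stated assumptions, not the implementation used in the cited work: the expected signal is taken as given (the MSET model is not reproduced), and the noise level, the shift size, the error rates, and the telemetry values are hypothetical.

```python
import math


def sprt_alarms(actual, expected, sigma, shift, alpha=0.01, beta=0.01):
    """Wald SPRT for a positive mean shift in Gaussian residuals.

    H0: residual ~ N(0, sigma^2)       (healthy)
    H1: residual ~ N(shift, sigma^2)   (degraded)
    Returns the sample indices at which H1 is accepted (alarm).
    """
    upper = math.log((1 - beta) / alpha)   # accept H1 at or above this
    lower = math.log(beta / (1 - alpha))   # accept H0 at or below this
    llr = 0.0
    alarms = []
    for i, (a, e) in enumerate(zip(actual, expected)):
        residual = a - e
        # Log-likelihood ratio increment for N(shift, s^2) vs N(0, s^2).
        llr += (shift / sigma ** 2) * (residual - shift / 2.0)
        if llr >= upper:
            alarms.append(i)
            llr = 0.0      # restart the test after a decision
        elif llr <= lower:
            llr = 0.0
    return alarms


# Hypothetical temperature telemetry: the model predicts 50.0 throughout;
# the measured signal drifts upward by 1.0 from sample 10 onward.
expected = [50.0] * 20
actual = [50.0] * 10 + [51.0] * 10
alarms = sprt_alarms(actual, expected, sigma=0.5, shift=1.0)
```

Because the decision rests on accumulated evidence across samples rather than on a single threshold crossing, the alarm is raised a few samples after the drift begins, with the delay governed by the chosen error rates.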

Figure 1.5: Sun Microsystems’ approach to PHM. (Diagram labels: actual signal values, expected signal values, model, difference, residual, SPRT, alarm.)

Brown et al. [51] demonstrated that the remaining useful life of a commercial global positioning system (GPS) can be predicted by using a precursor-to-failure approach. The failure modes for GPS included precision failure due to an increase in position error and solution failure due to increased outage probability. These failure progressions were monitored in situ by recording system-level features reported using the National Marine Electronics Association (NMEA) 0183 protocol. The GPS was characterized to collect the principal feature value for a range of operating conditions. Based on experimental results, parametric models were developed to correlate the offset in the principal feature value with solution failure. During the experiment, the BIT provided no indication of an impending solution failure [51].

1.3.3 Monitoring Environmental and Usage Profiles for Damage Modeling

The life-cycle profile of a product consists of manufacturing, storage, handling, and operating and nonoperating conditions. The life-cycle loads (Table 1.3), either individually or in various combinations, may lead to performance or physical degradation of the product and reduce its service life [52]. The extent and rate of product degradation depend upon the magnitude and duration of exposure (usage rate, frequency, and severity) to such loads. If one can measure these loads in situ, the load profiles can be used in conjunction with damage models to assess the degradation due to cumulative load exposures.

Table 1.3: Examples of Life-Cycle Loads

| Load | Load Conditions |
|---|---|
| Thermal | Steady-state temperature, temperature ranges, temperature cycles, temperature gradients, ramp rates, heat dissipation |
| Mechanical | Pressure magnitude, pressure gradient, vibration, shock load, acoustic level, strain, stress |
| Chemical | Aggressive versus inert environment, humidity level, contamination, ozone, pollution, fuel spills |
| Physical | Radiation, electromagnetic interference, altitude |
| Electrical | Current, voltage, power, resistance |


The assessment of the impact of life-cycle usage and environmental loads on electronic structures and components was studied by Ramakrishnan and Pecht [52]. This study introduced the life consumption monitoring (LCM) methodology (Figure 1.6), which combined in situ measured loads with physics-based stress and damage models to assess remaining product life.

Step 1: Conduct failure modes, mechanisms, and effects analysis.

Step 2: Conduct a virtual reliability assessment to assess the failure mechanisms with the earliest time-to-failure.

Step 3: Monitor appropriate product parameters, such as environmental (e.g., shock, vibration, temperature, humidity) and operational (e.g., voltage, power, heat dissipation) parameters.

Step 4: Conduct data simplification for model input.

Step 5: Perform damage assessment and damage accumulation.

Step 6: Estimate the remaining life of the product (e.g., data trending, forecasting models, regression analysis).

Decision: Is the remaining life acceptable? If yes, continue monitoring. If no, schedule a maintenance action.

Figure 1.6: CALCE life consumption monitoring methodology.

Mathew et al. [53] applied the LCM methodology to conduct a prognostic remaining-life assessment of circuit cards inside a space shuttle solid rocket booster (SRB). Vibration-time history, recorded on the SRB from the prelaunch stage to splashdown, was used in conjunction with physics-based models to assess damage. Using the entire life-cycle loading profile of the SRBs, the remaining life of the components and structures on the circuit cards was predicted. It was determined that an electrical failure was not expected within another 40 missions. However, vibration and shock analysis exposed an unexpected failure of the circuit card due to a broken aluminum bracket mounted on the circuit card. Damage accumulation analysis determined that the aluminum brackets had lost significant life due to shock loading.

Shetty et al. [54] applied the LCM methodology to conduct a prognostic remaining-life assessment of the end-effector electronics unit (EEEU) inside the robotic arm of the space shuttle remote manipulator system (SRMS). A life-cycle loading profile of thermal and vibrational loads was developed for the EEEU boards. Damage assessment was conducted using physics-based mechanical and thermomechanical damage models. A prognostic estimate using a combination of damage models, inspection, and accelerated testing showed that there was little degradation in the electronics and they could be expected to last another 20 years.

Gu et al. [55] developed a methodology for monitoring, recording, and analyzing the life-cycle vibration loads for remaining-life prognostics of electronics. The responses of printed circuit boards to vibration loading in terms of bending curvature were monitored using strain gauges. The interconnect strain values were then calculated from the measured printed circuit board (PCB) response and used in a vibration failure fatigue model for damage assessment. Damage estimates were accumulated using Miner’s rule after every mission and then used to predict the life consumed and remaining life. The methodology was demonstrated for remaining-life prognostics of a PCB assembly. The results were also verified by checking the resistance data.

In case studies [52, 56], an electronic component board assembly was placed under the hood of an automobile and subjected to normal driving conditions. Temperature and vibrations were measured in situ in the application environment. Using the monitored environmental data, stress and damage models were developed and used to estimate consumed life. Figure 1.7 shows the LCM estimates, the estimate obtained using similarity analysis, and the actual measured life. The car was involved in an accident during the test (marked in Figure 1.7). Only LCM accounted for this unforeseen event because the operating environment was being monitored in situ.

Figure 1.7: Remaining-life estimation of test board. (Plot against time in use, 0 to 50 days. Annotations: estimated life after 5 days of data collection = 46 days; day of car accident; estimated life after accident = 40 days; estimated life based on similarity analysis = 25 days; actual life from resistance monitoring = 39 days.)
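The damage accumulation and remaining-life steps (Steps 5 and 6 of the LCM methodology, and the Miner’s rule accumulation used by Gu et al. [55]) can be sketched as follows. Miner’s rule sums the ratio of applied cycles to cycles-to-failure at each load level, and failure is predicted when the sum reaches a critical value, commonly taken as 1. The load levels, cycle counts, and cycles-to-failure values below are hypothetical and are not taken from the cited studies; the linear extrapolation of remaining missions is also an assumption.

```python
# Hypothetical cycles-to-failure at each load level, e.g., from a fatigue model.
CYCLES_TO_FAILURE = {"low": 1_000_000, "medium": 100_000, "high": 10_000}

def miners_damage(cycle_counts, cycles_to_failure):
    """Miner's rule: damage = sum over load levels of n_i / N_i."""
    return sum(n / cycles_to_failure[level] for level, n in cycle_counts.items())

def remaining_missions(damage_per_mission, critical_damage=1.0):
    """Accumulate damage and extrapolate at the average damage rate so far."""
    accumulated = sum(damage_per_mission)
    average = accumulated / len(damage_per_mission)
    return accumulated, (critical_damage - accumulated) / average

# Hypothetical cycle counts recorded during two missions.
missions = [
    {"low": 20_000, "medium": 2_000, "high": 100},
    {"low": 10_000, "medium": 1_000, "high": 300},
]
per_mission = [miners_damage(m, CYCLES_TO_FAILURE) for m in missions]
accumulated, remaining = remaining_missions(per_mission)
print(f"life consumed: {accumulated:.0%}, missions remaining: {remaining:.1f}")
```

Updating the estimate after every mission, as in [55], means an unusually severe mission shows up immediately as a jump in consumed life.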

Vichare et al. [1] outlined generic strategies for in situ load monitoring, including selecting appropriate parameters to monitor and designing an effective monitoring plan. Methods for processing the raw sensor data during in situ monitoring to reduce the memory requirements and power consumption of the monitoring device were presented. Approaches were also presented for embedding intelligent front-end data processing capabilities in monitoring systems to enable data reduction and simplification (without sacrificing relevant load information) prior to input in damage models for health assessment and prognostics.


To reduce on-board storage space and power consumption and to allow uninterrupted data collection over longer durations, Vichare et al. [57] suggested embedding data reduction and load parameter extraction algorithms into sensor modules. As shown in Figure 1.8, a time-load signal can be monitored in situ using sensors and further processed to extract cyclic range (Δs), cyclic mean load (S_mean), and rate of change of load (ds/dt) using embedded load extraction algorithms. The extracted load parameters can be stored in appropriately binned histograms to achieve further data reduction. Downloaded binned data can be used to estimate the distributions of the load parameters. The usage history is used for damage accumulation and remaining-life prediction.
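As a minimal sketch of the load parameter extraction and binning just described, the code below computes the cyclic range, cyclic mean, and ramp rate between successive turning points of a load history and stores the ranges in a fixed-width histogram. The function names, the turning-point data, and the bin width are assumptions for illustration; they do not reproduce the algorithms of [57].

```python
from collections import Counter

def half_cycle_parameters(times, loads):
    """Return (range, mean, ramp rate) for each pair of successive turning points.

    `times` must be strictly increasing; `loads` holds the peak/valley values.
    """
    params = []
    points = list(zip(times, loads))
    for (t0, s0), (t1, s1) in zip(points, points[1:]):
        delta_s = abs(s1 - s0)         # cyclic range
        s_mean = (s0 + s1) / 2.0       # cyclic mean load
        ramp = (s1 - s0) / (t1 - t0)   # rate of change of load
        params.append((delta_s, s_mean, ramp))
    return params

def bin_values(values, bin_width):
    """Histogram as a dict mapping each lower bin edge to a count."""
    counts = Counter()
    for v in values:
        counts[int(v // bin_width) * bin_width] += 1
    return dict(counts)

# Hypothetical turning points: time in minutes, temperature in degrees C.
times = [0, 10, 25, 30, 50]
loads = [25.0, 65.0, 35.0, 70.0, 30.0]
params = half_cycle_parameters(times, loads)
range_histogram = bin_values([p[0] for p in params], bin_width=10)
print(params)
print(range_histogram)
```

Only the histogram counts need to be stored on the device, so the memory footprint stays fixed however long the monitoring period runs.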

Figure 1.8: Load feature extraction.

Efforts to monitor life-cycle load data on avionics modules can be found in time-stress measurement device (TSMD) studies. Over the years TSMD designs have been upgraded using advanced sensors, and miniaturized TSMDs are being developed with advances in microprocessor and nonvolatile memory technologies [58]. Searls et al. [59] undertook in situ temperature measurements in both notebook and desktop computers used in different parts of the world.

In terms of the commercial applications of this approach, IBM has installed temperature sensors on hard drives (Drive-TIP) [60] to mitigate risks due to severe temperature conditions, such as thermal tilt of the disk stack and actuator arm, off-track writing, data corruption on adjacent cylinders, and outgassing of lubricants on the spindle motor. The sensor is controlled using a dedicated algorithm to generate errors and control fan speeds.

Strategies for efficient in situ health monitoring of notebook computers were provided by Vichare et al. [61]. In this study, the authors monitored and statistically analyzed the temperatures inside a notebook computer, including those experienced during usage, storage, and transportation, and discussed the need to collect such data both to improve the thermal design of the product and to monitor prognostic health. The temperature data were processed using an ordered overall range (OOR) to convert an irregular time-temperature history into peaks and valleys and also to remove noise due to small cycles and sensor variations. A three-parameter rainflow algorithm was then used to process the OOR results to extract full and half cycles with cyclic range, mean, and ramp rates. The effects of power cycles, usage history, central processing unit (CPU) computing resources usage, and external thermal environment on peak transient thermal loads were characterized.
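The reduction of an irregular temperature history to peaks and valleys can be sketched with a simple threshold filter. This is a simplified illustration in the spirit of the OOR step, not the exact algorithm used in [61]: reversals no larger than a chosen fraction of the overall range are treated as noise and discarded. The function name, the `fraction` parameter, and the sample data are assumptions.

```python
def peaks_and_valleys(series, fraction):
    """Reduce a series to turning points, ignoring small reversals.

    A reversal is kept only if it exceeds `fraction` times the overall range
    (max minus min) of the series. The first sample is always kept as the
    starting point, and the last extreme reached is kept as the end point.
    """
    if not series:
        return []
    threshold = fraction * (max(series) - min(series))
    kept = [series[0]]
    direction = 0            # +1 rising, -1 falling, 0 not yet established
    candidate = series[0]    # current extreme in the direction of travel
    for x in series[1:]:
        if direction == 0:
            if abs(x - candidate) > threshold:
                direction = 1 if x > candidate else -1
                candidate = x
        elif direction == 1:
            if x > candidate:
                candidate = x               # still rising: extend the peak
            elif candidate - x > threshold:
                kept.append(candidate)      # peak confirmed by a large drop
                direction, candidate = -1, x
        else:
            if x < candidate:
                candidate = x               # still falling: extend the valley
            elif x - candidate > threshold:
                kept.append(candidate)      # valley confirmed by a large rise
                direction, candidate = 1, x
    if direction != 0:
        kept.append(candidate)
    return kept

# Hypothetical temperature samples in degrees C; small wiggles are sensor noise.
temperatures = [20.0, 21.0, 20.5, 30.0, 29.5, 31.0, 25.0, 26.0, 24.0, 35.0]
turning_points = peaks_and_valleys(temperatures, fraction=0.2)
print(turning_points)
```

The resulting sequence of turning points is the kind of input a rainflow cycle-counting algorithm expects.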


In 2001, the European Union funded a four-year project, “Environmental Life-Cycle Information Management and Acquisition” (ELIMA), which aimed to develop ways to manage the life cycles of products [62]. The objective of this work was to predict the remaining lifetime of parts removed from products, based on dynamic data, such as operation time, temperature, and power consumption. As a case study, the member companies monitored the application conditions of a game console and a household refrigerator. The work concluded that, in general, it was essential to consider the environments associated with all life intervals of the equipment. These included not only the operational and maintenance environments but also the preoperational environments, when stresses may be imposed on the parts during manufacturing, assembly, inspection, testing, shipping, and installation. Such stresses are often overlooked but can have a significant impact on the eventual reliability of equipment.

Skormin et al. [63] developed a data-mining model for failure prognostics of avionics units. The model provided a means of clustering data on parameters measured during operation, such as vibration, temperature, power supply, functional overload, and air pressure. These parameters are monitored in situ in flight using time-stress measurement devices. Unlike the physics-based assessments made by Ramakrishnan and Pecht [52], the data-mining model relies on statistical data of exposures to environmental factors and operational conditions.

Tuchband et al. [64] presented the use of prognostics for military line replaceable units (LRUs) based on their life-cycle loads. The study was part of an effort funded by the Office of the Secretary of Defense to develop an interactive supply chain system for the U.S. military. The objective was to integrate prognostics, wireless communication, and databases through a web portal to enable cost-effective maintenance and replacement of electronics. The study showed that prognostics-based maintenance scheduling could be implemented in military electronic systems. The approach involves an integration of embedded sensors on the LRU, wireless communication for data transmission, a PoF-based algorithm for data simplification and damage estimation, and a method for uploading this information to the Internet. Finally, the use of prognostics for electronic military systems enabled failure avoidance, high availability, and reduction of life-cycle costs.

The PoF models can be used to calculate the remaining useful life, but it is necessary to identify the uncertainties in the prognostic approach and assess the impact of these uncertainties on the remaining-life distribution in order to make risk-informed decisions. With uncertainty analysis, a prediction can be expressed as a failure probability. Gu et al. [65] implemented the uncertainty analysis of prognostics for electronics under vibration loading. They identified the uncertainty sources and categorized them into four different types: measurement uncertainty, parameter uncertainty, failure criteria uncertainty, and future usage uncertainty (see Figure 1.9). Gu et al. [65] utilized a sensitivity analysis to identify the dominant input variables that influence the model output. With information on the distributions of the input parameters, a Monte Carlo simulation was used to provide a distribution of accumulated damage. From the accumulated damage distributions, the remaining life was then predicted with confidence intervals and confidence limits (CL). A case study of an electronic board under vibration loading was also presented, with a step-by-step demonstration of the uncertainty analysis implementation. The results showed that the experimentally measured failure time was within the bounds of the uncertainty analysis prediction.
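The Monte Carlo step of such an uncertainty analysis can be illustrated with a minimal sketch. One random draw is made for each of the four uncertainty types, the draws are propagated through a simple linear damage model, and the remaining life is reported as a median with 5% and 95% bounds. The distributions, their parameters, and the damage model are assumptions chosen for illustration; they are not the models or data of Gu et al. [65].

```python
import math
import random

def simulate_remaining_missions(missions_flown=10, n_trials=5000, seed=0):
    """Return sorted Monte Carlo samples of remaining missions (hypothetical model)."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_trials):
        # Measurement uncertainty: cycles counted per mission.
        cycles_per_mission = rng.gauss(1000.0, 50.0)
        # Parameter uncertainty: cycles to failure from the fatigue model.
        cycles_to_failure = rng.lognormvariate(math.log(1.0e5), 0.2)
        # Failure criteria uncertainty: damage level at which failure occurs.
        critical_damage = rng.uniform(0.9, 1.1)
        # Future usage uncertainty: future severity relative to past usage.
        usage_factor = rng.uniform(0.8, 1.2)

        damage_per_mission = cycles_per_mission / cycles_to_failure
        consumed = missions_flown * damage_per_mission
        samples.append((critical_damage - consumed) / (damage_per_mission * usage_factor))
    samples.sort()
    return samples

def percentile(sorted_values, fraction):
    """Value at the given fraction (0 to 1) of a sorted list."""
    return sorted_values[int(fraction * (len(sorted_values) - 1))]

lives = simulate_remaining_missions()
lower, median, upper = (percentile(lives, f) for f in (0.05, 0.50, 0.95))
print(f"remaining missions: median {median:.0f}, 90% interval {lower:.0f} to {upper:.0f}")
```

Fixing the seed makes the run repeatable; a sensitivity analysis of the kind described above would indicate which of the four inputs deserves a more carefully fitted distribution.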


Figure 1.9: Uncertainty implementation for prognostics.

1.4

Implementation of PHM in a System of Systems

System of systems is the term used to describe a complex system comprising many different subsystems that may be structurally or functionally connected. These different subsystems might themselves be made up of different subsystems. In a system of systems many independent subsystems are integrated such that the individual functions of the subsystems are combined to achieve a capability/function beyond the capability of the individual subsystems. For example, a military aircraft is made up of subsystems, including airframe, body, engines, landing gear, wheels, weapons, radar, and avionics. Avionic subsystems could include the communication navigation and identification (CNI) system, GPS, inertial navigation system (INS), identification friend or foe (IFF) system, landing aids, and voice and data communication systems.

Implementing an effective PHM strategy for a complete system of systems requires integrating different prognostic and health monitoring approaches. Because the systems are so complex, the first step in implementation of prognostics is to determine the weak link(s) in the system. One of the ways to achieve this is by conducting an FMMEA for the product. Once the potential failure modes, mechanisms, and effects have been identified, a combination of canaries, precursor reasoning, and life-cycle damage modeling may be implemented for different subsystems of the product, depending on their failure attributes. Once the monitoring techniques have been decided, the next step is to analyze the data. Different data analysis approaches like data-driven models, PoF-based models, or hybrid data analysis models can be used to analyze the same recorded data. For example, operational loads of computer system electronics such as temperature, voltage, current, and acceleration can be used with PoF-damage models to calculate the susceptibility to electromigration between metallization and thermal fatigue of interconnects, plated-through holes, and die attach.
Also, data about the CPU usage, current, and CPU temperature, for example, can be used to build a statistical model that is based on the correlations between these parameters. This data-driven model can be appropriately trained to detect thermal anomalies and identify signs of certain transistor degradation.

Implementation of prognostics for a system of systems is complicated and in the very initial stages of research and development. But there has been tremendous development in certain areas related to PHM. Advances in sensors, microprocessors, compact nonvolatile memory, battery technologies, and wireless telemetry have already enabled the implementation of sensor modules and autonomous data loggers. Integrated, miniaturized, low-power, reliable sensor systems operated using portable power supplies (such as batteries) are being developed. These sensor systems have a self-contained architecture requiring minimum or no intrusion into the host product, in addition to specialized sensors for monitoring localized parameters. Sensors with embedded algorithms will enable fault detection, diagnostics, and remaining-life prognostics, which will ultimately drive the supply chain. The prognostic information will be linked via wireless communications to relay needs to maintenance officers. Automatic identification techniques such as radio frequency identification (RFID) will be used to locate parts in the supply chain, all integrated through a secure web portal to acquire and deliver replacement parts quickly on an as-needed basis.

Research is being conducted in the field of algorithm development to analyze, trend, and isolate large-scale multivariate data. Methods like projection pursuit using principal component analysis and support vector machines, Mahalanobis distance analysis, symbolic time-series analysis, neural networks analysis, and Bayesian networks analysis can be used to process multivariate data. Even though there are advances in certain areas related to prognostics, many challenges still remain.
The key issues with regard to implementing PHM for a system of systems include decisions of which systems within the system of systems to monitor, which system parameters to monitor, selection of sensors to monitor parameters, power supply for sensors, on-board memory for storage of sensed data, in situ data acquisition, and feature extraction from the collected data. It is also a challenge to understand how failures in one system affect another system within the system of systems and how they affect the functioning of the overall system of systems. Getting information from one system to the other could be hard, especially when the systems are made by different vendors. Other issues to be considered before implementation of PHM for a system of systems are the economic impact due to such a program, contribution of PHM implementation to condition-based maintenance, and logistics. The elements necessary for a PHM application are available, but the integration of these components to achieve the prognostics for a system of systems is still in the works. In the future, electronic system designs will integrate sensing and processing modules that will enable in situ PHM. A combination of different PHM implementations for different subsystems of a system of systems will be the norm for the industry.
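The data-driven use of correlated parameters such as CPU usage and CPU temperature, and the Mahalanobis distance analysis named among the multivariate methods above, can be illustrated together in a minimal sketch. A healthy baseline is summarized by its mean and covariance, and a new observation is scored by its Mahalanobis distance from that baseline. The data, the two-parameter restriction, and the alarm threshold are assumptions for illustration only.

```python
import math

def fit_healthy_model(points):
    """Mean and sample covariance of two-parameter healthy data."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)

def mahalanobis(point, center, cov):
    """Mahalanobis distance using the closed-form inverse of a 2x2 covariance."""
    sxx, sxy, syy = cov
    dx = point[0] - center[0]
    dy = point[1] - center[1]
    det = sxx * syy - sxy * sxy
    d_squared = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(d_squared)

# Hypothetical healthy baseline: (CPU usage in %, CPU temperature in degrees C).
healthy = [(10, 40), (20, 44), (30, 51), (40, 54),
           (50, 61), (60, 64), (70, 71), (80, 74)]
center, cov = fit_healthy_model(healthy)

THRESHOLD = 3.0  # hypothetical alarm level; would be set from baseline statistics
d_normal = mahalanobis((45, 57), center, cov)   # consistent with the baseline
d_anomaly = mahalanobis((20, 70), center, cov)  # hot at low usage
print(d_normal > THRESHOLD, d_anomaly > THRESHOLD)
```

A temperature of 70 degrees C falls within the range seen in the baseline, so a simple limit check would not flag it; the distance is large only because that temperature is inconsistent with a 20% CPU load.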

1.5

Summary

Due to the increasing amount of electronics in the world and the competitive drive toward more reliable products, PHM is being looked upon as a cost-effective solution for the reliability prediction of electronic products and systems. Approaches for implementing PHM in products and systems include (1) installing built-in structures (fuses and canaries) that will fail faster than the actual product when subjected to application conditions; (2) monitoring and reasoning of parameters (e.g., system characteristics, defects, performance) that are indicative of an impending failure; and (3) monitoring and modeling environmental and usage data that influence the system’s health and converting the measured data into life consumed. A combination of these approaches may be necessary to successfully assess the degradation of a product or system in real time and subsequently provide estimates of remaining useful life.

References

1. N. Vichare and M. Pecht, “Prognostics and Health Management of Electronics,” IEEE Transactions on Components and Packaging Technologies, Vol. 29, No. 1, pp. 222-229, March 2006.
2. N. Vichare, P. Rodgers, V. Eveloy, and M. Pecht, “Environment and Usage Monitoring of Electronic Products for Health Assessment and Product Design,” International Journal of Quality Technology and Quantitative Management, Vol. 4, No. 2, pp. 235-250, 2007.
3. U.S. Department of Defense (DoD), MIL-HDBK 217, Military Handbook for Reliability Prediction of Electronic Equipment, Version A, Washington, DC, 1965.
4. Telcordia Technologies, Special Report SR-332, “Reliability Prediction Procedure for Electronic Equipment,” Issue 1, Telcordia Customer Service, Piscataway, NJ, 2001.
5. W. Denson, “A Tutorial: PRISM,” RAC Journal, pp. 1-6, 1999.
6. FIDES Group, “FIDES Guide Issue A, Reliability Methodology for Electronic Systems,” 2004.
7. K. L. Wong, “What Is Wrong with the Existing Reliability Prediction Methods?” Quality and Reliability Engineering International, Vol. 6, pp. 251-258, 1990.
8. M. J. Cushing, D. E. Mortin, T. J. Stadterman, and A. Malhotra, “Comparison of Electronics-Reliability Assessment Approaches,” IEEE Transactions on Reliability, Vol. 42, No. 4, pp. 542-546, 1993.
9. M. Talmor and S. Arueti, “Reliability Prediction: The Turn-Over Point,” Proceedings of the Annual Reliability and Maintainability Symposium, pp. 254-262, 1997.
10. C. Leonard, “MIL-HDBK-217: It’s Time to Rethink It,” Electronic Design, pp. 79-82, 1991.
11. S. F. Morris, “Use and Application of MIL-HDBK-217,” Solid State Technology, pp. 65-69, 1990.
12. M. Pecht and F. Nash, “Predicting the Reliability of Electronic Equipment,” Proceedings of the IEEE, Vol. 82, No. 7, pp. 992-1004, 1994.
13. IEEE Standard 1413-1998, “IEEE Standard Methodology for Reliability Prediction and Assessment for Electronic Systems and Equipment,” IEEE, New York, December 1998.
14. IEEE Standard 1413.1-2002, “IEEE Guide for Selecting and Using Reliability Predictions Based on IEEE 1413,” IEEE, New York, February 2003.
15. M. Pecht and A. Dasgupta, “Physics-of-Failure: An Approach to Reliable Product Development,” Journal of the Institute of Environmental Sciences, Vol. 38, pp. 30-34, 1995.


16. M. Pecht, M. Dube, M. Natishan, and I. Knowles, “An Evaluation of Built-In Test,” IEEE Transactions on Aerospace and Electronic Systems, Vol. 37, No. 1, pp. 266-272, January 2001.
17. D. Johnson, “Review of Fault Management Techniques Used in Safety Critical Avionic Systems,” Progress in Aerospace Science, Vol. 32, No. 5, pp. 415-431, October 1996.
18. DoD 5000.2 Policy Document, Defense Acquisition Guidebook, Chapter 5.3 Performance Based Logistics, Department of Defense, Washington, DC, December 2004.
19. D. Cutter and O. Thompson, “Condition-Based Maintenance Plus Select Program Survey,” Report LG301T6, available: http://www.acq.osd.mil, accessed January 2005.
20. L. Kirkland, T. Pombo, K. Nelson, and F. Berghout, “Avionics Health Management: Searching for the Prognostics Grail,” Proceedings of the IEEE Aerospace Conference, Vol. 5, pp. 3448-3454, March 2004.
21. D. Korsmeyer, “Discovery and Systems Health,” National Aeronautics and Space Administration, available: http://ti.arc.nasa.gov/tech/techarea.php?ta=4, accessed September 2007.
22. C. Rose, A. Beiter, and K. Ishii, “Determining End-Of-Life Strategies as a Part of Product Definition,” paper presented at 1999 IEEE International Symposium on Electronics and the Environment, Piscataway, NJ, pp. 219-224, 1999.
23. P. Sandborn and C. Murphy, “A Model for Optimizing the Assembly and Disassembly of Electronic Systems,” IEEE Transactions on Electronics Packaging Manufacturing, Vol. 22, No. 2, pp. 105-117, April 1999.
24. L. Scheidt and S. Zong, “An Approach to Achieve Reusability of Electronic Modules,” paper presented at IEEE International Symposium on Electronics and the Environment, New York, pp. 331-336, 1994.
25. M. Klausner, W. Grimm, C. Hendrickson, and A. Horvath, “Sensor-Based Data Recording of Use Conditions for Product Take-Back,” paper presented at IEEE International Symposium on Electronics and the Environment, New York, pp. 138-143, 1998.
26. M. Klausner, W. Grimm, and C. Hendrickson, “Reuse of Electric Motors in Consumer Products,” Journal of Ecology, Vol. 2, No. 2, pp. 89-102, 1998.
27. M. Simon, B. Graham, P. Moore, P. JunSheng, and X. Changwen, “Life Cycle Data Acquisition Unit - Design, Implementation, Economics and Environmental Benefits,” paper presented at IEEE International Symposium on Electronics and the Environment, Piscataway, NJ, pp. 284-289, 2000.
28. A. Middendorf, H. Griese, H. Reichl, and W. Grimm, “Using Life-Cycle Information for Reliability Assessment of Electronic Assemblies,” paper presented at IEEE International Integrated Reliability Workshop, Final Report, Piscataway, NJ, pp. 176-179, 2002.
29. J. Gu and M. Pecht, “Prognostics and Health Management Using Physics-of-Failure,” paper presented at 54th Annual Reliability & Maintainability Symposium, Las Vegas, NV, 2008.


30. A. Ramakrishnan, T. Syrus, and M. Pecht, “Electronic Hardware Reliability,” Avionics Handbook, CRC Press, Boca Raton, FL, pp. 22.1-22.21, December 2000.
31. S. Mishra and M. Pecht, “In-situ Sensors for Product Reliability Monitoring,” Proceedings of the SPIE, Vol. 4755, pp. 10-19, 2002.
32. Ridgetop Semiconductor-Sentinel Silicon™ Library, “Hot Carrier (HC) Prognostic Cell,” August 2004.
33. D. Goodman, B. Vermeire, J. Ralston-Good, and R. Graves, “A Board-Level Prognostic Monitor for MOSFET TDDB,” IEEE Aerospace Conference, Big Sky, MT, 2006.
34. N. Anderson and R. Wilcoxon, “Framework for Prognostics of Electronic Systems,” Proceedings of International Military and Aerospace/Avionics COTS Conference, Seattle, WA, August 3-5, 2004.
35. S. Ganesan, V. Eveloy, D. Das, and M. Pecht, “Identification and Utilization of Failure Mechanisms to Enhance FMEA and FMECA,” Proceedings of the IEEE Workshop on Accelerated Stress Testing and Reliability (ASTR), Austin, TX, October 3-5, 2005.
36. M. Pecht, R. Radojcic, and G. Rao, Guidebook for Managing Silicon Chip Reliability, CRC Press, Boca Raton, FL, 1999.
37. P. Smith and D. Campbell, “Practical Implementation of BICS for Safety-Critical Applications,” Proceedings of IEEE International Workshop on Current and Defect Based Testing-DBT, pp. 51-56, April 2000.
38. I. Pecuh, M. Margala, and V. Stopjakova, “1.5 Volts Iddq/Iddt Current Monitor,” Proceedings of the IEEE Canadian Conference on Electrical and Computer Engineering, Vol. 1, pp. 472-476, May 1999.
39. B. Xue and D. Walker, “Built-In Current Sensor for IDDQ Test,” Proceedings of the IEEE International Workshop on Current and Defect Based Testing-DBT, pp. 3-9, April 2004.
40. R. Wright and L. Kirkland, “Nano-Scaled Electrical Sensor Devices for Integrated Circuit Diagnostics,” paper presented at IEEE Aerospace Conference, Vol. 6, pp. 2549-2555, March 8-15, 2003.
41. R. Wright, M. Zgol, D. Adebimpe, and L. Kirkland, “Functional Circuit Board Testing Using Nanoscale Sensors,” IEEE Systems Readiness Technology Conference, pp. 266-272, September 2003.
42. R. Wright, M. Zgol, S. Keeton, and L. Kirkland, “Nanotechnology-Based Molecular Test Equipment (MTE),” IEEE Aerospace and Electronic Systems Magazine, Vol. 16, No. 6, pp. 15-19, June 2001.
43. M. Kanniche and M. Mamat-Ibrahim, “Wavelet based Fuzzy Algorithm for Condition Monitoring of Voltage Source Inverters,” Electronic Letters, Vol. 40, No. 4, February 2004.
44. “Self-Monitoring Analysis and Reporting Technology (SMART),” PC Guide, available: http://www.pcguide.com/ref/hdd/perf/qual/featuresSMART-c.html, accessed August 30, 2005.


45. G. Hughes, J. Murray, K. Kreutz-Delgado, and C. Elkan, “Improved Disk-Drive Failure Warnings,” IEEE Transactions on Reliability, Vol. 51, No. 3, pp. 350-357, 2002.
46. K. Gross, “Continuous System Telemetry Harness,” Sun Labs Open House, available: http://research.sun.com/sunlabsday/docs.2004/talks/1.03_Gross.pdf, accessed August 2005.
47. K. Whisnant, K. Gross, and N. Lingurovska, “Proactive Fault Monitoring in Enterprise Servers,” paper presented at IEEE International Multi-conference in Computer Science and Computer Engineering, Las Vegas, NV, June 2005.
48. K. Mishra and K. Gross, “Dynamic Stimulation Tool for Improved Performance Modeling and Resource Provisioning of Enterprise Servers,” Proceedings of the 14th IEEE International Symposium on Software Reliability Engineering (ISSRE’03), Denver, CO, November 2003.
49. K. Cassidy, K. Gross, and A. Malekpour, “Advanced Pattern Recognition for Detection of Complex Software Aging Phenomena in Online Transaction Processing Servers,” Proceedings of the International Performance and Dependability Symposium, Washington, DC, June 23-26, 2002.
50. K. Vaidyanathan and K. Gross, “MSET Performance Optimization for Detection of Software Aging,” Proceedings of the 14th IEEE International Symposium on Software Reliability Engineering, Denver, CO, November 2003.
51. D. Brown, P. Kalgren, C. Byington, and R. Orsagh, “Electronic Prognostics - A Case Study Using Global Positioning System (GPS),” paper presented at IEEE Autotestcon, 2005.
52. A. Ramakrishnan and M. Pecht, “A Life Consumption Monitoring Methodology for Electronic Systems,” IEEE Transactions on Components and Packaging Technologies, Vol. 26, No. 3, pp. 625-634, 2003.
53. S. Mathew, D. Das, M. Osterman, M. Pecht, and R. Ferebee, “Prognostic Assessment of Aluminum Support Structure on a Printed Circuit Board,” ASME Journal of Electronic Packaging, Vol. 128, No. 4, pp. 339-345, December 2006.
54. V. Shetty, D. Das, M. Pecht, D. Hiemstra, and S. Martin, “Remaining Life Assessment of Shuttle Remote Manipulator System End Effector,” Proceedings of the 22nd Space Simulation Conference, Ellicott City, MD, October 21-23, 2002.
55. J. Gu, D. Barker, and M. Pecht, “Prognostics Implementation of Electronics under Vibration Loading,” Microelectronics Reliability, Vol. 47, No. 12, pp. 1849-1856, December 2007.
56. S. Mishra, M. Pecht, T. Smith, I. McNee, and R. Harris, “Remaining Life Prediction of Electronic Products Using Life Consumption Monitoring Approach,” paper presented at European Microelectronics Packaging and Interconnection Symposium, Cracow, pp. 136-142, June 16-18, 2002.
57. N. Vichare, P. Rodgers, and M. Pecht, “Methods for Binning and Density Estimation of Load Parameters for Prognostics and Health Management,” International Journal of Performability Engineering, Vol. 2, No. 2, pp. 149-161, April 2006.


58. V. Rouet and B. Foucher, “Development and Use of a Miniaturized Health Monitoring Device,” Proceedings of the IEEE International Reliability Physics Symposium, pp. 645-646, 2004.
59. D. Searls, T. Dishongh, and P. Dujari, “A Strategy for Enabling Data Driven Product Decisions through a Comprehensive Understanding of the Usage Environment,” Proceedings of IPACK’01, Kauai, HI, July 8-13, 2001.
60. G. Herbst, “IBM’s Drive Temperature Indicator Processor (Drive-TIP) Helps Ensure High Drive Reliability,” IBM White Paper, available: http://www.hc.kz/pdf/drivetemp.pdf, September 2005.
61. N. Vichare, P. Rodgers, V. Eveloy, and M. Pecht, “In-Situ Temperature Measurement of a Notebook Computer - A Case Study in Health and Usage Monitoring of Electronics,” IEEE Transactions on Device and Materials Reliability, Vol. 4, No. 4, pp. 658-663, December 2004.
62. K. Bodenhoefer, “Environmental Life Cycle Information Management and Acquisition - First Experiences and Results from Field Trials,” Proceedings of Electronics Goes Green 2004+, Berlin, pp. 541-546, September 2004.
63. V. Skormin, V. Gorodetski, and L. Popyack, “Data Mining Technology for Failure Prognostic of Avionics,” IEEE Transactions on Aerospace and Electronic Systems, Vol. 38, No. 2, pp. 388-403, April 2002.
64. B. Tuchband and M. Pecht, “The Use of Prognostics in Military Electronic Systems,” Proceedings of the 32nd GOMACTech Conference, Lake Buena Vista, FL, pp. 157-160, March 19-22, 2007.
65. J. Gu, D. Barker, and M. Pecht, “Uncertainty Assessment of Prognostics Implementation of Electronics under Vibration Loading,” paper presented at 2007 AAAI Fall Symposium on Artificial Intelligence for Prognostics, Arlington, VA, November 9-11, 2007.

Chapter 2

Sensor Systems for PHM

Data collection is an essential part of PHM and often requires the use of sensor systems to measure environmental and operational parameters. This chapter introduces the common sensors and their sensing principles. The required attributes of sensor systems for PHM implementation are then discussed and some state-of-the-art PHM sensor systems are described. Finally, emerging trends in sensor system technologies are presented.

There are several available methods for PHM implementation in electronic products and systems, including monitoring and analysis of parameters that are precursors to impending failure, such as shifts in performance parameters, and utilization of exposure conditions (e.g., usage, temperature, vibration, radiation) combined with PoF models to compute accumulated damage and assess the remaining life [1]. In these approaches, the monitoring of parameters (conditions) is a fundamental step. In order to accurately assess the health and predict the remaining life of the product, monitoring may be needed during all stages of the product life cycle, including manufacturing, shipment, storage, handling, and operation. Monitored parameters, known as measurands, may include temperature, vibration, shock, pressure, acoustic levels, strain, stress, voltage, current, humidity levels, contaminant concentration, usage frequency, usage severity, usage time, power, and heat dissipation. In each case, a variety of monitoring features may be required in order to obtain characteristics of these parameters, such as magnitude, variation, peak level, and rate of change. Sensor systems provide the means by which this information is acquired, processed, and stored.

2.1 Sensor and Sensing Principles

A sensor is defined as a device that provides a usable output signal in response to a specified measurand [2]. A sensor generally translates physical, chemical, or biological phenomena into electrical signals utilizing physical or chemical effects or through conversion of energy from one form into another. Widely used in both analog and digital instrumentation systems, sensors provide the interface between electronic circuits and the physical world. From the point of view of sensing (transduction) principles, sensors are classified into three major groups: physical, chemical, and biological. The physical principles or effects involved in detecting a measurand include thermal, electrical, mechanical, chemical, humidity, biological, optical (radiant), and magnetic. Examples of sensor signal parameters or measurands for PHM are listed in Table 2.1.

Prognostics and Health Management of Electronics. By Michael G. Pecht. Copyright © 2008 John Wiley & Sons, Inc.


Table 2.1: Examples of Sensor Measurands for PHM

Domain | Examples
Thermal | Temperature (ranges, cycles, gradients, ramp rates), heat flux, heat dissipation
Electrical | Voltage, current, resistance, inductance, capacitance, dielectric constant, charge, polarization, electric field, frequency, power, noise level, impedance
Mechanical | Length, area, volume, velocity or acceleration, mass flow, force, torque, stress, strain, density, stiffness, strength, direction, pressure, acoustic intensity or power, acoustic spectral distribution
Humidity | Relative humidity, absolute humidity
Biological | pH, concentration of biological molecule, microorganisms
Chemical | Chemical species, concentration, concentration gradient, reactivity, molecular weight
Optical (radiant) | Intensity, phase, wavelength, polarization, reflectance, transmittance, refractive index, distance, vibration, amplitude, frequency
Magnetic | Magnetic field, flux density, magnetic moment, permeability, direction, distance, position, flow

2.1.1 Thermal Sensors

The most widely used thermal sensors are resistance temperature detectors (RTDs), thermistors, thermocouples, and semiconductor junction diodes.

RTDs operate on the principle that the electrical resistance of the sensor (typically a metal) changes with temperature in a predictable, essentially linear, and repeatable manner (the resistance-temperature characteristic of the detecting element). Hence, the temperature of the sensing element can be determined by measuring its resistance change.

A thermistor is a thermally sensitive resistor that exhibits a change in resistance with a change in its body temperature. Thermistors exhibit a large change in resistance over a relatively small range of temperature. They are usually made of evaporated films, carbon or carbon compositions, or ceramic-like semiconductors formed from oxides of copper, cobalt, manganese, magnesium, nickel, or titanium. Unlike the basic RTD, thermistors can be molded or compressed into various shapes to fit a wide range of applications.

A thermocouple is any pair of electrically conducting and thermoelectrically dissimilar elements coupled at an interface. Its operation is based on the Seebeck effect (one of the three thermoelectric effects: Seebeck, Peltier, and Thomson), which is the generation of a thermo-emf (electromotive force) in an electric circuit composed of two heterogeneous conductors whose junctions are at different temperatures. Two different materials (usually metals) are joined at one point to form a thermocouple. A reference junction is held at a known temperature, such as the ice-water equilibrium point, and the voltage difference between the reference junction and the measurement junction is measured by a voltmeter.
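The essentially linear RTD characteristic described above can be illustrated with a minimal sketch. It assumes the first-order model R = R0(1 + αT); the default values (a 100 Ω element at 0 °C and α = 0.00385 per °C, typical of platinum elements) and the function name are illustrative assumptions rather than values given in this chapter.

```python
def rtd_temperature_c(resistance_ohm, r0_ohm=100.0, alpha_per_c=0.00385):
    """Invert the assumed linear model R = R0 * (1 + alpha * T) for T in deg C.

    r0_ohm and alpha_per_c are assumed, illustrative element parameters.
    """
    return (resistance_ohm / r0_ohm - 1.0) / alpha_per_c


if __name__ == "__main__":
    for r in (100.0, 119.25, 138.5):
        print(f"R = {r:7.2f} ohm -> T = {rtd_temperature_c(r):6.1f} C")
```

A real RTD deviates slightly from this straight line over wide temperature ranges, so a higher-order calibration curve may be needed in practice.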

2.1.2 Electrical Sensors

The signals generated by most sensors are electrical in form, and the output of the sensing circuit is often voltage or current. Some electrical parameters, such as resistance and capacitance, are also converted into voltage or current. This section focuses on measurements of voltage and current first and then examines power and frequency sensors.

There are four basic types of sensor that are commonly used in voltage measurements: inductive, thermal, capacitive, and Hall effect sensors. Inductive voltage sensors are based on the characteristics of magnetic fields. They obtain voltage data using tools such as voltage transformers, alternating current (AC) inductive coils, and measurements of eddy currents. Thermal voltage sensors are based on the thermal effects, such as the Joule effect, of a current flowing through a conductor. The voltage or current is measured by converting it into heat and then measuring the resulting temperature changes. The sensor's output is a function of the input voltage or current. Capacitive voltage sensors are based on the characteristics of electric fields. These sensors detect voltages by different methods, such as electrostatic force, the Josephson effect, and the change of refractive index of optical fibers.

Hall effect voltage sensors are semiconductor devices. They operate on the principle that the voltage difference across a thin conductor carrying a current depends on the intensity of the magnetic field applied perpendicular to the direction of the current flow. Electrons moving through a magnetic field experience a Lorentz force perpendicular to both the direction of motion and the direction of the field. The response of electrons to the Lorentz force creates a voltage known as the Hall voltage. Depending on the application, the Hall voltage can be detected with an instrumentation amplifier (for a DC excitation current) or a lock-in amplifier (for an AC excitation current).
Since the Hall voltage is proportional to the product of the excitation current and magnetic field, Hall effect sensors can also be used to sense current and magnetic field. Other magnetic field sensors besides Hall effect sensors can also be configured for current measurements. An example is the Rogowski coil, a solenoid air core winding of small cross-section looped around a conductor carrying the current. Since the voltage that is induced in the coil is proportional to the rate of change (derivative) of current in the straight conductor, the output of the Rogowski coil is usually connected to an electrical (or electronic) integrator circuit in order to provide an output signal that is proportional to current.

One of the simplest methods of current measurement is current-to-voltage conversion based on Ohm's law. This type of current measurement circuit employs a resistor, referred to as a shunt resistor, despite being connected in series with the load. The voltage drop across the shunt resistor can be detected by a variety of secondary meters, such as analog meters, digital meters, and oscilloscopes.

Electrical power is (for DC devices) the product of the current and the voltage. A typical power sensor includes a current-sensing circuit with voltage output and an analog multiplier. The high-side current sensor provides an output voltage proportional to load current, which is multiplied by the load voltage to obtain an output voltage proportional to load power.

Frequency is a measure of the number of occurrences of a repeating event per unit time. One method of measuring frequency is to use a frequency counter, which accumulates the number of events occurring within a specific period of time. Most general-purpose frequency counters include some form of amplifier as well as filtering and shaping circuitry at the input to make the signal suitable for counting.
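The counting principle of a frequency counter can be sketched in software. The example below is only an illustration: it treats each rising crossing of a threshold as one event and divides the count by the gate time. The function name, the threshold approach, and the sampled test signal are assumptions, not a description of any particular instrument.

```python
import math


def count_frequency(samples, sample_rate_hz, threshold=0.0):
    """Estimate frequency by counting rising threshold crossings per gate time."""
    edges = 0
    for prev, cur in zip(samples, samples[1:]):
        if prev < threshold <= cur:  # one rising edge counts as one event
            edges += 1
    gate_time_s = len(samples) / sample_rate_hz
    return edges / gate_time_s


if __name__ == "__main__":
    fs = 100_000  # assumed sampling rate in Hz
    # Assumed test input: a 1 kHz sine with an arbitrary phase offset, 0.1 s gate.
    signal = [math.sin(2 * math.pi * 1000 * n / fs + 0.5) for n in range(10_000)]
    print(f"estimated frequency: {count_frequency(signal, fs):.1f} Hz")
```

The resolution of this estimate is limited to one count per gate time, which is why longer gate times give finer frequency resolution.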
Another common approach, used to measure frequencies that are difficult to count directly by the previous method, is based on a stroboscopic effect. The source (such as a laser, a tuning fork, or a waveform generator) of a known reference frequency $f_0$ must be tunable or very close to the measured frequency $f$. Both the measured frequency and the reference frequency are simultaneously produced, and the interference between these signals generates beats, which are observed at a much lower frequency $\Delta f$. After measuring this lower frequency by counting, the unknown frequency is then found from $f = f_0 + \Delta f$.
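Returning to the current measurement methods earlier in this section, the integration step that follows a Rogowski coil can be sketched numerically. The sketch assumes the coil voltage obeys v = M di/dt, where the mutual inductance M is a hypothetical calibration constant (the text states only that the voltage is proportional to the derivative), and it replaces the analog integrator with a trapezoidal sum.

```python
import math


def rogowski_current(voltages, dt_s, mutual_inductance_h, initial_current_a=0.0):
    """Recover current from coil voltage samples, assuming v = M * di/dt.

    Uses trapezoidal integration; mutual_inductance_h is an assumed constant.
    """
    currents = [initial_current_a]
    for k in range(1, len(voltages)):
        area = 0.5 * (voltages[k - 1] + voltages[k]) * dt_s
        currents.append(currents[-1] + area / mutual_inductance_h)
    return currents


if __name__ == "__main__":
    m = 1e-6    # assumed mutual inductance in henries
    dt = 1e-5   # assumed sampling interval in seconds
    w = 2 * math.pi * 50
    # Synthetic coil voltage for a 10 A peak, 50 Hz sinusoidal current.
    v = [m * 10 * w * math.cos(w * k * dt) for k in range(2001)]
    i = rogowski_current(v, dt, m)
    print(f"peak recovered current: {max(i):.3f} A")
```

In a real system any offset in the coil voltage accumulates in the integral, so practical integrators usually include some form of drift compensation.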

2.1.3 Mechanical Sensors

Mechanical parameters may be converted to other energy domains and then sensed, or they may be measured directly. For direct sensing, the parameters are related to strain or displacement. The basic principles used to sense strain are piezoelectricity, piezoresistivity, and capacitive or inductive impedance.

Piezoelectricity is the ability of certain crystals and certain ceramic materials to generate a voltage in response to applied mechanical stress. In sensors, the piezoelectric effect is used to measure various forms of strain or stress. Examples are microphones for strains generated by acoustic pressure on a diaphragm; ultrasonic sensors for high-frequency strain waves arriving at or propagating through the sensors; and pressure sensors for AC pressures on a silicon diaphragm coated with piezoelectric materials. The piezoelectric effect can also be used to sense small displacements, bending, rotations, and so on. These measurements require a high-input-impedance amplifier to measure the surface charges or voltages generated by the strain or stress.

The piezoresistive effect in conductors and semiconductors is used for measuring strain in many commercial pressure sensors and strain gauges. The strain on the crystal structure deforms the energy band structure and, thus, changes the mobility and carrier density, which changes the resistivity or conductivity of the material. In contrast to the piezoelectric effect, the piezoresistive effect only causes a change in resistance; it does not produce electrical charges.

Capacitive or inductive impedances can also be used to measure displacements and strains. Capacitive devices integrate the change of elementary capacitive areas, while piezoresistive devices take the difference of the resistance changes of the bridge arms. Capacitive sensors require a capacitance-to-voltage (C-to-V) converter on or near the chip to avoid the effects of stray capacitances.
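The piezoresistive strain measurement described above can be illustrated with a short sketch. It assumes the common relation ΔR/R = GF · ε, where the gauge factor GF is a property of the gauge (a value of about 2 is typical for metal foil gauges and is used here only as an assumed default), and a quarter-bridge arrangement with one active arm. Neither the gauge factor nor the bridge configuration is specified in this chapter.

```python
def strain_from_resistance(r_unstrained_ohm, r_strained_ohm, gauge_factor=2.0):
    """Strain from the relative resistance change, assuming dR/R = GF * strain."""
    return (r_strained_ohm - r_unstrained_ohm) / r_unstrained_ohm / gauge_factor


def quarter_bridge_output_v(v_excitation, r_nominal_ohm, delta_r_ohm):
    """Output of a Wheatstone bridge with one active arm (assumed topology)."""
    active_divider = (r_nominal_ohm + delta_r_ohm) / (2 * r_nominal_ohm + delta_r_ohm)
    return v_excitation * (active_divider - 0.5)


if __name__ == "__main__":
    strain = strain_from_resistance(350.0, 350.7)
    v_out = quarter_bridge_output_v(5.0, 350.0, 0.7)
    print(f"strain = {strain * 1e6:.0f} microstrain, bridge output = {v_out * 1e3:.3f} mV")
```

The millivolt-level bridge output shows why strain gauge circuits normally need amplification before analog-to-digital conversion.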

2.1.4 Humidity Sensors

Humidity refers to the water vapor content in air or other gases. Humidity measurements can be stated in a variety of terms and units. The three commonly used terms are absolute humidity, dew point, and relative humidity (RH). Absolute humidity is the ratio of the mass of water vapor to the volume of air or gas. It is commonly expressed in grams per cubic meter or grains per cubic foot (1 grain = 1/7000 lb). Dew point, expressed in degrees Celsius or Fahrenheit, is the temperature at which the water vapor in a gas begins to condense into a liquid at a stated pressure (usually 1 atm). Relative humidity refers to the ratio (stated as a percent) of the moisture content of air to the saturated moisture level at the same temperature and pressure.

There are three common kinds of humidity sensor: capacitive, resistive, and thermal conductivity humidity sensors.

Capacitive RH sensors consist of a substrate on which a thin film of polymer or metal oxide is deposited between two conductive electrodes. The sensing surface is coated with a porous metal electrode to protect it from contamination and exposure to condensation. The substrate is typically glass, ceramic, or silicon. The change in the dielectric constant of a capacitive humidity sensor is nearly directly proportional to the RH of the surrounding environment. Capacitive sensors are characterized by a low temperature coefficient, the ability to function at high temperatures (up to 200°C), full recovery from condensation, and fairly good resistance to chemical vapors.

Resistive humidity sensors measure the change in electrical impedance of a hygroscopic medium such as a conductive polymer, salt, or treated substrate. The impedance typically has an inverse exponential relationship to humidity. The sensor absorbs the water vapor and ionic functional groups are dissociated, resulting in an increase in electrical conductivity.

Thermal conductivity humidity sensors (or absolute humidity sensors) consist of two matched negative temperature coefficient (NTC) thermistor elements in a bridge circuit; one is hermetically encapsulated in dry nitrogen and the other is exposed to the environment. When current is passed through the thermistors, the heat dissipated from the sealed thermistor is greater than that from the exposed thermistor because of the difference between the thermal conductivity of water vapor and that of dry nitrogen. Since the heat dissipated yields different operating temperatures, the difference in resistance of the thermistors is proportional to the absolute humidity.
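The humidity definitions at the start of this section can be made concrete with a small sketch. It computes RH as the ratio of actual to saturated moisture content and converts absolute humidity between the two units mentioned, using 1 grain = 1/7000 lb. The function names and the example inputs (a saturation value of roughly 17.3 g/m^3 near 20 °C) are illustrative assumptions.

```python
GRAINS_PER_POUND = 7000.0
GRAMS_PER_POUND = 453.59237
CUBIC_METERS_PER_CUBIC_FOOT = 0.028316846592


def relative_humidity_percent(actual_g_per_m3, saturation_g_per_m3):
    """RH as the ratio of actual to saturated moisture content, in percent."""
    return 100.0 * actual_g_per_m3 / saturation_g_per_m3


def g_per_m3_to_grains_per_ft3(absolute_humidity_g_per_m3):
    """Convert absolute humidity from g/m^3 to grains/ft^3."""
    grains_per_gram = GRAINS_PER_POUND / GRAMS_PER_POUND
    return absolute_humidity_g_per_m3 * grains_per_gram * CUBIC_METERS_PER_CUBIC_FOOT


if __name__ == "__main__":
    ah = 8.65    # assumed measured absolute humidity, g/m^3
    sat = 17.3   # assumed saturation value near 20 C, g/m^3
    print(f"RH = {relative_humidity_percent(ah, sat):.1f} %")
    print(f"AH = {g_per_m3_to_grains_per_ft3(ah):.2f} grains/ft^3")
```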

2.1.5 Biosensors

A biosensor is a device for the detection of an analyte that combines a biological component with a physicochemical detector component. It consists of three parts: the sensitive biological element, such as biological materials or biologically derived materials, a transducer, and the detector element. The sensing principles used in biosensors include optical, electrochemical, piezoelectric, thermometric, and magnetic. Optical biosensors, based on the phenomenon of surface plasmon resonance, make use of evanescent wave techniques. This utilizes the property that a thin layer of gold (or certain other materials) on a high-refractive-index glass surface can absorb laser light, producing electron waves (surface plasmons) on the gold surface. Electrochemical biosensors are normally based on enzymatic catalysis of a reaction that produces ions. The sensor substrate contains three electrodes: a reference electrode, an active electrode, and a sink electrode. The target analyte is involved in the reaction that takes place on the active electrode surface, and the ions produced create a potential which is subtracted from that of the reference electrode to give a signal. Piezoelectric sensors utilize crystals which undergo an elastic deformation when an electrical potential is applied to them. An alternating potential produces a standing wave in the crystal at a characteristic frequency. This frequency is highly dependent on the surface properties of the crystal, such that if a crystal is coated with a biological receptor element, the binding of a (large) target analyte to the receptor will produce a change in the resonance frequency, which represents a binding signal.

2.1.6 Chemical Sensors

Chemical sensors are intended to recognize the presence of specific substances and to determine their composition and concentrations. Chemical sensors are used in industry for process control and safety monitoring, such as in environmental protection, hazardous materials tracking, pollution monitoring, food safety, and medicine. They are also used around the home and for hobbies, for example in CO detectors, smoke alarms, and pH meters.

At a high level, chemical sensors may be classified into direct and indirect sensors. In direct sensors, a chemical reaction or the presence of a chemical produces a measured electrical output. One example is an electrochemical sensor. Indirect sensors rely on a secondary, indirect reading of the sensed stimulus; for example, thermochemical sensors rely on the heat generated in chemical reactions to sense the amount of particular substances. The principles used for chemical sensing are diverse. Table 2.2 lists the common chemical sensing principles.

Table 2.2: Chemical Sensor Principles

Classification | Sensor | Principle
Electrochemical sensors: exhibit changes in resistance (conductivity) or changes in capacitance (permittivity) due to substances or reactions | Metal-oxide sensor | Metal oxides at elevated temperature change their surface potential, and therefore their conductivity, in the presence of various reducible gases such as ethyl alcohol, methane, and many other gases.
Electrochemical sensors | Solid electrolyte sensor | A galvanic cell (battery cell) produces an emf across two electrodes based on the oxygen concentrations at the two electrodes under constant temperature and pressure.
Electrochemical sensors | Potentiometric sensor | Measures changes in voltage: electric potential develops at the surface of a solid material immersed in solution containing ions that exchange at the surface. The potential is proportional to the number or density of ions in the solution.
Electrochemical sensors | Conductometric sensor | Measures changes in conductance: adsorption of a gas onto the surface of a semiconducting oxide material can produce large changes in its electrical conductance.
Electrochemical sensors | Amperometric sensor | Measures changes in current: the current-solute concentration relationship is measured at a fixed electrode potential or overall cell voltage.
Thermochemical sensors: rely on the heat generated in chemical reactions to sense the amount of particular reactants | Thermistor-based chemical sensor | Senses small change in temperature due to a chemical reaction.
Thermochemical sensors | Calorimetric sensor | Measures the temperature change caused by the heat evolved during the catalytic oxidation of combustible gases. The temperature indicates the percentage of flammable gas in the environment.
Thermochemical sensors | Thermal conductivity sensor | Measures the thermal conductivity in air due to the presence of a sensed gas.
Optical sensors | Light sensor | Detects the transmission, reflection, and absorption (attenuation) of light in a medium; the velocity of light, and hence its wavelength, depend on the properties of the medium.
Mass sensors | Mass humidity sensor | Detects the change in the mass of a sensing element due to absorption of water.

2.1.7 Optical Sensors

Optical sensors include photoconductors, photoemissive devices, photovoltaic devices, and fiber optic sensors. A photoconductor is a device that changes electrical resistance when illuminated with light or radiation. The conductivity of photoconductors changes under the effect of radiation due to changes in the charge carrier population. Photoemissive devices are diodes that generate an output current that is proportional to the intensity of a light source that impinges on its surface. Photovoltaic devices consist of a p-n junction where radiation-generated carriers may cross the junction to form a self-generated voltage.


When strained, a fiber optic cable changes the intensity or the phase delay of the output optical wave relative to a reference. Using an optical detector and an interference-measuring technique, small strains can be measured with high sensitivity.

Fiber Bragg gratings (FBGs) can be used in optical fiber sensors to sense some measurands. An FBG is a type of distributed Bragg reflector constructed in a short segment of optical fiber that reflects particular wavelengths of light and transmits all others. The Bragg wavelength is sensitive to strain as well as temperature, so FBGs can be used to sense strain and temperature directly. They can also be used to convert the output of another sensor that generates a strain or temperature change from the measurand. For example, FBG gas sensors use an absorbent coating that expands in the presence of a gas, generating a strain that is measurable by the grating. FBGs are also finding uses in instrumentation applications such as seismology and as downhole sensors in oil and gas wells for measurement of the effects of external pressure, temperature, seismic vibrations, and inline flow.

Common examples of optical sensors include underwater acoustic sensors, fiber micro-bend sensors, evanescent or coupled waveguide sensors, moving fiber optic hydrophones, grating sensors, polarization sensors, and total internal reflection sensors. Optical interference sensors have been developed for interferometer acoustic sensors, fiber optic magnetic sensors (with magnetostrictive jackets), and fiber optic gyroscopes. Specially doped or coated optical fibers have been shown to have great versatility as physical sensors of various types and configurations. They have been used for radiation sensors, current sensors, accelerometers, temperature sensors, and chemical sensors.
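The strain sensitivity of an FBG can be illustrated with a minimal sketch. It assumes the standard Bragg condition λ_B = 2 n_eff Λ and, at constant temperature, the first-order relation Δλ/λ_B = (1 − p_e) ε. The effective photoelastic coefficient p_e ≈ 0.22 (a typical value for silica fiber) and all numerical inputs are assumptions for illustration and do not come from this chapter.

```python
def bragg_wavelength_nm(effective_index, grating_period_nm):
    """Bragg condition: reflected wavelength = 2 * n_eff * grating period."""
    return 2.0 * effective_index * grating_period_nm


def strain_from_wavelength_shift(shift_nm, bragg_nm, photoelastic_coeff=0.22):
    """Axial strain from the Bragg wavelength shift at constant temperature.

    photoelastic_coeff is an assumed typical value for silica fiber.
    """
    return shift_nm / (bragg_nm * (1.0 - photoelastic_coeff))


if __name__ == "__main__":
    lam = bragg_wavelength_nm(1.45, 535.0)
    eps = strain_from_wavelength_shift(1.2, lam)
    print(f"Bragg wavelength = {lam:.1f} nm, strain = {eps * 1e6:.0f} microstrain")
```

Because the Bragg wavelength also shifts with temperature, a practical measurement would need a temperature reference (for example, an unstrained grating) to separate the two effects.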

2.1.8 Magnetic Sensors

Magnetic sensors generally utilize: (a) the magneto-optic effect, which is any one of a number of phenomena in which an electromagnetic wave propagates through a medium that has been altered by the presence of a quasistatic magnetic field; (b) the magnetostrictive effect, where the imposed magnetic field causes strain on the material; (c) the galvanomagnetic effect, manifested as a Hall field and carrier deflection; or (d) magnetoresistance, which is the property of some materials to change the value of their electrical resistance when an external magnetic field is applied. The measurands most commonly sensed are position, motion, and flow. The sensing in these cases is contactless. Magnetic sensors mainly include Hall effect sensors, magnetoresistive sensors, magnetometers (fluxgate, search-coil, SQUID), magnetotransistors, magnetodiodes, and magneto-optic sensors.

The Hall effect sensor combines the Hall element and the associated electronics. The Hall element is constructed from a thin sheet of conductive material with output connections perpendicular to the direction of current flow. When subjected to a magnetic field, it responds with an output voltage proportional to the magnetic field strength. The voltage is so small that it requires additional electronics to amplify it to useful voltage levels.

The magnetoresistive sensor usually comes as four magnetically sensitive resistors in a Wheatstone bridge configuration, with each resistor arranged to maximize sensitivity and minimize temperature influences. In the presence of a magnetic field, the values of the resistors change, causing a bridge imbalance and generating an output voltage proportional to the magnetic field strength.

Magnetometers are devices that measure magnetic fields. The term can refer to very accurate sensors, to low-field sensors, or to complete systems for measuring the magnetic field that include one or more sensors.

Magnetodiode and magnetotransistor sensors are made from silicon substrates with undoped areas that contain the sensor between n-doped and p-doped regions forming pn, npn, or pnp junctions. Depending on its direction, an external magnetic field deflects electron flow between emitter and collector in favor of one of the collectors. The two collector voltages are sensed and related to the current or the applied magnetic field.

Highly sensitive magneto-optic sensors have been developed. These sensors are based on various technologies, such as fiber optics, polarization of light, the Moiré effect, and the Zeeman effect. These types of sensors lead to highly sensitive devices and are used in applications requiring high resolution, such as human brain function mapping and magnetic anomaly detection.
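The proportionality between Hall voltage and magnetic field noted for the Hall element can be written out with the textbook single-carrier expression V_H = I B / (n q t), where n is the carrier density and t is the sheet thickness. The sketch below uses that expression; the material parameters and function names are assumed, illustrative values and are not taken from this chapter.

```python
ELEMENTARY_CHARGE_C = 1.602176634e-19


def hall_voltage_v(current_a, field_t, carrier_density_per_m3, thickness_m):
    """Hall voltage of a thin sheet, assuming a single carrier type."""
    return current_a * field_t / (
        carrier_density_per_m3 * ELEMENTARY_CHARGE_C * thickness_m
    )


def field_from_hall_voltage_t(hall_v, current_a, carrier_density_per_m3, thickness_m):
    """Invert the same expression to recover the magnetic flux density."""
    return hall_v * carrier_density_per_m3 * ELEMENTARY_CHARGE_C * thickness_m / current_a


if __name__ == "__main__":
    # Assumed element: 1 mA excitation, 1e21 carriers/m^3, 1 micrometer thick.
    v = hall_voltage_v(1e-3, 0.1, 1e21, 1e-6)
    b = field_from_hall_voltage_t(v, 1e-3, 1e21, 1e-6)
    print(f"Hall voltage = {v:.4f} V, recovered field = {b:.3f} T")
```

The inverse dependence on carrier density is one reason semiconductors, rather than metals, are normally used for Hall elements.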

2.2 Sensor Systems for PHM

A PHM sensor system will typically have sensors, onboard analog-to-digital (A/D) converters, onboard memory, embedded computational capabilities, data transmission, and a power source or supply, as shown in Figure 2.1. Not every PHM sensor system will contain all of these elements, and not all sensor systems are suitable for the implementation of PHM.

Figure 2.1: Integrated sensor system for in situ environmental monitoring. [Block diagram: internal sensors; microprocessor (with analog-to-digital converter); memory (data storage, embedded software); internal power; external power; wired or wireless data transmission; external devices (e.g., PDAs, computers, cell phones).]

Figure 2.2 shows a general procedure for sensor system selection. The first step is to identify the application and the requirements for the sensor system. Then, sensor system candidates are identified and evaluated.

Figure 2.2: Sensor system selection procedure. [Flowchart: list the requirements for the sensor system; search the candidates of PHM sensor system; make trade-offs to select the optimal sensor system. Considerations: parameters to be monitored, requirements for physical characteristics of PHM sensor system, requirements for functional attributes, cost, reliability, availability.]


The requirements of a sensor system for PHM depend on the specific application, but there are some common considerations. These include the parameters to be measured, the performance needs of the sensor system, the electrical and physical attributes of the sensor system, reliability, cost, and availability. The user needs to prioritize the considerations. Trade-offs may be necessary to select the optimal sensor system for the specific application.
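One simple way to make the prioritization and trade-off step concrete is a weighted score over the considerations listed above. The sketch below is only one possible approach and is not a method prescribed in this chapter; the candidate names, the weights, and the 0-to-1 scores are all hypothetical.

```python
def rank_candidates(candidates, weights):
    """Rank sensor systems by weighted average score (higher is better).

    candidates: {name: {criterion: score in 0..1}}  (hypothetical data)
    weights:    {criterion: relative priority set by the user}
    """
    total_weight = sum(weights.values())
    ranked = []
    for name, scores in candidates.items():
        score = sum(weights[c] * scores[c] for c in weights) / total_weight
        ranked.append((name, score))
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked


if __name__ == "__main__":
    weights = {"performance": 3, "physical": 2, "reliability": 3,
               "cost": 1, "availability": 1}
    candidates = {
        "sensor_system_a": {"performance": 0.9, "physical": 0.4,
                            "reliability": 0.8, "cost": 0.3, "availability": 0.9},
        "sensor_system_b": {"performance": 0.6, "physical": 0.9,
                            "reliability": 0.9, "cost": 0.8, "availability": 0.7},
    }
    for name, score in rank_candidates(candidates, weights):
        print(f"{name}: {score:.2f}")
```

A score of this kind is only a screening aid; hard constraints, such as a maximum size or a required measurement range, would normally be applied before ranking.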

2.2.1 Parameters to Be Monitored

The parameters to be monitored in a PHM implementation can be selected based on their relationship to functions that are crucial for safety, that are likely to be implicated in catastrophic failures, that are essential for mission completeness, or that can result in long downtimes. Selection is also based on knowledge of the critical parameters established by past experience and field failure data on similar products and by qualification testing. More systematic methods, such as FMMEA, can be used to determine parameters that need to be monitored. The parameters used as precursors and the parameters monitored for stress and damage modeling in the life cycle of the product are discussed in the previous chapters of this book. These parameters can be measured by appropriate sensors.

PHM requires integration of many different parameters to assess the health state and predict the remaining life of a product. If an individual sensor system can monitor multiple parameters, it will simplify PHM. Sensing of multiple parameters refers to one sensor system that can measure multiple types of parameters such as temperature, humidity, vibration, and pressure. Structures that can realize multiple sensing include a sensor system that contains several different sensing elements internally; a sensor system that has flexible, add-on external ports that support various plug-in sensor nodes; and combinations of these structures. For these structures, some common components can be shared, such as the power supply, A/D converter, memory, and data transmission.

2.2.2 Sensor System Performance

The required performance of the sensor system should be considered during the analysis of the application. The relevant performance attributes include:

Accuracy: the closeness of agreement between the measurement and the true value of the measured quantity.
Sensitivity: the variation of output with respect to a certain variation in input (slope of the calibration curve).
Precision: the number of significant digits to which a measurand can be reliably measured.
Resolution: the minimal change of the input necessary to produce a detectable change at the output.
Measurement range: the maximum and minimum value of the measurand that can be measured.
Repeatability: closeness of the agreement between the results of successive measurements of the same measurand carried out under the same conditions of measurement.
Linearity: the closeness of the calibration curve to a straight line corresponding to the theoretical behavior.
Uncertainty: the range of values which contains the true value of the measured quantity.
Response time: the time a sensor takes to react to a given input.
Stabilization time: the time a sensor takes to reach a steady state output upon exposure to a stable input.
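Several of the attributes listed above can be estimated from calibration data. The sketch below shows one way this might be done: sensitivity as the least-squares slope of the calibration curve, linearity as the largest deviation from the fitted line, and repeatability as the sample standard deviation of repeated readings. These particular estimators, the function names, and the example data are assumptions chosen for illustration.

```python
import statistics


def fit_line(inputs, outputs):
    """Least-squares straight line; the slope is the sensitivity."""
    mean_x = statistics.mean(inputs)
    mean_y = statistics.mean(outputs)
    sxx = sum((x - mean_x) ** 2 for x in inputs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(inputs, outputs))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def max_nonlinearity(inputs, outputs):
    """Largest absolute deviation of the calibration points from the fitted line."""
    slope, intercept = fit_line(inputs, outputs)
    return max(abs(y - (slope * x + intercept)) for x, y in zip(inputs, outputs))


def repeatability(readings):
    """Sample standard deviation of repeated readings under the same conditions."""
    return statistics.stdev(readings)


if __name__ == "__main__":
    temps_c = [0.0, 25.0, 50.0, 75.0, 100.0]       # assumed reference inputs
    volts = [0.002, 0.251, 0.498, 0.752, 1.001]    # assumed sensor outputs
    slope, _ = fit_line(temps_c, volts)
    print(f"sensitivity = {slope * 1e3:.2f} mV/C")
    print(f"max nonlinearity = {max_nonlinearity(temps_c, volts) * 1e3:.2f} mV")
    print(f"repeatability = {repeatability([0.498, 0.499, 0.497, 0.498]) * 1e3:.2f} mV")
```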

2.2.3 Physical Attributes of Sensor Systems

The physical attributes of a sensor system include its size, weight, shape, packaging, and how the sensors are mounted to their environment. In some PHM applications, the size of the sensor may become the most significant selection criterion due to limitations of available space for attaching the sensor or due to the inaccessibility of locations to be sensed. Additionally, the weight of the sensor must be considered in certain PHM applications, such as for mobile products or for vibration and shock measurements using accelerometers, since the added mass can change the system response. If a fixture is required to mount the sensor to a piece of equipment, the added mass of the sensor and fixture may change the system characteristics. When selecting a sensor system, users should determine the available size and weight capacity that can be handled by the host environment and then consider the entire size and weight of the sensor system, which includes the battery and other accessories such as antennas or cables.

For some applications, one must also consider the shape of the sensor system, such as round, rectangular, or flat. Some applications also have requirements for the sensor packaging materials, such as metal or plastic, based on the application and the parameter to be sensed.

The method for attaching or mounting the sensor should also be considered based on the application. Mounting methods include using glue, adhesive tape, magnets, or screws (bolts) to fix the sensor system to the host. Sensor systems that are embedded in components, such as temperature sensors in ICs, can help to save space and to improve performance.
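The effect of added sensor mass on the system response can be illustrated with a simple estimate. The sketch assumes the monitored structure behaves as a single-degree-of-freedom spring-mass system with natural frequency f = (1/2π)·sqrt(k/m); this idealization and the numerical values are assumptions for illustration only.

```python
import math


def natural_frequency_hz(stiffness_n_per_m, mass_kg):
    """Natural frequency of an idealized single-degree-of-freedom system."""
    return math.sqrt(stiffness_n_per_m / mass_kg) / (2.0 * math.pi)


def frequency_shift_percent(stiffness_n_per_m, host_mass_kg, added_mass_kg):
    """Percentage drop in natural frequency caused by the sensor and fixture mass."""
    f_before = natural_frequency_hz(stiffness_n_per_m, host_mass_kg)
    f_after = natural_frequency_hz(stiffness_n_per_m, host_mass_kg + added_mass_kg)
    return 100.0 * (f_before - f_after) / f_before


if __name__ == "__main__":
    # Assumed host: 50 g effective mass, 2e5 N/m stiffness; 5 g accelerometer.
    print(f"unloaded: {natural_frequency_hz(2e5, 0.050):.1f} Hz")
    print(f"loaded:   {natural_frequency_hz(2e5, 0.055):.1f} Hz")
    print(f"shift:    {frequency_shift_percent(2e5, 0.050, 0.005):.1f} %")
```

In this idealized case a sensor weighing 10% of the host mass lowers the natural frequency by roughly 5%, which suggests why lightweight accelerometers are preferred on small assemblies.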

2.2.4 Functional Attributes of Sensor Systems

The electrical attributes of the sensor systems which should be considered include onboard power and power management ability; onboard memory and memory management ability and programmable sampling rate and modes; the rate, distance, and security of data transmission of the sensor system; and the onboard data processing capability. Each of these attributes will be discussed below. 2.2.4.1 Onboard Power and Power Management Power consumption is an essential characteristic of a sensor system that determines how long it can function without connection to an external source of power. It is therefore particularly relevant to wireless and mobile systems. In order to attain the required duration of operation in such applications, a sensor system must have sufficient power supply and the ability to manage the power consumption. Sensor systems can be divided into two main categories with respect to their power sources: non-battery-powered sensor systems and battery-powered sensor systems. Non-battery-powered sensor systems are typically either wired to an external AC power source or use power from an integrated host system. For example, temperature sensors are often integrated within the microprocessors on motherboards inside computers and utilize the computer’s power supply. Battery-powered sensor systems are equipped with an onboard battery. No interaction is required with the outside world, so they are able to function autonomously on a continuous basis. Replaceable or rechargeable batteries are preferable for battery-powered sensor systems. Batteries that are replaceable or rechargeable allow the sensor system to operate continuously, without needing to replace the entire system. Rechargeable lithium-ion batteries are commonly used in battery-powered sensor systems. In some situations, the battery must be sealed inside the sensor packaging or it is difficult to access the sensor system. 
The use of larger batteries or stand-by batteries may be required in such applications.

Power management is used to optimize the power consumption of the sensor system in order to extend its operating time. Power consumption varies for different operational modes of the system (e.g., active mode, idle mode, and sleep mode). The sensor is in the active mode when it is being used to monitor, record, transmit, or analyze the data. The power consumed for sensing varies depending on the parameter-sensing methods and sampling rate. Continuous sensing will consume more power, while periodic or event-triggered sensing can consume less power. A higher sampling rate will consume more power because the system senses and records data more frequently. Additionally, wireless data transmission and onboard signal processing will consume more power. In its idle state, a sensor system consumes much less power than during active mode. Sleep mode consumes the lowest power.

The tasks of power management are to track and model the incoming requests or signals in order to determine which parts of the sensor system to activate, when to switch between the active state and the idle state, how long the idle state will be maintained, when to switch to the sleep state, and when to wake up the system. For example, in continuous sensing, the sensing elements and memory are active, but if data transmission is not required, the data transmission circuit can be put into sleep mode. Power management will wake up the data transmission circuit when it receives a request.

2.2.4.2 Onboard Memory and Memory Management

Onboard memory is the memory contained within the sensor system. It can be used to store collected data as well as information pertaining to the sensor system (e.g., sensor identity, battery status), which enables it to be recognized and to communicate with other systems. Firmware (embedded algorithms) in memory provides operating instructions to the microprocessor and enables it to process the data in real time. Onboard memory allows much higher data sampling and save rates. If there is no onboard memory, the data must be transmitted. For sensor systems, common onboard memory types include EEPROM (electrically erasable programmable read-only memory) and NVRAM (non-volatile random access memory).
EEPROM is a user-modifiable ROM that can be erased and reprogrammed (written to) repeatedly. In sensor systems, EEPROM is often used to store the sensor information. NVRAM is the general name used to describe any type of random-access memory that does not lose its information when power is turned off. NVRAM is a subgroup of the more general class of nonvolatile memory types, the difference being that NVRAM devices offer random access, as opposed to sequential access like hard disks. The best-known form of NVRAM memory today is flash memory, which is found in a wide variety of consumer electronics, including memory cards, digital music players, digital cameras, and cell phones. In sensor systems, flash memory is used to record the collected data. Continued development of semiconductor manufacturing technology has allowed the capacity of flash memory to increase even as size and cost decrease.

Memory requirements are affected by the sensing modes and sampling rate. Sensor systems should allow the user to program the sampling rate and set the sensing mode (i.e., continuous, triggered, thresholds). These settings affect the amount of data stored into memory.

Memory management allows one to configure, allocate, monitor, and optimize the utilization of memory. For multiple-sensing sensor systems, the data format will often depend on the sensing variable. Memory management should be able to distinguish various data formats and save them into corresponding areas of the memory. For example, the sampling rate, the time stamp, and the data range of temperature data are different from those of vibration data. In memory, these different data may be stored separately based on algorithms that make them easy to identify. Memory management also should have the ability to show the usage status of the memory, such as the percentage of available memory, and give an indication when the memory is becoming full.
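As a rough illustration of how operating modes, sampling rate, and memory capacity interact, the following Python sketch estimates battery life and the time needed to fill onboard memory. It is a minimal sketch under stated assumptions: the function names, currents, battery capacity, and sample size are hypothetical values chosen for illustration and do not describe any particular sensor system.

```python
def average_current_ma(modes):
    """Time-weighted average current draw in mA.

    modes maps a mode name to (current_mA, fraction_of_time).
    The fractions are assumed to sum to 1.
    """
    total_fraction = sum(fraction for _, fraction in modes.values())
    if abs(total_fraction - 1.0) > 1e-9:
        raise ValueError("time fractions must sum to 1")
    return sum(current * fraction for current, fraction in modes.values())


def battery_life_hours(capacity_mah, modes):
    """Idealized battery life: capacity divided by average current."""
    return capacity_mah / average_current_ma(modes)


def memory_fill_hours(memory_bytes, sample_rate_hz, bytes_per_sample,
                      duty_fraction=1.0):
    """Hours until memory is full when raw samples are stored.

    duty_fraction is the fraction of time spent sampling
    (1.0 for continuous sampling, less for periodic sampling).
    """
    bytes_per_hour = sample_rate_hz * bytes_per_sample * duty_fraction * 3600
    return memory_bytes / bytes_per_hour


# Hypothetical operating profile: mostly asleep, occasionally active.
example_modes = {
    "active": (20.0, 0.10),
    "idle": (2.0, 0.20),
    "sleep": (0.05, 0.70),
}

if __name__ == "__main__":
    print(battery_life_hours(1000.0, example_modes))
    print(memory_fill_hours(1_000_000, 100, 2, duty_fraction=0.10))
```

The estimate ignores battery self-discharge, temperature effects, and file-system overhead, so it should be read as an upper bound for comparing settings rather than as a prediction.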
2.2.4.3 Programmable Sampling Mode and Sampling Rate

The sampling mode determines how the sensor monitors parameters and at what times it will actively sample the measurand. Commonly used sampling modes include continuous, periodic, and event-triggered sampling. The sampling rate defines the number of samples per second (or other unit) taken from a continuous signal to make a discrete signal. The combination of sampling mode and rate controls the sampling of the signal.

Programmable sampling modes and rates are preferred for PHM applications, since these features directly affect diagnostics, prognostics, power consumption, and memory requirements. For a fixed sampling rate, periodic and event-triggered sampling will consume less power and memory than continuous sampling. Under the same sampling mode, a low sampling rate consumes less power and memory than a high sampling rate. However, too low a sampling rate may lead to signal distortion and may reduce the likelihood of capturing intermittent or transient events needed for fault detection. Additionally, if the user wants to utilize a sensor to monitor, for example, vibration and temperature at the same time, the sensor system should allow the user to set the sampling mode and rate for these two different types of parameters individually.

2.2.4.4 Signal Processing Software

Signal processing consists of two parts: embedded processing, which is integrated into the onboard processor to enable immediate and localized processing of the raw sensor data, and processing conducted in the host computer. When selecting sensor systems, one should consider both of these functions.

Onboard processing can significantly reduce the number of data points and thus free up memory for more data storage. This in turn reduces the volume of data that must be transmitted to a base station or computer and hence results in lower power consumption. In the case of a large number of sensor systems working in a network, this would allow decentralization of computational power and facilitate efficient parallel processing of data.
Embedding computational power with onboard processors can also facilitate efficient data analysis for environmental monitoring applications. Embedded computations can be set to provide real-time updates for taking immediate action, such as powering off the equipment to avoid accidents or catastrophic failures, and to provide a prognostic horizon for planning future repair and maintenance activities. Currently, onboard signal processing includes feature extraction (e.g., the rainflow cycle counting algorithm), data compression, and fault recognition and prediction. Ideally, the onboard processor should display its calculation results, execute actions when a fault is detected, and be programmable.

The abilities of the onboard processor are limited by some physical constraints. One constraint is the available power. If processing requires extended calculation and high calculating speeds, it will consume much more power. The other constraint is onboard memory capacity. Running complex software requires a lot of memory. These two constraints make it challenging to embed complex algorithms into onboard processors. However, even using simple algorithms and routines to process the raw sensor data can achieve significant gains for in situ analysis.
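The data-reduction benefit of simple onboard routines can be sketched as follows. This Python example replaces each window of raw samples with four summary features; the choice of features (minimum, maximum, mean, RMS), the window size, and the function names are assumptions made for illustration, not a method prescribed by the text.

```python
import math


def summarize_window(samples):
    """Reduce one window of raw samples to a few summary features."""
    n = len(samples)
    mean = sum(samples) / n
    rms = math.sqrt(sum(x * x for x in samples) / n)
    return {"min": min(samples), "max": max(samples), "mean": mean, "rms": rms}


def reduce_stream(samples, window_size):
    """Split a sample list into consecutive windows and summarize each.

    The last window may be shorter than window_size.
    """
    return [
        summarize_window(samples[start:start + window_size])
        for start in range(0, len(samples), window_size)
    ]


if __name__ == "__main__":
    raw = [0.1 * i for i in range(1000)]      # hypothetical raw readings
    features = reduce_stream(raw, window_size=100)
    # 1000 raw values become 10 windows of 4 features each.
    print(len(raw), "->", len(features) * 4)
```

Storing or transmitting only the features trades detail for memory and power; whether that trade is acceptable depends on which events the PHM application needs to capture.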

2.2.4.5 Fast and Convenient Data Transmission

Once collected by the sensor system, data are typically transmitted to a base station or computer for postanalysis. In general, the methods for data transmission are either wireless or wired. Wireless monitoring has emerged as a promising technology that can impact PHM applications. Wireless transmission refers to the transmission of data over a distance without the use of a hard-wired connection. The distances involved may be short (a few meters, as in a television remote control) or very long (thousands or even millions of kilometers for radio communications). Wireless sensor nodes can be used to remotely monitor inhospitable and toxic environments. In some applications, sensors must operate remotely with data stored and downloaded by telemetry to a centrally located processing station. Also, wireless sensor systems are not dependent on extensive lengths of wires for the transfer of sensor measurement data, thus saving installation and maintenance costs. The advantage of wireless sensor nodes can be greatly enhanced by embedding microcontrollers with the sensor nodes to improve the data analysis capabilities within the wireless sensing nodes themselves.

Methods of wireless data transmission include Ethernet, cellular, radio frequency identification (RFID), vicinity cards [International Organization for Standardization (ISO) 15693], personal area network (IEEE 802.15), Wi-Fi (IEEE 802.11), and proprietary communications protocols. When selecting which type of wireless data technology to use for a particular application, one should consider the range of communication, power demand, ease of implementation, and data security.

RFID is an automatic identification method relying on storing and remotely retrieving data using devices called RFID tags or transponders. An RFID tag is an object that can be attached to or incorporated into a product, animal, or person for the purpose of identification using radio waves. An RFID sensor system combines the RFID tag with the sensing element. It uses sensing elements to detect and record temperature, humidity, movement, or even radiation data. It utilizes RFID to record and identify the sensor as well as to transfer the raw data or processed data. For example, the same tags used to track items, such as meat, moving through the supply chain may also alert staff if the meat is not stored at the right temperature, has gone bad, or has been injected with a biological agent.

The transfer range and speed of an RFID tag depend on many factors, such as the frequency of operation, the power of the reader, and interference from other RF devices. RFID tags and readers have to be tuned to a common frequency to communicate. RFID systems use many different frequencies, but the most common are low frequency (around 125 kHz), high frequency (13.56 MHz), and ultrahigh frequency, or UHF (860-960 MHz). Microwave (2.45 GHz) is also used in some applications.
Different frequencies have different characteristics that make them more useful for certain applications. For instance, low-frequency tags use less power and are better able to penetrate nonmetallic substances. They are ideal for scanning objects with high water content, such as fruit, but their read range is limited to less than a foot (0.33 m). High-frequency tags work better on objects made of metal. They have a maximum read range of about 3 ft (1 m). UHF frequencies typically offer better range (3-8 m) and can transfer data faster than low and high frequencies, but they use more power and are less likely to pass through materials. Because they tend to be more directed, they require a clear path between the tag and reader. UHF tags might be better for scanning boxes of goods as they pass through a dock door into a warehouse. If longer ranges are needed, such as for tracking railway cars, active tags can use batteries to boost read ranges to 300 ft (100 m) or more.

The security of wireless data transmission is another important factor to be considered. There are a great number of security risks associated with the current wireless protocols and encryption methods. For example, current RFID technology and its implementation have security weaknesses that could be exploited. RFID tags and reader/writers transmit identifying information via radio signals. Unlike bar coding systems, RFID devices can communicate without requiring a line of sight and over longer distances for faster batch processing of inventory. As RFID devices are deployed in more sophisticated applications, concerns have been raised about protecting such systems against eavesdropping and unauthorized uses [7]. One should evaluate the security strategy of the wireless sensor system or customize the security level to protect the data during transmission.

Currently, wired data transmission can offer high-speed transmission, but it is limited by the need for transmission wires.
Wireless transmission can offer very convenient data communication, eliminating the need for a wire, but the transmission rate is lower than that for wired transmission. This requires some trade-offs to be made for a given application. Many sensor systems transfer data from a sensor to a receiving device wirelessly and then transfer the data to a computer by wired connection through a universal serial bus (USB) port. This arrangement can represent a compromise that improves data throughput, power requirements, and cost.
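The RFID frequency trade-offs described above can be captured in a small lookup, sketched here in Python. The read ranges are the approximate figures quoted in the text (with 100 m used as a nominal figure for active tags, which the text says can reach 100 m or more); the names `RFID_BANDS` and `candidate_bands` are hypothetical, and real ranges depend on reader power, antenna design, and interference.

```python
# (band description, approximate maximum read range in meters)
RFID_BANDS = [
    ("LF (around 125 kHz)", 0.33),
    ("HF (13.56 MHz)", 1.0),
    ("UHF (860-960 MHz)", 8.0),
    ("active tag (battery-assisted)", 100.0),
]


def candidate_bands(required_range_m):
    """Return the bands whose approximate read range covers the requirement."""
    return [
        name for name, max_range_m in RFID_BANDS
        if max_range_m >= required_range_m
    ]


if __name__ == "__main__":
    for distance in (0.2, 5.0, 50.0):
        print(distance, "m:", candidate_bands(distance))
```

Range is only the first filter; power demand, penetration of the monitored material, and data security still have to be weighed for the remaining candidates.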


2.2.5 Cost

The selection of the proper sensor system for a given PHM application must include an evaluation of the costs. The cost evaluation should address its total cost of ownership, including the purchase, maintenance, and replacement of sensor systems. In fact, initial purchase costs can be less than 20% of the product’s lifetime costs. Consider the experience of an airline that went with “an affordable” choice only to find out 15 months later that the sensors were surviving for only 12 months on average and needed to be replaced annually. The replacement sensor system selected did cost 20% more but was available off-the-shelf and was previously qualified for aircraft use [4].
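The total-cost-of-ownership argument can be made concrete with a short Python sketch. The prices, sensor lifetimes, and service period below are hypothetical (the source does not state the lifetime of the replacement sensor), and the function name is illustrative only.

```python
import math


def total_cost_of_ownership(purchase_price, annual_maintenance,
                            sensor_life_years, service_years):
    """Purchase cost of every unit needed plus maintenance over the period.

    Assumes a failed sensor is replaced by an identical unit at the
    same price and that maintenance cost is constant per year.
    """
    units_needed = math.ceil(service_years / sensor_life_years)
    return units_needed * purchase_price + annual_maintenance * service_years


if __name__ == "__main__":
    # Hypothetical comparison over a 10-year service period.
    cheap = total_cost_of_ownership(100.0, 10.0, 1, 10)    # replaced annually
    pricier = total_cost_of_ownership(120.0, 10.0, 5, 10)  # 20% higher price
    print(cheap, pricier)
```

Under these assumed numbers the sensor with the higher purchase price is far cheaper to own, which is the pattern the airline example describes.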

2.2.6 Reliability

Sensor systems for PHM should be reliable. Sensor systems are generally limited to some degree by noise and the surrounding environment, which vary with operating and environmental conditions. To reduce the risk of sensor system failure, the user must consider the sensor’s environmental and operating range and determine whether it is suitable for the particular application. The packaging of the sensor system should also be considered, as it can shield the unit from unwanted effects such as humidity, sand, aggressive chemicals, mechanical forces, and other environmental conditions [5].

Sensor validation is used to assess the integrity of the sensor system and adjust or correct it as appropriate. This functionality checks the sensor performance and ensures that the sensor system is working correctly by detecting and eliminating the influence of systematic errors. Self-diagnostics, self-calibration, and sensor fusion are a few methods that can be applied to achieve this functionality.

Another strategy to improve the reliability of sensor systems is to use multiple sensors (redundancy) to monitor the same product or system. By using multiple sensor systems, the risk of losing data due to sensor system failure is reduced.

While it is essential to consider the reliability of sensor systems, it is equally necessary to consider the effects of the sensor system on the reliability of the product it is intended to monitor. Over time, heavy sensor systems attached to the surface of a circuit board may reduce the reliability of the board. In addition, the method of attachment (soldering, glue, screws) can reduce the reliability of the product if the attachment material is incompatible with the product’s materials of construction.
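The effect of redundancy on the risk of data loss can be sketched with a simple parallel-reliability calculation in Python. The sketch assumes that sensor failures are statistically independent and that data are retained as long as at least one sensor survives; both are simplifying assumptions, and the function name is hypothetical.

```python
def redundant_reliability(single_reliability, n_sensors):
    """Probability that at least one of n independent sensors survives.

    single_reliability is the probability that one sensor survives the
    mission; failures are assumed to be independent.
    """
    return 1.0 - (1.0 - single_reliability) ** n_sensors


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, redundant_reliability(0.9, n))
```

Sensors that share a power source, mounting, or environment can fail together, in which case the actual gain from redundancy is smaller than this estimate.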

2.2.7 Availability

The selected sensor systems should be available. Generally, two aspects should be considered when determining the availability. First, a user should determine whether the sensor system is commercially available. This means that the sensor system has been moved from its development phase into production and is being sold on the market. There are many sensor systems which are advertised and promoted in publications and websites which are not commercially available. These sensor systems are generally prototypes and are not available for purchase on the open market. Second, a user should look at the supplier of the sensor system. Depending on the particular needs and application, a user may be required to select a sensor system from a domestic supplier due to security reasons. This information is typically not found in product datasheets but can be verified through communications with the supplier.

2.3 Sensor Selection

For a specific PHM application, the user may need to consider some or all of the factors described above. Table 2.3 offers a checklist of the considerations that may enter into the selection process for a sensor system. In the next section, some current sensor systems are surveyed to identify the state of the art and the availability of sensor systems for PHM. Using the selection method in this chapter, one can select proper sensor systems for the actual PHM application.

Table 2.3: Considerations for Sensor Selection

Performance requirements
    Sensing parameters, measurement range, sensitivity, precision, resolution, sampling rate, linearity, response time, stabilization time
    How many sensor systems are needed
    Which parameters can be monitored by one sensor system

Requirements for functional attributes of sensor system
    Power
        Expected power: ... other power sources, such as solar power. If using a battery as power, specify the requirements, e.g., rechargeable, lithium.
        Power management needed: yes/no
        Kind of management: e.g., programmable sampling modes and rate
    Memory
        Memory capacity
        Memory management needed: yes/no
        Kind of management: e.g., programmable sampling modes and rate
    Sampling
        Sampling modes: Active/passive? Auto on/off? Programmable (continuous, periodic, triggered by events)?
        Nyquist criteria
    Data transmission
        Active/passive transmission?
        Wireless or wired transmission
        Transmission range
        Protocols
        Transmission rate
        Security strategy
        Type of wired transmission: e.g., USB, serial port, or other method of connecting with host computer
        Type of device the sensor system can communicate with: e.g., PDAs, cell phones, computers
    Data processing
        Types of processes needed: e.g., FFT, data reduction, additional analysis functions
        Types of processes provided by host software: e.g., signal processing tools, regression, other prediction models

Requirements for physical characteristics of sensor system
    Size: with battery, without battery
    Weight: with battery, without battery
    Shape: round, rectangular, flat
    Package: plastic, metal
    Attachment: screws, glue, tape, magnet

Constraints
    Ambient: temperature, humidity, radiation, gas, dust, chemical
    Operational: input limits of signal (loading)
    Other

Reliability
    Does the sensor system have the functionality to check its own performance and ensure that it is working correctly?
    Need redundancy in sensor system?

Cost
    Including purchase, maintenance, and replacement of sensor system

Availability
    Is sensor system commercially available?

2.4 Examples of Sensor Systems for PHM Implementation

A survey was conducted to determine the commercial availability of sensor systems that can be used in PHM for electronic products and systems. The search only included commercially available sensor systems having features desired for PHM. The survey results (see Table 2.4) show the characteristics of 16 sensor systems from 10 manufacturers. The sensor system characteristics include the sensing parameters, power supply and power management ability, sampling rate, onboard memory, data transmission method, availability of embedded signal processing software, size, weight, and cost. The data for each sensor system were collected from the manufacturer’s website and product datasheets, e-mails, and evaluations of demo products.

Key findings from the survey are that state-of-the-art prognostic sensor systems:

- Can autonomously perform multiple functions using their own power management, data storage, signal processing, and wireless data transmission.
- Have multiple, flexible, or add-on sensor ports that support various sensor nodes to monitor various parameters such as temperature, humidity, vibration, and pressure.
- Have onboard power supplies, such as rechargeable or replaceable batteries.
- Have onboard power management, allowing control of operation modes (active, idle, and sleep) and programmable sampling modes (continuous, triggered, or threshold) and rate. These management strategies, combined with novel battery technologies and low-power-consumption circuitry, enable the sensor system to operate longer.
- Have diverse onboard data storage capacity (flash memory), from several kilobytes to hundreds of megabytes.
- Have embedded signal processing algorithms, which enable data compression or simplification prior to data transfer.

Table 2.4: Characteristics of Sensor Systems Identified

2.5 Emerging Trends in Sensor Technology for PHM

In general, sensor technology is headed toward extreme miniaturization, wireless networks, ultra-low-power consumption, and battery-free power. Since electronic systems and components continue to decrease in size, sensors to monitor their environment and operation will also become smaller and weigh less in order to be integrated. As MEMS (microelectromechanical systems), NEMS (nanoelectromechanical systems), and smart material technologies mature, MEMS sensors or nanosensors will integrate the sensing element, amplification, analog-to-digital converter, and memory cells into a microchip. The fabrication of MEMS and NEMS will offer significant advantages, including integration with electronics, fabrication of arrays of sensors, small size of individual devices, low power consumption, and lower costs [6].

With the development of new materials and energy technologies, battery-free sensor systems are being considered, especially for use in embedded, remote, and other inaccessible monitoring conditions. Battery-free sensor systems will be developed based on ultra-low-power electronics and energy harvesting technologies.

Ultra-low-power electronics enable future sensor systems to consume much lower power. For example, in 2005, Intel Corporation was developing an ultra-low-power derivative of its high-performance 65-nm logic manufacturing process. Intel made several modifications to the design of the transistor. Electricity leaking from these microscopic transistors, even when they are in their “off” state, is a challenge for the entire industry. The modifications result in significant reductions in the three major sources of transistor leakage (subthreshold leakage, junction leakage, and gate oxide leakage), which translates into lower power requirements and increased battery life. Test chips made on the ultra-low-power 65-nm process technology have shown a roughly 1000-fold reduction in transistor leakage relative to Intel’s standard process [7]. This technology allows the production of very low power chips for laptops and small-form-factor devices.

Another example of ultra-low-power electronics is the modern pacemaker. A modern pacemaker consumes between 10 and 40 µW from its internal cell. Taken together, one million pacemakers consume less than half the power of a common 40-W household incandescent light bulb. Pacemakers remain implanted, regulating the patient’s cardiac rhythm 24 hours a day, seven days per week, for seven years or more on a single pacemaker power source [8].

Energy harvesting is a process that extracts energy from the ambient environment or from a surrounding system and converts it to usable electrical energy. Some large-scale energy harvesting schemes, such as wind turbines and solar cells, have made the transition from research to commercial products. Interest in small-scale energy harvesting for embedded sensor systems, such as implanted medical sensors and sensors on aerospace structures, is increasing. Current energy harvesting sources include sunlight, thermal gradient, human motion, body heat, wind, vibration, radio power, and magnetic coupling. The basic effects used in energy harvesting include electromagnetic, piezoelectric, electrostatic, and thermoelectric effects. For example, mechanical vibration in the device or the ambient environment can be converted into electric energy by piezoelectric materials or electromagnetic induction. Piezoelectric materials form transducers that are able to interchange electrical energy and mechanical vibration or force. Electromagnetic induction systems are composed of a coil and a permanent magnet attached to a spring. The mechanical movement of the magnet, which is caused by device or ambient vibration, induces a voltage at the coil terminal, and this energy can be delivered to an electrical load. Thermal energy is often converted into electric energy by thermoelectric generators (TEGs).
With the recent advances made in nanotechnologies, the fabrication of MEMS-scale TEG devices has been actively studied. The combined use of several energy harvesting sources in the same device can increase the harvesting capabilities in different situations and applications and can minimize the gap between the required and harvested energy [9].


Distributed sensor networks (DSNs) consist of multiple sensor nodes that are capable of communicating with each other and collaborating on a common sensing goal. The advantage of DSNs is that they allow data from multiple sensors to be combined or fused to obtain inferences that may not be possible from a single sensor. This is referred to as multisensor data fusion. The sensor nodes in a DSN are organized into a cooperative system. The nodes can communicate with each other and have the ability to self-organize. The development of wireless transmission technology will realize long-distance, high-transmission-rate, and more secure data communication for future sensor systems [10]. Furthermore, future smart sensor nodes will be highly intelligent, with more functions than today’s. They will have built-in diagnostics and prognostics capabilities, which will make the entire wireless sensor network more functional.
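One simple way to picture multisensor data fusion is inverse-variance weighting, sketched below in Python. This particular technique is an assumption introduced here for illustration (the text does not name a fusion method); it presumes that the sensors measure the same quantity with independent, zero-mean errors of known variance, and the function name is hypothetical.

```python
def fuse_readings(readings):
    """Fuse (value, variance) pairs by inverse-variance weighting.

    Returns the fused value and its variance. Readings with smaller
    variance (more trustworthy sensors) receive larger weights.
    """
    weights = [1.0 / variance for _, variance in readings]
    total_weight = sum(weights)
    fused_value = sum(
        weight * value for weight, (value, _) in zip(weights, readings)
    ) / total_weight
    fused_variance = 1.0 / total_weight
    return fused_value, fused_variance


if __name__ == "__main__":
    # Hypothetical temperature readings (deg C) from two nodes.
    print(fuse_readings([(10.0, 1.0), (20.0, 4.0)]))
```

The fused variance is never larger than the smallest individual variance, which is one sense in which the network can infer more than any single sensor.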

References

1. N. Vichare and M. Pecht, “Prognostics and Health Management of Electronics,” IEEE Transactions on Components and Packaging Technologies, Vol. 29, No. 1, pp. 222-229, March 2006.
2. J. R. Carstens, Electrical Sensors and Transducers, Regents/Prentice-Hall, Englewood Cliffs, New Jersey, 1993.
3. T. Karygiannis, B. Eydt, G. Barber, L. Bunn, and T. Phillips, “Guidelines for Securing Radio Frequency Identification (RFID) Systems,” NIST Special Publication 800-98, National Institute of Standards and Technology, Gaithersburg, MD, April 2007.
4. SpaceAge Control, “Sensor Total Cost of Ownership,” White Paper, available: http://www.spaceagecontrol.com/s054a.htm, accessed December 30, 2007.
5. W. Claes, W. Sansen, and R. Puers, Design of Wireless Autonomous Datalogger ICs, Springer, Dordrecht, The Netherlands, 2005.
6. H. Metras, “Trends in Wireless Sensors: Potential Contributions of Nanotechnologies and Smart Systems Integration Issues,” paper presented at the 2006 IST Conference, Helsinki, Finland, November 2006.
7. “Intel’s New Ultra-Low Power Manufacturing Process Will Stretch Battery Life,” Technology@Intel Magazine, pp. 1-4, October 2005.
8. Zarlink Semiconductor Inc., “How Low is Ultra Low?”, available: http://ulp.zarlink.com/how-low.htm, accessed January 7, 2008.
9. L. Mateu and F. Moll, “Review of Energy Harvesting Techniques and Applications for Microelectronics,” Proceedings of SPIE, Vol. 5837, pp. 359-373, 2005.
10. B. Betts, “Smart Sensors,” IEEE Spectrum Magazine, Vol. 43, No. 4, pp. 50-53, April 2006.


Chapter 3 Data-Driven Approaches for PHM

Prognostics and health management for electronic systems aims to detect, isolate, and predict the onset and source of system degradation as well as the time to system failure. The goal is to make intelligent decisions about the system health and to arrive at strategic and business case decisions. As electronics become increasingly complex, performing PHM efficiently and cost-effectively is becoming more demanding. This chapter discusses data-driven (DD) prognostic techniques that can use available and historical information to statistically and probabilistically derive decisions, estimates, and predictions about the health and reliability of electronic systems.

3.1 Introduction

The practice of monitoring the health of a system entails understanding or learning about healthy-versus-unhealthy system behavior. Predicting future behavior is tied to the ability to learn from the past. In this regard, the field of machine learning (ML) is appropriate for the DD approach to PHM. ML is a subfield of artificial intelligence concerned with the design and development of algorithms and techniques that allow computers to “learn.” The major focus of ML research is to extract information from data automatically by computational and statistical methods, then apply supervised and unsupervised learning methodologies. This chapter discusses some of the fundamental DD algorithms to provide an understanding of learning algorithms and their relevance to PHM applications.

ML is a methodology for performing PHM for electronic systems, while statistics and probability theory are its backbone. Learning algorithms can be combined to increase the efficiency and effectiveness of the algorithms. The most suitable learning algorithm will depend on the environment and system characteristics. In combination with numerical optimization, statistics and probability are used to make decisions about current and future system health based on the data. ML extracts relevant data to explain the trends and characteristics of system health in such a way as to make statistical and probabilistic estimates more accurate.

Figure 3.1 illustrates the top-level breakdown for PHM implementation and identifies three distinct ways to approach PHM. The physics-of-failure (PoF) approach uses underlying engineering and failure principles to model and predict remaining useful life. The DD approaches make health decisions and predictions based purely on the data available. The third category is a hybrid approach that utilizes information from both PoF and DD approaches to make decisions and predictions about system health and remaining useful life. This hybrid approach benefits from the merits of both the PoF and DD approaches while eliminating some of their drawbacks.


Figure 3.1: Approaches to health monitoring and anomaly detection.

DD approaches are an economical way to automatically monitor the health of large multivariate systems and are capable of intelligently detecting and assessing correlated trends in the system dynamics to estimate the current and future health of the system. ML is a path to DD PHM because, as an algorithmic approach, it incorporates statistical and probability theory in addition to data preprocessing, dimensionality reduction by compression and transformations, feature extraction, and cleaning (denoising). Moreover, ML offers computational solutions where theoretical alternatives are expensive or intractable. ML is also useful in situations in which user interaction is required, through which the machine can learn the task it is intended for and, in the case of PHM, learn the relationship between data trends and the physics, modes, and mechanisms of failure. These characteristics make ML an attractive approach for PHM. It has achieved success in several other fields as well:

1. Speech recognition. Available commercial systems for speech recognition all use ML in one fashion or another to train the system to recognize speech.
2. Computer vision. Many current vision systems, from face recognition systems to systems that automatically classify microscope images of cells, are developed using ML, again because the resulting systems are more accurate than handcrafted programs.
3. Biosurveillance. A variety of government efforts to detect and track disease outbreaks now use ML.

The following sections present statistical techniques used in ML for PHM. The specific ML approaches are discussed.

3.2 Parametric Statistical Methods

Parametric statistical techniques assume that the data fit a certain type of distribution (for example, the Gaussian distribution) and that the parameters of the distribution (such as the mean and the standard deviation) are calculated from the data. The data are represented by these parameters, and the test used to classify the data is formulated in terms of them. Some parametric approaches are introduced in this section. Figure 3.2 shows the statistical techniques used in ML for PHM.

Figure 3.2: Statistical approaches.

3.2.1 Likelihood Ratio Test

The likelihood ratio test (LRT) is a statistical test of the goodness-of-fit between two models. It tests a null hypothesis against an alternative hypothesis by comparing the maximum likelihood under the alternative hypothesis with that under the null hypothesis. The test statistic can be expressed by

$$r = \frac{P(x_i \mid H_{\text{null}})}{P(x_i \mid H_{\text{alt}})} \tag{3.1}$$

where $x_i$ is the observed data and $P(x_i \mid H)$ is the conditional probability of $x_i$ provided the hypothesis $H$ is true. The hypotheses being tested are $H_{\text{null}}$, the null hypothesis, and $H_{\text{alt}}$, the alternative hypothesis. The value of $r$ is concentrated near 1 if $H_{\text{null}}$ is true. The LRT rejects the null hypothesis if the value of this statistic is too small. LRTs, as part of hypothesis testing, are used in ML and applied to the extracted features of the data to make a decision about the health of the system. Linping et al. [1] used the LRT for fault detection to provide proactive fault management. The LRT has been successfully implemented for advanced electronic prognosis in the sequential probability ratio test (SPRT), and the SPRT algorithm using the LRT has been used for condition monitoring [2]. Lopez [3] used the SPRT and the multivariate state estimation technique (MSET), incorporating the concepts of the LRT and the Neyman-Pearson criterion, for PHM of electronics.
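As a rough illustration of the statistic in Eq. (3.1), the following minimal sketch evaluates the ratio for two fully specified Gaussian hypotheses. The distributions, the sample values, and the function name `likelihood_ratio` are assumptions made for this example only; a practical LRT would maximize each likelihood over the parameters that the hypothesis leaves free.

```python
import math
from statistics import NormalDist

def likelihood_ratio(samples, null_dist, alt_dist):
    """r = P(x | H_null) / P(x | H_alt) for independent samples.

    The product is accumulated in log space to limit underflow.
    """
    log_r = sum(
        math.log(null_dist.pdf(x)) - math.log(alt_dist.pdf(x)) for x in samples
    )
    return math.exp(log_r)

# Hypothetical hypotheses: a healthy feature is N(0, 1), a degraded one is N(1, 1).
healthy = NormalDist(mu=0.0, sigma=1.0)
degraded = NormalDist(mu=1.0, sigma=1.0)

r_healthy = likelihood_ratio([0.1, -0.3, 0.2, 0.0], healthy, degraded)
r_shifted = likelihood_ratio([1.2, 0.9, 1.4, 1.1], healthy, degraded)

# A small r is evidence against the null (healthy) hypothesis.
print(r_healthy > 1.0, r_shifted < 1.0)
```

The rejection threshold for $r$ is not fixed by the ratio itself; it is normally chosen from the false alarm rate that can be tolerated.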

3.2.2 Maximum Likelihood Estimation

Maximum likelihood estimation (MLE) states that the desired probability distribution is the one that makes the observed data “most likely”; that is, the value of the parameter vector is sought that maximizes the likelihood function of a given distribution [4]. MLE is a statistical method for finding the best fit of a mathematical model to data. A statistical inference or procedure should be consistent with the assumption that the best explanation of a set of data is provided by $\hat{\theta}$, a value of $\theta$ that maximizes the likelihood function $L(\theta)$. When $\theta$ is a single real parameter, it is typical that $L$ is continuously differentiable, so that $L'$ vanishes at $\hat{\theta}$ [5]. Also, since $\ln L$ has a maximum whenever $L$ does, $\hat{\theta}$ satisfies the likelihood equation:

$$\frac{d}{d\theta} \ln L(\theta) = 0 \tag{3.2}$$

In such cases, to find a maximum likelihood estimate, the likelihood equation (3.2) is solved, and its roots are the possible maximizing values of $\theta$. A maximum likelihood-based method has been used on gyro signals to detect faults on spacecraft by Wilson et al. [6]. MLE has been used along with wavelet denoising of vibration signals from gears and roller bearings by Lin et al. [7]. Platt [8] used MLE to calibrate support vector machine (SVM) posterior class probabilities by estimating the parameters of a sigmoid function.
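To make the likelihood equation (3.2) concrete, the sketch below solves it for an exponential time-to-failure model, where $\ln L(\lambda) = n \ln \lambda - \lambda \sum x_i$. The exponential model, the data values, and the function names are assumptions chosen for illustration; the numerical root is compared against the closed-form solution $\hat{\lambda} = n / \sum x_i$.

```python
def score(lam, data):
    """Derivative of ln L with respect to lambda for i.i.d. exponential data."""
    return len(data) / lam - sum(data)

def solve_likelihood_equation(data, lo=1e-6, hi=1e3, tol=1e-10):
    """Find the root of the score by bisection (the score decreases in lambda)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid, data) > 0:
            lo = mid  # root lies to the right
        else:
            hi = mid  # root lies to the left
    return 0.5 * (lo + hi)

# Hypothetical times to failure (arbitrary units).
times_to_failure = [2.0, 3.5, 1.5, 4.0, 4.0]

lam_hat = solve_likelihood_equation(times_to_failure)
closed_form = len(times_to_failure) / sum(times_to_failure)
print(lam_hat, closed_form)
```

Bisection is used here only because the score is monotone for this model; for multi-parameter models a general-purpose optimizer is the more usual choice.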

3.2.3 Neyman-Pearson Criterion

Hypothesis testing generally refers to checking whether the variation between two given sample distributions can be explained by chance. In hypothesis testing, two types of errors are possible: rejecting a null hypothesis when it is actually true (Type I error), and failing to reject a null hypothesis when the alternative hypothesis is the true state of nature (Type II error). A Type I error, whose probability is denoted by $\alpha$, is called a “false positive”; a Type II error, whose probability is denoted by $\beta$, is called a “false negative.” Learning in general will invariably involve mistakes, ideally only a small fraction of them. ML algorithms need to be able to tractably manage and reduce these false and missed alarms. Management of alarm types is an important part of ML and is discussed later in this chapter. The goal of the Neyman-Pearson criterion is to design a classifier (based on training data) that minimizes the probability of missed alarms while constraining the probability of a false alarm to be less than some user-specified significance level $\alpha$ [9]. The null hypothesis is rejected based on the values of the two error types.

Xiang [10] used the Neyman-Pearson criterion for distributed detection of faults using a number of sensors in parallel with one fusion center. Durham and Younan [11] used the Neyman-Pearson hypothesis test to detect targets in radar applications.
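The trade-off between the two error types can be sketched for a single scalar feature. The example assumes, purely for illustration, that the feature is N(0, 1) when the system is healthy and N(2, 1) when it is faulty; for such a mean shift the likelihood ratio is monotone in the feature, so the Neyman-Pearson detector reduces to a threshold on the feature itself.

```python
from statistics import NormalDist

# Hypothetical feature distributions (assumptions for this sketch).
healthy = NormalDist(mu=0.0, sigma=1.0)  # null hypothesis
faulty = NormalDist(mu=2.0, sigma=1.0)   # alternative hypothesis

alpha = 0.05  # user-specified bound on the false alarm probability

# Smallest threshold whose false alarm probability does not exceed alpha.
threshold = healthy.inv_cdf(1.0 - alpha)

false_alarm = 1.0 - healthy.cdf(threshold)  # Type I error probability
missed_alarm = faulty.cdf(threshold)        # Type II error probability (beta)

def raise_alarm(x):
    """Reject the null (healthy) hypothesis when the feature exceeds the threshold."""
    return x > threshold

print(round(threshold, 3), round(false_alarm, 3), round(missed_alarm, 3))
```

Tightening `alpha` moves the threshold to the right, which lowers the false alarm probability at the cost of more missed alarms.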

3.2.4 Expectation Maximization

The expectation maximization (EM) algorithm is an iterative process mainly used to solve MLE problems. Every iteration of the EM algorithm consists of an E-step (expectation) and an M-step (maximization). In the E-step, the missing data are estimated given the observed data and the current estimate of the model parameters. In the M-step, the log-likelihood function is maximized under the assumption that the missing data are known; the estimates of the missing data from the E-step are used in lieu of the actual missing data [12]. The parameters found in the M-step are then used to begin another E-step, and the process is repeated until the log-likelihood of the data no longer changes. Expectation maximization describes a class of algorithms, not a specific algorithm. For example, the Baum-Welch algorithm is an EM algorithm applied to hidden Markov models [13]. EM is particularly useful when MLE of a complete data model is easy, as this makes the M-step trivial. Hamerly and Elkan [14] used a mixture of naive Bayes models with EM training for the detection of hard drive failures. Willis [15] used Gaussian mixture models fitted with the EM method for the detection of military targets from hyperspectral images.
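The E-step and M-step described above can be sketched for a two-component, one-dimensional Gaussian mixture, where the "missing data" are the unknown component memberships. The data, the initial guesses, and the function names are assumptions for illustration, and no safeguards beyond a small floor on the standard deviation are included.

```python
import math

def normal_pdf(x, mu, sigma):
    """Gaussian density N(x; mu, sigma^2)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def em_two_gaussians(data, mu, sigma, weight, max_iter=200, tol=1e-9):
    """Fit a two-component 1-D Gaussian mixture; weight is the share of component 0."""
    mu, sigma = list(mu), list(sigma)
    prev_ll = -math.inf
    for _ in range(max_iter):
        # E-step: responsibility of component 0 for each point, plus log-likelihood.
        resp, ll = [], 0.0
        for x in data:
            p0 = weight * normal_pdf(x, mu[0], sigma[0])
            p1 = (1.0 - weight) * normal_pdf(x, mu[1], sigma[1])
            resp.append(p0 / (p0 + p1))
            ll += math.log(p0 + p1)
        if ll - prev_ll < tol:  # log-likelihood has stopped changing
            break
        prev_ll = ll
        # M-step: weighted maximum likelihood updates.
        n0 = sum(resp)
        n1 = len(data) - n0
        mu[0] = sum(r * x for r, x in zip(resp, data)) / n0
        mu[1] = sum((1.0 - r) * x for r, x in zip(resp, data)) / n1
        var0 = sum(r * (x - mu[0]) ** 2 for r, x in zip(resp, data)) / n0
        var1 = sum((1.0 - r) * (x - mu[1]) ** 2 for r, x in zip(resp, data)) / n1
        sigma[0] = max(math.sqrt(var0), 1e-3)  # floor avoids a collapsed component
        sigma[1] = max(math.sqrt(var1), 1e-3)
        weight = n0 / len(data)
    return mu, sigma, weight

# Hypothetical sensor readings drawn from two operating regimes.
readings = [0.9, 1.1, 1.0, 1.2, 0.8, 4.9, 5.1, 5.0, 5.2, 4.8]
mu_hat, sigma_hat, w_hat = em_two_gaussians(readings, [0.0, 6.0], [1.0, 1.0], 0.5)
print(mu_hat, sigma_hat, w_hat)
```

EM converges to a local maximum of the likelihood, so the result can depend on the initial guesses; running it from several starting points is a common precaution.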

3.2.5 Minimum Mean Square Error Estimation

Minimum mean square error (MMSE) estimation is a technique used in statistics to describe the estimator with the least possible mean squared error. The mean square error (MSE) of a point estimator (denoted by $\hat{x}$) of a parameter $x$ is given by

$$\mathrm{MSE}(\hat{x}) = E\left[(\hat{x} - x)^2\right] \tag{3.3}$$

A point estimator is a statistic that is a function of the random sample and provides the best estimate of an unknown population parameter. The MSE specifies for each estimator a trade-off between bias and efficiency. The point estimator is the MMSE estimator of the parameter $x$ if, among all possible estimators of $x$, it has the smallest MSE for a given sample size. Hazel [16] used MMSE estimation with a multivariate Gaussian Markov random field for anomaly detection in multispectral images. The MMSE technique has been used for feature selection to detect faults in cable networks using status signals from the cable modems by Somers et al. [17].
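The trade-off between bias and efficiency mentioned above can be made explicit with the standard decomposition of Eq. (3.3), sketched here with $x$ treated as a fixed (non-random) parameter:

```latex
\begin{aligned}
\mathrm{MSE}(\hat{x})
  &= E\left[(\hat{x} - E[\hat{x}] + E[\hat{x}] - x)^2\right] \\
  &= E\left[(\hat{x} - E[\hat{x}])^2\right]
     + 2\,(E[\hat{x}] - x)\,E\left[\hat{x} - E[\hat{x}]\right]
     + (E[\hat{x}] - x)^2 \\
  &= \operatorname{Var}(\hat{x}) + \left(E[\hat{x}] - x\right)^2 .
\end{aligned}
```

The cross term vanishes because $E[\hat{x} - E[\hat{x}]] = 0$. The MSE is therefore the variance of the estimator plus its squared bias, so a slightly biased estimator can have a lower MSE than an unbiased one if its variance is sufficiently smaller.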

3.2.6 Maximum A Posteriori Estimation

Maximum a posteriori (MAP) estimation is similar to MLE and is considered its Bayesian version. Given two random variables $f$ and $x$, where $f$ is the parameter to be estimated and $x$ is the observed parameter, the conditional probability is denoted $P(f \mid x)$, with the corresponding conditional probability density represented by $p_{f|x}$. The estimate $f_{\text{map}}$ (given the observation $x$ and prior knowledge of the fault parameter $f$, denoted by $p_f$) is found by [18]

$$f_{\text{map}} = \arg\max_f \left( p_{f|x} \right) \tag{3.4}$$

By application of Bayes’s rule,

$$f_{\text{map}} = \arg\max_f \left( p_{x|f}\, p_f \right) \tag{3.5}$$

Samar et al. [18] applied MAP estimation to estimate time-varying fault parameters and applied the technique to an unmanned aerial vehicle. A MAP classifier and a class-conditional Reed-Xiaoli (RX) algorithm, which is a generalized LRT, were used to detect anomalies in hyperspectral imagery by Stein et al. [19].
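As a minimal sketch of Eq. (3.5), the example below estimates a scalar fault parameter from noisy observations by maximizing the log of the likelihood times the prior over a grid of candidate values. The Gaussian prior, the Gaussian noise model, the numbers, and the function names are all assumptions for illustration.

```python
import math

def log_gauss(v, mean, std):
    """Log of the Gaussian density N(v; mean, std^2)."""
    return -0.5 * ((v - mean) / std) ** 2 - math.log(std * math.sqrt(2.0 * math.pi))

def map_estimate(obs, prior_mean, prior_std, noise_std, grid):
    """argmax over the grid of log p(x | f) + log p(f), as in Eq. (3.5)."""
    def log_posterior(f):
        log_likelihood = sum(log_gauss(x, f, noise_std) for x in obs)
        return log_likelihood + log_gauss(f, prior_mean, prior_std)
    return max(grid, key=log_posterior)

# Hypothetical observations of a fault parameter believed a priori to be near zero.
obs = [1.8, 2.2, 2.0, 2.4]
grid = [i / 1000 for i in range(0, 4001)]  # candidate values 0.000 ... 4.000

f_map = map_estimate(obs, prior_mean=0.0, prior_std=1.0, noise_std=1.0, grid=grid)
f_mle = sum(obs) / len(obs)  # maximum likelihood estimate ignores the prior
print(f_map, f_mle)
```

With these assumptions the posterior is itself Gaussian, and the grid result agrees with the closed-form posterior mode; the prior pulls the MAP estimate from the sample mean toward the prior mean.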

3.2.7 Rao-Blackwell Estimation

Rao-Blackwell estimation is based on the Rao-Blackwell theorem, which states that if $\theta^*$ is an estimator of a parameter $x$ and $T$ is a sufficient statistic for $x$, then the conditional expectation of $\theta^*$ given $T$ is typically a better estimator of $x$. Improving an estimator using the Rao-Blackwell theorem is also known as the Rao-Blackwellization procedure. When we examine the form of the superior estimator, $E[\theta^* \mid T]$, we note that it is a random variable that is a function of the sufficient statistic $T(X_1, \ldots, X_n)$. Hence, an estimator that is not a function of the sufficient statistic can be improved upon by the process of Rao-Blackwellization. The Rao-Blackwellized particle-filtering technique and conditionally Gaussian state space models have been used by de Freitas [20] for fault diagnosis. Flores-Quintanilla et al. [21] used an enhanced version of the Rao-Blackwellized particle filter along with a dynamic Bayesian network for fault diagnosis in electrical machines.
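A small exact example may help. For $n$ independent Bernoulli trials with unknown success probability $p$, the sum $T$ is a sufficient statistic, and the crude estimator "use only the first trial" can be Rao-Blackwellized by conditioning on $T$. The sketch below does this by enumeration with exact fractions; the Bernoulli setting, the values of `n` and `p`, and the function names are assumptions for illustration.

```python
from fractions import Fraction
from itertools import product

def rao_blackwellize(n):
    """Tabulate E[X1 | T = t] for n Bernoulli draws, where T = X1 + ... + Xn.

    Given T = t, every outcome containing t ones is equally likely for any p,
    which is what sufficiency of T means here.
    """
    table = {}
    for t in range(n + 1):
        outcomes = [xs for xs in product((0, 1), repeat=n) if sum(xs) == t]
        table[t] = Fraction(sum(xs[0] for xs in outcomes), len(outcomes))
    return table

def variance(estimator, n, p):
    """Exact variance of estimator(xs) under i.i.d. Bernoulli(p) draws."""
    mean = second = Fraction(0)
    for xs in product((0, 1), repeat=n):
        prob = p ** sum(xs) * (1 - p) ** (n - sum(xs))
        value = estimator(xs)
        mean += prob * value
        second += prob * value * value
    return second - mean ** 2

n, p = 4, Fraction(3, 10)
table = rao_blackwellize(n)                         # equals t / n for each t
var_crude = variance(lambda xs: xs[0], n, p)        # estimator using X1 only
var_rb = variance(lambda xs: table[sum(xs)], n, p)  # Rao-Blackwellized estimator
print(table, var_crude, var_rb)
```

Here the conditioned estimator turns out to be the sample mean $T/n$, and its variance is smaller than that of the crude estimator by a factor of $n$.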

3.2.8 Cramer-Rao Lower Bound

The Cramer-Rao lower bound (CRLB) is a well-known lower bound on the variance of the estimates produced by any unbiased estimator. The CRLB can be used to assess the performance of an estimator, as it sets a lower limit on the MSE of an unbiased estimator [22]. In general, the CRLB states that the error covariance matrix is greater than or equal to the inverse of the Fisher information matrix [23]. The CRLB has been used by Schweizer and Moura [22] to compare different estimators applied to hyperspectral sensor data. It has been used to evaluate the efficiency of the MMSE estimator by Noiboar and Cohen [24] for three-dimensional anomaly detection.
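In symbols, and using notation that is assumed here rather than taken from the cited works, the bound for an unbiased estimator $\hat{\theta}$ of a parameter vector $\theta$ with likelihood $L(\theta)$ can be written as follows, together with a scalar example in which the bound is attained:

```latex
\operatorname{cov}(\hat{\theta}) \succeq I(\theta)^{-1},
\qquad
I(\theta) = E\left[
  \left(\frac{\partial \ln L(\theta)}{\partial \theta}\right)
  \left(\frac{\partial \ln L(\theta)}{\partial \theta}\right)^{T}
\right]

% Example: n i.i.d. samples x_i ~ N(\mu, \sigma^2) with \sigma known.
\frac{\partial \ln L(\mu)}{\partial \mu} = \frac{1}{\sigma^2}\sum_{i=1}^{n}(x_i - \mu),
\qquad
I(\mu) = \frac{n}{\sigma^2},
\qquad
\operatorname{Var}(\bar{x}) = \frac{\sigma^2}{n} = I(\mu)^{-1}
```

In this example the sample mean $\bar{x}$ is unbiased and its variance equals the bound, so no unbiased estimator of $\mu$ can have a smaller variance.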

3.3 Nonparametric Statistical Methods

In many practical situations, it is often the case that information regarding the distribution of the data is not known. In this case statistical approaches that are nonparametric in nature are used. Nonparametric methods do not make any assumptions regarding the underlying distribution of data. Some of the methods and their use in detection are described in this section.

3.3.1 Nearest Neighbor-Based Classification

A well-known classification procedure is the kNN procedure, where NN stands for nearest neighbor. In this technique, classification of an object is based on a majority vote of its neighbors. The object is assigned to the class that is most common among its k nearest neighbors (where k is a positive integer whose value is typically small). If k = 1, then the object is simply assigned to the class of its nearest neighbor. In binary classification problems, k is chosen as an odd number to avoid tie votes. The algorithm does not need any explicit training step. The objects are represented as position vectors in a multidimensional space, forming clusters. The state (normal versus anomaly) of the test sample is assessed by measuring the normalized distance from the cluster centers [25]. The problem with this technique is that for large data sets a large number of computations have to be performed.

When a collection of $n$ correctly classified samples $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ is available (or the data are assumed to represent the normal state), it is reasonable to assume that observations that are close together will have the same classification. To classify the unknown sample of data, the evidence of each of the nearest points is weighted. It is first necessary to pick the value of k (i.e., the number of nearest neighbors to be considered for classification), which depends upon the data. Larger values of k reduce the effect of noise on the classification but make boundaries between classes less distinct. Next, the squared Euclidean distance between the test point to be classified, $(x_c, y_c)$, and each of the $n$ classified samples is calculated by

$$D_i^2 = (x_c - x_i)^2 + (y_c - y_i)^2 \tag{3.6}$$

where $D_i$ is the Euclidean distance between the test point and $(x_i, y_i)$ for all $i$ ($1 \le i \le n$), $(x_c, y_c)$ represents the test data point, and $(x_i, y_i)$ are the previously classified samples. The Euclidean distances are then sorted, and the k nearest neighbors (the k points with the smallest Euclidean distances) are found. In the simplest form of the algorithm, the test point is classified into the category to which the largest number of its k nearest neighbors belong. This assumes that the classified data contain normal data points as well as anomalies. Other distance measures, such as the Mahalanobis distance, can be used in place of the Euclidean distance. Further, the algorithm can be modified to weight the vote of each neighbor depending on its distance from the test point.

He and Wang [26] used a combination of principal component analysis (PCA), which is a dimensionality reduction technique, with the kNN algorithm for fault detection in semiconductor manufacturing processes. Liao and Vemuri [27] used kNN along with the signature verification technique to detect intrusions in computer network systems. Li et al. [28] used an ML algorithm that incorporated kNN, known as TCM-kNN (transductive confidence machines for k nearest neighbors), to perform intrusion detection on network systems.
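The voting procedure described in this section can be sketched in a few lines for two-dimensional points. The sample coordinates, the labels, and the function name `knn_classify` are assumptions for illustration; the unweighted majority vote corresponds to the simplest form of the algorithm.

```python
from collections import Counter

def knn_classify(test_point, samples, labels, k=3):
    """Classify a 2-D point by majority vote among its k nearest samples."""
    xc, yc = test_point
    # Squared Euclidean distance from the test point to every classified sample.
    distances = sorted(
        ((xc - x) ** 2 + (yc - y) ** 2, label)
        for (x, y), label in zip(samples, labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

# Hypothetical labeled feature vectors: four normal points and three anomalies.
samples = [(1.0, 1.1), (0.9, 1.0), (1.2, 0.8), (1.1, 1.2),
           (5.0, 5.1), (4.8, 5.2), (5.2, 4.9)]
labels = ["normal"] * 4 + ["anomaly"] * 3

print(knn_classify((1.0, 1.0), samples, labels, k=3))
print(knn_classify((5.1, 5.0), samples, labels, k=3))
```

Every query scans all stored samples, which is the computational cost noted above for large data sets; spatial indexing structures are a common way to reduce it.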

3.3.2 Parzen Window (or Kernel Density Estimation)

The Parzen window (or kernel density estimation) technique is a method of estimating the probability density function (PDF) of a random variable. To estimate the PDF of a random variable $x$ that is distributed in a space $R$, a kernel function is chosen to represent the volume $V$ in which the data are distributed. Each point $x_i$ contributes to the PDF based on its distance from the center of the volume. The general probability density estimation equation is

$$P(x) = \frac{1}{nh} \sum_{i=1}^{n} K\left(\frac{x - x_i}{h}\right) \tag{3.7}$$

where $P(x)$ is the estimated PDF, $n$ is the number of samples of the random variable $x$, $K$ is the chosen kernel function, and $h$ is the smoothing parameter that decides the contribution of each sample. The extent of the contribution of the point $x_i$ to the estimate depends on the shape of the kernel function and the bandwidth chosen. Gaussian kernels are used most often, but there are various other choices, such as uniform and triangular kernels. Qian and Mita [29] used the Parzen window approach to detect the damage location in a five-story structure using small amounts of training data, as opposed to parametric techniques. Rippengill et al. [30] used the Parzen window method to classify acoustic emissions from experimental work on a box girder of a bridge to identify damage initiation and growth based on high-frequency acoustic emissions. Yeung and Chow [31] used this technique for network intrusion detection, the problem of detecting anomalous network connections caused by intrusive activities. They took a nonparametric density estimation approach based on Parzen window estimators with Gaussian kernels to build an intrusion detection system using normal data only.
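Equation (3.7) with a Gaussian kernel can be sketched directly. The training values, the bandwidth `h`, and the function names are assumptions for illustration; a low estimated density at a new observation can then serve as an indication that it is unlike the normal training data.

```python
import math

def gaussian_kernel(u):
    """Standard normal density, used as the kernel K."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def parzen_pdf(x, samples, h):
    """Parzen window estimate P(x) = (1 / (n h)) * sum_i K((x - x_i) / h)."""
    n = len(samples)
    return sum(gaussian_kernel((x - xi) / h) for xi in samples) / (n * h)

# Hypothetical training data collected from a healthy system.
normal_data = [4.9, 5.0, 5.1, 5.2, 4.8]
h = 0.3  # smoothing parameter (bandwidth), chosen arbitrarily here

density_typical = parzen_pdf(5.0, normal_data, h)
density_unusual = parzen_pdf(7.0, normal_data, h)
print(density_typical, density_unusual)
```

The bandwidth controls the smoothness of the estimate: a small `h` follows individual samples closely, while a large `h` blurs the structure of the data.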

3.3.3 Wilcoxon Rank-Sum Test

The Wilcoxon-Mann-Whitney rank-sum test is used to determine if two random variables arise from the same probability distribution [32]. It is a hypothesis test on the two samples, one of which is the training data (from the normal samples) and the other is the data from the item currently under test to determine if both the samples belong to the same distribution. The advantage of using ranks instead of the sample data is that the distribution of the ranks can be calculated in advance, reducing the number of computations required during run time. Also, using ranks reduces the effects of noise and outliers on the test [33]. Eklund and Goebel [33] used rank permutations along with neural networks on a simulated statistical distribution for the detection of abnormal conditions in an aircraft engine. Xeu et al. [34] used the rank-sum test along with a ML methodology for anomaly detection using operational data from locomotives.
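A minimal sketch of the rank-sum computation follows. It pools the training and test samples, ranks them (using average ranks for ties), sums the ranks of the test sample, and standardizes the sum with the large-sample normal approximation. The data and the function name are assumptions, and the approximation used here omits the tie correction to the variance.

```python
import math

def rank_sum_z(train, test):
    """Return (W, z): rank sum of the test sample and its normal-approximation z score."""
    pooled = sorted([(v, 0) for v in train] + [(v, 1) for v in test])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1  # extend over a run of tied values
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    w = sum(r for r, (_, source) in zip(ranks, pooled) if source == 1)
    n1, n2 = len(test), len(train)
    mean_w = n1 * (n1 + n2 + 1) / 2
    std_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return w, (w - mean_w) / std_w

# Hypothetical feature values: training data from normal operation and two test windows.
train = [10.1, 9.8, 10.3, 9.9, 10.0, 10.2, 9.7, 10.4]
shifted = [10.9, 11.2, 10.8, 11.0, 11.1]
similar = [9.85, 10.05, 10.25, 9.95, 10.15]

w_shifted, z_shifted = rank_sum_z(train, shifted)
w_similar, z_similar = rank_sum_z(train, similar)
print(w_shifted, z_shifted, w_similar, z_similar)
```

A z score far from zero suggests that the test window does not come from the same distribution as the training data. For samples as small as these, exact tables of the rank-sum distribution would normally be preferred to the normal approximation.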

3.3.4 Kolmogorov-Smirnov Test

The Kolmogorov-Smirnov (KS) test is one of the distribution-free techniques for testing the differences between distributions [35]. In the KS test, no assumption regarding the underlying distribution of the measurements is made. Basically, it is used to compare the observed and expected cumulative distribution functions (CDFs) for a certain number of data points before (expected) and after (observed) each measurement is evaluated. The test statistic is calculated by taking the vertical difference between the two CDFs under test. One limitation of this test is that it can only be used for continuous distributions (i.e., not discrete distributions such as the binomial); another is that the distribution must be fully specified (i.e., the location, scale, and shape parameters should be known). To run the KS test, the number of points for the time window of the KS calculation is required. The CDF is calculated for a window of the specified number of points. For example, for a value of 100, the CDF will be calculated on a window of 100 time points before and after the current time stamp. A greater window size increases the accuracy of the CDF while increasing the processing time. The KS test statistic is defined as

$$D = \sup_x \left| F_n(x) - F(x) \right| \tag{3.8}$$

where $F_n$ is the empirical CDF of the observed data and $F$ is the CDF of the distribution being tested. The null hypothesis is rejected if the test statistic $D$ is greater than the critical value obtained from statistical tables available in the literature. Variations of the tables are based on slightly differing scaling factors. Hence, it is necessary to ensure that the test statistic is calculated in a way that is consistent with how the critical values were tabulated. The KS test has been used along with a clustering algorithm by Hall et al. [36] to perform diagnostic testing on bearings of rotating machinery using acoustic emissions. Caberera et al. [37] used KS statistics to model and detect anomalous traffic patterns.
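The KS statistic, the largest vertical distance between the empirical CDF of a window of data and a fully specified reference CDF, can be computed as in the sketch below. The reference distribution, the window values, and the function name are assumptions for illustration.

```python
from statistics import NormalDist

def ks_statistic(samples, cdf):
    """Largest vertical gap between the empirical CDF of samples and cdf."""
    ys = sorted(samples)
    n = len(ys)
    # The empirical CDF jumps from i/n to (i+1)/n at the i-th sorted value
    # (0-based), so both sides of each jump are checked.
    return max(
        max(cdf(y) - i / n, (i + 1) / n - cdf(y))
        for i, y in enumerate(ys)
    )

# Hypothetical reference: a standardized feature is N(0, 1) under normal operation.
reference = NormalDist(mu=0.0, sigma=1.0)

window_normal = [-1.2, -0.4, 0.1, 0.5, 1.3, -0.8, 0.9, 0.0]
window_drifted = [1.1, 1.6, 2.0, 1.4, 2.3, 1.8, 1.2, 2.1]

d_normal = ks_statistic(window_normal, reference.cdf)
d_drifted = ks_statistic(window_drifted, reference.cdf)
print(d_normal, d_drifted)
```

The value of `D` is then compared with a tabulated critical value for the window size and the chosen significance level, taking care that the table uses the same scaling as the statistic.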

3.3.5 Chi Square Test

The chi square ($\chi^2$) goodness-of-fit hypothesis test can be used to make inferences about the distribution of a variable from collected information and to determine whether the distribution in question differs from that of the known population. In this test, the data are split into $k$ mutually exclusive categories (cells). The number of observations in each cell is known as the observed frequency of the cell. The hypothesis evaluated is whether the observed frequencies differ significantly from the expected frequencies. The expected frequency can be found based on empirical information about the system. The test is based on the assumptions that the data used are categorical, consisting of independent random samples, and that the expected frequency of each cell is greater than 5 [38, 39]. The chi square statistic is given below, where $O_i$ is the observed frequency and $E_i$ is the expected frequency from a normal system:

$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i} \tag{3.9}$$

The chi square test statistic has $(k-1)$ degrees of freedom, where $k$ is the number of possible values for the variable (mutually exclusive categories). Based on the required significance level ($\alpha$) and the degrees of freedom ($df = k-1$), the critical chi square value is obtained from the table of the chi square distribution. If the obtained chi square value is equal to or greater than the tabled critical value at the required level of significance, the null hypothesis is rejected. The chi square test has been used by Zhang et al. [40] for the diagnosis of navigation systems. Ye and Chen [41] used a chi square statistic for the detection of intrusions into information systems.
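A short worked example of Eq. (3.9) follows. The observed and expected counts are hypothetical; the critical value 7.815 is the standard table entry for a significance level of 0.05 with 3 degrees of freedom.

```python
def chi_square_statistic(observed, expected):
    """Sum over cells of (O_i - E_i)^2 / E_i."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical counts of a monitored event in four mutually exclusive categories.
observed = [18, 22, 45, 15]
expected = [25, 25, 25, 25]  # expected frequencies from a normal system

stat = chi_square_statistic(observed, expected)
df = len(observed) - 1       # k - 1 degrees of freedom
critical = 7.815             # chi square table: alpha = 0.05, df = 3
reject_null = stat >= critical
print(stat, df, reject_null)
```

Here the statistic is well above the critical value, so the hypothesis that the observed counts follow the expected distribution would be rejected.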

3.4 Machine Learning Techniques

Machine learning techniques operate on acquired data. Because raw data alone do not provide meaningful information, the translation from raw data to meaningful information is achieved by classification and clustering, regression, and ranking. For PHM applications, ML techniques are closely related to the problem of classification (or clustering) of input data because the resulting classes and clusters can be applied to detect, isolate, and predict the health condition of systems. Classification (or clustering) models and algorithms are chosen with consideration of the data type, the amount of test and validation data, the computational resources, and the allowable risk level.

The main goal of classification and clustering is to allocate data to a specific class. The difference between classification and clustering is the existence of class labels. The raw data can be grouped into classes. When a class can be identified with a specific label, the ML algorithm is regarded as classification of the data; otherwise it is regarded as clustering of the data. Clustering differs from classification in that the character of a class cannot be identified, even though a class has a given name such as “group A” or “segment 1.”

Classification is the task of arranging data into specific classes according to sorting criteria, where each class has its own label. Classification can be adapted into a PHM methodology to detect anomalies of a system. It is used to identify the health condition of a system and can be extended to predict the span of life in PHM. Each labeled class can be regarded as a specific health condition of a system. At its simplest, the health of a system can be divided into two categories, healthy and unhealthy. With the help of classification, the concept of labeled class data expands to various grades of health condition, that is, to more than just two categories. Different kinds of labeled classes represent different health conditions (the diseases of the system). This idea can be realized by matching a specific abnormal condition with a specific disease name of the system with the help of an expert or historical data on the system. A specific health condition of a system (even the disease name) can be defined with the concept of classification.

Each class of data can be identified by experts who have experience and expertise in the study area or can be directly distinguished by historical data acquired from experiments. This work requires human effort, namely, an expert’s contribution. But it is not easy to find an expert who is always able to distinguish classes. In some cases, it is impossible to label a class due to lack of knowledge or background. Furthermore, labeling every class takes time and money. It is often very hard to acquire experimental results in the case of long-life field equipment such as aircraft, computer server systems, and automobiles.

According to the type of data, classification can be divided into two parts: supervised and unsupervised. “Supervised” means that the data are labeled and the algorithm learns from the distinctions among the labeled data. For a system using supervised learning, the labeled data provide the definition of each class and contain the positive and negative examples that distinguish the classes from each other. The classification algorithm extracts the common properties of features from the classes in order to classify the data. The classification of data is the most important part of the algorithm for PHM applications. Once raw data acquired from sensors are stored, patterns in the collected data are analyzed by means of a classification algorithm to identify the health condition of a system. The data in unsupervised learning have no predefined classes and do not include any labeled data. A system using unsupervised learning finds clusters in its own data automatically. There are different ways of dividing the data into clusters and many different ways to prescribe clusters.
The same data might be clustered differently by different clustering algorithms. Semisupervised learning is an extension of supervised and unsupervised learning. It uses test data for training, as well as a combination of labeled and unlabeled data, to improve classification and detection accuracy [42, 43].

Both supervised and unsupervised learning can be divided into two categories based on the estimation method of the posterior probability. The discriminative approach directly models the conditional probability of $y$ given $x$, $p(y \mid x)$, using probability theory, where $x$ is the input data and $y$ is the associated label. The approach subsequently uses decision theory to assign new input data to a class. The discriminative approach learns a single model that predicts the class label from the observed input data. This gives an answer focused on task-related criteria without wasting computational resources [44, 45].

A discriminant function is a special case of the discriminative approach in which 100% of the posterior probability is assigned to the designated label and 0% to the other label. When the collected data are classified with a discriminant function, the decision boundary divides the feature space into two separate areas. Once a data point has been classified into a specific class with a discriminant function, its probability of membership in the assigned class is 100%, even if the point is located close to the decision boundary; the process maps input data directly onto a class label without considering the probability of membership. A discriminant function thus yields only 100% or 0% posterior probability on either side of the decision boundary.

The generative approach models the class-conditional probability density $p(x \mid y)$ for each class, $y$, and then chooses the class that best fits the observed data, $x$, based on optimization algorithms such as MLE, least squares estimation, Bayes’s method, or Markov chain Monte Carlo (MCMC) simulation. In other words, when it comes to approximating an unknown joint probability, $p(x, y)$, the classifier can be obtained by first estimating the class-conditional probability, $p(x \mid y)$, then using, for example, Bayes’s rule to calculate the posterior probability, and finally classifying each new data point into the class with the highest posterior probability $p(y \mid x)$. The generative approach to classification includes building a classifier, which is a predictor of the class of new input data. The generative approach produces a different probability density model for each class and describes the overall probability over all the variables. Because the estimation process of generative models does not optimize the model parameters for a given specific task, it does not necessarily give the optimal classification result for the class of interest. If overfitting can be avoided, the discriminative approach directly optimizes the model parameters without added computational complexity. Because the discriminative approach focuses only on the decision boundary for its particular classification task, it shows lower residual error than the generatively trained classifier but does not guarantee higher accuracy. It has been reported that the generative approach using naive Bayes performs better than its conditionally trained discriminative counterpart when the amount of labeled training data is small [46]. The models/algorithms for supervised and unsupervised classification are shown in Figure 3.3.
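The generative recipe described in this section (estimate a class-conditional density for each class, apply Bayes’s rule, and pick the class with the highest posterior) can be sketched for a single scalar feature. The Gaussian class-conditional model, the class names, the training values, and the function names are assumptions for illustration.

```python
from statistics import NormalDist, fmean, pstdev

def fit_generative(data_by_class):
    """Estimate a prior p(y) and a Gaussian p(x | y) for each class by maximum likelihood."""
    total = sum(len(values) for values in data_by_class.values())
    return {
        label: (len(values) / total, NormalDist(fmean(values), pstdev(values)))
        for label, values in data_by_class.items()
    }

def posterior(model, x):
    """Bayes's rule: p(y | x) is proportional to p(x | y) p(y)."""
    joint = {label: prior * dist.pdf(x) for label, (prior, dist) in model.items()}
    evidence = sum(joint.values())  # p(x); tiny for inputs far from every class
    return {label: value / evidence for label, value in joint.items()}

def classify(model, x):
    """Assign x to the class with the highest posterior probability."""
    post = posterior(model, x)
    return max(post, key=post.get)

# Hypothetical labeled training data for one health feature.
training = {
    "healthy": [0.9, 1.0, 1.1, 1.2, 0.8],
    "degraded": [2.8, 3.0, 3.2, 3.1, 2.9],
}
model = fit_generative(training)
print(classify(model, 1.05), classify(model, 2.9), posterior(model, 1.6))
```

Unlike a discriminant function, this classifier reports a graded posterior probability, so a point near the boundary between the classes is not assigned with 100% certainty.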

3.5 Supervised Classification

For supervised classification, the training data already have their own label and the health conditions of the acquired input data are identified. The application of the appropriate models and algorithms for the known labeled data and the classification of new input data onto a specific labeled class using the posterior probability model are of interest in this section. Approaches for supervised classification can be divided into discriminative and generative approaches, according to the estimation method of the posterior probability density.

3.5.1 Discriminative Approach

In the discriminative approach to supervised classification, the posterior probability of $y$ given $x$, $p(y \mid x)$, is determined directly at the inference stage, and new input data, $x$, are then classified into a designated class based on decision theory. The posterior probability is optimized without considering other probabilities such as $p(x)$ and $p(x \mid y)$. A discriminant function is a model that, within the discriminative approach, assigns 100% of the posterior probability to the given class. Each class space can be separated clearly by the decision boundary. If the input data belong to a certain class, the membership of the class is 100%. The simplest form of a discriminant function is the linear discriminant described below.

3.5.1.1 Linear Discriminant Analysis

Linear discriminant analysis (LDA) models the discriminant directly, bypassing the estimation of likelihoods or posteriors. No assumptions are made about the densities, such as whether they are Gaussian or whether the inputs are correlated. The main goal of the discriminant classifier is to estimate correctly the boundaries between the class regions. The simplest case is the one in which the discriminant functions are linear in $x$; this is called the linear discriminant classifier, which can be expressed by

$$g_i(x \mid w_i, w_{i0}) = w_i^T x + w_{i0} = \sum_{j=1}^{d} w_{ij} x_j + w_{i0} \tag{3.10}$$

where $w_i$ is the weight vector, $w_{i0}$ is the threshold value, and $x$ is the training data vector. The linear discriminant classifier can be divided into two-class and multiple-class problems according to its geometry. In the two-class problem, one discriminant function is sufficient and defines a hyperplane. The hyperplane divides the input space into two half-spaces. If the value of the discriminant function is positive (that is, if $w^T x$ exceeds $-w_0$), the input data lie on the positive side of the hyperplane; if not, they lie on the negative side. The threshold value $w_0$ determines the location of the hyperplane with respect to the origin, and the weight vector $w$ determines its orientation. In multiple-class ($N \ge 2$) problems, there are $N$ discriminant functions. The input data should be linearly separable to form hyperplanes with $N$ discriminant functions. Each input point is assigned to the class whose linear discriminant places it farthest on the positive side of the corresponding hyperplane. This linear discriminant classifier divides the input space into $N$ decision regions.

Figure 3.3: Diagram of ML approaches for PHM.

Epstein et al. [47] applied hypothesis testing and discriminant analysis for multivariate variables to detect and classify faults in IC designs, combined with conventional circuit simulation (i.e., with SPICE). In spite of the method’s potential benefits, such as tracking IC failures in field use, the authors point out some limitations of the techniques in real applications. For example, more complex circuits could have a higher probability of multiple simultaneous faults, and the potential combinations of faults should be accounted for in the training step. Yoshida and Nappi [48] developed three-dimensional computer-aided diagnosis techniques for automated detection of colonic polyps in computed tomography data sets. After classifying the colonic polyps from the three-dimensional images, they applied LDA to reduce false positives by classifying the extracted polyps into two categories: true positive and false positive. Goodlin et al. [49] demonstrated that a methodology with fault-specific charts constructed using an orthogonal LDA has improved sensitivity for detection of faults in semiconductor manufacturing processes.
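Returning to the linear discriminant of Eq. (3.10), the following minimal sketch evaluates it for a two-class and a multiple-class case. The weights and offsets are fixed by hand purely for illustration (in practice they would be learned from labeled training data), and the function names are hypothetical.

```python
def linear_discriminant(x, w, w0):
    """g(x | w, w0) = w^T x + w0, as in Eq. (3.10)."""
    return sum(wj * xj for wj, xj in zip(w, x)) + w0

def classify_two_class(x, w, w0):
    """One discriminant suffices: the sign of g(x) selects the side of the hyperplane."""
    return "positive" if linear_discriminant(x, w, w0) > 0 else "negative"

def classify_multiclass(x, weights, offsets):
    """With N discriminants, choose the class whose g_i(x) is largest."""
    scores = [linear_discriminant(x, w, w0) for w, w0 in zip(weights, offsets)]
    return max(range(len(scores)), key=scores.__getitem__)

# Two-class example: the hyperplane x1 + x2 = 3 (hand-picked weights).
w, w0 = [1.0, 1.0], -3.0
print(classify_two_class([2.0, 2.0], w, w0), classify_two_class([1.0, 1.0], w, w0))

# Three-class example with hand-picked discriminants.
weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
offsets = [0.0, 0.0, 0.0]
print(classify_multiclass([2.0, 0.5], weights, offsets))
```

Scaling `w` and `w0` by the same positive constant leaves the decision unchanged, which is why only the orientation of `w` and the ratio of `w0` to its length determine the hyperplane.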
LDA was used both to discriminate a given class of fault from the normal process condition and to distinguish a specific class of fault from the other classes of faults and the normal process condition. It was shown that the LDA methodology is effective in reducing the fault detection and classification time by combining the detecting and classifying processes.

3.5.1.2 Neural Networks

The neural network (NN) has its origins in the search for mathematical representations of biological systems that transfer information between a series of neurons. Novelty detection methods based on NNs were categorized, with a review of recent research, by Markou and Singh [50]. These include the multilayer perceptron (MLP) approach, SVM-based approaches, adaptive resonance theory (ART) approaches, radial basis function (RBF) approaches, auto-associator approaches, Hopfield networks, oscillatory NNs, the self-organizing map (SOM) approach, habituation-based approaches, and the neural tree. The MLP is a specialized model for practical applications of pattern recognition and classification, which are closely related to fault detection and diagnosis problems. A series of steps for forming function models of a system of interest was summarized by Bishop [51]. The first step is to consider the functional form of the network model, including the specific parameterization of the basis functions, and to determine the network parameters within a maximum likelihood framework, which involves the solution of a nonlinear optimization problem. The error back-propagation technique is used to evaluate the derivatives of the log-likelihood function with respect to the network parameters. The second step is to extend the back-propagation framework to allow other derivatives to be evaluated, such as the Jacobian and Hessian matrices. The last step is to regularize NN training by various algorithms. The NN model can be extended to Bayesian treatments of NNs or to a general framework for modeling a conditional probability distribution, such as mixture density networks.

Petsche et al. [52] demonstrated that an NN-based auto-associator can be used to detect imminent motor failures. The auto-associator forming the NN-based model was trained with four motor current measurement data sets acquired from healthy motors. A motor monitoring system using the auto-associator was designed and built.

3.5.1.3 Support Vector Machines

Support vector machines are a set of supervised learning methods used for classification. SVMs simultaneously minimize empirical classification errors and maximize the geometric margins. Each data point, an $N$-dimensional vector, belongs to one of two classes divided by an $(N-1)$-dimensional hyperplane. The goal is to maximize the separation margin between the two classes. The hyperplane that maximizes the distance from the hyperplane to the nearest data points is called the maximum margin hyperplane. The mathematical form is expressed as

$$c_i(\mathbf{w}\cdot\mathbf{x}_i - b) \ge 1, \quad 1 \le i \le n \tag{3.11}$$

where the vector $\mathbf{w}$ points perpendicularly to the separating hyperplane and $\mathbf{x}_i$ is an $N$-dimensional (usually normalized) vector. Adding the offset $b$ allows one to increase the margin. The value of $c_i$ is either $+1$ or $-1$, denoting the class to which point $\mathbf{x}_i$ belongs. The concept of a linear classifier can be expanded to a nonlinear classifier by applying the kernel trick to the maximum margin hyperplane. In this case, data points that cannot be separated into two classes with a linear classifier are separated by mapping them to another space. This procedure involves two processes: mapping to another space and returning to the original space. The resulting algorithm is similar, except that every dot product is replaced by a nonlinear kernel function. Some common kernel functions include the homogeneous polynomial $k(\mathbf{x},\mathbf{x}') = (\mathbf{x}\cdot\mathbf{x}')^d$, the inhomogeneous polynomial $k(\mathbf{x},\mathbf{x}') = (\mathbf{x}\cdot\mathbf{x}' + 1)^d$, and the RBF $k(\mathbf{x},\mathbf{x}') = \exp(-\gamma \|\mathbf{x}-\mathbf{x}'\|^2)$ for $\gamma > 0$.

Sarmiento et al. [53] showed that the one-class SVM with an RBF kernel was successfully applied to detect faults in reactive ion etching systems. The training data were acquired during normal operation of the system and the one-class SVM model was trained with the acquired data. The results suggest that this technique detects equipment faults with 100% accuracy. Since the one-class SVM in this approach is the simplest case of SVM, discriminating only between a normal condition and an abnormal condition with one class, the 100% accuracy is not surprising. Poyhonen et al. [54] demonstrated that an SVM-based classification methodology can be useful to detect faults in induction motors. The electrical motor was supposed to have seven classes (one healthy condition and six faulty conditions), and the SVM-based classification used 21 two-class classifiers to make final decisions from the acquired input signals. The use of the kernel function means that the number of faults is not limited; this is one of the powerful benefits of the SVM approach compared with the NN approaches. The SVM model was trained with one half of the generated sample data and verified with the other half. The experimental results showed that all the faults were classified correctly into their own classes when all the training and verification data were validated.

3.5.1.4 Decision Tree Classifier

The decision tree classifier is a supervised learning algorithm based on a tree-structured model of feature attributes used to predict an expected value based on given data. Predicting future output in the decision tree classifier consists of extracting patterns from test data and making a judgment about whether the extracted pattern is useful or not.

The decision tree starts from the root node and conditionally chooses the splits into one or more branches according to the highest information gain, which indicates which attribute should be split first. This procedure is recursively applied to the children nodes, and the tree is grown until one of three base cases is reached. The first base case is that all records of the current data subset have the same output. The second is that all records have the same set of input attributes. The last is that all attributes have zero information gain. Once the decision tree is set up, the predicted value needs to be compared with real test data. The smaller the test set error, the better the prediction.

Chen et al. [55] suggested a decision tree approach to detect failures and localize the root causes of the failures in a commercial Internet site (eBay). The MinEntropy algorithm was adapted to make the decision tree splits by maximizing the information gain. After the decision tree was established, the postprocessing included noise filtering, node merging, and ranking selected important features that correlate with the largest number of failures. The experimental results showed that the algorithm successfully identified 13 out of 14 true causes of failure along with 2 false positives. However, the real application of this approach was restricted to detecting 14 labeled faults out of hundreds. Automatic classification of hundreds of unlabeled faults is required and is a part of unsupervised classification. Stein et al. [56] demonstrated the decision tree approach using a genetic algorithm to reduce the dimension of the feature space for network intrusion detection. The large number of features (in this case 41) used in the training and validation of the decision tree model caused problems such as noise, time consumption, and overfitting. The genetic algorithm randomly generated the feature population (hypothesis testing data) composed of the 41 features and recursively created each decision tree model. Then the most appropriate model was selected by the criteria of the detection and false positive rates. The experimental result showed that the hybrid model outperformed the decision tree model without feature selection by eliminating unnecessary or distracting features.
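
The split selection by information gain described in this section can be sketched as follows. This is a minimal illustration, not the MinEntropy implementation of Chen et al. [55]; the function names (`entropy`, `information_gain`, `best_split`) and the toy records are hypothetical, and entropy in bits is assumed as the impurity measure.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(records, labels, attribute):
    """Entropy reduction obtained by splitting the records on one attribute."""
    total = len(labels)
    groups = {}
    for record, label in zip(records, labels):
        groups.setdefault(record[attribute], []).append(label)
    # Weighted entropy of the child nodes produced by the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def best_split(records, labels, attributes):
    """Return the attribute with the highest information gain and all gains."""
    gains = {a: information_gain(records, labels, a) for a in attributes}
    return max(gains, key=gains.get), gains


# Hypothetical toy data: two categorical attributes, two output classes.
records = [
    {"temp": "high", "fan": "on"},
    {"temp": "high", "fan": "off"},
    {"temp": "low", "fan": "on"},
    {"temp": "low", "fan": "off"},
]
labels = ["fault", "fault", "ok", "ok"]

best, gains = best_split(records, labels, ["temp", "fan"])
print(best, gains)
```

In this toy set the attribute `temp` separates the two classes completely, while `fan` carries no information, so a tree grown this way would split on `temp` first and then stop because every child node has a single output.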

3.5.2 Generative Approach

The generative approach of supervised classification models the class-conditional probability density $p(x|y)$ for each class $y$ and then chooses the class that best fits the observed data $x$ based on optimization algorithms. The most widely used form to calculate the posterior probability in the generative approach is Bayes's theorem of the form

$$p(y|x) = \frac{p(x|y)\,p(y)}{p(x)} \tag{3.12}$$

where $x$ is the input data and $y$ is its relevant class. To estimate the posterior probability $p(y|x)$, the class-conditional density $p(x|y)$, the class prior $p(y)$, and the evidence $p(x)$ are first calculated in the inference step. The simplest approach using Bayes's theorem is the naive Bayesian classifier described below. Alternative algorithms in the generative approach include the tree-augmented naive (TAN) Bayes, forest-augmented naive (FAN) Bayes, and generative hidden Markov model (HMM). More information about Bayesian networks of the naive Bayesian classifier, TAN Bayes, and FAN Bayes can be found in reference [57].

3.5.2.1 Naive Bayesian Classifier

The naive Bayesian classifier (NBC) is a simple probabilistic classifier based on applying Bayes's theorem with a naive independence assumption. Bayes's theorem describes the posterior probability in terms of given probability information. The conditional independence assumption of the naive Bayesian classifier makes the probability calculation very simple. For multiple feature states $x_1, \ldots, x_n$, Bayes's theorem under the independence condition is

$$p(y \mid x_1, \ldots, x_n) = \frac{1}{Z}\, p(y) \prod_{i=1}^{n} p(x_i \mid y) \tag{3.13}$$

where $Z$ is a scaling factor dependent only on the feature states $x_1, \ldots, x_n$; that is, it is constant if the feature states are known. This approach is not applicable to cases where the conditional prior probability depends on greater than two classes (the number of classes $y > 2$), but bias in appraising probability makes little difference in practice. For this reason, the method can process high-dimensional data from complex electronic systems, which usually collect signal data from hundreds of sensors. The NBC puts importance on the estimation of the probability density. The rate of misclassification depends directly on the quality of the probability density estimation. Lerner [58] summarized three major density estimation methods: the single Gaussian estimation (SGE; parametric method), Gaussian mixture model (GMM; semiparametric method), and kernel density estimation (KDE; nonparametric method). In his case study, for low-dimensional densities of the fluorescence in situ hybridization (FISH) signal data, the GMM outperformed the KDE, since the KDE had a tendency to overfit the training data. In contrast, for high-dimensional densities, GMM lost some accuracy because of the violation of the conditional independence assumption. The classification result of the SGE was inferior to that of GMM or KDE. Even though the classification accuracy of the NBC was not better than that of NNs, it avoids the intensive training and optimization required for NNs. All the parameters, including the class priors and the feature state probability distributions, can be approximated by relative frequencies in the training data. In any case, the probability estimate should not be zero. It is problematic if a given class and a feature state never occur together, because the zero estimate makes all the other feature states in that class useless. In addition, the selection of the number of training data sets is important because it sometimes causes underfitting or overfitting problems.
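
As an illustration of frequency-based parameter estimation and of the zero-probability problem noted above, the following sketch trains a naive Bayesian classifier on categorical features and picks the most probable class. It is a minimal sketch under stated assumptions: additive (Laplace) smoothing with a parameter `alpha` is one common way to keep estimates away from zero and is not taken from the text; the function names and the toy sensor data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_nbc(samples, labels, alpha=1.0):
    """Estimate class and feature-state counts from training frequencies."""
    class_counts = Counter(labels)
    feature_counts = defaultdict(Counter)  # (class, feature index) -> state counts
    states = defaultdict(set)              # feature index -> set of seen states
    for x, y in zip(samples, labels):
        for i, s in enumerate(x):
            feature_counts[(y, i)][s] += 1
            states[i].add(s)
    return {
        "class_counts": class_counts,
        "feature_counts": feature_counts,
        "states": states,
        "alpha": alpha,   # assumed smoothing constant; alpha > 0 avoids zeros
        "n": len(labels),
    }


def log_score(model, x, y):
    """log p(y) + sum_i log p(x_i | y); the constant Z is omitted."""
    n_y = model["class_counts"][y]
    score = math.log(n_y / model["n"])
    for i, s in enumerate(x):
        count = model["feature_counts"][(y, i)][s]
        k = len(model["states"][i])
        score += math.log((count + model["alpha"]) / (n_y + model["alpha"] * k))
    return score


def predict(model, x):
    """Return the class with the largest (unnormalized) posterior."""
    return max(model["class_counts"], key=lambda y: log_score(model, x, y))


# Hypothetical training data: (temperature level, vibration level) -> condition.
samples = [("high", "noisy"), ("high", "quiet"), ("low", "quiet"), ("low", "quiet")]
labels = ["fault", "fault", "ok", "ok"]
model = train_nbc(samples, labels)

print(predict(model, ("high", "noisy")))
print(predict(model, ("low", "noisy")))
```

The pair ("low", "noisy") never occurs with either class in the training data as a whole, yet both class scores remain finite because of the smoothing term. Working in log space avoids underflow when many features are multiplied.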
The naive Bayesian classifier combines the conditional distribution over a class variable (i.e., the Bayes probability model) with a decision rule. One common rule is to pick the hypothesis that is the most probable. This is known as the maximum a posteriori (MAP) decision rule.

3.5.2.2 Hidden Markov Model

The HMM is a stochastic model appropriate for sequences including deletion, insertion, and iteration. Although the states in an HMM are not observable, the observation from each state can be expressed in the form of a probabilistic function of the state, $p_j(m) = P(O_t = z_m \mid q_t = x_j)$, where $p_j(m)$ is the probability of observing $z_m$ in state $x_j$. It is assumed that the discrete observation in each state comes from the set $\{z_1, z_2, \ldots, z_M\}$, $m = 1, 2, \ldots, M$. The probability model is homogeneous in that the probability does not depend on $t$. The observed values make up the observation sequence $O$, but the state sequence $Q$ is not observed directly. This is why the model is called “hidden.” The hidden state sequence $Q$ is inferred from the observation sequence $O$. There are many different state sequences that could generate the same observation sequence with different probabilities. For example, it is possible to establish an infinite number of normal distributions with mean and standard deviation pairs. The main goal with an HMM is to get the model with the highest likelihood of generating the sample. The parameter set of an HMM is established in the form $\lambda = (A, B, \Pi)$, where $A$ holds the state transition probabilities, $B$ the observation probabilities, and $\Pi$ the initial state probabilities. The typical process with an HMM is to estimate the parameters of the model, given a training set of sequences.

Three basic problems should be solved to implement an HMM. The first one is the estimation of the probability of any given observation sequence, $P(O \mid \lambda)$, where $O = \{O_1, O_2, \ldots, O_T\}$. The second is to find the state sequence $Q^* = \{q_1, q_2, \ldots, q_T\}$ that has the highest probability of generating $O$, given $\lambda$ and an observation sequence $O$. The last one is to optimize the model parameters $\lambda^*$ that maximize the probability of generating $X$, that is, maximization of $P(X \mid \lambda)$ given a training set of observation sequences, $X = \{O^k\}_k$. It is known that the evaluation problem, the problem of finding the state sequence, and the optimization of the model parameters can be solved with a forward-backward algorithm, the Viterbi algorithm, and an EM algorithm, respectively. Optimization of model parameters using the training data can be achieved with the Baum-Welch algorithm, which is one of the most popular EM algorithms [59].

Smyth [60] applied HMM to monitor the health condition of NASA's 34 m-by-70 m ground antenna. The combination of the generative (HMM) and discriminative (NN) approach showed higher sensitivity than the discriminative approach alone. The ability to combine the two different models is the critical factor in the successful application to the problem of antenna health monitoring.
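
The first two HMM problems (evaluation and most probable state sequence) can be sketched for a discrete HMM as follows. This is a minimal illustration, not Smyth's antenna model [60]; the two-state "healthy/degraded" parameters and the function names `forward` and `viterbi` are assumptions made for the example. Here `A[i][j]` is the transition probability from state i to state j, `B[j][m]` is the probability of observing symbol m in state j, and `pi[i]` is the initial state probability.

```python
def forward(obs, A, B, pi):
    """Evaluation problem: return P(O | lambda) by the forward recursion."""
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        alpha = [
            sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
            for j in range(n)
        ]
    return sum(alpha)


def viterbi(obs, A, B, pi):
    """Decoding problem: return the most probable state path and its probability."""
    n = len(pi)
    delta = [pi[i] * B[i][obs[0]] for i in range(n)]
    back = []  # back[t][j] = best predecessor of state j at step t + 1
    for o in obs[1:]:
        prev = delta
        pointers, delta = [], []
        for j in range(n):
            best_i = max(range(n), key=lambda i: prev[i] * A[i][j])
            pointers.append(best_i)
            delta.append(prev[best_i] * A[best_i][j] * B[j][o])
        back.append(pointers)
    state = max(range(n), key=lambda i: delta[i])
    best_prob = delta[state]
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path, best_prob


# Hypothetical model: state 0 = healthy, state 1 = degraded (absorbing);
# observation 0 = normal reading, observation 1 = abnormal reading.
A = [[0.9, 0.1], [0.0, 1.0]]
B = [[0.8, 0.2], [0.3, 0.7]]
pi = [1.0, 0.0]
obs = [0, 1, 1]

likelihood = forward(obs, A, B, pi)
path, path_prob = viterbi(obs, A, B, pi)
print(likelihood, path, path_prob)
```

For long sequences the products above underflow, so practical implementations usually scale the forward variables or work with log probabilities. The third problem, parameter estimation, would reuse the forward recursion together with a backward pass inside a Baum-Welch loop.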

3.6 Unsupervised Classification

In unsupervised learning, the given data have no predefined classes and do not include any labeled data. An algorithm using unsupervised learning finds clusters by itself from the unlabeled data. There are different ways of dividing the data into clusters and many different ways to prescribe clusters. The same data might be clustered differently depending on the clustering algorithm. Unsupervised learning is attractive because acquisition of labeled input data is costly: an expert needs to distinguish the class of the data. Models for unsupervised classification can also be divided into two parts, as in supervised classification: the generative and discriminative approaches. (These were already overviewed in Section 3.4.) Some examples of unsupervised classifiers are given below.

3.6.1 Discriminative Approach

In this section, some of the discriminative approaches for unsupervised learning are introduced. Principal component analysis mostly aims to reduce the size of the data sets; independent component analysis separates a mixed signal set into individual signal sets. HMM- and SVM-based approaches are introduced and combined with novel techniques for unsupervised learning. Lastly, a brief overview of particle filtering is presented.

3.6.1.1 Principal Component Analysis

Principal component analysis (PCA) is a dimension-reduction method that finds a mapping from the inputs in the $n$-dimensional space into a $d$-dimensional space ($d$ less than $n$) with minimum loss of original information. It tries to capture the maximum variation from the input data in terms of principal components. The principal components are defined as a set of vectors that represent an orthogonal projection that encapsulates the maximum variation of the input data. This process can be accomplished using a mathematical tool called singular value decomposition (SVD). The eigenvectors and eigenvalues resulting from the SVD calculation represent the degree of correlation of the input data. If the input data are highly correlated, the number of significant eigenvectors will be small. If not, the number of eigenvectors will be large and there will not be much gain in dimensional reduction through PCA.

Bouzida et al. [61] applied the PCA method to reduce the huge quantity of information in the different data sets of the KDD99 intrusion detection cup before applying the ML algorithms. PCA enabled the reduction of the information in the different data sets without significant loss of information. After the quantity of the data sets was decreased, another ML algorithm, such as the decision tree or nearest neighbor, was performed to classify the data. Those algorithms with PCA outperformed the ones without PCA in terms of training time. However, it was reported that, when combining ML algorithms with PCA, there is a trade-off between the computation time and the prediction accuracy. Zhou et al. [62] combined PCA with HMM for on-line fault diagnosis and illustrated case studies from the Tennessee Eastman plant. PCA was also used to optimally reduce the large number of variables. In the same manner, Liang and Wang [63] combined PCA with the statistical control chart to reduce the multivariate variables on an industrial rolling mill reheating furnace.

3.6.1.2 Independent Component Analysis

Independent component analysis (ICA) is closely related to the blind source separation problem. The term “blind” indicates that the signal is mixed and that the way the signals were mixed is unknown. The goal of blind source separation is to separate blind sources into independent sets of linear coordinate systems (the unmixed systems). The resulting signals are statistically independent. PCA is similar to ICA in separating the signals into linearly independent signals, but ICA both segregates the signals and reduces higher order statistical dependencies. Two different ICA approaches [64] have been developed. One models the problem of separating mixed sources observed in an array of sensors. The other uses unsupervised learning rules based on information theory, which aim to maximize the mutual information between the inputs and outputs of a neural network. Jung et al. [65] analyzed the weak signals from electroencephalographic (EEG) and magnetoencephalographic recordings. These two signals, recorded from the surface of the scalp, have been studied for nearly a hundred years, but the characteristics of the signals were not identified because of multiple brain generators.
An ICA algorithm based on the Infomax principle was applied to identify the different types of EEG generators as well as their magnetic counterparts. It analyzed the brain signals recorded using functional magnetic resonance imaging (fMRI), which includes a mixture of brain signals. Gao et al. [66] proposed a neural network model to extract independent components from nonlinearly mixed signals. A polynomial-based neural network demixer for nonlinear ICA was constructed with a gradient-based parameter learning algorithm to provide optimal performance; it was tested using real-life speech signals. The proposed method outperformed other linear and nonlinear algorithms in terms of accuracy and dynamic convergence speed.

3.6.1.3 HMM-Based Approach

Xu et al. [67] presented an unsupervised algorithm for training structured predictors that is discriminative and does not make use of EM algorithms. It was shown that the discriminative training criterion for structured predictors like HMM does not create local minima, although an unsupervised version of structured learning methods can be trained via semidefinite programming. The experimental results of heuristic procedures that avoid dependence on semidefinite programming revealed that the discriminative procedure makes better conditional models than conventional training with EM algorithms.

3.6.1.4 SVM-Based Approach

SVM is a supervised ML technique based on labeled input data, but some efforts have shown that SVM can be expanded to unsupervised learning techniques. Xu and Schuurmans [68] introduced an unsupervised training algorithm for multiclass SVM based on semidefinite programming. It revealed that semidefinite relaxation can yield an optimum SVM classifier in the resulting training data for two-class problems as well as multiclass problems, although the training procedures are computationally intensive.

3.6.1.5 Particle Filtering

For nonlinear systems or non-Gaussian noise, there is no general analytic solution for the state-space PDF. Particle filtering (PF), also known as the sequential Monte Carlo (SMC) method, is a model estimation technique based on simulation. It approaches the Bayesian optimal estimate with a large number $N$ of samples ($N \to \infty$). This is the sequential analogue of MCMC batch methods. It is often an alternative to the extended Kalman filter (EKF). Saha et al. [69] combined the PF technique with the model developed from relevance vector machine (RVM) regression to predict the remaining useful life of Li-ion batteries. They illustrated that the Bayesian regression-estimation approach implemented as an RVM-PF framework has significant advantages over conventional methods for remaining-life estimation. There are several PFs, such as auxiliary PF, Gaussian PF, unscented PF, Monte Carlo PF, Gauss-Hermite PF, and cost reference PF. They are all based on model estimation by simulation but vary by adding or changing sampling algorithms. De Freitas [70] introduced the Rao-Blackwellized PF, which samples only the discrete states for fault diagnosis, and showed improved results over PF in terms of computational time and diagnosis error.

3.6.2 Generative Approach

The concept of the generative approach was already discussed in Section 3.4. In this section, some of the generative approaches, including the hierarchical classifier, k-nearest neighbor, and fuzzy C-means classifier, are presented.

3.6.2.1 Hierarchical Classifier

The hierarchical classifier is a method for clustering that uses only the similarities of instances, without any other requirement on the data. A goal of hierarchical clustering is to find clusters whose instances are more similar to each other than to instances in other clusters. To obtain the number of clusters defined by the user, the distance between the data points is defined (usually the Euclidean distance). An agglomerative clustering algorithm starts with $N$ groups, each initially containing a single instance. A single instance combines with the closest single instance, resulting in a larger group. At each iteration, the two closest groups are combined into a larger group until the defined number of groups is reached. Once an agglomerative method finishes, the results can be drawn as a hierarchical structure known as a dendrogram, which visualizes the hierarchical clustering and enables clustering at a user-defined level [71].

Virmani et al. [72] suggested a DNA methylation pattern recognition method using hierarchical clustering, which can be a first step in developing a molecular marker to achieve accurate diagnosis of lung cancer. Recent research on global and gene-specific methylation patterns in cancer cells showed that cancers from different organs demonstrate distinct patterns of cytosine-guanine (CpG) island hypermethylation. As a CpG island in a given organ exhibits distinct methylation patterns, a methodology to distinguish the different patterns of CpG islands enables determination of the subtypes of lung cancer. It was found that hierarchical clustering using a panel of seven loci yielded two major clusters closely related to a severe type of lung cancer.
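
The agglomerative procedure described in Section 3.6.2.1 can be sketched as follows. This is a minimal illustration under stated assumptions: single linkage (the distance between two groups is the distance between their closest members) is assumed because the text does not name a linkage rule, and the function names and toy points are hypothetical. The recorded merge list is the information a dendrogram would display.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def single_link(c1, c2, points):
    """Single-linkage distance: closest pair of members of two groups."""
    return min(euclidean(points[i], points[j]) for i in c1 for j in c2)


def agglomerate(points, n_clusters):
    """Merge the two closest groups until n_clusters groups remain."""
    clusters = [[i] for i in range(len(points))]  # start with N singletons
    merges = []                                   # (group a, group b, distance)
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = single_link(clusters[a], clusters[b], points)
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((sorted(clusters[a]), sorted(clusters[b]), d))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters, merges


# Hypothetical two-dimensional feature vectors.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (10.0, 0.0)]
clusters, merges = agglomerate(points, n_clusters=3)
print(clusters)
print(merges)
```

This brute-force search recomputes all pairwise group distances at every step, which is acceptable for a handful of points but scales poorly; library implementations update a distance matrix instead.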

3.6.2.2 k-Nearest Neighbor Classifier

The kNN classifier is presented here as an unsupervised learning algorithm that divides data points into $k$ clusters by minimizing an objective function. (The centroid-based procedure described in this section is more commonly known as k-means clustering.) The data points from a sample can be grouped into $k$ clusters that represent the degrees of health. The degrees of health can be detected based on the geometric relationship between each data point and the centroid of its cluster. Varying degrees of health of a system are described by the centroid and the density. The relative position of the centroid shows the variation of health states from one condition to another. The density of a cluster gives the change of health state within one condition. Finding the optimized centroids from the raw data points is the goal. This can be accomplished by minimizing the objective function. Some examples of the objective function include the total distance between each point and its centroid, the maximum distance to its centroid for any point, and the sum of the variances over all clusters. The objective function is related to the residual between each data point (an $n$-dimensional vector) and the centroid of the cluster. The residual should always be minimized to obtain the optimal positions of the centroids. One of the popular objective functions is the squared error function. The objective function $J$ is given by

$$J = \sum_{j=1}^{k} \sum_{i=1}^{n} \left\| x_i^{(j)} - c_j \right\|^2 \tag{3.14}$$

where $\| x_i^{(j)} - c_j \|^2$ is a chosen distance measure between a data point $x_i^{(j)}$ and the cluster center $c_j$, which is an indicator of the distance of the $n$ data points from their respective cluster centers. The optimal number of clusters can be determined by an approach that tries to find the solution that minimizes the Schwarz criterion (Bayesian information criterion). The optimal number of clusters with the Schwarz criterion sets the specific condition of the system. However, this does not mean that the fault condition of the test system can be directly correlated to the specific condition from the cluster. No efforts to establish the relationship theoretically or experimentally are ongoing.

Yavuz and Guvenir [73] applied the kNN method on the feature projections (kNNFP) classifier for text categorization and compared the result with that of the kNN classifier. The experimental results in text categorization showed that kNNFP outperformed kNN in terms of classification accuracy. He and Wang [74] demonstrated that the kNN method can be used to cluster normal operation data acquired from semiconductor manufacturing processes and to automatically perform online fault detection without human intervention. The kNN can be an alternative to the traditional PCA-based method when considering nonlinearity in the batch process of a semiconductor manufacturing process.

3.6.2.3 Fuzzy C-Means Classifier

The fuzzy C-means classifier is another unsupervised learning algorithm similar to the kNN classifier, but in it, one data point can belong to one or more clusters, rather than belonging completely to just one cluster. In other words, each data point belongs to each cluster with a degree of membership. The concept of degree of membership provides fuzzy C-means clustering with a probabilistic approach to the cluster boundaries. This is based on minimization of the following objective function [75]:

$$J_m = \sum_{i=1}^{N} \sum_{j=1}^{C} u_{ij}^{m} \left\| x_i - c_j \right\|^2 \tag{3.15}$$

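$$u_{ij} = \frac{1}{\displaystyle\sum_{l=1}^{C} \left( \frac{\| x_i - c_j \|}{\| x_i - c_l \|} \right)^{2/(m-1)}} \tag{3.16}$$

$$c_j = \frac{\displaystyle\sum_{i=1}^{N} u_{ij}^{m}\, x_i}{\displaystyle\sum_{i=1}^{N} u_{ij}^{m}} \tag{3.17}$$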

where $u_{ij}$ is the degree of membership of $x_i$ in the cluster $j$, which is normalized and fuzzified with a real value $m > 1$. Then, $x_i$ is the $i$th $n$-dimensional measured data point, and $c_j$ is the centroid of the cluster weighted by the degrees of membership. The value $\|\cdot\|$ is any norm expressing the similarity between any measured data point and the center. Fuzzy partitioning is carried out through an iterative optimization of the objective function. The iteration stops when

$$\max_{i,j} \left| u_{ij}^{(k+1)} - u_{ij}^{(k)} \right| < \varepsilon \tag{3.18}$$

where $\varepsilon$ is a termination criterion between 0 and 1 and $k$ denotes the iteration step. The objective of this procedure is similar to minimizing the objective function in kNN, and the procedure converges to a local minimum or a saddle point of $J_m$. Fuzzy C-means clustering sets up probabilistic boundaries among clusters rather than a distinct boundary, which is the basis of kNN. This is important for fault detection in PHM. A probabilistic boundary approach provides some advantages that cannot be acquired with a distinct boundary. When the data point acquired from sensors is located close to the boundary line, the judgment of system health is very difficult for most decision makers. In some cases, the process of system degradation is described in terms of normal, deteriorating, and failure. However, a system classified as being in the normal condition might already be in the process of degradation. The fact is that system degradation is always in progress and the judgment of system health can change. The application of the probabilistic boundary to the clustering process might overcome this difficulty.

Osareh et al. [76] applied the fuzzy C-means clustering method to segment the color retinal image into homogeneous regions by enhancing the contrast between the object and background and then classifying exudate and nonexudate patches using a neural network. This approach showed 92% sensitivity and 82% specificity in experiments. Chen and Giger [77] also used the fuzzy C-means clustering algorithm to estimate the shading effect and to segment clinical breast magnetic resonance (MR) images. In each iteration, the bias field was estimated and smoothed by an iterative low-pass filter.
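
The iterative fuzzy partitioning described in Section 3.6.2.3 can be sketched as follows. This is a minimal illustration of the standard fuzzy C-means updates, assuming the Euclidean norm: memberships are recomputed from the distances to the current centers, centers are recomputed as membership-weighted means, and the loop stops when the largest membership change falls below the termination criterion. The function names, the initial centers, and the one-dimensional toy data are hypothetical.

```python
import math


def dist(a, b):
    """Euclidean norm of the difference between two points (tuples)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def update_memberships(points, centers, m):
    """Degree of membership of every point in every cluster (rows sum to 1)."""
    u = []
    for x in points:
        d = [dist(x, c) for c in centers]
        if any(dj == 0.0 for dj in d):
            # The point coincides with a center: share full membership there.
            zeros = [dj == 0.0 for dj in d]
            row = [1.0 / sum(zeros) if z else 0.0 for z in zeros]
        else:
            p = 2.0 / (m - 1.0)
            row = [1.0 / sum((dj / dl) ** p for dl in d) for dj in d]
        u.append(row)
    return u


def update_centers(points, u, m, n_clusters):
    """Cluster centers as means weighted by memberships raised to the power m."""
    dim = len(points[0])
    centers = []
    for j in range(n_clusters):
        w = [u[i][j] ** m for i in range(len(points))]
        total = sum(w)
        centers.append(tuple(
            sum(w[i] * points[i][k] for i in range(len(points))) / total
            for k in range(dim)
        ))
    return centers


def fuzzy_c_means(points, init_centers, m=2.0, eps=1e-5, max_iter=100):
    """Alternate the two updates until the memberships change by less than eps."""
    centers = list(init_centers)
    u = update_memberships(points, centers, m)
    for _ in range(max_iter):
        centers = update_centers(points, u, m, len(centers))
        new_u = update_memberships(points, centers, m)
        change = max(abs(a - b) for ra, rb in zip(new_u, u) for a, b in zip(ra, rb))
        u = new_u
        if change < eps:
            break
    return centers, u


# Hypothetical one-dimensional health indicator with two apparent groups.
points = [(0.0,), (1.0,), (9.0,), (10.0,)]
centers, u = fuzzy_c_means(points, init_centers=[(0.0,), (10.0,)])
print(centers)
print(u)
```

Unlike a hard assignment, each row of `u` keeps a small nonzero membership in the distant cluster, which is what allows a point near a boundary to be reported as partly belonging to both conditions. The result depends on the initial centers, since the iteration is only guaranteed to reach a local optimum.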

3.7 Summary

In this chapter, data-driven (DD) approaches were presented along with selected references to their applications. ML based on statistical methods is suited for PHM because it draws its capability from mathematics, computer science, and engineering to actively learn about the system and its dynamics, faults, and failures. ML is also suitable because it is not only a data-driven approach that can process the increasing complexity of system information but also a more general methodology that can adapt to changes. These changes can result from changes in the system itself, in its operating environment, or even in management and its mission statements or expectations.

With all the benefits that accompany the use of ML for PHM, there are nonetheless related concerns, exceptions, and difficulties. One of the primary concerns in using ML for PHM is the analysis and interpretation of the ML output results. These require proper preprocessing of the data, especially the training data, an essential step in ML, without which the remainder of the analysis is susceptible to the effects of noise, scaling, redundancy, masking, and other data-specific problems. Preprocessing of the training data is often specific to the type and size of the system data. Additionally, because optimization and search methods are often employed in ML, their computational complexity and tractability are critical for efficient and effective function. Lastly, it is important to keep in mind that the learning, and therefore usefulness, of these machines is only as good as the expectation of their envisioned purpose.

References

1. W. Linping, M. Dan, G. Wen, and Z. Jianfeng, “A Proactive Fault-Detection Mechanism in Large-Scale Cluster Systems,” Proceedings of the 20th International Parallel and Distributed Processing Symposium, p. 10, 25-29 April 2006.
2. S. Wegerich and R. Pipke, “Residual Signal Alert Generation for Condition Monitoring Using Approximated SPRT Distribution,” United States, SmartSignal Corp. (US), patent number 6975962, December 2005.
3. L. Lopez, “Advanced Electronic Prognostics through System Telemetry and Pattern Recognition Methods,” Microelectronics Reliability, Vol. 47, No. 12, Electronic System Prognostics and Health Management, pp. 1865-1873, December 2007.
4. J. Myung, “Tutorial on Maximum Likelihood Estimation,” Journal of Mathematical Psychology, Vol. 47, No. 1, pp. 90-100, February 2003.
5. B. Lindgren, Statistical Theory, 4th Edition, Chapman & Hall, Norwell, MA, 1998.
6. E. Wilson, C. Lages, and R. Mah, “Gyro-Based Maximum-Likelihood Thruster Fault Detection and Identification,” Proceedings of the 2002 American Control Conference, Vol. 6, pp. 4525-4530, 2002.
7. J. Lin, M. Zuo, and K. Fyfe, “Mechanical Fault Detection Based on the Wavelet De-Noising Technique,” Journal of Vibration and Acoustics, Vol. 126, No. 1, pp. 9-16, January 2004.
8. J. Platt, “Probabilistic Outputs for Support Vector Machines and Comparisons to Regularized Likelihood Methods,” in Advances in Large Margin Classifiers, MIT Press, Cambridge, MA, 1999.
9. M. Davenport, R. Baraniuk, and C. Scott, “Controlling False Alarms with Support Vector Machines,” paper presented at the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Toulouse, France, Vol. 5, pp. 586-592, 14-19 May 2006.
10. M. Xiang, “Optimization of Distributed Detection Systems under Neyman-Pearson Criterion,” paper presented at the 9th International Conference on Information Fusion, Florence, Italy, pp. 1-6, 10-13 July 2006.
11. J. Durham and N. Younan, “Neyman-Pearson Detector Design for Steady Point Targets with Known Phase Detection,” Proceedings of the Thirtieth Southeastern Symposium on System Theory, pp. 130-133, 1998.
12. A. Dempster, N. Laird, and D. Rubin, “Maximum Likelihood from Incomplete Data via the EM Algorithm,” Journal of the Royal Statistical Society: Series B, Vol. 39, No. 1, pp. 1-38, November 1977.
13. L. Welch, “Hidden Markov Models and the Baum-Welch Algorithm,” IEEE Information Theory Society Newsletter, Vol. 53, No. 4, pp. 1, 10-13, December 2003.

14. G. Hamerly and C. Elkan, “Bayesian Approaches to Failure Prediction for Disk Drives,” pp. 202-209, 2001.
15. C. Willis, “Mixture Models for Anomaly Detection in Hyperspectral Imagery,” Proceedings of the Society of Photo-Optical Instrumentation Engineers, Military Remote Sensing, Vol. 5613, pp. 119-128, 2004.
16. G. Hazel, “Multivariate Gaussian MRF for Multispectral Scene Segmentation and Anomaly Detection,” IEEE Transactions on Geoscience and Remote Sensing, Vol. 38, No. 3, pp. 1199-1211, May 2000.
17. C. Somers, N. Dimopoulos, and S. Neville, “Cable Network Fault Detection Using Cable Modem Status Signals,” paper presented at the IEEE Pacific Rim Conference on Communications, Computers and Signal Processing, Victoria, BC, Canada, Vol. 1, pp. 422-425, August 2003.
18. S. Samar, D. Gorinevsky, and P. Boyd, “Embedded Estimation of Fault Parameters in an Unmanned Aerial Vehicle,” paper presented at the IEEE International Conference on Control Applications, Munich, Germany, pp. 3265-3270, October 2006.
19. D. Stein, S. Beaven, L. Hoff, E. Winter, A. Schaum, and A. Stocker, “Anomaly Detection from Hyperspectral Imagery,” Vol. 19, No. 1, pp. 58-69, January 2002.
20. N. de Freitas, “Rao-Blackwellised Particle Filtering for Fault Diagnosis,” IEEE Aerospace Conference Proceedings, Vol. 4, pp. 1767-1772, 2002.
21. J. Flores-Quintanilla, R. Morales-Menendez, R. Ramirez-Mendoza, L. Garza-Castanon, and F. Cantu-Ortiz, “Towards a New Fault Diagnosis System for Electric Machines Based on Dynamic Probabilistic Models,” Proceedings of the American Control Conference, Vol. 4, pp. 2775-2780, June 2005.
22. S. Schweizer and J. Moura, “Hyperspectral Imagery: Clutter Adaptation in Anomaly Detection,” IEEE Transactions on Information Theory, Vol. 46, No. 5, pp. 1855-1871, August 2000.
23. S. Kay, Fundamentals of Statistical Signal Processing: Estimation Theory, Vol. 1, Prentice-Hall, Upper Saddle River, NJ, 2003.
24. A. Noiboar and I. Cohen, “Anomaly Detection in Three Dimensional Data Based on Gauss Markov Random Field Modeling,” Proceedings of the 23rd IEEE Convention of Electrical and Electronics Engineers in Israel, pp. 448-451, September 2004.
25. T. Cover and P. Hart, “Nearest Neighbor Pattern Classification,” IEEE Transactions on Information Theory, Vol. 13, No. 1, pp. 21-27, January 1967.
26. Q. He and J. Wang, “Fault Detection Using the k-Nearest Neighbor Rule for Semiconductor Manufacturing Processes,” IEEE Transactions on Semiconductor Manufacturing, Vol. 20, No. 4, pp. 345-354, November 2007.
27. Y. Liao and V. R. Vemuri, “Use of K-Nearest Neighbor Classifier for Intrusion Detection,” Computers & Security, Vol. 21, No. 5, pp. 439-448, October 2002.
28. Y. Li, B. Fang, L. Guo, and Y. Chen, “Network Anomaly Detection Based on TCM-KNN Algorithm,” Proceedings of the 2nd ACM Symposium on Information, Computer and Communications Security, 2007.
29. Y. Qian and A. Mita, “Structural Damage Identification Using Parzen-Window Approach and Neural Networks,” Structural Control and Health Monitoring, Vol. 14, No. 4, pp. 576-590, 2007.
30. S. Rippengill, K. Worden, K. Holford, and R. Pullin, “Automatic Classification of Acoustic Emission Patterns,” Strain, Vol. 39, Part 1, pp. 31-41, 2003.
31. D. Yeung and C. Chow, “Parzen-Window Network Intrusion Detectors,” Proceedings of the 16th International Conference on Pattern Recognition, pp. 385-388, 2002.

n ,

70

Prognostics and Health Management of Electronics

32. J. Murray, G. Hughes, and K. Kreutz-Delgado, “Hard Drive Failure Prediction Using Non-Parametric Statistical Methods,” Proceedings of the International Conference on Artificial Neural Networks and International Conference on Neural Information Processing, June 2003. 33. N. Eklund and K. Goebel, “Using Neural Networks and the Rank Permutation Transformation to Detect Abnormal Conditions in Aircraft Engines,” Proceedings of the 2005 IEEE Mid-Summer Workshop on Soft Computing in Industrial Applications, pp. 1-5, June 2005. 34. F. Xue, W. Yan, N. Roddy, and A. Varma, “Operational Data Based Anomaly Detection for Locomotive Diagnostics,” Proceedings of the International Conference on Machine Learning; Models, Technologies & Applications, pp. 236-24 1, June 2006. 35. R. Cottrell, C. Logg, M. Chhaparia, M. Gngonev, F. Haro, F. Nazir, and M. Sandford, “Evaluation of Techniques to Detect Significant Network Performance Problems Using End-to-End Active Network Measurements,” Proceedings of the 10th IEEEAFIP Network Operations and Management Symposium, pp. 85-94,2006. 36. L. Hall, D. Mba, and R. Bannister, “Acoustic Emission Signal Classification in Condition Monitoring Using the Kolmogorov-Smirnov Statistic,” Journal of Acoustic Emission, Vol. 19, pp. 209-228, 2001. 37. J. Caberera, B. Ravichandran, and R. Mehra, “Statistical Traffic Modeling for Network Intrusion Detection,” Proceedings of the 8th International Symposium on the Modeling, Analysis and Simulation of Computer and Telecommunication Systems, pp. 466473, 2000. 38. D. Sheskin, Handbook of Parametric and Nonparametric Statistical Procedures, CRC Press, Boca Raton, FL, 1997, pp. 95-98. 39. R. Goonatilake, A. Herath, S. Herath, S. Herath, and J. Herath, “Intrusion Detection Using the Chi-square Goodness-of-Fit Test for Information Assurance, Network, Forensics and Software Security,” Journal of Computing in Small Colleges, Vol. 23, No. 1, pp. 255-263, October 2007. 40. H. Zhang, C. Chan, K. Cheung, and Y. 
Ye, “Fuzzy Artmap Neural Network and Its Application to Fault Diagnosis of Navigation Systems,” Automatica, Vol. 37, No. 7, pp. 1065-1070, July 2001. 41. N. Ye and Q. Chen, “An Anomaly Detection Technique Based on a Chi-square Statistic for Detecting Intrusions into Information Systems,” Quality and Reliability Engineering International, Vol. 17, pp. 105-1 12, 200 1. 42. E. Alpaydin, Introduction to Machine Learning, MIT Press, Cambridge, MA, 2004, pp. 10-11. 43. C. Bishop, Pattern Recognition and Machine Learning, Springer, New York, 2006, pp. 1-4. 44. R. Raina, Y. Shen, A. Ng, and A. McCallum, “Classification with Hybrid Generative/Discriminative Models,” Advances in Neural Information Processing Systems, Vol. 16, 2003. 45. T. Jebara, “Discriminative, Generative and Imitative Learning,” Ph.D. Dissertation, Media Laboratory, MIT, Cambridge, MA, 2001. 46. A. Ng and M. Jordan, “On Discriminative vs. Generative Classifiers: A Comparison of Logistic Regression and Naive Bayes,” Advances in Neural Information Processing Systems, Vol. 14,2002. 47. B. Epstein, M. Czigler, and R. Miller, “Fault Detection and Classification in Linear Integrated Circuits: An Application of Discrimination Analysis and Hypothesis

Data-driven Approaches for PHM

48. 49. 50. 51. 52. 53. 54.

55. 56.

57. 58. 59. 60. 61. 62. 63. 64.

71

Testing,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. 12, No. 1, pp. 102-1 13, 1993. H. Yoshida and J. Nappi, “Three Dimensional Computer Aided Diagnosis Scheme for Detection of Colonic Polyps,” IEEE Transactions on Medical Imaging, Vol. 20, No. 12, pp. 1261-1274,2001. B. Goodlin, D. Boning, H. Sawin, and B. Wise, “Simultaneous Fault Detection and Classification for Semiconductor Manufacturing Tools,” Journal of the Electrochemical Society, Vol. 150, pp. G778-G784,2003. M. Markou and S. Singh, “Novelty Detection: A Review-Part 2: Neural Network Based Approaches,” SinnalProcessinrr, Vol. 83, pp. 2499-2521,2003. C. Bishop, Pattern Recognition and Machine Learning, Springer, New York, 2006, pp. 225-284. T. Petsche, A. Marcantonio, C. Darken, S. Hanson, G. Kuhn, and I. Santoso, “A Neural Network Autoassociator for Induction Motor Failure Prediction,” Advances in Neural Infirmation Processing Systems, Vol. 8, pp. 924-930, 1996. T. Sarmiento, S. Hong, and G. May, “Fault Detection in Reactive Ion Etching Systems Using One-Class Support Vector Machines,” paper presented at the Advanced Semiconductor Manufacturing Conference and Workshop, pp. 139-142,2005. S. Poyhonen, M. Negrea, P. Jover, A. Arkkio, and H. Hyotyniemi, “Numerical Magnetic Field Analysis and Signal Processing for Fault Diagnostics of Electrical Machines,” International Journal for Computation and Mathematics in Electronic Engineering, Vol. 22, No. 4, pp. 969-981,2003. M. Chen, A. Zheng, J. Lloyd, M. Jordan, and E. Brewer, “Failure Diagnosis Using Decision Trees,” Proceedings of the International Conference on Automatic Computing, pp. 3 6 4 3 , 2004. G. Stein, B. Chen, A. Wu, and K. Hua, “Decision Tree Classifier for Network Intrusion Detection with GA-based Feature Selection,” paper presented at the ACM Southeast Regional Conference, Proceedings of the 43rd annual Southeast Regional Conference, Kennesaw, GA, pp. 136-141,2005. P. Lucas, L. van der Gaag, and A. 
Abu-Hanna, “Bayesian Networks in Biomedicine and Health Care,” Artificial Intelligence in Medicine, Vol. 30, pp. 201-214,2004. B. Lerner, “Bayesian Fluorescence In situ Hybridisation Signal Classification,” Artificial Intelligence in Medicine, Vol. 30, pp. 301-316,2004. C. Bishop, Pattern Recognition and Machine Learning, Springer, New York, 2006, pp. 6 10-63 5. P. Smyth, “Hidden Markov Models and Neural Network for Fault Detection in Dynamic Systems,” Proceedings of the IEEE-SP Workshop-Neural Networks for SinnalProcessina, Vol. 3, pp. 582-592, 1993. Y. Bouzida, F. Cuppens, N. Cuppens-Boulahia, and S. Gombault, “Efficient Intrusion Detection Using Principal Component Analysis,” paper presented at the 3rd Conference on SAR, La Londe, France, 2004. S. Zhou, J. Zhang, and S. Wang, “Fault Diagnosis in Industrial Processes Using Principal Component Analysis and Hidden Markov Model,” Proceedings of the 2004 American Control Conference, 2004. J. Liang and N. Wang, “Faults Diagnosis in Industrial Reheating Furnace Using Principal Component Analysis,” paper presented at the IEEE International Conference on Neural Networks & Signal Processing, Nanjing, China, 2003. T. Lee, Independent Component Analysis: Theory and Application, Kluwer Academic, Dordrecht, Netherlands, 1998.

12

Prognostics and Health Management of Electronics

65. T. Jung, S. Makeig, M. Mckeown, A. Bell, T. Lee, and T. Sejnowski, “Imaging Brain Dynamics Using Independent Component Analysis,” Proceedings of the IEEE, Vol. 89, pp. 1107-1 122,2001. 66. P. Gao, L. Khor, W. Woo, and S. Dlay, “Extraction of Unique Independent Components for Nonlinear Mixture of Sources,” Journal of Computers, Vol. 2, No. 6, pp. 9-16,2007. 67. L. Xu, D. Wilkinson, F. Southey, and D. Schuurman, “Discriminative Unsupervised Learning of Structured Predictors,” Proceedings of the 23rd International Conference ,2006. 68. L. Xu and D. Schuurmans, “Unsupervised and Semi-supervised Multi-Class Support Vector Machines,” Proceedings of the National Conference on Artificial Intelligence, 2005. 69. B. Saha, K. Goebel, S. Poll, and J. Christophersen, “A Bayesian Framework for Remaining Useful Life Estimation,” paper presented at the AAAI Fall Symposium, Arlington, VA, 2007. 70. N. de Freitas, “Rao-Blackwellised Particle Filtering for Fault Diagnosis,” paper presented at the IEEE Aerospace Conference, Big Sky, MT, Vol. 4, pp. 1767-1772, 2002. 7 1. E. Alpaydin, , MIT Press, Cambridge, MA, 2004, pp. 146- 148. 72. A. Virmani, J. Tsou, K. Siegmund, L. Shen, T. Long, P. Laird, A. Gazdar, and I. Laird-Offringa, “Hierarchical Clustering of Lung Cancer Cell Lines Using DNA, Methylation Markers, Cancer Epidemiology,” Biomarkers & Prevention, Vol. 11, pp. 291-297,2002. 73. T. Yavuz and H. Guvenir, “Application of k-Nearest Neighbor on Feature Projections Classifier to Text Categorization,” Proceedings of the 13th International Symposium on Computer and Information Sciences, 1999. 74. Q. He and J. Wang, “Fault Detection Using the k-Nearest Neighbor Rule for Semiconductor Manufacturing Processes,” IEEE Transactions on Semiconductor ManufacturinR, Vol. 20, No. 4, pp. 345-354, November 2007. 75. J. Bezdek, J. Keller, R. Krishnapuram, and N. Pal, Fuzzy Models and Algorithms for Pattern Recognition and Image Processing, Springer, New York, 1999. 76. A. Osareh, M. 
Mirmehdi, B. Thomas, and R. Markham, “Automatic Recognition of Exudative Maculopathy Using Fuzzy C-Means Clustering and Neural Networks,” Medical Image Understanding and Analysis, pp. 49-52, July 2001. 77. W. Chen and M. Giger, “A Fuzzy C-means (FCM) Based Algorithm for Intensity Inhomogeneity Correction and Segmentation of MR Images,” paper presented at the IEEE International Symposium on Biomedical Imaging: Nan0 to Macro, Arlington, VA, Vol. 2, pp. 1307-1 3 10,2004.

Chapter 4 Physics-of-Failure Approach to PHM

Physics of failure (PoF) is an approach that utilizes knowledge of a product’s life-cycle loading and failure mechanisms to assess product reliability. PoF methodology is based on the identification of potential failure mechanisms and failure sites for a device, product, or system. A failure mechanism is described by the relationship between the stresses and variabilities at potential failure sites. The methodology proactively assesses reliability by establishing a scientific basis for evaluating new materials, structures, and technologies. PoF-based prognostics permits the assessment and prediction of system reliability under its actual application conditions. It integrates sensor data with models that enable in situ identification of the deviation or degradation of a product from an expected normal operating condition (i.e., the system’s “health”) and the prediction of the future state of reliability. This chapter gives an overview of the PoF approach to PHM.

4.1 PoF-Based PHM Methodology

The general PoF-based PHM methodology is shown in Figure 4.1 [1]. PoF methodology and software have been developed by CALCE at the University of Maryland with the support of industry, the government, and other universities.

Figure 4.1: PoF-based PHM approach.

The first step is a failure modes, mechanisms, and effects analysis (FMMEA), for which design data, expected life-cycle conditions, and PoF models are the inputs. The FMMEA makes it possible to prioritize the critical failure modes and failure mechanisms and, from them, to select the monitoring parameters and sensor locations for PHM. Based on the collected operational and environmental data, the health status of the product can be assessed. The amount of damage can then be calculated from the PoF models to obtain the remaining life.

4.2 Hardware Configuration

To realize reliability assessment goals, the PoF methodology needs certain inputs at the start. These inputs include the product hardware configuration at all levels (i.e., parts to systems), loads, and failure models.

Product architecture is the breakdown (classification) of a product into physical elements. A complex product usually consists of a number of elements working together to ensure the overall function of the product. The architecture describes the elements of the product, the function of each element, the functional relationships, and the assembly relationships [2].

For the discussion here, six prognostic levels for electronics have been defined [3]:

Level 0 includes the chip and on-chip sites, such as circuits and metallization.

Level 1 includes parts and components as well as the wirebonds, lead frames, and encapsulants comprising the component. This level includes integrated circuits and discrete components such as resistors, capacitors, and inductors.

Level 2 includes the circuit board and the interconnects (leads, solder balls, etc.) connecting the components to the circuit card. This level also includes sites on the circuit board such as pads, plated-through holes, vias, and traces.

Level 3 includes the enclosure, chassis, drawer, and connections for circuit cards. This level includes products or subsystems, such as hard drives, video cards, and power supplies.

Level 4 includes electronic products, such as a notebook computer or a single line-replaceable unit (LRU), and their connections.

Level 5 includes electronic systems and external connections between different systems (e.g., the connection from the computer to the printer, or between the LRU and the cockpit display). A system of systems is included in this level.

Along with the geometries, the materials used in a product affect its response to external and internal stresses. The material properties are used as inputs to PoF-based failure models to compute the time to failure for a particular failure site and failure mechanism.
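The six levels lend themselves to a small ordered data structure. The following is a minimal sketch, assuming Python; the member names, the `sites` dictionary, and the helper `sites_at_or_below` are hypothetical and are not part of the cited classification.

```python
from enum import IntEnum


class PrognosticLevel(IntEnum):
    """Six prognostic levels for electronics, following the list in the text."""
    CHIP = 0        # chip and on-chip sites (circuits, metallization)
    COMPONENT = 1   # parts, wirebonds, lead frames, encapsulants
    BOARD = 2       # circuit board, interconnects, pads, vias, traces
    ENCLOSURE = 3   # enclosure, chassis, drawer; subsystems such as power supplies
    PRODUCT = 4     # electronic products such as a notebook computer or a single LRU
    SYSTEM = 5      # electronic systems and connections between systems


# Hypothetical failure sites tagged with the level at which they occur.
sites = {
    "solder ball": PrognosticLevel.BOARD,
    "metallization": PrognosticLevel.CHIP,
    "wirebond": PrognosticLevel.COMPONENT,
    "power supply": PrognosticLevel.ENCLOSURE,
}


def sites_at_or_below(level):
    """Return the names of all sites at the given level or a lower one."""
    return sorted(name for name, lvl in sites.items() if lvl <= level)


print(sites_at_or_below(PrognosticLevel.BOARD))
```

Using an integer-valued enumeration keeps the ordering of the levels explicit, so a query such as "everything at board level or below" is a simple comparison.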

4.3 Loads

To evaluate the reliability of a product, loads must be considered, since they determine the useful life of the product. The product life cycle includes manufacturing, assembly, storage, handling, transportation, and operation. During each stage of the life cycle, products experience loads from the environment, including temperature, air pressure, moisture, vibration, mechanical stress, chemical reactions, and radiation. All of these loads may cause damage (either detectable or undetectable) to accumulate in products, thus affecting their remaining life.

A product is inevitably exposed to one or more environmental loads, including thermal loads (such as temperature), mechanical loads (such as pressure), chemical loads (such as corrosive etching), magnetic loads (such as from a magnetic field), and radiation loads (such as from cosmic radiation). Any of these environmental loads can cause stress in the product. Under different environmental conditions, different loads may dominate the stress in the product, while other loads may not be critical or may be negligible. For example, under many conditions radiation need not be taken into consideration, since the radiation level is low enough to be ignored except during space flight.


Different products or components may also have different sensitivities to a load because of differences in their material properties or protection strategies. Thus, a load that is critical to one product may be negligible to another. For example, a mechanical shock caused by dropping a cell phone onto hard ground may not be comparable to the shock caused by dropping an open laptop computer.

Operational loads arise during the use of products. These loads include thermal, mechanical, chemical, magnetic, and electrical loads, among others. For example, under certain operational conditions, a product may generate heat due to electrical work, mechanical work, or chemical reactions. Because of coefficient of thermal expansion (CTE) mismatches between different materials, mechanical stress can also be produced by temperature changes. Thermally induced mechanical stress may even be produced within a homogeneous part due to its internal temperature gradient.
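The CTE mismatch effect can be illustrated with a first-order calculation: two bonded materials that would expand freely by different amounts develop a mismatch strain of roughly the CTE difference times the temperature change. The sketch below is a simplified estimate that ignores geometry and compliance; the function name and the CTE values are illustrative assumptions, not data from this chapter.

```python
def cte_mismatch_strain(alpha_a, alpha_b, delta_t):
    """First-order thermal mismatch strain between two bonded materials.

    alpha_a, alpha_b: coefficients of thermal expansion in 1/degC
    delta_t: temperature change in degC
    """
    return abs(alpha_a - alpha_b) * abs(delta_t)


# Assumed, order-of-magnitude values for illustration only.
ALPHA_BOARD = 16e-6     # laminate circuit board, in-plane (assumption)
ALPHA_SILICON = 2.6e-6  # silicon die (assumption)

strain = cte_mismatch_strain(ALPHA_BOARD, ALPHA_SILICON, delta_t=100.0)
print(f"mismatch strain: {strain:.2e}")
```

In practice, the strain actually seen by an interconnect depends on part geometry and stiffness, which is why the PoF models take both material properties and geometry as inputs.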

4.4 Failure Modes, Mechanisms, and Effects Analysis

FMMEA is a systematic methodology to identify potential failure mechanisms and models for all potential failure modes and to prioritize failure mechanisms. FMMEA utilizes the basic steps in developing a traditional FMEA in combination with knowledge of physics of failure. FMMEA uses application conditions to assess the active stresses and select the potential failure mechanisms. Knowledge of stresses is combined with failure models to prioritize failure mechanisms according to their severity and likelihood of occurrence.

FMMEA was developed by CALCE to address the weaknesses in the traditional failure modes and effects analysis (FMEA) [4-7] and failure modes, effects, and criticality analysis (FMECA) [8], [9] processes. FMMEA is based on understanding the relationship between product requirements and the physical characteristics of a product (and its variations in the production process), the interactions of product materials with loads (stresses due to application conditions), and their influence on the product’s susceptibility to failure with respect to use conditions. This involves identifying the failure mechanisms and reliability models to quantitatively evaluate the susceptibility to failure. FMMEA combines life-cycle environmental and operating conditions and the duration of the intended application with knowledge of the active stresses and potential failure mechanisms.

A FMMEA process begins by defining the system to be analyzed, which is understood as a composite of subsystems or levels that are integrated to achieve a specific objective. The system is divided into various subsystems or levels, continuing to the lowest possible level: the component or element. All of the associated functions are listed for each element. This list of functions is necessary since failure is defined as the loss of these functions.

A failure mode is the effect by which a failure is observed to occur. It can also be defined as the way in which a component, subsystem, or system could fail to meet or deliver its intended function. All possible failure modes for each identified element should be listed. Potential failure modes may be identified using numerical stress analysis, accelerated tests to failure (e.g., HALT), past experience, and engineering judgment. If a mode can only be identified during the initial inspection, it is not a failure mode to be considered in FMMEA. The failure mode needs to be directly observable by visual inspection, electrical measurement, or other tests and measurements. The failure mode identification should not imply a cause or mechanism.

Severity is the seriousness of the effect of the failure, and it can be assigned to each mode for each site. The assignment of severity levels is based on the design and functionality of the item, past experience, and engineering judgment.


A failure cause is defined as the specific process, design, and/or environmental condition that initiated the failure. Knowledge of potential failure causes can help identify the failure mechanisms driving the failure modes for a given element. Causes are identified by brainstorming in the FMMEA group. One method of looking for causes is to review the life-cycle environmental profile (LCEP) item by item to evaluate whether any of the items there can cause failure. The cause field in an analysis can be used to describe all the possible alternatives.

Potential failure mechanisms are determined based on the appropriate available mechanisms corresponding to the material system, stresses, failure mode, and cause. Failure mechanisms are categorized as either overstress or wear-out mechanisms. Information on life-cycle conditions can be used to eliminate failure mechanisms that may not occur under the given application conditions. Care should be taken not to mix mechanisms with sites, modes, or causes. When the failure mechanism is not identified, it is better to record it as “unknown” or “not yet determined” than to make a wrong or uninformed decision.

Failure models help quantify failure by determining the time to failure or the likelihood of a failure for a given geometry, material construction, environment, and operational condition. For overstress mechanisms, failure models use stress analysis to estimate whether the product will fail under the given conditions. For wear-out mechanisms, failure models use both stress and damage analysis to quantify the damage accumulated in a product.

FMMEA implementation for different electronic products and monitored data is summarized in Table 4.1 [3]. When using the fuse/canary device PHM approach, the geometries or material properties of the canaries can be scaled to accelerate failure under user conditions, based on the potential failure mechanisms. When using the stress and damage modeling approach, environmental and usage load profiles are captured using sensors. The sensor data are then converted into a format that can be used in the failure models.
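As an illustration of what converting sensor data into a model-ready format can involve, the sketch below reduces a raw temperature history to its turning points and then to half-cycle ranges. This is a deliberately simple, hypothetical reduction (it is not rainflow counting, and it is not a method described in this chapter); the function names and the sample data are assumptions.

```python
def turning_points(series):
    """Return the local peaks and valleys of a load history, keeping both endpoints.

    Assumes consecutive samples are never exactly equal (no plateaus).
    """
    if len(series) < 3:
        return list(series)
    points = [series[0]]
    for prev, cur, nxt in zip(series, series[1:], series[2:]):
        # A turning point is a sample where the slope changes sign.
        if (cur - prev) * (nxt - cur) < 0:
            points.append(cur)
    points.append(series[-1])
    return points


def half_cycle_ranges(points):
    """Range of each excursion between successive turning points."""
    return [abs(b - a) for a, b in zip(points, points[1:])]


# Hypothetical temperature samples in degC.
temps = [25, 40, 60, 55, 30, 35, 70, 65, 25]
tp = turning_points(temps)
print(tp)
print(half_cycle_ranges(tp))
```

A fatigue model that takes temperature range as its input can then consume the list of ranges instead of the raw time series.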

Table 4.1: PoF-Based PHM for Different Electronic Products

Monitored Product | Potential Failure Mode/Mechanism | Prognostics Approach | Data Monitored/Analyzed
Circuit on the semiconductor [10] | TDDB, electromigration | Fuse and canary device | Current density
 | Die wear-out | Fuse and canary device | Temperature
Circuit board [11] | Interconnect thermal or vibration fatigue | Fuse and canary device | Temperature and acceleration
Power supply [12-14] | Solder joint thermal fatigue failure | Monitoring environmental and usage loading | Temperature profile
PCB under hood in the automobile [15-16] | Solder joint thermal or vibration fatigue | Monitoring environmental and usage loading | Temperature, acceleration
End-effector electronics unit inside the space shuttle robot arm [17] | Solder joint thermal or vibration fatigue | Monitoring environmental and usage loading | Temperature, acceleration
Circuit cards inside a rocket booster [18] | Thermal fatigue and vibration fatigue of electronic parts | Monitoring environmental and usage loading | Temperature, acceleration
 | Vibration fatigue of connection | Monitoring environmental and usage loading | Acceleration
Notebook and desktop [19-20] | N/A | Monitoring environmental and usage loading | Temperature near the CPU, temperature of the motherboard
Refrigerator [21-22] | N/A | Monitoring environmental and usage loading | Total run time, compressor run time, door opening time, compressor cycles, defrost cycles, power on/off cycles
Game console [21-22] | N/A | Monitoring environmental and usage loading | Ambient temperature, heat sink temperature, humidity, voltage spikes, rotation speed of CD, product orientation

N/A: not available.

During the life cycle of a product, several failure mechanisms may be activated by different environmental and operational parameters acting at various stress levels, though in general only a few operational and environmental parameters and failure mechanisms are responsible for the majority of failures. High-priority failure mechanisms determine the operational stresses and the environmental and operational parameters that must be accounted for in the design or that must be controlled. High-priority mechanisms are those with high combinations of occurrence and severity. Prioritizing failure mechanisms makes it possible to use resources effectively. Figure 4.2 shows the methodology for prioritizing failure mechanisms.

The risk priority number (RPN) of the failure modes and mechanisms can be calculated. In failure modes analysis, the RPN is the product of severity, occurrence, and detection. Severity describes the seriousness of the effect of failure for the customer. Occurrence describes how frequently the failure mode is projected to occur as a result of a specific cause. For manufacturers, detection is the ability to detect problems or the possible causes of defects, including external failures, before they reach customers. For customers, detection is their ability to spot the initiation of a failure before the product malfunctions. Typically, these are rated on a scale from the level of highest impact on reliability to the lowest [23]. In the analysis of failure mechanisms, the RPN includes only severity and occurrence, since the failure mechanisms are not detectable [23]. A prioritized assessment of failure modes and mechanisms, and of the environmental conditions that affect them, needs to be established to ensure that the appropriate data are collected and utilized for prognostics.

[Figure 4.2 flowchart: potential failure mechanisms; evaluate failure susceptibility and assign occurrence; look up severity level; prioritize mechanisms into high risk, medium risk, and low risk.]

Figure 4.2: Prioritization of failure mechanisms.

The LCEP is used for evaluating failure susceptibility. If certain environmental and operational conditions are nonexistent or generate a very low level of stress, the failure mechanisms that are exclusively dependent on those conditions are assigned a low occurrence. For overstress mechanisms, failure susceptibility is evaluated by conducting a stress analysis to determine whether failure occurs under the given environmental and operational conditions. For wear-out mechanisms, failure susceptibility is evaluated by determining the time to failure (TTF) under the given LCEP. The occurrence levels are assigned by benchmarking the individual failure mechanism’s TTF against the expected product life, past experience, and engineering judgment. In cases where no failure models are available, the evaluation is based solely on past experience and engineering judgment.

Severity ratings are obtained from the failure modes associated with the mechanism; there can be more than one mode for a mechanism. The high-priority failure mechanisms are the critical ones. In enumerating these, each mechanism will have one or more associated sites, modes, and causes. This information can be used to help determine what to monitor, where to monitor it, and how to react to the results of monitoring.
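The prioritization steps can be sketched in a few lines. The example below assumes a rating scale on which higher numbers mean higher risk, takes the worst severity among the modes associated with a mechanism, and bins the resulting RPN with arbitrary thresholds. The scale direction, the worst-case rule, the thresholds, and all names and ratings are illustrative assumptions rather than values prescribed by the methodology.

```python
from dataclasses import dataclass


@dataclass
class Mechanism:
    name: str
    mode_severities: list  # severity ratings of the associated failure modes
    occurrence: int        # occurrence rating from the susceptibility evaluation

    @property
    def severity(self):
        # Assumption: use the worst (highest) severity among the associated modes.
        return max(self.mode_severities)

    @property
    def rpn(self):
        # For failure mechanisms, the RPN uses severity and occurrence only.
        return self.severity * self.occurrence


def risk_level(rpn, high=40, medium=15):
    """Map an RPN to a risk bin; the thresholds are illustrative only."""
    if rpn >= high:
        return "high"
    if rpn >= medium:
        return "medium"
    return "low"


# Hypothetical mechanisms and ratings.
mechanisms = [
    Mechanism("connector fretting", [3], 3),
    Mechanism("solder joint fatigue", [8, 6], 7),
    Mechanism("electromigration", [9], 2),
]

ranked = sorted(mechanisms, key=lambda m: m.rpn, reverse=True)
for m in ranked:
    print(m.name, m.rpn, risk_level(m.rpn))
```

Sorting by RPN puts the mechanisms that most deserve monitoring at the top of the list.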

4.5 Stress Analysis

Stress analysis is necessary to determine the severity of a failure mechanism. The stress analysis to be conducted depends on the loads and the architecture of the product. The stress level and severity resulting from different loads need to be estimated. The same type of stress resulting from different loads may be considered together. For example, the temperature of a particular component may result from a combination of the environment temperature and the heat generated within the component during operation.

Thermal analysis is used to determine the temperature distribution on a particular component or on the whole body of a product. For a printed wiring board (PWB) in an electronic product, thermal analysis will output the temperature of the board layer, component junctions, and component cases based on the heat generated by the components and the environment temperature. Thermal analysis involves solving heat transfer equations for the three fundamental modes of heat transfer: conduction, convection, and radiation. In most cases, steady-state temperature results are sufficient to assess failure conditions; for example, a temperature cycle can be adequately defined by determining only the steady-state conditions for the high and low ends. An evaluation of the heat capacity of the structure on the PWB is necessary to determine whether the transition between these conditions will play a significant role.

Vibration analysis can be used to determine the response of the PWB to the random oscillating motion of the structure that contains the PWB. When calculating the natural frequency of a PWB, the boundary conditions are critical. Classical boundary conditions are free, simply supported, or clamped. The natural frequency of a circuit card assembly (CCA) can be determined experimentally or numerically. Experimental determination requires placing a strain gauge or accelerometer on the CCA, attaching the CCA to a dynamic shaker, and measuring the response of the CCA to a known input. Numerically, the natural frequency of the PWB can be determined using first-order approximations or finite element modeling.

There are numerous other kinds of stress analysis not mentioned here. Depending on the loads that the products experience, certain kinds of stress analysis will be more appropriate for calculating stress from load conditions.
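One common first-order approximation treats the PWB as a uniform thin rectangular plate that is simply supported on all four edges, for which the fundamental frequency is f = (π/2)·sqrt(D/(ρh))·(1/a² + 1/b²), with flexural rigidity D = Eh³/(12(1 − ν²)). The sketch below applies this classical plate formula. The board dimensions and material values are illustrative assumptions, and the mass of mounted components, which lowers the frequency of a real CCA, is neglected.

```python
import math


def plate_natural_frequency(a, b, h, E, nu, rho):
    """Fundamental frequency (Hz) of a uniform thin rectangular plate,
    simply supported on all four edges.

    a, b: side lengths (m); h: thickness (m)
    E: Young's modulus (Pa); nu: Poisson's ratio; rho: density (kg/m^3)
    """
    D = E * h**3 / (12.0 * (1.0 - nu**2))  # flexural rigidity, N*m
    mass_per_area = rho * h                # kg/m^2
    return (math.pi / 2.0) * math.sqrt(D / mass_per_area) * (1.0 / a**2 + 1.0 / b**2)


# Assumed values for a bare laminate board (illustrative only).
f_n = plate_natural_frequency(a=0.20, b=0.15, h=1.6e-3, E=20e9, nu=0.15, rho=1900.0)
print(f"estimated fundamental frequency: {f_n:.0f} Hz")
```

Changing the boundary conditions (for example, clamped edges) changes the coefficient in this expression, which is one reason the boundary conditions are critical to the estimate.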

4.6 Reliability Assessment and Remaining-Life Predictions

Based on the stress level and severity determined by stress analysis, together with the architecture, the material properties, and the life-cycle profile of the product, reliability assessment is conducted by calculating the time to failure for the dominant failure mechanisms at particular sites using PoF models. The failure identification step involves using the geometry and material properties of a product, together with the measured life-cycle loads acting on the product, to identify the potential failure modes, mechanisms, and failure sites in the product. This task is usually conducted only on the items or parts that are new to the product; in many cases, it is only necessary to evaluate these. In any case, a virtual qualification is then performed to identify and rank the potential failure mechanisms.

Failure is defined as the inability of a system to perform its intended function. Failure mechanisms are the processes by which a material or system degrades and eventually fails. Three basic categories of failure are overstress (i.e., stress-strength), wear-out (i.e., damage accumulation), and performance tolerance (e.g., excessive propagation delays). Failures can be broadly categorized by the nature of the loads (mechanical, thermal, electrical, radiation, or chemical) that trigger or accelerate the mechanism. Mechanical failures, for example, can result from elastic or plastic deformation, buckling, brittle or ductile fracture, interfacial separation, fatigue-crack initiation and propagation, creep, and creep rupture. Thermal failures can arise when a product operates outside its thermal performance specifications, when it is heated beyond a critical temperature (such as the glass transition temperature, melting point, or flash point), or when it undergoes severe changes in temperature. Electrical failure of an electronic product can be due to electrostatic discharge, dielectric breakdown, junction breakdown, hot electron injection, surface and bulk trapping, surface breakdown, and electromigration. Radiation failures are caused principally by uranium and thorium contaminants and secondary cosmic rays. Chemical failures arise in environments that accelerate corrosion, oxidation, and ionic surface dendritic growth.


Different failure mechanisms may be triggered or accelerated by different loads. For example, mechanical failures can result from elastic or plastic deformation, buckling, brittle or ductile fracture, interfacial separation, fatigue crack initiation and propagation, creep, and creep rupture. Thermal failures can arise from operating a component outside its thermal performance specifications, from heating a component beyond its critical temperature (such as the glass transition temperature, melting point, or flash point), or from severely changing the temperature. Electrical failures include those due to electrostatic discharge, dielectric breakdown, junction breakdown, hot electron injection, surface and bulk trapping, surface breakdown, and electromigration. Radiation failures are caused principally by uranium and thorium contaminants and secondary cosmic rays. Chemical failures arise in environments that accelerate corrosion, oxidation, and ionic surface dendritic growth. Different loads can also interact to cause failures. For example, a thermal load can trigger mechanical failure because of a thermal expansion mismatch. Other interactive failure mechanisms include stress-assisted corrosion, stress-corrosion cracking, field-induced metal migration, and temperature acceleration of chemical reactions.

Models are used to predict product reliability in the field application. A model should (1) provide repeatable results, (2) reflect the variables and interactions that are causing failures, and (3) predict the reliability of a product over the entire domain of its application conditions [24]. PoF models consider the stresses, the various stress parameters, and their relationships to materials, geometry, and product life.
For electronic products, there are many PoF models describing the behavior of components, such as printed circuit boards, interconnections, and metallizations under various conditions, such as temperature cycling, vibration, humidity, and corrosion. For example, PoF models used to calculate the damage caused by temperature and vibration loadings are summarized in Figure 4.3.

Figure 4.3: Damage calculation approach for temperature and vibration data

Damage caused by temperature can be calculated in the time domain using the Coffin-Manson model. This approach has been demonstrated in the work by Ramakrishnan and Pecht [15] and Cluff et al. [25]. Damage caused by vibration can be calculated in both the time and the frequency domains. The time-domain calculation, demonstrated by Gu et al. [26], can use Basquin's model. The frequency-domain calculation, demonstrated by Ramakrishnan and Pecht [15], can use Steinberg's model [27]. In some cases, models need to be developed. This is usually accomplished by using a series of statistically designed experiments. Typical failure models and mechanisms for electronics are summarized in Table 4.2 [28].
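As a rough illustration of the time-domain damage calculation described above, the following Python sketch evaluates the textbook forms of the Coffin-Manson relation (plastic strain range versus cycles to failure) and Basquin's relation (stress amplitude versus cycles to failure), and then sums the damage. The linear (Palmgren-Miner) summation, the function names, and all material constants are assumptions made for illustration; the chapter does not specify them.

```python
def coffin_manson_cycles(plastic_strain_range, ductility_coeff, ductility_exp):
    """Cycles to failure N_f from  d_eps_p / 2 = eps_f' * (2 * N_f) ** c."""
    reversals = (plastic_strain_range / (2.0 * ductility_coeff)) ** (1.0 / ductility_exp)
    return reversals / 2.0


def basquin_cycles(stress_amplitude, strength_coeff, strength_exp):
    """Cycles to failure N_f from  sigma_a = sigma_f' * (2 * N_f) ** b."""
    reversals = (stress_amplitude / strength_coeff) ** (1.0 / strength_exp)
    return reversals / 2.0


def miner_damage(applied_cycles, cycles_to_failure):
    """Linear damage sum over load levels; failure is assumed at a sum of 1.0."""
    return sum(n / n_f for n, n_f in zip(applied_cycles, cycles_to_failure))


# Hypothetical constants and cycle counts, chosen only to exercise the formulas.
n_thermal = coffin_manson_cycles(0.01, ductility_coeff=0.325, ductility_exp=-0.5)
n_vibration = basquin_cycles(500.0, strength_coeff=1000.0, strength_exp=-0.1)
damage = miner_damage([100, 50], [n_thermal, n_vibration])
print(n_thermal, n_vibration, damage)
```

In practice the cycle counts would come from a cycle-counting step applied to the monitored temperature or vibration history, and the constants from material characterization.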

Table 4.2: Failure Mechanisms, Relevant Loads, and Models in Electronics

Failure Mechanisms | Failure Sites | Relevant Loads | Failure Models
Fatigue | Die attach, wirebond/TAB, solder leads, bond pads, traces, vias/PTHs, interfaces | ΔT, T_mean, dT/dt, dwell time, ΔH, ΔV | Nonlinear power law (Coffin-Manson, Basquin)
Corrosion | Metallizations | M, ΔV, T | Eyring (Howard)
Electromigration | Metallization | T, J | Eyring (Black)
Conductive filament formation | Between metallization | M, ∇V | Power law (Rudra)
Stress-driven diffusion voiding | Metal traces | S, T | Eyring (Okabayashi)
Time-dependent dielectric breakdown | Dielectric layers | V, T | Arrhenius (Fowler-Nordheim)

Δ: cyclic range; ∇: gradient; V: voltage; M: moisture; T: temperature; J: current density; S: stress; H: humidity
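To make one row of Table 4.2 concrete, the sketch below uses the commonly quoted form of Black's electromigration model, MTTF = A * J^(-n) * exp(E_a / (k * T)), to compute an acceleration factor between a stress condition and a use condition. The exponent n, the activation energy, and the operating conditions are hypothetical placeholders, not values taken from this chapter.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def black_mttf(prefactor, current_density, n, activation_energy_ev, temp_k):
    """Median time to failure from Black's model (units set by the prefactor)."""
    arrhenius = math.exp(activation_energy_ev / (BOLTZMANN_EV_PER_K * temp_k))
    return prefactor * current_density ** (-n) * arrhenius


def acceleration_factor(j_use, t_use_k, j_stress, t_stress_k, n, activation_energy_ev):
    """Ratio MTTF(use) / MTTF(stress); the prefactor cancels."""
    use = black_mttf(1.0, j_use, n, activation_energy_ev, t_use_k)
    stress = black_mttf(1.0, j_stress, n, activation_energy_ev, t_stress_k)
    return use / stress


# Hypothetical conditions: doubled current density and 398 K versus 348 K.
af = acceleration_factor(j_use=1.0e6, t_use_k=348.0,
                         j_stress=2.0e6, t_stress_k=398.0,
                         n=2.0, activation_energy_ev=0.7)
print(af)
```

The other Eyring- and Arrhenius-type entries in the table can be handled in the same way, with the load terms replaced by those listed for each mechanism.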

Reliability information can be used to assess whether the product will survive for its designated life. If the time to failure for the mechanism with the lowest time is less than the desired mission life, then the sensitivity of the failure mechanism to design parameters can be iteratively evaluated until system reliability goals are met. Since the PoF-based PHM approach collects life-cycle loads in real time, it is possible to make continuously updated predictions based on the actual environmental and operational conditions. Based on knowledge of the product degradation mechanisms, appropriate health monitoring systems can be developed to operate from the beginning of the life cycle, which corresponds to the manufacturing phase of the product, to system failure. Whereas diagnostic systems can be implemented from the beginning of the product life cycle, the domain of a prognostic system typically only commences after a fault or defect condition has been observed. The remaining life of the product can be calculated from the beginning of the product life cycle, and the degradation of the product is then assessed continually by monitoring its life-cycle environment in order to provide an estimate of the remaining life in the application environment. At each time period, the damage to the product is calculated from the various stresses caused by environmental or operational loads. The damage is then accumulated over a certain period. Finally, the remaining life is calculated based on the accumulated damage.

As a case study, in the experiment of Ramakrishnan and Pecht [15] and Mishra et al. [16], the test vehicle consisted of an electronic component-board assembly placed under the hood of an automobile and subjected to normal driving conditions in the Washington, DC, area. The test board incorporated eight surface-mount leadless inductors soldered onto an FR-4 substrate using eutectic tin-lead solder.
The methodology comprises six steps to estimate the remaining life of an electronic product: (i) FMMEA, (ii) virtual reliability assessment, (iii) monitoring of the appropriate product parameters, (iv) simplification of the monitored data, (v) stress and damage accumulation analysis, and (vi) remaining-life estimation. FMMEA and virtual reliability assessment have been incorporated in the present improved methodology to determine the dominant failure mechanism in a given life-cycle environment and the corresponding environmental and operational parameters. In addition, step (vi) has also been incorporated to determine the product remaining life based on the accumulated damage information. The remaining life of the test board, estimated by PoF-based PHM, is compared in Figure 4.4 with estimates obtained using similarity analysis and with the actual measured life. As shown in the figure, the remaining life estimated by similarity analysis differs significantly from the actual life of the board, whereas the remaining life estimated by PoF-based PHM is in excellent agreement with the actual life. The discrepancy between the similarity analysis estimate and the actual life is attributed to the fact that similarity analysis does not account for the accident that the car experienced on day 22. PoF-based PHM could account for this unforeseen event since the operating environment was being monitored in situ.
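Steps (v) and (vi) can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used in the cited study: it assumes that damage accumulates linearly, that failure occurs when the accumulated damage reaches 1, and that future damage accrues at the average rate observed so far. The daily damage values, including the single large increment standing in for an unforeseen event, are hypothetical.

```python
def accumulate(daily_damage):
    """Running sum of the damage fractions computed for each monitoring period."""
    total, history = 0.0, []
    for increment in daily_damage:
        total += increment
        history.append(total)
    return history


def remaining_life_days(daily_damage):
    """Remaining life, assuming failure at damage = 1 and an unchanged average rate."""
    elapsed_days = len(daily_damage)
    total = sum(daily_damage)
    if total <= 0.0:
        return float("inf")  # no damage observed yet, so no finite estimate
    if total >= 1.0:
        return 0.0
    return elapsed_days * (1.0 - total) / total


steady = [0.02] * 10                # hypothetical normal usage
with_event = [0.02] * 9 + [0.32]    # same usage plus one severe event
print(remaining_life_days(steady), remaining_life_days(with_event))
```

Because the estimate is recomputed from the monitored loads, a severe event shortens the predicted remaining life as soon as it is recorded, which is the behavior the case study attributes to in-situ monitoring.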

Figure 4.4: Remaining life estimation of test board (horizontal axis: time in use, 0 to 50 days)

4.7 Outputs from PoF-Based PHM

The PoF-based PHM methodology provides outputs that can be used to (1) provide advance warning of failures; (2) minimize unscheduled maintenance, extend maintenance cycles, and maintain effectiveness through timely repairs; (3) reduce the life-cycle cost of equipment by decreasing inspection costs, downtime, and inventory; and (4) improve qualification and assist in the design and logistical support of fielded and future systems.

Compared with the data-driven approach for PHM, the PoF approach has certain advantages for both new and legacy systems, to which the data-driven approach is often difficult to apply because little data is available for training the algorithm. The PoF model, however, can still be used when the material properties and the structural geometries of the products are available. Virtual qualification, which is the first step of PoF-based PHM, can also be used to evaluate new materials and structures. It therefore reduces the design margin, which is important for time to market. For legacy systems, PoF-based PHM first utilizes all available information (such as previous loading conditions and maintenance records) to assess the health status of the legacy system. It then calibrates the health status using individual unit data so that an assessment of an individual legacy system's health can be derived. After that, it uses sensors and prognostic algorithms to update the health status on a continual basis to provide the most up-to-date prognosis of the system [29].

PoF-based PHM also has an advantage in reliability prediction under storage conditions. The limitation of the data-driven approach is that it can only detect failure when the product is near the failure point. Thus, it is difficult to assess the remaining life from the beginning or middle of storage. In addition, it is impossible to measure a product's performance or other data from the product directly since the product is not in use. The PoF approach measures the environmental loads (such as temperature, vibration, and humidity) in situ, and the load profiles can be used in conjunction with damage models to assess the degradation due to cumulative load exposures.

References

1. J. Gu and M. Pecht, “Prognostics and Health Management Using Physics-of-Failure,” 54th Annual Reliability & Maintainability Symposium, Las Vegas, NV, 2008.
2. S. Fixson, “Product Architecture Assessment: A Tool to Link Product, Process, and Supply Chain Design Decisions,” Journal of Operations Management, Vol. 23, No. 3-4, pp. 345-369, 2005.
3. J. Gu, N. Vichare, T. Tracy, and M. Pecht, “Prognostics Implementation Methods for Electronics,” Proceedings of the Annual Reliability and Maintainability Symposium, pp. 101-106, 2007.
4. J. Bowles, “Fundamentals of Failure Modes and Effects Analysis,” Tutorial Notes of the Annual Reliability and Maintainability Symposium, 2003.
5. J. Coutinho, “Failure-Effect Analysis,” Transactions of the New York Academy of Sciences, Vol. 26, pp. 564-585, 1964.
6. C. Kara-Zaitri, A. Keller, and P. Fleming, “A Smart Failure Mode and Effect Analysis Package,” Proceedings of the Annual Reliability and Maintainability Symposium, pp. 414-421, 1992.
7. Guidelines for Failure Mode and Effects Analysis for Automotive, Aerospace and General Manufacturing Industries, Dyadem Press, Ontario, 2003.
8. “Failure Mode and Effect Analysis,” Electronic Industries Association G-41 Committee on Reliability Bulletin, No. 9, November 1971.
9. United States Department of Defense, “Procedures for Performing a Failure Mode Effects and Criticality Analysis,” US Mil-Std-1629 (ships), November 1, 1974; US Mil-Std-1629A, November 24, 1980; US Mil-Std-1629A/Notice 2, November 28, 1984.
10. S. Mishra and M. Pecht, “In Situ Sensors for Product Reliability Monitoring,” Proceedings of SPIE, Vol. 4755, pp. 10-19, 2002.
11. N. Anderson and R. Wilcoxon, “Framework for Prognostics of Electronic Systems,” Proceedings of International Military and Aerospace/Avionics COTS Conference, Seattle, WA, August 3-5, 2004.
12. R. Orsagh, D. Brown, M. Roemer, T. Dabney, and A. Hess, “Prognostic Health Management for Avionics System Power Supplies,” Proceedings of IEEE Aerospace Conference, Big Sky, MT, pp. 3585-3591, March 2005.
13. D. Goodman, B. Vermeire, P. Spuhler, and H. Venkatramani, “Practical Application of PHM/Prognostics to COTS Power Converters,” Proceedings of IEEE Aerospace Conference, Big Sky, MT, pp. 3573-3578, March 2005.
14. L. Nasser and M. Curtin, “Electronics Reliability Prognosis Through Material Modeling and Simulation,” Proceedings of IEEE Aerospace Conference, Big Sky, MT, March 2006.
15. A. Ramakrishnan and M. Pecht, “Life Consumption Monitoring Methodology for Electronic Systems,” IEEE Transactions on Components and Packaging Technologies, Vol. 26, No. 3, pp. 625-634, September 2003.
16. S. Mishra, M. Pecht, T. Smith, I. McNee, and R. Harris, “Remaining Life Prediction of Electronic Products Using Life Consumption Monitoring Approach,” Proceedings of the European Microelectronics Packaging and Interconnection Symposium, Cracow, pp. 136-142, June 16-18, 2002.
17. V. Shetty, D. Das, M. Pecht, D. Hiemstra, and S. Martin, “Remaining Life Assessment of Shuttle Remote Manipulator System End Effector,” Proceedings of the 22nd Space Simulation Conference, Ellicott City, MD, October 21-23, 2002.
18. S. Mathew, D. Das, M. Osterman, M. Pecht, and R. Ferebee, “Prognostic Assessment of Aluminum Support Structure on a Printed Circuit Board,” ASME Journal of Electronic Packaging, Vol. 128, No. 4, pp. 339-345, December 2006.
19. D. Searls, T. Dishongh, and P. Dujari, “A Strategy for Enabling Data Driven Product Decisions through a Comprehensive Understanding of the Usage Environment,” Proceedings of IPACK'01 Conference, Kauai, HI, pp. 1279-1284, July 8-13, 2001.
20. N. Vichare, P. Rodgers, V. Eveloy, and M. Pecht, “In-Situ Temperature Measurement of a Notebook Computer: A Case Study in Health and Usage Monitoring of Electronics,” IEEE Transactions on Device and Materials Reliability, Vol. 4, No. 4, pp. 658-663, December 2004.
21. K. Bodenhoefer, “Environmental Life Cycle Information Management and Acquisition: First Experiences and Results from Field Trials,” Proceedings of Electronics Goes Green 2004+, Berlin, pp. 541-546, September 2004.
22. ELIMA Report, “D-19 Final Report on ELIMA Prospects and Wider Potential for Exploitation,” April 30, 2005, available: www.ELIMA.org, accessed December 2005.
23. D. Das, M. Azarian, and M. Pecht, “Failure Modes, Mechanisms, and Effects Analysis (FMMEA) for Automotive Electronics,” paper presented at the 11th Annual AEC Workshop, Indianapolis, IN, May 9-11, 2006.
24. P. Viswanadham and P. Singh, Failure Modes and Mechanisms in Electronic Packages, Chapman & Hall, New York, pp. 283-285, 1998.
25. K. Cluff, D. Robbins, T. Edwards, and D. Barker, “Characterizing the Commercial Avionics Thermal Environment for Field Reliability Assessment,” Journal of the Institute of Environmental Sciences, Vol. 40, No. 4, pp. 22-28, 1997.
26. J. Gu, D. Barker, and M. Pecht, “Prognostics Implementation of Electronics under Vibration Loading,” Microelectronics Reliability, Vol. 47, No. 12, pp. 1849-1856, December 2007.
27. D. Steinberg, Vibration Analysis for Electronic Equipment, 3rd edition, John Wiley & Sons, Inc., New York, NY, 2000.
28. N. Vichare and M. Pecht, “Prognostics and Health Management of Electronics,” IEEE Transactions on Components and Packaging Technologies, Vol. 29, No. 1, pp. 222-229, March 2006.
29. B. Tuchband, N. Vichare, and M. Pecht, “A Method for Implementing Prognostics to Legacy Systems,” Proceedings of IMAPS Military, Aerospace, Space and Homeland Security: Packaging Issues and Applications (MASH), Washington, DC, June 6-8, 2006.

Chapter 5 Economics of PHM

Prognostics and health management provides an opportunity for lowering sustainment costs, improving maintenance decision making, and providing product usage feedback into the product design and validation process. The adoption of PHM approaches requires consideration and planning for integration into new and existing systems, operations, and processes. PHM must provide a significant advantage in order to add value for the maintenance process; commitments to implement and support PHM approaches cannot be made without the development of supporting business cases. The realization of PHM requires implementation at different levels of scale and complexity. The maturity, robustness, and applicability of the underlying predictive algorithms impact the overall efficacy of PHM within a technology enterprise. The utility of PHM to inform decision makers within tight scheduling constraints and under different operational profiles likewise affects the cost avoidance that can be realized. This chapter discusses the determination of the benefits and potential cost avoidance offered by electronics PHM.

5.1 Return on Investment

One important attribute of most business cases is the development of an economic justification. Return on investment (ROI) is a useful means of gauging the economic merits of adopting PHM. ROI measures the “return,” that is, the cost savings, profit, or cost avoidance that result from a given use of money. Types of ROI include investment return, cost savings (or cost avoidance), and profit growth [1]. At the enterprise level, ROI may reflect how well an organization is managed. With regard to specific organizational objectives such as gaining more market share, retaining more customers, or improving availability, the ROI may be measured in terms of how a change in practice or strategy results in meeting these goals. In general, ROI is the ratio of gain to investment. Equation 5.1 is a way of defining a ROI calculation over a system life cycle:

$$\mathrm{ROI} = \frac{\text{return} - \text{investment}}{\text{investment}} = \frac{\text{avoided cost}}{\text{investment}} - 1 \qquad (5.1)$$

The middle ratio in Equation 5.1 is the classical ROI definition, and the right ratio is the form of ROI that is applicable to PHM assessment. ROI allows for enhanced decision making regarding the use of investment money and research and development efforts by enabling comparisons of alternatives. However, its inputs must be accurate and thorough in order for the calculation itself to be meaningful. In the case of PHM, the investment includes all the costs necessary to develop, install, and support a PHM approach in a system, while the return is a quantification of the benefit realized through the use of a PHM approach. Constructing a business case for PHM does not necessarily require that the ROI be greater than zero (ROI > 0 implies that there is a cost benefit); in some cases the value of PHM is not quantifiable in monetary terms but is necessary in order to meet a system requirement that could not otherwise be attained, for example, an availability requirement. However, the evaluation of ROI (whether greater than or less than zero) is still a necessary part of any business case developed for PHM [2].
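The PHM form of Equation 5.1 is simple enough to express directly. The sketch below is illustrative only; the function name and the cost figures are hypothetical, and it treats the avoided cost as the "return" so that the two forms of the equation coincide.

```python
def phm_roi(avoided_cost, investment):
    """ROI as in Equation 5.1: avoided cost / investment - 1."""
    if investment <= 0:
        raise ValueError("investment must be positive")
    return avoided_cost / investment - 1.0


# Hypothetical life-cycle totals in arbitrary currency units.
print(phm_roi(avoided_cost=150.0, investment=100.0))  # positive: a cost benefit
print(phm_roi(avoided_cost=100.0, investment=100.0))  # zero: break-even
print(phm_roi(avoided_cost=50.0, investment=100.0))   # negative: no cost benefit
```

A negative result does not by itself rule out PHM, since, as noted above, PHM may be needed to meet a requirement such as availability.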

5.1.1 PHM ROI Analyses¹

The determination of the ROI allows managers to include quantitative and readily interpretable results in their decision making [3]. ROI analysis may be used to select between different types of PHM, to optimize the use of a particular PHM approach, or to determine whether to adopt PHM versus traditional maintenance approaches. The economic justification of PHM has been discussed by many authors (e.g., [4-16, 18-23]). The ROI associated with PHM approaches has been examined for specific nonelectronic military applications, including ground vehicles, power supplies, and engine monitors [12, 13]. NASA studies indicate that the ROI of prognostics in aircraft structures may be as high as 0.58 in three years for contemporary and older generation aircraft systems, assuming a 35% reduction in maintenance requirements [14]. Generalizing the costs of electronics PHM for commercial and military aircraft requires knowledge of industry practices and regulations, knowledge of phased and mission scheduling, an understanding of the underlying PHM component technologies, and an assessment of their accuracy. Simple ROI analyses of electronic prognostics for high-reliability telecommunications applications (power supplies and power converters) have been conducted, including a basic business case for the Bladeswitch voice telecommunications deployment in Malaysia [15].

The Joint Strike Fighter (JSF) program was the first implementation of PHM in a major multinational defense system [16]. PHM is the principal component in the JSF's Autonomic Logistics² system. ROI predictions of the costs of PHM implementation and the potential for cost avoidance have been evaluated, and an analysis of PHM for JSF aircraft engines was developed using a methodology that employed FMECA to model hardware [18, 19]. The effectiveness of the PHM devices in detecting and isolating each of the failures was determined and evaluated against unscheduled maintenance and scheduled maintenance approaches.
Ashby and Byer [19] employed a logistic simulation model to assess impacts on availability within military flight scheduling for an engine control unit (ECU) equipped with PHM for different subcomponents. PHM, when applied to suitable subcomponents, offered substantial monetary and nonmonetary benefits, specifically in increased safety and improved sortie generation. Ashby and Byer provide results showing maintenance and cost avoidance savings for a program using PHM over a five-year period.

Byer et al. [20] describe a process for conducting a cost-benefit analysis for prognostics applied to aircraft subsystems. The definition of a baseline system without PHM and the aircraft system with PHM is the first step in the analysis. Secondly, reliability and maintainability predictions for the components of the aircraft are developed. Next, the measures of PHM effectiveness are defined and the corresponding metrics associated with these measures of effectiveness are established. The impact of PHM on training, support equipment, the cost of consumables, and manpower is then assessed. The overall nonrecurring and recurring costs of providing PHM are estimated. The results are then computed for the cost benefits. The process is then repeated for PHM benefits that are not denominated in monetary units, including sortie generation capability, reduction in the frequency of accidents, and the change in footprint. As supplemental information and for model refinement, Byer et al. [20] use FMECA, line maintenance activity costing, and legacy field event rates in addition to scheduling matrices and cost data on parts to produce life-cycle costs and operational impact assessments. The detailed inputs present an improvement over the more general information contained in typical military maintenance databases, which may have a great amount of historical data overall but lack the specific data on fault diagnostic and isolation times needed to assess the cost avoidance of PHM. The methodology can be used to enhance the accuracy of operational and support costs, even in the absence of PHM technologies, by creating a more rigorous framework for the examination of maintenance costs.

¹ Warning: Not all researchers that quote ROI numbers define ROI in the same way. Equation 5.1 is the standard definition used by the financial world for ROI.
² “Autonomic logistics” describes an automated system that supports mission reliability and maximizes sortie generation while minimizing costs and logistical burden [17].

The cost-benefit analysis of PHM for batteries within ground combat vehicles was modeled using the Army Research Laboratory's Trade Space Visualizer software tool [21]. The analysis was performed by conducting a study of asset failure behavior, calculating the cost of PHM technology development and integration, estimating the benefits of the technology implementation, and calculating decision metrics. The initial analysis focuses on isolating the subcomponents that contribute to the degradation of the larger components or the system itself.
FMECA can then be used to classify the failure mode and determine which prognostics technology could be used to monitor it. This information is then extended into a fleet operations framework in which a user can select variables or parameters, such as availability, battery failure rate, or the logistic delay time. These parameters can be optimized to achieve a given ROI, or the user can set values for these parameters and then calculate the ROI for different scenarios. Banks and Merenich [21] found that ROI was maximized when the time horizon (prognostic distance) was greatest and when the number of vehicles and the failure rates were largest.

A comparison of the ROI of prognostics for two types of military ground vehicle platforms was performed using data from Pennsylvania State University's battery prognostics program [7]. Nonrecurring development costs were estimated for the prognostic units developed for the batteries of the light armored vehicle (LAV) and the Stryker platform used in the Stryker Brigade Combat Team (SBCT) family of vehicles. ROI was calculated as 0.84 for the LAV and 4.61 for the SBCT based on estimates of the development and implementation costs. The difference in ROI is attributed to the LAV having a shorter period of benefit over which the costs of PHM development would be absorbed, in addition to a smaller quantity of batteries. The implementation costs considered were the manufacturing of the PHM sensors and their installation in each vehicle. The nonrecurring development costs included algorithm development; hardware and software design, engineering, qualification, and testing; vehicle system integration; and the development of an integrated data environment (IDE) for data management. When combined with known data about battery performance across the Department of Defense (DoD), the total ROI of battery prognostics for the DoD was calculated as 15.25 over a 25-year period.
The Boeing Company developed a life-cycle cost model for evaluating the benefits of prognostics for the JSF program. The model was developed by Boeing's Phantom Works Division to enable cost-benefit analysis of prognostics for the fighter's avionics during system demonstration and was then enhanced to permit life-cycle cost assessment of prognostic approaches [22]. The model allowed for selection of standard mission profiles or definition of custom mission profiles. Cost-influencing parameters in addition to economic factors were incorporated into a cost-benefit analysis [23].

Although existing PHM ROI assessments contain valuable insight into the cost drivers, most cost analyses and cost-benefit analyses are application specific; they do not provide a general modeling framework or consistent process with which to approach the evaluation of the application of PHM to a new system. Furthermore, existing approaches provide primarily “point estimates” of the value based on a set of fixed inputs when, in reality, the inputs are uncertain. For example, the reliability of a system is best represented as a probability distribution, as are many other inputs to the ROI analysis. Accommodating the uncertainties in the PHM ROI calculation is at the heart of developing realistic business cases that address prognostic requirements.
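One way to move from a point estimate to a distribution is Monte Carlo sampling. The sketch below is a minimal illustration under stated assumptions: the triangular distributions and their limits are hypothetical stand-ins for whatever input distributions an actual analysis would justify, and ROI is computed as avoided cost divided by investment, minus 1.

```python
import random
import statistics


def simulate_roi(n_trials, seed=0):
    """Sample ROI = avoided cost / investment - 1 from uncertain inputs."""
    rng = random.Random(seed)  # seeded so that a run can be repeated
    samples = []
    for _ in range(n_trials):
        # Hypothetical input distributions: triangular(low, high, mode).
        investment = rng.triangular(80.0, 120.0, 100.0)
        avoided_cost = rng.triangular(90.0, 260.0, 150.0)
        samples.append(avoided_cost / investment - 1.0)
    return samples


roi_samples = simulate_roi(10_000)
mean_roi = statistics.mean(roi_samples)
share_positive = sum(r > 0.0 for r in roi_samples) / len(roi_samples)
print(f"mean ROI {mean_roi:.2f}, P(ROI > 0) = {share_positive:.2f}")
```

Reporting the probability that ROI exceeds zero, rather than a single ROI number, gives a decision maker a direct view of the risk carried by the business case.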

5.1.2 Financial Costs

Financial costs are part of the engineering economics of technology acquisitions. The business cases for the inclusion of PHM into systems are long-term propositions; that is, for most types of systems, investments are made and cost avoidance is realized over many years. Because the ROI assessment spans a significant time period, the cost of money must be included in the ROI evaluation. In examining options for capital allocations, key financial concepts are used to evaluate alternatives and to determine the best use of an organization's resources. The borrowing of money carries with it an interest charge, while examination of resource allocation and payments over a system life cycle may require consideration of the value of money over time, depreciation, and inflation. Economic equivalence correlates the cash flows associated with different usage alternatives to produce meaningful comparisons for investment decision making. Concepts such as present value may be used to compare the value of money in the present to its value in the future. A dollar today is worth more than a dollar in the future, because money available today can be invested and grow while money spent today cannot. Ignoring inflation, the present value of V_n at n years from the present with a constant discount rate (rate of ROI) of r is given by

$$\text{Present value} = \frac{V_n}{(1 + r)^n} \qquad (5.2)$$

Using Equation 5.2, a cost of V_n can be shifted n years into the past for comparison purposes. Other forms of the present value calculation exist for various assumptions about the growth of money over time; see reference 24 for an overview of engineering economics concepts.
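As a worked illustration of the present value relation, the following sketch discounts a single future amount and then a stream of yearly amounts. The helper names, the 10% discount rate, and the cash flows are hypothetical.

```python
def present_value(future_value, discount_rate, years):
    """Present value of an amount V_n occurring n years from now."""
    return future_value / (1.0 + discount_rate) ** years


def discounted_total(yearly_amounts, discount_rate):
    """Sum of present values; yearly_amounts[n] occurs n years from now."""
    return sum(present_value(v, discount_rate, n)
               for n, v in enumerate(yearly_amounts))


# Hypothetical: 121 units of avoided cost two years from now, 10% discount rate.
print(present_value(121.0, 0.10, 2))
# Hypothetical stream: 100 now, then 110 and 121 in the following two years.
print(discounted_total([100.0, 110.0, 121.0], 0.10))
```

Discounting both the investment and the avoided-cost streams to the same reference year before forming the ROI ratio keeps the comparison consistent over a long support life.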

5.2 PHM Cost-Modeling Terminology and Definitions

This section provides definitions of several concepts that are central to the discussion of PHM costs in this chapter.

A line replaceable unit (LRU) is a general term referring to a generic “black box” electronics unit that is usually designed to common specifications and is readily replaceable on the “line” (i.e., in the field). LRUs are distinguished from shop replaceable units (SRUs) and depot replaceable units (DRUs), which may require additional time, resources, and equipment for replacement and maintenance.

A socket is a unique instance of an installation location for an LRU. One instance of a socket occupied by an engine controller is its location on a particular engine. The socket may be occupied by a single LRU during its lifetime (if the LRU never fails) or by multiple LRUs if one or more LRUs fail and need to be replaced.

Unscheduled maintenance refers to operating a system until failure and then taking appropriate maintenance actions to replace or repair the failed item. The opposite of unscheduled maintenance is preventive maintenance, in which a maintenance action is taken prior to failure at a scheduled interval or in response to an indication provided by a PHM approach.

A fixed-schedule maintenance interval is the interval at which scheduled maintenance is performed. The fixed-schedule maintenance interval is kept constant for all instances of the LRUs occupying all socket instances throughout the system life cycle. The common wisdom that oil should be changed every 3000 miles for personal vehicles represents a fixed-schedule maintenance interval policy.

Precursor-to-failure methodologies refer to methodologies that are dependent on the specific LRU instance they are applied to. Included in this category of PHM approaches are health monitoring (HM) and LRU-dependent fuses. LRU-dependent fuses are assumed to be fabricated concurrently with specific instances of LRUs; for example, they would share LRU-specific variations in manufacturing and materials.

LRU-independent methodologies are independent of the specific LRU instance they are applied to. Included in this category of PHM approaches are LCM and LRU-independent fuses. LRU-independent fuses are fabricated separately from the LRUs and assembled into the LRUs, so they do not share any LRU-specific variations in manufacturing and materials.
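The relationship between sockets and LRUs can be pictured as a small data structure: a socket is a fixed location that records every LRU installed in it over the system life. The class and field names below are hypothetical and are meant only to clarify the terminology, not to represent any particular cost model.

```python
from dataclasses import dataclass, field


@dataclass
class LRU:
    serial: str            # hypothetical identifier for one physical unit
    failed: bool = False


@dataclass
class Socket:
    location: str                                  # e.g., a controller position on one engine
    history: list = field(default_factory=list)    # every LRU ever installed here

    def install(self, lru):
        self.history.append(lru)

    def replace_failed(self, new_lru):
        """Mark the installed LRU as failed and install a replacement."""
        self.history[-1].failed = True
        self.install(new_lru)

    @property
    def replacements(self):
        return max(len(self.history) - 1, 0)


socket = Socket("engine 1 controller position")
socket.install(LRU("A001"))
socket.replace_failed(LRU("A002"))
print(socket.replacements, [lru.serial for lru in socket.history])
```

Tracking costs per socket rather than per LRU makes it straightforward to accumulate all replacements, whether scheduled or unscheduled, that occur at one installation location.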
The remainder of this chapter treats the total cost of ownership of PHM by discussing two major categories of cost-contributing activities that must be considered in an analysis of the ROI of PHM. These categories, implementation costs and cost avoidance, represent the “investment” portion and the “return on” portion of the ROI calculation, respectively.

5.3 PHM Implementation Costs

Implementation costs are the costs associated with the realization of PHM in a system, that is, the achievement of the technologies and support necessary to integrate and incorporate PHM into new or existing systems. The implementation cost is the cost of enabling the determination of remaining useful life (RUL) for the system. The costs of implementing PHM can be categorized as recurring, nonrecurring, or infrastructural depending on the frequency and role of the corresponding activities. “Implementation” may be decomposed into many separate activities at different levels of complexity and detail. The following sections discuss the major groups of implementation costs while maintaining generality and breadth. This broadness reflects the way implementation costs are incorporated into ROI models for PHM; an organization will likely not be able to put an exact “price tag” on very specific activities. Implementation cost models can and should be adapted to meet the needs of a particular application and can be expanded as knowledge of the PHM devices and their use increases.


5.3.1

Nonrecurring Costs

Nonrecurring costs are associated with one-time-only activities that typically occur at the beginning of the timeline of a PHM program, although disposal or recycling nonrecurring costs would occur at the end. Nonrecurring costs can be calculated on a per-LRU or per-socket basis or per group of LRUs or sockets. The development of hardware and software is the most prominent nonrecurring cost. Hardware cost modeling will vary depending on manufacturing specifications, country of origin, level of complexity, and materials. LRU-dependent prognostics are manufactured concurrently with the device whose failure they are intended to indicate; if a general cost model can be developed for the electronic components of interest, it may be a reasonable assumption that the costs of materials, parts, and labor for the manufacturing of the prognostic device will be equivalent. This simplifies the cost modeling of the LRU-dependent prognostics but not the LRU-independent approaches, which need not have anything in common with the device they are monitoring. The development of PHM software may be outsourced and treated as a single contract amount or may be modeled according to standard software cost models such as COCOMO [25]. COCOMO and other software cost models provide cost estimates based on the source lines of code (SLOC), the programming language used, and the manpower needed for development. Both hardware and software design include testing and qualification to ensure performance, compatibility with existing architectures, and compliance with standards and requirements. Other nonrecurring costs include the costs of training, documentation, and integration. Training costs arise from the need to develop training materials to instruct and educate maintainers, operators, and logistics personnel as to the use and maintenance of PHM, in addition to the cost of removing these workers from their ordinary duties to attend training.
PHM hardware and software must have documentation to serve as guides and as usage manuals, while integration costs refer to the costs of modifying and adapting systems to incorporate PHM. The specific nonrecurring cost is calculated as

$$C_{\mathrm{NRE}} = C_{\mathrm{dev\_hard}} + C_{\mathrm{dev\_soft}} + C_{\mathrm{training}} + C_{\mathrm{doc}} + C_{\mathrm{int}} + C_{\mathrm{qual}} \tag{5.3}$$

where $C_{\mathrm{dev\_hard}}$ is the cost of hardware development; $C_{\mathrm{dev\_soft}}$ is the cost of software development; $C_{\mathrm{training}}$ is the cost of training; $C_{\mathrm{doc}}$ is the cost of documentation; $C_{\mathrm{int}}$ is the cost of integration; and $C_{\mathrm{qual}}$ is the cost of testing and qualification.
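As a rough illustration of Equation 5.3, the sketch below sums the six nonrecurring terms and, as one possible way of estimating the software development term, uses the Basic COCOMO effort relation (effort in person-months = a · KLOC^b). The labor rate, the SLOC count, and every cost figure are hypothetical placeholders, and the choice of the "embedded" mode is an assumption rather than a recommendation from the text.

```python
from dataclasses import dataclass

# Basic COCOMO (Boehm, 1981) coefficients: effort = a * KLOC**b person-months.
COCOMO_MODES = {
    "organic": (2.4, 1.05),
    "semidetached": (3.0, 1.12),
    "embedded": (3.6, 1.20),
}


def cocomo_effort_pm(sloc: int, mode: str = "embedded") -> float:
    """Estimated development effort in person-months for `sloc` source lines."""
    a, b = COCOMO_MODES[mode]
    return a * (sloc / 1000.0) ** b


@dataclass
class NonrecurringCosts:
    """The six terms of Equation 5.3, all in the same currency."""
    dev_hard: float
    dev_soft: float
    training: float
    doc: float
    integration: float
    qual: float

    def total(self) -> float:
        return (self.dev_hard + self.dev_soft + self.training
                + self.doc + self.integration + self.qual)


# Hypothetical inputs, for illustration only.
LABOR_RATE_PER_PM = 15_000.0  # assumed cost of one person-month
software_cost = cocomo_effort_pm(10_000, "embedded") * LABOR_RATE_PER_PM
nre = NonrecurringCosts(dev_hard=250_000.0, dev_soft=software_cost,
                        training=40_000.0, doc=15_000.0,
                        integration=60_000.0, qual=80_000.0)
print(f"C_NRE = {nre.total():,.0f}")
```

If the software is outsourced under a single contract, as the text also allows, `dev_soft` would simply be set to the contract amount.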

5.3.2

Recurring Costs

Recurring costs are associated with activities that occur continuously or regularly during the PHM program. As with nonrecurring costs, some of these costs can be viewed as an additional charge for each instance of an LRU or for each socket (or for a group of LRUs or sockets). The recurring cost is calculated as

$$C_{\mathrm{REC}} = C_{\mathrm{hard\_add}} + C_{\mathrm{assembly}} + C_{\mathrm{test}} + C_{\mathrm{install}} \tag{5.4}$$

where $C_{\mathrm{hard\_add}}$ is the cost of hardware in each LRU (e.g., sensors, chips, extra board area) and may include the cost of additional parts or manufacturing or the cost of hardware for each socket (such as connectors and sensors); $C_{\mathrm{assembly}}$ is the cost of assembly, installation, and functional testing of the hardware in each LRU or the cost of assembly of hardware for each socket or for each group of sockets; $C_{\mathrm{test}}$ is the cost of functional testing of hardware for each socket or for each group of sockets; and $C_{\mathrm{install}}$ is the cost of installation of hardware for each socket or for each group of sockets, which includes the original installation and reinstallation upon failure, repair, or diagnostic action.

5.3.3

Infrastructure Costs

Unlike recurring and nonrecurring costs, infrastructure costs are associated with the support features and structures necessary to sustain PHM over a given activity period and are characterized in terms of the ratio of money to a period of activity (i.e., dollars per operational hour, dollars per mission, dollars per year). During a mission or use period, the PHM device may be collecting, processing, analyzing, storing, and relaying data. These activities constitute the data management needed to implement PHM and are continual throughout the life of the PHM program. The addition of PHM to an LRU imposes a cost associated with the extra time for maintainers, diagnosticians, and other personnel to read and relay the information provided by PHM to render a decision about the timing and content of maintenance actions. As with the LRUs that they monitor, PHM devices may also require maintenance over their life cycles, including repairs and upgrades. Maintenance of the PHM devices may require the purchase of repair expendables (consumables) or ordering of new parts. The labor required for such maintenance contributes to the infrastructure costs. Lastly, retraining or “continuous education” is an infrastructure cost, ensuring that personnel are prepared to use and maintain the PHM devices as intended. The infrastructure costs are calculated as

$$C_{\mathrm{INF}} = C_{\mathrm{prog\_maintenance}} + C_{\mathrm{decision}} + C_{\mathrm{retraining}} + C_{\mathrm{data}} \tag{5.5}$$

where $C_{\mathrm{data}}$ is the cost of data management, including the costs of data archiving, data collection, data analysis, and data reporting; $C_{\mathrm{prog\_maintenance}}$ is the cost of maintenance of the prognostic devices; $C_{\mathrm{decision}}$ is the cost of decision support; and $C_{\mathrm{retraining}}$ is the cost of retraining to educate personnel in the use of PHM.
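The three cost categories carry different units: the nonrecurring cost is a one-time amount, the recurring cost is charged per LRU or per socket, and the infrastructure cost is a rate per period of activity. The sketch below shows one way they might be rolled up into a single implementation cost. The roll-up formula itself, the function names, and all numbers are assumptions made for illustration; the text does not prescribe this aggregation.

```python
def recurring_cost(hard_add, assembly, test, install):
    """Equation 5.4: cost charged per LRU instance (or per socket)."""
    return hard_add + assembly + test + install


def infrastructure_cost_rate(prog_maintenance, decision, retraining, data):
    """Equation 5.5: cost per unit of activity (here, per operational hour)."""
    return prog_maintenance + decision + retraining + data


def implementation_cost(c_nre, c_rec, n_lrus, c_inf_rate, operational_hours):
    """Assumed roll-up: one-time cost + per-LRU cost + rate-based cost."""
    return c_nre + c_rec * n_lrus + c_inf_rate * operational_hours


# Hypothetical inputs, for illustration only.
c_rec = recurring_cost(hard_add=120.0, assembly=30.0, test=15.0, install=35.0)
c_inf = infrastructure_cost_rate(prog_maintenance=0.5, decision=0.25,
                                 retraining=0.25, data=1.0)
total = implementation_cost(c_nre=500_000.0, c_rec=c_rec, n_lrus=40,
                            c_inf_rate=c_inf, operational_hours=20_000)
print(f"C_REC per LRU = {c_rec}, C_INF per hour = {c_inf}, total = {total}")
```

In a discrete event simulation these terms would not be summed up front but charged at the points in the socket timeline where they occur.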

5.3.4

Nonmonetary Considerations and Maintenance Culture

The implementation of PHM imparts additional burdens onto systems that cannot always be easily measured and considered in monetary terms. The physical hardware apparatuses used in PHM will consume volumetric space and alter the weight (loading) of the systems where they are installed. The time needed for PHM data to be processed, stored, and analyzed to render a maintenance decision is an additional metric of importance. Space, weight, time, and cost (SWTC) are the dimensions in which PHM activities could be fully expressed. Not every one of these dimensions will be useful or needed for a particular analysis; however, awareness of these physical and time-related factors can be leveraged to calculate the nonmonetary impositions and potential benefits associated with PHM. Examples of these nonmonetary quantities are given in Table 5.1.

Table 5.1: Categories of Nonmonetary Considerations for PHM

Maintenance culture has been studied to identify areas of improvement following accidents or failures, to determine the most effective ways of training maintenance crews, and as part of resource management, with 12-15% of accidents in the commercial aviation industry attributable to maintenance errors [26]. Analyses of the maintenance culture underscore the complexity of decision making within the industry and point to the underlying difficulties of effecting organizational changes [27, 28]. Organizations seeking to implement changes within their daily operations are confronted by direct and tangible impacts such as new equipment and fewer personnel that can be correlated to different costs. However, the role of seemingly intangible elements has proved important to the practices and business culture of productive and efficient organizations and has been studied within the contexts of industrial and organizational psychology, group dynamics, human factors, and team and training effectiveness [29]. The aviation workplace culture has been examined as an environment in which high-pressure, safety-critical decisions must be made in a team atmosphere. PHM represents a departure from traditional maintenance procedures; to implement it will require a change in the maintenance culture such that maintainers are comfortable and educated to use PHM as intended. This cost of changing the maintenance culture may be quantified as a continuous education cost beyond standard training. System architects and designers would eventually transition to placing greater responsibility in PHM, ultimately to remove redundancy and to make other changes necessary to allow the full value of PHM to be realized. While this is not a tangible or engineering cost, it is nonetheless a real factor contributing to the adoption of PHM.


5.4


Cost Avoidance

Prognostics provide estimations of RUL in terms that are useful to the maintenance decision-making process. The decision process can be tactical (real-time interpretation and feedback) or strategic (maintenance planning). All PHM approaches are essentially the extrapolation of trends based on recent observations to estimate RUL [30]. Unfortunately, the calculation of RUL alone does not provide sufficient information to form a decision or to determine corrective action. Determining the best course of action requires the evaluation of criteria such as availability, reliability, maintainability, and life-cycle cost. Cost avoidance is the value of changes to availability, reliability, maintainability, and failure avoidance. The primary opportunities for obtaining cost avoidance from the application of PHM to systems are failure avoidance and minimization of the loss of remaining system life. Field failure of systems is often very expensive. If all or some fraction of the field failures can be avoided, then cost avoidance may be realized by minimizing the cost of unscheduled maintenance. Avoidance of failures can also increase availability, reduce the risk of loss of the system, and increase human safety depending on the type of system considered. Failures avoided fall into two types: (1) real-time failure avoidance during operation that would otherwise result in the loss of the system or loss of the function the system was performing (i.e., loss of mission) and (2) warning of future (but not imminent) failure that allows preventative maintenance to be performed at a place and time that are convenient. PHM may allow minimization of the amount of remaining useful life thrown away when performing scheduled maintenance. Cost can be avoided if the system components are used for their full lifetimes rather than removing and disposing of system components while they still possess significant RUL. 
The two opportunities discussed above are the primary targets for most PHM business cases; however, other cost avoidance opportunities, discussed in the remainder of this section, may exist depending on the application of the system.

Logistics footprint reduction: Reduction in the system’s logistics footprint may be possible through better spares management (quantity, refreshment, and locations), better use of and control over inventory, and minimization of external test equipment. Note, this does not necessarily imply that the quantity of spares required will be reduced; in fact, a successful PHM program could increase the number of spares needed compared to a non-PHM unscheduled maintenance approach.

Repair cost reduction: PHM may reduce the costs of repair by enabling better fault isolation (decreased inspection time, decreased troubleshooting time, less equipment removal [31]). PHM may also reduce collateral damage during repair because of better fault isolation.

Reduction in redundancy: In the long term, it may be possible to reduce critical system redundancy for selected subsystems. This will not happen until and unless PHM approaches are proven effective for the subsystems.

Reduction in NFFs (see footnote 3): PHM approaches may be able to reduce the quantity or reduce the cost of resolving NFFs. A substantial portion of the maintenance cost of many systems is due to NFFs. It may be possible to construct an entire business case for electronics PHM based on only the reduction in NFFs.

Eased design and qualification of future systems: The data collected through the use of PHM is an extremely valuable resource for understanding the actual environmental stresses and product usage conditions seen by a product during its field use. This knowledge can be used to refine the design, refine reliability assessments, improve uncertainty estimates, and enhance knowledge of failure modes and behaviors. Designers of a product often cannot anticipate how that product is actually used; for example, designers rated the maximum load of high-mobility multipurpose wheeled vehicles (HMMWVs) at 2500 lb; in combat zones, they have been loaded to more than 4530 lb, that is, 181% of their maximum load [32].

Warranty verification: PHM can be used to verify the field usage conditions for products returned for warranty claims, thereby allowing products that have been used in environmental conditions that void the warranty to be readily identified and warranty claims for them appropriately managed.

Reduced waste stream: For some systems, PHM may lead to a reduction in the end-of-life disposal costs for the system and thereby a reduction in product take-back costs.

Footnote 3: No-fault-founds (NFFs), also known as cannot-duplicates (CNDs) or no-trouble-founds (NTFs), occur when an originally reported mode of failure cannot be duplicated and therefore the potential defect cannot be fixed. Many organizations have policies regarding the management of NFFs such that, depending on the number of occurrences of an NFF in a specific LRU, the NFF LRUs are put back in service or contributed back into the spares pool.

Not all of the opportunities listed above are applicable to every type of system; however, a combination of the opportunities has to be targeted or a business case cannot be substantiated. Several key concepts differentiate the cost avoidance modeling from implementation cost modeling. First, the temporal order of events in the lifetime of an LRU or socket affects the calculation of cost avoidance (this is true whether financial costs are included or not).
The cost avoidance is heavily influenced by the sequencing (in time) of failures and maintenance actions, whereas implementation costs are not time sequence dependent and can be modeled independently of each other in many cases, despite sharing cost-contributing factors. Second, irrespective of the combination of criteria for cost avoidance under consideration, corresponding measures of the uncertainty associated with the calculation must be incorporated. It is the inclusion and comprehension of the corresponding uncertainties (decision making under uncertainty) that is at the heart of being able to develop a realistic business case that addresses prognostic requirements. The next section addresses the use of PHM for maintenance planning. It quantifies how to determine the cost avoidance associated with PHM for the realization of failure avoidance and the minimization of the loss of RUL.

5.4.1

Maintenance Planning Cost Avoidance

The modeling discussed in this section is targeted at finding the optimum balance between avoiding failures and throwing away RUL with fixed-interval scheduled maintenance. Two systems, fielded and used under similar conditions, will not generally fail at exactly the same time due to differences in their manufacturing and materials, and due to differences in the environmental stress history they experience. Therefore, system reliability is generally represented as a probability distribution over time or in relation to an environmental stress driver. Likewise, the ability of a PHM approach to accurately predict RUL is not perfect due to sensor uncertainties, sensor gaps, sensor locations, uncertainties in algorithms and models used, or other sources. Practically speaking, these uncertainties make 100% failure avoidance impossible to obtain; optimal maintenance planning for systems effectively becomes a trade-off between the potentially high costs of failure and the costs of throwing away remaining system life in order to avoid failures.


Although many applicable models for single- and multiunit maintenance planning have appeared [33, 34], the majority of the models assume that monitoring information is perfect (without uncertainty) and complete (all units are monitored identically), that is, maintenance planning can be performed with perfect knowledge as to the state of each unit. For many types of systems, and especially electronic systems, these are not good assumptions, and maintenance planning, if possible at all, becomes an exercise in decision making under uncertainty with sparse data. The perfect monitoring assumption is especially problematic when the PHM approach is LCM because LCM does not depend on precursors. Thus, for electronics, LCM processes do not deliver any measures that correspond exactly to the state of a specific instance of a system. Previous work that treats imperfect monitoring includes references 35 and 36. Perfect but partial monitoring has been previously treated [37]. This section describes a stochastic decision model [38] that enables the optimal interpretation of LCM damage accumulation or HM precursor data and applies to failure events that appear to be random or appear to be clearly caused by defects. Specifically, the model is targeted at addressing the following questions. First, how do we determine on an application-specific basis when the reliability of electronics has become predictable enough to warrant the application of PHM-based scheduled maintenance concepts? Note that predictability in isolation is not necessarily a suitable criterion for PHM versus non-PHM solutions; for example, if the system reliability is predictable and very reliable, it would not make sense to implement a PHM solution.
Second, how can PHM results be interpreted so as to provide value, that is, how can a business case be constructed given that the forecasting ability of PHM is subject to uncertainties in the sensor data collected, the data reduction methods, the failure models applied, the material parameters assumed in the models, and so on? The interpretation boils down to determining an optimal safety margin on LCM prediction and prognostic distance for HM.

5.4.2

Discrete Event Simulation Maintenance Planning Model

The maintenance planning model discussed here accommodates variable TTF of LRUs and variable RUL estimates associated with PHM approaches implemented within LRUs. The model considers both single and multiple sockets within a larger system. Discrete event simulation is used to follow the life of individual socket instances from the start of their field lives to the end of their operation and support (see footnote 4). Discrete event simulation allows for the modeling of a system as it evolves over time by capturing the changes as separate events (as opposed to continuous simulation where the system evolves as a continuous function). The evolutionary unit need not be time; it could be thermal cycles, or some other unit relevant to the particular failure mechanisms addressed by the PHM approach. Discrete event simulation has the advantage of defining the problem in terms of an intuitive basis, that is, a sequence of events, thus avoiding the need for formal specification. Discrete event simulation is widely used for maintenance and operations modeling [e.g., 39-41] and has also previously been used to model PHM activities [42]. The model discussed in this chapter treats all inputs to the discrete event simulation as probability distributions, that is, a stochastic analysis is used, implemented as a Monte Carlo simulation. Various maintenance interval and PHM approaches are distinguished by how sampled TTF values are used to model PHM RUL forecasting distributions.

To assess PHM, relevant failure mechanisms are segregated into two types. Failure mechanisms that are random from the viewpoint of the PHM methodology are failure mechanisms that the PHM methodology is not collecting any information about (nondetection events). These failure mechanisms may be predictable but are outside the scope of the PHM methods applied. The second type refers to failure mechanisms that are predictable from the viewpoint of the PHM methodology; probability distributions can be assigned for these failure mechanisms.

For the purposes of cost model formulation, PHM approaches are categorized as (defined in detail in Section 5.2) (a) a fixed-schedule maintenance interval; (b) a variable maintenance interval schedule for LRU instances that is based on inputs from a precursor-to-failure methodology; and (c) a variable maintenance interval schedule for LRU instances that is based on an LRU-independent methodology. Note, for simplicity, the model formulation is presented based on “time” to failure measured in operational hours; however, the relevant quantity could be a nontime measure.

Footnote 4: Alternatively, one could follow the lifetime of LRUs through their use, repair, reuse in other sockets, and disposal.

5.4.3

Fixed-Schedule Maintenance Interval

A fixed-schedule maintenance interval is selected that is kept constant for all instances of the LRU that occupy a socket throughout the system life cycle. In this case the LRU is replaced on a fixed interval (measured in operational hours), that is, time-based prognostics. This is analogous to mileage-based oil changes in automobiles.
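The fixed-interval policy can be sketched as a small discrete event simulation of one socket. In the sketch below, an LRU whose sampled TTF falls before the interval fails in service (unscheduled maintenance); otherwise it is replaced at the interval and its unused life is thrown away. The triangular TTF distribution, its parameters, the handling of the final partial event, and the function name are all assumptions chosen for illustration.

```python
import random


def simulate_fixed_interval(interval, support_life, ttf_low, ttf_mode,
                            ttf_high, rng):
    """Follow one socket; return (failures, scheduled, life_thrown_away)."""
    t = 0.0
    failures = 0
    scheduled = 0
    life_thrown_away = 0.0
    while True:
        # Sample the TTF of the LRU instance now occupying the socket.
        ttf = rng.triangular(ttf_low, ttf_high, ttf_mode)
        step = min(ttf, interval)
        if t + step > support_life:
            break  # next event falls after the end of support
        if ttf < interval:
            failures += 1  # failed before the scheduled replacement
        else:
            scheduled += 1  # replaced on schedule, remaining life discarded
            life_thrown_away += ttf - interval
        t += step
    return failures, scheduled, life_thrown_away


# Hypothetical sweep over candidate intervals (operational hours).
for interval in (600.0, 800.0, 1000.0):
    rng = random.Random(42)
    runs = [simulate_fixed_interval(interval, 20_000.0, 500.0, 1000.0,
                                    1500.0, rng) for _ in range(200)]
    mean_fail = sum(r[0] for r in runs) / len(runs)
    mean_waste = sum(r[2] for r in runs) / len(runs)
    print(f"interval={interval:.0f}: mean failures={mean_fail:.1f}, "
          f"mean life thrown away={mean_waste:.0f} h")
```

Shorter intervals reduce failures but discard more remaining life, which is the trade-off the maintenance planning model is meant to balance.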

5.4.4

Precursor to Failure Monitoring

Precursor to failure monitoring approaches are defined as a fuse or other monitored structure that is manufactured with or within the LRUs or as a monitored precursor variable that represents a nonreversible physical process, that is, it is coupled to the manufacturing or material variations of a particular LRU. Health monitoring and LRU-dependent fuses are examples of precursor to failure methods. The parameter to be determined (optimized) is prognostic distance. The prognostic distance is a measure of how long before system failure the prognostic structure or prognostic cell is expected to indicate failure (e.g., in operational hours). The precursor to failure monitoring methodology forecasts a unique TTF distribution for each instance of an LRU based on the instance’s TTF (see footnote 5). For illustration purposes, the precursor to failure monitoring forecast is represented as a symmetric triangular distribution with a most likely value (mode) set to the TTF of the LRU instance minus the prognostic distance (Figure 5.1). The precursor to failure monitoring distribution has a fixed width measured in the relevant environmental stress units (e.g., operational hours in our example) representing the probability of the prognostic structure indicating the precursor to a failure. As a simple example, if the prognostic structure was an LRU-dependent fuse that was designed to fail at some prognostic distance earlier than the system it protects, then the distribution on the right side of Figure 5.1 represents the distribution of fuse failures (the TTF distribution of the fuse). The parameter to be optimized in this case is the prognostic distance assumed for the precursor to failure monitoring forecasted TTF.

Footnote 5: In this model, all failing LRUs are assumed to be maintained via replacement or good-as-new repair; therefore, the time between failure and the time to failure are the same.


Figure 5.1: Precursor to failure monitoring modeling approach. Left panel: LRU TTF PDF, which represents variations in manufacturing and materials. Right panel: PHM structure TTF PDF. Symmetric triangular distributions are chosen for illustration. Note, the LRU TTF PDF (left) and the precursor to failure TTF PDF (right) are not the same (they could have different shapes and sizes).

The model proceeds in the following way: for each LRU TTF distribution sample ($t_1$) taken from the left side of Figure 5.1, a precursor to failure monitoring TTF distribution is created that is centered on the LRU TTF minus the prognostic distance ($t_1 - d$). The precursor to failure monitoring TTF distribution is then sampled, and if the precursor to failure monitoring TTF sample is less than the actual TTF of the LRU instance, the precursor to failure monitoring is deemed successful. If the precursor to failure monitoring distribution TTF sample is greater than the actual TTF of the LRU instance, then precursor to failure monitoring was unsuccessful. If successful, a scheduled maintenance activity is performed and the timeline for the socket is incremented by the precursor to failure monitoring sampled TTF. If unsuccessful, an unscheduled maintenance activity is performed and the timeline for the socket is incremented by the actual TTF of the LRU instance. At each maintenance activity, the relevant costs are accumulated.
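The procedure just described maps directly onto a short Monte Carlo loop. The sketch below is a minimal version under stated assumptions: the LRU TTF is drawn from a triangular distribution, each maintenance event is charged a flat scheduled or unscheduled cost, and the simulation stops at the last event that fits inside the support life. The function names, distribution parameters, and cost values are hypothetical.

```python
import random


def precursor_event(ttf_actual, distance, width, rng):
    """One LRU instance: return (time advanced, failure avoided?)."""
    mode = ttf_actual - distance  # forecast centered on t1 - d
    forecast = rng.triangular(mode - width / 2.0, mode + width / 2.0, mode)
    if forecast < ttf_actual:
        return forecast, True  # scheduled maintenance before failure
    return ttf_actual, False  # missed: unscheduled maintenance at failure


def simulate_socket(distance, width, support_life, ttf_params,
                    cost_scheduled, cost_unscheduled, rng):
    """Follow one socket; return (accumulated cost, avoided, failed)."""
    low, mode, high = ttf_params
    if low - distance - width / 2.0 <= 0.0:
        raise ValueError("forecast could be non-positive; timeline would stall")
    t, cost, avoided, failed = 0.0, 0.0, 0, 0
    while True:
        ttf = rng.triangular(low, high, mode)  # actual TTF of this instance
        step, ok = precursor_event(ttf, distance, width, rng)
        if t + step > support_life:
            break
        t += step
        if ok:
            avoided += 1
            cost += cost_scheduled
        else:
            failed += 1
            cost += cost_unscheduled
    return cost, avoided, failed


# Hypothetical sweep over the prognostic distance (operational hours).
ttf_params = (800.0, 1000.0, 1200.0)
for d in (0.0, 50.0, 100.0, 200.0):
    rng = random.Random(7)
    costs = [simulate_socket(d, 100.0, 20_000.0, ttf_params,
                             1_000.0, 5_000.0, rng)[0] for _ in range(200)]
    print(f"distance={d:.0f} h: mean cost per socket={sum(costs) / len(costs):,.0f}")
```

A larger prognostic distance avoids more failures but shortens the time between maintenance events, so the mean cost would be expected to have a minimum at some intermediate distance that depends on the cost ratio assumed.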

5.4.5

LRU-Independent Methods

In LRU-independent PHM methods, the PHM structure (or sensor) is manufactured independent of the LRUs, that is, the PHM structures are not coupled to a particular LRU’s manufacturing or material variations. An example of an LRU-independent method is LCM. LCM is the process by which a history of environmental stresses (e.g., thermal, vibration) is used in conjunction with PoF models to compute damage accumulated and thereby forecast RUL. The LRU-independent methodology forecasts a unique TTF distribution for each instance of an LRU based on its unique environmental stress history. For illustration purposes, the LRU-independent TTF forecast is represented as a symmetric triangular distribution with a most likely value (mode) set relative to the TTF of the nominal LRU and a fixed width measured in operational hours (Figure 5.2). Other distributions may be chosen, and reference 43 has shown how this distribution may also be derived from recorded environment history.

Figure 5.2: LRU-independent modeling approach. Left panel: LRU TTF PDF, which represents variations in manufacturing and materials. Right panel: PHM structure TTF PDF. Symmetric triangular distributions are chosen for illustration. Note, the LRU TTF PDF (left) and the LRU-independent method TTF PDF (right) are not the same (they could have different shapes and sizes).

The shape and width of the LRU-independent method distribution depend on the uncertainties associated with the sensing technologies and uncertainties in the prediction of the damage accumulated (data and model uncertainty). The variable to be optimized in this case is the safety margin assumed on the LRU-independent method forecasted TTF, that is, the length of time (e.g., in operational hours) before the LRU-independent method forecasted TTF at which the unit should be replaced. The LRU-independent model proceeds in the following way: for each LRU TTF distribution sample (left side of Figure 5.2), an LRU-independent method TTF distribution is created that is centered on the TTF of the nominal LRU minus the safety margin (right side of Figure 5.2). Note that the LRU-independent methods only know about the nominal LRU, not about how a specific instance of an LRU varies from the nominal. The LRU-independent method TTF distribution is then sampled, and if the LRU-independent method TTF sample is less than the actual TTF of the LRU instance, then the LRU-independent method was successful (failure avoided). If the LRU-independent method TTF distribution sample is greater than the actual TTF of the LRU instance, then the LRU-independent method was unsuccessful. If successful, a scheduled maintenance activity is performed and the timeline for the socket is incremented by the LRU-independent method sampled TTF. If unsuccessful, an unscheduled maintenance activity is performed and the timeline for the socket is incremented by the actual TTF of the LRU instance (see footnote 6). In the maintenance models discussed, a random failure component may also be superimposed as discussed in reference 38.
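The only change from the precursor to failure case is where the forecast distribution is centered: on the nominal LRU TTF minus the safety margin, rather than on the sampled TTF of the specific instance. The sketch below sweeps the safety margin and reports the fraction of failures avoided and the mean life thrown away per LRU. Taking the mode of the LRU TTF distribution as the "nominal" TTF is an assumption, as are the function names and all numerical values.

```python
import random


def lru_independent_event(ttf_actual, ttf_nominal, safety_margin, width, rng):
    """One LRU instance: return (time advanced, failure avoided?)."""
    mode = ttf_nominal - safety_margin  # knows only the nominal LRU
    forecast = rng.triangular(mode - width / 2.0, mode + width / 2.0, mode)
    if forecast < ttf_actual:
        return forecast, True
    return ttf_actual, False


def sweep_safety_margin(margins, n_samples, ttf_params, width, seed=0):
    """Map each margin to (fraction avoided, mean life thrown away per LRU)."""
    low, mode, high = ttf_params
    nominal = mode  # assumption: nominal TTF taken as the distribution mode
    results = {}
    for margin in margins:
        rng = random.Random(seed)  # same TTF stream for every margin
        avoided = 0
        thrown_away = 0.0
        for _ in range(n_samples):
            ttf = rng.triangular(low, high, mode)
            step, ok = lru_independent_event(ttf, nominal, margin, width, rng)
            if ok:
                avoided += 1
                thrown_away += ttf - step
        results[margin] = (avoided / n_samples, thrown_away / n_samples)
    return results


# Hypothetical sweep (operational hours).
table = sweep_safety_margin([0.0, 100.0, 200.0, 300.0], 5_000,
                            (800.0, 1000.0, 1200.0), 100.0)
for margin, (frac, waste) in table.items():
    print(f"margin={margin:.0f} h: avoided={frac:.2%}, "
          f"mean life thrown away={waste:.0f} h")
```

Because the forecast ignores instance-to-instance variation, a larger margin is generally needed here than the prognostic distance in the precursor to failure case to reach the same level of failure avoidance.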
The fixed-schedule maintenance, precursor to failure monitoring, and LRU-independent method models are implemented as stochastic simulations, in which a statistically relevant number of sockets are considered in order to construct histograms of costs, availability, and failures avoided. Again, at each maintenance activity, the relevant costs are accumulated. The fundamental difference between the precursor to failure and LRU-independent models is that in the precursor to failure models the TTF distribution associated with the PHM structure (or sensor) is unique to each LRU instance, whereas in the LRU-independent models the TTF distribution associated with the PHM structure (or sensor) is tied to the nominal LRU and is independent of any manufacturing or material variations between LRU instances.

Footnote 6: LRU-independent fuses and canary devices may require replacement for each alert that they provide whether that alert is a false positive or not. After the PHM devices are removed for maintenance, to download data, or for other activities, reinstallation follows.

5.4.6

Discrete Event Simulation Implementation Details

The model follows the history of a single socket or a group of sockets from time zero to the end of support life for the system. To generate meaningful results, a statistically relevant number of sockets (or systems of sockets) are modeled and the resulting cost and other metrics are presented in the form of histograms. The scheduled and unscheduled costs computed for the sockets at each maintenance event are given by

$$C_{\mathrm{socket}\,i} = f\,C_{\mathrm{LRU}\,i} + (1 - f)\,C_{\mathrm{LRU\,repair}\,i} + f\,T_{\mathrm{replace}\,i}\,V + (1 - f)\,T_{\mathrm{repair}\,i}\,V \tag{5.6}$$

where $C_{\mathrm{socket}\,i}$ is the life-cycle cost of socket i; $C_{\mathrm{LRU}\,i}$ is the cost of procuring a new LRU; $C_{\mathrm{LRU\,repair}\,i}$ is the cost of repairing an LRU in socket i; $f$ is the fraction of maintenance events on socket i that require replacement of the LRU in socket i with a new LRU; $T_{\mathrm{replace}\,i}$ is the time to replace the LRU in socket i; $T_{\mathrm{repair}\,i}$ is the time to repair the LRU in socket i; and $V$ is the value of time out of service. Note, the values of $f$ and $V$ generally differ depending on whether the maintenance activity is scheduled or unscheduled.

As the discrete event simulation tracks the actions that affect a particular socket during its life cycle, the implementation costs are inserted at the appropriate locations (Figure 5.3). At the beginning of the life cycle, the nonrecurring cost is applied. The recurring costs at the LRU level and at the system level are first applied here and subsequently applied at each maintenance event that requires replacement of an LRU ($C_{\mathrm{LRU}\,i}$, as in Equation 5.6). The recurring LRU-level costs include the base cost of the LRU regardless of the maintenance approach. Discrete event simulations that compare alternative maintenance approaches to determine the ROI of PHM must include the base cost of the LRU itself without any PHM-specific hardware. If discrete event simulation is used to calculate the life-cycle cost for a socket under an unscheduled maintenance policy, then the recurring LRU-level cost is reduced to the cost of replacing or repairing an LRU upon failure. Under a policy involving PHM, the failure of an LRU results in additional costs for the hardware, assembly, and installation of the components used to perform PHM. The infrastructure costs are distributed over the course of the socket’s life cycle and are charged periodically.

Figure 5.3: Temporal ordering of implementation cost inclusion in the discrete event simulation. Labels on the timeline: LRU/socket associated nonrecurring cost; system recurring cost; base LRU recurring cost; PHM LRU recurring cost; maintenance event requiring a replacement LRU; infrastructure cost (charged periodically).

The model assumes that the TTF distribution represents manufacturing and material variations from LRU to LRU. The range of possible environmental stress histories that sockets may see is modeled using an environmental stress history distribution. Note, the environmental stress history distribution need not be used if the TTF distribution for the LRUs includes environmental stress variations. The environmental stress history distribution is not used with the precursor to failure or LRU-independent models. Random TTFs are characterized by a uniform distribution with a height equal to the average random failure rate per year and a width equal to the inverse of the average random failure rate. Uncertainty, which must be propagated throughout the life-cycle simulations of systems, is present at multiple levels in the calculation of RUL. The data collected by the prognostic devices, the material inputs that reliability modeling depends on, and the underlying assumptions of electronic failure behavior that are applied to produce reliability estimates may not always be accurate.
Uncertainties can be handled using different approaches; however, the most general method of handling uncertainties is to use a Monte Carlo analysis approach in which each input parameter is optionally represented as a probability distribution. The CALCE implementation of the maintenance modeling discussed in this chapter is implemented as a Monte Carlo analysis that follows a statistically relevant number of sockets over their support lives. Additional model implementation details, including a flow chart that describes the discrete event simulation process, are available [38].
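As a minimal sketch of the Monte Carlo approach described above, the following Python fragment follows a population of sockets over a support life under an unscheduled maintenance policy and reports the mean life-cycle cost. It is not the CALCE implementation: the triangular TTF distribution, the function names, and every numerical value are illustrative assumptions.

```python
import random


def simulate_socket(support_life_h, ttf_mode_h, ttf_width_h, cost_per_failure, rng):
    """Follow one socket to the end of its support life (unscheduled maintenance).

    Each LRU instance gets a TTF sampled from an assumed triangular distribution;
    every failure inside the support life adds one replacement cost.
    """
    elapsed_h = 0.0
    cost = 0.0
    half_width = ttf_width_h / 2.0
    while True:
        ttf = rng.triangular(ttf_mode_h - half_width, ttf_mode_h + half_width, ttf_mode_h)
        elapsed_h += ttf
        if elapsed_h > support_life_h:
            return cost  # the current LRU outlives the support life
        cost += cost_per_failure


def mean_life_cycle_cost(n_sockets, support_life_h, ttf_mode_h, ttf_width_h,
                         cost_per_failure, seed=0):
    """Mean life-cycle cost over a population of independently simulated sockets."""
    rng = random.Random(seed)
    total = sum(
        simulate_socket(support_life_h, ttf_mode_h, ttf_width_h, cost_per_failure, rng)
        for _ in range(n_sockets)
    )
    return total / n_sockets
```

A fuller model would also charge PHM hardware costs per LRU and infrastructure costs periodically, as described in the text; those terms are left out here to keep the event loop visible.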

5.4.7 Operational Profile

The operational profile of systems equipped with PHM dictates how the information provided by PHM may be used to affect the maintenance and usage schedules. The effective costs associated with maintenance actions depend on when (and where) actions are indicated relative to some operational cadence. Cadences may be prescribed by business constraints, regulations, or mission requirements and may be subject to change as user requirements shift. The cadence may be best described according to a probabilistic model rather than a timeline, that is, a defined probability of a maintenance request being issued before, during, or after a mission or particular type of use. The implications of the safety margins or prognostic distances will vary with the difference in cadence to affect the timing of maintenance actions. The operational profile is reflected in the maintenance modeling by varying the value of the parameter V in Equation 5.6. The value of an hour out of service, V, is set to a specific value if the maintenance is scheduled, but if the maintenance is unscheduled, the value of V is given by the data in Table 5.2.


Table 5.2: Data Defining Unscheduled Maintenance Operational Profile

“Before mission” represents maintenance requirements that occur while preparing to place the system into service, for example, while loading passengers onto the aircraft for a scheduled commercial flight. “During mission” means that the maintenance requirement occurs while the system is performing a service and may result in interruption of that service, for example, making an emergency landing or abandoning a HMMWV by the side of the road during a convoy. “After mission” represents time that the system is not needed, for example, the period of time from midnight to 6:00 am when the commercial aircraft could sit idle at a gate. When an unscheduled maintenance event occurs, a random number generator is used to determine the portion of the operational profile the event is in and the corresponding value (V) used in the analysis. This type of valuation in the discrete event simulation is only useful if a stochastic analysis that follows the life of a statistically relevant number of sockets is used.
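The random selection of an operational-profile portion can be sketched as follows. The probabilities and values of V below are hypothetical placeholders, not the contents of Table 5.2, and the function name is an assumption made for illustration.

```python
import random

# Hypothetical stand-in for Table 5.2:
# (portion of the operational profile, probability, value V per hour out of service)
PROFILE = [
    ("before mission", 0.2, 5000.0),
    ("during mission", 0.6, 20000.0),
    ("after mission", 0.2, 500.0),
]


def sample_unscheduled_value(rng, profile=PROFILE):
    """Pick the profile portion for one unscheduled event and return (portion, V)."""
    u = rng.random()  # uniform draw in [0, 1)
    cumulative = 0.0
    for portion, probability, value in profile:
        cumulative += probability
        if u < cumulative:
            return portion, value
    # Guard against floating-point round-off in the cumulative sum.
    return profile[-1][0], profile[-1][2]


if __name__ == "__main__":
    print(sample_unscheduled_value(random.Random(0)))
```

Because V is drawn per event, its effect on cost only becomes meaningful when averaged over many simulated sockets, which is the point made in the paragraph above.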

5.5 Example PHM Cost Analysis

The baseline data assumptions used to demonstrate the model in this chapter are given in Table 5.3. All of the variable inputs to the model can be treated as probability distributions or as fixed values; however, for example purposes, only the TTFs of the LRUs and the PHM structures have been characterized by probability distributions. Note, all of the life-cycle cost results provided in the remainder of this chapter are the mean life-cycle costs from a probability distribution of life-cycle costs generated by the model.

Table 5.3: Data Assumptions for Example Cases Presented in This Section


5.5.1 Single-Socket Model Results

Figure 5.4 shows the fixed-schedule maintenance interval results. Ten thousand sockets were simulated in a Monte Carlo analysis and the mean life-cycle costs were plotted. The general characteristics in Figure 5.4 are intuitive: for short scheduled maintenance intervals, virtually no expensive unscheduled maintenance occurs, but the life-cycle cost per unit is high because large amounts of RUL in the LRUs are thrown away. For long scheduled maintenance intervals virtually every LRU instance in a socket fails prior to the scheduled maintenance activity and the life-cycle cost per unit becomes equivalent to unscheduled maintenance. For some scheduled maintenance interval between the extremes, the life-cycle cost per unit is minimized. If the TTF distribution for the LRU had a width of zero, then the optimum fixed-schedule maintenance interval would be exactly equal to the forecasted TTF. As the forecasted TTF distribution for the LRU becomes wider (i.e., the forecast is less well defined), a practical fixed-schedule maintenance interval becomes more difficult to find and the best solution approaches an unscheduled maintenance model. Figure 5.5 shows example results for various widths of the LRU TTF distribution as a function of the safety margin and prognostic distances associated with the precursor to failure and LRU-independent models. Several general trends are apparent. First, the width of the LRU TTF distribution has little effect on the precursor to failure PHM method results. This result is intuitive in that in the precursor to failure case the PHM structures are coupled to the LRU instances and track whatever manufacturing or material variation they have, thereby also reflecting the LRU TTF distribution. The degree to which the LRU-to-LRU variations are removed from the problem depends on the degree of coupling between the LRU manufacturing and materials and the PHM structure manufacturing and materials. 
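The fixed-schedule trade-off discussed for Figure 5.4 can be illustrated with a small Monte Carlo sketch. This is not the model that produced Figure 5.4; the triangular TTF distribution, the cost values, and the function names are assumptions chosen only to show why very short and very long intervals are both expensive.

```python
import random


def fixed_interval_socket_cost(interval_h, support_life_h, ttf_mode_h, ttf_width_h,
                               cost_scheduled, cost_unscheduled, rng):
    """Life-cycle cost of one socket under a fixed-schedule maintenance interval."""
    elapsed_h = 0.0
    cost = 0.0
    half_width = ttf_width_h / 2.0
    while True:
        ttf = rng.triangular(ttf_mode_h - half_width, ttf_mode_h + half_width, ttf_mode_h)
        if ttf < interval_h:
            # The LRU fails before its scheduled replacement: unscheduled maintenance.
            elapsed_h += ttf
            event_cost = cost_unscheduled
        else:
            # The LRU is replaced on schedule; its remaining life is thrown away.
            elapsed_h += interval_h
            event_cost = cost_scheduled
        if elapsed_h > support_life_h:
            return cost
        cost += event_cost


def mean_fixed_interval_cost(interval_h, n_sockets, support_life_h, ttf_mode_h,
                             ttf_width_h, cost_scheduled, cost_unscheduled, seed=0):
    """Mean life-cycle cost per socket for one candidate maintenance interval."""
    rng = random.Random(seed)
    total = sum(
        fixed_interval_socket_cost(interval_h, support_life_h, ttf_mode_h, ttf_width_h,
                                   cost_scheduled, cost_unscheduled, rng)
        for _ in range(n_sockets)
    )
    return total / n_sockets
```

Sweeping `interval_h` over a range of values and plotting the result gives a curve of the same general shape as the one described above: many cheap replacements at short intervals, mostly expensive failures at long intervals, and a minimum in between.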
Figure 5.4: Variation of the effective life-cycle cost per socket with the fixed-schedule maintenance interval (10,000 sockets simulated with no random failures assumed). [Horizontal axis: fixed-schedule maintenance interval (operational hours); curves for TTF distribution widths of 1000, 2000, 4000, 6000, 8000, and 10,000 hours.]

Figure 5.5: Variation of the effective life-cycle cost per socket with the safety margin and prognostic distance for various LRU TTF distribution widths and constant PHM structure TTF width (10,000 sockets simulated). [Panels: LRU independent (horizontal axis: safety margin, operating hours) and precursor to failure (horizontal axis: prognostic distance, operating hours); curves for 1000, 2000, and 4000 hr TTF widths with a 1000 hr PHM width.]

Alternatively, the LRU-independent PHM method is sensitive to the LRU TTF distribution width because it is uncoupled from the specific LRU instance and can only base its forecast of failure on the performance of a nominal LRU. A second observation is that the optimum safety margin decreases as the width of the LRU TTF distribution decreases. This is also intuitive because as the reliability becomes more predictable (i.e., a narrower forecasted LRU TTF distribution width), the safety margin that needs to be applied to the PHM predictions also drops.

Figure 5.6 shows example results for various widths of the PHM associated distribution (constant LRU TTF distribution width) as a function of the safety margin and prognostic distances associated with the precursor to failure and LRU-independent models. In this case, both PHM approaches are sensitive to the width of their distributions.

General observations from Figures 5.5 and 5.6 are that 1) the LRU-independent model is highly dependent on the LRU's TTF distribution, while 2) precursor to failure methods are approximately independent of the LRU's TTF distribution. With all other factors being equal (ceteris paribus), 3) optimal prognostic distances for precursor methods are always smaller than optimal safety margins for LRU-independent methods, and therefore, precursor to failure PHM methods will always result in lower life-cycle cost solutions than LRU-independent methods. The assumption in 3) is that equivalency is maintained between the LRUs and between the shapes and sizes of the distribution associated with the PHM approach. Any comparison between the precursor to failure approach and the LRU-independent approach should be performed with the assumption that both are possible choices, i.e., that there is a precursor to failure method that is applicable; there may not be
(especially for application to electronic systems). An example business case construction for the single-socket case is given in Section 5.5.3.

Figure 5.7 shows an example with a random failure rate of 10% included in the simulation. Figure 5.7 also includes the associated failures avoided. In all cases the failures avoided when random failures are included are lower than when random failures are not included; however, the change in the optimum safety margin or prognostic distance is small. As the safety margin or prognostic distance increases, the failures avoided approach 100% in all cases (with and without random failures included). However, for the example data used in this chapter, safety margins or prognostic distances must be increased substantially beyond the range plotted in Figure 5.7 for the cases with random failures to approach 100%.

Figure 5.6: Variation of the effective life-cycle cost per socket with the safety margin and prognostic distance for various PHM structure TTF and constant LRU TTF distribution widths (10,000 sockets simulated). [Panels: LRU independent and precursor to failure; horizontal axis: prognostic distance (operating hours); curves for a 2000 hr TTF width with 1000, 2000, and 4000 hr PHM widths.]

Figure 5.7: Variation of the effective life-cycle cost per socket and failures avoided, with the safety margin and prognostic distance for 2000 hr LRU TTF distribution widths and 1000 hr PHM distribution widths, with and without random failures included (10,000 sockets simulated). [Horizontal axes: safety margin (operating hours) and prognostic distance (operating hours).]

5.5.2 Multiple-Socket Model Results

Typical systems are composed of multiple sockets, where the sockets are occupied by a mixture of LRUs, some with no PHM structures or strategies and others with fixed-interval strategies, precursor to failure structures, or LRU-independent structures. Maintenance, even when it is scheduled, is expensive. Therefore, when the system is removed from service to perform a maintenance activity for one socket, it may be desirable to address multiple sockets (even if some have not reached their most desirable individual maintenance point).

First, we address how to use the single-socket models developed in Section 5.4 to optimize a system composed of multiple sockets, where we are assuming that all the LRUs that occupy a particular socket have the same PHM approach (but approaches can vary from socket to socket). To address this problem we introduce the concept of a coincident time. The coincident time is the time interval within which different sockets should be treated by the same maintenance action. If

$$\text{Time}_{\text{coincident}} > \text{Time}_{\text{required maintenance action on LRU } i} - \text{Time}_{\text{current maintenance action}} \qquad (5.7)$$

then the LRU i is addressed at the current maintenance action. A coincident time of zero signifies that each socket is treated independently. A coincident time of infinity signifies that whenever any LRU in any socket in the system demands to be maintained, all sockets are maintained regardless of their remaining-life expectancies. In the discrete event simulation, the time of the current maintenance and the future times for the required maintenance actions on other LRUs are known or forecasted, and application-specific optimum coincident times can be found.

Implementation of the above constraint in the discrete event simulation is identical to the single-socket simulation except that we follow more than one socket at a time (see Section 5.4.6 and [38]). When the first LRU in the multiple-socket system indicates that it needs to be maintained by RUL forecast or actually does fail, a maintenance activity is performed on all sockets in which the LRUs forecast the need for maintenance within a user-specified coincident time (e.g., Figure 5.8). The model assumes that LRUs replaced at a maintenance event are good-as-new and that portions of the system where damage occurred that was not addressed by any maintenance are not otherwise affected by the maintenance event. Costs are accumulated for scheduled and unscheduled maintenance activities and a final total life-cycle cost is computed. In practice, the future maintenance action times for LRUs, other than the one indicating the need for maintenance, need to be determined from reliability forecasting. However, there is greater uncertainty in these forecasts as the time distance increases.

Analysis of multisocket systems demonstrates that three types of system responses are possible for three types of systems: dissimilar LRUs, similar LRUs, and mixed systems of LRUs for which optimization can be performed.

Figure 5.8: Multisocket timeline example. [Labels: socket 1 timeline; socket 2 timeline; cumulative timeline; intervals greater than and less than the coincident time; LRU instance-specific “fix me” requests originating from failures, scheduled maintenance intervals, or PHM structures; timelines continue to end of support.]

Consider systems built from the two different sockets shown in Figure 5.9. For the examples in this section, with the exception of the LRU TTF distribution, all the data are given in Table 5.3. With LRU TTFs defined as shown in Figure 5.9, a system composed of sockets 1 and 2 is considered to be dissimilar (LRUs with substantially different reliabilities and different PHM approaches). The first step in analyzing a multisocket system is to determine what prognostic distance/safety margins to use for the individual sockets; we have observed no differences between the optimum prognostic distance/safety margins determined by analyzing individual sockets or the sockets within larger systems. For the case shown in Figure 5.9, the optimum prognostic distance for the LRU in socket 1 was 500 hrs.
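The coincident-time rule of Equation 5.7 can be sketched as a small selection function. The function and parameter names are hypothetical; the only logic taken from the text is the inequality itself, plus the assumption that the LRU requesting maintenance is always serviced.

```python
import math


def lrus_to_maintain(trigger_index, current_time_h, required_times_h, coincident_time_h):
    """Return the indices of the LRUs addressed at the current maintenance action.

    required_times_h[i] is the known or forecasted time of the next required
    maintenance action on LRU i. The triggering LRU is always included; any other
    LRU is included when Equation 5.7 holds for it.
    """
    selected = {trigger_index}
    for i, required_h in enumerate(required_times_h):
        if coincident_time_h > required_h - current_time_h:
            selected.add(i)
    return sorted(selected)


if __name__ == "__main__":
    required = [1000.0, 1400.0, 5000.0]  # illustrative forecast times (hours)
    for coincident in (0.0, 500.0, math.inf):
        print(coincident, lrus_to_maintain(0, 1000.0, required, coincident))
```

A coincident time of zero selects only the triggering LRU, and an infinite coincident time selects every socket, matching the two limiting cases described in the text.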

Figure 5.9: TTF distributions for LRUs used in multisocket analysis examples. The plot on the right shows the cost of single-socket systems made from these two LRUs as a function of time using a prognostic distance of 500 hrs for the LRU in socket 1 (note that the results for 10,000 instances of each socket are shown). All data other than the LRU TTF are given in Table 5.3.


Figures 5.10-5.12 display plots of the mean life-cycle cost for a system of sockets. The mean life-cycle cost is the mean of a distribution of life-cycle costs computed for a population of 10,000 systems. Figure 5.10 shows the most common life-cycle cost characteristic for dissimilar systems. For small coincident times, both sockets are being maintained separately; for large coincident times, LRUs in both sockets are replaced whenever either socket requires maintenance. It follows that mean life-cycle costs are smaller for dissimilar systems when coincident times are small.

Figure 5.11 shows the cases of two and three similar LRUs in a system. In this case, the multiple sockets that make up the system are all populated with LRU 1 in Figure 5.9. The solution in this case is favorable to maintaining the LRUs in all the sockets at the same time; that is, when the LRU in one socket indicates that it needs to be maintained, the LRUs in all the sockets are maintained. Note that the height of the step depends on the number of hours to perform scheduled maintenance and the cost of those hours.

Figure 5.12 shows the results for a mixed system that has a nontrivial optimum in the coincident time. In this case there is a clear minimum in the mean life-cycle cost that is at neither zero nor infinity.

Figure 5.10: Mean life-cycle cost per system of two dissimilar sockets. Socket 1 LRU, location parameter = 19900 hrs (health monitoring); socket 2 LRU, FFOP = 9900 hrs (unscheduled maintenance); 10,000 systems simulated. [Horizontal axis: coincident time (operational hours).]

Figure 5.11: Mean life-cycle cost per system of two or three similar sockets. All LRUs, location parameter = 19900 hrs (health monitoring); 10,000 systems simulated. [Horizontal axis: coincident time (operational hours), logarithmic from 1 to 100,000; annotations: all sockets maintained separately (small coincident times); all sockets maintained at the same time (large coincident times).]

Figure 5.12: Mean life-cycle cost per system of mixed sockets (10,000 systems simulated). [Horizontal axis: coincident time (operational hours), logarithmic from 1 to 100,000; annotations: all sockets maintained separately; all sockets maintained at the same time; minimum life-cycle costs are for coincident times = 2000 operational hours.]

5.5.3 Example Business Case Construction

Commitments to implement and support PHM approaches cannot be made without the development of a supporting business case justifying it to acquisition decision makers. One important attribute of most business cases is the development of an economic justification. The economic justification of PHM has been previously discussed [38, 44, 45]. These previous business case discussions provide useful insight into the issues influencing the implementation, management, and return associated with PHM and present some application-specific results but do not approach the problem from a simulation or stochastic view. The following example presents an application of the discrete event simulation model to business case development.


The scenario for this business case example considers the acquisition of PHM for electronics LRUs in a commercial aircraft used by a major commercial airline [46] (see footnote 7). The representative LRU is a multifunction display (MFD), two of which are present in each aircraft. A fleet size of 502 aircraft was chosen to reflect the quantities involved for a technology acquisition by a major airline, in this case, Southwest Airlines [47]. The Boeing 737-300 series was chosen as the representative aircraft to be equipped with electronics PHM. The implementation costs reflect a composite of technology acquisition cost-benefit analyses (CBAs) for aircraft and/or for prognostics. The implementation costs are summarized in Table 5.4. All values are in 2008 U.S. dollars; all conversions to year 2008 dollars were performed using the Office of Management and Budget (OMB) discount rate of 7% [48]. The discount factor was calculated as 1/(1 + r)^n, where r is the discount rate (0.07) and n is the year (n = 0 represents 2008); see Section 5.1.2.
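The discount factor quoted above is simple to compute directly. The sketch below uses hypothetical function names and is only an illustration of the formula 1/(1 + r)^n; it does not reproduce any value from Table 5.4.

```python
def discount_factor(rate, n):
    """Discount factor 1/(1 + r)^n for year index n (n = 0 is the base year)."""
    return 1.0 / (1.0 + rate) ** n


def to_base_year_dollars(amount, rate, n):
    """Convert an amount stated in year n to base-year dollars."""
    return amount * discount_factor(rate, n)


if __name__ == "__main__":
    # Factors for the first few years at the 7% rate used in the example.
    for year_index in range(4):
        print(year_index, round(discount_factor(0.07, year_index), 4))
```

With n = 0 the factor is 1, so costs incurred in the base year are unchanged; later years are weighted progressively less.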

Table 5.4: Implementation Costs and Categories

Maintenance costs vary greatly depending on the type of aircraft, the airline, the amount and extent of maintenance needed, the age of the aircraft, the skill of the labor base, and the location of the maintenance (domestic versus international, hangar versus specialized facility). The maintenance costs in the model are assumed to be fixed; however, the effects of aging are known to produce increases in maintenance costs [49]. Koch et al. [50] give the maintenance cost per hour for Boeing 737-100 and -200 series aircraft as 12% of the hourly operating cost, noting that the ratio of maintenance costs per hour to aircraft operating costs per hour has remained between 0.08 and 0.13 since the 1970s. The average of the direct hourly operating costs for major airlines summarized in [51] was used. This cost is treated as the cost of scheduled maintenance per hour, which is equivalent to the cost of unscheduled maintenance that can be performed during the downtime period (see Table 5.5) after the flight segments for the day have been completed.

Table 5.5: Unscheduled Maintenance Costs and Modes

7. Most commercial aircraft business data is kept proprietary; when possible, data for the same type of aircraft was used to preserve consistency in this example.


The cost of unforeseen failures that require immediate attention during a flight can vary depending on the interpretation and on the subsequent actions required to correct the problem. Unscheduled maintenance that would require a diversion of a flight can be extremely expensive. The cost of a problem requiring unscheduled maintenance that is detected before the aircraft has left the ground (during a flight segment but not airborne) can be highly complex to model if the full value of passenger delay time and the downstream factors of loss of reputation and indirect costs are included [52]. For the determination of the cost of unscheduled maintenance during a flight segment, it is assumed that such an action typically warrants a flight cancellation. This represents a more extreme scenario than a delay; the model assumes that unscheduled maintenance that occurs between flight segments (during the preparation and turnaround time) would be more likely to cause a delay, whereas unscheduled maintenance during a flight segment would result in a cancellation of the flight itself. The Federal Aviation Administration provides average estimates of the cost of cancellations on commercial passenger aircraft that range from $3,500 to $6,684 per operational hour [53].

The operational profile for this example case was determined by gathering information for the flight frequency of a typical commercial aircraft. Table 5.6 shows the operational profile. A large aircraft is typically flown several times each day; these individual journeys are known as flight segments. The average number of flight segments for a Southwest Airlines aircraft was seven in 2007 [47]. Although major maintenance, repair, and overhaul operations (MROs) call for lengthy periods of extensive inspections and upgrades as part of mandatory maintenance checks, a commercial aircraft may be expected to be operational up to 90% to 95% of the time for a given year [54].
The median airborne time for commercial domestic flights was approximately 125 minutes in 2001 [48]. A representative support life of 20 years was chosen based on [48]. A 45-minute turnaround time was taken as the time between flights based on the industry average [55]. Using this information, an operational profile was constructed whose details are summarized in Tables 5.5 and 5.6.

Table 5.6: Operational Profile

Factor                                        Multiplier                                        Total
Support life: 20 years                        2,429 flights per year                            = 48,580 flights over support life
7 flights per day                             125 minutes per flight                            = 875 minutes in flight per day
45 minutes turnaround between flights [55]    6 preparation periods per day (between flights)   = 270 minutes between flights per day
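The totals in Table 5.6 follow directly from the factors and multipliers. The short check below reproduces them; the variable names are illustrative, and the remaining-minutes figure is simply what is left of a 24-hour day, not a value stated in the table.

```python
SUPPORT_LIFE_YEARS = 20
FLIGHTS_PER_YEAR = 2429
FLIGHTS_PER_DAY = 7
MINUTES_PER_FLIGHT = 125
TURNAROUND_MINUTES = 45
PREPARATION_PERIODS_PER_DAY = FLIGHTS_PER_DAY - 1  # one turnaround between consecutive flights
MINUTES_PER_DAY = 24 * 60

flights_over_support_life = SUPPORT_LIFE_YEARS * FLIGHTS_PER_YEAR
minutes_in_flight_per_day = FLIGHTS_PER_DAY * MINUTES_PER_FLIGHT
minutes_between_flights_per_day = TURNAROUND_MINUTES * PREPARATION_PERIODS_PER_DAY

# Minutes per day not spent in flight or in turnaround (derived, not from Table 5.6).
remaining_minutes_per_day = (
    MINUTES_PER_DAY - minutes_in_flight_per_day - minutes_between_flights_per_day
)

if __name__ == "__main__":
    print(flights_over_support_life, minutes_in_flight_per_day,
          minutes_between_flights_per_day, remaining_minutes_per_day)
```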

Reliability data were based on [44] and [56], which provide models of the reliability of avionics with exponential and Weibull distributions, commonly used to model avionics [57]. The assumed TTF distribution of the LRUs is provided in Figure 5.13. An analysis of over 20,000 electronic products built in the 1980s and 1990s [58] shows that Weibull distributions with shape parameters close to 1, that is, close to the exponential distribution, are the most appropriate Weibull distributions for modeling avionics. Upadhya and Srinivasan [59] model the reliability of avionics with a Weibull shape parameter of 1.1, consistent with the common range of parameters found in [58]. Although [58] found exponential distributions to be the most accurate, failure mechanisms associated with current technologies [60] suggest that the Weibull distribution may prove to be more representative for future generations of electronic products. The location parameter was chosen based on the typical avionics unit being considerably shorter-lived than the ten years that is a common life assumption within the aerospace industry [58].

To enable the calculation of ROI, an analysis was performed to determine the optimal prognostic distance for the example case, shown in Figure 5.14. For the combination of
PHM approach, implementation costs, reliability information, and operational profile assumed in this example, a prognostic distance of 485 hours yielded the minimum life-cycle cost over the support life. For the monitored structure in the precursor-to-failure approach, a triangular TTF distribution with a width of 500 hours was chosen (right side of Figure 5.1). Using a prognostic distance of 485 hours, a discrete event simulation was performed under the assumptions of negligible random failure rates and false alarm indications.

Figure 5.15 illustrates the cumulative cost per socket as a function of time. The graph of life-cycle cost intersects the vertical axis at the point corresponding to the initial implementation cost; as maintenance events accumulate over the support life, the cost rises, culminating at the end of the 20 years. Each socket required a replacement of five LRUs on average, corresponding to the distinct jumps in cost approximately every 3.6 years. The small step increases between LRU replacements (most clearly seen between years 0 and 3) represent annual PHM infrastructure costs. For this case study, 1,000 sockets were simulated; divergence in life-cycle cost due to randomness and variability of parameters can be seen as the support life progresses.

Figure 5.13: Weibull distribution of TTFs (β = 1.1 [58], η = 1,200 [56]). [Horizontal axis: operational hours, 25,000 to 30,000.]

Figure 5.14: Variation of life-cycle cost for precursor to failure PHM with prognostic distance. Small prognostic distances cause PHM to miss failures; large prognostic distances are too conservative. [Horizontal axis: prognostic distance (operational hours), 0 to 1200.]

Figure 5.15: Socket cost histories over the system support life. [Horizontal axis: time (yrs), 0 to 20.]

The investment cost is the effective cost per socket of implementing PHM. This cost can be used to guide maintenance planning. Investment cost is calculated as

$$I = C_{NRE} + C_{REC} + C_{INF} \qquad (5.8)$$

where C_NRE are the PHM non-recurring costs, C_REC are the PHM recurring costs, and C_INF are the annual infrastructure costs associated with PHM. Note that the costs of false alarm resolution, procurement of additional LRUs (more than the unscheduled maintenance quantity), and differences in maintenance cost are not included in the investment cost because they are the result of the investment and are reflected in C_PHM. Applying Equation 5.1, the ROI is given by

$$\mathrm{ROI} = \frac{C_{us} - (C_{PHM} - I)}{I} - 1 \qquad (5.9)$$

where C_us is the life-cycle cost of the system with an unscheduled maintenance policy, C_PHM is the life-cycle cost of managing the system using a PHM approach, and I is the investment. Equation 5.9 measures ROI relative to unscheduled maintenance, that is, if C_PHM = C_us, then ROI = 0 (breakeven).

Using this PHM approach, 91% of failures were avoided (see footnote 8) and the total life-cycle cost per socket was C_PHM = $77,391 with an effective investment cost per socket of I = $6,249, representing the cost of developing, supporting, and installing PHM. This cost was compared to an unscheduled maintenance policy in which LRUs are fixed or replaced only upon failure. Preserving all simulation details not particular to the PHM approach, the life-cycle cost per socket under an unscheduled maintenance approach was C_us = $96,958. Following Equation 5.9, the ROI of PHM was calculated as [$96,958 - ($77,391 - $6,249)]/$6,249 - 1, approximately 3.131.

Figure 5.16 contrasts the ROI with the annual infrastructure cost of implementing PHM on a per-socket basis, including the costs of hardware, assembly, installation, and functional testing. The intersection with the abscissa represents the breakeven point at which PHM no longer yields a positive return on investment. In this instance, the breakeven point occurred at approximately $2,500 per LRU. Figure 5.17 illustrates the relationship between ROI and the TTFs of the LRUs for three annual infrastructure costs. The TTF parameter varied is the location parameter used in the Weibull distribution; the shape and scale parameters were kept constant. For large TTFs, the reliability of the LRUs is such that PHM is no longer beneficial to the program; LRUs with smaller TTFs provide the opportunity for greater ROI.

The example provided in this section demonstrates the conditions under which a positive return on investment can be obtained using a precursor to failure approach. In reality, for the time-to-failure distribution assumed in Figure 5.13, potentially larger ROIs may be possible using a fixed scheduled maintenance interval; however, it is not generally true that fixed-schedule interval maintenance will always result in higher ROIs than other PHM-based approaches.

Figure 5.16: ROI as a function of the annual infrastructure cost of PHM per LRU. [Horizontal axis: annual infrastructure cost per socket ($), 0 to 4000.]

Figure 5.17: ROI versus TTF (Weibull, β = 1.1 [58], η = 1,200 [56]) for various annual infrastructure costs. [Horizontal axis: TTF mode (operational hours), 10,000 to 40,000; curves for annual infrastructure costs of $1,000, $450, and $100.]

8. Sockets with LRU failures not detected by the PHM approach appear in Figure 5.15 as the histories above the majority of the data set (these first appear at approximately 8 years).
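The ROI figure quoted for this example can be reproduced from Equation 5.9 and the three per-socket costs stated in the text. The function name below is an assumption made for illustration.

```python
def roi_relative_to_unscheduled(c_us, c_phm, investment):
    """ROI of PHM relative to unscheduled maintenance (Equation 5.9)."""
    return (c_us - (c_phm - investment)) / investment - 1.0


if __name__ == "__main__":
    # Per-socket values quoted in the example.
    c_us = 96958.0        # life-cycle cost under unscheduled maintenance ($)
    c_phm = 77391.0       # life-cycle cost using the PHM approach ($)
    investment = 6249.0   # effective investment cost ($)
    print(round(roi_relative_to_unscheduled(c_us, c_phm, investment), 3))
```

Algebraically the expression reduces to (C_us - C_PHM)/I, which makes the breakeven condition explicit: the ROI is zero exactly when the two life-cycle costs are equal.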

5.6 Summary

PHM can be used within the maintenance decision-making process to provide failure predictions, to lower sustainment costs by reducing the costs of downtime, inspection, and inventory management, to lengthen the intervals between maintenance actions, and to increase the operational availability of systems. PHM can be used in the product design and development process to gather usage information and to provide feedback for future generations of products. The potential benefits of prognostics are significant for the military and commercial sectors; the U.S. Air Force estimates that successful health monitoring of the Minuteman III strategic missile fleet could cut its life-cycle costs in half [61]. Proponents of PHM have prophesied that its success may one day obviate the need for redundant components in systems, but the transition to a full PHM approach will require extensive validation and verification before that can happen.

Determining the ROI requires an analysis of the cost-contributing activities needed to implement PHM and a comparison of the costs of maintenance actions with and without PHM. Analysis of the uncertainties in the PHM ROI calculation is necessary for developing realistic business cases. Allowance for variability in cadence, false alarm and random failure rates, and system size enables a more comprehensive calculation of ROI to support acquisition decision making.

References

1. G.T. Friedlob and F.J. Plewa, Jr., Understanding Return on Investment, John Wiley and Sons, New York, 1996.
2. F. Wong and J. Yao, “Health Monitoring and Structural Reliability as a Value Chain,” Computer-Aided Civil and Infrastructure Engineering, Vol. 16, pp. 71-78, 2001.
3. P.A. Sandborn, Course Notes on Manufacturing and Life Cycle Cost Analysis of Electronic Systems, CALCE EPSC Press, College Park, MD, 2005.
4. J.H. Spare, “Building the Business Case for Condition-Based Maintenance,” Proceedings of the IEEE/PES Transmission and Distribution Conference and Exposition, Atlanta, GA, pp. 954-956, November 2001.
5. D.L. Goodman, S. Wood, and A. Turner, “Return-on-investment (ROI) for Electronic Prognostics in Mil/Aero Systems,” Proceedings of the IEEE Autotestcon, Orlando, FL, pp. 1-3, September 2005.
6. H. Hecht, “Prognostics for Electronic Equipment: An Economic Perspective,” Proceedings of the Reliability and Maintainability Symposium (RAMS), Newport Beach, CA, January 2006.
7. J. Banks, K. Reichard, E. Crow, and K. Nickell, “How Engineers Can Conduct Cost Benefit Analysis for PHM Systems,” Proceedings of the IEEE Aerospace Conference, Big Sky, MT, pp. 1-10, March 2005.
8. C. Drummond, “Changing Failure Rates, Changing Costs: Choosing the Right Maintenance Policy,” Proceedings of the AAAI Fall Symposium on Artificial Intelligence for Prognostics, Washington, DC, November 2007.
9. S. Vohnout, D. Goodman, J. Judkins, M. Kozak, and K. Harris, “Electronic Prognostics System Implementation on Power Actuator Components,” Proceedings of IEEE Aerospace Conference, Big Sky, MT, March 2008.
10. B. Leao, K. Fitzgibbon, L. Puttini, and P. de Melo, “Cost-Benefit Analysis Methodology for PHM Applied to Legacy Commercial Aircraft,” Proceedings of IEEE Aerospace Conference, Big Sky, MT, March 2008.
11. J. Kurien and M.D.R. Moreno, “Costs and Benefits of Model-based Diagnosis,” Proceedings of IEEE Aerospace Conference, Big Sky, MT, March 2008.
12. B. Tuchband and M. Pecht, “The Use of Prognostics in Military Electronic Systems,” Proceedings of the 32nd GOMACTech Conference, Lake Buena Vista, FL, pp. 157-160, March 2007.
13. R. Kothamasu, S.H. Huang, and W.H. VerDuin, “System Health Monitoring and Prognostics - A Review of Current Paradigms and Practices,” International Journal of Advanced Manufacturing Technology, Vol. 28, No. 9, pp. 1012-1024, 2006.
14. R.M. Kent and D.A. Murphy, “Health Monitoring System Technology Assessments - Cost Benefits Analysis,” NASA Report CR-2000-209848, January 2000.
15. S.M. Wood and D.L. Goodman, “Return-on-Investment (ROI) for Electronic Prognostics in High Reliability Telecom Applications,” Proceedings of the International Telecommunications Energy Conference, Providence, RI, pp. 229-231, September 2006.
16. A. Hess and L. Fila, “The Joint Strike Fighter (JSF) PHM Concept: Potential Impact on Aging Aircraft Problems,” Proceedings of the IEEE Aerospace Conference, Big Sky, MT, pp. 6-3021-6-3026, March 2002.
17. S. Henley, R. Currer, B. Scheuren, A. Hess, and D. Goodman, “Autonomic Logistics - The Support Concept for the 21st Century,” Proceedings of the IEEE Aerospace Conference, Big Sky, MT, pp. 417-421, March 2000.

18. T. Brotherton and R. Mackey, “Anomaly Detector Fusion Processing for Advanced Military Aircraft,” Proceedings of the IEEE Aerospace Conference, Big Sky, MT, March 2001.
19. M.J. Ashby and R. Byer, “An Approach for Conducting a Cost Benefit Analysis of Aircraft Engine Prognostics and Health Management Functions,” Proceedings of the Reliability and Maintainability Symposium (RAMS), Vol. 6, pp. 2847-2856, 2002.
20. B. Byer, A. Hess, and L. Fila, “Writing a Convincing Cost Benefit Analysis to Substantiate Autonomic Logistics,” Proceedings of the IEEE Aerospace Conference, Big Sky, MT, Vol. 6, pp. 3095-3103, March 2001.
21. J. Banks and J. Merenich, “Cost Benefit Analysis for Asset Health Management Technology,” Proceedings of the Reliability and Maintainability Symposium (RAMS), Orlando, FL, pp. 95-100, January 2007.
22. K. Keller, K. Simon, E. Stevens, C. Jensen, R. Smith, and D. Hooks, “A Process and Tool for Determining the Cost/Benefit of Prognostic Applications,” Proceedings of the IEEE Autotestcon, Valley Forge, PA, pp. 532-544, August 2001.
23. T.J. Wilmering and A.V. Ramesh, “Assessing the Impact of Health Management Approaches on System Total Cost of Ownership,” Proceedings of the IEEE Aerospace Conference, Big Sky, MT, March 2005.
24. C.S. Park, Contemporary Engineering Economics, 4th ed., Prentice Hall, Englewood Cliffs, NJ, 2006.
25. T.C. Jones, Estimating Software Costs, McGraw-Hill, New York, 1998.
26. M.S. Patankar and J.C. Taylor, Risk Management and Error Reduction in Aviation Maintenance, Ashgate, Hampshire, U.K., 2003.
27. T. Feo and J. Bard, “Flight Scheduling and Maintenance Based Planning,” Management Science, Vol. 35, No. 12, pp. 1415-1432, 1989.
28. R. Gopalan and K.T. Talluri, “The Aircraft Maintenance Routing Problem,” Operations Research, Vol. 46, No. 2, pp. 260-271, 1998.
29. R.L. Helmreich and A.C. Merritt, Culture at Work in Aviation and Medicine: National, Organizational, and Professional Influences, Ashgate, Hampshire, U.K., 1998.
30. S. Engel, B. Gilmartin, K. Bongort, and A. Hess, “Prognostics, the Real Issues Involved with Predicting Life Remaining,” Proceedings of the IEEE Aerospace Conference, Big Sky, MT, pp. 457-469, March 2000.
31. R.M.H. Knotts, “Civil Aircraft Maintenance and Support Fault Diagnosis from a Business Perspective,” Journal of Quality in Maintenance Engineering, Vol. 5, No. 4, pp. 335-348, 1999.
32. T.F. Wright, “The Need for a New Cargo HMMWV,” Infantry Magazine, pp. 26-32, January 2006.
33. C. Valdez-Flores and R. Feldman, “A Survey of Preventative Maintenance Models for Stochastically Deteriorating Single-Unit Systems,” Naval Research Logistics, Vol. 36, pp. 419-446, 1989.
34. D. Cho and M. Parlar, “A Survey of Preventative Maintenance Models for Multi-Unit Systems,” European Journal of Operational Research, Vol. 51, pp. 1-23, 1991.
35. W. Wang, “A Model to Determine the Optimal Critical Level and the Monitoring Intervals in Condition-Based Maintenance,” International Journal of Production Research, Vol. 38, No. 6, pp. 1425-1436, 2000.
36. A. Barros, C. Berenguer, and A. Grall, “Optimization of Replacement Times Using Imperfect Monitoring Information,” IEEE Transactions on Reliability, Vol. 52, No. 4, pp. 523-533, 2003.
37. G. Heinrich and U. Jensen, “Bivariate Lifetime Distributions and Optimal Replacement,” Mathematical Methods of Operations Research, Vol. 4, pp. 31-47, 1996.

38. P.A. Sandborn and C. Wilkinson, “A Maintenance Planning and Business Case Development Model for the Application of Prognostics and Health Management (PHM) to Electronic Systems,” Microelectronics Reliability, Vol. 47, No. 12, pp. 1889-1901, 2007.
39. T. Raivio, E. Kuumola, V.A. Mattila, K. Virtanen, and R.P. Hamalainen, “A Simulation Model for Military Aircraft Maintenance and Availability,” Proceedings of the European Simulation Multiconference, pp. 190-194, September 2001.
40. L. Warrington, J.A. Jones, and N. Davis, “Modelling of Maintenance, within Discrete Event Simulation,” Proceedings of the Reliability and Maintainability Symposium (RAMS), Seattle, WA, pp. 260-265, January 2002.
41. M. Bazargan and R.N. McGrath, “Discrete Event Simulation to Improve Aircraft Availability and Maintainability,” Proceedings of the Reliability and Maintainability Symposium (RAMS), Tampa, FL, pp. 63-67, January 2003.
42. Y. Lin, A. Hsu, and R. Rajamani, “A Simulation Model for Field Service with Condition-Based Maintenance,” Proceedings of the Winter Simulation Conference, San Diego, CA, pp. 1885-1890, December 2002.
43. N. Vichare, P. Rodgers, and M. Pecht, “Methods for Binning and Density Estimation of Load Parameters for Prognostics and Health Management,” International Journal of Performability Engineering, Vol. 2, No. 2, pp. 149-161, 2006.
44. E. Scanff, K. Feldman, S. Ghelam, P. Sandborn, M. Glade, and B. Foucher, “Life Cycle Cost Estimation of Using Prognostic Health Management for Helicopter Avionics,” Microelectronics Reliability, Vol. 47, No. 12, pp. 1857-1864, 2007.
45. J. Koelsch, “Profit from Condition Monitoring,” Automation World, pp. 32-35, December 2006.
46. K. Feldman and P. Sandborn, “Analyzing the Return on Investment Associated with Prognostics and Health Management of Electronic Products,” Proceedings of the ASME 2008 International Design Engineering Technical Conferences, New York, August 2008.
47. Southwest Airlines, “Southwest Airlines Fact Sheet,” available: http://www.southwest.com/about-swa/press/factsheet.htm, accessed August 6, 2007.
48. “Investment Analysis Benefit Guidelines: Quantifying Flight Efficiency Benefits, Version 3.0,” Investment Analysis and Operations Research Group, Federal Aviation Administration, June 2001.
49. M. Dixon, “The Maintenance Costs of Aging Aircraft: Insights from Commercial Aviation,” RAND Project Air Force Monograph, Santa Monica, CA, 2006.
50. G.H. Koch, M.P.H. Brongers, N.G. Thompson, Y.P. Virmani, and J.H. Payer, “Corrosion Cost and Preventive Strategies in the United States,” Federal Highway Administration Report 315-01, September 2001.
51. “Economic Values for FAA Investment and Regulatory Decisions: A Guide, FAA Office of Aviation Policy and Plans,” Draft Final Report, December 31, 2004.
52. S. Matthews, “Safety - An Essential Ingredient for Profitability,” Proceedings of the 2000 Advances in Aviation Safety Conference, Daytona Beach, CA, April 2000.
53. Office of the Inspector General, Audit Report, “Air Carrier Flight Delays and Cancellations,” Federal Aviation Administration, Report No. CR-2000-112, July 2000.
54. K. Peppard, Program Manager, Performance Analysis Group, Operations Planning Services, Federal Aviation Administration, Washington, DC, October 2007, e-mail correspondence to Prof. Peter Sandborn (CALCE).
55. A. Henkle, C. Lindsey, and M. Bernson, “Southwest Airlines: A Review of the Operational and Cultural Aspects of Southwest Airlines,” Operations Management Course Presentation, Sloan School of Management, MIT, 2002.

56. D. Kumar, J. Crocker, J. Knezevic, and M. El-Haram, Reliability, Maintenance and Logistic Support: A Life Cycle Approach, Kluwer Academic Publishers, Norwell, MA, 2000.
57. L.V. Kirkland, T. Pombo, K. Nelson, and F. Berghout, “Avionics Health Management: Searching for the Prognostics Grail,” Proceedings of IEEE Aerospace Conference, Big Sky, MT, pp. 3448-3454, 2004.
58. J. Qin, B. Huang, J. Walter, J. Bernstein, and M. Talmor, “Reliability Analysis of Avionics in the Commercial Aerospace Industry,” Journal of the Reliability Analysis Center, First Quarter, 2005.
59. K.S. Upadhya and N.K. Srinivasan, “Availability of Weapon Systems with Multiple Failures and Logistic Delays,” International Journal of Quality & Reliability Management, Vol. 20, No. 7, pp. 836-846, 2003.
60. L. Condra, “Integrated Aerospace Parts Acquisition Strategy,” Technical Committee GEL/107, Process Management for Avionics, BSI Chiswick, October 7, 2002.
61. G. Ruderman, “Health Management Issues and Strategy for Air Force Missiles,” Proceedings of the Fifth International Workshop on Structural Health Monitoring, Stanford, CA, September 2005.

Chapter 6

PHM Roadmap: Challenges and Opportunities

PHM is an enabling technology with the potential to solve complex reliability problems that have arisen from complexity in design, manufacturing, and maintenance. PHM offers improved supply management, better part and product integrity, better maintenance with lower overhead, and more efficient mission execution with improved awareness of product health. An assessment of the state of practice and state of the art in prognostics and health management of electronics was conducted to identify the core research and development (R&D) opportunities and challenges in the field, so that recommendations can be made on where resources should be directed. This chapter presents PHM trends and road-mapping activities from a holistic perspective that draws on the National Defense Industrial Association (NDIA), NASA surveys on PHM algorithms, other research trend analyses, and findings from CALCE PHM researchers at the University of Maryland.

6.1

Introduction

In assessing trends and developing a roadmap for PHM, significant differences between PHM for electronics and PHM for mechanical structures must be recognized. Electronics tend to be more complex, with more variability and a higher density of components than mechanical devices. Combining this complexity with the 50-60% NTF/NFF rate seen in product returns suggests that current reliability practices need innovation to meet the demands of new products and their customers. Global competitive forces have the electronics industry racing to improve customer satisfaction and meet pricing pressures. PHM offers a way to address both needs, which can fuel investment in PHM R&D and the technology infrastructure. The challenge for research institutions, PHM adopters, and standards committees is to make PHM for electronics realizable and to develop communities that nurture adoption of the technology.

In mission-critical and high-reliability product classes, PHM is being driven by equipment customers as a way to reduce maintenance costs and improve availability. Customers have enjoyed price declines as gross margins on electronics manufacturing have decreased to well below 10%. However, maintenance and repair support costs have been steadily increasing, with gross margins of 30-40% going to the service provider. Although these margins have been the envy of Wall Street, they have motivated equipment and service customers to seek alternatives.

Several new business models have been developed to address these equipment service issues: equipment leasing, service leasing, outsourcing services, and original equipment manufacturer (OEM) contracts with service provisions. With PHM implementation, more innovations in business models can be expected, from condition-based maintenance approaches to condition-based leasing options in which a customer would pay for product availability and for the product life consumed as an alternative to buying the product. Contractual requirements are further driving the Department of Defense Small Business Innovation Research (SBIR) Program and other programs to develop PHM, for example, in both the U.S. Army's Future Combat Systems and the U.S. Air Force JSF program. This is also becoming the case in the automotive and oil drilling industries.

PHM for electronics has its challenges; research and development will bring much-needed innovation to the electronics industry, and its adoption will also bring disruption in business, design, development, manufacturing, and maintenance models. The roadmaps and discussions that follow will inform companies and institutions about the benefits and innovations that may be expected as well as the concerns that should be mitigated.

Some of the key findings from the NDIA on PHM for electronic systems indicated that current technology maturity would not support fielding PHM for legacy or new platforms. There was also doubt that new technologies would evolve without a dedicated process and funding to develop validation and verification (V&V) and its integration into a systems engineering environment. Since that report, however, advances in PHM have led to other conclusions. PHM implementation by companies such as Schlumberger, General Motors, Dell, and Sun Microsystems, using traditional sensors and other control structures found in conventional products, shows that it can now be rapidly implemented. While the approach used by these companies requires careful configuration control of the electronic components, the investment has resulted in an increase in the average availability of their products. These case studies can serve as an application baseline for other electronic assemblies.

In defense applications, the Air Force JSF implementation is a model application for PHM adoption from design conception. While the PHM community waits for the fielding and trials of JSF's PHM adoptions, lessons are already being learned about the business case, cross-discipline resource requirements, and implementation needs at the design level.

6.2

Roadmap Classifications

Categories to assess the adoption of PHM have been adapted in part from the NDIA's PHM roadmap for defense applications but are generalized with respect to technology needs and time to maturity.¹ In each category, research and development needs, as well as logistics and implementation infrastructure requirements, are highlighted. The time predicted for the research actualization is based on our view of academic, industrial, and commercial progress in PHM.

The following sections classify PHM activity into two broad categories: the component level and the system level. At the component level, PHM activities are considered for IC devices, high-power switching electronics, printed circuit boards, electromechanical-optical systems, and interconnects. Novel uses of PHM in reliability applications are also considered. PHM tasks for managing components in supply chains and product maintenance are discussed along with other infrastructure opportunities. At the system level, the classification includes legacy systems; environmental and operational monitoring; the LRU level; software; and opportunities for dynamic reconfiguration. Systems-of-systems tasks are also considered, such as power management and utilizing PHM as knowledge infrastructure. Approaches for PHM algorithms and their training are discussed, as well as challenges in verification and validation. The final sections of the roadmap look at nontechnical barriers to PHM adoption.

¹ PHM implementations are of specific interest to national defense as they improve fleet maintenance, reduce cost, improve safety, and boost readiness and mobilization. New approaches to warfare, together with changes in logistics and support, have forced a new look at support technology. In addition, reductions in the manning of weapon platforms and longer deployment times require reliable advanced warning of failures and predictions of when a failure will occur.

6.2.1

PHM: Component Level

One of the key features of PHM technology is the ability to obtain a holistic view of a system's health as well as to isolate and focus on component-specific reliability issues. PHM algorithms that can provide autonomous PHM down to the component level have significant market potential. Autonomous fault isolation is needed for complex electronics, and the 50-60% NTF/NFF rates suggest that current bench-top diagnostics are insufficient to resolve these problems. Innovation for better electronic diagnostics is needed, and PHM techniques should be investigated. Research must relate failure mode and mechanism models to the prognostic capabilities of PHM technology. The component-related tasks defined in Table 6.1 show the overall objective of autonomous PHM from system-level behavior to component-level faults.

Table 6.1: PHM Roadmap for Electronic Components²

PHM for Electronic Components; time in projected standard years

Task | Basic Research | Applied Research | Advanced Technology Development | Advanced Components Development
PoF model for gates, devices, and ICs | 2008-2011 | 2010-2012 | - | -
Electronics prognostics for high-power switching electronics | 2008-2009 | 2008-2010 | 2010-2012 | 2012-2013
Built-in prognostics for devices and circuit boards | 2008-2009 | 2008-2010 | 2010-2012 | 2012-2013
Approaches for canaries and fuses | 2008-2009 | 2008-2010 | 2010-2012 | 2012-2013
Electronics/electro-optical prognostics for tactical sensor systems | 2008-2009 | 2008-2010 | 2010-2011 | 2011-2012
Interconnection prognostic technology | 2008-2009 | 2008-2010 | 2010-2012 | 2012-2013
Electronic interconnection prognostic design tools | 2008-2009 | 2008-2010 | 2010-2011 | 2011-2012
Tin whisker detection | 2009-2010 | - | - | -
Reliability testing | 2009-2010 | - | - | -
Counterfeit/tamper detection | 2009-2010 | - | - | -

² Adapted from “Final Report of the Systems Engineering Division,” Integrated Diagnostics Committee, E-Prog II Electronics Prognostics Workshop, National Defense Industrial Association (NDIA), Miami, FL, 24-26 January 2006.

6.2.2

PHM for Integrated Circuits and Gate Devices

The critical execution functions of electronic products continue to be dominated by IC and gate devices that continually increase in complexity and functionality. Microprocessors, microcontrollers, memories (such as static random access memory [SRAM]), and field-programmable gate arrays (FPGAs) are examples of devices in this category. These devices have been targeted as being able to benefit from PHM, especially as silicon technology approaches 45 nm, where current densities increase and reliability risks are further exacerbated. Failure mechanisms such as hot-carrier degradation, gate oxide breakdown, and single-event upsets may become more likely and thus need to be addressed at the technology implementation level.

Prognostics for gate-level degradation within a device can be scaled to circuit- and system-level behavior, which would allow systems to achieve autonomous diagnostics and prognostics. Components could self-verify their health and possibly self-heal. The challenge here depends on cooperation from IC vendors in adopting standards for modeling data, gate damage properties, and implementation of self-testing or monitoring capability. However, if 45-nm and smaller technologies encounter barriers to adoption due to reliability concerns, then built-in prognostics at the device level and PHM methodologies could be positioned as a proactive response and a competitive enabler for the IC community.

PHM hybrid approaches that combine data-driven techniques and PoF models are also an opportunity for research, as these techniques have strong potential for revealing failure precursors that are not identified by traditional reliability techniques. As an example, high-frequency operation has been limited in SRAM components due to random bit error failures. PHM hybrid approaches that track bit error damage at the device level could enable self-healing for these devices. Further research is needed to investigate and utilize parametric fault trending for these devices and to determine how failure distributions can be incorporated into PoF models for prognostic outputs.

Once research establishes PHM capabilities with a known level of accuracy and coverage, system designers will need PHM application tools to implement PHM approaches in new designs. The problem with any reliability implementation at the technology level lies with the skill sets and business pressures involved in getting silicon products to market. Silicon designers are focused on device functionality and meeting schedules. Without sufficient tool development, design implementation will require significant cross-disciplinary skill sets from the system, packaging, interconnect, and device levels, and this will be a barrier to PHM adoption. Thus PHM architecture implemented at the silicon level needs to be researched and developed into methodologies and tool sets, with consideration of silicon and system-level design infrastructures.

With the high-tech automation, telecom, and computer industries driving PHM requirements, adoption of PHM into the design infrastructure at the device, simulation, and design levels can be a competitive advantage and product differentiator. To move forward, standards bodies such as IPC, IEEE-Aerospace, IEEE-Reliability, the Electronics Industries Alliance (EIA), the Semiconductor Industry Association (SIA), and other organizations must define these PHM requirements in their guidelines and standards.
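Parametric fault trending, as mentioned above, can be illustrated with a minimal sketch: fit a straight line to a monitored parameter and extrapolate it to a failure threshold to obtain a rough remaining-useful-life estimate. The linear drift assumption, the function names, and the example parameter and threshold are all hypothetical; a real device would need a degradation model grounded in its PoF behavior.

```python
def linear_fit(times, values):
    """Ordinary least-squares line through (time, value) points."""
    n = len(times)
    if n < 2 or n != len(values):
        raise ValueError("need at least two matched samples")
    t_mean = sum(times) / n
    v_mean = sum(values) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("time stamps must not all be equal")
    sxy = sum((t - t_mean) * (v - v_mean) for t, v in zip(times, values))
    slope = sxy / sxx
    return slope, v_mean - slope * t_mean


def remaining_useful_life(times, values, failure_threshold):
    """Hours until the fitted trend reaches the threshold.

    Returns None when the trend is flat, is moving away from the
    threshold, or has already crossed it.
    """
    slope, intercept = linear_fit(times, values)
    fitted_now = slope * times[-1] + intercept
    gap = failure_threshold - fitted_now
    if slope == 0 or gap / slope < 0:
        return None
    return gap / slope


if __name__ == "__main__":
    hours = [0, 100, 200, 300]
    drift = [1.00, 1.10, 1.20, 1.30]  # hypothetical normalized parameter
    print(remaining_useful_life(hours, drift, failure_threshold=1.50))
```

A hybrid approach of the kind described in the text would replace the straight line with a PoF degradation law and use the field data to update its parameters.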

6.2.3

High-Power Switching Electronics

The trend toward ever-increasing power density is a critical driver for PHM, especially with advances in silicon technology. Silicon solutions that place multiple cores in one semiconductor chip package, or complete systems on a chip, are driving up the power requirements of the traditional electronic circuit board assembly. In mission-critical applications, a loss of power during system operation could have both life-safety and economic consequences. This increase in power density is best seen in computer products; however, the trend can be generalized to avionics and other modern electronic systems. Dual-core processors, once the domain of elite engineering-level computers, are now commonplace in consumer electronic products and are in demand for gaming and media applications.

As the power density of electronic products rapidly increases, the need grows for autonomous precursor fault detection and for mitigation of runaway failures and component degradation in high-power switching electronics. The research required in this area includes identifying the parameters to monitor and the sensor performance characteristics and configurations that can be built into the component or sensed externally. PoF models are needed that use this information to monitor runaway and adverse conditions. Furthermore, data-driven techniques that can isolate faults and detect anomalies at the system level and relate them to PoF models could prove quite useful for AC, pulsed DC, and switching power electronics.
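One very simple form of precursor detection for runaway conditions is a rate-of-rise check on a monitored parameter. The sketch below is only an illustration under stated assumptions: the monitored quantity (a case temperature), the sampling interval, the rate limit, and the function name are hypothetical, and the check is a data-driven heuristic rather than a PoF model.

```python
def detect_runaway(samples, dt_s, max_rate, consecutive=3):
    """Return the index of the sample at which the rise rate has exceeded
    max_rate for `consecutive` successive intervals, or None if it never does.
    """
    if dt_s <= 0:
        raise ValueError("sampling interval must be positive")
    run = 0
    for i in range(1, len(samples)):
        rate = (samples[i] - samples[i - 1]) / dt_s
        # Count uninterrupted intervals above the limit; reset on any dip.
        run = run + 1 if rate > max_rate else 0
        if run >= consecutive:
            return i
    return None


if __name__ == "__main__":
    # Hypothetical case temperatures in degrees C, sampled once per second.
    temps = [50.0, 50.5, 51.0, 51.2, 53.0, 56.0, 60.0, 65.0]
    print(detect_runaway(temps, dt_s=1.0, max_rate=1.5))
```

Requiring several consecutive intervals above the limit is a design choice that trades detection delay for fewer false alarms from sensor noise.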

6.2.4

Built-in Prognostics for Components and Circuit Boards

Built-in structures for PHM at the component and circuit board levels can leverage features that already exist for diagnostics. PHM implementation can use existing bus architectures, such as JTAG, I2C, and CAN buses, that are used for diagnostics and in-box communications and are well proven and low cost. For example, I2C buses used for connecting thermal sensors to fan controllers for thermal regulation could be adapted for PHM with minimal system intrusion. Case studies and best practice documents need to be produced, along with software adaptations for data collection. Built-in prognostic implementations will need design guidelines and research that identify component features and techniques that are well suited for incipient damage detection, along with their limitations.

PHM utilizing canaries and fusing devices can be implemented at the subcomponent and system levels. In general, these devices exhibit or detect failure precursors in advance of similar critical features in a system, thus providing early detection of incipient damage. Built-in prognostics using canary devices could be as simple as a test circuit designed within the margin of a circuit board that would exhibit specification changes or damage in advance of the conventional structures on the circuit board. Research needs to identify canaries that have the same failure modes and mechanisms as the critical features but that fail measurably earlier, with a statistically significant separation from the features of interest. Research must also better identify specific parameters for these devices and tools and techniques to evaluate confidence in these prognostic features, and it must develop analysis techniques for obtaining prognostic outputs.

Incorporating PHM into traditional fuses could also be an approach to system monitoring, in which an alarm is raised at the onset of an interruption. Self-healing fusing devices could provide dual functions, also acting as damage accumulators to counteract overvoltage tendencies. Since fusing circuits are also a safety agency requirement, incorporating PHM within a fusing circuit may be another opportunity to provide PHM functionality with minimal overhead. Conventional products, such as fuses, with PHM enhancements for dual-function capability would be highly marketable.
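The requirement that a canary fail ahead of the feature it protects, and distinguishably so, can be explored with a small Monte Carlo sketch. It assumes, purely for illustration, that canary and host lifetimes follow Weibull distributions with the same shape parameter (same failure mechanism) and different scale parameters; the parameter values and function names are hypothetical.

```python
import random
import statistics


def canary_lead_times(canary_scale, host_scale, shape, n=20_000, seed=1):
    """Sample (host life - canary life) for independent Weibull lifetimes.

    random.weibullvariate(alpha, beta) takes the scale first, then the shape.
    """
    rng = random.Random(seed)
    return [
        rng.weibullvariate(host_scale, shape)
        - rng.weibullvariate(canary_scale, shape)
        for _ in range(n)
    ]


def summarize(lead_times):
    """Fraction of trials in which the canary fails first, and median lead."""
    p_canary_first = sum(1 for x in lead_times if x > 0) / len(lead_times)
    return p_canary_first, statistics.median(lead_times)


if __name__ == "__main__":
    leads = canary_lead_times(canary_scale=5_000, host_scale=20_000, shape=3.0)
    p_first, median_lead = summarize(leads)
    print(f"P(canary fails first) ~ {p_first:.3f}")
    print(f"median warning time ~ {median_lead:.0f} hours")
```

The median lead time is a rough stand-in for the warning interval a canary would provide; a design study would also examine the lower tail, since that determines how often the warning arrives too late.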

6.2.5

Electronics/Electro-Optical Prognostics for Tactical Sensor Systems

Electromechanical and electro-optical components that could greatly benefit from PHM include lasers, radar, infrared devices, and tactical sensors. In many of these devices the failure modes can appear as random output, and because these are sensory systems, such failures could produce false data and aberrations that are easily misinterpreted. Anomaly detection techniques could be of significant benefit in providing assurance that data from the system are correct.

Research is needed into the integration of PHM for electronic and electromechanical components to produce system-level prognostics. Tasks requiring further evaluation include identifying the critical prognostic and diagnostic parameters for sensing, the excitation levels required to achieve coverage of normal usage conditions, and anomaly detection methodologies. Since these classes of components are basically sensors, the challenge is to incorporate their inherent parameters into models for anomaly detection. Another research opportunity is implementing PHM directly in electromechanical and MEMS sensing devices, providing dual functionality and also allowing for adaptation to dynamic reconfiguration applications in sensor networks.
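A minimal sketch of anomaly detection against a healthy baseline is given below. It uses a per-parameter z-score, which is only one simple choice among many data-driven techniques; the readings, the 3-sigma limit, and the function names are illustrative assumptions.

```python
import statistics


def fit_baseline(healthy_readings):
    """Mean and sample standard deviation of known-healthy data."""
    return statistics.mean(healthy_readings), statistics.stdev(healthy_readings)


def find_anomalies(readings, baseline, z_limit=3.0):
    """Indices of readings more than z_limit standard deviations from the mean."""
    mean, std = baseline
    if std == 0:
        raise ValueError("baseline has zero spread; cannot compute z-scores")
    return [i for i, x in enumerate(readings) if abs(x - mean) / std > z_limit]


if __name__ == "__main__":
    # Hypothetical sensor output recorded while the unit was known to be healthy.
    healthy = [10.0, 10.1, 9.9, 10.2, 9.8, 10.0, 10.1, 9.9]
    baseline = fit_baseline(healthy)
    field_data = [10.05, 9.7, 11.0, 10.2, 8.9]
    print(find_anomalies(field_data, baseline))
```

For sensors with several correlated outputs, a multivariate distance measure would usually be a better fit than independent z-scores, at the cost of needing more healthy training data.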

6.2.6

Interconnect Prognostics

Research should develop PHM techniques for interconnect degradation assessment in solder, wire, optical, and wireless interconnects. Since interconnects are the critical links that bridge components to circuit boards and to system-level functions, integrated health monitoring at this level could provide a holistic monitoring architecture. Interconnects are also most likely to fail in an intermittent mode, making them an elusive target for traditional diagnostics. Anomaly detection and fault isolation techniques need to be developed and validated. Interconnect prognostics for failure mechanisms such as corrosion, whiskers, conductive path formation, dielectric breakdown, and electromigration has significant potential for improving product reliability. Further development of PoF models is necessary to achieve these objectives.

Corrosion precursors and wire chafing in wire interconnects continue to elude conventional nondestructive analysis and reliability techniques. PHM techniques present a new perspective on damage precursors for wiring harnesses and on how their detection could be implemented in interconnect systems. Data-driven techniques, in combination with PoF models, can potentially be developed into low-cost self-test features in connection systems, especially wiring harnesses, that would enable effective mitigation of intermittent failures in these systems. As PHM implementations in wiring improve system performance, the potential for reducing wiring redundancy and weight will become a design and cost reduction opportunity for many products.
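Because intermittent interconnect failures appear as brief excursions rather than a permanent open, a monitor has to count events instead of checking a single reading. The sketch below counts separate resistance excursions above a threshold in a sampled trace; the threshold, the minimum event count, and the function names are hypothetical choices made for illustration.

```python
def count_excursions(resistance_ohm, threshold_ohm):
    """Count separate runs of samples above the threshold."""
    events = 0
    above = False
    for r in resistance_ohm:
        is_above = r > threshold_ohm
        if is_above and not above:
            events += 1  # rising edge: a new excursion starts here
        above = is_above
    return events


def is_intermittent(resistance_ohm, threshold_ohm, min_events=2):
    """Flag a trace with repeated excursions that ends back below threshold."""
    if not resistance_ohm:
        return False
    recovered = resistance_ohm[-1] <= threshold_ohm
    return recovered and count_excursions(resistance_ohm, threshold_ohm) >= min_events


if __name__ == "__main__":
    # Hypothetical daisy-chain resistance samples in ohms.
    trace = [0.1, 0.1, 5.0, 0.1, 0.1, 6.0, 7.0, 0.1]
    print(count_excursions(trace, threshold_ohm=1.0))
    print(is_intermittent(trace, threshold_ohm=1.0))
```

Tracking how the excursion count grows over time is one possible precursor for a data-driven interconnect prognostic, since a rising event rate tends to precede a hard open.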

6.2.7

PHM as Mitigation of Reliability Risks

The ability of PHM technology to evaluate products from the system level to the component level has produced some novel reliability solutions. In particular, system-level PHM techniques have potential for greater fault detection and isolation. Tin and zinc whiskers are a plague in modern electronics, as there is no fail-safe reliability screen for them. Most of the industry focus has been on understanding tin whisker propagation in lead-free solders. PHM techniques could aid in resolving this problem by detecting early growth and correlating it with the associated usage and environmental conditions.

PHM technology also holds great potential when implemented and integrated with reliability testing. PHM techniques could provide faster “glitch detection” than conventional technology, a development that is urgently needed for accelerated testing of high-speed components. Research is needed on how to incorporate PHM into reliability testing and how to use this information for algorithm training. PHM data-driven techniques could be integrated into accelerated testing to make holistic evaluations of system reliability, not just system robustness, as is the convention today. PHM techniques in combination with traditional reliability techniques offer a new way to approach difficult reliability and nondestructive testing problems. Further research is needed to connect root cause analysis and failure precursors to PoF models that could autonomously predict failure mechanisms based on component behavior.

6.2.8

PHM in Supply Chain Management and Product Maintenance

There are opportunities for PHM technology in supply chain management and product maintenance as a means to improve efficiencies and offer cost reductions. Condition-based recall implementations can be utilized when suppliers need to recall fielded parts or products due to reliability concerns. Product recalls utilizing PHM could be conducted on the basis of precursors and remaining useful-life estimations, rather than by the conventional procedure of isolating field products using date codes and part numbers, where an entire population of products is recalled at tremendous expense. PHM implementation offers solutions to nuisance problems such as counterfeit parts and product tampering that have resulted from globalization and the complexity of supply chains. Conventionally, counterfeit components are detected after laborious detective work and failure analysis. Global policies to solve this problem have been enacted, but in reality, efforts to mitigate counterfeit parts, whether at international borders or within manufacturing processes or product stages, are extremely costly and sadly ineffective. PHM technology has significant potential to prevent customer and liability issues and to combat counterfeit products before they reach the gray market as well as to enhance the integrity of products within the customer base. PHM technologies, in combination with RFID technology already in use for inventory control, can be developed into solutions to address a host of counterfeit and tampering issues. Logistics footprint reduction considers PHM implementation where replaceable parts and spares could be allocated on a “just in time” basis versus stocking global depots. PHM technology could further be utilized for self-diagnosis, where built-in algorithms could advise customers that a part needs to be replaced and how long they have until replacement.
The advantages of PHM in supply chain and product maintenance are well recognized and may be the compelling application that drives PHM research and development. This is already being seen by small companies utilizing population mapping and tracking, coupled with PHM algorithms for parts and system availability management. PHM is also being implemented for warranty validation, where equipment companies have implemented it as a way to verify warranty requests for new products or spares. Research is needed to identify new business models for PHM in supply and logistics implementation as well as in warranty methodologies.
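
The contrast between a date-code recall and a condition-based recall can be illustrated with a short sketch. Everything here is assumed for illustration: the field names, the serial numbers, and the idea that each fielded unit reports a remaining-useful-life (RUL) estimate in hours.

```python
def date_code_recall(units, suspect_date_codes):
    """Conventional recall: every unit built in a suspect lot comes back."""
    return [u["serial"] for u in units if u["date_code"] in suspect_date_codes]


def condition_based_recall(units, min_rul_hours):
    """PHM-style recall: only units whose estimated RUL is below the limit."""
    return [u["serial"] for u in units if u["rul_hours"] < min_rul_hours]


# Hypothetical fielded population; rul_hours stands in for a PHM estimate.
fleet = [
    {"serial": "A100", "date_code": "0742", "rul_hours": 9000},
    {"serial": "A101", "date_code": "0742", "rul_hours": 350},
    {"serial": "A102", "date_code": "0742", "rul_hours": 7200},
    {"serial": "A103", "date_code": "0801", "rul_hours": 12000},
]

print(date_code_recall(fleet, {"0742"}))                # whole lot recalled
print(condition_based_recall(fleet, min_rul_hours=500)) # only the degraded unit
```

In this toy population the date-code rule recalls three units, while the RUL rule recalls one. The savings in practice would depend on how trustworthy the RUL estimates are.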

6.3

PHM at System Level

PHM at the system level can improve system uptime availability. Traditional approaches to this problem have incorporated system redundancy. Reliable, available, and maintainable (RAM) system efforts have spurred the development of fault-tolerant systems using redundancy and diagnostics as central tenets. Weaknesses in these systems are common mode failures, such as radiation damage, solder fatigue, and conductive filament formation, that do not distinguish between redundant units. Redundancy also comes at a cost, and high energy prices create motivation for more efficiency in system engineering while meeting RAMS requirements. PHM offers an elegant solution as well as the potential for more efficient resource management. Opportunities for failure prevention and early detection are present in other PHM features, such as dynamic reconfiguration, self-healing, and environmental and operational monitoring.

6.3.1

Legacy Systems

PHM may not have been implemented into many legacy systems. PHM methodologies are needed that encompass fielded applications and environmental conditions as well as degradations. Noninvasive PHM techniques that can utilize parameters that are already part of the system are most beneficial for legacy systems, as refurbishment costs are usually kept to a minimum. PHM hybrid approaches could offer significant contributions by coupling data-driven techniques for monitoring system parameters with PoF models for component degradation. Useful life estimations need to be developed with confidence intervals to assess these methodologies. Autonomous environmental and usage monitoring can be implemented noninvasively and also has market potential in addressing life expectancy issues for legacy systems.

6.3.2

Environmental and Operational Monitoring

Health monitoring and identifying a baseline usage condition to evaluate system health are fundamental for prognostics. The challenge here is developing an efficient “training” program for the algorithms to define healthy conditions. Another challenge is identifying what usage and environmental conditions to consider for the baseline. For environmental monitoring, autonomous tags that utilize RFID and programmable sensor kits offer a noninvasive solution. These tag devices could host a range of environmental monitors for contamination, corrosion, electrical degradation, and so on. Further research is needed to develop tags for prognostics. Environmental and operational monitoring could also be considered in the development of environmentally tolerant electronics. Environmental and usage conditions obtained in field trials could be fed into design tools to simulate whether future devices and designs can withstand these conditions. Simulation techniques, tools, and autonomous sensors are all areas of opportunity for research and development.
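
As a minimal sketch of the baseline idea, assume a single monitored parameter and a simple mean and standard deviation model of “healthy.” The function names, the sample temperatures, and the three-sigma threshold are illustrative assumptions, not a method taken from the text.

```python
from statistics import mean, stdev


def train_baseline(healthy_samples):
    """Summarize healthy training data as (mean, standard deviation)."""
    return mean(healthy_samples), stdev(healthy_samples)


def is_anomalous(value, baseline, k=3.0):
    """Flag a reading more than k standard deviations from the healthy mean."""
    mu, sigma = baseline
    return abs(value - mu) > k * sigma


# Hypothetical board temperatures (deg C) logged under normal usage.
healthy = [41.0, 42.0, 40.0, 43.0, 41.0, 42.0, 40.0, 43.0]
baseline = train_baseline(healthy)

print(is_anomalous(42.5, baseline))  # within the healthy band
print(is_anomalous(55.0, baseline))  # far outside the healthy band
```

A real baseline would have to cover the full range of operating modes and environments, which is the training challenge the paragraph above describes.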

6.3.3

LRU to Device Level

Anomaly detection has been one approach to prognostics at the system or LRU level. Research is needed on how to isolate the parameters that contribute to an anomaly and how to obtain fault detection, isolation, and prognostic outcomes. Currently, this is the biggest challenge in PHM for electronics: being able to drill down to device-level faults and deliver remaining useful life estimations. The ability to achieve autonomous fault prediction and isolation would be of considerable benefit to the electronics industry, as it offers potential in supply chain management and in addressing NTF/NFF issues.

6.3.4

Dynamic Reconfiguration

Dynamic reconfiguration techniques can use the prognostic horizon to perform fail-over transitions such as performance latency. Critical to these dynamic reconfiguration systems are sensing arrays that also enable circuits to reconfigure “detours” in real time to avoid the failing part and seek alternative means to complete their executions. Reconfiguration does not necessarily mean redundancy but also suggests self-healing approaches to defective parts. For memory systems this may mean disabling a chip on a DIMM card that has been experiencing too many random bit errors and having the memory traffic bypass this chip for memory execution by other chips. Faulty cells may be identified and isolated, and alternative cells on the same chip could replace their functionality through reconfiguration. With increased semiconductor capability and multithreading and multi-core technology, self-healing systems at the semiconductor level are already possible. The challenge is designing in the PHM capability and the dynamic reconfiguration needed to achieve self-healing systems.
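
The DIMM example can be sketched as a toy model. The class, the error limit, and the modulo routing rule are hypothetical; a real memory controller would also have to migrate the data held on the disabled chip, which is not modeled here.

```python
from dataclasses import dataclass, field


@dataclass
class Dimm:
    """Toy model of a DIMM whose chips can be taken out of service."""
    chips: int
    error_limit: int  # error count above which a chip is bypassed (assumed policy)
    errors: dict = field(default_factory=dict)
    disabled: set = field(default_factory=set)

    def record_bit_error(self, chip):
        """Count an error and disable the chip once it exceeds the limit."""
        self.errors[chip] = self.errors.get(chip, 0) + 1
        if self.errors[chip] > self.error_limit:
            self.disabled.add(chip)

    def route(self, address):
        """Map an address onto the chips that are still in service."""
        active = [c for c in range(self.chips) if c not in self.disabled]
        if not active:
            raise RuntimeError("no healthy chips left")
        return active[address % len(active)]


dimm = Dimm(chips=4, error_limit=2)
for _ in range(3):          # chip 1 logs three errors and crosses the limit
    dimm.record_bit_error(1)

print(dimm.disabled)
print([dimm.route(a) for a in range(6)])  # traffic now avoids chip 1
```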

6.3.5

System Power Management and PHM

Power management will be a significant infrastructure concern facing the IT industry. In 2002, the average power requirement per data center rack was 1-3 kW. In 2008, high-density blade technologies have elevated this to 24-30 kW per rack. As electronic content increases, energy costs are increasing, and the issue of power management has become pervasive in all industries. Although PHM implementation can be considered to benefit the availability of power systems, PHM can also play a significant role in power and resource management. For example, cooling requirements can be allocated via condition-based monitoring through the PHM infrastructure. Resources could be more efficiently allocated. Sun Microsystems is investigating this resource management possibility with its real-time power harness (RTPH), in which dynamic thermal flux (watts vs. time) is monitored per system during fan-speed changes, load changes, and dynamic reconfiguration events. A data center can then be mapped out based on its thermal flux, and usage and allocation can be monitored. Research is needed to further investigate the adoption of PHM technology for power systems, runaway failure management, prognostics for supply failure, resource management, and reducing power supply redundancy in high-availability systems.
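
One way to picture mapping a data center by thermal flux is to integrate each rack's power-versus-time trace and divide cooling in proportion. This is a sketch under stated assumptions, not a description of RTPH: the rack names, the sample values, and the proportional allocation rule are all hypothetical.

```python
def energy_kwh(samples):
    """Integrate (time_h, power_kw) samples with the trapezoidal rule."""
    total = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        total += 0.5 * (p0 + p1) * (t1 - t0)
    return total


def cooling_shares(racks):
    """Split cooling capacity in proportion to each rack's dissipated energy."""
    energy = {name: energy_kwh(trace) for name, trace in racks.items()}
    total = sum(energy.values())
    return {name: e / total for name, e in energy.items()}


# Hypothetical two-hour traces: (time in hours, power in kW).
racks = {
    "rack-a": [(0.0, 2.0), (1.0, 2.0), (2.0, 2.0)],     # steady 2 kW
    "rack-b": [(0.0, 24.0), (1.0, 30.0), (2.0, 24.0)],  # blade rack with a load spike
}

for name, share in cooling_shares(racks).items():
    print(name, round(share, 3))
```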

6.3.6

PHM as Knowledge Infrastructure for System Development

One of the significant benefits of PHM is the ability to document and transfer knowledge and experience between systems and throughout the product life cycle. This ability is developed from in situ monitoring and the ability to code information into training sets and methodologies associated with PHM. Knowledge-based tools that aid design enterprises in collecting a priori and a posteriori knowledge from reliability testing, FMMEA processes, vendor selection, customer experience, and past products are beneficial when developing PHM training sets and methodologies and developing mitigations. PHM, as a knowledge backbone that connects design centers to their products and customers, provides a completely different view of product development; system developers gain insight into how customers actually use their product, enabling effective cost reductions.

6.3.7

Prognostics for Software

Hardware degradation is one part of the reliability problem in electronics; the other part is software. Although great strides have been made in software reliability, memory leaks and software aging, as they pertain to electronic systems, can be opportunities for PHM implementation. Research, development, and adoption are needed to evaluate data-driven approaches for PHM, utilizing software service variables that can accomplish prognostics.
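
A memory leak is a convenient example of a software service variable with a trend. The sketch below fits a straight line to memory-usage samples and extrapolates to the capacity limit; the linear-growth assumption, the function names, and the sample values are illustrative only.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def hours_until_exhaustion(hours, used_mb, capacity_mb):
    """Extrapolate the fitted trend to capacity; None if usage is not growing."""
    slope, intercept = fit_line(hours, used_mb)
    if slope <= 0:
        return None
    return (capacity_mb - intercept) / slope - hours[-1]


hours = [0, 1, 2, 3, 4]
used_mb = [500, 520, 540, 560, 580]  # hypothetical resident-memory samples
print(hours_until_exhaustion(hours, used_mb, capacity_mb=1000))
```

With these samples the fitted growth is 20 MB per hour, so the estimate is 21 hours after the last sample. Real leak behavior is rarely this linear, and a confidence bound on the estimate would be needed in practice.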

An advantage of software applications for PHM technology is that many software suites and tool sets eventually become open source. DTrace, a tracing framework that allows health management and troubleshooting of system problems in real time, was originally proprietary on Solaris 10. In 2005 DTrace was open sourced and has since been implemented in Linux and Mac OS X 10.5, where further innovations are contributed by the software community. DTrace thus offers a low-cost way to implement health management at the operating system level and allows customers to experience the benefits of real-time diagnostics. Prognostic applications adapted to DTrace are foreseeable and may evolve PHM technology in new ways. Table 6.2 details the tasks identified for system-level PHM.

Table 6.2: PHM Roadmap for Systems³

PHM for Systems

Task
Generic environmental/operational parameter monitoring module for electronic prognostics
Maintenance mode/prognostic interaction design tool
Electronic interconnection prognostic design tools
Tool for logistics impact of eprog
Prognostics for redundant electronic systems
Electronic prognostics design tool for environmentally tolerant electronics
Electronic life usage assessment and prognostics (e-plus)
Data enterprise system module to LRU tracking for electronics prognostics
Electronics prognostics reasoner engine applicable to device through system
Electronic system-level prognostic and remaining life assessment (RLA) tool set
Prognostics for power supplies and converters
Knowledge tool sets for managing product data
Resource management using PHM

Basic Research

Time (projected standard years) Advanced Advanced Technology Application Applied Development Research Development

- 2007-2008 2008-2010 2010-2011
- 2007-2010 2010-2012
2012-2013 2011-2012 2009-2010 2012-2013
2007-2008 2008-2010 2010-2012 2012-2013
2007-2008 2008-2010 2010-2012 2012-2013
2007-2008 2008-2010 2010-2012 2012-2013
2007-2009 2009-2010 2010-2011 2011-2012
2007-2008 2008-2010 2010-2011 2011-2012
2007-2009 2009-2011 2011-2012
2007-2009 2009-2010 2010-2011 2011-2012
2007-2009 2009-2010 2010-2011 2011-2012

³ Adapted from "Final Report of the Systems Engineering Division", Integrated Diagnostics Committee, E-Prog II Electronics Prognostics Workshop, National Defense Industrial Association (NDIA), Miami, FL, 24-26 January, 2006.

6.4

Methodology Development

NASA requirements for long-duration human space exploration have spurred a significant investment in PHM. Schwabacher and Goebel⁴ identified the three critical problems that a PHM system needs to accomplish: fault detection, diagnostics, and prognostics (see Table 6.3). The first column shows several approaches to resolution. Of interest in this table is that physics-based models are not known for their diagnostic capability and artificial intelligence (AI)-based models are not documented for prognostic capability. Schwabacher further noted that many research efforts have focused on data-driven approaches and that much of the prognosis effort has been conducted for structural prognostics. Although data-driven approaches like SVMs have had very good success in diagnostic applications, research is needed to determine if these approaches will produce accurate results for prognosis.

Table 6.3: Algorithm Tasks⁵

Approach                 Fault Detection     Diagnostics             Prognostics
Physics based            System theory       -                       Damage propagation models
AI model based           Expert systems      Finite-state machines   -
Conventional numerical   Linear regression   Logistic regression     Kalman filters
Machine learning         Clustering          Decision trees          Neural networks

Exclusively data-driven approaches to PHM have lacked the diagnostic detail needed to drive product improvements down to the component level. They also require significant training data to implement as well as a significant amount of engineering experience to validate the PHM output for false alarm accuracy. Electronic product reliability has been greatly improved over the years by migrating from an average statistics life estimation approach based on MTBF (mean time between failure) to a PoF approach to component reliability. This PoF approach is standard practice in reliability engineering and has significantly influenced design and manufacturing improvements and produced a vast knowledge base of failure modes and mechanisms for electronic components. Data-driven techniques can be utilized for system-level monitoring, and the PoF approach is adapted for parameter selection and for detailed component behaviors, failure modes, and life estimations. This hybrid approach also has potential to aid the task of verification and validation of PHM implementations by reducing system-level outputs to more manageable component outputs that can potentially be simulated in laboratory environments. From a road-mapping perspective, Table 6.3 suggests that a complete PHM methodology should not be based exclusively on one approach but be based on a hybrid of approaches to accomplish specific prognostic goals.

⁴ M. Schwabacher and K. Goebel, “A Survey of Artificial Intelligence for Prognostics”, paper presented at AAAI Fall Symposium, Arlington, VA, 2007.
⁵ Ibid.

6.4.1

Best Algorithms

The best algorithms for PHM implementation depend on the problem as well as the product. For example, Sun Microsystems has demonstrated the suitability of MSET/SPRT for server products. Other computer companies, such as Dell, are using other algorithms, and General Motors in the automotive industry has still other sets of algorithms for use in vehicles. The algorithm of choice has to be suited to the platform considerations and the outcomes expected. For some products, anomaly detection may be appropriate, especially if the time in which a failure can be predicted in advance is too short. On the other hand, NASA is interested in predicting failures even a few seconds in advance for certain safety-critical electronic systems. In avionic systems, mission criticality and a limited computational footprint may drive algorithms that can optimize remaining useful life and are well suited for inline application. For consumer products, autonomous diagnostics, through which the fault is predicted and isolated to enable self-maintenance, may be more appropriate. Thus, research is needed not only to determine “the best generic algorithms” but also to find ways to classify algorithms and methodologies based on the types of problems for which they are best suited. Uncertainty will always be a factor in any estimation process, but this should not be a barrier to PHM. Opportunities lie in how uncertainties are managed within the methodology, the validation approaches, and how mitigating actions can be made and held accountable. Research is also required to address uncertainty requirements, the types and degrees of trade-offs, and the platform or problem benefits of various estimation techniques and confidence level approaches. In general, algorithms, models, methodologies, and verification techniques are needed for PHM. Physics-of-failure models are needed; these provide significant opportunity for PHM design implementations.
Design verification and validation methodologies are needed for developing metrics by which to quantify PHM methodologies. Maintenance process evaluation and system integration approaches are also needed to validate whether PHM implementation is being optimized. A barrier to methodology research is the availability of case studies and real data sets for algorithm development and validation. Stakeholders in PHM, such as CALCE and NASA, have already started to make available data sets and it is hoped that other organizations, industry groups, and standards committees nurture these practices.
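
To make the SPRT half of MSET/SPRT concrete, the sketch below applies Wald's sequential probability ratio test to a stream of residuals, testing a zero-mean hypothesis against a shifted-mean hypothesis with known Gaussian noise. It is a textbook form under stated assumptions, not the implementation used in any product named above; the residuals would come from an estimator such as MSET, which is not shown, and the parameter values are illustrative.

```python
import math


def sprt(residuals, shift, sigma, alpha=0.01, beta=0.01):
    """Wald SPRT on residuals: H0 mean 0 vs. H1 mean `shift`, known sigma.

    alpha is the tolerated false-alarm probability and beta the tolerated
    missed-alarm probability. Returns (decision, samples consumed).
    """
    upper = math.log((1 - beta) / alpha)  # crossing it accepts H1
    lower = math.log(beta / (1 - alpha))  # crossing it accepts H0
    llr = 0.0
    for n, x in enumerate(residuals, start=1):
        # Log-likelihood-ratio increment for two Gaussians with equal variance.
        llr += (shift / sigma**2) * (x - shift / 2)
        if llr >= upper:
            return "degraded", n
        if llr <= lower:
            return "healthy", n
    return "undecided", len(residuals)


print(sprt([1.0] * 20, shift=1.0, sigma=1.0))  # residuals sit at the shifted mean
print(sprt([0.0] * 20, shift=1.0, sigma=1.0))  # residuals sit at the healthy mean
```

The two error probabilities are set explicitly, which is one reason the test is attractive when false alarms and missed alarms carry different costs.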

6.4.2

Approaches to Training

CALCE research indicates that supervised learning is an effective approach to training PHM algorithms; however, this requires significant data and a wide range of normal operating and environmental conditions. The unique situation with some high-performance electronics is that the energy level changes with each operating mode and it is unclear if training data can excite, or needs to excite, all these modes to achieve efficient results. Research is needed to investigate how to more effectively train PHM algorithms for electronic systems. Another area of research to be considered is the use of simulated data for training. Since most electronic systems are validated by simulation, it seems reasonable to consider either behavioral or circuit-level simulations for training and parameter isolation purposes. Research is further needed on data preprocessing techniques. Both parametric and nonparametric approaches are needed that can pick both the correct parameters for analysis and the appropriate fusion techniques. These steps are critical to all data-driven approaches and could optimize training and algorithm accuracy. In cases where unsupervised learning is the only option, stochastic approaches and methodologies must be developed and validated. Particle filtering processes also show significant potential and demand further research and validation efforts.

6.4.3

Verification and Validation

One of the significant barriers to PHM is V&V methods. One practice for electronics is a case-based methodology with redundancy. When a prognostic alarm is raised, the system reconfigures to a redundant unit while the original system continues operation to fault. As more cases are run, confidence in the prognostic distance is developed and fail-over techniques are refined. This method is costly and not applicable for mission-critical or safety-related systems. Incorporation of V&V methodology with reliability testing may be an alternative. Research into V&V techniques for electronic systems is needed.
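
The case-based practice described above amounts to collecting, for each run-to-fault case, the time between the prognostic alarm and the actual failure. A minimal sketch, assuming each case is recorded as an (alarm time, failure time) pair in hours, with `None` marking a failure that raised no alarm; the record format and the values are hypothetical.

```python
from statistics import mean


def prognostic_distances(cases):
    """Hours between prognostic alarm and failure, for cases that alarmed in time."""
    return [fail - alarm for alarm, fail in cases
            if alarm is not None and alarm <= fail]


def summarize(cases):
    """Aggregate the run-to-fault cases into simple confidence-building figures."""
    d = prognostic_distances(cases)
    return {
        "cases": len(cases),
        "missed": len(cases) - len(d),
        "mean_distance_h": mean(d) if d else None,
        "min_distance_h": min(d) if d else None,
    }


cases = [(100, 148), (210, 250), (None, 300), (90, 122)]
print(summarize(cases))
```

As more cases accumulate, the spread of these distances indicates how much lead time a fail-over policy can rely on.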

6.4.4

Long-Term PHM Studies

Researchers in structural health monitoring (SHM) have noted that most SHM programs are funded for three years, and very few have a fielded deployment component in the study⁶. These study limitations are not sufficient to demonstrate SHM capability over long periods of time, where the cost benefits of such systems are directly tied to the lifetime of the structure and to aging in the field. Long-term SHM studies need to be conducted in parallel, but this can be quite costly. The dilemma is that investment in SHM is limited due to the lack of real-world demonstrations. PHM for electronics is also impacted by study limitations and the lack of real-world case studies conducted over the life of a product. Autonomous PHM structures that are noninvasive and cost-effective are one approach to getting at issues of aging products, so that legacy systems can be utilized as effective case studies for PHM.

6.5

Nontechnical Barriers

Another key barrier to PHM entry is the cross-disciplinary skills needed to deploy a PHM infrastructure. In addition, cost issues for implementation have been an ongoing obstacle; however, the cost of not having the benefits associated with PHM also needs to be investigated for competitiveness. Liability considerations can also be a threat to PHM if not addressed. These nontechnical barriers are opportunities for researchers, developers, and implementers to pool expertise and knowledge to develop and build a culture of cooperation and support that will enable PHM adoption.

6.5.1

Cost, ROI, Business Case Development

The cross-disciplinary nature of PHM requires a cross-departmental effort for a design that maximizes opportunity and return. This is especially true for dynamic reconfiguration or on-chip reconfiguration, where cross-functional skills are required to achieve PHM. The issue with cross-functional implementations is the cost associated with bringing these resources together. At a systems level, PHM needs a top-down directive and customers and leaders that see the long-term benefit of the PHM paradigm to product sustainability.

⁶ Chuck Farrar, Los Alamos National Laboratory & UCSC Jacobs Institute, CALCE PHM notes, 10/07.

ROI software and scenario plans offer a business planning scenario to optimize product maintenance and support and to simulate the business case for PHM. Cost-benefit analysis and comparison of various PHM approaches are also needed with respect to logistics and supply chain management. Autonomous diagnostics and prognostics can enable higher efficiencies in product sustainability. As an example, simulation tools can enable reengineering approaches to achieve better efficiency within supply chain operations for maintenance procedures. PHM methodologies can also provide a total cost of ownership for suppliers from product introduction to product obsolescence. Although PHM is often thought only realizable for high-end products, consider the case of consumer goods, where a faulty product can not only incur liability costs but also damage brand recognition. This was especially true for Microsoft and the XBOX product; Microsoft wrote off one billion dollars in order to offer a three-year warranty on the XBOX and recoup its brand. From the XBOX case, another customer behavior could be observed: customers want timely root cause information about the faults they are experiencing. Customers did not accept as reasonable the explanation on Microsoft’s website that the product was very complex, and bloggers issued their own diagnosis of the problem. For modern products with demanding customers who have access to instant information, PHM, with its capability for autonomous prognosis and diagnosis, may become a required feature and a differentiator of product quality.

6.5.2

Liability and Litigation

Unlike diagnostic predictions in which fault is already established and ownership and responsibility of that fault are also established, ownership of the prognostic output and the actions associated with those outputs can be ambiguous. As an example, in cases of missed alarms or false alarms, what is at fault? Are the systems and components generating faulty data? Are the algorithm sensitivity levels improperly tuned? If a missed alarm has a catastrophic outcome, does the responsibility lie with the implementer and trickle down to the developer of the code or algorithm? Similar situations have been experienced in other fields, and research on methodologies and structures is needed to provide guidance and balance. The liability issue may indeed force structural changes in how PHM technology is implemented. Some of these are described below.

6.5.2.1 Code Architecture: Proprietary or Open

Liability issues could affect how code architecture is selected. Proprietary architectures would be better suited for security; however, if fundamental flaws exist in the architecture, the liability would rest with the code developer. An alternative option would be to utilize standardized architectures that are regulated by an established board with capabilities for code maintenance. There are many organizations that are evaluating open architecture structures for PHM as well as structures that would address the liability issues associated with code implementation and debugging.

6.5.2.2 Long-Term Code Maintenance and Upgrades

Code maintenance and upgradability needs must be considered. Structured practices and methodologies for PHM code upgrading and maintenance will have to have associated validation and verification models. This is another opportunity for committees and community efforts to drive standardization and establish guidelines.

6.5.2.3 False Alarms, Missed Alarms, and Life-Safety Implications

PHM offers the potential for safer products in its ability to give early warning; however, the consequences of false and missed alarms in life safety situations do exist and will plague the industry until they are addressed. PHM industry groups and standards committees need to take the lead in providing guidelines for proper verification and validation so that these situations can be mitigated to the best of the industries’ capability. Validation and verification approaches are needed, and discussions of their validity need to take place in conferences that then produce best-practices guidelines, along with associated test suites for application developers to test their products. Those considering PHM can assess similar situations with machine learning and other disciplines to see how these issues have been mitigated.
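
A test suite of the kind suggested above would, at minimum, report how often a PHM implementation raises false alarms and misses real failures. The sketch below computes both rates from paired outcomes; the input format (one boolean per unit for “alarm raised” and one for “unit actually failed”) and the sample data are assumptions for illustration.

```python
def alarm_rates(predicted, actual):
    """False-alarm and missed-alarm rates from paired boolean outcomes."""
    false_alarms = sum(p and not a for p, a in zip(predicted, actual))
    missed_alarms = sum(a and not p for p, a in zip(predicted, actual))
    healthy = sum(not a for a in actual)
    failed = sum(bool(a) for a in actual)
    return {
        "false_alarm_rate": false_alarms / healthy if healthy else 0.0,
        "missed_alarm_rate": missed_alarms / failed if failed else 0.0,
    }


# Hypothetical validation run over six units.
predicted = [True, False, True, False, False, True]
actual = [True, False, False, True, False, True]
print(alarm_rates(predicted, actual))
```

Here one of three healthy units alarmed falsely and one of three failures went undetected. For life-safety applications the acceptable missed-alarm rate would be set far lower than the acceptable false-alarm rate.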

6.5.3

Role of Standards Organizations

Standards organizations, suppliers, and research groups need to nurture a general knowledge base that will provide a solid foundation for PHM and enable best-in-class and best-practice fielded solutions. Rigor is vital and must be enforced by the community so that PHM methodologies can be vetted and technology integrity developed.

Appendix A

Commercially Available Sensor Systems for PHM

A.1 ePrognostics Sensor Tag

Company Information
Name: ePrognostic Systems
Brief description: Focused on autonomous operational and environmental monitoring, with smart RFID hardware and PHM software; consulting in the area of parameter monitoring and PHM analysis
Major area of work: Smart RFID for monitoring temperature, humidity, vibration, shock, and custom sensors by application area
Major customers: Not public information
Case studies and publications (in PHM area): Various army and industrial applications

Product Information
Name: ePrognostics Sensor Tag
Brief description: Smart RFID sensor tag
Is the product designed for a specific purpose: Temperature, humidity, shock, vibration, motion, direction, etc.
Potential applications: Product tracking and monitoring, PHM

Power
Portable power source: Ultrathin flexible battery
Nonconventional power sources (if any): No
Power rating: N/A
Power management capabilities: Programmable operating modes and sampling rates

Physical Characteristics
Size (with batteries): 85 mm x 55 mm x 5 mm (approx.)
Weight (with batteries): 15 g with full function (approx.)

Communications
Interface with host: Wireless or wired (custom reference1 requested)
Interface type: N/A
Wireless protocol: ISO 15693-3
Wireless acquisition modes: N/A
Ability to communicate with other portable devices: With PDA, desktop, laptop through RFID reader

Prognostics and Health Management of Electronics. By Michael G. Pecht. Copyright © 2008 John Wiley & Sons, Inc.

Range for RF: 2 m up to maximum of 100 m

Sensors
Number of channels: On-board temperature, humidity, acceleration, position, and open-architecture channels
Channel configuration: Customizable
Type of sensors that can be connected: Application focused
Channel input: Customizable
Ability to connect to external sensors: External sensors based on applications
Sampling rates: Up to 0.5 Hz for temperature, programmable up to minutes or hours

On-Board Memory
Type of memory: Flash
Size: Expanded memory options available

On-Board Processing
Availability of on-board processor; primary functionality; embedded code for on-board computations: Compiled “C” code, open architecture; enable user programmability; data storage, analysis/compression

Software
Availability of host software: CALCE-ePrognostic System (data manipulation and analysis)
Software features for configuration of hardware for efficient monitoring: Graphical and tabular format data, date and time stamped, alarm threshold settings, user selectable sampling rates, design selectable RF power (range), frequency
Other software features that can be useful for health monitoring: Data-driven and PoF PHM methods

Other Details
Housing details: Front side cover: polycarbonate 250 µm digital printed; backside cover: PVC 230 µm (designed for agency approvals)
Mounting details: Stick by PSI adhesive back (or mechanical per application)
Operating temperature: -15°C to +80°C (high humidity capable)
Cost: $500 per tag for small order

Picture of Product (Courtesy of CALCE). Size: 85 mm x 55 mm x 5 mm (approx.)

A.2 SmartButton-ACR Systems

Company Information
Name: ACR Systems
Brief description: ACR Systems manufactures data loggers that measure and record temperature, relative humidity, electric current, pressure, process signals, pulse frequency, power quality, and more
Major area of work: Data loggers
Major customers: Heating, ventilation, and air conditioning (HVAC), pharmaceutical, transportation, process control companies
Case studies and publications: N/A

Product Information
Name: SmartButton
Brief description: Miniature-sized temperature sensor
Is the product designed for a specific purpose: Various, temperature recording
Potential applications: Food processing verification, pharmaceutical storage, laboratories, transportation of temperature-sensitive goods, equipment run time, HVAC system testing and balancing, predictive maintenance monitoring, etc.

Power
Portable power source: 3-V lithium battery (10 year life)
Nonconventional power sources (if any): N/A
Power rating: N/A
Power management capabilities: N/A

Physical Characteristics
Size (with batteries): 17 mm diameter x 6 mm height
Weight (with batteries): 4 g

Communications
Interface with host: Yes
Interface type: RS232 serial/ACR SmartButton interface
Wireless protocol: No
Wireless acquisition modes: Continuous (first-in, first-out), stop when full
Range for RF: N/A
Ability to communicate with other portable devices: No

Sensors
Number of channels: 1
Channel configuration: 1 internal channel for ambient temperature
Type of sensors that can be connected: Temperature sensor (range: -40 to 85°C)
Channel input: One for temperature sensor
Ability to connect to external sensors: No
Sampling rates: User selectable rates from 1 to 255 min

On-Board Memory
Type of memory: N/A
Size: 2 kB (capable of storing up to 2048 readings)

Prognostics and Health Management of Electronics

138

On-Board Processing Availability of on-board processor: Primary functionality: Embedded code for on-board computations: Software Availability of host software: Software features for configuration of hardware for efficient monitoring: Other software features that can be useful for health monitoring: Other Details Housing details: Mounting details:

NIA NIA NIA

SmartButton ReaderTM Graphical and tabular format data, date and time stamped, alarm threshold settings, user selectable sampling rates Export data into Excel for further data processing

Operating temperature:

Stainless steel User selectable (magnetic backing, plastic plate mount, or angled blue hard plastic) - 40 "C- +85 "C

cost:

$39

Picture of Product (Courtesy of ACR Systems)

Size: 17 mm diameter
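The SmartButton entry gives a memory capacity of 2048 readings and a sampling interval that can be set from 1 to 255 min. As a rough illustration of what those two numbers mean for deployment length, the sketch below estimates how long the logger can record before its memory fills in stop-when-full mode. The function name is hypothetical and the calculation assumes exactly one stored reading per interval.

```python
def logging_duration_days(capacity_readings: int, interval_min: int) -> float:
    """Days until memory is full, assuming one reading is stored per interval."""
    if not 1 <= interval_min <= 255:
        raise ValueError("interval must be between 1 and 255 minutes")
    minutes_total = capacity_readings * interval_min
    return minutes_total / (60 * 24)


CAPACITY_READINGS = 2048  # from the SmartButton entry

fastest = logging_duration_days(CAPACITY_READINGS, 1)    # 1-min interval
slowest = logging_duration_days(CAPACITY_READINGS, 255)  # 255-min interval

print(f"1-min interval:   {fastest:.2f} days")
print(f"255-min interval: {slowest:.1f} days")
```

At the fastest rate the memory fills in under a day and a half, while the slowest rate stretches a single deployment to roughly a year.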


A.3 EWB MicroTAU™-Invocon Company Information Name: Brief description: Major area of work:

Major customers:

Case studies and publications (in PHM area):

Invocon, Inc. Wireless networking technology solutions for structural analysis, mechanical condition-based maintenance, aircraft test, and evaluation missile-defense troop condition monitoring National Aeronautics and Space Administration (NASA) National Space Development Agency of Japan (NASDA) U.S. Department of Defense U.S. Department of Transportation University of Houston and University of Texas N/A

Product Information Name:

Enhanced Wide-Band Micro-Miniature Tri-Axial Accelerometer Unit (EWB MicroTAU™)

Brief description:

Wireless data acquisition unit for vibration, strain, pressure, and temperature capable of storing and communicating real-time data Vibration, strain, pressure, and temperature

Is the product designed for a specific purpose: Potential applications:

Has been used by NASA to monitor the shuttle wing leading edge RCC panels during ascent and on-orbit phases for potentially damaging impacts from foam, ice, ablator, and metallic objects; other applications: aircraft, engines, gearboxes, industrial equipment, and other components that experience random vibration events, balancing, etc. Battery input range 3-4 V Life: 50-200 cumulative hours of data acquisition or trigger mode (depending on the sample rate); extended-life external batteries available No

Portable power source:

Nonconventional power sources (if any): Power rating: Power management capabilities:

N/A

Yes, programmable sleep, triggering, data acquisition, and data processing and sample rate

Physical Characteristics Size (with batteries) Weight (with batteries): Communications

83 mm x 70 mm x 38 mm 250 g

Interface with host: Interface type: Wireless protocol: Wireless acquisition modes: Range for RF:

Supports wireless transmission of data to receiver Wireless to RS-232/USB/Ethernet/cell phone (in development) Invocon proprietary Real-time and store-and-forward (data logging) 30 m in open air


Ability to communicate with other portable devices: Number of channels: Channel configuration: Type of sensors that can be connected: Ability to connect to external sensors: Sampling rates: On-Board Memory Type of memory:


Autonomous buffer that has a cellular download capability with subsequent Internet data posting (in development)

Factory settable gain for wide range of charge output accelerometers, 86 dB dynamic range

N/A

Yes Programmable up to 20 kHz

Nonvolatile 256 Mb

Size:

On-Board Processing

RMS signal analysis, frequency analysis, decimation, peak detection

Software

Availability of host software:

Yes Graphical and tabular format data, date and time stamped, alarm threshold settings, user selectable sampling rates RMS signal analysis, frequency analysis, decimation, peak detection N/A

Software features for configuration of hardware for efficient monitoring: Other software features that can be useful for health monitoring: Other Details Housing details: Mounting details: Operating temperature: Cost:

Flange enclosure N/A -40 °C to +85 °C $2350 per channel

Picture of Product (Courtesy of Invocon, Inc.)

Size: 83 mm x 70 mm x 38 mm


A.4 MITE WIS™-Invocon Company Information Name: Brief description: Major area of work:

Major customers:

Case studies and publications (in PHM area):

Invocon, Inc. Wireless networking technology solutions for structural analysis, mechanical condition-based maintenance, aircraft test, and evaluation missile-defense troop condition monitoring National Aeronautics and Space Administration (NASA) National Space Development Agency of Japan (NASDA) U.S. Department of Defense U.S. Department of Transportation University of Houston and University of Texas N/A

Product Information Name: Brief description: Is the product designed for a specific purpose: Potential applications:

Power Portable power source: Nonconventional power sources (if any): Power rating: Power management capabilities: Physical Characteristics Size:

Weight: Communications Interface with host: Interface type: Wireless protocol: Wireless acquisition modes: Range for RF: Ability to communicate with other portable devices: Sensors Number of channels: Channel configuration: Type of sensors that can be connected: Channel input:

Multiple-Input Tiny Enhanced Wireless Instrumentation System (MITE WIS™) Wireless data acquisition unit capable of storing and communicating real-time data Near-static sensing and recording applications

Is currently being used to monitor repaired concrete sections of the Westerschelde Tunnel in the Netherlands. Replaceable battery powered; lithium-ion, 3.6 V (2-year life at 1 sample/min) No

10's of microamperes to less than 15 mA depending on mode Remains in a low-power state until the specified sample time Sensor: 60 mm x 63 mm x 25 mm; RF receiver unit: 90 mm x 40 mm x 17 mm, 90-mm antenna, minimum 355 mm cable length 135 g (with sealed battery)

Supports wireless transmission of data to receiver Wireless to RS-232/USB/Ethernet/cell phone (in development) Invocon Proprietary 55.6 kilobits/second, half duplex @ 916.5 MHz Direct line-of-sight (LOS): up to 60 m; no LOS: up to 30 m Data later downloaded via RF to the receiver and application software for graphical display and storage on PC 4 channels per unit N/A Supports any resistive sensor type (strain gauges, RTDs, pressure sensors, humidity sensors, accelerometers) 1.2 V-2.5 V


Ability to connect to external sensors: Sampling rates:

On-Board Memory Type of memory: Size: On-Board Processing Availability of on-board processor: Primary functionality: Embedded code for on-board computations: Software Availability of host software: Software features for configuration of hardware for efficient monitoring: Other software features that can be useful for health Monitoring: Other Details Housing details: Mounting details: Operating temperature:


Supports external resistive sensors 1 sample every 15 s to 1 sample/h, programmable via wireless link (optional: up to 1 sample/s configured at factory with reduced number of units in operation)

Flash 2 MB internal nonvolatile memory (capable of storing 2 years of data when sampling four channels at 1 sample/5 min)
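The memory figure above (2 MB holding 2 years of data from four channels at 1 sample per 5 min) implies a storage cost per sample. The sketch below works that out as a sanity check. It assumes a binary megabyte and 365-day years, and the function name is hypothetical; none of these details come from the product data.

```python
MEMORY_BYTES = 2 * 1024 * 1024  # assumption: 2 MB means 2 x 2^20 bytes


def samples_stored(channels: int, interval_min: int, years: int) -> int:
    """Total samples recorded across all channels (365-day years assumed)."""
    minutes = years * 365 * 24 * 60
    return channels * (minutes // interval_min)


total_samples = samples_stored(channels=4, interval_min=5, years=2)
bytes_per_sample = MEMORY_BYTES / total_samples

print(f"samples stored:   {total_samples}")
print(f"bytes per sample: {bytes_per_sample:.2f}")
```

The result, roughly 2.5 bytes per sample, would be consistent with a 16-bit reading plus a small amount of overhead, although the entry does not state the actual storage format.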

N/A N/A

MITE WIS software, model IVC 176 10 13 Software that provides a simple user interface for monitoring, graphically displaying, and storing transducer data and for setup (e.g., sample rates) of MITE WIS units. N/A

Snap enclosure with replaceable internal battery or rugged housing with nonreplaceable internal battery N/A -35 °C to +85 °C (battery life is reduced by 50% in continuous operation at -35 °C) 550 per channel

Picture of Product (Courtesy of Invocon, Inc.)


A.5 MicroWIS™-XG-Invocon Company Information (Please see A.4) Product Information Micro-Miniature Wireless Instrumentation System-Next Generation (MicroWIS™-XG) Set of miniature wireless units that asynchronously transmit data to a receiver attached to a standard RS-232 port on a PC; system includes MicroWIS™-XG remote sensors, a MicroWIS™-XG receiver, and a graphical user interface Designed for near-sensing applications

Is the product designed for a specific purpose:

Has been used on 8 NASA shuttle flights

Portable power source: Nonconventional power sources (if any): Power rating: Power management capabilities:

Physical Characteristics Size (without batteries): Size (with batteries) Weight (without batteries): Weight (with batteries): Communications Interface with host: Interface type: Wireless protocol: Wireless acquisition modes:

Range for RF: Ability to communicate with other portable devices: Sensors Number of channels:

Type of sensors that can be connected: Channel input: Ability to connect to external sensors:

Battery powered, 2.8-4.0 V input range (6 months life) N/A

10’s of microamperes to less than 15 mA depending on mode Remains in a low-power state until the specified sample time

30 mm x 30 mm x 20 mm (replaceable internal battery version) 30 mm x 30 mm x 20 mm (replaceable internal battery version) 17 g 23 g

Supports wireless transmission of data to receiver Wireless to RS-232/USB/Ethernet/cell phone (in development) Invocon proprietary Real-time and store-and-forward (data logging); highly synchronized (250 µs) real-time mode 100 ft in open air Autonomous buffer that has a cellular download capability with subsequent Internet data posting (in development)

1 channel per unit; 28 units or more per network (sample rate dependent) 1 channel per unit plus internal temperature for compensation Supports any resistive sensor type (strain gauges, RTDs, pressure sensors, humidity sensors, accelerometers) 0-1.2 V Supports external sensors; optional full-bridge completion; 1.2 V excitation; 16-bit A/D; factory programmable gain and filter 1 sample per hour to 1 sample per 4 s (0.25 Hz)

48 h of memory sampling at once per minute; in real-time transmission mode, the hard drive of the PC is the limiting


On-Board Processing Availability of on-board processor: Primary functionality: Embedded code for on-board computations: Software Availability of host software: Software features for configuration of hardware for efficient monitoring: Other software features that can be useful for health monitoring:

Other Details Housing details: Mounting details: Operating temperature:

See Enhanced Wide-Band MicroTAU products for this capability Multiple types of filtering, peak detection, and alarms N/A

XG graphical user interface; MITE WIS graphical user interface Allows the user to select sample rates for each unit from 1 sample per hour to 1 sample per 4 s Decodes, time stamps, saves, and plots the incoming data in real time; contains unique calibration coefficients for each remote sensor allowing for accurate simultaneous conversion to engineering units for any type of sensor Screw-on enclosure with replaceable internal battery NASA uses Velcro, or for mounting in the engine bay, RTV adhesive -35 °C to +85 °C (battery life is reduced by 50% in continuous operation at -35 °C)

Picture of Product (Courtesy of Invocon, Inc.)


A.6 SAVER™ 3x90-Lansmont Instruments Company Information Name: Brief description:

Lansmont Instruments Lansmont products are used in a wide range of industries, from basic packaging materials to bulk commodities and highly sophisticated electronics and medical devices. Test equipment and test services N/A N/A

Major area of work: Major customers: Case studies and publications (in PHM area):

Product Information Product of interest: Brief description:

Is the product designed for a specific purpose: Potential applications: Power Portable power source: Nonconventional power sources (if any): Power rating: Power management capabilities: Physical Characteristics Size (with batteries) Weight (with batteries): Communications Interface with host: Interface type: Wireless protocol: Wireless acquisition modes: Range for RF: RF transceiver carrier: Ability to communicate with other portable devices: Sensors Number of channels: Channel configuration: Type of sensors that can be connected: Channel input: Ability to connect to external sensors: Sampling rates: On-Board Memory Type of memory:

SAVER™ 3x90 The SAVER™ 3x90 can be used as a data logger, shock recorder, vibration recorder, temperature recorder, humidity recorder, or drop height recorder N/A

2 lithium (90 days life) or alkaline (45 days life) 9-V batteries N/A

N/A

N/A

3.74 x 2.90 x 1.70 inch (95 x 74 x 43 mm) 16.7 oz (473 g)

N/A USB 1.1 compatible (data rate: 400 kB per second typical) N/A N/A N/A N/A

3 Built-in triaxial accelerometer, built-in temperature and relative humidity N/A N/A N/A 50-5000 samples per second per channel Standard nonvolatile flash memory


Size: On-Board Processing Availability of on-board processor: Primary functionality: Embedded code for on-board computations: Software Availability of host software: Software features for configuration of hardware for efficient monitoring: Other software features that can be useful for health monitoring: Other Details Housing details: Mounting details: Operating temperature:

128 MB N/A N/A N/A

SaverXwareTM N/A

N/A

6061-T6 aluminum (weather resistant) 4 holes for #6 screws, mounting bars recommended -40 °F to +158 °F (-40 °C to +70 °C) using lithium batteries; -4 °F to +130 °F (-20 °C to +54 °C) using alkaline batteries
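The operating range above is quoted in both Fahrenheit and Celsius. A minimal sketch of the standard conversion, useful for checking the paired values, is given below; the function name is illustrative only.

```python
def fahrenheit_to_celsius(temp_f: float) -> float:
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (temp_f - 32.0) * 5.0 / 9.0


# Limits quoted in the entry, in degrees Fahrenheit
limits_f = {
    "lithium, low": -40.0,
    "lithium, high": 158.0,
    "alkaline, low": -4.0,
    "alkaline, high": 130.0,
}

limits_c = {name: fahrenheit_to_celsius(f) for name, f in limits_f.items()}
for name, celsius in limits_c.items():
    print(f"{name}: {limits_f[name]:.0f} F = {celsius:.1f} C")
```

The alkaline upper limit converts to about 54.4 °C, which the entry rounds to 54 °C; the other three limits convert exactly.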

Picture of Product (Courtesy of Lansmont Instruments)

Size: 95 mm x 74 mm x 43 mm


A.7 G-Link™ Wireless Accelerometer System-Microstrain Company Information Name: Brief description: Major area of work: Major customers: Case studies and publications (in PHM area):

Microstrain, Inc. Microstrain produces smart, wireless, microdisplacement, orientation and strain sensors Microminiature sensors Aerospace, military, automotive, civil engineering, manufacturing, biomechanics, and robotics N/A

Product Information Product of interest: Brief description: Is the product designed for a specific purpose: Potential applications:

Power Power rating: Portable power source: Nonconventional power sources (if any): Power management capabilities:

G-Link™ Wireless Accelerometer System High-speed triaxial MEMS accelerometer and microdata logging transceiver Designed to operate as part of an integrated wireless sensor network system Inclination and vibration testing, security systems enabled by wireless sensor networks, assembly line testing with “smart packaging,” condition-based maintenance by wireless sensor networks, smart machines, smart structures, and smart materials Real-time streaming 25 mA, data logging 25 mA, sleeping 0.5 mA Rechargeable 3.6-V lithium ion, 200 mAh capacity Customer may also supply external power from 3.2 to 9 V Two-position internal power switch: The default position allows the node to only operate on the internal battery power and at the same time allows the battery to be recharged through the recharge/power connector. The bypass position allows the node to only operate on power supplied through the recharge/power connector.

Physical Characteristics Size (without batteries): Weight (without batteries): Communications Interface with host: Interface type: Wireless protocol: Wireless acquisition modes:

Range for RF: RF transceiver carrier:

43 mm x 75 mm x 37 mm with antenna 46 g (assumed without batteries)

Wireless N/A IEEE 802.15.4, open communication architecture Mode 1: transmit on command from base station, transmit duration from 100 to 65,500 sweeps, or continuous Mode 2: log on command from base station Mode 3: auto trigger, user specified as a programmable threshold voltage from a specific channel 70 m line of sight, up to 300 m with optional high-gain antenna 2.4 GHz, direct sequence spread spectrum, license-free worldwide (2.450-2.490 GHz, 16 channels)


Ability to communicate with other portable devices: N/A

Sensors
Number of channels: 3
Channel configuration: N/A
Type of sensors that can be connected: Triaxial MEMS accelerometers, Analog Devices ADXL202 or ADXL210
Channel input: N/A
Ability to connect to external sensors: N/A
Sampling rates: Programmable, from 32 to 2048 sweeps/second

On-Board Memory
Type of memory: Flash
Size: 2 MB (approximately 1,000,000 data points)

On-Board Processing
Availability of on-board processor: N/A
Primary functionality: N/A
Embedded code for on-board computations: N/A

Software
Availability of host software: Agile-Link™ (Windows XP compatible)
Software features for configuration of hardware for efficient monitoring: N/A
Other software features that can be useful for health monitoring: N/A

Other Details
Housing details: ABS plastic
Mounting details: Screw
Operating temperature: -20 to +60 °C with standard internal battery and enclosure, extended temperature range optional with custom battery and enclosure; -40 to +85 °C for electronics only
Cost: $1995 (starter kit includes 2 nodes, 1 base station, software, and charger)

Size: 43 mm x 75 mm x 37 mm (with antenna)


A.8 EmbedSense™ Wireless Sensor-Microstrain Company Information (Please see A.7) Product Information Name: Brief description:

Is the product designed for a specific purpose: Potential applications:

EmbedSense™ Wireless Sensor A wireless sensor and data acquisition system that is small enough to be embedded in a product, enabling the creation of smart structures, smart materials, and smart machines; batteries are completely eliminated Batteryless sensing (extreme G-level and high-temperature environments) Range from monitoring the healing of the spine to testing strains and temperatures on jet turbine engines

Power

Power rating: Portable power source: Nonconventional power sources (if any):

Power management capabilities: Physical Characteristics Size (without batteries): Weight: Communications Interface with host: Interface type: Wireless protocol: Wireless acquisition modes: Range for RF: RF transceiver carrier: Ability to communicate with other portable devices: Sensors Number of channels: Channel configuration: Type of sensors that can be connected: Channel input: Ability to connect to external sensors: Sampling rates:

200 mA at 3 V DC, not including bridge excitation, obtained by rectification of external AC magnetic field Uses an inductive link to receive power from an external coil and to return digital strain, temperature, and unique ID information

50 mm diameter, 6 mm overall thickness N/A Wireless

Switched reactance, pulse code modulated serial (RS-232), clocked synchronous N/A N/A N/A N/A

?

1 differential input and 1 internal temperature sensor (other configurations available as custom options) Piezoresistive bonded foil and semiconductor strain gauges, pressure/load/torque transducers, thermocouples N/A N/A

50 Hz/channel with 125 kHz operating frequency, 16-bit A/D (other configurations available as custom options)

On-Board Memory Type of memory: N/A

On-Board Processing Availability of on-board processor:

N/A


Primary functionality: Embedded code for on-board computations: Software Availability of host software: Software features for configuration of hardware for efficient monitoring: Other software features that can be useful for health monitoring: Other Details Housing details: Mounting details: Operating temperature:

N/A

Cost:

$3295 (starter kit includes 2 sensor nodes, 1 external

N/A

N/A N/A

N/A

N/A

N/A -40 to +125 °C

Picture of Product (Courtesy of Microstrain, Inc.)

Size: 50 mm diameter, 6 mm overall thickness


A.9 V-Link® Wireless Voltage Node-Microstrain Company Information (Please see A.7) Product Information Product of interest: Brief description:

Is the product designed for a specific purpose:

Potential applications:

V-Link® Wireless Voltage Node Designed to operate as part of an integrated wireless sensor network system The V-LINK® is compatible with a wide range of analog sensors, including strain gauges, displacement sensors, load cells, torque transducers, pressure sensors, accelerometers, geophones, temperature sensors, inclinometers, and others Condition-based monitoring of machines, health monitoring of civil structures and vehicles, smart structures and materials, experimental test and measurement, robotics and machine automation, vibration and acoustic noise testing, sports performance and sports medicine analysis

Power Power rating:

Portable power source: Nonconventional power sources (if any): Power management capabilities:

V-Link node only: real-time streaming 25 mA, datalogging 25 mA, sleeping 0.5 mA External sensors: 350-Ω strain gauge 8 mA, 1000-Ω strain gauge 3 mA (add sensor consumption to above to calculate total power consumption) Life: 55 days (w/four 1000-Ω strain gauges) 3.7-V lithium ion rechargeable battery, 600 mAh capacity Customer may supply external power from 3.2 to 9 V Two-position internal power switch: The default position allows the node to only operate on the internal battery power and at the same time allows the battery to be recharged through the recharge/power connector. The bypass position allows the node to only operate on power supplied through the recharge/power connector. Different modes: sleep mode, idle mode, data logging mode:

Physical Characteristics Size (without batteries): Weight:

88 mm x 72 mm x 26 mm 72 mm x 65 mm x 24 mm board only 97 g (assumed with batteries)

Communications Interface with host: Interface type: Wireless protocol:

Wireless N/A IEEE 802.15.4, open communication architecture

Wireless acquisition modes:

Mode 1: transmit on command from base station, transmit duration from 100 to 65,500 sweeps, or continuous Mode 2: log on command from base station Mode 3: autotrigger, user specified as a programmable threshold voltage from a specific channel 70 m line of sight, up to 300 m with optional high-gain antenna 2.4 GHz, direct-sequence spread spectrum, license free worldwide (2.450-2.490 GHz, 16 channels)

Range for RF: RF transceiver carrier:


Ability to communicate with other portable devices: Sensors Number of channels: Channel configuration:

Type of sensors that can be connected:

Channel input: Ability to connect to external sensors:

Sampling rates: On-Board Memory Type of memory: Size: On-Board Processing Availability of on-board processor: Primary functionality: Embedded code for on-board computations: Software Availability of host software: Software features for configuration of hardware for efficient monitoring: Other software features that can be useful for health monitoring: Other Details Housing details: Mounting details: Operating temperature:

Cost:

Yes

Up to 8 Up to 8 input channels: 4 full differential, 350 Ω resistance or higher (with optional bridge completion), 3 single-ended inputs (0-3 V maximum), and internal temperature sensor The V-LINK® is compatible with a wide range of analog sensors, including strain gauges, displacement sensors, load cells, torque transducers, pressure sensors, accelerometers, geophones, temperature sensors, inclinometers, and others. 4 full differential, 350 Ω resistance or higher (with optional bridge completion), 3 single ended inputs (0-3 V maximum) The V-LINK® is compatible with a wide range of analog sensors, including strain gauges, displacement sensors, load cells, torque transducers, pressure sensors, accelerometers, geophones, temperature sensors, inclinometers, and others. Log up to 1,000,000 data points (from 100 to 65,500 samples or continuous) at 32-2048 Hz (8 min) Flash 2 MB (approximately 1,000,000 data points) No N/A N/A
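The sampling entry above pairs a 1,000,000-point log with a top rate of 2048 Hz and a duration of about 8 min. The short sketch below reproduces that figure and shows the other end of the range. It assumes a single active channel, so that one sweep stores one data point; the function name is hypothetical.

```python
LOG_CAPACITY_POINTS = 1_000_000  # approximate capacity quoted for the 2 MB flash


def log_duration_s(rate_hz: float, active_channels: int = 1) -> float:
    """Seconds of logging before memory fills, one point per channel per sweep."""
    if rate_hz <= 0 or active_channels < 1:
        raise ValueError("rate and channel count must be positive")
    return LOG_CAPACITY_POINTS / (rate_hz * active_channels)


fastest_min = log_duration_s(2048) / 60    # highest listed rate
slowest_hr = log_duration_s(32) / 3600     # lowest listed rate

print(f"2048 Hz: about {fastest_min:.1f} min")
print(f"32 Hz:   about {slowest_hr:.1f} h")
```

With more channels enabled, the available duration shrinks in proportion under this simple model.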

Agile-Link™ (Windows XP compatible) N/A N/A

ABS plastic Screw -20 to +60 °C with standard internal battery and enclosure, extended temperature range optional with custom battery and enclosure; -40 to +85 °C for electronics only $1800 (starter kit)

Picture of Product (Courtesy of MicroStrain, Inc.)

Size: 88 mm x 72 mm x 26 mm 72 mm x 65 mm x 24 mm board only

A.10 SG-Link® Wireless Strain Node-Microstrain

Company Information (Please see A.7) Product Information Product of interest: Brief description:

Is the product designed for a specific purpose:

Potential applications:

SG-Link® Wireless Strain Node Fast and extremely versatile, the SG-Link® is designed to operate as part of a high-speed wireless sensor network. The SG-LINK® is compatible with a wide range of Wheatstone bridge-type sensors, including strain gauges, displacement sensors, load cells, torque transducers, pressure sensors, accelerometers, geophones, temperature sensors, inclinometers, and others. Condition-based monitoring of machines, health monitoring of civil structures and vehicles, smart structures and materials, experimental test and measurement, robotics and machine automation, vibration and acoustic noise testing, sports performance and sports medicine analysis, distributed security networks

Power

Power rating:

Portable power source: Nonconventional power sources (if any): Power management capabilities:

Physical Characteristics Size (without batteries): Weight: Communications Interface with host: Interface type: Wireless protocol: Wireless acquisition modes:

Range for RF: RF transceiver carrier: Ability to communicate with other portable devices:

SG-Link node only: real-time streaming 25 mA, datalogging 25 mA, sleeping 0.5 mA; external sensors: 350-Ω strain gauge 8 mA, 1000-Ω strain gauge 3 mA (add sensor consumption to above to calculate total power consumption) 3.7 V (rechargeable) 200 mAh Life: 95 days (w/one 1000-Ω strain gauge) Customer may also supply external power from 3.2 to 9 V Two-position internal power switch: The default position allows the node to only operate on the internal battery power and at the same time allows the battery to be recharged through the recharge/power connector. The bypass position allows the node to only operate on power supplied through the recharge/power connector. Different modes: sleep mode, idle mode, data logging mode; programmable sample rate 58 mm x 49 mm x 26 mm 47 mm x 36 mm x 24 mm board only

Wireless N/A IEEE 802.15.4, open communication architecture Mode 1: transmit on command from base station, transmit duration from 100 to 65,500 sweeps, or continuous Mode 2: log on command from base station Mode 3: autotrigger, user specified as a programmable threshold voltage from a specific channel 70 m line of sight, up to 300 m with optional high-gain antenna 2.4 GHz, direct-sequence spread spectrum, license free worldwide (2.450-2.490 GHz, 16 channels) N/A


Sensors

Type of sensors that can be connected:

Channel input: Ability to connect to external sensors:

1

Sampling rates:

Up to 2 1 full differential, 350 Ω or higher (optional bridge completion), 1 single-ended input (0-3 V) and internal temperature sensor The SG-LINK® is compatible with a wide range of Wheatstone bridge-type sensors, including strain gauges, displacement sensors, load cells, torque transducers, pressure sensors, accelerometers, geophones, temperature sensors, inclinometers, and others. 1 full differential, 350 Ω or higher (optional bridge completion), 1 single-ended input (0-3 V) The SG-LINK® is compatible with a wide range of Wheatstone bridge-type sensors, including strain gauges, displacement sensors, load cells, torque transducers, pressure sensors, accelerometers, geophones, temperature sensors, inclinometers, and others. Log up to 1,000,000 data points (from 100 to 65,500 samples or continuous) at 32-2048 Hz (8 min)

On-Board Memory

Flash 2 MB (approximately 1,000,000 data points) No

Embedded code for on-board computations: Software Availability of host software: Software features for configuration of hardware for efficient monitoring: Other software features that can be useful for health monitoring: Other Details Housing details: Mounting details: Operating temperature:

N/A N/A

Agile-Link™ (Windows XP compatible)

extended temperature range optional with custom battery and

Picture of Product (Courtesy of Microstrain, Inc.)

Size: 58 mm x 49 mm x 26 mm 47 mm x 36 mm x 24 mm board only

A.11 TC-Link® Wireless Thermocouple System-Microstrain

Company Information (Please see A.7) Product Information Product of interest: Brief description:

Is the product designed for a specific purpose:

Potential applications:

TC-Link® Wireless Thermocouple System Complete, cold junction compensated, linearized six-channel wireless thermocouple node. TC-Link® features six standard thermocouple input connectors with an embedded cold junction temperature sensor. On-board linearization algorithms are software programmable to support a wide range of thermocouple types (J, K, R, S, T, E, B). Civil structures sensing: concrete maturation Industrial sensing networks: machine thermal management Food and transportation systems: refrigeration, freezer performance monitoring Advanced manufacturing: plastics processing, composite cure monitoring Assembly line testing with smart packaging

Power

Power rating: Portable power source:

Nonconventional power sources (if any): Power management capabilities:

Physical Characteristics Size (without batteries): Weight: Communications Interface with host: Interface type: Wireless protocol: Wireless acquisition modes:

Range for RF:

0.8 mA at 2 Hz, 0.48 mA at 1 Hz, 0.1 mA at 3/min; 0.09 mA at 1/min 3.7 V (rechargeable), 60 mAh 2 samples per second, 0.8 mA (1 month) 1 sample per second, 0.48 mA (2 months) 3 samples per minute, 0.1 mA (8 months) 1 sample per minute, 0.09 mA (10 months) Customer may also supply external power from 3.2 to 9 V Two-position internal power switch: The default position allows the node to only operate on the internal battery power and at the same time allows the battery to be recharged through the recharge/power connector. The bypass position allows the node to only operate on power supplied through the recharge/power connector. Different modes: sleep mode, idle mode, data logging mode, LDC mode Programmable sample rate
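The battery-life figures above follow, to first order, from dividing battery capacity by average current draw. The sketch below is a rough illustration that ignores self-discharge, temperature, and conversion losses. It works backward from each listed current and lifetime to the capacity that pair implies. The function name and the 30-day month are assumptions made for illustration, not part of the product data.

```python
HOURS_PER_MONTH = 30 * 24  # assumption: 30-day month


def implied_capacity_mah(current_ma: float, life_months: float) -> float:
    """Capacity (mAh) needed to sustain current_ma for life_months."""
    return current_ma * life_months * HOURS_PER_MONTH


# (label, average current in mA, listed life in months) from the entry above
operating_points = [
    ("2 samples/s", 0.8, 1),
    ("1 sample/s", 0.48, 2),
    ("3 samples/min", 0.1, 8),
    ("1 sample/min", 0.09, 10),
]

implied = {
    label: implied_capacity_mah(current, months)
    for label, current, months in operating_points
}
for label, mah in implied.items():
    print(f"{label}: about {mah:.0f} mAh")
```

Under these assumptions the four operating points imply roughly 576 to 691 mAh, about ten times the 60 mAh capacity printed in the entry, so the capacity figure may be worth checking against the manufacturer's data sheet.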

110 mm x 62 mm x 28 mm 76 mm x 58 mm x 23 mm board only

Wireless

N/A IEEE 802.15.4, open communication architecture Mode 1: transmit on command from base station, transmit duration from 100 to 65,500 sweeps, or continuous Mode 2: log on command from base station Mode 3: autotrigger, user specified as a programmable threshold voltage from a specific channel 70 m line of sight, 100 m with optional high-gain antenna


RF transceiver carrier: Ability to communicate with other portable devices: Sensors Number of channels: Channel configuration:

Type of sensors that can be connected:

Channel input: Ability to connect to external sensors: Sampling rates: On-Board Memory Type of memory: Size: On-Board Processing Availability of on-board processor: Primary functionality: Embedded code for on-board computations: Software Availability of host software: Software features for configuration of hardware for efficient monitoring: Other software features that can be useful for health monitoring: Other Details Housing details: Mounting details: Operating temperature:

Cost:


2.4 GHz, direct-sequence spread spectrum, license free worldwide (2.450-2.490 GHz, 16 channels) N/A

Up to 8 Six thermocouple inputs, types J, K, R, S, T, E, B and one CJC channel; optional relative humidity sensor; single-channel unit on request TC-Link® features six standard thermocouple input connectors with an embedded cold junction temperature sensor. On-board linearization algorithms are software programmable to support a wide range of thermocouple types (J, K, R, S, T, E, B). 1 full differential, 350 Ω or higher (optional bridge completion), 1 single-ended input (0-3 V) Six thermocouple inputs, types J, K, R, S, T, E, B and one CJC channel; single-channel unit on request Programmable from 2 samples/s to 1 sample/17 min for data logging or LDC modes Flash 2 MB (approximately 1,000,000 data points) No N/A N/A

Agile-Link™ (Windows XP compatible) N/A

ABS plastic Screw -20 to +60 °C with standard internal battery and enclosure, extended temperature range optional with custom battery and enclosure; -40 to +85 °C for electronics only $1550 (starter kit)

Picture of Product (Courtesy of Microstrain, Inc.)

Size: 110 mm x 62 mm x 28 mm; 76 mm x 58 mm x 23 mm board only

A.12 ICHM® 20/20 - Oceana Sensor

Company Information Name: Brief description:

Major area of work: Major customers: Case studies and publications (in PHM area):

Oceana Sensor, Inc. Manufacturer of OEM vibration sensors and smart, wireless sensing systems for a wide variety of applications including machinery health monitoring Sensor systems N/A N/A

Product Information Name: Brief description:

Intelligent Component Health Monitor® (ICHM) 20/20 This is a data acquisition and processing module designed to serve in any type of monitoring application N/A

Is the product designed for a specific purpose: Potential applications:

Machinery monitoring in power generation and pulp/paper industries, used in a smart ventilation monitoring system aboard a U.S. Navy nuclear aircraft carrier (USS Carl Vinson)

Power

Portable power source: N/A Powered by 12 V DC

Nonconventional power sources (if any): Power rating: Power management capabilities: Physical Characteristics Size (without batteries):

N/A N/A

4.72 in. (119.88 mm) width 3.16 in. (80.26 mm) length N/A

Weight: Communications Interface with host: Interface type: Wireless protocol:

Sensors

Type of sensors that can be connected: Channel input: Ability to connect to external sensors: Sampling rates:

2.20 in. (55.88 mm) height

x

Wireless N/A Bluetooth [industry standard (IEEE 802.15) wireless] N/A N/A Can communicate with PC/laptop

Wireless acquisition modes: Range for RF: Ability to communicate with other portable devices: Number of channels: Channel configuration:

x


Up to 6 channels 2 dynamic channels at 24-bit resolution and 4 static channels at 12-bit resolution ICP® sensors, 0-5 V DC, accommodates various other sensor types using an in-line signal conditioning interface module 3.3 V AC peak to peak, 0-5 V DC, ICP® N/A

Up to 96 kHz


On-Board Memory Type of memory: Size: On-Board Processing Availability of on-board processor: Primary functionality: Embedded code for on-board computations: Software Availability of host software: Software features for configuration of hardware for efficient monitoring:

Other software features that can be useful for health monitoring: Other Details Housing details: Mounting details: Operating temperature: Cost:

N/A N/A Yes FFT and band analysis, provides feature extraction of raw data prior to wireless transmission Various signal processing per on-board DSP inherent functionality

ICHM Monitor Software Block size, frequency range, sample frequency, window type, averaging, triggering, band monitoring, display and storage of dynamic time series and spectral data (1 or 2 channels), display and storage of static channel data, time history/trending plots, file import/export capability N/A

NEMA 4 enclosure N/A N/A N/A

Picture of Product (Courtesy of Oceana Sensor, Inc)

Size: 119.88 mm width x 55.88 mm height x 80.26 mm length

A.13 Miniature Wireless Unattended Data Logger - Radio Microlog

Company Information Name: Brief description: Major area of work: Major customers: Case studies and publications (in PHM area):

JR Dynamics Ltd No information available Data loggers N/A

N/A

Product Information Product of interest: Brief description:

Miniature wireless unattended data logger This in-service data logging system uses wireless (radio) communication that allows convenient programming and data download from areas where wired connections would be

Is the product designed for a specific purpose: Potential applications:

Designed for long-term unattended data logging

Power Portable power source: Nonconventional power sources (if any): Power rating: Power management capabilities: Physical Characteristics Size (with batteries): Weight (with batteries): Communications Interface with host: Interface type: Wireless protocol: Wireless acquisition modes: Range for RF: Ability to communicate with other portable devices: Sensors Number of channels: Channel configuration: Type of sensors that can be connected: Channel input: Ability to connect to external sensors: Sampling rates:

On-Board Memory Type of memory: Size:

Load and stress monitoring of dynamically loaded systems and structures and endurance and fatigue life estimation of products 3.3 V DC power supply or batteries N/A

N/A

37 mm x 24 mm x 10 mm

Yes Digital radio, infrared, or cable communication via RS232 interface N/A N/A N/A N/A

Two-channel multiplexed, 8-bit/10-bit resolution Two-channel strain gauge signal conditioning N/A N/A

4 kHz aggregate, 2 kHz per channel in 2-channel mode RAM and flash (backup memory) 4 MB RAM (digital radio model), 256 kB flash


On-Board Processing Availability of on-board processor: Primary functionality:

Embedded code for on-board computations: Software Availability of host software: Software features for configuration of hardware for efficient monitoring: Other software features that can be useful for health monitoring: Other Details Housing details: Mounting details: Operating temperature:

Yes On-line rainflow count (64 x 32), range pairs, level crossing, time at level; short bursts of time domain data; 100 highest events recorded in time domain, including accurate time stamp 100 highest events in time domain

Yes Simple offset and gain adjustments can be made via software

N/A

N/A -40 to +85 °C N/A

Picture of Product (Courtesy of JR Dynamics, Ltd.)

Size: 37 mm x 24 mm x 10 mm


Name: Brief description: Major area of work: Major customers: Case studies and publications

Sensicast Systems Develops intelligent wireless sensor network systems Wireless sensor network systems N/A N/A

Name: Brief description:

EMS 200 Environmental Monitoring Sensor Wireless transceiver with built-in temperature and humidity sensors To monitor and report real-time readings of temperature and humidity Data-center monitoring (servers), refrigerator monitoring (hospitals, labs, retail stores), in-building precision area

Is the product designed for a specific purpose: Potential applications:

Power Portable power source: Nonconventional power sources (if any): Power rating: Power management capabilities:

Physical Characteristics Size (without batteries): Weight: Communications Interface with host: Interface type: Wireless protocol:

Wireless acquisition modes: Range for RF: Ability to communicate with other portable devices: Sensors Number of channels: Channel configuration: Type of sensors that can be connected:

Ability to connect to external sensors: Sampling rates:

2-AA alkaline battery pack (3-year life) N/A 15 dBm (output power) Automatic low-battery supervision reporting

120 mm x 48 mm x 20 mm N/A

N/A N/A IEEE 802.15.4 compliant, 2.4 GHz distributed frequency spread spectrum N/A 212 m (700 ft) outdoors, 70 m (230 ft) indoors N/A

N/A N/A Temperature and humidity

N/A N/A


On-Board Memory Type of memory: Size: On-Board Processing Availability of on-board processor: Primary functionality: Embedded code for on-board computations: Software Availability of host software: Software features for configuration of hardware for efficient monitoring:

Other software features that can be useful for health monitoring: Other Details Housing details: Mounting details: Operating temperature: Cost:

None N/A None

N/A N/A

EMS Management Application Software N/A

N/A

N/A One or two screws, or double-sided adhesive tape -40 to +123.8 °C $2999 (for starter package)

Picture of Product (Courtesy of Sensicast Systems)

A.15 S2NAP® - RLW

Company Information Name: Brief description: Major area of work: Major customers: Case studies and publications (in PHM area):

RLW Inc. Develop software, devices and systems for condition-based maintenance (CBM) applications Condition-based maintenance NASSCO shipyard in San Diego, at NGSS Ingalls, and Portsmouth Naval Shipyard NIA

Product Information Product of interest: Brief description:

S2NAP® The S2NAP® is a wireless device that transmits machinery health data gathered from a variety of analog sensors including vibration, pressure, temperature, level, current, and voltage. Shipyard cranes and other mobile and fixed shipyard equipment Army and Marine Corps vehicles and all services' aircraft

Is the product designed for a specific purpose: Potential applications: Power Portable power source: Nonconventional power sources (if any): Power rating: Power management capabilities: Physical Characteristics Size: Weight: Communications Interface with host: Interface type: Wireless protocol: Wireless acquisition modes: Range for RF: Ability to communicate with other portable devices: Sensors Number of channels: Channel configuration: Type of sensors that can be connected:

Channel input: Ability to connect to external sensors: Sampling rates:

Power supply N/A Input voltage: 12-30 V DC regulated Input power: 5 W max The S2NAP® provides regulated power for sensors

N/A Wi-Fi

N/A

802.11b, cellular, or wired Ethernet N/A N/A

8 channels (differential or single ended, AC or DC) 8 channels, analog input Accelerometers (ICP), various pressure, force and torque (ICP), strain gauge bridge sensors, 4-20 mA loop transmitters, proximity probes, wear debris sensors, any voltage output sensor with ±10 V DC

8 analog inputs, 1 tachometer input N/A


On-Board Memory Type of memory:

Primary functionality:

Embedded code for on-board computations: Software Availability of host software: Software features for configuration of hardware for efficient monitoring: Other software features that can be useful for health monitoring: Other Details Housing details: Mounting details: Operating temperature: Cost:

N/A N/A Yes “Conditional” measurement: data are recorded only at appropriate times based on the occurrences of significant events; synchronous measurements: obtain phase information from concurrent orthogonal vibration measurements synchronized to tachometer output for rotational machinery. Enhanced FFT resolution: from 100 to 5000 Hz; 400 to 1600 lines XML configuration file SHARC DSP signal processing and data manipulation functions RLW Inc. is pleased to accommodate other organizations’ algorithms, transducers, and RF gear Enclosure type: NEMA 4 N/A -20 to +60 °C N/A

Picture of Product (Courtesy of RLW, Inc.)

A.16 SR-1 Series Strain Measurement System - DMI

Company information Name: Brief description: Major area of work: Major customers: Case studies and publications (in PHM area):

Direct Measurements, Inc. (DMI) DMI manufactures wireless strain and fatigue measurement sensors Wireless strain and fatigue sensors Aircraft company Conduct test of pre-crack fatigue detection and monitoring technology with Lockheed Martin Aeronautics Company

Product information Product of interest:

Is the product designed for a specific purpose: Potential applications: Power Portable power source: Nonconventional power sources (if any): Power management capabilities: Physical Characteristics Size (without batteries):

Weight (without batteries): Communications Interface with host: Interface type: Wireless protocol: Wireless acquisition modes: Range for RF: Ability to communicate with other portable devices: Sensors Number of channels: Channel configuration: Type of sensors that can be connected: Channel input: Ability to connect to external sensors: Sampling rates: On-Board Memory Type of memory: Size:

SR-1 Series Strain Measurement System (gage, reader, software) Strain and fatigue measurement and monitoring Strain and fatigue monitoring, health monitoring Standard 115 V AC N/A N/A Reader tube: 229 mm x Ø64 mm Computer size: 183 mm x 299 mm x 58 mm Gage size: 12.5 mm x 12.5 mm 7 lb

Wireless between gage and reader N/A Laser identification N/A N/A Wired or wireless to PDA, laptop, Internet, remote access

N/A N/A N/A

N/A N/A N/A

N/A N/A


On-Board Processing Availability of on-board processor: Primary functionality: Embedded code for on-board computations: Software Availability of host software: Software features for configuration of hardware for efficient monitoring: Other software features that can be useful for health monitoring: Other details Mounting details: Operating temperature: Cost:

Yes Postprocess, analyze, present on-board using third-party software N/A

DMI SR-1 interface, Ver. 2.0.1, Windows XP Tablet system Postprocess, analyze, present the strain measurement data

N/A

Plastic N/A Reader: -20 to +60 °C Gage: -129 to +93 °C N/A

Picture of Product (Courtesy of Direct Measurements, Inc.)

SR-1 Gages, sold in packages of 5

SR-1 Strain reader

Appendix B
PHM in Industry, Academia, and Government

An assessment of companies, government branches, and universities was conducted to identify the state of practice and state of research specific to prognostics. While the organizations included in this assessment are working on prognostics in general, an effort was made to target those that are more involved in prognostics for electronics systems.

The assessment began with a search for prognostic approaches, implementation case studies, technical publications, and the extent of intellectual property of numerous organizations. From this search, companies, government organizations, and universities that are researching, developing, and/or implementing PHM for electronic products and systems were identified. For each organization, the approaches to PHM were identified. These included the specific models and algorithms that the organization uses to detect and predict impending failures of its products and systems, as well as the system-level applications of PHM.

The assessment of the key organizations was carried out through visits, personal contacts, conversations, interviews, data mining, and literature searches. The organizations and their respective divisions that were included in the assessment are listed in Table B.1.

Table B.1: Organizations Included in Survey

Companies | Divisions Included in Survey
BAE Systems | Advanced Technology Center
The Boeing Company | Commercial Airplanes, Integrated Defense Systems
European Aeronautic Defence and Space Company | Airbus, Eurocopter
Expert Microsystems | All business units
General Dynamics | Advanced Information Systems, Development and Integration Systems
General Electric | All business units
General Motors | All business units
GMA Industries | All business units
Honeywell | Aerospace, Automation and Control Systems
Impact Technologies | All business units
Intelligent Automation, Inc. | All business units


Companies (continued) | Divisions Included in Survey
Lockheed Martin | Aircraft & Logistics Centers, Integrated Systems & Solutions
Northrop Grumman | Newport News (Aircraft Carrier Systems), Integrated Systems, Electronic Systems
Qualtech Systems, Inc. | All business units
Raytheon | Integrated Defense Systems, Network Centric Systems, Aircraft Systems
Ridgetop Group | All business units
Rockwell Automation | All business units
Sentient Corporation | All business units
Scientific Monitoring | All business units
SmartSignal | All business units
Smiths Aerospace (GE) | Electronic Systems, Mechanical Systems
Sun Microsystems | San Diego Physical Sciences Research Center
VEXTEC Corporation | All business units

Government | Divisions Included in Survey
National Aeronautics and Space Administration | Ames Research Center (Intelligent Systems Division)
Sandia National Laboratories | Optimization and Uncertainty Estimation Department
U.S. Air Force | Air Force Research Laboratory
U.S. Army | Army Logistics Integration Agency, Army Materiel Command, Army Research Office, Army Materiel Systems Analysis Activity, Army Research Laboratory Vehicle Technology Directorate / NASA Glenn Research Center
U.S. Navy | Naval Surface Warfare Center, Naval Air Systems Command China

Universities | Divisions Included in Survey
Auburn University | Mechanical Engineering Department
Georgia Institute of Technology | Intelligent Control Systems Laboratory
Pennsylvania State University | Applied Research Laboratory
University of California at Los Angeles | Nondestructive Evaluation Research Group
University of Maryland | Center for Advanced Life Cycle Engineering PHM Consortium
University of Tennessee | Nuclear Engineering Department
University of North Carolina | Center for Logistics and Digital Strategy
Vanderbilt University | Institute for Software Integrated Systems

B.1 ARINC

Aeronautical Radio Incorporated (ARINC) focuses on transportation communications and systems engineering. The company develops and operates communications and information processing systems and provides systems engineering and integration solutions to five industries: airports, aviation, defense, government, and surface transportation. Founded to provide reliable and efficient radio communications for the airlines, ARINC is headquartered in Annapolis, Maryland, and operates regional offices in London and Singapore, with over 3000 employees worldwide.

B.1.1 Approach to PHM

Monitoring the continued health of aircraft subsystems and identifying problems before they affect airworthiness has been a long-term goal of both the U.S. military and the commercial aviation industry. The ARINC Aircraft Condition Analysis and Management System (ACAMS) is an aircraft diagnosis and prognosis system being developed to offer real-time diagnosis. Working with NASA's Langley Research Center under the Aviation Safety Program, ARINC is developing ACAMS to automatically diagnose and predict faults in complex aircraft subsystems, including flight subsystems, landing gear, and structural elements. ACAMS can also be configured for propulsion health management.

ACAMS accesses and analyzes information sources to isolate fault behavior at the system level rather than at the line-replaceable unit (LRU) level. In certain cases, ACAMS can predict faults in the monitored subsystems before they occur. The diagnostic and prognostic results delivered by data link service are prioritized in accordance with user-specific criteria to assess the impact of fault conditions on future operation. If a critical anomaly is identified, that information is transmitted to the ground crew. Aircraft maintenance and ground crews then use this information for operational and maintenance planning and take any needed corrective action.

On the ground, an off-board ACAMS combines the collected and analyzed in-flight data with historical information, such as component maintenance history, reliability and maintainability data, and, for commercial air carriers, quick-access recorder (QAR) and flight operations quality assurance (FOQA) data. The system analyzes these data for higher fidelity diagnosis and prognosis, helping to facilitate long-term, fleet-wide aircraft dispositioning and improve maintenance scheduling and parts supply management.

ACAMS was demonstrated in an ARINC team effort with NASA's Langley Research Center. The system identified and diagnosed faults in the landing gear subassembly of a NASA-owned Boeing 757 and isolated and predicted faults in simulated lap joint structures. The system then delivered these "health messages" to the ground, using a radio frequency data link.

ACAMS provides several benefits to its users. It improves maintenance scheduling, maintenance operations, and parts supply management; enables condition-based maintenance to ensure efficient and timely replacement of faulty parts in complex systems; reduces logistics support costs for complex systems; and analyzes the health of aircraft subsystems in flight.

In February 2006, ARINC Engineering Services (AES) was named a prime contractor by NASA for support of aircraft flight-critical systems research. In announcing the awards, NASA said tasks under the contract will cover support of flight-critical systems development and integration, as well as flight dynamics, guidance and control, crew systems and aviation operations, and reliable and robust avionics systems. The tasks will be accessible to all five primary contractors. In addition to AES (a subsidiary of ARINC), the consortium includes the Boeing Company, Phantom Works, Honeywell Corporation, Lockheed Martin Aeronautics Company, and Rannoch Corporation.
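The prioritization of diagnostic and prognostic results by user-specific criteria, described in this section, can be illustrated with a minimal sketch. The message fields, the criticality weights, and the scoring rule below are hypothetical and are not taken from ACAMS; they only show one way that onboard results might be ranked before being sent over a data link.

```python
from dataclasses import dataclass

# Hypothetical user-specific criticality weights per subsystem (assumed values).
CRITICALITY = {"landing_gear": 3, "flight_controls": 5, "structure": 4}


@dataclass
class HealthMessage:
    subsystem: str
    fault: str
    confidence: float        # 0..1, confidence in the diagnosis or prognosis
    hours_to_failure: float  # predicted time to failure; 0 for an active fault


def priority(msg: HealthMessage) -> float:
    """Illustrative score: higher means more urgent."""
    urgency = 1.0 / (1.0 + msg.hours_to_failure)
    return CRITICALITY.get(msg.subsystem, 1) * msg.confidence * urgency


def prioritize(messages):
    """Return messages ordered from most to least urgent."""
    return sorted(messages, key=priority, reverse=True)


if __name__ == "__main__":
    queue = [
        HealthMessage("landing_gear", "actuator wear", 0.9, 50.0),
        HealthMessage("flight_controls", "sensor fault", 0.8, 0.0),
        HealthMessage("structure", "lap joint crack growth", 0.6, 200.0),
    ]
    for m in prioritize(queue):
        print(f"{priority(m):.4f}  {m.subsystem}: {m.fault}")
```

Changing the weights in `CRITICALITY` is the assumed mechanism for expressing a different user's criteria; an operator concerned mainly with structures would raise that weight.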

B.1.2 Related Publications

1. W. Tomczykowski, “An Ounce of Prevention,” Avionics Magazine, p. 70, June 2005.

2. E.C. Larson, B.E. Parker, Jr., and B.R. Clark, “Modern Spectral Estimation Techniques for Structural Health Monitoring,” Proceedings of the 2002 American Control Conference, Vol. 5, pp. 4220-4223, May 8-10, 2002.

3. A. Bartolini and T.E. Munns, “Regulatory and Implementation Issues in Aircraft Health Management,” Proceedings of the 19th Digital Avionics Systems Conference (DASC), Vol. 2, pp. 6B5/1-6B5/7, October 7-13, 2000.

4. R.M. Kent, “Fiber Ultrasonics for Health Monitoring of Composites,” Proceedings of the 19th Digital Avionics Systems Conference (DASC), Vol. 2, pp. 6D3/1-6D3/6, October 7-13, 2000.

5. T.E. Munns and R.M. Kent, “Structural Health Monitoring: Degradation Mechanisms and System Requirements,” Proceedings of the 19th Digital Avionics Systems Conference (DASC), Vol. 2, pp. 6C2/1-6C2/8, October 7-13, 2000.

B.1.3 Related Patent

U.S. Patent 6,943,699: Wireless engine monitoring system

B.2 BAE Systems

BAE Systems is an international company engaged in the development, delivery, and support of advanced defence and aerospace systems. The company designs, manufactures, and supports military aircraft, combat vehicles, surface ships, submarines, radar, avionics, communications, electronics, and guided weapon systems. BAE Systems has major operations across five continents and customers in some 130 countries. The company employs nearly 100,000 people and generates annual sales of approximately $25 billion through its wholly owned and joint-venture operations. BAE Systems, Inc. is the U.S. subsidiary of BAE Systems plc. Headquartered in Rockville, Maryland, BAE Systems, Inc. is responsible for developing BAE Systems’ transatlantic business, relationships with the U.S. government, administration of BAE Systems’ Special Security Agreement, and managing its U.S.-based operating groups.

B.2.1 Approach to PHM

BAE Systems has developed the Engine Health Diagnostics Using Radar (EHDUR) system, which provides an ability to measure and analyze high-cycle fatigue (HCF) failure of the engine installed on the aircraft. The effects of HCF lead to a reduction in engine performance and effective operating life. The sensing antennas enable contactless monitoring of compressor blades and are designed to fit existing inspection ports, thereby minimizing disruption to the engine. The system will detect changes in the vibration characteristics of the engine and changes in engine speed, with an ability to provide temporal profiles of engine performance. The information obtained from the EHDUR system allows changes to be taken into account when setting operating profiles for engine usage, thereby increasing engine life, reducing risk of blade failure, and improving maintenance efficiency.

BAE Systems’ EHDUR may be installed as a permanent engine sensor on an operational aircraft, used as an on-ground maintenance tool to provide engine characterization between flights, or used on test rigs to support engine development programs. Alternative applications include marine turbines, vehicle turbines, and electrical power generation plants, providing the operator with increased engine life and reduced maintenance requirements.

BAE Systems facilities in both the United States and United Kingdom are responsible for PHM integration as well as the design and delivery of the fuel system, crew escape, life support system of the F-35 JSF platform, and weapon systems. Besides being a member of the CALCE PHM Consortium, BAE is also supporting a PHM research project at Loughborough University in the United Kingdom. The project, titled “Advanced Diagnostic and Prognostic Techniques for Complex Systems” [3], aims to investigate and develop methodologies and techniques to predict useful life in complex systems. The top-level objective is to deliver an integrated health management system that is compatible with emerging technologies that provide system architectures that are modular, reconfigurable, and fault tolerant.

Several areas of interest for research include (1) mapping of sensor technologies with stress and damage models to assess real-time life consumption monitoring (LCM) of electronic systems, (2) demonstrating the LCM methodology on an electronic board operated in harsh environments, (3) evaluating diagnostic built-in-test (BIT) software-firmware systems that incorporate error detection and correction circuits and self-checking and self-verification circuits to identify and locate faults, (4) integrating in situ semiconductor prognostic monitors consisting of precalibrated cells (circuits) to predict remaining life due to semiconductor defects and failure mechanisms, and (5) developing software modules (data collection, simplification, damage accumulation, and remaining-life estimation) for environment and usage data collection that enable PHM.
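The four software modules named above (data collection, simplification, damage accumulation, and remaining-life estimation) can be sketched as a short pipeline. This is a minimal illustration under stated assumptions, not the BAE Systems or Loughborough implementation: the half-cycle counting used for simplification, the inverse power-law life model with its constants, and the use of Miner's linear damage rule are all assumed here for illustration only.

```python
from collections import Counter


def simplify(temps, bin_width=10.0):
    """Data simplification: reduce a temperature history to binned swing counts.

    Each swing between successive reversal points is counted as half a cycle.
    This is a crude stand-in for a cycle-counting method such as rainflow.
    """
    reversals = [temps[0]]
    for t in temps[1:]:
        if t == reversals[-1]:
            continue
        same_direction = (
            len(reversals) >= 2
            and (reversals[-1] - reversals[-2]) * (t - reversals[-1]) > 0
        )
        if same_direction:
            reversals[-1] = t  # extend the current swing
        else:
            reversals.append(t)
    counts = Counter()
    for a, b in zip(reversals, reversals[1:]):
        swing = abs(b - a)
        bin_mid = (int(swing // bin_width) + 0.5) * bin_width
        counts[bin_mid] += 0.5
    return counts


def cycles_to_failure(delta_t, c=1.0e7, n=2.0):
    """Assumed life model: N_f = c * delta_t**(-n). Constants are hypothetical."""
    return c * delta_t ** (-n)


def accumulated_damage(counts):
    """Miner's rule: D = sum(n_i / N_i); failure is assumed at D = 1."""
    return sum(n / cycles_to_failure(dt) for dt, n in counts.items())


def remaining_life(damage, elapsed_hours):
    """Assume damage keeps accruing at the same average rate."""
    if damage <= 0:
        return float("inf")
    return elapsed_hours * (1.0 - damage) / damage


if __name__ == "__main__":
    history = [20, 60, 20, 60, 20]  # collected temperatures in deg C (made up)
    counts = simplify(history)
    d = accumulated_damage(counts)
    print(dict(counts), d, remaining_life(d, elapsed_hours=4.0))
```

A real life consumption monitoring system would replace the assumed life model with a failure-mechanism-specific damage model and would calibrate its constants against test data.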

B.2.2 Related Publications

1. E. Wemhoff and A. Khalak, “A Self-Assessment Approach to Confidence Modeling in Diagnostics and Prognostics,” paper presented at the 59th Meeting of the Society for Machinery Failure Prevention Technology, Virginia Beach, VA, April 2005.

2. H. Azzam, F. Beaven, I. Hebden, L. Gill, and M. Wallace, “Fusion and Decision Making Techniques for Structural Prognostic Health Management,” paper presented at the 2005 IEEE Aerospace Conference, Big Sky, MT, pp. 1-12, March 5-12, 2005.

3. P. Bennett, R. Dixon, and J. Pearson, “Advanced Diagnostic and Prognostic Techniques for Complex Systems,” paper presented at the 2nd ESC Division Mini-Conference, January 24, 2005.

4. S.L. Dreyer, “Air Vehicle Integrated Diagnostics,” Proceedings of the 2004 IEEE Autotestcon Conference, San Antonio, TX, pp. 511-517, September 20-23, 2004.

B.2.3 Related Patent

U.S. Patent 6,826,982: Method and apparatus for detection of structural damage

B.3 The Boeing Company

Boeing is a leading aerospace company and manufacturer of commercial jetliners and military aircraft, with capabilities in rotorcraft, electronic and defense systems, missiles, satellites, launch vehicles, and advanced information and communication systems. Boeing has customers in 145 countries around the world and is the number one U.S. exporter in terms of sales. Headquartered in Chicago, Illinois, Boeing employs more than 153,000 people in more than 67 countries. Boeing’s broad range of capabilities includes creating new, more efficient members of its commercial airplane family; integrating military platforms, defense systems, and the war fighter through network-centric operations; creating advanced technology solutions that reach across business units; e-enabling airplanes and providing connectivity on moving platforms; and arranging financing solutions for its customers.

B.3.1 Approach to PHM

As a large-scale systems integrator, Boeing leverages the best of industry, government, and academia to implement affordable, state-of-the-art integrated vehicle health management (IVHM) capabilities in the systems and vehicles produced by Boeing’s Commercial Airplanes and Integrated Defense Systems units. These IVHM capabilities result in improved operations capability, reduced maintenance and logistics costs, and increased system life. In addition, Boeing’s advanced research and development unit, Phantom Works, is developing new IVHM technologies and architectures for incorporation into its commercial, defense, space, and communications products and services.

B.3.2 Boeing Commercial Airplanes and Integrated Defense Systems

Boeing’s two major business units, Boeing Commercial Airplanes (BCA) and Integrated Defense Systems (IDS), provide large-scale systems for military, government, and commercial customers. IVHM capabilities are included or planned for many of these systems. IDS provides products such as the Integrated Maintenance Information System for the F-15, T-38, and C-130 and the Automated Maintenance Environment for the F/A-18. IDS also has IVHM solutions for the C-17, 737 Airborne Early Warning & Control aircraft, P-8A Multi-mission Maritime Aircraft, and various rotorcraft programs.

Boeing Commercial Airplanes offers a service called airplane health management (AHM), an in-flight health monitoring and prognostic maintenance system. AHM uses real-time airplane data from the airplane central maintenance computer and electronic logbook to provide enhanced fault forwarding, troubleshooting, and historical fix information to reduce schedule interruptions and increase maintenance efficiency. Boeing’s AHM program consists of a database of past trends and outcomes, the real-time acquisition of airplane information while in flight, the analysis of historical and present data using a set of prognostics, and a set of built-in decision support tools.

While an airplane is still in flight, the airline operations center can view the maintenance fault status of the airplane on the ground or in the air. The data are transmitted to the ground in real time via ACARS (Aircraft Communications Addressing and Reporting System). From a console, the operator can examine previous trends for an airplane, view relevant documents, and make the operational or fix-or-fly decisions that can make the difference between profit and loss. In addition to diagnosing an airplane problem in flight, AHM can also be used to predict when parts might fail so that they can be replaced or repaired during a regularly scheduled maintenance check.
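The idea of using past trends to predict when a part might fail can be sketched with a least-squares line fitted to a monitored parameter and extrapolated to a failure threshold. The parameter, the threshold, and the linear-trend assumption are illustrative only; the source does not describe the prognostic algorithms that AHM uses.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def flights_until_threshold(flight_numbers, readings, threshold):
    """Estimate how many more flights remain before the trend reaches threshold.

    Returns None when the fitted trend is flat or moving away from the
    threshold, since no crossing can then be projected.
    """
    slope, intercept = fit_line(flight_numbers, readings)
    if slope <= 0:
        return None
    crossing = (threshold - intercept) / slope
    return max(0.0, crossing - flight_numbers[-1])


if __name__ == "__main__":
    flights = [1, 2, 3, 4]
    vibration = [10.0, 12.0, 14.0, 16.0]  # hypothetical monitored parameter
    print(flights_until_threshold(flights, vibration, threshold=30.0))
```

If the projected number of flights is smaller than the number of flights before the next scheduled check, the part would be a candidate for replacement at the current check rather than the next one.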


IDS developed and fielded the Computerized Fault Reporting System (CFRS) in support of the U.S. Air Force (USAF) F-15 program. In late 2004, in conjunction with expanding CFRS to support the C-130 and T-38, the USAF redesignated CFRS as the Integrated Maintenance Information System (IMIS). IMIS is an automated fault diagnosis and maintenance reporting/tracking system designed to reduce the time spent performing fault isolation and to simplify operator and maintenance record keeping. IMIS is designed to determine the operational status of complex systems based on data received from platform download and/or operator input, isolate faults using an expert system that processes data from BIT capabilities and operator input, and schedule and record maintenance. Maintenance tasks are assigned job control numbers and forwarded electronically to the correct work center or shop. Appropriate information is provided to the USAF Core Automated Maintenance System, or CAMS.

Since 1998, IDS has served as the lead system integrator (LSI) of the automated maintenance environment (AME) to provide effective data capture, debrief, and fault diagnostics in support of the F/A-18. The AME downloads aircraft BIT data and debriefs pilots and maintainers based on these data to diagnose faults. Upon identifying a fault, AME opens a work order in the customer’s maintenance management system and takes the maintainer directly to the correct starting place in the F/A-18 Interactive Electronic Technical Manual (IETM). Authorized maintainers anywhere in the world can review the data associated with this fault in a Web-based data warehouse to evaluate lessons learned, conduct BIT trend analyses, identify bad actors, determine bad weapons replaceable assemblies (WRAs), and help forecast spares and position parts across the fleet.
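The fleet-level review of fault data described above, in particular the identification of "bad actors," can be illustrated with a simple count of repeat fault reports by serial number. The record layout and the repeat threshold are hypothetical; the source does not specify how the AME data warehouse performs this analysis.

```python
from collections import Counter


def bad_actors(fault_records, min_repeats=3):
    """Flag individual units that keep reappearing in fault reports.

    fault_records: iterable of (part_number, serial_number, fault_code) tuples.
    Returns a list of ((part_number, serial_number), count), most frequent first.
    """
    counts = Counter((pn, sn) for pn, sn, _ in fault_records)
    flagged = [(unit, n) for unit, n in counts.items() if n >= min_repeats]
    return sorted(flagged, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    records = [  # made-up BIT fault reports
        ("WRA-100", "SN001", "F12"),
        ("WRA-100", "SN001", "F12"),
        ("WRA-100", "SN002", "F07"),
        ("WRA-100", "SN001", "F30"),
        ("WRA-200", "SN010", "F01"),
    ]
    print(bad_actors(records))
```

Counting by serial number rather than part number is the assumed design choice here: it separates a single unit that fails repeatedly from a part type that fails often across the fleet, which would instead point to a design or supply issue.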
Additional AME capabilities include the wiring illuminator (WI), which is directly linked to IETM actions. WI eliminates the need for maintainers to sift through numerous wiring repair sheets and complicated wiring diagrams and substantially decreases maintainer mistakes. As appropriate for electrical problems, the IETM automatically launches WI and returns the maintainer to the IETM when the electrical problem has been resolved. Also within the AME’s toolset is an advanced external structure damage evaluation capability for composite structures. Some composite repairs, which were previously performed at the intermediate and depot levels, can now be safely accomplished at the organizational level to expeditiously return the aircraft to the combat mission schedule. The AME also provides PC flight playbacks of aircraft missions to enable maintainers to accurately ascertain the aircraft flight mode when a fault originated. For additional exploitation of AME failure and performance data, Boeing proposes to integrate organizational-level maintenance and wholesale supply activities through performance-based logistics (PBL) systems such as F/A-18E/F integrated readiness support teaming (FIRST).

Working with the engine supplier, Boeing’s C-17 Globemaster III Sustainment Partnership (GSP) program provides the customer with innovative propulsion system health monitoring. Boeing has developed software for the USAF that reads and analyzes the aircraft’s quick-access recorder (QAR) propulsion data, producing multiple reports, such as unique event reports, unique numerical review reports, and fault reports. The fault reports incorporate the corresponding fault isolation tree reference that includes a matrix of fault criticality to assist in maintenance planning. The data are packaged and sent via the Internet to the engine manufacturer’s commercial derivative trending and engine management portal for hosting the C-17 engine health information.
This portal allows secure World Wide Web access for the whole C-17 propulsion community to view engine health information, including the C-17 engine QAR produced reports. The portal also analyzes the QAR data for engine health performance and tracks life-limited parts. The data are trended and projected for maintenance action, alerting, and depot planning. A critical-event alerting system, utilizing the C-17’s on-board airline operational control (AOC) ACARS-based fault reporting, is also used by field representatives.

The 737 Airborne Early Warning & Control (AEW&C) program has deployed an integrated Health and Usage Monitoring System (HUMS) and Operational Loads Monitoring System (OLMS) to meet the Australian customer’s aircraft structural integrity needs. The system includes high-sample-rate data collection, total airplane performance monitoring, and calibrated load bridges for capturing actual airplane loads on major components. The heart of the system is a data recorder capable of recording over 20 h of continuous flight data at sample rates of up to 100 samples per second. The AEW&C program is also nearing completion of an automated aircraft structural integrity ground station (ASIGS) that will process and analyze the HUMS/OLMS data. The system will calculate accumulated fatigue damage at selected points representing the entire airplane. In addition to analyzing individual flights, the ASIGS will track the lifetime history of individual airplanes as well as the entire Australian AEW&C fleet. The system reports the used/remaining fatigue life based on actual airplane usage and provides trending tools for assessing “what if” usage scenarios.

The integrated architecture of the AEW&C system, including the on-board HUMS and OLMS and the ground-based ASIGS, will maintain the aircraft’s continued airworthiness status and optimize fleet usage. The service life management features provide an integrated approach to in-service maintenance, providing real cost savings to the customer. The system also allows the customer to ensure the airplane is properly used, extending aircraft service life and reducing structural maintenance. The AEW&C fleet management system has performed well through extensive airplane flight testing and system development/demonstration. The system has been deployed on some of the aircraft.
As a critical element of the 737 AEW&C airworthiness, the ASIGS software will be certified by the Commonwealth of Australia Director General Technical Airworthiness (DGTA)-Aircraft Structural Integrity (ASI) agencies. The on-board hardware for the 737 AEW&C fleet management system is fully FAA-qualified. Future development projects planned for the Australian AEW&C fleet include the development of a conditional monitoring database program that will capture the actual maintenance outcomes and findings as well as historical structural repair of the fleet. This tool will help “right-size” the maintenance program, provide useful reference information when designing structural repairs, and help maintain airworthiness for the fleet.

For the P-8A Multi-mission Maritime Aircraft (MMA) program, IDS is using a mix of vehicle health management philosophies to provide an integrated health management system (IHMS) to the U.S. Navy. The mission equipment integrated into the P-8A MMA will contain a centralized HMS capability residing within mission computing. The equipment will acquire the status and collect HMS events, including parametric data, to be passed to the HMS recorder residing within the aircraft avionics. This will provide improved diagnostic capabilities, reducing fault isolation times. The P-8A MMA will also use prognostics to increase operational availability and reduce cost by collecting parametric data on life-limited components. If the prognostics algorithms indicate a component is nearing end of life, it may be scheduled for replacement at a convenient time based on fleet management constraints. The parametric data, along with fault codes for the mission equipment and aircraft kinematics and configuration data, will be collected on a data recording device (currently projected to be a PCMCIA card) and removed after each flight. Once downloaded from the aircraft, the fault data and parametric data will be processed by the ground-based health management system to provide work orders to maintainers.
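The idea of trending parametric data on a life-limited component and scheduling its replacement at a convenient time can be illustrated with a minimal sketch. It assumes a single degradation parameter that grows roughly linearly with flight hours and a fixed end-of-life threshold; the function names, the linear model, and the numbers are illustrative assumptions, not a description of the P-8A algorithms.

```python
from typing import Optional, Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def hours_to_threshold(flight_hours: Sequence[float],
                       readings: Sequence[float],
                       threshold: float) -> Optional[float]:
    """Extrapolate the fitted trend to the (assumed) end-of-life threshold.

    Returns the estimated remaining flight hours, or None when the
    parameter is not trending toward the threshold.
    """
    slope, intercept = fit_line(flight_hours, readings)
    if slope <= 0:
        return None
    crossing = (threshold - intercept) / slope
    return max(0.0, crossing - flight_hours[-1])


def replace_at_next_check(remaining: Optional[float],
                          hours_until_following_check: float) -> bool:
    """Replace now if the part is not expected to last until the check after next."""
    return remaining is not None and remaining <= hours_until_following_check


# Hypothetical parametric data downloaded after four flights.
hours = [0.0, 100.0, 200.0, 300.0]
wear = [1.0, 1.5, 2.0, 2.5]
remaining = hours_to_threshold(hours, wear, threshold=4.0)
print(remaining, replace_at_next_check(remaining, 400.0))
```

A fielded system would use a validated degradation model and report uncertainty on the estimate; the linear extrapolation is only the simplest stand-in.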


IDS’s rotorcraft division is working with the U.S. Army customer and suppliers to develop a condition-based maintenance (CBM) approach for the AH-64D Apache Longbow and CH/MH-47F/G helicopters. CBM provides health monitoring of rotating mechanical components, structural monitoring of fatigue-critical components, and in-flight transmission of aircraft maintenance data for selected components. The CBM solution includes a HUMS, a structural usage monitoring system (SUMS), and a structural integrity monitoring system (SIMS). Data from these systems are processed by a ground station that performs diagnostic and prognostic analysis to assess the structural integrity and remaining useful life of individual aircraft components. These results are recorded and reported via the IETM.

IDS, working with NASA Ames Research Center, also developed and installed an airborne HUMS on an experimental JUH-60A helicopter called NASA/Army Rotorcraft Aircrew Systems Concepts Airborne Laboratory (RASCAL). The system acquires rotor and drive train loads, in addition to vehicle flight parameter data, and was used for component loads monitoring during full authority advanced flight control system for the aircraft. Advancements of this system over previous systems include the addition of a partially wireless rotor data system and serially multiplexed smart digital strain sensors.

As a large-scale systems integrator, Boeing is integrating health management into the system design process for its aircraft, missile, satellite, launch vehicle, and advanced information and communication systems. In addition to those discussed previously, Boeing is developing IVHM solutions for the 747-400, 777, 787, B-52, 767 tanker, and crew exploration vehicle (CEV).

B.3.3 Phantom Works

Phantom Works is the advanced research and development unit at Boeing. This includes IVHM laboratories, IVHM solutions for unmanned aerial vehicles and space vehicles, and Boeing’s IVHM Solution Center. As part of its role in developing advanced unmanned aerial vehicles (UAVs) and space vehicles, Phantom Works is developing IVHM adaptive control capabilities. These capabilities augment the operator’s control abilities with reconfigurable control and mission planning. This ensures mission completion in the presence of battle damage or control system failures by intelligently exploiting remaining system capabilities. IVHM determines the capability of the system, modifies the system control authority to achieve the optimal system performance with the given degradations, and supports mission replanning to adjust the mission goals and plans to current vehicle capabilities. In this manner, the vehicle can successfully and safely complete its flight.

Boeing has demonstrated the integration of IVHM and reconfigurable flight control using an F/A-18 manned flight simulation and hardware-in-the-loop actuator under a Navy-sponsored reconfigurable control and fault identification system (RCFIS). The major features of the RCFIS applications include:

• Expansion of diagnostic capability by detecting levels of actuator degradation that can be used to reduce cannot-duplicate (CND) events and false alarms in the current BIT systems
• Fusion of system- and component-level health assessment results
• Ability to enable a more accurate health assessment by performing structured tests of control system actuator components during flight when loads and temperature environment are present. These actuator tests are integrated with high-level reconfigurable control capability to negate the impact on the flight trajectory or ride quality
• Modification of actuator control limits in order to make the best use of the degraded actuator’s remaining capabilities and/or extend the life of the component
• Modification of system-level controls to compensate for the degraded component/subsystem capability or damage

B.3.4 Related Publications

1. J. Sheppard and T. Wilmering, “Recent Advances in IEEE Standards for Diagnosis and Diagnostic Maturation,” paper presented at the IEEE Aerospace Conference, Big Sky, MT, March 4-11, 2006.
2. B. Chidambaram and D. Gilbertson, “Recorders, Reasoners, and Artificial Intelligence Strategies for Integrated Diagnostics on Military Transport Aircraft,” paper presented at the IEEE Aerospace Conference, Big Sky, MT, 2006.
3. Z. Williams, “Benefits of IVHM: An Analytical Approach,” paper presented at the IEEE Aerospace Conference, Big Sky, MT, 2006.
4. G. Andresen, “Providing Best-Value IVHM Solutions for Aging Aircraft,” paper presented at the 9th Joint DoD/FAA/NASA Conference on Aging Aircraft, 2006.
5. T. Larchuk, L. Fitzwater, R. Christ, and A. Schaff, “Demonstration of a System Reliability Approach for the Validation of the Application of Usage Monitoring to the Retirement of Fatigue Life Limited Flight Critical Rotorcraft Structure,” Proceedings of the American Helicopter Society 61st Forum, Grapevine, TX, June 1-3, 2005.
6. L. Gokdere, S. Chiu, K. Keller, and J. Vian, “Lifetime Control of Electromechanical Actuators,” paper presented at the IEEE Aerospace Conference, Big Sky, MT, March 5-12, 2005.
7. A. Trego and G. Clark, “Structural Health Monitoring System: From Collection to Analysis,” Proceedings of the 2005 International Workshop on Structural Health Monitoring, Stanford University, Stanford, CA, September 12-14, 2005, pp. 1785-1792.
8. K. Keller, K. Swearingen, and A. Del Amo, “General Reasoning System for Health Management,” paper presented at the North American Fuzzy Information Processing Society Conference, Detroit, MI, June 2005.
9. B. Chidambaram, D. Gilbertson, and K. Keller, “Condition-Based Monitoring of an Electro-Hydraulic System Using an Open System Architecture,” paper presented at the IEEE Aerospace Conference, Big Sky, MT, 2005.
10. X. Hu, G. Clark, M. Travis, J. Vian, and D. Wunsch II, “Aircraft Cabin Noise Minimization via Neural Network Inverse Model,” paper presented at the International Joint Conference on Neural Networks, Montreal, QC, 2005.
11. T. Wilmering and J. Henmann, “A Practical Approach to Generalized Maintenance Information Integration in Support of Aerospace Closed-Loop Diagnostics and Maturation,” paper presented at the IEEE Autotestcon, Orlando, FL, 2005.
12. T. Wilmering and A. Ramesh, “Assessing the Impact of Health Management Approaches on System Total Cost of Ownership,” paper presented at the IEEE Aerospace Conference, Big Sky, MT, March 2005.


13. E. N. Brown, B. Chidambaram, and G.B. Aaseng, “Applying Health Management Technology to the NASA Exploration System-of-Systems,” paper presented at the AIAA Space 2005, Long Beach, CA, 2005.
14. J. Huang, “Health Monitoring for Thermal Protection Systems of Space Vehicles,” Proceedings of the 5th International Workshop on Structural Health Monitoring, Stanford, CA, 2005.
15. J. Huang, “Structural Sensor Testing for Space Vehicle Applications,” paper presented at the SPIE Smart Structures and Materials and NDE for Health Monitoring and Diagnostics, San Diego, CA, 2005.
16. J. Vian et al., “Indications of Propulsion System Malfunctions,” United States Department of Transportation Federal Aviation Administration, DOT/FAA AR-03/72, November 2004.
17. S. Black, K. Keller, G. Biswas, and J. Davis, “Diagnostic/Prognostic Modeling and Reconfigurable Control,” paper presented at the IEEE Autotestcon, San Antonio, TX, September 2004.
18. S. Black, K. Keller, K. Swearingen, M. Vandernoot, M. Hood, J. Urnes, and A. Page, “Reconfigurable Control and Fault Identification System (RCFIS),” paper presented at the IEEE Aerospace Conference, Big Sky, MT, March 2004.
19. D. Followell, D. Gilbertson, and K. Keller, “Implications of an Open System Approach to Vehicle Health Management,” IEEE Aerospace Conference Proceedings, Vol. 6, pp. 3717-3724, 2004.
20. G. Clark, M. Hafner, and J. Vian, “Electronic Systems Health Monitoring Using Electromagnetic Emissions,” Journal of Aerospace, Vol. 113, pp. 1781-1786, 2004.
21. T. Wilmering, “Approaches to Semantic Interoperability for Advanced Diagnostics Architectures,” paper presented at the IEEE Autotestcon, San Antonio, TX, September 2004.
22. S. Ofsthun and T. Wilmering, “Model-Driven Development of Integrated Health Management Architectures,” IEEE Aerospace Conference Proceedings, Vol. 6, pp. 3692-2704, 2004.
23. S. Zaat, “Method and System for Discovering and Recovering Unused Service Life,” paper presented at Material Science and Technology Conference, Materials Damage Prognosis Session, New Orleans, LA, September 26-29, 2004.
24. J. Vian, “Integrated Vehicle Health Monitoring for Electrical Systems Development Test-Bed Final Report,” NASA Glenn Research Center Contract NAS3-02067, Boeing Phantom Works, June 19, 2003.
25. E. Saad, J. Choi, J. Vian, and D. Wunsch, “Query-Based Learning for Aerospace Applications,” IEEE Transactions on Neural Networks, Vol. 14, No. 6, pp. 1437-1448, November 2003.
26. H. Xiao, J. Vian, and D. Wunsch II, “Neural Network Classification of Engine-Induced Aircraft Vibration Using Time-domain Signal Component Analysis,” paper presented at the Second International Conference on Computational Intelligence, Robotics, and Autonomous Systems, Singapore, December 15-18, 2003.


27. T. Wilmering, J. Yuan, and D. VanRossum, “A Metadata Architecture for Mediated Integration of Product Usage Data,” paper presented at the IEEE Autotestcon, Anaheim, CA, 2003.
28. T. Wilmering, “When Good Diagnostics Go Bad-Why Maturation Is Still Hard,” paper presented at the IEEE Aerospace Conference, Big Sky, MT, 2003.
29. J. Amsell, “Enhanced Integrated Satellite Factory Test Environment,” paper presented at the Autotestcon, Anaheim, CA, 2003.
30. P. Goggin, J. Huang, E. White, and E. Haugse, “Challenges for SHM Transition to Future Aerospace Systems (Keynote Paper),” Proceedings of the 4th International Workshop on Structural Health Monitoring, Stanford, CA, 2003.
31. S. Ofsthun, “Integrated Vehicle Health Management for Aerospace Platforms,” IEEE Instrumentation and Measurement Magazine, Vol. 5, No. 3, pp. 21-24, September 2002.
32. B.G. Cameron and P. Shanthakumaran, “WAH-64 Apache HUMS Phase-1 Implementation,” Proceedings of the American Helicopter Society 55th Forum, June 11-13, 2002.
33. Y. Lu, R. Christ, T. Puckett, R. Teal, and B. Thompson, “AH-64D Apache Longbow Structural Usage Monitoring System,” Proceedings of the American Helicopter Society 58th Forum, Montreal, June 11-13, 2002.
34. S. Narayanan, R. Marks, J. Vian, J. Choi, M. El-sharkawi, and B. Thompson, “Set Constraint Discovery: Missing Sensor Data Restoration Using Auto-associative Regression Machines,” paper presented at the World Congress on Computational Intelligence, pp. 2872-2877, Honolulu, HI, May 12-17, 2002.
35. J. Vian and M. Swayne, “Application of Integrated Vehicle Management Concepts to Automated Power Subsystems Final Report,” NASA Contract NAS3-27812, Boeing Phantom Works, February 25, 2002.
36. K. Keller and T. Wilmering, “JV 2020 and Beyond: Implications for Intelligent Support Architectures,” paper presented at the NDIA System Engineering Conference, Tampa, FL, 2002.
37. T. Wilmering, “Knowledge Representation Considerations for Integration of Diagnostic Maturation Information,” paper presented at the IEEE Autotestcon, Huntsville, AL, 2002.
38. E. Lavretsky and B. Chidambaram, “Health Monitoring of an Electro-Hydraulic System Using Ordered Neural Networks,” paper presented at the World Conference of Computational Intelligence, Honolulu, HI, 2002.
39. D. Gilbertson and B. Chidambaram, “Open System Architecture for Condition-Based Monitoring-A CORBA/XML Implementation,” Machinery Failure Prevention Technology, Virginia Beach, VA, 2002.
40. P.L. Koon and S. Greene, “Integration of Health Management and Support Systems Is Key to Achieving Cost Reduction and Operational Concept Goals of the 2nd Generation Reusable Launch Vehicle,” Proceedings of the SPIE-The International Society for Optical Engineering, Vol. 4733, pp. 37-48, 2002.


41. J. Huang, “Structural Health Monitoring for Reusable Launch Vehicles-An Integrated System and Process Approach,” Proceedings of the 3rd International Workshop on Structural Health Monitoring, Stanford University, Stanford, CA, September 2001.
42. K. Keller, D. Wiegand, K. Swearingen, C. Reisig, S. Black, A. Gillis, and M. Vandernoot, “An Architecture to Implement Integrated Vehicle Health Management Systems,” paper presented at the IEEE Autotestcon, Valley Forge, PA, 2001.
43. T. Wilmering, “Semantic Requirements on Information Integration for Diagnostic Maturation,” paper presented at the IEEE Autotestcon, Valley Forge, PA, 2001.
44. J. Vian, “Aerospace Applications of Neural Networks,” Proceedings of 2000 Irish Signals and Systems Conference (Keynote Address), Dublin, Ireland, June 29-30, 2000, pp. 481.
45. B. Austin, K. Edwards, E. Hennings, J. Vian, and N. Warner, “Active Network Guidance and Emergency Logic (ANGEL) Program,” Journal of Aerospace, Vol. 109, pp. 312-315, 2000.
46. E. Saad, J. Choi, J. Vian, and D. Wunsch, “Efficient Training Techniques For Classification With Vast Input Space,” Proceedings of 1999 International Joint Conference on Neural Networks, Washington, D.C., July 10-16, 1999, Vol. 2, pp. 1333-1338.
47. P. Ellerbrock, Z. Halmos, and P. Shanthakumaran, “Development of New Health and Usage Monitoring System Tools Using a NASA/Army Rotorcraft,” Proceedings of the American Helicopter Society 55th Forum, May 25-27, 1999.
48. R. Teal, D. Miller, J. Evernham, D. Marquith, T. Larchuk, F. White, and D. Deibler, “Regime Recognition for MH-47E Structural Usage Monitoring,” Proceedings of the American Helicopter Society 53rd Forum, Virginia Beach, VA, 1997.
49. M. Sudolsky, “The Fault Recording and Reporting Method,” paper presented at the 1998 Autotestcon, Salt Lake City, UT, August 1998, and 2001 Teledyne Controls User Conference, Los Angeles, CA, March 2001.

B.3.5 Related Patents

U.S. Patents:

1. 4,215,412: Real time performance monitoring of gas turbine engines
2. 4,943,919: Central maintenance computer system and fault data handling method
3. 5,158,720: Method and system for continuous in situ monitoring of viscosity
4. 5,265,475: Fiber optic joint sensor
5. 5,372,426: Thermal condition sensor system for monitoring equipment operation
6. 5,525,796: Fiber optic sensing apparatus for detecting a fracture in a metallic workpiece and an associated method of attaching a fiber optic sensing element to the metallic workpiece
7. 5,814,729: System for in-situ delamination detection in composites
8. 5 3 3 1,176: Fluid flow measurement assembly
9. 5,952,836: Device and method for detecting workpiece fractures
10. 5,974,529: Systems and methods for control flow error detection in reduced instruction set computer processors
11. 6,006,163: Active damage interrogation method for structural health monitoring
12. 6,114,976: Vehicle emergency warning and control system
13. 6,115,656: Fault recording and reporting method
14. 6,317,658: Neurocomputing control distribution for failed control effectors
15. 6,354,140: Fluid leakage detector for vacuum applications
16. 6,532,426: System and method for analyzing different scenarios for operating and designing equipment
17. 6,574,537: Diagnostic system and method
18. 6,618,654: Method and system for discovering and recovering unused service life
19. 6,691,007: Vehicle condition monitoring system
20. 6,751,536: Diagnostic system and method for enabling multistage decision optimization for aircraft preflight dispatch
21. 6,778,906: System and method for ensuring retention of situational awareness by employing active network guidance and emergency logic
22. 6,789,007: Integrated on-board maintenance documentation with a central maintenance system
23. 6,814,330: Method and computer program product for controlling the control effectors of an aerodynamic vehicle
24. 10/972916 (pending): Ownership cost calculator for aerospace health management
25. 03/687 (pending): Methods and systems for analyzing engine unbalance conditions
26. 04/0110 (pending): System, method, and computer program for real-time event identification and course of action interpretation


B.4 European Aeronautic Defence and Space Company

The European Aeronautic Defence and Space Company (EADS) is Europe’s premier aerospace and defence company. Founded in 2000, EADS includes the aircraft manufacturer Airbus, the world’s largest helicopter supplier Eurocopter, and the joint venture MBDA, an international leader in missile systems. EADS is the major partner in the Eurofighter consortium, is the prime contractor for the Ariane launcher, develops the A400M military transport aircraft, and is the largest industrial partner for the European satellite navigation system Galileo. EADS employs about 110,000 people at more than 70 production sites. In 2006, the company generated revenues of over $54 billion. The company is subdivided into five divisions: Airbus, Military Transport Aircraft, Eurocopter, Space, and Defence and Security Systems.

B.4.1 Approach to PHM

The EADS approach to PHM involves monitoring the usage and life-cycle environment of a system using a miniaturized integrated health monitoring device. Life-cycle loads, be they thermal, mechanical, chemical, physical, or electrical, may lead to performance or physical degradation of a product and reduce its service life. EADS aims to measure these loads in situ and use the load profiles in conjunction with models to assess the degradation due to cumulative load exposures.

EADS has developed a mobile, wireless diagnostic tool for aircraft maintenance called sensor-based aircraft maintenance support (SAMS) for the Trent 900 and Engine Alliance GP7200 engines on the Airbus A380 and the Trent 1000 and GEnx engines for the Boeing 787; see Figure B.1. The system uses over 30 external and internal sensor inputs, discrete signals, and data from engine control via a digital data bus interface to continuously acquire and process data and detect anomalies. The monitoring system automatically senses aircraft and engine type and runs the software for cockpit indication of rotor imbalance and engine maintenance. The SAMS system also performs in-service analyses of vibration signals to identify potential bearing failures, enabling the operator to plan engine removal for maintenance. This helps to prevent service disruption and secondary damage to other engine parts. EADS claims this tool can provide accurate and fast location of defects in aircraft and thus shorten maintenance times and lower costs.

The EADS “Clio” project involves the development of a time-stress measurement device (TSMD) for health monitoring. The aim of this work is to prototype an autonomous, low-power, nonintrusive, miniaturized, multiapplication TSMD that could be used in civil and military domains. The Clio TSMD keeps a continuous and detailed record of an aerospace system’s on-board environment so that, should a problem or a system failure occur, the causes can easily be traced. To have a clear picture of the on-board environment, measurements of temperature, humidity, shocks, vibrations, voltage, and other factors are gathered. In effect, the Clio TSMD works like a mini black box, collecting and storing a multitude of data over time.

Initial work on the Clio TSMD began at the EADS Corporate Research Center in Suresnes, France. The EADS research team defined and patented a generic electronic architecture, which included communication between the Clio TSMD and its remote sensors by means of a serial bus. For this first-generation Clio TSMD, the integration of the electronics involved 3D technology to obtain a small cube measuring 43 × 33 × 41 mm and weighing 150 g. This first prototype was also designed to allow up to 123 external sensors to be connected to it via the serial bus.


Figure B.1: Photograph of EADS’ SAMS tool. (Courtesy of EADS.)

EADS has funded CALCE at the University of Maryland to help develop miniaturized integrated hardware-software HUMS that can enable prognostic health monitoring of electronics in real time. Recent work focused on developing an even smaller and more autonomous second-generation Clio TSMD (also known as HUMS); see Figure B.2. The prototype is 20 × 40 × 15 mm and weighs 10 g without the battery. This design also includes flash memory that can be inserted in a multimedia card support fixed under the homemade double-sided printed circuit board. The use of removable flash memory makes it possible to store practically unlimited data and avoids the need for downloads, which require relatively large amounts of energy. The only requirement is that users need to replace the multimedia cards once they are full.

The validation of the EADS HUMS for PHM of electronic boards is in progress. As a first step, the device is being tested to provide prognostics for select board-level failure mechanisms. The integrated HUMS will include embedded algorithms that can reduce the raw sensor data and process the time-load signal for extracting the signal parameters (such as temperature range and ramp rates) required for damage assessment. A database of damage fractions was developed based on simulations and testing of the board, and the results were correlated with signal parameters. This database will also be embedded in the HUMS hardware to further enhance its capabilities to perform damage accumulation.
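The data-reduction and damage-accumulation idea described above can be sketched in a few lines. The example below is only an illustration under stated assumptions: it reduces a time-temperature signal to ramps between turning points, extracts each ramp's temperature range and ramp rate, and sums damage fractions from a small lookup table. The table values, the function names, and the choice to key the table on temperature range alone are hypothetical and are not taken from the EADS database.

```python
from typing import List, Sequence, Tuple

# Hypothetical damage fractions per ramp, keyed by minimum temperature range (deg C).
# Entries are ordered from the largest range downward.
DAMAGE_TABLE: List[Tuple[float, float]] = [(50.0, 1e-4), (20.0, 1e-5), (0.0, 0.0)]


def turning_points(temps: Sequence[float]) -> List[int]:
    """Indices where the signal reverses direction, plus the first and last sample."""
    idx = [0]
    direction = 0
    for i in range(1, len(temps)):
        step = temps[i] - temps[i - 1]
        if step == 0:
            continue  # flat segments (dwells) do not change direction
        new_direction = 1 if step > 0 else -1
        if direction != 0 and new_direction != direction:
            idx.append(i - 1)
        direction = new_direction
    if idx[-1] != len(temps) - 1:
        idx.append(len(temps) - 1)
    return idx


def ramps(times_min: Sequence[float], temps_c: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (temperature range, mean ramp rate in deg C/min) for each ramp.

    A dwell that precedes a reversal is counted as part of the ramp, so the
    rate is an average over ramp plus dwell (a simplification).
    """
    pts = turning_points(temps_c)
    out = []
    for a, b in zip(pts, pts[1:]):
        temp_range = abs(temps_c[b] - temps_c[a])
        duration = times_min[b] - times_min[a]
        if temp_range > 0 and duration > 0:
            out.append((temp_range, temp_range / duration))
    return out


def damage_for(temp_range: float) -> float:
    """Look up the damage fraction for one ramp of the given range."""
    for min_range, fraction in DAMAGE_TABLE:
        if temp_range >= min_range:
            return fraction
    return 0.0


def accumulated_damage(times_min: Sequence[float], temps_c: Sequence[float]) -> float:
    """Linear damage accumulation: sum the per-ramp damage fractions."""
    return sum(damage_for(r) for r, _rate in ramps(times_min, temps_c))


times = [0, 10, 20, 30, 40]
temps = [20, 80, 20, 50, 20]
print(ramps(times, temps))
print(accumulated_damage(times, temps))
```

Storing only ramp parameters and a running damage total, rather than the raw signal, is what keeps memory and energy use low on a device of this kind; a production algorithm would more likely use a standard cycle-counting method and a table indexed by several parameters.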


Figure B.2: Photograph of EADS’ integrated HUMS. (Courtesy of EADS.)

B.4.2 Related Publications

1. V. Rouet, B. Foucher, N. Vichare, P. Rodgers, and M.G. Pecht, “Improvement of a Miniaturized Health Monitoring Device for Advanced Alerts,” paper presented at the International Conference on Condition Monitoring and Diagnosis, Changwon, Korea, April 2-5, 2006.
2. V. Rouet and B. Foucher, “Development and Use of a Miniaturized Health Monitoring Device,” Annual Proceedings-Reliability Physics (Symposium), 2004, pp. 645-646.
3. K.W. Dittrich and G. Mueller, “In-Flight Measurement of Acoustic Background Noise for the Development of Impact Detection Algorithms,” Proceedings of SPIE-The International Society for Optical Engineering, Vol. 5046, pp. 263-271, 2003.
4. C. Boller, “Ways and Options for Aircraft Structural Health Management,” Smart Materials and Structures, Vol. 10, No. 3, pp. 432-440, June 2001.
5. J. Saniger, L. Reithler, D. Guedra-Degeorges, N. Takeda, and J.P. Dupuis, “Structural Health Monitoring Methodology for Aircraft Condition Based Maintenance,” Vol. 4332, pp. 88-97, 2001.
6. M. Kehlenbach, A. Horoschenkoff, M.N. Trutzel, and D. Betz, “Performance of Fiber-Optic Bragg Grating Sensors in CFRP Structures,” Proceedings of SPIE-The International Society for Optical Engineering, Vol. 4328, pp. 199-208, 2001.
7. V. Rouet and B. Foucher, “Time Stress Measurement Device on the way to Miniaturisation, Extension of Its Application Domain,” European Aeronautic Defense and Space Company, Service DCWBE, Cedex, France.


B.4.3 Related Patents

European Patents:

1. EP 1401231: Apparatus and method for recording environmental data
2. DE102004026711: Travel sensor for electronic components for use in aircraft, method for controlling electronic components aboard aircraft and electronic component

U.S. Patents:

1. 6,553,324: Method and device for detection of a defect in a sensor system
2. 6,474,162: Micromechanical rate of rotation sensor (DRS)
3. 6,907,782: Micromechanical inertial sensor
4. 6,898,972: Micromechanical speed sensor
5. 6,564,637: Sensor having a resonance structure, especially an acceleration or rotation rate sensor, and a device for carrying out a self-test
6. 6,752,020: Device for measuring pressure, sound and vibration and method of analyzing flow on surfaces of structural parts


B.5 Emerson

Emerson makes process control systems, climate control technologies, power technologies, industrial automation, electric motors, storage systems, and professional tools. Revenues are over $17 billion. There are over 114,000 employees in 60 business units of Emerson, including:

• Emerson Network Power: power solutions for telecommunications and data network infrastructure
• Emerson Process Management: industry solutions that optimize process plant productivity through intelligent field devices, performance software, and consulting and engineering expertise
• Emerson Climate Technologies: heating, ventilating, air conditioning, and refrigeration technologies and systems for commercial and residential applications
• Emerson Industrial Automation: motion control systems and components; plastics joining and precision cleaning equipment; materials testing equipment and supplies
• Emerson Appliance and Tools: motors for a broad range of applications, appliances, and integrated appliance solutions and tools for both homeowners and professionals as well as home and commercial storage systems

B.5.1 Approach to PHM

Astec Power (Astec), a subsidiary of Emerson Network Power, is a leading supplier of standard, modified, and custom AC-DC and DC-DC power supplies from 1 W to 6 kW. Astec has an international customer base and design and production operations on three continents. In October 2005, Astec introduced a new “intelligent” medium-power (iMP) series AC-DC power supply that expands upon its medium-power series. The iMP series contains integrated digital capabilities that provide intelligent communications and programmable control features such as output voltage, output sequencing, power factor correction, DC thresholds, output current limit set-point and type, overload temperature threshold, fan speed, and in-rush current. The controls are made possible by the intelligent inter-integrated circuit (I²C) control communication protocol, which captures configuration and use information as well as parametric data for each output.

The I²C bus was developed by Philips Semiconductors as a simple bidirectional two-wire bus for efficient inter-IC control. This design concept solves many interfacing problems encountered when designing digital control circuits. The I²C bus has become a popular standard that is now implemented in over one thousand different ICs and is licensed to more than 50 companies. In Astec’s iMP, the I²C bus is used to monitor voltage, current, and temperature. Programmable options include disabling all outputs when internal temperature exceeds the safe operating range and providing a warning before shutdown. More options can be programmed that can utilize the continuously monitored information to enable prognostics and equipment safety.
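The programmable warning-before-shutdown behavior described above can be sketched as a simple two-threshold check on a stream of temperature readings. This is a minimal illustration with assumed names and threshold values; it does not use any real I²C library or Astec register map, and the readings are supplied as a plain list rather than read from hardware.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class TemperatureLimits:
    """Hypothetical programmable thresholds in degrees Celsius."""
    warn_c: float
    shutdown_c: float


def classify(temp_c: float, limits: TemperatureLimits) -> str:
    """Map one temperature reading to 'ok', 'warning', or 'shutdown'."""
    if temp_c >= limits.shutdown_c:
        return "shutdown"
    if temp_c >= limits.warn_c:
        return "warning"
    return "ok"


def monitor(readings: Iterable[float], limits: TemperatureLimits) -> Tuple[bool, List[str]]:
    """Process readings in order; disable all outputs at the first shutdown reading.

    Returns (outputs_enabled, state_log). Readings after a shutdown are not
    processed, mirroring a supply whose outputs have been latched off.
    """
    outputs_enabled = True
    log: List[str] = []
    for temp_c in readings:
        state = classify(temp_c, limits)
        log.append(state)
        if state == "shutdown":
            outputs_enabled = False
            break
    return outputs_enabled, log


limits = TemperatureLimits(warn_c=70.0, shutdown_c=85.0)
print(monitor([40.0, 72.0, 86.0, 60.0], limits))
```

The log of warning states is the part of interest for prognostics: the frequency and duration of excursions above the warning level can be trended over time, even when no shutdown ever occurs.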

B.6 Expert Microsystems

Expert Microsystems, Inc. of Orangevale, California, develops software solutions for on-line sensor and data validation, equipment health monitoring and life prediction, and automated reasoning with uncertainty. The company’s flagship product, SureSense, is a real-time diagnostic monitoring software package that was developed as a result of Small Business Innovation Research (SBIR) contracts. Initially, SureSense was created in response to a NASA need to verify automatically the reliability of sensor input used to control the Space Shuttle’s main engines during launch.

B.6.1 Approach to PHM

Expert Microsystems’ approach is to detect failures in equipment and sensors used in performance-critical systems, in which erroneous system shutdowns due to unexpected failures are costly or unsafe. The approach is designed to transform sensor data into actionable knowledge of critical system events impacting system operation and reliability. It is implemented by monitoring an asset’s instrument or data signals. The monitored asset can be any source of observable data, such as a machine, a human being, a business process, or a computer network. The technology works in four steps: prediction, detection, decision, and prognosis. The prediction step involves the use of one or more models describing the asset when it is operating correctly. For each new observation of the asset, the prediction model is used to estimate the expected values of the observed data parameters. In the detection step, the observed data values are compared to their estimated values to determine a pattern of agreement or disagreement between the observed and estimated parameters. The decision step correlates this pattern of agreement or disagreement with the most likely normal or abnormal state of the equipment. The prognosis step determines the probable remaining useful life of the monitored asset. Expert Microsystems’ approach includes patented mode partitioning algorithms, which greatly reduce false alarms by recognizing distinct operating modes and not alarming erroneously when the asset changes between these modes. Expert Microsystems developed prediction model elements that compare historical sensor readings with current sensor readings to estimate the expected values of the observed data parameters. Originally, the prototype system was designed to validate 15 Space Shuttle main-engine (SSME) sensors in real time and detect sensor failures from start to shutdown command.
The company has since structured the software to enable application to many process control environments, such as computer integrated manufacturing, power plants, and hazardous gas sensing and control systems. Expert Microsystems was awarded SBIR grants from the U.S. Department of Energy to produce software to determine whether signals from a nuclear power plant indicate an impending equipment problem. As part of this effort, the SureSense software was selected by the Electric Power Research Institute (EPRI) for use in its online instrument calibration and monitoring program. In SBIR work for the U.S. Air Force, the SureSense application was further developed for monitoring military turbine engines under test at Arnold Engineering Development Center. Expert Microsystems was awarded an SBIR electronic system prognostics development contract from NAVAIR in which the company will provide an automated decision tool for autonomic logistics that predicts when digital electronics will fail to meet their service requirements. Incipient fault-to-failure progression characteristics will be identified at the component and subsystem level to develop verifiable prognostic models driven by existing parameters and measurands. The goal of the project is to distinguish between normal equipment aging and slow failure modes. In Phase I, the company will develop self-calibrating fault detection models that update dynamically and continuously to individual assets and to changes in individual assets over time. Useful life models and feasibility demonstration prototypes will be developed in subsequent project phases.
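The four steps described in this section (prediction, detection, decision, and prognosis) can be sketched as a toy pipeline. This is only an illustration under simple assumptions, not the SureSense algorithms: the mean-based prediction model, the three-sigma residual test, the consecutive-hit decision rule, the linear extrapolation, and all function names and numbers are hypothetical.

```python
import statistics


def predict(history):
    """Prediction: expected value and spread from readings taken in normal operation."""
    return statistics.mean(history), statistics.stdev(history)


def detect(observed, expected, spread, k=3.0):
    """Detection: residual, and whether it disagrees with the estimate."""
    residual = observed - expected
    return residual, abs(residual) > k * spread


def decide(flags, min_hits=3):
    """Decision: call the state abnormal after min_hits consecutive disagreements."""
    run = 0
    for flag in flags:
        run = run + 1 if flag else 0
        if run >= min_hits:
            return "abnormal"
    return "normal"


def prognose(residuals, limit):
    """Prognosis: samples left until a straight-line residual trend reaches limit."""
    slope = (residuals[-1] - residuals[0]) / (len(residuals) - 1)
    if slope <= 0:
        return None  # no upward drift, so no estimate is produced
    return max(0.0, (limit - residuals[-1]) / slope)


history = [10.0, 10.1, 9.9, 10.0, 10.1, 9.9]   # assumed healthy readings
observations = [10.0, 10.5, 10.8, 11.1, 11.4]  # assumed drifting sensor
expected, spread = predict(history)
results = [detect(x, expected, spread) for x in observations]
residuals = [r for r, _ in results]
flags = [f for _, f in results]
state = decide(flags)
remaining = prognose(residuals, limit=2.0)
print(state, remaining)
```

Requiring several consecutive disagreements before declaring an abnormal state is one simple way to trade detection delay against false alarms; the mode partitioning mentioned above addresses the same concern by keeping a separate model per operating mode.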

B.6.2 Related Publications

1. R. Bickford, D. Malloy, J. Monaco, and D. Kidman, “Ground Test Data Validation using a Subscale F/A-22 Engine Inlet Empirical Model,” Proceedings of the ASME/IGTI Turbo Expo Conference, Barcelona, Spain, May 2006.

2. R. Bickford and D. Malloy, “Ground Test Facility Implementation of a Real-Time Turbine Engine Diagnostic System,” AIAA-2005-4334, paper presented at the 41st AIAA/ASME/SAE Joint Propulsion Conference, Tucson, AZ, July 2005.

3. R. Bickford, E. Davis, R. Rusaw, and R. Shankar, “Development of an Online Predictive Monitoring System For Power Generating Plants,” Instrumentation, Control, and Automation in the Power Industry, Proceedings, Vol. 45, No. 421, pp. 137-146, 2004.

4. K.C. Gross, V. Bhardwaj, and R.L. Bickford, “Proactive Detection of Software Aging Mechanisms in Performance-Critical Computers,” paper presented at 27th Annual IEEE/NASA Software Engineering Symposium, Greenbelt, MD, December 2002.

5. E. Liu and D. Zhang, “Diagnosis of Component Failures in the Space Shuttle Main Engines Using Bayesian Belief Networks: A Feasibility Study,” Proceedings of the International Conference on Tools with Artificial Intelligence, 2002, pp. 181-188.

6. R. Bickford, C. Meyer, and V. Lee, “Online Signal Validation for Assured Data Quality,” Proceedings of the International Instrumentation Symposium, Vol. 47, pp. 107-110, 2001.

B.6.3 Related Patents

U.S. Patents:

1. 6,917,839: Surveillance system and method having an operating mode partitioned fault classification model

2. 6,898,469: Surveillance system and method having parameter estimation and operating mode partitioning

3. 6,892,163: Surveillance system and method having an adaptive sequential probability fault detection test

4. 6,609,036: Surveillance system and method having parameter estimation and operating mode partitioning

B.7 General Dynamics

General Dynamics, headquartered in Falls Church, Virginia, employs approximately 71,900 people worldwide and had 2006 revenue of $24.1 billion. The company is a market leader in mission-critical information systems and technologies; land and expeditionary combat systems, armaments, and munitions; shipbuilding and marine systems; and business aviation. General Dynamics Advanced Information Systems (GD-AIS), a division of General Dynamics, is a provider of transformational mission solutions in command, control, communications, computers, intelligence, surveillance, and reconnaissance.

B.7.1 Approach to PHM

In June 1996, General Dynamics won the contract to develop and build the U.S. Marine Corps’ new expeditionary fighting vehicle (EFV), formerly known as the advanced amphibious assault vehicle, or AAAV. The EFV is being designed to replace current AAV7 vehicles within the U.S. Marine Corps. It will enable Marines to perform surface assault from ships located beyond the line of sight. In addition, the EFV will provide armor-protected land mobility and direct fire support during land combat operations. The system development and demonstration (SDD) phase, valued at $712 million, commenced in 2001. In November 2004, General Dynamics was awarded a $176 million modification contract to continue work on the EFV SDD phase. As part of the SDD phase, General Dynamics is developing a condition-based maintenance program for the EFV. Its approach uses prognostics to predict the remaining useful life of components on the EFV and therefore support smart maintenance decisions, reduce the number of depot overhauls over the vehicle life, and avoid collateral damage. General Dynamics has teamed up with Oceana Sensor Technologies, Inc., a wireless sensor company based in Virginia Beach, Virginia, to develop prognostic tools for the EFV drive-train components using Oceana Sensor’s intelligent component health monitor wireless data acquisition system.

At GD-AIS, researchers have developed a sensor system for detecting unwanted deflections and vibrations in gas-turbine engine blades using an eddy current sensor. The standard configuration of the blade tip sensing system consists of four eddy current sensors, a four-channel electronics unit, a feature extractor, and a digital signal processor hosted on a personal computer. According to program manager Jerry Mulholland, “This sensor, about the same diameter as a pencil, is placed in the turbine casing, and generates a balanced magnetic field that is disturbed as the metal blade tip passes through it. The change in magnetic flux indicates blade tip time-of-arrival and clearance. Blade tip motion provides insight into blade vibration, which in turn allows for detection and avoidance of resonant response, foreign-object damage detection, blade damage identification, and assessment of the structural life of the blade.” Progress has been focused on the technical aspects of blade tip sensing: the development and validation of the sensor, processing, and algorithms. Specific issues were the creation and implementation of robust, reliable algorithms, transition of the sensor to a manufacturing environment, and the evaluation of flight data. In addition, diagnostic and prognostic assessment functions were developed based on sensor data (clearance measurement, foreign-object impact detection, integral-vibration measurement, and stall detection) collected from field tests.
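To make the link between time-of-arrival and blade motion concrete, the sketch below uses the generic blade-tip-timing relation, in which a blade that arrives early or late at the sensor has deflected by roughly the tip speed times the timing deviation. It is not the GD-AIS algorithm; the rotor speed, tip radius, blade count, and function names are assumptions made for illustration.

```python
import math


def expected_arrivals(rev_period_s, n_blades):
    """Arrival times of evenly spaced, non-vibrating blades within one revolution."""
    return [i * rev_period_s / n_blades for i in range(n_blades)]


def tip_deflections(measured_s, rev_period_s, tip_radius_m):
    """Convert time-of-arrival deviations into circumferential tip deflections (m)."""
    tip_speed = 2.0 * math.pi * tip_radius_m / rev_period_s  # m/s
    expected = expected_arrivals(rev_period_s, len(measured_s))
    return [tip_speed * (m - e) for m, e in zip(measured_s, expected)]


rev_period = 0.01  # assumed: one revolution every 10 ms (6000 rpm)
radius = 0.25      # assumed tip radius in meters
measured = expected_arrivals(rev_period, 4)
measured[2] += 2e-6  # assumed: blade 2 arrives 2 microseconds late
deflections = tip_deflections(measured, rev_period, radius)
print(deflections)
```

Tracking such deflections over many revolutions is what allows vibration amplitude and resonance to be estimated from a small number of casing-mounted sensors.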

B.7.2 Related Publication

1. M. Dowell, G. Sylvester, R. Krupp, and G. Zipfel, “Progress in Turbomachinery Prognostics and Health Management via Eddy-Current Sensing,” paper presented at the IEEE Aerospace Conference Proceedings, Vol. 6, pp. 133-143, 2000.

B.7.3 Related Patents

U.S. Patents:

1. 6,882,158: Series arc fault diagnostic for aircraft wiring

2. 6,777,953: Parallel arc fault diagnostic for aircraft wiring

B.8 General Electric

General Electric (GE) is a large multinational organization with operations in more than 100 countries. GE’s products and services span appliances, aviation, consumer electronics, electrical distribution, energy, finance, health care, lighting, oil and gas, media and entertainment, rail, security, and water. GE Global Research is one of the world’s largest industrial research organizations and has been vital for the growth and progress of GE. GE Global Research has its headquarters in Niskayuna, New York. It also has facilities in Bangalore, India; Shanghai, China; and Munich, Germany.

B.8.1 Approach to PHM

GE’s PHM efforts are geared toward achieving improved reliability, availability, and performance of its high-value equipment. GE Research has many subdivisions that work on specific areas of business interest.

GE’s Remote Prognostics Lab develops methods for conducting in situ diagnostics and prognostics for products such as gas and steam turbines, generators, and coal plants. It uses physics-based models along with empirical models for prognostics. The lab is also developing real-time reduced order modeling for failure prediction and advanced sensor application for diagnostics and prognostics. This lab uses simulation equipment to develop new real-time diagnostics and prognostics.

The industrial artificial intelligence lab, comprised of engineers, scientists, mathematicians, and researchers from multiple fields, uses state-of-the-art intelligent reasoning technologies to model, predict, and manage GE’s products and assets. The team focuses on computational algorithms, prognostics and health management, condition-based maintenance, and optimization and decision support.

The sensor technologies lab focuses on new technology in ion mobility spectroscopy, infrared subsystems, terahertz devices, and electromagnetic sensors. Applications of the sensor technology range from security systems that make air travel safer to inspection technology for windmills and position monitoring in minimally invasive surgery.

The sensor informatics technology laboratory is comprised of computer scientists and engineers dedicated to creating the systems, algorithms, and tools in the areas of wireless communication and low-cost sensing. Remote monitoring and diagnostics is a tool in a variety of services offered by GE businesses such as GE Healthcare, GE Rail, GE Aviation, and GE Energy. Over 100,000 pieces of fielded equipment are monitored, and the collected information is processed automatically, enabling the services GE provides.
The computational intelligence lab works primarily to develop novel ways to identify anomalies and patterns within very large or sometimes very small data sets containing quantitative and qualitative data. Areas of work include identifying fraud in financial statements, early warning of incipient equipment failure, and identifying good loan prospects. Techniques used include machine learning, text extraction, natural language understanding, information fusion, and information visualization. The prediction algorithms lab focuses on prediction algorithms to drive a paradigm shift from reactive diagnosis to proactive and predictive prognosis. The team works in the technology areas of predictive reliability and modeling, time-series forecasting, econometrics, risk analytics, quantitative finance, prediction fusion technologies, machine learning, probabilistic text mining, pattern recognition, data visualization techniques, and data mining.

B.8.2 PHM Activities

GE’s arc fault detection technology is specifically targeted at aerospace applications, where it is used to prevent serious damage to aircraft and their associated systems due to electrical and wiring problems. It provides protection against series and parallel electrical arcing faults. Arc fault circuit breakers are able to distinguish normal electrical load waveforms from arcing events that can lead to overheated wiring, which in some cases can lead to a serious fire. Electrical arcs can be caused by worn, damaged, or contaminated insulation in wiring or connectors. In case of a fault, the electrical circuit is deenergized. The arc fault detection technology is also being integrated into the control systems of GE’s latest power distribution system.

Prognostics and integrated vehicle health management (IVHM) is the implementation of PHM in vehicles to monitor the health of an electrically powered vehicle’s systems. The overall health of the vehicle is monitored using microprocessors embedded in digitally controlled power distribution systems. The data transmitted to and from these controllers can be used to characterize the system and component operating signatures. These capabilities help ensure high system reliability as well as reduce life-cycle ownership costs. GE has developed and demonstrated health management algorithms for electric actuation and fuel pumps/valves. The algorithms were developed based on engineering approaches supported by data collected from seeded and accelerated run-to-failure laboratory testing. A demonstration of the integration of the technology with hardware (electric actuator, fuel pump, fuel valve, arc fault, and power distribution unit) has been performed.

A nonintrusive technology to detect the location of wiring faults is being developed. Its objective is to automate the process of time domain reflectometry (TDR) waveform interpretation and incorporate the technology into the power and utilities systems of the future, providing users with unrivalled maintenance information.

The Technologies and Techniques for new Maintenance (TATEM) project is another approach of GE toward PHM. This project is led by GE Aviation with the objective of researching and validating technologies and techniques for reducing maintenance-related costs in the aviation industry. The research in the project includes new maintenance philosophies, technologies, and techniques to develop new approaches for maintaining aircraft structure, avionics, utilities, landing gear, and engines. The project has defined the architecture of a future integrated health management approach to aircraft maintenance, and the current focus is to build and physically integrate the critical elements of this architecture at the subsystem, system, aircraft, and fleet levels. One of the main avenues in the new approach is the use of prognostics to enable predictive maintenance planning.
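As a rough illustration of what automated TDR waveform interpretation involves, the sketch below finds the first sample that departs from the incident-step level, converts its delay to a distance using half the round-trip time, and labels the reflection by its polarity. This is the textbook TDR relation, not GE's method; the sample period, velocity factor, threshold, waveform values, and function names are all assumptions.

```python
C_M_PER_S = 299_792_458.0  # speed of light in vacuum


def first_reflection_index(waveform, baseline, threshold):
    """Index of the first sample that departs from the incident-step level."""
    for i, v in enumerate(waveform):
        if abs(v - baseline) > threshold:
            return i
    return None


def fault_distance_m(round_trip_s, velocity_factor):
    """Distance to the discontinuity; the pulse travels there and back."""
    return 0.5 * velocity_factor * C_M_PER_S * round_trip_s


def classify(reflection_v, baseline):
    """Positive reflection suggests an open (high impedance); negative, a short."""
    return "open-like" if reflection_v > baseline else "short-like"


sample_period_s = 1e-9               # assumed 1 ns sampling
waveform = [0.5] * 40 + [0.9] * 10   # assumed trace: step level, then reflection
idx = first_reflection_index(waveform, baseline=0.5, threshold=0.1)
distance = fault_distance_m(idx * sample_period_s, velocity_factor=0.66)
kind = classify(waveform[idx], baseline=0.5)
print(idx, round(distance, 2), kind)
```

The velocity factor depends on the cable's insulation, so a practical system would need it calibrated per wire type before distances can be trusted.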

B.8.3 Related Publications

1. P.P. Bonissone and N. Iyer, “Soft Computing Applications to Prognostics and Health Management (PHM): Leveraging Field Data and Domain Knowledge,” in Computational and Ambient Intelligence, F. Sandoval, A. Prieto, J. Cabestany, and M. Graña, Eds., Springer, Berlin, 2007, pp. 928-939.

2. F. Xue, W. Yan, N. Roddy, and A. Varma, “Operational Data Based Anomaly Detection for Locomotive Diagnostics,” available: http://ww1.ucmss.com/books/LFS/CSREA2006/MLM4028.pdf, accessed January 13, 2008.

3. N. Roddy, A. Varma, and S. Linthicum, “Managing Case Quality in a CBR Diagnostic System,” paper presented at the International Conference on Artificial Intelligence, 2005.

4. S. Kumar, G. Sulzberger, J. Bono, D. Skvoretz, G.I. Allen, T.R. Clem, M. Ebbert, S.L. Bennett, R.K. Ostrom, and A. Tzouris, “Underwater Magnetic Gradiometer for Magnetic Anomaly Detection, Localization, and Tracking,” in Detection and Remediation Technologies for Mines and Minelike Targets, Vol. 12, R.S. Harmon, J.T. Broach, and J.H. Holloway, Jr., Eds., SPIE-International Society for Optical Engineering, January 2007.

5. M. Devaney and B. Cheetham, “Case-Based Reasoning for Gas Turbine Diagnostics,” paper presented at the 18th International FLAIRS Conference (FLAIRS-05), Clearwater Beach, FL, May 2005.

6. W. Cheetham, A. Varma, and K. Goebel, “Case-Based Reasoning at General Electric,” GE Research & Development Center, 2001CRD045, March 2001.

7. R. Rausch, D. E. Viassolo, A. Kumar, K. Goebel, N. Eklund, B. Brunell, and P. Bonissone, “Towards In-Flight Detection and Accommodation of Faults in Aircraft Engines,” paper presented at the AIAA 1st Intelligent Systems Technical Conference, Chicago, IL, 20-22 September, 2004.

8. A. T. Patera, “General Electric Research Team Mathematics Project,” Final report, September 30, 2003-September 29, 2004, available: http://en.scientificcommons.org/.

9. A. Varma and N. Roddy, “ICARUS: Design and Deployment of a Case-Based Reasoning System for Locomotive Diagnostics,” Engineering Applications of Artificial Intelligence, Vol. 12, No. 6, pp. 681-690, 1999.

10. N. H. Eklund and K. F. Goebel, “Using Neural Networks and the Rank Permutation Transformation to Detect Abnormal Conditions in Aircraft Engines,” IEEE MidSummer Workshop on Soft Computing in Industrial Applications, Espoo, Finland, June 2005.

B.8.4 Related Patents

U.S. Patents:

1. 7,254,514: Method and system for predicting remaining life for motors featuring online insulation condition monitor

2. 7,266,515: Method and system for graphically identifying replacement parts for generally complex equipment

3. 7,299,697: Method and system for inspecting objects using ultrasound scan data

4. 7,254,514: Method and system for predicting remaining life for motors featuring online insulation condition monitor

5. 7,293,462: Isolation of short-circuited sensor cells for high-reliability operation of sensor array

6. 7,286,960: Systems and methods for monitoring fouling and slagging in heat transfer devices in coal fired power plants

7. 5,584,298: Noninvasive hemodynamic analyzer alterable to a continuous invasive hemodynamic monitor

8. 5,420,518: Sensor and method for the in situ monitoring and control of microstructure during rapid metal forming processes

9. 7,293,400: System and method for sensor validation and fusion

10. 7,280,941: Method and apparatus for in-situ detection and isolation of aircraft engine faults

11. 7,286,923: System and method for estimating turbine engine deterioration rate with noisy data

12. 7,254,001: Circuit protection system

13. 5,530,431: Anti-theft device for protecting electronic equipment

B.9 General Motors

General Motors Corporation (GM), the world’s largest automaker, has been the annual global industry sales leader for 76 years. It was founded in 1908 and currently employs about 284,000 people around the world. GM has its global headquarters in Detroit and manufactures its cars and trucks in 33 countries. In 2006, 9.1 million GM cars and trucks were sold globally under the following brands: Buick, Cadillac, Chevrolet, GMC, GM Daewoo, Holden, HUMMER, Opel, Pontiac, Saab, Saturn, and Vauxhall. GM’s On-Star subsidiary is the industry leader in vehicle safety, security, and information services. GM’s largest national market is the United States, followed by China, Canada, the United Kingdom, and Germany.

B.9.1 Approach to PHM

GM follows a unique prognostics approach with a focus on the safety and diversity of its products. It works to make each of its new models safer than the one it replaces. It is a leader in global research, engineering, and innovation to improve road safety and reduce injuries and fatalities. This is achieved by applying technologies such as On-Star and StabiliTrak to its vehicles so as to reduce the chance of failures and hence enhance safety. One of GM’s current objectives is to apply these technologies to all its vehicles by 2010. General Motors’ On-Star Vehicle Diagnostics is the first of its kind in vehicle prognostics in the auto industry. This service collects valuable maintenance information on four of the vehicle’s key operating systems from hundreds of diagnostic checks and sends a personalized e-mail directly to the owner. This facility is expected to be available on most GM retail vehicles in the United States and Canada by the end of 2008. Currently, On-Star provides services to more than 4 million subscribers in the United States and Canada. It helps vehicle owners improve fuel economy and reduce maintenance cost. Its monthly diagnostic e-mail allows drivers to closely monitor conditions that can affect their fuel efficiency, including tire pressure and vehicle emissions. The On-Star Vehicle Diagnostics offerings include: (1) Tire Pressure Monitoring: This feature notifies consumers of their vehicles’ current tire pressure and the manufacturer’s recommended tire pressure. Additionally, the in-vehicle tire pressure monitor provides a real-time warning when a tire is significantly underinflated. On-Star Vehicle Diagnostics notifies drivers each month of the actual psi reading for each tire, thereby allowing consumers to make small adjustments that can improve fuel efficiency.
(2) E85 Ethanol Compatibility: Subscribers are notified via e-mail if their vehicles are E85 ethanol compatible and are directed to GM’s website, www.livegreengoyellow.com, which links to the National Ethanol Vehicle Coalition’s (NEVC) E85 ethanol station on-line look-up tool. (3) Enhanced Vehicle Emissions Data: E-mail communications from On-Star include detailed vehicle power train emissions data, thereby helping drivers make decisions on maintenance of the vehicle. (4) Enhanced Oil-Life Monitoring: E-mail alerts indicate the mileage at which the next oil change will be needed based on current driving patterns, thus allowing drivers to know when an oil change is likely to be necessary. Another prognostic tool from GM is the StabiliTrak Smart System. An intelligent chassis control system (ICCS) extends the ability of the driver to avoid a traffic hazard far beyond just braking and steering. When the inputs obtained by the on-board computer from steering, braking, shocks, transmission, loads, and other places are used to assist the driver, it becomes a powerful tool. The intelligent electronics optimize vehicle control and handling by monitoring the road surfaces and evaluating the vehicle's motion. GM's advanced intelligent chassis control networks sophisticated computer software and directional sensors with a vehicle's suspension, steering, throttle, transmission, antilock braking, and traction control systems to evaluate the dynamic conditions as and when needed. All these systems greatly help the driver to maintain control of the vehicle and improve handling on bad roads. GM's first application of ICCS technology began with road sensing suspension in 1996. The idea of a total vehicle stability enhancement system (VSES) was born as StabiliTrak on three Cadillac models when engineers added electronic throttle controls in 1997. StabiliTrak is part of GM's initiative to use technology to enhance the safety, maneuverability, and driveability of all of its cars and trucks. StabiliTrak selectively and automatically manipulates the brakes and/or throttle to help follow the path the driver intends, either through more predictable steerability or by minimizing fishtailing during cornering. This helps defuse potentially dangerous situations involving oversteer or understeer. A system of sensors and microprocessors continually monitors the vehicle's movement dynamics. By combining data from wheel speed sensors, steering angle sensors (direction), yaw sensors (the rate at which the vehicle body rotates around its axis), and lateral acceleration sensors, the main computer knows immediately when a vehicle is not behaving in the way the driver intends. Inputs such as steering wheel and throttle pedal position help the processor understand how to improve traction at any slipping wheel and restore the vehicle's intended path. The StabiliTrak system optimizes tire contact with the road surface and thus provides smooth, transparent stability and traction assistance whenever required.
It restores control with minimal fuss and alarm by stabilizing the vehicle at critical times, thus providing owners with safety advantages that are well beyond the ability of even experienced drivers.
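The comparison between what the driver intends and what the vehicle is doing can be sketched with a simple kinematic model. This is not GM's StabiliTrak logic: the bicycle-model formula is a common textbook simplification, and the wheelbase, tolerance, sensor values, and function names below are assumptions for illustration only.

```python
import math


def intended_yaw_rate(speed_mps, steer_angle_rad, wheelbase_m):
    """Yaw rate (rad/s) implied by speed and road-wheel steering angle (kinematic model)."""
    return speed_mps * math.tan(steer_angle_rad) / wheelbase_m


def classify_response(measured, intended, tol=0.05):
    """Compare the yaw sensor reading with the yaw rate the driver is asking for."""
    if abs(measured - intended) <= tol:
        return "tracking"
    # Rotating faster than intended suggests oversteer; slower suggests understeer.
    return "oversteer" if abs(measured) > abs(intended) else "understeer"


# Assumed inputs: 20 m/s, 0.03 rad at the road wheels, 2.8 m wheelbase.
target = intended_yaw_rate(20.0, 0.03, 2.8)
labels = [classify_response(m, target) for m in (0.10, 0.22, 0.35)]
print(round(target, 3), labels)
```

A production stability system would also use lateral acceleration and individual wheel speeds, as the text notes, and would act on the result by braking individual wheels or reducing throttle.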

B.10 GMA Industries

GMA Industries, Inc. is a research and development firm founded in 1990 and located in Annapolis, Maryland. GMA is involved in a wide range of engineering problems such as test technology, image processing, medical applications, nanotechnology, computer graphics and animation, and data compression.

B.10.1 Approach to PHM

GMA’s focus in the area of health monitoring lies in developing advanced sensors and software tools for enabling effective diagnostics of avionics circuit boards for maintenance. GMA is developing a technology for maintenance of shop replaceable unit (SRU) circuit boards by designing molecular test equipment (MTE) embedded within ICs to enable them to continuously test themselves during normal operation and to provide a visual indication that they have failed. The MTE is fabricated and embedded within the individual IC in the chip substrate. The molecular-sized sensors can be used to measure voltage, current, and other electrical parameters as well as sense changes in the chemical structure of ICs that are indicative of pending or actual circuit failure. This research focuses on the development of specialized doping techniques for carbon nanotubes to form the basic structure comprising the sensors. The integration of these sensors within conventional IC devices, as well as the use of molecular wires for the interconnection of sensor networks, is a focus of this research area. Nanoprobes contain MTE and/or provide a communications conduit that connects the surface and the substrate of the IC at various functional areas. An electrochemical indicator placed on the surface of the chip receives signals from the MTE and nanoprobes, initiating a chemical reaction that provides a visual indication to a technician that the IC is faulty. MTE devices are designed to communicate an indication of a failure event to the failure indicator on the surface of the IC using substrate-based and nanoprobe-based methods. The placement and routing of the nanoprobes through the cross section of the IC package and their possible distribution across the failure indicator are shown in Figure B.3. A nanoprobe may span only one or two substrate layers or may extend all the way to the surface of the IC, depending on the scope and function of the MTE it contains. Communication of a failure from the nanoprobe to the failure indicator would be accomplished using signal paths in the same manner as for substrate MTE. Implementation of MTE nanotechnology devices is accomplished through the use of molecular OR, AND, and XOR gates. The structures of these circuits are analogous to the conventional gate designs; however, they are molecular in size. This project is sponsored by the U.S. Air Force, Ogden Air Logistics Center. GMA has a few conference publications on this work, but no actual prototype or products have been reported.

Another project at GMA is the development of software to support large-scale longitudinal analyses of electronic equipment test performance data and visualization of test results for fault diagnosis and resolution. This software is to aid in the performance of remote testing and recognition of changes in the tolerances of equipment performance and other areas that affect diagnostic ability. It may also facilitate taking appropriate corrective action to predict and/or compensate for such behavior before significant mission impact or failure occurs. This project is sponsored by the U.S. Air Force, Ogden Air Logistics Center.

Figure B.3: Cross section of self-diagnosing integrated circuit.

B.10.2 Related Publications

1. E. Keenan, R.G. Wright, R. Mulligan, and L.V. Kirkland, “Terahertz and Laser Imaging for Printed Circuit Board Failure Detection,” IEEE Systems Readiness Technology Conference Proceedings Autotestcon 2004, September 21, 2004.

2. E. Keenan, R.G. Wright, and L.V. Kirkland, “Failure Diagnosis of Integrated Circuits Using IR Laser Images,” paper presented at the IEEE Aerospace Conference, March 8-15, 2003, Vol. 6, pp. 2557-2564.

3. R.G. Wright, M. Zgol, D. Adebimpe, and L.V. Kirkland, “Functional Circuit Board Testing Using Nanoscale Sensors,” IEEE Systems Readiness Technology Conference Proceedings Autotestcon, September 22-25, 2003, pp. 266-272.

4. R.G. Wright and L.V. Kirkland, “Nano-Scaled Electrical Sensor Devices For Integrated Circuit Diagnostics,” IEEE Aerospace Conference Proceedings, Vol. 6, pp. 2549-2555, March 2003.

5. R.G. Wright, M. Zgol, S. Keeton, and L.V. Kirkland, “Nanotechnology-Based Molecular Test Equipment (MTE),” IEEE Aerospace and Electronic Systems Magazine, Vol. 16, No. 6, pp. 15-19, June 2001.

B.11 Honeywell

Honeywell International is a $27 billion diversified technology and manufacturing company, serving customers worldwide with aerospace products and services; control technologies for buildings, homes, and industry; automotive products; turbochargers; and specialty materials. It is a major supplier of engineering services and avionics for NASA, OEMs such as Boeing and Airbus, and the U.S. Department of Defense. The company is headquartered in Morristown, New Jersey, with major facilities in Phoenix and Tucson, Arizona, and Minneapolis, Minnesota. Honeywell is a Fortune 50 company with a workforce of over 100,000 employees. There are four divisions of Honeywell: Aerospace, Automation and Control Solutions, Specialty Materials, and Transportation Systems.

B.11.1 Approach to PHM

Honeywell has many approaches to PHM, from systems to subsystems. Its listed experience covers platform health management, business and regional jet engine health management, and mechanical drive trains. Integrated vehicle health management (IVHM) represents integrated systems that include advanced sensors, model-based reasoning systems, diagnostic and prognostic software, and intelligent software managers and planners. These technologies will allow the collection, processing, and integration of information about the vehicle’s health and provide critical advice to astronauts, mission control, and maintenance repair teams. Once IVHM is implemented, Honeywell claims it will diagnose the root cause of a system failure, furnish data and recommend solutions in real time, provide prognostic capability to identify potential issues before they become critical, and capture and retain knowledge.

Honeywell is the contractor designing, developing, and implementing the Future Combat System (FCS) Platform Soldier Mission Readiness System (PSMRS). This program is for the design and development of platform health management software for all FCS vehicles. The software addresses the physical and functional availability of each subsystem and performs a complete platform “state” assessment in relation to the mission it is tasked to perform. Tasks include systems engineering, software design and development, architecture design, validation and verification, and integration in vehicles.

The predictive trend monitoring (PTM) tool trends engine performance and provides early warning of imminent engine failures to prompt preventive actions. This information allows airlines to avoid costly delays, cancellations, and unnecessary maintenance. The original PTM release provided trending of exhaust gas temperature and basic sensor checks. The latest PTM version went into service on November 21, 2005, providing advanced capabilities for 331-350 auxiliary power units and LF507 propulsion health management. With PTM Release 3.0, operators receive regression trending of exhaust gas temperature including margin detection, auto shutdown decoding and reporting, oil/lubrication system trending, and directed troubleshooting through Honeywell’s aircraft maintenance and operations support system diagnostics.

Honeywell is also developing commercial IVHM system technology, which collects and processes vehicle health information throughout every phase of operation. Commercial aircraft examples include the 777 Central Maintenance Computer designed, developed, and implemented by Honeywell for Boeing. Honeywell is also developing a similar system for the new Boeing 787 Dreamliner aircraft. Currently, Honeywell is developing IVHM for spacecraft such as NASA’s Crew Exploration Vehicle program and other space applications such as mission control and ground support.
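The regression trending of exhaust gas temperature (EGT) with margin detection mentioned for PTM can be illustrated with a minimal sketch. It assumes an ordinary least-squares line fitted to EGT margin versus engine cycles and a simple extrapolation to an alert level; the function names, the alert threshold, and the sample data are hypothetical and are not taken from Honeywell's tool.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two or more paired points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def cycles_until_margin_alert(cycles, egt_margin_c, alert_margin_c=0.0):
    """Extrapolate the fitted trend to the alert margin (hypothetical threshold).

    Returns the number of cycles remaining after the last observation,
    or None when the fitted margin is not decreasing.
    """
    slope, intercept = fit_line(cycles, egt_margin_c)
    if slope >= 0:
        return None  # no degradation trend detected
    crossing_cycle = (alert_margin_c - intercept) / slope
    return max(0.0, crossing_cycle - cycles[-1])


# Illustrative data only: margin falling 2 degC every 100 cycles.
observed_cycles = [0, 100, 200, 300, 400]
observed_margin = [50.0, 48.0, 46.0, 44.0, 42.0]
remaining = cycles_until_margin_alert(observed_cycles, observed_margin)
```

A fielded tool would presumably also screen sensor faults and normalize for operating conditions before fitting; this sketch shows only the trend-and-extrapolate step.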


Honeywell has a number of available and emerging ePHM technologies. One of these is the Corrosion & Corrosivity Monitoring System (C2MS). The C2MS, developed for rotorcraft platforms, consists of two types of sensors. One type, located in the main gearbox of a rotorcraft, detects galvanic corrosion in the mounting fasteners by detecting when the protective coating/sealant system has failed and moisture has penetrated into the fasteners. The second type, located in the floorboard compartment of a rotorcraft, continuously monitors the corrosivity of the environment inside the compartment and provides a recommended maintenance date.

Another Honeywell technology, the Expeditionary Force Vehicle (EFV) Drive Train Prognostics System (DTPS), is being developed for marine/automotive applications to perform condition-based maintenance on the vehicle and provide failure warnings during missions. The prognostics capability covers the power transfer module (PTM) for marine applications and the PTM, transmission, and final drive assemblies (vehicles with tracks) for automotive applications.

Another technology being developed at Honeywell is planetary gearbox diagnostics/prognostics. This technology encompasses gearbox vibration analysis and prognostics algorithms and was acquired by Honeywell from the Australian Defense Science and Technology Organization. Honeywell has demonstrated the prognostics algorithms on its Chadwick Division VXP health and usage monitoring system and is currently seeking quality test data to mature the analysis methods.

Honeywell Laboratories is developing smart sensors based on MEMS technology for avionics and mechanical systems. MEMS devices that have been developed and applied include RF communication components, single- and multisensor arrays, and sensors colocated with processing electronics. These devices provide integrated sensing and processing capabilities and transfer health management processing onto the system being monitored. These smart sensors are also used to autonomously quantify and qualify the environment surrounding avionics and mechanical systems by monitoring a wide variety of parameters, including EMI, fatigue loading, thermal cycling, vibration and shock levels, acoustic emissions, and corrosive environments.

Honeywell Laboratories developed a distributed shipboard system to perform diagnostics and prognostics on mechanical equipment (e.g., engines, generators, and chilled water systems) for the Office of Naval Research (ONR). This condition-based maintenance system, called MPROS (Machinery Prognostics System), consists of MEMS and conventional sensors on the machinery, local intelligent signal processing devices, and a centrally located subsystem (called the PDME, for prognostics, diagnostics, monitoring engine), which is designed so that it can run under shipboard monitoring systems such as an integrated condition assessment system (ICAS). MPROS includes and augments periodic vibration analysis by collecting data continuously from vibration and other sensors, including temperature, pressure, current, and voltage. These data streams are integrated as necessary in the signal processing devices.

B.11.2 Related Publications

1. B. Haowei, M. Atiquzzaman, and D. Lilja, “Wireless Sensor Network for Aircraft Health Monitoring,” Proceedings, First International Conference on Broadband Networks, Los Alamitos, CA, 2004, pp. 748-750, IEEE Computer Society.
2. G. D. Hadden, P. Bergstrom, B. H. Bennett, G. J. Vachtsevanos, and J. Van Dyke, “Distributed Multi-Algorithm Diagnostics and Prognostics for US Navy Ships,” Proceedings of the 2002 AAAI Spring Symposium, 2002.
3. O. Uluyol, A. L. Buczak, and E. Nwadiogbu, “Neural-Networks-Based Sensor Validation and Recovery Methodology for Advanced Aircraft Engines,” Proceedings of the SPIE-The International Society for Optical Engineering, Vol. 4389, pp. 102-109, 2001.
4. G. D. Hadden, P. Bergstrom, G. Vachtsevanos, B. H. Bennett, and J. Van Dyke, “Shipboard Machinery Diagnostics and Prognostics/Condition Based Maintenance: A Progress Report,” 2000 IEEE Aerospace Conference Proceedings, Vol. 6, pp. 277-292, 2000.
5. B. H. Bennett and G. D. Hadden, “Condition-Based Maintenance: Algorithms and Applications for Embedded High Performance Computing,” Proceedings of the 4th International Workshop on Embedded HPC Systems and Applications (EHPC’99), pp. 1418-1438, 1999.
6. S. A. Lewis and T. G. Edwards, “Smart Sensors and System Health Management Tools for Avionics and Mechanical Systems,” 16th DASC, AIAA/IEEE Digital Avionics Systems Conference, Reflections to the Future, Proceedings, Vol. 2, pp. 8.5.1-8.5.7, 1997.
7. R. A. Braunling and P. F. Dietrich, “Corrosion and Corrosivity Monitoring System,” Proceedings, Smart Structures and Materials 2005: Sensors and Smart Structures Technologies for Civil, Mechanical, and Aerospace Systems, pp. 485-492, 2005.

B.11.3 Related Patents

U.S. Patents:

1. 6,456,928: Prognostics monitor for systems that are subject to failure
2. 6,928,345: Vehicle health management system
3. 6,845,952: Flywheel prognostic health and fault management system and method
4. 6,907,416: Adaptive knowledge management system for vehicle trend monitoring, health management and preventive maintenance

B.12 Impact Technologies

Impact Technologies is a prognostics and health management systems development company providing advanced solutions and software tools to the aircraft, marine, land-based equipment, power, nuclear, and defense industries. Headquartered in Rochester, New York, with additional offices in State College, Pennsylvania, and Atlanta, Georgia, Impact employed about 95 people as of January 2007. Impact has been involved in numerous diagnostics, prognostics, and vehicle health management efforts and has developed various real-time algorithms for evaluating, detecting, and predicting failures in electrical, fluid, and mechanical systems.

B.12.1 Approach to PHM

The Impact approach to health monitoring is focused on feature-based diagnostics and prognostics: key prognostic features that correlate with failure progression are identified and combined with physics-of-failure models to predict future behavior. Correlated health features are tracked and trended over the system's life and compared with the model-based estimates of remaining useful life to provide collaborative evidence of a degrading or failing condition.

One of Impact's software products is PHM Design™, which is focused on designing, developing, evaluating, and deploying prognostic and health management or condition-based maintenance systems. The PHM design models are built on a system level using sensors, algorithms (BIT, diagnostic, and prognostic), failure modes, symptoms, effects, and maintenance tasks to demonstrate PHM system functionality.

Additional Impact products are SignalPro™ and ReasonPro™, a suite of software applications that can be implemented for advanced avionics health management. Specific applications of these products are ReasonPro™-At-Wing and ReasonPro™-Embedded, which provide reasoning and capture of diagnostic and prognostic information on-board, at-wing, and throughout component test and repair. ReasonPro™-At-Wing has been developed and tested on Windows CE based PDA devices as a ground support application. The graphical user interface is a .NET-developed application that integrates multiple data management and reasoning modules. The main reasoning engine is a module developed in the C programming language and can be compiled to multiple targets. Impact has demonstrated automated ground station database upload with dynamically collected maintenance data. ReasonPro™-Embedded is a flexible software module for deployment on-board an aircraft and will reside in an avionics operational unit. The software will provide real-time PHM capability on currently deployed systems.

The entire SignalPro™ and ReasonPro™ software suite contains Electronic Maintenance Action Forms (eeMAF), which provide transfer of digital maintenance data between ships and shore facilities. In addition, the software implements portable handheld PDA-type data acquisition and reasoning; incorporates key inputs such as pilot squawk, BIT, operational and environmental data, aircraft maintenance history, and component maintenance history; and employs evidence-based and Bayesian network reasoning for health assessment.
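The evidence-based and Bayesian reasoning mentioned above can be illustrated with a minimal sketch. It assumes the simplest possible network, a single fault node whose evidence sources (for example a BIT alarm and a pilot squawk) are conditionally independent given the fault state; that structure, the function name, and every probability below are illustrative assumptions, not details of ReasonPro.

```python
def posterior_fault_probability(prior, evidence):
    """Update the probability of a fault from independent observations.

    prior    -- P(fault) before any evidence, strictly between 0 and 1
    evidence -- list of (P(obs | fault), P(obs | healthy)) pairs,
                one pair per observed indication (hypothetical values)
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    odds = prior / (1.0 - prior)
    for p_given_fault, p_given_healthy in evidence:
        if p_given_healthy <= 0.0:
            raise ValueError("P(obs | healthy) must be positive")
        odds *= p_given_fault / p_given_healthy  # multiply by the likelihood ratio
    return odds / (1.0 + odds)


# Illustrative numbers only: a 1% prior, then a BIT alarm and a pilot squawk.
bit_alarm = (0.90, 0.05)
pilot_squawk = (0.60, 0.10)
p_fault = posterior_fault_probability(0.01, [bit_alarm, pilot_squawk])  # about 0.52
```

Working in odds form keeps each piece of evidence a single multiplication, which is why neither indication alone is decisive here but the two together raise the fault probability above one half.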


B.12.2 Case Studies in PHM

In 2005, Impact Technologies published two specific case studies on electronic prognostics. In the first case study, “Prognostic Health Management for Avionics System Power Supplies,” Impact found that switching transistors, filtering capacitors, and rectifying diodes are the critical components that commonly fail in an electronic switch-mode power supply (SMPS) unit [1]. The approach used to predict failures in SMPS was feature-based incipient fault detection using sensed parameters. The monitored parameters were temperature, input and output power, output voltage, output current, and efficiency. These data were analyzed using advanced fault detection and damage accumulation algorithms to predict the remaining useful life of SMPS components. The accuracy of the failure progression models and diagnostic features for incipient fault detection was validated through accelerated failure tests on a commercially available computer power supply.

In the second case study, “Electronic Prognostics-A Case Study Using Global Positioning System (GPS),” Impact demonstrated that the remaining useful life of a commercial GPS system could be predicted within plus or minus five thermal cycles [3]. The failure modes for GPS included (1) precision failure due to increase in position error and (2) solution failure due to increased outage probability. These failure progressions were monitored in situ by recording system-level features reported using the NMEA protocol. The GPS was characterized to collect the principal feature value for a range of operating conditions. The approach was validated by conducting accelerated thermal cycling of the GPS with the offset of the principal feature value measured in situ. Based on experimental results, parametric models were developed to correlate the offset in the principal feature value with solution failure. Impact claims that this approach can be implemented with no external circuit requirements, sensors, or system changes since feature extraction is performed using the 24 active satellites of the GPS constellation.

Impact has recently developed an integrated diagnostic/prognostic system for assessing the remaining useful life of aircraft digital electronic boards. Implementation of the PHM approach builds on a foundation of usage monitoring, incipient fault detection, and physics-of-failure modeling to relate health state information, including operational history and detection of incipient faults, to accurate remaining useful life (RUL) predictions at any point in the component's life cycle. Impact has been successful in detecting discernible degraded health states of critical digital components, specifically the processor, prior to BIT detection. Impact formulated an in-depth understanding of the prevalent failure mechanisms and the associated physics of failure affecting semiconductor devices, specifically targeting those found on avionic systems. Specific fault-to-failure and system dependency inference models were developed and tuned based on accelerated failure testing and fault injection simulation efforts to provide increased confidence in the developed technologies and the prognostic output.

Impact Technologies’ SignalPro™ analysis engine was used to evaluate electronics system performance by employing a combination of signal processing, statistics, and data-driven modeling techniques. Two models, a combined failure model and a life consumption model, have been developed and evaluated. The combination of different failure or aging mechanisms produced a cumulative effect that contributed to the combined failure model. The life consumption model has been derived from physics-of-failure research regarding MOSFET failure mechanisms.
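The case studies refer to damage accumulation and life consumption models without stating their form. As a minimal sketch, the block below assumes the Palmgren-Miner linear damage rule, in which each load level consumes a fraction n_i / N_i of life; this rule is a stand-in chosen for illustration and is not claimed to be Impact's model. The load levels and cycles-to-failure values are hypothetical.

```python
def consumed_life_fraction(cycles_seen, cycles_to_failure):
    """Palmgren-Miner linear damage: D = sum(n_i / N_i) over load levels.

    cycles_seen       -- {load level: cycles experienced so far}
    cycles_to_failure -- {load level: cycles to failure at that level},
                         e.g. from accelerated tests (hypothetical values)
    """
    damage = 0.0
    for level, n in cycles_seen.items():
        damage += n / cycles_to_failure[level]
    return damage


def remaining_cycles(damage, level, cycles_to_failure):
    """Cycles left at a given load level before accumulated damage reaches 1."""
    return max(0.0, 1.0 - damage) * cycles_to_failure[level]


# Illustrative thermal-cycling history for one component.
limits = {"mild": 10000, "severe": 2000}
history = {"mild": 2500, "severe": 500}
damage = consumed_life_fraction(history, limits)     # 0.25 + 0.25 = 0.5
left = remaining_cycles(damage, "severe", limits)    # half of 2000 cycles
```

The linear rule ignores load-sequence effects, so it is best read as a first-order bookkeeping of consumed life rather than a physics-of-failure model in itself.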


B.12.3 Related Publications

1. R. Orsagh, D. Brown, M. Roemer, T. Dabney, and A. Hess, “Prognostic Health Management for Avionics System Power Supplies,” paper presented at the IEEE Aerospace Conference, Big Sky, MT, March 2005.
2. D. Brown and R. Orsagh, “Diagnostics & Prognostics for Switching Mode Power Supply Remaining Useful Life Assessment,” paper presented at the IEEE Aerospace Conference, Big Sky, MT, March 2005.
3. D. Brown, P. W. Kalgren, C. S. Byington, and R. F. Orsagh, “Electronic Prognostics-A Case Study Using Global Positioning System (GPS),” paper presented at the 2005 IEEE Autotestcon Conference, Orlando, FL, October 2005.
4. C. Byington, M. Watson, and D. Edwards, “Dynamic Signal Analysis and Neural Network Modeling for Life Prediction of Flight Control Actuators,” paper presented at the AHS Forum 60, Baltimore, MD, June 7-10, 2004.
5. C. Byington, M. Watson, and D. Edwards, “A Model-Based Approach to Prognostics and Health Management for Flight Control Actuators,” paper presented at the IEEE Aerospace Conference, Big Sky, MT, March 2004.
6. C. Byington, M. Watson, and D. Edwards, “Data-Driven Neural Network Methodology to Remaining Life Predictions for Aircraft Actuator Components,” paper presented at the IEEE Aerospace Conference, Big Sky, MT, March 2004.
7. G. J. Kacprzynski, A. Sarlashkar, M. Roemer, A. Hess, and W. Hardman, “Predicting Remaining Life by Fusing the Physics of Failure Modeling with Diagnostics,” JOM: Journal of the Minerals, Metals and Materials Society, Vol. 56, No. 3, pp. 29-35, March 2004.
8. C. S. Byington, M. Roemer, M. Watson, T. R. Galie, and C. Savage, “Prognostic Enhancements to Diagnostic Systems (PEDS) Applied to Shipboard Power Generation Systems,” Proceedings of the ASME Turbo Expo, Vol. 2, pp. 825-833, 2004.
9. R. Orsagh, M. Roemer, J. K. Sheldon, and J. Christopher, “A Comprehensive Prognostics Approach for Predicting Gas Turbine Engine Bearing Life,” Proceedings of the ASME Turbo Expo, Vol. 2, pp. 777-785, 2004.
10. G. J. Kacprzynski, A. Liberson, A. Palladino, M. Roemer, A. Hess, and M. Begin, “Metrics and Development Tools for Prognostic Algorithms,” IEEE Aerospace Conference Proceedings, Vol. 6, pp. 3809-3815, 2004.
11. C. S. Byington, M. Watson, D. Edwards, and P. Stoelting, “A Model-Based Approach to Prognostics and Health Management for Flight Control Actuators,” IEEE Aerospace Conference Proceedings, Vol. 6, pp. 3551-3562, 2004.
12. R. Orsagh, J. Sheldon, J. Klenke, and J. Christopher, “Prognostics/Diagnostics for Gas Turbine Engine Bearings,” American Society of Mechanical Engineers, International Gas Turbine Institute, Turbo Expo (Publication) IGTI, Vol. 1, pp. 159-167, 2003.
13. C. S. Byington, P. W. Kalgren, J. Robert, and R. J. Beers, “Embedded Diagnostic/Prognostic Reasoning and Information Continuity for Improved Avionics Maintenance,” AUTOTESTCON (Proceedings), Future Sustainment for Military and Aerospace, 2003, pp. 320-329.
14. R. Orsagh, G. Kacprzynski, M. Roemer, J. W. Scharschan, D. E. Caguiat, and J. J. McGroarty, “Applications of Diagnostic Algorithms for Maintenance Optimization of Marine Gas Turbines,” American Society of Mechanical Engineers, International Gas Turbine Institute, Turbo Expo (Publication) IGTI, Vol. 2B, pp. 1025-1033, 2002.
15. M. Roemer, G. Kacprzynski, E. O. Nwadiogbu, and G. Bloor, “Development of Diagnostic and Prognostic Technologies for Aerospace Health Management Applications,” IEEE Aerospace Conference Proceedings, Vol. 6, pp. 63139-63147, 2001.
16. M. Roemer, G. Kacprzynski, and T. H. McCloskey, “Advanced Steam Turbine-Generator Maintenance Planning Using Prognostic Models,” Proceedings of the International Joint Power Generation Conference, Vol. 2, pp. 273-282, 2001.
17. M. Roemer, G. Kacprzynski, and R. Orsagh, “Assessment of Data and Knowledge Fusion Strategies for Prognostics and Health Management,” IEEE Aerospace Conference Proceedings, Vol. 6, pp. 62979-62988, 2001.
18. G. Kacprzynski, M. Roemer, A. Hess, and K. R. Bladen, “Extending FMECA-Health Management Design Optimization for Aerospace Applications,” IEEE Aerospace Conference Proceedings, Vol. 6, pp. 63105-63112, 2001.
19. C. S. Byington, P. W. Kalgren, B. K. Dunkin, and B. P. Donovan, “Advanced Diagnostic/Prognostic Reasoning and Evidence Transformation Techniques for Improved Avionics Maintenance,” IEEE Aerospace Conference Proceedings, Vol. 5, pp. 3424-3433, March 2004.
20. M. Roemer and G. Kacprzynski, “Advanced Diagnostics and Prognostics for Gas Turbine Engine Risk Assessment,” IEEE Aerospace Conference Proceedings, Vol. 6, pp. 345-354, 2000.

B.13 Intelligent Automation, Inc.

Intelligent Automation, Inc. (IAI) is a small research and development company based in Rockville, Maryland, and founded in 1987. Much of its research is funded through U.S. government Small Business Innovation Research (SBIR) contracts and Broad Agency Announcements (BAAs). IAI’s research projects often turn into collaborations with larger companies in order to commercialize the technology or product that was developed. IAI’s areas of research range from distributed intelligent systems, sensors, signal processing, robotics, manufacturing, forensics, and transportation to information technology and educational/training technology. Although each area addresses very diverse applications, all of IAI’s current programs are built on the foundation of artificial intelligence technology.

B.13.1 Approach to PHM

IAI’s chief strategy for implementing ePHM involves signal processing technology. Using techniques based on a combination of fast Fourier transform (FFT), principal component analysis (PCA), and fuzzy cerebellar model arithmetic computer (CMAC), IAI has applied its health monitoring approach to various applications, including real-time failure prediction of a gearbox by using infrared images (experiment carried out at Penn State), real-time fault detection and isolation of a rotating shaft system with remote monitoring and control capability, NASA liquid propellant engine fault detection (using real data), helicopter gearbox failure detection and isolation (using real data), sensor fault detection of a Bell Helicopter unmanned aerial vehicle (UAV), sensor failure detection and isolation of a boiler process (using real data), call anomaly detection in satellite communications for Motorola, and circuit anomaly detection and isolation. One of IAI’s products for implementing ePHM is called the Health Monitoring Toolkit. Although no information regarding this product has been released by IAI yet, it is advertised on the Web to be “coming soon.”

In December 2004, NASA Ames Research Center awarded an SBIR Phase II award to IAI, with Penn State as a subcontractor, for a project entitled “Agent-Based Health Monitoring System.” The objective of this work was to achieve integrated system health management and self-reliant systems. This included integration of the maintenance and logistics scheduling systems in order to achieve fully automated systems.

In February 2005, NASA Dryden Research Center awarded IAI, with Penn State as a subcontractor, a Small Business Technology Transfer (STTR) Phase I project to create a system for detecting damage in aircraft structures. The system combines thin film transistor (TFT) based thin-film actuators/sensors for signal acquisition and software for fault diagnosis and prognosis. The goal is to reduce the cost, improve the safety, and predict structural failures of aircraft.

In April 2005, Naval Sea Logistics awarded an SBIR Phase I contract to IAI, with Zoya Popovic proposed as a potential consultant, for the proposal entitled “Wireless, In Situ Guided Wave Structural Health Monitoring System with a Power Harvesting Rectenna.” IAI proposes a wireless in situ guided wave health monitoring system. This system consists of small, low-cost guided wave leave-in-place health monitoring sensors called piezo-disks, a correlation analysis technique (CAT) for quick defect sizing and location, and many other elements. This system can inspect a very large area and provide information on the size of the defect and its location. Additionally, this system is designed to be quick and effective compared to traditionally used inspection methods.

In November 2005, Naval Air Warfare Center AD (LKE) awarded an SBIR Phase I contract to IAI, with CALCE Electronic Products and Systems Center (EPSC) at the University of Maryland as the proposed subcontractor, for the proposal entitled “Enhanced Prognostic Model for Digital Electronics.” IAI proposes to predict failures in aircraft electronic boards and in their digital components and devices, with the potential to reduce the risk of unanticipated failures while significantly reducing support costs. IAI and CALCE EPSC propose an enhanced life consumption monitoring methodology for digital electronic boards and their components. This approach involves a process to conduct life consumption monitoring, including failure modes, mechanisms, and effects analysis; sensor data preprocessing/feature selection; fault detection/identification/isolation; virtual reality assessment; stress and damage accumulation analysis; and remaining-life estimation.
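The FFT-based signal processing that IAI's approach starts from can be illustrated with a small spectral feature extractor. This is a sketch under stated assumptions: a naive discrete Fourier transform stands in for an optimized FFT so that the block has no dependencies, the PCA and fuzzy CMAC stages are not shown, and the function names and test signal are hypothetical.

```python
import cmath
import math


def magnitude_spectrum(samples):
    """One-sided DFT magnitudes (naive O(n^2) stand-in for an FFT)."""
    n = len(samples)
    if n == 0:
        raise ValueError("samples must not be empty")
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(samples))
        spectrum.append(abs(acc) / n)
    return spectrum


def dominant_frequency_hz(samples, sample_rate_hz):
    """Frequency of the largest non-DC spectral peak, a simple health feature."""
    spectrum = magnitude_spectrum(samples)
    if len(spectrum) < 2:
        raise ValueError("need at least two samples")
    peak_bin = max(range(1, len(spectrum)), key=spectrum.__getitem__)
    return peak_bin * sample_rate_hz / len(samples)


# Illustrative vibration signal: an 8 Hz component plus a weaker 16 Hz harmonic.
rate_hz = 64
signal = [math.sin(2 * math.pi * 8 * t / rate_hz)
          + 0.3 * math.sin(2 * math.pi * 16 * t / rate_hz)
          for t in range(64)]
peak_hz = dominant_frequency_hz(signal, rate_hz)
```

In a pipeline of the kind described, spectral features such as these would feed a dimensionality-reduction step (PCA) and then a classifier; a production implementation would use an FFT library rather than the quadratic loop shown here.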

B.13.2 Related Publication

1. G. Zhang, C. Kwan, R. Xu, N. Vichare, and M. Pecht, “An Enhanced Prognostic Model for Intermittent Failures in Digital Electronics,” paper presented at the IEEE Aerospace Conference, Big Sky, MT, March 2007.

B.14 Lockheed Martin Aeronautics Company

Lockheed Martin is an advanced technology company headquartered in Bethesda, Maryland. Lockheed Martin employs about 135,000 people worldwide and is principally engaged in the research, design, development, manufacture, and integration of advanced technology systems, products, and services. The company reported 2006 revenue of $39.6 billion. As a lead systems integrator and information technology company, nearly 80% of Lockheed Martin’s business is with the U.S. Department of Defense and U.S. federal government agencies. In fact, Lockheed Martin is the largest provider of IT services, systems integration, and training to the U.S. government. Lockheed Martin’s operating units are organized into five business areas: Aeronautics, Electronic Systems, Integrated Systems & Solutions, Space Systems, and Information & Technology Services.

B.14.1 Approach to PHM

Lockheed Martin’s approach to PHM is based on its involvement in developing the F-35 Joint Strike Fighter (JSF), a next-generation, supersonic, multirole stealth aircraft to be used by the U.S. Air Force, Navy, and Marine Corps and the U.K. Royal Navy. The F-35 operates with a suite of PHM capabilities that monitor the status of aircraft systems and automatically transmit the information to maintainers on the ground. PHM enables technicians to meet the aircraft with all necessary tools, people, and parts to conduct the maintenance and get the plane airborne again quickly. Lockheed Martin is developing the F-35 in collaboration with Northrop Grumman and BAE Systems.

In 1997, Lockheed Martin was awarded a $6.5 million Air Vehicle Prognostics & Health Manager demonstration contract to support the company’s JSF program. Under this contract, Lockheed Martin was to demonstrate a prototype of the PHM subsystem for its preferred weapon system concept. The PHM subsystem will be used for storage, distribution, and extended processing of on-board information concerning aircraft health, safety, diagnostics, and prognostics. The air vehicle prognostics and health manager (AVPHM) will electronically transfer data to the aircraft maintainer indicating the health of the aircraft and necessary maintenance actions. The PHM subsystem will provide improved diagnostics over previous aircraft and add prognostics capabilities when feasible and cost effective.

In 2000, the Lockheed Martin JSF team successfully demonstrated a working prototype of the AVPHM and autonomic logistics information system (ALIS). The combination enables mission completion through the transfer of data from a healthy aircraft to a disabled one. It also sets in motion the ground repair planning process even before the affected aircraft is back on the ground. The demonstration consisted of a simulated two-ship mission during which a radar system failure was artificially induced into one of the aircraft. “We predicted the time remaining before radar shutdown caused by a gradual increase in temperature,” said Barry Ferrell, PHM manager of the Lockheed Martin JSF team. “We estimated the time remaining, then transmitted the information to the in-flight mission replanning software.” The software then presented mission options to the pilot of the disabled ship, in order of lethality against the enemy. The pilot selected the preferred option and was able to shut down his radar, trade places with his wingman, and receive information from his wingman’s radar. Because the wingman’s aircraft was able to supply radar information to both ships, the pilots were able to deploy their weapons, destroy the target, and complete the mission. After PHM correctly isolated the failure and while the pilots were finishing the mission, the system transmitted a health report to the home base and triggered the autonomic logistics (AL) processes. AL then initiated a search to obtain a replacement part for the stricken aircraft, find a properly trained maintenance person, select a computer-based refresher course for the maintainer, and obtain the necessary support equipment to complete the repairs.

Lockheed Martin demonstrated the viability of Bluetooth wireless technology for PHM on an F-16B test aircraft. The purpose of this test was to determine whether there was any electromagnetic interference with the F-16B subsystems and whether dynamic sensor data could be transmitted wirelessly in real time from within the aircraft. Test results showed that the Bluetooth wireless temperature sensor attached to a printed circuit board successfully transmitted data up to 18 ft from behind the sealed doors of the aircraft. However, communication was not successful when transmitting through opposite sides of the aircraft. One of the recommendations that resulted from this test was to increase the reception distance and improve communications between multiple bulkheads in the aircraft.

In 2005, Lockheed Martin teamed with the Air Force Research Laboratory (AFRL) to conduct the first demonstration of an integrated PHM (IPHM) control system, using a real-time, hardware-in-the-loop simulation. Engineers integrated a space vehicle control surface actuator representation into a simulation environment. They evaluated the IPHM control system’s ability to compensate for simulated actuator failures such as heat degradation and power loss. The IPHM system successfully compensated for all introduced failures throughout a variety of scenarios, including takeoff, flight, and landing. This evaluation was the first demonstration of an IPHM control system’s ability to make flight-critical system adaptations.

Another PHM tool developed by Lockheed Martin for the JSF program is the Prognostics and Health Management Tool, or PHM Pro. PHM Pro is a Microsoft Access database designed to derive prognostics algorithms for the F-35. It categorizes the 2000 line replaceable units in the F-35 based on need and applicability for prognostics. Once categorized, PHM Pro performs signal collection, on-board preprocessing, failure progression trending, time-to-maintenance prediction, maturation, and verification to build the prognostic algorithm as it applies to the architecture of the aircraft.

The JSF program is the first to place as much emphasis on affordably sustaining the air system as on up-and-away performance. Of the six key performance parameters (KPPs) assigned to all variants of the JSF, three are supportability related: sortie generation rate, logistics footprint, and mission reliability. The end result is a JSF logistics system (known as autonomic logistics) that integrates all elements of logistics throughout design, developmental test, and operational test activities.

Lockheed Martin used the following strategy to develop the PHM system:

• Develop system architecture to include an air vehicle level management function and area PHM managers in each of the primary control areas: mission systems, VMS/subsystems, and propulsion with other stand-alone management functions
• Make use of traditional BIT approaches beyond the level of legacy systems at subsystem and component levels
• Supplement BIT, using control and feedback data to real-time equipment models to enhance failure detection and fault isolation at area manager and air vehicle levels
• Acquire and analyze performance data for selected components and trend the performance data to predict the remaining life of components

B.14.2 Related Publications

1. J. Gardner and K. Hendrickson, “Integrating the Digital Battlefield with Autonomic Logistics: Cognitive Implications,” Proceedings of 43rd AIAA Aerospace Sciences Meeting and Exhibit, Reno, NV, January 10-13, 2005.
2. G. W. Eger III, B. J. Jambor, and J. B. Schroeder, “Framework for Testing Prognosis Technology,” paper presented at the 41st AIAA/ASME/SAE/ASEE Joint Propulsion Conference and Exhibit, Tucson, AZ, July 10-13, 2005.
3. D. S. Bodden, W. Hadden, B. E. Grube, and N. S. Clements, “Prognostics and Health Management as Design Variable in Air Vehicle Conceptual Design,” Proceedings of the 2005 IEEE Aerospace Conference, Big Sky, MT, pp. 1-11, 2005.
4. J. K. Line and N. S. Clements, “A Systematic Approach for Developing Prognostic Algorithms on Large Complex Systems,” paper presented at the 2005 IEEE Aerospace Conference, pp. 1-7, March 5-12, 2005.
5. D. S. Bodden, W. Hadden, B. E. Grube, and N. S. Clements, “PHM as a Design Variable in Air Vehicle Conceptual Design,” paper presented at the 2005 IEEE Aerospace Conference, Big Sky, MT, pp. 1-11, March 5-12, 2005.
6. B. L. Ferrell, J. L. Cruickshank, B. J. Gilmartin, S. J. Massam, C. Fisher, and F. D. Gass, “Case Study Methodology for Information Fusion Interface Definition,” Proceedings of 2001 IEEE Aerospace Conference, Vol. 6, pp. 3003-3015, March 10-17, 2001.
7. M. Gandy, “Wireless Sensors for Aging Aircraft Health Monitoring,” Lockheed Martin Aeronautics Company Technical Report, 2000, available: http://www.jcaa.us/AA-Conference-2001/Papers/7B-2.pdf, accessed January 2006.
8. B. L. Ferrell, “Air Vehicle Prognostics and Health Management,” Proceedings of 2000 IEEE Aerospace Conference, Vol. 6, pp. 145-146, 2000.
9. R. Garbos-Sanders, L. Melvin, B. Childers, and B. Jambor, “System Health Management/Vehicle Health Management for Future Manned Space Systems,” 16th DASC, AIAA/IEEE Digital Avionics Systems Conference, Vol. 2, pp. 8.5.8-8.5.17, October 26-30, 1997.

B.14.3 Related Patents

U.S. Patents:

1. 6,301,572: Neural network based analysis system for vibration analysis and condition monitoring

2. 6,944,566: Method and system for multi-sensor data fusion using a modified Dempster-Shafer theory

3. 6,909,985: Method and apparatus for recording changes associated with acceleration of a structure

4. 6,219,626: Automated diagnostic system

B.15 Northrop Grumman

Northrop Grumman is a global defense company headquartered in Los Angeles, California. Northrop Grumman provides a broad array of technologically advanced, innovative products, services and solutions in systems integration, defense electronics, information technology, advanced aircraft, shipbuilding, and space technology. The company has more than 125,000 employees and operates in the United States and 25 countries. Net sales in 2004 were $29.9 billion. Northrop Grumman is comprised of seven business sectors: Electronic Systems, Information Technology, Integrated Systems, Mission Systems, Newport News, Ship Systems, and Space Technology.

B.15.1 Approach to PHM

Northrop Grumman’s entrance into failure prediction began in 1993 with an Advanced Research Projects Agency (ARPA) technology reinvestment program called just-in-time maintenance (JITM). This program leveraged technologies developed under prior U.S. Air Force funding (Connectionist Networks for Information Exploitation) that used neural networks and fuzzy systems to exploit intelligence information and predict enemy intent. The JITM program refocused these technologies on assessing the current state and predicting the remaining useful life of rotating machinery. Experience gained from JITM was augmented by probabilistic methods and reapplied in another Air Force program (Network Early Warning System).

From October 2000 to January 2003, Northrop Grumman was engaged in the Predictive Failures and Advanced Diagnostics (PFAD) for Legacy Aircraft program from AFRL. The objectives of this program were to reduce legacy aircraft downtime by enhancing the capability of maintainers to identify the causes of system failures through better diagnostics and, where possible, to identify imminent system failures (failure prognostics) so that replacements could be made before an actual failure occurs. NGC AEW&EW teamed with Electronic Systems in Baltimore, Maryland, and Datamat Systems Research, McLean, Virginia, for this program. The first task for PFAD was to identify the types and sources of data required to diagnose and predict component failures. The results of this task were then compared with the data that are currently collected from legacy aircraft. Technologies to collect the required but currently missing data from legacy aircraft were evaluated. The next tasks were to apply the collected data to both diagnostic and predictive failure algorithms. These algorithms were used to identify specific part failures and eventually to predict impending part failures.
The successful technologies were evaluated against problems, restrictions, and constraints associated with aircraft retrofitting. There were four final reports, one for each of the major topics: data sources, predictive failures, advanced diagnostics, and PFAD transition. Two additional reports outlined the system architecture and the maintenance concept of operations.

In November 2003, Northrop Grumman was awarded a two-year $14.1 million first-phase contract by the Defense Advanced Research Projects Agency (DARPA) to develop technologies and tools for predicting the near-term structural health and readiness of aircraft to carry out their missions. The structural integrity prognosis system (SIPS) will give the U.S. military the ability to predict the effect of the stresses and strains of flight operations, combat, and environmental corrosives on structural aircraft components and power train components. It promises to reduce aircraft maintenance life-cycle costs, increase crew safety, and enhance aircraft availability. Under the SIPS contract, a Northrop Grumman Integrated Systems-led team will research, develop, and demonstrate a prognosis system for real-time or near-real-time asset capability prediction, with specific emphasis on materials and structures. The goal of the system is to provide military commanders with data and quantitative performance predictors they can use to manage and deploy individual combat systems.

The SIPS program aims to develop a system to predict the service life of DoD vehicles. While the program’s focus is the EA-6B Prowler and the BLACK HAWK/SEAHAWK helicopter transmissions, the results of the research apply to the structures and power train systems of any military platform. The first phase of the system will ultimately follow each critical structural component of a vehicle from fabrication to retirement. The structural health prognosis is based on inputs from multiple sensor systems. Three sensor types are being tested: eddy current, electrochemical, and ultrasonic. Sensors monitor crack formation and growth in real time, feeding data to a reasoning module that interprets the data and transforms it into usable information for failure prediction. The SIPS program also makes extensive use of modeling and simulation systems. Secure data links will deliver the data to field commanders, enabling them to make critical decisions about asset capability and mission content. SIPS objectives also include reducing operational costs by avoiding unnecessary requirements and transforming the way the DoD certifies the structural integrity of weapon systems.

When the project began in late 2003, the team selected the EA-6B Prowler as its test bed. Since then, the SIPS program has attracted greater interest in the Air Force and the Naval Rotary Wing communities. As a result, the Air Force is now engaged in tests with Northrop Grumman involving one of its planes, the A-10 Thunderbolt, and the Navy will receive the first SIPS technology transition product for helicopter transmission prognosis by the end of 2006.
Under the two-year second phase of the contract, Northrop will evaluate SIPS in actual full-scale aircraft fatigue tests, with the goal of having a deliverable structural prognostic system available by the end of the contract in January 2008. Discussions with the JSF structures PHM community are also underway.

In July 2004, the Center for Naval Shipbuilding Technology (CNST) awarded a $435,000 wireless technology project to Northrop Grumman Newport News. The Newport News-led team includes Northrop Grumman Ship Systems, Ingalls Operations, and RLW, Inc. The team is developing a wireless automated diagnostic and prognostic system for monitoring shipyard facility diesel engines, with the potential to save the shipbuilding industry millions of dollars in repair and maintenance costs.

Northrop Grumman Newport News is also implementing diagnostics and prognostics capabilities in military equipment to reduce the incidence of unplanned repairs. Newport News has partnered with Altarum and BAE Systems to develop and pilot data schemes and information system architectures that are reusable and extendable for multiple equipment and weapons platforms. Two pilots are underway: (1) a shipboard pilot, which can be extended to multiple ship types and shipboard equipment, and (2) an armored vehicle pilot, which can be extended to any equipment or vehicle supported by computer intelligent integrated electronic technical manuals. The pilot systems are currently installed on the USS George Washington (CVN-73) and an M88 A2 Hercules armored tracked vehicle.

For the shipyard diesel engine project, the Newport News-led team selected four mobile diesel engines for evaluation, determined the parameters for monitoring, and developed the required software algorithms and hardware components. Based on the review of maintenance and failure data, the parameters selected for monitoring include vibration, oil temperature, oil pressure, and air intake pressure.
All four diesel engines have been retrofitted with the required wireless automated diagnostic and prognostic sensor suite. As part of the project, engineers developed the required diagnostics and prognostics algorithms to support the software application. Another key feature that makes the system potentially more valuable is the capability to simultaneously transmit the sensor suite parameters to both the local on-site server for use and a remote server for monitoring and review.

Northrop Grumman is developing a new integrated system health management architecture for NASA. The modular, open architecture will use a “plug-and-play” approach to interconnect and continuously monitor the functional health of spacecraft subsystems such as power, propulsion, life support, and structures. Spacecraft computers will use health management software to detect, identify, and resolve problems in these subsystems. If the computers cannot resolve a problem automatically by reallocating available resources, they will provide interactive guidance to astronauts on how to manually make the repairs. Northrop Grumman is developing the integrated system health management architecture as part of an Exploration Systems Research and Technology contract awarded to the company recently by NASA Ames Research Center in Mountain View, California. The contract is worth up to $26.8 million over a four-year period. The architecture will also support work that a Northrop Grumman-led team is currently doing to define design solutions for NASA’s proposed crew exploration vehicle.

The integrated system health management architecture envisioned for NASA’s exploration systems is the first step in a longer-term effort to define interface standards that would support the development of flexible “intelligent modular systems.” These systems would become the fundamental building blocks of reconfigurable spacecraft and lunar surface facilities. Each intelligent modular system could function as a stand-alone system or could be linked with other intelligent modular systems to create larger, more capable systems. Like an individual intelligent modular system, this health management system would be able to reconfigure itself to share all of its available resources among member parts.

B.15.2 Related Publications

1. A. Hess, G. Calvello, P. Frith, S. J. Engel, and D. Hoitsma, “More Challenges, Issues and Lessons Learned Chasing Real Prognostic Capabilities,” Proceedings of the Society for Machinery Failure Prevention Technology Conference, April 3-6, 2006.

2. A. Hess, G. Calvello, P. Frith, S. J. Engel, and D. Hoitsma, “Challenges, Issues and Lessons Learned Chasing The ‘Big P’: Real Predictive Prognostics Part 2,” Proceedings of the IEEE Aerospace Conference, March 4-11, 2006.

3. J. K. Scully, “The Integration of Vehicular Diagnostics and Prognostics with Autonomic Logistics,” Proceedings of the IEEE Autotestcon, September 20-23, 2004, pp. 482-488.

4. B. J. Gilmartin, J. Castrigno, G. M. Rovnack, J. Bala, P. Faas, and K. Eizenga, “Preliminary Architecture for the Predictive Failures and Advanced Diagnostics Program,” Proceedings of the SPIE - The International Society for Optical Engineering, Vol. 4733, pp. 13-24, 2002.

5. S. J. Engel, B. J. Gilmartin, K. Bongort, and A. Hess, “Prognostics, The Real Issues Involved with Predicting Life Remaining,” Proceedings of the IEEE Aerospace Conference, Vol. 6, pp. 457-469, March 18-25, 2000.

B.15.3 Related Patents

U.S. Patents:

1. 5,937,366: Smart built-in-test

2. 5,761,383: Adaptive filtering neural network classifier

B.16 Qualtech Systems, Inc.

Founded in 1993, Qualtech Systems, Inc. (QSI) is a company that provides advanced diagnostics and health management software solutions. In the course of completing projects for NASA and the military, QSI has developed and deployed diagnostic tools for modeling, analysis, embedded run-time diagnostics, and remote (telemaintenance) or field maintenance and repair that are being deployed in large-scale commercial operations.

B.16.1 Approach to PHM

Qualtech’s philosophy is that whether a system propels a spacecraft or an automobile, generates power, carries data, refines chemicals, performs medical functions, or produces semiconductors, the conditions that cause the equipment to fail (“failure modes”) can be modeled, analyzed, linked to test procedures, and used to generate an intelligent diagnostic solution.

Qualtech’s key product is called TEAMS, a software tool for test sequencing and design for testability analysis of complex systems. The TEAMS model captures a system’s structure, interconnections, tests, procedures, and failures. The model links these failures to the system’s built-in tests, troubleshooting steps, and repair procedures. By helping system designers and test engineers embed testability features, including “built-in-test” requirements, into a system design, and by helping the maintenance engineer develop diagnostic strategies, TEAMS contributes to better designs that reduce support costs and improve availability. After a system is analyzed using TEAMS, the system’s testability shortcomings (e.g., uncovered faults, redundant tests, ambiguity groups, and feedback loops) and testability recommendations (e.g., test-point placement and buffer placements to break feedback loops) are marked directly on the functional model. TEAMS also serves as an interactive tool for conducting FMECA of the system. Furthermore, TEAMS is able to capture interactive electronic technical manuals and automatic test equipment information for documentation and deployment in remote diagnostic/troubleshooting applications.

Qualtech has been contracted by NASA to develop an automated process that will identify faults within specific software-based systems on board NASA spacecraft, determine how serious the problems are, and provide immediate instructions on how to recover from system failures. By automating much of the process and using “run-time” diagnostics, Qualtech’s ISHM solution will enable the spacecraft to recover from a system failure quickly and on its own, thereby reducing dependence on manual intervention from NASA ground personnel. The goal is to develop a solution that evaluates the impact of one or more failures on a current mission and provides appropriate and timely information that facilitates mission reconfiguration. NASA spacecraft should have the capability to detect, mitigate, and recover from unexpected failures and therefore extend critical missions and achieve desired objectives.
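The testability shortcomings named above (uncovered faults, ambiguity groups, and redundant tests) can be illustrated with a small fault-to-test dependency table. The sketch below is not the TEAMS model or its API; the function `analyze_testability`, the fault names, and the test names are hypothetical, and "redundant" is simplified here to mean two tests that detect exactly the same set of faults.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def analyze_testability(
    detects: Dict[str, Set[str]], tests: Set[str]
) -> Tuple[List[str], List[List[str]], List[List[str]]]:
    """Return (uncovered faults, ambiguity groups, redundant test groups).

    detects maps each fault to the set of tests able to detect it.
    """
    # Uncovered faults: no available test detects them.
    uncovered = sorted(f for f, ts in detects.items() if not (ts & tests))

    # Ambiguity groups: covered faults with identical test signatures,
    # so the available tests cannot tell them apart.
    by_signature = defaultdict(list)
    for fault, ts in detects.items():
        signature = frozenset(ts & tests)
        if signature:
            by_signature[signature].append(fault)
    ambiguity = sorted(sorted(g) for g in by_signature.values() if len(g) > 1)

    # Redundant tests (simplified): tests detecting exactly the same faults.
    by_coverage = defaultdict(list)
    for test in tests:
        covered = frozenset(f for f, ts in detects.items() if test in ts)
        by_coverage[covered].append(test)
    redundant = sorted(sorted(g) for g in by_coverage.values() if len(g) > 1)

    return uncovered, ambiguity, redundant


# Hypothetical model: five failure modes and four tests.
detects = {
    "pump_seized": {"T1", "T2"},
    "valve_stuck": {"T1", "T2"},
    "sensor_drift": {"T3"},
    "connector_open": set(),
    "fuse_blown": {"T3", "T4"},
}
uncovered, ambiguity, redundant = analyze_testability(
    detects, {"T1", "T2", "T3", "T4"}
)
print(uncovered)   # ['connector_open']
print(ambiguity)   # [['pump_seized', 'valve_stuck']]
print(redundant)   # [['T1', 'T2']]
```

In this toy model, adding a test that separates the pump fault from the valve fault would break the ambiguity group, which is the kind of test-point recommendation the text describes.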

B.17 Raytheon Company

Raytheon Company is a major U.S. military contractor based in Waltham, Massachusetts. The company has 80,000 employees worldwide and annual revenues of approximately $20 billion. More than 90% of Raytheon's revenues are obtained from defense contracts, and as of 2005, it is the fifth largest military contractor in the world. Major products include missiles, radar systems, sensor payloads, laser products, and corporate jet aircraft, although Raytheon also produces one military aircraft. William H. Swanson is the Chairman and CEO. Raytheon is composed of seven major businesses:

• Integrated Defense Systems, based in Tewksbury, Massachusetts

• Intelligence and Information Systems, based in Garland, Texas

• Missile Systems, based in Tucson, Arizona

• Network Centric Systems, based in McKinney, Texas

• Raytheon Aircraft Company, based in Wichita, Kansas

• Raytheon Technical Services Company LLC, based in Reston, Virginia

• Space and Airborne Systems, based in El Segundo, California

B.17.1 Approach to PHM

Raytheon has a dedicated Health Management Systems (HMS) Technology Interest Group (TIG) with corporate sponsorship from the Systems, Software, and Processing Technology Networks and business units within Raytheon. The HMS TIG comprises HMS, diagnostic, and prognostic domain experts from the respective Raytheon business units. Its charter includes defining and communicating companywide processes, procedures, tools, and technology for PHM systems. Working with industry and the DoD, the HMS TIG is active in defining diagnostic/prognostic solution sets, technology roadmaps, and insertion opportunities for Raytheon's array of military products. The Raytheon HMS TIG plans to use prognostics and HMS to reduce total ownership cost through CBM, reduced factory test cost, reduced field test cost, and improved system availability. Goals for system self-test include reduction of false alarms and cannot duplicate (CND) failures, increased thoroughness of system coverage, identification of early warning failure precursor characteristics for RUL predictions, and development of prognostic reasoners.

B.18 Ridgetop Group

Ridgetop Group, Inc., founded in 2000, has 25 employees and is located in Tucson, Arizona. The focus of Ridgetop Group has been in the areas of electronic prognostics, reliability, built-in self-test, radiation electronics, and fault-tolerant electronics. The firm received an SBIR contract from the Air Force Research Laboratory (AFRL) at Wright Patterson Air Force Base, Ohio, on extracting electronic signatures associated with impending performance problems in aircraft “fly-by-wire” control systems. Ridgetop Group also received an SBIR contract involving semiconductor prognostics and built-in self-test designs that are tolerant to the damaging effects of radiation. Ridgetop Group also received SBIR contracts from the JSF Program Office of the U.S. Naval Air Systems Command (NAVAIR), NASA/Ames, NASA/Goddard, and the Department of Energy.

B.18.1 Approaches to PHM

The Ridgetop Group’s approach to PHM involves using Sentinel Semiconductor™ prognostic cells that are precalibrated and colocated with the host circuit on an IC, to act as an early-warning sentinel for upcoming device failures. The prognostic cell experiences the same manufacturing process and environmental stresses as the actual circuit. The environmental stresses that contribute to the electronic aging of the circuit may include voltage conditions, transient spikes, radiation exposure, humidity, and extreme temperature. Since the operational stresses are the same for the host and prognostic circuit, the damage rate is expected to be the same for both circuits. However, the prognostic cell is designed to fail faster by increasing the stress on the cell structure through scaling. For example, with the same amount of current passing through both circuits, if the cross-sectional area of the current-carrying paths in the cells is decreased, a higher current density is achieved, which results in faster degradation of the prognostic cell.

At the board, module, and interconnect levels, additional sensors for monitoring higher failure rate components augment the existing measurands available at the interfaces to the board. These additional sensors serve to improve the observability of impending failure events and provide an array of data that is correlated and processed with an on-board calculation of RUL using well-documented algorithms.

Ridgetop has become involved in prognostic cells for 0.35-, 0.25-, 0.18-, and 0.13-µm CMOS processes. Power consumption is approximately 600 µW. The cell size is typically 800 µm² at the 0.25-µm process node. Currently Ridgetop provides prognostic cells for the semiconductor failure mechanisms of electrostatic discharge (ESD), hot carrier, metal migration, time-dependent dielectric breakdown (TDDB), negative bias temperature instability (NBTI), and radiation effects.

The prognostic distance can be further adjusted by scaling the area of the cell. Ridgetop has nominally set this at 80% of the statistical end-of-life point. This point can be adjusted to some other early indication level. Multiple trigger points can also be provided using multiple cells evenly spaced over the bathtub curve. The interface to the prognostic cell can be configured for a simple buffered logic high or low output to indicate an impending failure event. Optional interfacing can be made using the JTAG bus structure with a JTAG Toolkit. It is also possible to include a register in a chip design that permits an interface using the standard scan test bus that employs IEEE 1149.1. The semiconductor prognostics work has been conducted in collaboration with Sandia National Labs and the U.S. Navy.
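The link between area scaling and prognostic distance can be made concrete with a rough calculation. The sketch below is not Ridgetop's calibration method. It assumes a metal-migration-type mechanism whose life follows the current-density term of Black's equation (life proportional to J to the power of minus n), with the cell and host at the same temperature so the thermal term cancels. The exponent n = 2 and the function names are assumptions chosen for illustration.

```python
def life_ratio(area_ratio: float, n: float = 2.0) -> float:
    """Cell life divided by host life for a given area scaling.

    area_ratio is (cell conductor cross-section) / (host cross-section).
    With the same current in both, current density scales as 1 / area_ratio,
    and life scales as J ** -n, so the life ratio is area_ratio ** n.
    """
    if area_ratio <= 0:
        raise ValueError("area_ratio must be positive")
    return area_ratio ** n


def area_ratio_for_trigger(trigger_fraction: float, n: float = 2.0) -> float:
    """Area scaling that makes the cell fail at a given fraction of host life."""
    if not 0 < trigger_fraction <= 1:
        raise ValueError("trigger_fraction must be in (0, 1]")
    return trigger_fraction ** (1.0 / n)


# Scaling needed for a cell that trips at 80% of the host's expected life.
ratio_80 = area_ratio_for_trigger(0.8)
print(round(ratio_80, 3))

# Several cells with staggered trigger points (fractions are hypothetical).
staggered = [area_ratio_for_trigger(f) for f in (0.5, 0.65, 0.8)]
print([round(r, 3) for r in staggered])
```

Under these assumptions a fairly modest reduction in cross-section is enough to move the trigger point, which is why the prognostic distance is sensitive to how accurately the cell is calibrated. Other failure mechanisms listed above, such as TDDB or NBTI, follow different acceleration laws and would need their own models.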

B.18.2 Related Publications

1. D. L. Goodman, B. Vermeire, J. Ralston-Good, and R. Graves, “A Board Level Prognostic Monitor for MOSFET TDDB,” Proceedings of the IEEE Aerospace Conference, pp. 4-11, March 2006.

2. D. L. Goodman, B. Vermeire, P. Spuhler, and H. Venkatramani, “Practical Application of PHM/Prognostics to COTS Power Converters,” paper presented at the IEEE Aerospace Conference, Big Sky, MT, 2006.

3. D. L. Goodman, S. Wood, and A. Turner, “Return-On-Investment (RoI) for Electronic Prognostics in Mil/Aero Systems,” paper presented at the IEEE AUTOTESTCON, Orlando, FL, 2005.

4. S. Mishra, M. Pecht, and D. L. Goodman, “In Situ Sensors for Product Reliability Monitoring,” Proceedings of the SPIE - The International Society for Optical Engineering, Vol. 4755, pp. 10-19, 2002.

5. D. L. Goodman, “Prognostic Methodology for Deep Submicron Semiconductor Failure Modes,” IEEE Transactions on Components and Packaging Technologies, Vol. 24, No. 1, pp. 109-111, March 2001.

B.18.3 Related Patent

U.S. Patent:

1. 6,964,004: Method and apparatus for testing a system-on-a-chip

B.19 Rockwell Automation

Rockwell Automation is a leading controls and information solutions company with expertise spanning the entire automotive manufacturing supply chain, machine tools, and spare parts suppliers. Business units include automation systems, industrial AC/DC drives, limit switches, motors and motor controls, and sensors. The company has 21,000 employees serving customers in more than 80 countries. In 2005 sales for the full year were over $5 billion. Rockwell Automation is involved in developing hardware-software solutions for condition monitoring of machines and electrical systems.

B.19.1 Approach to PHM

Rockwell Automation has developed several products for PHM, though none specifically for electronic systems. The PHM approach includes collecting sensor data across an entire plant to enable condition-based maintenance. The PHM focus includes vibration monitoring and analysis of rotors, infrared thermography based PHM for electrical and mechanical systems, and oil data collection and analysis. Rockwell developed on-site software and web portals for communicating the sensor data and analysis to personnel to enable a companywide computerized maintenance management system (CMMS).

Rockwell Automation’s health monitoring product line includes individual sensors, integrated sensor systems, portable recorders and signal analyzers, and a software suite. Its sensor products are accelerometers for a wide range of frequency applications and noncontacting eddy-current-type transducers that measure the dynamic and/or static displacement of the target relative to the mounting. These sensors are used to measure radial and axial vibrations in shafts, shaft eccentricity, shaft axial position, case expansion, differential expansion, and other instances where noncontacting relative measurements are needed.

Another health monitoring product is Enpac®, a Windows® CE based, portable hand-held data collector and signal analyzer. Enpac is currently used for condition monitoring of equipment in many process industries. The portable device is loaded with custom software for in situ data analysis. Enpac is designed to be rugged and lightweight, with a 1/4 VGA color liquid crystal display (LCD) allowing for clear viewing in most light situations. Data can be simultaneously collected from different sensors. Input signal types for Enpac include acceleration, velocity, time waveform, displacement, temperature, phase, voltage, or any user-specified variable. The XM series product of Rockwell is designed for machine condition monitoring and protection.
This device connects to other modules or links into the host system, giving the user control over configuring the host system to satisfy more advanced monitoring requirements such as those based on speed, orders, spike energy measurements, vibration levels, or other parameters. The XM-720 machine monitor includes an XM measurement module, standard dynamic measurement module, low-frequency dynamic measurement module, g’s Spike Energy™ (gSE) vibration module, expansion relay module, and three relays. Light emitting diode (LED) indicators are provided for conditions such as transducer fault, warning, and circuit tripping.

Emonitor is the health monitoring software developed by Rockwell. It provides customers with a suite of integrated maintenance data functions to leverage information about their assets. Emonitor Odyssey® is Rockwell’s complete machinery information software system. The software has applications in infrared thermography, magnified images of oil samples, and other condition-indicating measurements. The software’s capabilities include statistical alarms, condition-based diagnostics, and state-based alarms.
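A statistical alarm of the general kind mentioned above can be sketched as a limit derived from healthy baseline readings. This is an illustration only and does not describe how Emonitor computes its alarms; the mean-plus-k-sigma rule, the value k = 3, the function names, and the sample readings are all assumptions.

```python
import statistics
from typing import List, Sequence


def alarm_limit(baseline: Sequence[float], k: float = 3.0) -> float:
    """Upper alarm limit: baseline mean plus k sample standard deviations."""
    if len(baseline) < 2:
        raise ValueError("need at least two baseline readings")
    return statistics.mean(baseline) + k * statistics.stdev(baseline)


def flag_alarms(readings: Sequence[float], limit: float) -> List[int]:
    """Indices of readings that exceed the alarm limit."""
    return [i for i, value in enumerate(readings) if value > limit]


# Hypothetical overall vibration levels (mm/s) from a healthy machine.
baseline = [1.0, 1.2, 0.8, 1.0, 1.0]
limit = alarm_limit(baseline)

# Later readings from the same measurement point.
readings = [1.1, 1.3, 1.5, 0.9, 2.0]
alarms = flag_alarms(readings, limit)
print(alarms)  # [2, 4]
```

A limit learned from each machine's own history adapts to that machine's normal level, which a single fixed threshold applied across a plant cannot do.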

B.19.2 Related Publications

1. N. Anderson and R. Wilcoxon, “Framework for Prognostics of Electronic Systems,” Proceedings of International Military and Aerospace/Avionics COTS Conference, Seattle, WA, August 3-5, 2004.

2. J. Robinson, C. D. Whelan, and N. K. Haggerty, “Trends in Advanced Motor Protection and Monitoring,” IEEE Transactions on Industry Applications, Vol. 40, No. 3, pp. 853-860, May/June 2004.

3. S. Lapcewich, “Reaping the Rewards of a Remote Monitoring and Diagnostics Program,” paper presented at the TAPPI Paper Summit-Spring Technical and International Environmental Conference, Atlanta, GA, 2004, pp. 241-247.

4. R. M. Tallam, T. G. Habetler, and R. G. Harley, “Stator Winding Turn-Fault Detection for Closed-Loop Induction Motor Drives,” IEEE Transactions on Industry Applications, Vol. 39, No. 3, pp. 720-724, May-June 2003.

5. Y. El-Ibiary, “Serial Communication and Embedded Sensors on Bearings and Gear Reducers for Increased Productivity,” paper presented at the IEEE Cement Industry Technical Conference, Roanoke, VA, 1999, pp. 49-53.

6. S. R. Das and L. E. Holloway, “Characterizing Discrete Event Timing Relationships for Fault Monitoring of Manufacturing Systems,” IEEE Conference on Control Applications-Proceedings, 1996, pp. 1012-1018.

7. E. Dummermuth, “Advanced Diagnostic Methods in Process Control,” ISA Transactions, Vol. 37, No. 2, pp. 79-85, 1998.

B.19.3 Related Patents

U.S. Patents:

1. 6,978,225: Networked control system with real time monitoring

2. 6,967,454: Monitoring apparatus and method for monitoring a pole circuit of an electrical power converter

3. 6,862,553: Diagnostics method and apparatus for use with enterprise controls

4. 6,745,232: Strobed synchronization providing diagnostics in a distributed system

5. 6,646,397: Integrated control and diagnostics system

6. 6,636,823: Method and apparatus for motor fault diagnosis

7. 6,529,135: Integrated electric motor monitor

8. 6,456,898: Press monitoring and control system

9. 6,374,195: System for monitoring and tracking tool and tool performance

10. 6,199,018: Distributed diagnostic system

B.20 Sentient Corporation

Sentient Corporation, an Idaho-based firm, develops machinery health management solutions for military, aerospace, and industrial applications. Since the company’s inception in 2001, Sentient has grown to 19 employees as of January 2007 and plans to continue its rapid growth. The company has received most of its funding from SBIR research contracts with the U.S. military. Sentient specializes in developing the prognostic algorithms and models for processing raw sensor data, determining condition, and estimating remaining life. The company’s expertise spans a wide range of mechanical and electromechanical systems, from industrial machinery to military turbine engines. Sentient’s philosophy is that the best prognostic technologies result from a thorough understanding of the physics behind component degradation and failure. Sentient maintains a significant physical testing and research program, including unique laboratory facilities for studying machine component degradation and failure. Sentient has particular expertise in rolling element bearings for industrial, aircraft propulsion, and space applications.

B.20.1 Approach to PHM

Sentient has conducted several studies on bearing condition monitoring in reaction wheel applications for the Air Force and the Missile Defense Agency. One outcome of this research is a unique wireless sensor that is embedded in bearing cages to monitor temperature and instability. The sensor provides the sensitivity needed to manage lubricant flow for optimal torque and life characteristics. Sentient is currently developing a complete health management system for reaction wheels based on this technology.

Sentient is a subcontractor to GE Aircraft Engines for the engine-bearing module under the DARPA Prognosis Program. Sentient is responsible for conducting all subscale bearing fault tests, working with GE to plan full-scale tests, developing a detailed physics-based model of bearing spall propagation, and integrating the model into the prognostic architecture.

In cooperation with the JSF program and Pratt & Whitney, Sentient is developing a general purpose modeling toolset for simulating the vibration produced by rotating machinery. This toolset will be used to model the JSF lift fan gearbox and other subsystems and will provide data for PHM development and validation. Sentient is also developing a physics-based data reduction algorithm for the Pratt & Whitney F100 turbine engine. This software will be integrated into the next engine controller upgrade and will greatly reduce the volume of performance and usage data acquired on modern engines without impacting the accuracy of downstream damage and lifing algorithms. This advance will make it easier to transition to usage-based lifing algorithms and will make it feasible to permanently archive data from every flight.
Other PHM work includes an SBIR contract with NAVAIR to develop high-fidelity actuator models and advanced diagnostics for the converging nozzle on the JSF, and a project with the Army, Sikorsky, and Goodrich to develop advanced diagnostics and predictive prognostics for the UH-60 hanger and oil cooler bearing applications and integrate those algorithms into the Goodrich IMD-HUMS system.


B.20.2 Related Publications

1. N. Lybeck, B. Morton, A. Hess, J. Kelly, and S. Marble, “Modeling and Simulation of Vibration Signatures in Propulsion Subsystems,” Proceedings of the 2006 IEEE Aerospace Conference, March 4-11, 2006, CD-ROM Paper No. 1351.

2. S. Marble and D. Tow, “Bearing Health Monitoring and Life Extension in Satellite Momentum/Reaction Wheels,” Proceedings of the 2006 IEEE Aerospace Conference, March 4-11, 2006, CD-ROM Paper No. 1352.

3. S. Marble and B.P. Morton, “Predicting the Remaining Life of Propulsion System Bearings,” Proceedings of the 2006 IEEE Aerospace Conference, March 4-11, 2006, CD-ROM Paper No. 1350.

B.21 Scientific Monitoring, Inc.

Scientific Monitoring, Inc. (SMI) develops and sells equipment monitoring and asset management solutions for industries that own or operate capital-intensive, downtime-sensitive, or safety-critical assets. These industries include aviation, power generation, mass transportation, and oil/gas exploration. SMI's solutions rely on advanced and intelligent software to monitor asset performance, predict system capabilities, alert on impending threats, and recommend corrective actions. Customers benefit from reduced asset downtime, increased availability, optimized spare-parts inventory, increased maintenance efficiency, and reduced costs of operation and support. SMI was founded in 1993 as an R&D company with expertise in the design and development of aerospace control and integrated monitoring systems. Since 2005, SMI has offered its monitoring software and asset management solutions to the commercial and industrial sectors. SMI is located in Arizona with satellite offices in California, Ohio, and Washington State.

B.21.1 Approach to PHM

SMI PHM solutions include both software- and hardware-based products as well as customer services for monitoring and management of the critical assets identified above. SMI offers two PHM product lines: (1) equipment monitoring and asset management software and (2) hardware and software that validate and verify monitoring and control implementations. SMI states that its tools provide leading-edge capability because they are based on SMI's patented model-based techniques. These products are described below.

The equipment monitoring and asset management product line features two software products: I-Trend™ and I-Predict™. These products provide health management capabilities, including trending, diagnostics, prognostics, and decision support, for individual components as well as complete systems. The products support customers with legacy assets as well as future systems. Software environments include Web-based implementations, application libraries, or unique, customer-defined, stand-alone systems. I-Trend™ (or iTrend) provides trending, fault detection, fault isolation, and fault alerting. Software features include automatic generation and notification of alerts for timely action. I-Predict™ (or iPredict) provides failure prediction, remaining-life projection, enterprise-level capability assessment, and decision support for corrective actions (such as operations and maintenance workscope planning).

The validation and verification (V&V) product line features hardware and software. The hardware product is the modeling and simulation test bench SIDAL (System Integration and Design Assurance Laboratory). The software products enable cost-effective V&V of monitoring and control implementations. These products facilitate certification and maturation of asset health management software, solutions, or systems.
SIDAL is a compact, reconfigurable computer which supports the modeling, simulation, and testing needs of all phases of a control and monitoring product's life cycle, from design to deployment (or sustainment). SIDAL is an all-in-one solution for real-time simulation, non-real-time analysis, and rapid prototyping for cost-effective V&V analysis. The V&V toolbox is a suite of tools for validation and verification of control, monitoring, and management models or software. The V&V toolbox is specifically tailored to software for systems subject to uncertainties and adaptive or changing environments.

Appendix B

223

B.21.2 Awarded Contracts

In October 2003, the U.S. Army and its Future Combat Systems (FCS) lead system integrators Boeing and SAIC awarded a prime contract to Honeywell and a subcontract to SMI for the development and demonstration of the health management and decision support software system called the Platform-Soldier Mission Readiness System (PSMRS). The PSMRS will perform diagnostics, prognostics, and readiness assessment from both the functional and the physical availability standpoints for an FCS vehicle.

In December 2004, Boeing awarded SMI a contract to provide analytical tools and software to further enhance the airplane data monitoring and decision support capabilities of the airplane health management (AHM) team in Boeing Commercial Aviation Services. AHM will use advanced analytic tools and algorithms provided by SMI to improve reliability and maintenance planning for Boeing’s AHM service customers.

In February 2005, the U.S. Air Force Air Logistics Center awarded SMI a multiyear contract to enhance its National Security System (NSS) engine health management (EHM) software across the USAF engine fleet. The EHM tool is called DATT (Diagnostics, Analysis, and Trending Tool). DATT is a Windows-based graphical analysis and reporting package that enables efficient EHM. The software is a part of the DoD’s NSS.

In April 2005, the U.S. Air Force Research Laboratory at Wright-Patterson awarded SMI a Phase 2 SBIR to develop an affordable V&V solution to facilitate certification of flight-critical software. SMI is developing a suite of tools for V&V while maturing the SIDAL system bench as the host platform for the affordable V&V solution.

In May 2006, the U.S. Air Force Research Laboratory at Wright-Patterson awarded SMI a Phase 2 SBIR to develop a prototype of a CBM+ research environment for the fleet-wide aircraft/engine health information system as a part of the Global Combat Support System (GCSS), which was formerly called the Air Force Knowledge Services (AFKS). SMI is assisting the Air Force in the development of the data architecture and research environment tools.

The EPRI awarded SMI a study contract to evaluate differences between aviation and power generation condition monitoring and recommend tools for power generation monitoring.

B.21.3 Related Publications

1. P. Rendek et al., “Successful Trending and Diagnostics Technology Transition,” Proceedings of ASME International Gas Turbine and Aeroengine Conference, Montreal, Canada, May 2007.

2. L. C. Jaw et al., “Verification of Flight Software with Karnough Map-Based Checking,” Proceedings of the 2007 IEEE Aerospace Conference, Big Sky, MT, March 2007.

3. Y. Wang et al., “Demonstration of a Reliability Centered Maintenance (RCM) Tool to Extend Engine’s Time-on-Wing (TOW),” Proceedings of the 2007 IEEE Aerospace Conference, Big Sky, MT, March 2007.

4. A. Khalak and J. Tierno, “Influence of Prognostic Health Management on Logistic Supply Chain,” Proceedings of the 2006 American Control Conference, June 2006.

5. L. C. Jaw, “Mathematical Formulation of Model-Based Methods for Diagnostics and Prognostics,” paper presented at the ASME Turbo Expo 2006, Barcelona, Spain, May 2006.


6. G. Mink and A. Behbahani, “The AFRL ICF Generic Gas Turbine Engine Model,” paper presented at the Joint Propulsion Conference, Tucson, AZ, July 2005.

7. L. C. Jaw, “Recent Advancements in Aircraft Engine Health Management (EHM) Technologies and Recommendations for the Next Step,” Proceedings of Turbo Expo 2005: 50th ASME International Gas Turbine & Aeroengine Technical Congress, Reno-Tahoe, NV, June 6-9, 2005.

8. “Condition Based Monitoring Essentials,” Scientific Monitoring Inc. White Paper, available: http://www.scientificmonitoring.com/resources/papers.htm, accessed October 2005.

9. W. Wang, L. C. Jaw, and J. Wang, “Petri Net Based Modeling Approach for Fault Affected UAV Subsystems Reconfiguration,” paper presented at the AIAA 1st Intelligent Systems Technical Conference, Chicago, Illinois, September 20-22, 2004.

10. L. C. Jaw and W. Wang, “A Run-Time Test System for Maturing Intelligent System/Vehicle Capabilities-SIDAL,” Proceedings of the 2004 IEEE Aerospace Conference, Big Sky, MT, March 6-13, 2004, Vol. 6, pp. 3756-3763.

11. L. C. Jaw and H. T. Van, “Model-Based Fault Identification of Power Generation Turbine Engines using Optimal Pursuit,” Proceedings of the 2004 IEEE Aerospace Conference, Big Sky, MT, March 6-13, 2004, Vol. 5, pp. 3502-3506.

12. W. Wang and L. C. Jaw, “Practical Diagnostic Algorithms for Run-Time Systems,” Proceedings of the 2004 IEEE Aerospace Conference, Big Sky, MT, March 6-13, 2004, Vol. 5, pp. 3476-3480.

13. “The State-of-the-Play in Engine Condition Monitoring,” Aircraft Technology Engineering & Maintenance, Issue 66, pp. 54-60, October-November 2003.

14. L. C. Jaw and D. N. Wu, “Anomaly Detection and Reasoning with Embedded Physical Model,” Proceedings of the 2002 IEEE Aerospace Conference Proceedings, Vol. 6, pp. 6-3073-6-3081, 2002.

15. L. C. Jaw, “Putting CBM and EHM in Perspective,” Maintenance Technology, pp. 14-18, November 2001.

16. L. C. Jaw and R. Friend, “ICEMS: A Platform for Advanced Condition-Based Health Management,” Proceedings of the 2001 IEEE Aerospace Conference, March 10-17, 2001, Vol. 6, pp. 2909-2914.

17. L. C. Jaw, “Neural Networks for Model-Based Prognostics,” Proceedings of the 1999 IEEE Aerospace Conference, March 6-13, 1999, Vol. 3, pp. 21-28.

B.21.4 Related Patents

U.S. Patents:

1. 6,898,554: Fault detection in a physical system

2. 6,871,160: Intelligent condition-based engine/equipment management system

3. 6,490,543: Lifeometer for measuring and displaying life systems/parts

B.22 Smartsignal Corporation

Smartsignal Corporation is a privately held software solutions company founded by the University of Chicago to make Similarity-Based Modeling™ (SBM) technology commercially available. The SBM technology engine in the EPI*Center software solution enables holistic, real-time analytics on instrumented industrial equipment, both plantwide and fleetwide. With real-time analytics, companies receive early predictive warning of developing equipment anomalies that are indicative of reliability, availability, efficiency, and compliance problems. With this early warning, companies can improve equipment performance by preempting small problems before they become large problems. Smartsignal serves a growing list of Fortune 500 companies, currently in the power generation, pipeline, refinery, process, and aviation industries.

Recent SmartSignal/client awards include, among many others: Smartsignal client TransAlta, 2005 Marmaduke Award for excellence in O&M, Power Magazine; Smartsignal, finalist, “Commercial Technology of the Year,” Platts 2004 and 2005 Global Energy Awards; Smartsignal, “Top 40 North American M2M Innovators,” The FocalPoint Group; Smartsignal, Value Chain Award, M2M Magazine; Smartsignal, Editor’s Choice Award, 3/01, Control Engineering Magazine; and Smartsignal, Annual Award for Business Excellence, Business Ledger.

Smartsignal has many case studies available on its website, including “Kansas City Power & Light,” “The Calpine Westbrook Experience,” “Wisconsin Public Service Corporation,” “Panhandle Energy: Compressor Maintenance Expense Cut,” “Early Detection of Developing Bearing Issue on Turbine Generator Avoids Unscheduled Downtime,” and “SmartSignal EPI*Center Installation at APS Palo Verde.”

B.22.1 Approach to PHM

Smartsignal’s EPI*Center software analyzes historical data from available and relevant sensors and data historians for each serial-number-specific piece of equipment. It correlates the data and constructs an empirical model of normal, holistic, expected operations under all operating and weather conditions; that is, it develops a personality for each asset. It generates an estimated model of how each data point (typically a sensor value) should behave over time and relative to the others.

During real-time operation, 24/7, Smartsignal compares model-expected sensor values against actual real-time data collected from the piece of equipment. Smartsignal’s EPI*Center then calculates residual values by subtracting the actual real-time sensor values from the expected values generated by the Smartsignal model. Analysis of the residuals identifies very subtle process deviations well within normal threshold alarm limits and detects the earliest possible signs of equipment malfunction.

Smartsignal’s EPI*Center software then lists only the abnormal readings on an exception-based watchlist of anomalies, thereby allowing its clients to avoid irrelevant data and to focus on impending problems. It also provides initial diagnoses. The watchlist can be viewed remotely or on-site, and notifications of anomalies can be sent as alerts to pagers and email. Developing problems can be communicated in real time to maintenance managers for corrective action. Smartsignal works across all machines, brands, OEMs, and sensors, and the company states that it provides warning earlier than traditional condition-monitoring methods.
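
The expected-value, residual, and watchlist steps described above can be illustrated with a small sketch. This is not Smartsignal's SBM algorithm, whose details are not given here; it substitutes a generic kernel-weighted average over stored examples of normal operation. The function names, the Gaussian kernel, the `bandwidth` value, the per-sensor `limits`, and the sample data are all assumptions chosen for illustration, and a practical model would also normalize sensors to comparable scales.

```python
import math
from typing import List, Sequence

def expected_state(observation: Sequence[float],
                   exemplars: Sequence[Sequence[float]],
                   bandwidth: float) -> List[float]:
    """Estimate the expected sensor vector as a similarity-weighted
    average of stored normal-operation exemplars (assumed Gaussian kernel)."""
    weights = []
    for ex in exemplars:
        dist_sq = sum((o - e) ** 2 for o, e in zip(observation, ex))
        weights.append(math.exp(-dist_sq / (2.0 * bandwidth ** 2)))
    total = sum(weights)
    if total == 0.0:
        raise ValueError("observation is too far from every exemplar")
    return [sum(w * ex[i] for w, ex in zip(weights, exemplars)) / total
            for i in range(len(observation))]

def watchlist(observation: Sequence[float],
              exemplars: Sequence[Sequence[float]],
              limits: Sequence[float],
              bandwidth: float = 5.0) -> List[int]:
    """Return indices of sensors whose residual exceeds its limit.
    The residual is expected minus actual, as in the text above."""
    expected = expected_state(observation, exemplars, bandwidth)
    residuals = [e - a for e, a in zip(expected, observation)]
    return [i for i, (r, lim) in enumerate(zip(residuals, limits))
            if abs(r) > lim]

# Hypothetical history of (temperature, vibration) under normal operation.
NORMAL = [(50.0, 1.0), (60.0, 2.0), (70.0, 3.0)]

print(watchlist((60.0, 2.0), NORMAL, limits=(2.0, 1.0)))  # []
print(watchlist((60.0, 6.0), NORMAL, limits=(2.0, 1.0)))  # [1]
```

In the second call the temperature matches past behavior but the vibration reading does not, so only sensor index 1 appears on the watchlist. This mirrors the exception-based reporting described above.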

B.23 Smiths Aerospace (GE)

Smiths Aerospace is a leading transatlantic aerospace systems and equipment company, with more than 10,000 employees and $2 billion in revenues globally. It was bought by GE in mid-2007. The company supplies military and civil aircraft and engine manufacturers and is a world leader in digital computing, electrical power, mechanical systems, engine components, and customer services.

B.23.1 Approach to PHM

Smiths Aerospace has pioneered the development and production of helicopter HUMSs since the late 1980s. In order to address the need for further prognostic, decision making, usage, and fleet management capabilities for fixed-wing aircraft, helicopters, and gas turbines, Smiths has developed Flight Usage Management Software (FUMS™). The Smiths health management collaborative efforts have also included aircraft electrical power systems prognostics and health management (AEPHM) work with Boeing and probabilistic diagnostic and prognostic system (ProDAPS) work with AFRL. Smiths has not published an approach specific to electronics but has claimed that the FUMS™ technologies can be applied to the diagnostics/prognostics of any aircraft system, including avionics.

The existing HUMS offer one or more of the following: drive train diagnostics, rotor track and balance (RTB), exceedance monitoring, engine power assurance, and simple usage monitoring. The Smiths civil and military HUMS have been fitted into many helicopters and have accumulated more than one million in-service flying hours. GenHUMS is a multi-rotorcraft-capable integrated data acquisition and recorder system with built-in HUMS capability, which monitors the health of mechanical components in a rotorcraft, including the rotor, engine, transmission, and gearbox. GenHUMS has been developed for the UK MOD and is being installed on the Boeing CH-47 Chinook, Bell 412, Bell Agusta BA-609, and Sikorsky CH-53 fleets.

Since the early 1990s, Smiths has continued its research efforts to improve the HUMS algorithms and to expand the HUMS usage capability through the use of mathematical models, including anomaly detection algorithms. These efforts formed the basis from which the FUMS™ technology efforts were launched. Since the mid-1990s, Smiths has worked closely with the UK MOD in developing FUMS™. It was designed based on the requirements of the MOD for supporting aircraft health monitoring.
The MOD was having problems with the existing expert-labor-intensive procedures for analysis of health data, correlation of disparate data sources, and sharing of information. Hence, an automated analysis tool with the capability to quickly fuse and analyze many information sources was needed. The FUMS™ framework is designed to interface with, analyze, fuse, and mine aircraft data from different sources; support openness and rapid configuration of diverse applications by the user; and enable access control to optimize the distribution of information such that each user can be provided with information consistent with his or her role.

The framework is a series of static modules to which applications are added. Each module is registered with the framework and then dynamically loaded at run time. The processes within registered modules can call and pass data between each other. In this way, the framework supports open-system architecture. An interface is defined to allow the translation of external data into common data format (CDF) to give quick access to and fast processing of huge volumes of flight data by successive data caching from permanent storage media. CDF stores integers, strings, floating point numbers, and binary large object data. It also stores a collection of CDF objects in a way similar to storing database tables and can handle flight data sampled at different rates.

The FUMS™ software framework has analyzed data downloaded from the Eurofighter, F16, Tornado, Harrier, Chinook, Lynx, Apache, RB199, and Pegasus. The FUMS™ software provides flight data display, analysis, and prognostics assessments to optimize aircraft management and component-level usage. It has the capability to perform model-based data synthesis of virtual sensors and intelligent data trending and has a graphical user interface for logistics/maintenance databases. The FUMS™ software includes a suite of statistical analysis tools, singular value decomposition, principal component analysis, and neural networks for nonlinear multivariate analysis. It also includes a range of signal processing capabilities such as resampling, signal averaging, fast Fourier transform, short-time Fourier transform, wavelet transform, filtering and signal enhancement, and demodulation via the Hilbert transform. Smiths claims that by using these tools the user can confirm whether alerts are due to signal corruptions or fault cases.

Since 2001, Smiths and BAE Systems have collaborated to evolve a certifiable, practical structural prognostic health management (SPHM) system using the FUMS™ nonadaptive prediction models. The Smiths models were trained (calibrated) using strain data from three operational airplanes flown in four configurations. The models were calibrated for four structural locations chosen by BAE Systems. After calibration, BAE Systems supplied only flight data, covering four configurations and 15 years of operations of another aircraft, and did not supply strain data. Smiths applied its calibrated models to the supplied flight parameters and blindly predicted fatigue. After the blind test results were published, the fatigue values computed from strains were supplied for comparison with the blind test results.
For three locations, the model accuracies were better than those of a strain gauge system with 1% error; for the fourth location, the model error was less than the fatigue error produced from strains with 2% error. Therefore, it was concluded that the Smiths models could form the core of affordable, certifiable, and accurate SPHM systems. For each structural location, a single Smiths model accurately predicted fatigue for various configurations across potential structural repairs, different aircraft, and a variety of operations over 15 years without the need for midlife recalibrations. Consequently, BAE Systems is currently implementing these models in the airborne system of the JSF.

The Smiths/Boeing Aircraft Electrical Power Systems Prognostics and Health Management (AEPHM) program established the feasibility of detecting and trending actuator and fuel system degradations via electrical signature analysis. It also established the role of the power drive unit (PDU) as an effective device for implementing arc fault protection and as a means to perform data collection and processing in support of electrical system PHM. The AEPHM architecture supports system-level fusion of evidence and state information from multiple sources to improve estimates of degradation. Phase I of the program was completed with an end-to-end, hardware-in-the-loop (electric actuator, fuel pump, fuel valve, arc fault, and power distribution unit) demonstration with on-line data generation to show the integration of the technology into a realistic setting.

Probabilistic Diagnostic and Prognostic System, or ProDAPS, provides artificial intelligence tools to facilitate the extraction of knowledge from system health data, reasoning with knowledge for diagnostics and prognostics, and decision support to aid maintenance and logistics. These tools are founded upon two core technologies: a data-mining engine and a probabilistic network inference engine.
These core tools exist as stand-alone functional tools, but their real power is revealed when they are utilized as the backbone for other advanced tools, for example, an anomaly detection system. The ProDAPS tools exist as “open” components that can be mixed and configured for different application scenarios from on-board analysis to advanced ground station functionality. Particular emphasis is being given to the research and development of techniques that allow these tools to scale.
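
The FUMS™ framework described earlier in this section registers modules and then loads them dynamically at run time so that their processes can pass data to one another. A minimal Python sketch of that registration pattern follows. It is not Smiths' implementation: the class name `ModuleRegistry`, the `"module:attribute"` target syntax, and the use of standard-library functions as stand-in analysis modules are all assumptions made for illustration.

```python
import importlib
from typing import Any, Callable, Dict, List

class ModuleRegistry:
    """Hypothetical registry: modules are registered by name up front
    and imported only when one of their processes is first called."""

    def __init__(self) -> None:
        self._targets: Dict[str, str] = {}
        self._loaded: Dict[str, Callable[..., Any]] = {}

    def register(self, name: str, target: str) -> None:
        """Record where a process lives, as 'module:attribute'."""
        if ":" not in target:
            raise ValueError("target must look like 'module:attribute'")
        self._targets[name] = target

    def call(self, name: str, *args: Any, **kwargs: Any) -> Any:
        """Load the process on first use, then invoke it."""
        if name not in self._loaded:
            module_name, attribute = self._targets[name].split(":", 1)
            module = importlib.import_module(module_name)
            self._loaded[name] = getattr(module, attribute)
        return self._loaded[name](*args, **kwargs)

    def loaded(self) -> List[str]:
        """Names of processes that have been loaded so far."""
        return sorted(self._loaded)

registry = ModuleRegistry()
registry.register("mean", "statistics:mean")  # stand-in analysis module
registry.register("sqrt", "math:sqrt")        # stand-in analysis module

# One registered process passes its result to another.
print(registry.call("sqrt", registry.call("mean", [4.0, 16.0, 7.0])))  # 3.0
```

Deferring the import until the first call keeps start-up cost independent of the number of registered applications, which is one plausible reason to separate registration from loading.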

B.23.2 Related Publications

1. R. Callan, B. Larder, and J. Sandiford, “An Integrated Approach to the Development of an Intelligent Prognostic Health Management System,” IEEE Proceedings of Aerospace Conference, March 2006.

2. K. Keller, K. Swearingen, J. Sheahan, M. Bailey, J. Dunsdon, K. Wojtek Przytula, and B. Jordan, “Aircraft Electrical Power Systems Prognostics and Health Management,” paper presented at the IEEE Aerospace Conference, Big Sky, MT, March 2006.

3. H. Azzam, F. Beaven, and W. Wallace, “Optimisation of Fusion and Decision Making Techniques for Affordable SPHM,” Proceedings of the 2006 IEEE Aerospace Conference, March 4-11, 2006, pp. 1-10.

4. H. Azzam, J. Cook, P. Knight, and E. Moses, “FUMS™ Fusion and Decision Support for Intelligent Management of Aircraft Data,” Proceedings of the 2006 IEEE Aerospace Conference, March 4-11, 2006, pp. 1-16.

5. N. H. Wakefield, P. R. Knight, K. P. J. Bryant, and H. Azzam, “FUMS™ Artificial Intelligence Technologies Including Fuzzy Logic for Automatic Decision Making,” Annual Meeting of the North American Fuzzy Information Processing Society, Detroit, MI, June 26-28, 2005, pp. 25-30.

6. P. Knight, J. Cook, and H. Azzam, “Intelligent Management of HUMS Data,” Proceedings of the Institution of Mechanical Engineers, Part G: Journal of Aerospace Engineering, Vol. 219, No. 6, pp. 507-524(18), 2005.

7. K. Keller, B. Jordan, and A. Del Amo, “Aircraft Electrical Power Systems Prognostics and Health Management,” paper presented at the Society of Automotive Engineers Conference, Reno, NV, November 2004.

8. H. Azzam, J. Cook, and S. Driver, “FUMS™ Technologies for Verifiable Affordable Prognostics Health Management (PHM),” IEEE Aerospace Conference Proceedings, Vol. 6, pp. 3764-3781, March 2004.

9. M. Wallace, H. Azzam, and S. Newman, “Indirect Approaches to Individual Aircraft Structural Monitoring,” Proceedings of the Institution of Mechanical Engineers, Part G: Journal of Aerospace Engineering, Vol. 218, No. G5, pp. 329-346(18), 2004.

10. J. Cook, J. Gourlay, and L. Boardman, “Contrasting Approaches to HUMS Validation-A Military User’s Perspective,” Proceedings of the 2004 IEEE Aerospace Conference, Vol. 6, pp. 3748-3755, March 6-13, 2004.

11. R. Callan and B. Larder, “The Implementation of Advanced Diagnostic and Prognostic Concepts-Practical Tools for Effective Diagnostic and Prognostic Health Management,” IEEE Proceedings of Aerospace Conference, March 8-15, 2003, pp. 3165-3175.

12. A. Draper, “The Operational Benefits of Health and Usage Monitoring Systems in UK Military Helicopters,” Third International Conference on Health and Usage Monitoring-HUMS2003, DSTO Platforms Sciences Laboratory, Melbourne, Australia, December 2002.


13. R. Callan and B. Larder, “The Development and Demonstration of a Probabilistic Diagnostic and Prognostic System (ProDAPS) for Gas Turbines,” Proceedings of IEEE Aerospace Conference, Big Sky, MT, March 2002.

14. F. Beaven, H. Azzam, I. Hebden, and L. Gill, “A Mathematical Network Approach to Structural Prognostic Health Management,” Proceedings of the First European Conference on Structural Health Monitoring, Paris, France, 2002.

15. B. Larder, H. Azzam, C. Trammel, and G. Vossler, “Smiths Industries HUMS: Changing the M from Monitoring to Management,” IEEE Aerospace Conference Proceedings, March 18-25, 2000, Vol. 6, pp. 449-455.

16. H. Azzam and N. Harrison, “A Demonstration of the Feasibility and Performance of an Intelligent Management System Operating on HUMS In-Service Data,” CAA Paper 99006, The Civil Aviation Authority, 1999.

17. H. Azzam, “The Use of Mathematical Models and Artificial Intelligence Techniques to Improve HUMS Prediction Capabilities,” Proceedings of Innovation in Rotorcraft Technology, The Royal Aeronautical Society, London, June 1997, pp. 16.1-16.14.

18. H. Azzam, “A Practical Approach for the Indirect Prediction of Structural Fatigue from Measured Flight Parameters,” Journal of Aerospace Engineering, Proceedings of the Institution of Mechanical Engineers, Part G, Vol. 211, No. 1, pp. 29-38(10), 1997.

B.24 Sun Microsystems

Sun Microsystems is an industry leader in servers, storage, software, and services with a primary focus on network computing. Sun was founded in 1982 and is headquartered in Santa Clara, California. Sun’s manufacturing facilities are located in Beaverton, Oregon, and Linlithgow, Scotland. Sun’s servers and workstations are based on its own SPARC and AMD’s Opteron processors, the Solaris and Linux operating systems, the NFS network file system, and the Java platform. Since June 2005, Sun has also produced a laptop called the Ultra 3 Mobile Workstation. With $11.17 billion in annual revenues (FY 2005), Sun ranks 194 on the Fortune 500 and can be found in more than 100 countries.

B.24.1 Approach to PHM

The electronic prognostics work at Sun Microsystems is driven by the system dynamics characterization and control (SDCC) team. SDCC conducts its research at Sun’s Physical Sciences Center in San Diego, California. Sun’s approach for environmental management and electronics prognostics is based on the concept of monitoring and reasoning about variables that are indicative of impending failure. Sun uses the term continuous system telemetry harness (CSTH) for its PHM approach. The first step in this approach is to identify the appropriate parameters to monitor. The next step involves the characterization of these variables so as to develop a model or algorithm that can be used during real-time implementation. CSTH collects system data such as temperature, current, voltage, CPU loads, network I/O traffic, and “canary” variables and provides both predictive and reactive failure information such as incipient failure annunciation, root cause analysis, software aging and rejuvenation, and responsible failure mechanisms.

During implementation, the performance variables are continuously monitored using physical sensors. Sun’s high-end servers contain hundreds of physical sensors (distributed board, module, and ASIC temperature sensors, voltages, and currents) that collect data and protect the system by detecting when a parameter is out of bounds and then shutting down a component, board, domain, or entire system. These sensors, along with software features that monitor performance logs, are used as input data for CSTH. The variables are stored in a “black-box”-type recorder that is made of a circular file structure. A circular file has a wrap-around structure that behaves like a standard sequential file until it is full.
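
The wrap-around behavior of such a recorder can be sketched with a bounded double-ended queue. This is an in-memory illustration only, not Sun's file format; the class name `CircularLog` and its methods are hypothetical.

```python
from collections import deque
from typing import Any, Deque, List

class CircularLog:
    """Hypothetical fixed-capacity history: behaves like an append-only
    log until full, then discards the oldest record for each new one."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        # deque(maxlen=...) drops items from the head once it is full.
        self._records: Deque[Any] = deque(maxlen=capacity)

    def append(self, record: Any) -> None:
        """Add a record at the tail of the log."""
        self._records.append(record)

    def snapshot(self) -> List[Any]:
        """Return the retained records, oldest first."""
        return list(self._records)

log = CircularLog(capacity=3)
for sample in [41.0, 41.5, 42.0, 47.5, 53.0]:
    log.append(sample)
print(log.snapshot())  # [42.0, 47.5, 53.0]
```

Only the three most recent samples survive, so storage stays bounded however long the recorder runs.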
As records are written to a circular file, they are appended to the tail of the file; when the file is filled, the next record added causes the block at the head of the file to be deleted and all other blocks to be logically shifted toward the head of the file. Circular files are particularly useful as history files and debugging files. The file retains data collected at high sampling rates for 72 h and data collected at lower sampling rates for 30 days. The sampling rates are configurable, depending upon the particular application and customer requirements.

CSTH utilizes two advanced statistical pattern recognition tools for electronics prognostics called the multivariate state estimation technique (MSET) and the sequential probability ratio test (SPRT). A brief background of each algorithm is given below. The mathematical foundations of these algorithms can be obtained from textbooks and journals.

The MSET was developed at Argonne National Laboratory in the late 1990s and has been proven in a broad spectrum of safety-critical and mission-critical application domains. The technique is designed specifically for providing an early warning system for performance of sensors, equipment, and plant processes. In addition to being named one of the top 100 innovations of 1998 by R&D Magazine, the MSET can be found in a number of journals and conference papers. The MSET has been licensed to Smartsignal Corporation, based in Lisle, Illinois. In CSTH the MSET is used for analysis of dynamic signals.

The SPRT was developed by Abraham Wald in the early 1940s and has been well documented in several statistics textbooks. It is primarily used for sequential hypothesis testing of stationary time-series signals. The accuracy of SPRT is dependent on the initial characterization that builds the distribution of the degraded signal values. The SPRT is used for analyzing stationary signals.

The time-series signals obtained from the various monitored variables are analyzed using the MSET tool. For each signal being monitored, an expected signal is generated in real time. A new signal of residuals is then generated. Each residual is the arithmetic difference of the actual and expected time-series signal values. These values are used as input to the SPRT tool. The SPRT monitors the residuals between the actual observations and the estimates that the MSET predicts on the basis of the correlated variables. SPRT provides an alarm if these deviations are of concern.

When a sensor failure is detected, the MSET module swaps out the degraded sensor signal and swaps in an “analytical estimate” of the physical variable. The analytical estimate is supplied by the MSET and is called an “inferential sensor.” This analytical estimate can be used indefinitely or until the field replaceable unit (FRU) containing the failed sensor needs to be replaced for other reasons. This approach helps if “false” sensor information is received. Sun has specific instances of its electronics prognostics system on both midrange (firmware) and high-end (software) servers. Sun plans to incorporate this technology into operating systems and other layers of the software stack in the future.
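
As an illustration of the residual test described above, the following sketch applies Wald's SPRT to a stream of residuals (actual minus expected). It is not Sun's CSTH code. It assumes Gaussian residuals with a known standard deviation `sigma`, tests a zero-mean hypothesis against a hypothesized mean `shift`, and restarts the test after each decision; the function names, the default error rates `alpha` and `beta`, and the sample values are all assumptions. The expected values would come from an estimator such as the MSET and are simply supplied as a list here.

```python
import math
from typing import List, Sequence

def residual_signal(actual: Sequence[float],
                    expected: Sequence[float]) -> List[float]:
    """Arithmetic difference of actual and expected values."""
    return [a - e for a, e in zip(actual, expected)]

def sprt_alarms(residuals: Sequence[float], shift: float, sigma: float,
                alpha: float = 0.01, beta: float = 0.01) -> List[int]:
    """Return the sample indices at which the SPRT accepts the degraded
    hypothesis (residual mean == shift) over the healthy one (mean == 0)."""
    upper = math.log((1.0 - beta) / alpha)  # accept "degraded" at or above
    lower = math.log(beta / (1.0 - alpha))  # accept "healthy" at or below
    llr = 0.0                               # cumulative log-likelihood ratio
    alarms: List[int] = []
    for index, r in enumerate(residuals):
        # Log-likelihood ratio increment for a Gaussian mean shift.
        llr += (shift / sigma ** 2) * (r - shift / 2.0)
        if llr >= upper:
            alarms.append(index)
            llr = 0.0                       # restart the test after an alarm
        elif llr <= lower:
            llr = 0.0                       # healthy decision; restart
    return alarms

expected = [40.0] * 6                        # stand-in for model estimates
actual = [40.1, 39.9, 40.0, 41.0, 41.0, 41.0]
print(sprt_alarms(residual_signal(actual, expected), shift=1.0, sigma=0.5))
```

Because the decision is based on accumulated evidence rather than a single out-of-bounds reading, a small but persistent offset can raise an alarm while isolated noise does not.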
Sun claims to have achieved substantial benefits due to PHM implementation, including advance warning of failure, availability of signal data for faster and more accurate root cause analysis, availability of data for software aging and rejuvenation, and captured signatures that helped reveal the mechanisms responsible for no-trouble-found (NTF) events.

One of Sun’s projects is called Secure Adhoc Communications, which involves investigating and developing mechanisms to ease the deployment of secure wireless sensor networks. In the project, tiny, wireless, battery-powered devices with the ability to sense and respond to their surroundings are being developed. Proposed applications for networks of these “wireless sensors” cover many diverse areas ranging from intelligent agriculture to homeland security to health care. Typical applications require the network to be self-organizing, self-healing, remotely upgradeable, and secure from unauthorized access and eavesdropping. These requirements are made challenging by multiple constraints, including the low cost of devices; limited energy, memory, and computation; and the dynamic nature of wireless networks.

Another of Sun’s projects is called Sun Small Programmable Object Technology (Sun SPOT), which involves small Java-powered wireless transducer devices. The project consists of an exploration of wireless transducer technologies that will enable emerging network realization. Sun is building a hardware and software research platform to overcome the challenges that currently inhibit development of tiny sensing devices. These changes may dramatically affect the nature and type of wireless sensor network applications. The Sun SPOT hardware platform consists of small, battery-operated, wireless devices optimized for the Java™ virtual machine (VM). This VM acts as both operating system and software application platform.
Sun is developing device drivers and system code in Java to make a flexible and powerful platform for software application development. In addition, Sun is pursuing a full set of high-level tools to support small, wireless devices as well as networking and security infrastructure to enable a new generation of applications and
devices. Sun is aggressively experimenting with a wide range of applications from environmental monitoring to robotics to gesture-based interfaces. Another of Sun’s projects is called Emergence, which involves exploring making large, powerful, robust systems from small, weak, fragile devices. The concept behind project Emergence is that the world is awash in large, complex systems networked together, but most of these systems still treat each individual component as a separate entity rather than as part of a larger whole. Emergence takes a holistic view of large systems and hopes to be able to harness the unrealized power of the entire system. Project Emergence is working to solve: how we deal with continual failure, how to achieve true scalability, and how to make systems that display robustness through replication. Computer systems, as they stand today, only partially solve these problems and each encounters limits as system size and complexity are increased. These goals are to be reached using a combination of agent-based simulations, mathematical modeling, and real-world system construction and observation. Ultimately, the Emergence project intends to build a system that can use emergent system principles to do useful work as well as a set of principles that can be codified, enumerated, and reasoned about. This system would be able to display robustness of function in the face of disruptive, unforeseen perturbation. Early research has already yielded interesting results in the area of network formation and small sensor interaction. A key target is computer users who want systems that continue to run at some level despite encountering failure and unforeseen circumstances and ultimately for systems that self-configure and self-repair.

B.24.2 Related Publications

1. K. Vaidyanathan and K. C. Gross, “Monte Carlo Simulation of Telemetric Signals for Enhanced Proactive Fault Monitoring of Computer Servers,” paper presented at the Simulation Multi-Conference, Philadelphia, PA, July 2005.
2. K. Whisnant, K. C. Gross, and N. Lingurovska, “Proactive Fault Monitoring in Enterprise Servers,” paper presented at the IEEE International Conference in Computer Science & Computer Engineering, Las Vegas, NV, June 2005.
3. E. Schuster and K. C. Gross, “Dynamic System Characterization of Enterprise Servers via Nonparametric Identification,” paper presented at the American Control Conference, Portland, OR, June 2005.
4. K. C. Gross and E. Schuster, “Spectral Decomposition and Reconstruction of Telemetry Signals from Enterprise Computing Systems,” paper presented at the IEEE Conference in Computer Science & Computer Engineering, Las Vegas, NV, June 2005.
5. K. Vaidyanathan and K. C. Gross, “Monte Carlo Simulation for Optimized Sensitivity of Online Proactive Fault Monitoring Schemes,” paper presented at the International Conference on Computer Design, Las Vegas, NV, June 2005.
6. A. Urmanov and K. C. Gross, “Failure Avoidance in Computer Systems,” Proceedings of the 59th Meeting of the Society for Machinery Failure Prevention Technology, Virginia Beach, VA, April 18-21, 2005.
7. K. Vaidyanathan and K. C. Gross, “Proactive Detection of Software Anomalies through MSET,” Proceedings of IEEE Workshop on Predictive Software Models (PSM-2004), Chicago, IL, September 17-19, 2004.
8. B. Guenin, K. C. Gross, A. Gribok, and A. Urmanov, “A New Sensor Validation Technique for the Enhanced RAS of High End Servers,” paper presented at the International Conference on Machine Learning Models, Technologies and Applications (MLMTA’04), Las Vegas, NV, June 21-24, 2004.
9. K. C. Gross and K. Mishra, “Improved Methods for Early Fault Detection in Enterprise Computing Servers Using SAS Tools,” paper presented at the SAS Users Group International (SUGI 29), Montreal, Canada, May 9-12, 2004.
10. K. Mishra and K. C. Gross, “Dynamic Stimulation Tool for Improved Performance Modeling and Resource Provisioning of Enterprise Servers,” Proceedings of 14th IEEE International Symposium on Software Reliability Engineering (ISSRE’03), Denver, CO, November 2003.
11. K. C. Gross and K. Mishra, “Frequency-Domain Pattern Recognition for Dynamic System Characterization of Enterprise Servers,” Proceedings of 2003 International Conference on Artificial Intelligence (ICAI’03), Las Vegas, NV, June 23-26, 2003.
12. K. C. Gross, W. Lu, and K. Mishra, “Spectral Decomposition of Performance Variables for Dynamic System Characterization of Web Servers,” Proceedings of the Conference of the SAS Users Group International (SUGI28), Seattle, WA, March 30-April 2, 2003.
13. K. J. Cassidy, K. C. Gross, and A. Malekpour, “Advanced Pattern Recognition for Detection of Complex Software Aging Phenomena in Online Transaction Processing Servers,” Proceedings of International Performance and Dependability Symposium, Washington, DC, June 23-26, 2002.

B.24.3 Related Patents

U.S. Patents:
1. 6,950,773: Detecting thermal anomalies in computer systems based on correlations between instrumentation signals
2. 6,950,773: Pattern recognition for proactive identification of air-flow disturbance perturbations in enterprise computer systems
3. 6,789,049: Dynamically characterizing computer system performance by varying multiple input variables simultaneously
4. 6,763,321: Multifrequency sinusoidal tool for service level availability measurement
5. 7,283,919: Determining the quality and reliability of a component by monitoring dynamic variables

B.25 VEXTEC Corporation

Founded in 2000, VEXTEC is a privately owned software, consulting, and services company headquartered in Brentwood, Tennessee. The company has a staff of engineering, scientific, software development, and business application specialists located in Tennessee, California, Michigan, and Louisiana. VEXTEC is also engaged as a PHM consultant to the power industry in Japan.

B.25.1 Approach to PHM

In 2005, the U.S. Navy awarded VEXTEC a multiyear contract for development of electronic prognostics technology for the JSF aircraft development program. VEXTEC’s investigative research indicates that electronic devices fail mainly due to material failure within the module, printed circuit board, or chip scale package. VEXTEC’s prognosis software framework for prediction of electronic power supply system failure is based upon material fatigue. It is reliability simulation software that accounts for individual interconnect reliability and its scale-up to overall product reliability. The framework can be used to evaluate a board with many interconnects, packages, and devices while considering failure interrelationships. In the future the framework will also be used to evaluate noninterconnect fatigue failure modes of electronic circuit boards.

VEXTEC’s core PHM product is called VPS-MICRO™, a material-based simulation software tool that accounts for interconnect variability and predicts probability of failure as a function of the operating conditions. The VEXTEC prognostics approach is based on prediction of the global stresses for each of the monitored components using FEA-based surrogate modeling. The global loads are translated to the material microstructural level through 3D representative volume element modeling. Thereafter, VPS-MICRO™ predicts crack nucleation and small and large crack growth through a Monte Carlo type probabilistic analysis. VPS-MICRO™ models fatigue damage (dislocations, slip bands, small and long cracks) and damage interaction with material microstructural features such as crystallographic planes and phase boundaries to allow for simulation of real material degradation. Using Monte Carlo techniques, VPS-MICRO™ can vary parameters such as loading, temperature, and microstructure randomly with time to simulate real-world conditions.
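
The idea of predicting probability of failure by Monte Carlo sampling, and of scaling a single-interconnect result up to a board, can be sketched in a few lines of Python. This is only an illustration under loud assumptions and is not VPS-MICRO™: the life model is a simple hypothetical power law, N_f = A * S**(-m), rather than a crack nucleation and growth simulation, and the function names, distributions, and parameter values are all invented for the example. The board scale-up additionally assumes independent, identical interconnects, which ignores the failure interrelationships mentioned above.

```python
import math
import random

def failure_probability(n_cycles, trials=20000, seed=0):
    """Monte Carlo estimate of P(fatigue life <= n_cycles) for one interconnect.

    Assumed (hypothetical) life model: N_f = A * S**(-m), where S is the
    stress amplitude and A captures material scatter.
    """
    rng = random.Random(seed)
    m = 3.0                                            # assumed fatigue exponent
    failures = 0
    for _ in range(trials):
        stress = max(rng.gauss(20.0, 2.0), 1e-6)       # load variability
        a = rng.lognormvariate(math.log(4.0e8), 0.5)   # microstructure variability
        life = a * stress ** (-m)
        if life <= n_cycles:
            failures += 1
    return failures / trials

def board_failure_probability(p_single, n_interconnects):
    """Series-system scale-up, assuming independent, identical interconnects."""
    return 1.0 - (1.0 - p_single) ** n_interconnects

p_single = failure_probability(50_000)
p_board = board_failure_probability(p_single, n_interconnects=100)
```

Fixing the random seed makes the estimate repeatable, so probabilities computed for different cycle counts can be compared directly.
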

B.25.2 Related Publications

1. L. Nasser, R. Tryon, and A. Dey, “Material Simulation-Based Electronic Device Prognosis,” paper presented at the IEEE Aerospace Conference, Big Sky, MT, March 5-12, 2005, pp. 1-6.
2. L. Nasser and R. Tryon, “Integration of Material-Based Simulation into Prognosis Architectures,” Proceedings of the Aerospace Conference, March 6-13, 2004, Vol. 6, pp. 3742-3747.
3. L. Nasser and R. Tryon, “On-board Prognostic System for Microstructural-Based Reliability Prediction,” Proceedings of the Aerospace Conference, March 8-15, 2003, Vol. 7, pp. 3305-3311.

B.26 National Aeronautics and Space Administration

The National Aeronautics and Space Administration (NASA) is an agency of the U.S. federal government responsible for the nation’s public space program. NASA grew out of the National Advisory Committee for Aeronautics (NACA), which had been researching flight technology for more than 40 years. Today, NASA conducts its work in four principal organizations, called mission directorates: Aeronautics, Exploration Systems, Science, and Space Operations. There are 10 NASA field installations: Ames Research Center, Dryden Flight Research Center, Glenn Research Center, Goddard Space Flight Center, Jet Propulsion Laboratory, Johnson Space Center, Kennedy Space Center, Langley Research Center, Marshall Space Flight Center, and Stennis Space Center. NASA Ames Research Center, located at Moffett Field, California, was founded December 20, 1939, as an aircraft research laboratory by the NACA, and in 1958 it became part of NASA. With over $3.0 billion in capital equipment, 2300 research personnel, and a $600 million annual budget, Ames plays a critical role in virtually all NASA missions in support of America’s space and aeronautics programs.

B.26.1 Approach to PHM

At NASA Ames, health management concepts and technologies are playing a critical role in the new vision for space exploration. While much of the current troubleshooting for crewed systems is done on the ground with a standing army of experts, as humans venture farther out of low Earth orbit it becomes important to migrate that health management functionality to the crewed system itself so that the safety of the crew and the likelihood of mission success are not adversely impacted by the increasing delays of communications with Earth. This will also lead to more sustainable and cost-effective operations. In order to achieve this migration of functionality it is necessary to understand and mature the health management technologies that will be employed for exploration vehicles and habitats. The applicability of health management technologies is not limited to crewed systems. Any complex engineering system is likely to show improvements in affordability, reliability, and effectiveness by the incorporation of health management concepts and technologies.

The Intelligent Data Understanding (IDU) Group at NASA Ames has developed a suite of data-driven fault detection algorithms using unsupervised, semisupervised, and supervised learning. New learning methods include Gaussian mixture models, hidden Markov models and Kalman filtering, Orca, and virtual sensors. These methods characterize nominal behavior to detect off-nominal situations. The IDU Group used Space Shuttle main engine (SSME) test stand data from Rocketdyne to design algorithms that will aid in the early detection of impending failures during operation. Methods implemented on SSME data will be improved, extended, and used for future NASA platforms such as the crew exploration vehicle (CEV) and crew launch vehicle (CLV).
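
The principle of characterizing nominal behavior and then flagging off-nominal readings can be shown with a deliberately small Python sketch. It assumes a single Gaussian fitted to nominal data, which is far simpler than the Gaussian mixture or hidden Markov models named above and is not the IDU Group's code; the function names, the threshold of four standard deviations, and the sample readings are all hypothetical.

```python
import statistics

def fit_nominal(samples):
    """Characterize nominal behavior by its mean and standard deviation."""
    return statistics.fmean(samples), statistics.stdev(samples)

def off_nominal(value, model, threshold=4.0):
    """Flag a reading whose z-score against the nominal model exceeds threshold."""
    mean, std = model
    return abs(value - mean) / std > threshold

# Hypothetical nominal sensor readings (arbitrary units).
nominal = [100.2, 99.8, 100.1, 99.9, 100.0, 100.3, 99.7, 100.1, 99.9, 100.0]
model = fit_nominal(nominal)
flags = [off_nominal(x, model) for x in [100.1, 99.6, 103.5]]
```

Only nominal data are needed to build the model, which is why this style of detector is useful when examples of real failures are scarce.
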
The Advanced Diagnostics and Prognostics Testbed (ADAPT) is a facility developed at NASA Ames for supporting the development of diagnostic and prognostic models, for evaluating advanced warning systems, and for testing diagnostic tools and algorithms against a standardized testbed. The facility’s hardware consists of an electrical power system with components for power generation, storage, and distribution. Over a hundred sensors report the status of the system to the test article that monitors the health status of the system. The testbed provides a controlled environment to inject failures, either through software or hardware, in a repeatable manner.


The health management application currently deployed is the Hybrid Diagnostic Engine (HyDE). HyDE is a model-based software tool for diagnosing systems. It provides online system mode tracking, fault detection, and fault isolation to the component level. HyDE itself is a general inference engine: it is adapted to specific systems by loading a model of the system. In future efforts health management applications from industry, academia, and government employing other techniques will be integrated and tested. These techniques may be model based or data driven, addressing fault detection, isolation, and recovery, prognostics, variable autonomy, or data and diagnostic fusion concepts. It is anticipated that these technologies will exhibit different strengths and weaknesses that will need to be considered when constructing a health management solution for a particular system. Other ongoing projects at NASA Ames include investigation of damage propagation mechanisms on select safety-relevant actuators for aeronautics, investigation of damage mechanisms on aircraft wiring insulation, and investigation of damage propagation mechanisms for critical electrical components in select avionic equipment.

B.26.2 Related Publications

1. D. Maclise, A. Goforth, A. Sweet, D. Sanderfer, J. Camisa, G. Temple, D. Nishikawa, E. Barszcz, L. Olson, and S. Uckun, “Advanced Diagnostics and Prognostics Testbed (ADAPT),” paper presented at the First International Forum on Integrated System Health Engineering and Management in Aerospace, Napa, CA, November 7-10, 2005.
2. M.A. Schwabacher, “A Survey of Data-Driven Prognostics,” Proceedings of the AIAA Infotech@Aerospace 2005 Conference, Arlington, VA, September 26-29, 2005.
3. A.N. Srivastava, “Discovering System Health Anomalies Using Data Mining Techniques,” Proceedings of the Joint Army Navy NASA Air Force Conference on Propulsion, Charleston, SC, June 2005.
4. K. Tumer and A. Agogino, “Complexity Signatures for System Health Monitoring,” paper presented at the IEEE Aerospace Conference, Big Sky, MT, March 5-12, 2005, pp. 3803-3813.
5. M. Schwabacher, “Machine Learning for Rocket Propulsion Health Monitoring,” in Proceedings of the SAE World Aerospace Congress, Vol. 114-1, Society of Automotive Engineers, Warrendale, PA, 2005.
6. E. Barszcz, M. Mosher, and E. Huff, “Healthwatch-2 System Overview,” paper presented at the American Helicopter Society 60th Annual Forum, Baltimore, MD, June 7-10, 2004.
7. A. Gross, A. Patterson-Hine, B. Glass, and J. Pallix, “Integrated System Health Management for Reusable, In-Space Transportation Systems,” Proceedings of the 54th International Astronautical Congress of the International Astronautical Foundation, the International Academy of Astronautics, and the International Institute of Space Law, Bremen, Germany, September 29-October 3, 2003.
8. M. Schwabacher, J. Samuels, and L. Brownston, “The NASA Integrated Vehicle Health Management Technology Experiment for X-37,” Proceedings of the SPIE AeroSense 2002 Symposium, 2002.
9. I. Tumer and E. Huff, “Using Triaxial Accelerometer Data for Vibration Monitoring of Helicopter Gearboxes,” Proceedings of DETC’01 2001 ASME Design Engineering Technical Conference, Pittsburgh, PA, September 9-12, 2001.
10. A. Patterson-Hine, W. Hindson, and D. Sanderfer, “A Model-based Health Monitoring and Diagnostic System for the UH-60 Helicopter,” paper presented at the American Helicopter Society 57th Annual Forum, Washington, DC, May 9-11, 2001.
11. J. Bardina, G.J. Follen, T.M. Blaser, W.R. Pavlik, D. Zhang, and X. Liu, “Integrated Airplane Health Management System,” Proceedings of the 1st JANNAF Modeling and Simulation Subcommittee Meeting, Monterey, CA, November 13-17, 2000.
12. I. Tumer and A. Bajwa, “A Survey of Aircraft Engine Health Monitoring Systems,” Proceedings of the 35th AIAA Joint Propulsion Conference and Exhibit, Los Angeles, CA, June 20-24, 1999.

B.27 Sandia National Laboratories

Sandia National Laboratories is a multiprogram government laboratory primarily involved in national defense R&D, energy, and environment projects. Sandia’s original mission of providing engineering design for all nonnuclear components in the nation’s nuclear weapons continues today, but Sandia now also performs a wide variety of national security R&D work. Sandia’s mission is to meet national needs in five key areas: nuclear weapons, nonproliferation and assessments, military technologies and applications, energy and infrastructure assurance, and homeland security.

B.27.1 Approach to PHM

The Optimization and Uncertainty Estimation Department at Sandia worked in collaboration with Lockheed Martin to develop prognostic algorithms for the F-16 accessory drive gearbox to enable less-expensive predictive, rather than routine, maintenance. Located directly in front of the engine, the accessory drive gearbox operates a variety of wing and tail surface controls and provides an engine starter. The PHM team developed mathematical formulas to manipulate sensor data to answer the question “What is the health of this component and what is the best thing to do?” in a variety of situations. Researchers worked on the PHM concept by attaching microsensors to shop instruments in Albuquerque and at a DOE facility in Kansas City and used the data to develop the life prediction algorithms.

Sandia has been considering the establishment of a PHM Center of Excellence (COE) in collaboration with the DOE, the DoD, industry, and academia. The purpose is to champion technology development and technology test and validation capabilities. Sandia claims that the COE will advance state-of-the-art PHM through experimental analysis and relevant research, comprehensive science-based modeling, simulation, experimentation, and technology transfer. On December 6-8, 2004, the Sandia PHM COE held its Phase 1 Advisory Board Meeting, which was open to all members of the PHM community. However, at this time, the COE has not yet been created.

B.27.2 Related Publications

Internal publications only (nonaccessible).

B.28 United States Air Force

The U.S. Air Force (USAF) is the aviation branch of the U.S. armed forces. The mission of the USAF is “to defend the United States and protect its interests through air and space power.” It was created as a separate branch of the U.S. government on September 18, 1947. The USAF is the largest modern air force in the world, with over 7000 aircraft in service and air bases around the world.

B.28.1 Approach to PHM

As of January 2002, one of the ongoing research projects is called predictive failures and advanced diagnostics (PFAD). The objective of this project is to reduce aircraft downtime by enhancing the capability of maintainers to identify the causes of system failures through better diagnostics and, where possible, to identify imminent system failures (failure prognostics) so repairs can be made more quickly. The approach involves researching the various areas that make up the diagnostics and prognostics process and focusing on the improvements that offer the best return on investment. Initial efforts involve an analysis of the diagnostics process, identification of those variables used to diagnose faults, identification of other variables for which data may be available (such as built-in test sensor data), and identification of historical information (such as failure rates and component failure histories for specific aircraft and components and for fleet aircraft and components). These data sources will then be used to develop advanced diagnostic algorithms. The algorithms will employ state-of-the-art pattern recognition techniques, data-mining applications, intelligent agents, and self-adapting artificial intelligence techniques. Based on work in the diagnostics area, aircraft prognostic techniques will be investigated. A complete PFAD system will be defined in a concept of operations and system architecture report, and a subset of the PFAD tool suite will be developed and tested. It is hoped that the new diagnostics capability will significantly increase the accuracy with which technicians are able to diagnose the causes of system failures, thereby restoring aircraft to operational status sooner and reducing the consumption of spare parts. Prognostics capability will make it possible to replace about-to-fail parts before they fail, reducing system failures, in-flight aborts, and aircraft accidents.
The Air Force Research Laboratory (AFRL) teamed up with Lockheed Martin to conduct a demonstration of an integrated prognostics and health management (IPHM) control system. This achievement is an important step toward the Air Force’s goal to develop operationally responsive space access capabilities. The team used a real-time, hardware-in-the-loop simulation to demonstrate the IPHM control system. Engineers integrated a space vehicle control surface actuator representation into a simulation environment. They evaluated the IPHM control system’s ability to compensate for simulated actuator failures, such as heat degradation and power loss. Throughout a variety of scenarios, including takeoff, flight, and landing, the IPHM system successfully compensated for all introduced failures. This evaluation was the first to demonstrate an IPHM control system’s ability to make flight-critical system adaptations. The IPHM system determines the operational status of a vehicle’s individual components and helps correct component failures. The AFRL claims that IPHM technology will aid in quick vehicle turnaround, as it readily diagnoses subsystem problems and component degradation and indicates required corrective measures. The U.S. Air Force is actively involved in the DoD SBIR program. The SBIR is funded at approximately $1.079 billion in FY 2005 and is made up of 10 participating components:

Army, Navy, Air Force, Missile Defense Agency (MDA), Defense Advanced Research Projects Agency (DARPA), Chemical Biological Defense (CBD), Special Operations Command (SOCOM), Defense Threat Reduction Agency (DTRA), National Geospatial-Intelligence Agency (NGA), and the Office of Secretary of Defense (OSD). SBIR is the largest source of early-stage technology financing in the United States. Total federal SBIR/STTR funding in FY 2004 was $2 billion. The DoD accounts for nearly half of the total SBIR/STTR program. A list of SBIRs in the area of EM and ePHM for 2004, 2005, and 2006 is given in Appendix A.

B.28.2 Related Publication

1. M. Hoffman, “Air Force Research Laboratory-Current Logistics Research,” Air Force Journal of Logistics, Winter 2001.

B.29 United States Army

The Army is the branch of the U.S. Armed Forces that has primary responsibility for land-based military operations. The U.S. Army intends to make extensive use of prognostics technology on weapons platforms, support vehicles, and munitions.

B.29.1 Approach to PHM

Prognostics is the process of predicting the future state of a system. Prognostics systems include sensors, a data acquisition system, and microprocessor-based software to perform sensor fusion, analysis, and interpretation of results with little or no human intervention in real time or near real time. With technical assistance from Pacific Northwest National Laboratory (PNNL), the U.S. Army Logistics Integration Agency (USALIA) has been assessing the state of the art in prognostics technology, extending the capabilities of the technology, and examining implementation approaches to provide maximum benefit to the Army from prognostics data.

PNNL has investigated prognostics applications for several U.S. Army vehicles, performed research on methods for real-time, on-board prognostics/engine life expectancy forecasting, and developed a feasibility demonstration prototype on-board PHM system for the gas turbine engine used on the M1 Abrams tank. A prototype system was designed, developed, and installed on several test tanks. Comprising a set of sensors mounted on the turbine engine, a data acquisition system, and a computer to process the information, the system uses artificial neural networks, rule-based algorithms, and predictive trend analyses to diagnose and predict engine conditions. In the analysis conducted for the M1 Abrams tank AGT1500 gas turbine engine, several different deployment options were considered. For example, different collections of sensors were evaluated based on the value of the data provided by the candidate sensors (toward diagnosing and predicting failures) and the cost and ease of installation of the sensors.
This analysis led to the development of an initial prototype system called TEDANN (Turbine Engine Diagnostics Using Artificial Neural Networks), which was later expanded to incorporate prognostics and a larger number of potential failures/faults; the new version was called REDI-PRO (Real Time Engine Diagnostics-Prognostics). REDI-PRO receives input from 38 sensors mounted on the AGT1500 engine. Of these sensors, 25 are factory installed for engine control and basic diagnostics performed by the engine control unit. The other 13 sensors, retrofitted to the engine using a wiring harness, include pressure sensors, temperature sensors, and vibration sensors located at strategic points on the engine to provide a more detailed thermodynamic picture of the engine’s state. The REDI-PRO system architecture includes real-time data acquisition, sensor validation, digital signal processing of vibration waveforms, engine health analyses using artificial neural networks and rule-based algorithms, and prognostics analyses that perform engine health forecasting. While the artificial neural networks and rule-based diagnostic algorithms are specific to the AGT1500 engine, the methods and architecture employed in REDI-PRO have general applicability, as do the life expectancy forecasting methods employed for prognostic analysis. The REDI-PRO is smaller than a briefcase and contains electronics for data acquisition, custom-made signal conditioning, and a multipurpose computer system. REDI-PRO has
undergone a limited amount of field testing. At the present time, the possible integration of the REDI-PRO on-board prognostics capability into the M1 Abrams tank is very much uncertain.

Furthermore, the Army Materiel Command’s (AMC’s) Army Research Office at Research Triangle Park, North Carolina, is leading the Army’s efforts to develop sensors that detect failures in structures and mechanical systems. Some of its previous work includes health monitoring of planetary gear-train damage using piezoelectric sensors, fiber-optic sensors for smart structures, controlled vibratory response of damage in composite structures, cross-over monitoring of a traversing bridge, impedance-based qualitative health monitoring, MEMS-based smart gas turbine engines, and application of MEMS technology to intelligent turbine engines.

Another PHM program the Army is working on is called the Army Diagnostic Improvement Program, or ADIP. This program is aimed at improving the diagnostics and prognostics of all Army weapon systems and equipment by the application of common technologies across multiple systems. ADIP addresses all Army commodities and systems. In fact, it addresses more types of equipment than any other military program and is the broadest in scope of DoD’s legacy equipment maintenance improvement programs. ADIP has three time-phased “thrusts” grouped according to the time frame required for implementation:

Short term: immediate technology insertion programs to improve diagnostics
Mid term: to develop anticipatory maintenance capability in ground vehicles and helicopters
Long term: to develop an embedded diagnostics proof-of-concept for a common architecture and approach (similar to the JSF PHM embedded architecture design goals)

The ADIP concept is to access on-board data using a portable maintenance aid (PMA) as the primary data collection and communication tool.
The PMA runs sensor health checks, and the sensor-coupled Interactive Electronic Technical Manual (IETM) automatically collects the data and transmits it to Global Combat Support System-Army. The PMA reads the vehicle sensors either directly via a common multiple-pin connector or by connecting to the data bus and capturing the data from sensors that have been processed by the on-board engine control unit. The PMA collects the sensor data through a stand-alone process known as a health check or it selectively interrogates only those sensors appropriate to a troubleshooting session for a known symptom using the sensor-coupled IETM. Once collected, the sensor data are stored in a database for subsequent trend analysis.
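
The kind of trend analysis that stored health-check data make possible can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the ADIP or REDI-PRO method: it fits a straight line by ordinary least squares to a monitored parameter and extrapolates to a limit value to estimate remaining operating hours. The function names, the vibration history, and the limit of 3.0 are hypothetical.

```python
def linear_trend(hours, values):
    """Ordinary least-squares slope and intercept of values versus hours."""
    n = len(hours)
    mean_h = sum(hours) / n
    mean_v = sum(values) / n
    sxx = sum((h - mean_h) ** 2 for h in hours)
    sxy = sum((h - mean_h) * (v - mean_v) for h, v in zip(hours, values))
    slope = sxy / sxx
    return slope, mean_v - slope * mean_h

def hours_until_limit(hours, values, limit):
    """Extrapolate the trend to `limit`; None if the trend is not rising."""
    slope, intercept = linear_trend(hours, values)
    if slope <= 0:
        return None
    crossing = (limit - intercept) / slope
    return max(crossing - hours[-1], 0.0)

# Hypothetical health-check history: engine hours and a vibration level.
engine_hours = [0, 100, 200, 300, 400]
vibration = [1.0, 1.2, 1.4, 1.6, 1.8]
remaining = hours_until_limit(engine_hours, vibration, limit=3.0)
```

A straight-line fit is the simplest possible trend model; real degradation is often nonlinear, so an estimate like this would normally be refreshed as each new health check arrives.
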

B.29.2 Condition-Based Maintenance Plus (CBM+)

The CBM is a plan that provides broad guidance, measurable milestones, and the vision for aviation to transition to a CBM+ program by the end of fiscal year 2015. At the strategic level, CBM+ is a set of maintenance actions based on real-time or near-real-time assessment of equipment status, obtained from embedded sensors and/or external measurements or tests performed by man-portable equipment. Data collected from health usage monitoring systems or man-portable equipment are then translated into predictive trends and metrics, which are capable of anticipating when component failures will occur based on the actual operating environment.


At the enterprise level, this predictive approach allows for anticipatory logistics: the ability to proactively acquire and deliver the spare parts needed to perform maintenance prior to component failure. At the operational/tactical level, CBM+ is the ability to translate aircraft condition data and usage into proactive maintenance actions that enable unit maintenance personnel to achieve and maintain higher aircraft operational availability. It also gives commanders a mission planning tool. Through the use of prognostics, the aircraft predicts its remaining mission availability and/or time to failure, providing valuable information for the commander to determine which aircraft are ready for battle and which require maintenance. The vision of CBM+ is to enable the future of Army aviation with a predictive and proactive maintenance environment. Component replacement metrics are derived by Aviation Engineering Directorate (AED) engineers in concert with original equipment manufacturer (OEM) engineers based on environmentally adjusted actual weapons system/component usage and system health data collected using embedded (on-board), automated health usage monitoring systems. These systems will enable the development of component replacement programs based on actual weapons system/component usage and system health in a specific operating environment.

B.29.3 U.S. Army Materiel Systems Analysis Activity (AMSAA)

AMSAA is an analysis organization of the U.S. Army. AMSAA’s overall goal is to provide soldiers with the best Army materiel possible. AMSAA supports the Army by conducting systems and engineering analyses to support decisions on technology, materiel acquisitions, and the designing, developing, and sustaining of Army weapon systems. AMSAA relies on highly skilled engineers, operations research analysts, mathematicians, and computer scientists to perform a wide range of critical analyses in support of the Army and Department of Defense. AMSAA directly supports the mission of the Research, Development, and Engineering Command (RDECOM) to get the right integrated technologies into the hands of warfighters more quickly.

AMSAA, with significant Aberdeen Test Center support, fielded CBM boxes, software, CBM templates, and thermal instrumentation in support of Operation Iraqi Freedom (OIF) and Army training centers. These fieldings allowed near-real-time assessment of vehicle operation, soldier thermal environment (safety), and vehicle health. In addition, AMSAA is working on predictive maintenance algorithms using both the maintenance and operating histories of vehicles. The on-board system that AMSAA has designed in conjunction with the Aberdeen Test Center at Aberdeen Proving Ground collects data from on-board vehicle sensors, the data bus, terrain sensors, and GPS and analyzes the data in order to determine vehicle condition. In phase 1 of implementing CBM, AMSAA identified appropriate hardware and software for an engineering development HUMS (EDHUMS) and completed initial in-theater installations of data acquisition systems. Phase 2 consisted of developing a robust military-grade EDHUMS, designing a data analysis process, testing EDHUMS in the continental U.S. training environment, and beginning to field EDHUMS in operational units outside the continental U.S.
AMSAA is currently working on developing an interim solution for the information management process using nCode’s Library software. Phase 3 is in process and consists of identifying a small, inexpensive focused HUMS (FHUMS), relative to the cost of the vehicle. The final phase includes integrating proven FHUMS hardware into platforms by OEMs at the time of manufacture or integrating the developed algorithms into other installed CBM hardware.


AMSAA has successfully demonstrated hardware and software capabilities, data quality checks, and rudimentary usage characterization. Many vehicles have been fully instrumented, and data are being captured from over 80 analog channels, multiple SAE J1708 or SAE J1939 bus channels, and GPS sensors. AMSAA also collects acceleration data, both for identifying component failures in the future and for representing more accurately the terrain the vehicle experiences. Currently only one accelerometer is collecting data on the unsprung mass for the terrain identification algorithm. AMSAA is working with a sensor company to develop a robust sensor for military-type environments that is also cost effective and durable, as opposed to the current lab sensor. These vehicles have run over all Aberdeen Proving Ground (APG) test courses multiple times, which has provided detailed data for prognostic algorithm development. Aberdeen Test Center (ATC) and AMSAA have also measured and analyzed data from 20 wheeled vehicles of three different types in Iraq for over a year. This has provided a large amount of usage data and operating parameters that will be useful for improving fleet management, improving engineering design, and optimizing testing. The data are also being aligned with maintenance records to identify specific prognostic algorithms.

EDHUMS testing has been ongoing since June 2006. AMSAA has instrumented tactical wheeled vehicles at Army training centers as a test-bed for developing and testing the systems before fielding in support of OIF. Data are currently being collected, reduced, and analyzed for reporting to fleet managers, engineers, and maintainers. Usage characterization and initial versions of diagnostic/prognostic algorithms are installed and are being refined. Five EDHUMS systems were installed in support of OIF in December 2006, and five more were installed for OIF in February 2007.

Some of the analyses that AMSAA has been able to provide include time in gear, fuel consumption, soldier thermal environment, time at speed, and some rudimentary terrain identification. AMSAA’s goal is to generate this information using on-board algorithms, which will help reduce the quantity of data that is processed off-line. Information can be provided as graphical displays or as a two-page vehicle usage summary report, which is produced by processing the data with nCode Glyphworks software.
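As a rough illustration of the kind of usage characterization described above, the following Python sketch reduces a stream of time-stamped speed and gear samples to time-at-speed and time-in-gear summaries. The sample format, the speed bin edges, and the function name `summarize_usage` are assumptions made for illustration; they are not taken from the AMSAA system.

```python
from bisect import bisect_right
from collections import defaultdict


def summarize_usage(samples, speed_edges=(0, 10, 30, 50)):
    """Accumulate time at speed and time in gear.

    samples: list of (time_s, speed_mph, gear) tuples in time order.
    Each sample is assumed to hold until the next one (zero-order hold),
    so the last sample contributes no duration.
    speed_edges: lower edges of the speed bins (hypothetical values).
    """
    time_at_speed = defaultdict(float)
    time_in_gear = defaultdict(float)
    for (t0, speed, gear), (t1, _, _) in zip(samples, samples[1:]):
        dt = t1 - t0
        if dt <= 0:
            continue  # skip out-of-order or duplicate timestamps
        # Index of the bin whose lower edge is at or below the speed.
        idx = max(bisect_right(speed_edges, speed) - 1, 0)
        time_at_speed[speed_edges[idx]] += dt
        time_in_gear[gear] += dt
    return dict(time_at_speed), dict(time_in_gear)


# Made-up samples: (time in s, speed in mph, gear)
samples = [(0.0, 0.0, 1), (5.0, 12.0, 2), (15.0, 35.0, 3),
           (20.0, 8.0, 2), (30.0, 0.0, 1)]
speed_summary, gear_summary = summarize_usage(samples)
```

Reducing the raw stream to a few accumulated totals on the vehicle is what lets a summary report be produced without moving every sample off-board.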

B.29.4 U.S. Army Research Laboratory VTD/NASA Glenn Research Center

The U.S. Army Research Laboratory (ARL) Vehicle Technology Directorate (VTD) has been colocated at the NASA Glenn Research Center (GRC) for the last 40 years. Together with NASA’s Mechanical Components Branch of the Materials and Structures Division, it has been performing research on drive systems for many years. With primary applications to rotorcraft, the research goals have been to reduce the weight, noise, and cost of drive systems and to increase their life, with particular attention paid to the gearing in drive systems. Since 1989, the group has had a concentrated effort in developing fault detection, diagnostics, and prognostics for gear systems. The objective of current diagnostics/prognostics activities at ARL VTD/NASA Glenn is to develop advanced health management technologies to reliably and accurately detect and quantify damage of critical mechanical components in aerospace drive systems, with primary emphasis on the gearing in these drive systems. This is accomplished by developing multisensor intelligent systems that integrate different measurement technologies and diagnostics, conducting basic failure progression research experiments, and correlating results with flight data. Accomplishments have been in the areas of (1) tapered roller bearing damage detection tests, (2) development of a hybrid bearing test facility, (3) gear and bearing damage detection using debris particle distributions, and (4) planetary transmission fault detection.


Tapered roller bearing damage detection tests: A diagnostic tool was developed for detecting fatigue damage of tapered roller bearings. Tapered roller bearings are used in helicopter transmissions and have potential for use in high-bypass advanced gas turbine aircraft engines. The tool was evaluated experimentally by collecting oil debris data from failure progression tests conducted using health monitoring hardware. Failure progression tests were performed with tapered roller bearings under simulated engine load conditions. Tests were performed on one healthy bearing and three predamaged bearings. During each test, data from an on-line, in-line, inductance-type oil debris sensor and three accelerometers were monitored and recorded for the occurrence of bearing failure. The bearing was removed and inspected periodically for damage progression throughout testing. Using data fusion techniques, two different monitoring technologies, oil debris analysis and vibration, were integrated into a health monitoring system for detecting bearing surface fatigue pitting damage. The data fusion diagnostic tool was evaluated during bearing failure progression tests under simulated engine load conditions. This integrated system showed improved detection of fatigue damage and health assessment of the tapered roller bearings as compared to using individual health monitoring technologies.

Hybrid bearing test facility: A new hybrid bearing prognostic test rig is being developed to evaluate the performance of sensors and algorithms in predicting failures of rolling element bearings for aeronautics and space applications. The failure progression of both conventional and hybrid (ceramic rolling elements, metal races) bearings can be tested from fault initiation to total failure. The effects of different lubricants on bearing life can also be evaluated. Test conditions monitored and recorded during the test include load, oil temperature, vibration, and oil debris.
New diagnostic research instrumentation will also be evaluated for hybrid bearing damage detection.

Gear and bearing damage detection using debris particle distributions: A diagnostic tool was developed for detecting fatigue damage to spur gears, spiral bevel gears, and rolling element bearings. This diagnostic tool was developed and evaluated experimentally by collecting oil debris data from fatigue tests performed in the NASA Glenn Spur Gear Fatigue Rig, Spiral Bevel Gear Test Facility, and the 500-hp Helicopter Transmission Test Stand. During each test, data from an on-line, in-line, inductance-type oil debris sensor were monitored and recorded for the occurrence of pitting damage. Results indicate that oil debris alone cannot discriminate between bearing and gear fatigue damage.

Planetary transmission fault detection: A methodology was developed for detecting and diagnosing gear faults in the planetary stage of a helicopter transmission. This diagnostic technique is based on the constrained adaptive lifting algorithm. The lifting scheme, developed by Wim Sweldens of Bell Labs, is a time-domain, prediction-error realization of the wavelet transform that allows for greater flexibility in the construction of wavelet bases. Classic lifting analyzes a given signal using wavelets derived from a single fundamental basis function. A number of researchers have proposed techniques for adding adaptivity to the lifting scheme, allowing the transform to choose from a set of fundamental bases the basis that best fits the signal. This characteristic is desirable for gear diagnostics because it allows the technique to tailor itself to a specific transmission by selecting a set of wavelets that best represent vibration signals obtained while the gearbox is operating under healthy-state conditions. However, constraints on certain basis characteristics are necessary to enhance the detection of local waveform changes caused by certain types of gear damage. The methodology analyzes individual tooth-mesh waveforms from a healthy-state gearbox vibration signal that was generated using the vibration separation (synchronous signal averaging) algorithm. Each waveform is separated into analysis domains using zeros of its slope and curvature. The bases selected in each analysis domain are chosen to minimize the prediction error and are constrained to have the same-sign local slope and curvature as the original signal. The resulting set of bases is used to analyze future-state vibration signals, and the lifting prediction error is inspected. The constraints allow the transform to adapt effectively to global amplitude changes, yielding small prediction errors. However, local waveform changes associated with certain types of gear damage are poorly adapted, causing a significant change in the prediction error.

Current activities by the ARL VTD and the NASA GRC Mechanical Components Branch are in the following areas: (1) vibration algorithm threshold assessment on test rig and flight data, (2) face gear fault detection, (3) OH-58 transmission multisensor fault detection, and (4) support of the U.S. Army CBM program.

Vibration algorithm threshold assessment on test rig and flight data: This project is sponsored by the Federal Aviation Administration (FAA) through a Space Act agreement between the FAA and NASA. The objective of the work is to define a method for setting thresholds for transmission diagnostic algorithms that provides the minimum number of false alarms while maintaining sensitivity to transmission damage. Because damage data from flight are limited, the damage detection capabilities of diagnostic tools must be developed in controlled ground test environments. Defining thresholds for different types of failures is required for predicting future helicopter component failures.
The objective of this research is to assess the performance of vibration-based diagnostic tools on both flight data and test rig data using thresholds defined in a test rig environment. The threshold assessment will be applied to experimental data collected in NASA GRC test rigs under normal conditions and damage progression conditions and to data collected under normal conditions on a helicopter. Statistical approaches will be reviewed to determine whether the signature data from the rig resemble the data from the aircraft during normal conditions. Test rig and flight data will be analyzed and compared by plotting estimated probability density functions (PDFs) as relative frequency histograms whose ranges are set by the minimum and maximum algorithm values. Hypothesis testing and a decision matrix will be identified to define possible fault detection/false alarm combinations. The probability of false alarm will be compared with the probability of detection to determine the optimum thresholds for the algorithms.

Face gear fault detection: A few years ago, the U.S. Army funded a study named the Advanced Rotorcraft Transmission Program (ART I) to investigate improved concepts to reduce helicopter drive system weight and noise and increase life. From that study, the use of face gears in transmissions emerged as a way to meet these goals. Face gear research has been performed by the ARL VTD for the past 15 years. Current research efforts have been in support of incorporating face gears in future AH-64 Apache helicopters. Recently, experimental fatigue tests were performed at the GRC to determine the surface durability life of a face gear in mesh with a tapered spur involute pinion. Twenty-four sets of gears were tested. State-of-the-art vibration-based gear fault detection instrumentation was installed for these tests. Efforts are underway to determine the effectiveness of these modern techniques in detecting face gear failures.
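For the vibration algorithm threshold assessment described above, the trade-off between false alarms and detection can be sketched as follows. This Python example is only an illustration under stated assumptions: the condition indicator values are made-up numbers, the function names are hypothetical, and the selection rule (the highest empirical probability of detection among thresholds whose empirical false alarm probability stays within a chosen limit) is one simple choice, not the method used in the FAA-sponsored project.

```python
def rates(healthy, damaged, threshold):
    """Empirical probabilities of false alarm and of detection for one threshold."""
    p_fa = sum(v > threshold for v in healthy) / len(healthy)
    p_d = sum(v > threshold for v in damaged) / len(damaged)
    return p_fa, p_d


def pick_threshold(healthy, damaged, max_p_fa=0.0):
    """Return (threshold, p_fa, p_d) with the highest p_d subject to p_fa <= max_p_fa.

    Candidate thresholds are the observed indicator values themselves.
    Ties in p_d are resolved in favor of the lowest threshold.
    Returns None if no candidate meets the false alarm limit.
    """
    best = None
    for thr in sorted(set(healthy) | set(damaged)):
        p_fa, p_d = rates(healthy, damaged, thr)
        if p_fa <= max_p_fa and (best is None or p_d > best[2]):
            best = (thr, p_fa, p_d)
    return best


# Hypothetical condition indicator values (dimensionless)
healthy = [1.0, 1.2, 1.1, 1.4, 1.3]   # e.g., normal-condition rig or flight data
damaged = [1.35, 1.8, 2.1, 2.5]       # e.g., damage progression rig data
threshold, p_fa, p_d = pick_threshold(healthy, damaged)
```

With these numbers the one damaged-state value that falls inside the healthy range is missed, which is the kind of fault detection/false alarm combination a decision matrix would tabulate.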


OH-58 transmission multisensor fault detection: This project is a cooperative effort between ARL VTD, NASA Glenn, and the University of Maryland to study transmission gear and bearing fault detection capabilities in an actual helicopter transmission. Experimental testing will be performed in an OH-58A helicopter main-rotor transmission using the NASA Glenn 500-hp helicopter transmission test facility. A variety of tests are planned to validate various fault detection schemes. First, tests on new, undamaged components will be performed to establish a baseline signature for “healthy” parts. Second, experiments using previously tested components will be performed. These components have a variety of faults such as gear and bearing fatigue spalls. The tests will be used to validate the fault detection capabilities. Lastly, seeded fault and accelerated fatigue tests will be performed to check the effectiveness of the fault detection schemes using multiple sensors. Transmission vibration, acoustics, acoustic intensity, transmission error (via accelerometers), and oil debris will be recorded and processed during the tests. These data will be used on-line with a PC-based health monitoring system to detect faults and track their progression during the tests.

Support of U.S. Army CBM program: The transition from Army aviation’s current methods for performing scheduled and unscheduled maintenance to a CBM method of operation by fiscal years 2013-2015 is one of its most challenging initiatives. This transition is contingent upon incorporating advanced technology into existing Army air items and embedding these capabilities in future new systems or major modifications to existing systems. This technology insertion will lead to improved system reliability and safety and thus increase operational availability. Tests were recently completed at the GRC to support the Army aviation CBM program. The scope of this work is a series of tests of five bearings used for the Apache tail hanger assembly. The bearings used for testing were purchased via a GSA source and were considered to be in “new” condition as received from the vendor. Shakedown testing was done to properly configure the test rig, and a series of test runs was performed to record vibration signatures of these bearings. The bearings will be provided to the University of South Carolina for additional CBM-related testing. The vibration data from this series of tests are intended to establish a baseline for comparison with vibration signature data from bearings having laboratory-seeded fault conditions.
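One simple way such a baseline might be used is sketched below in Python. A condition indicator (here root-mean-square vibration, chosen for illustration) is computed for each healthy-bearing record, and a later record is scored by how many baseline standard deviations it lies from the baseline mean. The indicator, the synthetic signals, and the function names are assumptions; the source does not say which features or comparison method the test program uses.

```python
import math
import statistics


def rms(signal):
    """Root-mean-square value of a vibration record."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))


def build_baseline(records, indicator=rms):
    """Mean and sample standard deviation of an indicator over healthy records."""
    values = [indicator(r) for r in records]
    return statistics.mean(values), statistics.stdev(values)


def deviation(record, baseline, indicator=rms):
    """Distance of a record from the baseline mean, in baseline standard deviations.

    Assumes the baseline standard deviation is nonzero.
    """
    mean, std = baseline
    return (indicator(record) - mean) / std


def sine(amplitude, n=1000, cycles=10):
    """Synthetic stand-in for a measured vibration record."""
    return [amplitude * math.sin(2 * math.pi * cycles * i / n) for i in range(n)]


# Five "new" bearings with slightly different vibration levels (made-up values)
healthy_records = [sine(a) for a in (0.95, 1.0, 1.05, 1.0, 0.98)]

# A seeded-fault record: the same tone plus a periodic impact (illustrative only)
faulty = sine(1.0)
for i in range(0, len(faulty), 100):
    faulty[i] += 5.0

baseline = build_baseline(healthy_records)
score = deviation(faulty, baseline)
```

A score several standard deviations above zero would mark the record as departing from the healthy signature; with only five baseline bearings, the standard deviation estimate is itself quite uncertain.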

B.29.5 Related Publications H.Luo, H.Rodriguez, D. Hallman, and D. G. Lewicki, “A Resonant Synchronous Vibration Based Approach for Rotor Imbalance Detection,” NASA TM-2006-214335, AIAA-2006-1966, ARL-TR-3908, August 2006. P. J. Dempsey, G. Kreider, and T. Fichter, “Tapered Roller Bearing Damage Detection Using Decision Fusion Analysis,” NASA TM-2006-214380, July 2006. P. J. Dempsey, G. Kreider, and T. Fichter, “Investigation of Tapered Roller Bearing Damage Detection Using Oil Debris Analysis,” NASA TM-2006-2 14082, February 2006.

J. J. Zakrajsek, P. J. Dempsey, E. M. Huff, M. Augustin, R. Safa-Bakhsh, A. Duke, P. Ephraim, P. Grabil, and H. J. Decker, “Rotorcraft Health Management Issues and Challenges,” NASA TM-2006-2 14022, February 2006.

248

Prognostics and Health Management of Electronics

5.

P. J. Dempsey, J. M. Certo, R. F. Handschuh, and F. Dimofte, “Hybrid Bearing Prognostic Test Rig,” NASA TM-2005-2 13597, ARL-TR-3454, August 2005.

6.

P. J. Dempsey, D. G. Lewicki, and H. J. Decker, “Transmission Bearing Damage Detection Using Decision Fusion Analysis,” NASA TM-2004-213382, ARL-TR-3328, November 2004.

7.

Army Aviation, “Condition Based Maintenance Plus (CBM+) Plan,” U.S. Army Technical Report, November 29, 2004, available: http://www.acq.osd.mil/log/mppr/ cbm+lArmy/Aviation%2OCBM+%2OPLAN%2029%20NOV%2004v6.doc, accessed January 23,2006.

8. P. J. Dempsey, J. M. Certo, and W. Morales, “Current Status of Hybrid Bearing Damage Detection,” NASA TM-2004-212882, ARL-TR-3 1 19, June 2004. 9.

P. J. Dempsey, D. G. Lewicki, and H. J. Decker, “Investigation of Gear and Bearing Fatigue Damage Using Debris Particle Distributions,” NASA TM-2004-2 12883, ARL-TR-3133, May 2004.

10. P. D. Samuel, J. K. Conroy, and D. J. Pines, “Planetary Transmission Diagnostics,” NASA CR-2004-213068, May 2004. 1 1 P. J. Dempsey, M. Mosher, and E. M. Huff, “Threshold Assessment of Gear Diagnostic Tools on Flight and Test Rig Data,” NASA TM-2003-212220-REV1, August 2003.

12. H. J. Decker and D. G. Lewicki, “Spiral Bevel Pinion Crack Detection in a Helicopter Gearbox,” NASA TM-2003-2 12327, ARL-TR-2958, June 2003. 13. H. J. Decker, “Effects on Diagnostic Parameters after Removing Additional Synchronous Gear Meshes,” NASA TM-2003-2123 12, ARL-TR-2933, April 2003. 14. P. J. Dempsey, “Integrating Oil Debris and Vibration Measurements for Intelligent Machine Health Monitoring,” NASAITM 2003-2 11307, March 2003. 15. P. J. Dempsey, W. Morales, and A.A. Afjeh, “Investigation of Spur Gear Fatigue Damage Using Wear Debris,” Journal of the Society of Tribologists and Lubrication Engineers, Vo1.58, No. 11, pp. 18-22, November 2002. 16. M. L. Araiza, R. Kent, and R. Espinosa, “Real-Time, Embedded Diagnostics and Prognostics in Advanced Artillery Systems,” Proceedings of the 2002 IEEE Autotestcon Conference, October 2002, pp. 8 18 - 84 1. 17. P. J. Dempsey and A.A. Afjeh, “Integrating Oil Debris and Vibration Gear Damage Detection Technologies Using Fuzzy Logic,” NASA TM-2002-2 1 1 126, July 2002, paper presented at the 58th Annual Forum of the American Helicopter Society. Montreal, Quebec, June 11-13,2002. 18. H. J. Decker, “Gear Crack Detection Using Tooth Analysis,” NASA TM-2002-211491. ARL-TR-2681, Apr 2002, Presented at 56th Mtg. sponsored by Society for Machinery Failure Prevention Technology, Virginia Beach, VA, Apr 15- 19,2002. 19. H.J. Decker, “Crack Detection for Aerospace Quality Spur Gears,” NASA TM-200221 1492, ARL-TR-2682, April 2002, paper presented at the 58th Annual Forum of the American Helicopter Society, Montreal, Quebec, June 11-1 3,2002. 20. P.J. Dempsey, R.F. Handschuh, and A.A. Afjeh “Spiral Bevel Gear Damage Detection Using Decision Fusion Analysis,” NASA TM-2002-2 1 1814, ARL-TR-2744, August

Appendix B

249

2002, paper presented at the 5th International Conference on Information Fusion, Annapolis, MD, July 8-1 1,2002. 21. A. Suryavanashi, S. Wang, R. Gao, K. Danai, and D.G. Lewicki “Condition Monitoring of Helicopter Gearboxes by Embedded Sensing,” Proceedings of the 58th American Helicopter Society International Forum, June 2002. 22. L. Liu, and D.J Pines, “Fault Detection Sensitivity of Spur Gear Design Parameters to a Tooth Crack,” Proceedings of the 58th American Helicopter Society International Forum, June 2002. 23. F. Greitzer, “Embedded Prognostics Health Monitoring,” paper presented at the International Instrumentation Symposium, Embedded Health Monitoring Workshop, San Diego, CA, May 2002. 24. A.H. Pryor, M. Mosher, and D.G. Lewicki, “The Application of Time-Frequency Methods to HUMS,” Proceedings of the 57th American Helicopter Society International Forum, Washington, D.C., May 2001. 25. P. Grabill. J. Berry, L. Grant, and J. Porter, “Automated Helicopter Vibration Diagnostics for the US Army and National Guard,” Proceedings of the 2001 American Helicopter Society 57thAnnual Forum, Washington, D.C., May 2001. 26. P.J. Dempsey and J.J. Zakrajsek, “Minimizing Load Effects on NA4 Gear Vibration Diagnostic Parameter,” NASA TM-2001-210671, paper presented at the 55th Meeting of Society for Machinery Failure Prevention Technology, Virginia Beach, VA, April 25,2001. 27. P. J. Dempsey, “Gear Damage Detection Using Oil Debris Analysis,” NASA TM-2001210936, ARL-TR-1104, October 2001. 28. P.D. Samuel, D.J. Pines, and D.G Lewicki, “A Comparison of Stationary and Non-Stationary Metrics for Detection Faults in Helicopter Gearboxes,” Journal of the American Helicopter Society, Vol. 45, No. 2, pp. 125-136, April 2000. 29. P. J. Dempsey, “A Comparison of Vibration and Oil Debris Gear Damage Detection Methods Applied to Pitting Damage,” NASA TM-2000-21037 1 , September 2000. 30. K. Wang, D. Yang, K. 
Danai, and D.G Lewicki “Model-Based Selection of Accelerometer Locations for Helicopter Gearbox Monitoring,” Journal of the American Helicopter Society, Vol. 44, No. 3, pp. 269-275, October 1999. 3 1. F. Greitzer, L. Kangas, and K. Terrones, “Gas Turbine Engine Health Monitoring and Prognostics,” paper presented at the International Society of Logistics (SOLE) 1999 Symposium, Las Vegas, NV, August 30--September 2, 1999. 32. H.J. Decker and J.J. Zakrajsek, “Comparison of Interpolation Methods as Applied to Time Synchronous Averaging,” NASA TM- 1999-209086, ARL-TR- 1960, paper presented at the 53rd Meeting of the Society for Machine Failure Prevention Technology, Virginia Beach, VA, April 19-22, 1999. 33, V.B. Jammu, K. Danai, and D.G. Lewicki, “Structure-Based Connectionist Network for Fault Diagnosis of Helicopter Gearboxes,” ASME Journal of Mechanical Design, Vol. 120, No. 1, pp. 100-105, March 1998. 34. V.B. Jammu, K. Danai, and D.G. Lewicki,. “Experimental Evaluation of a Structure-Based Connectionist Network for Fault Diagnosis of Helicopter Gearboxes,” ASME Journal of Mechanical Design, Vol. 120, No. 1, pp. 106-1 12, March 1998.

250

Prognostics and Health Management of Electronics

35. J.J. Zakrajsek and D.G. Lewicki, “Detecting Gear Tooth Fatigue Cracks in Advance of Complete Fracture,” TriboTest Journal, Vol. 4, No. 4, pp. 407-422 , June 1998. 36. P.D Samuel, D.J. Pines, and D.G. Lewicki, “Fault Detection in the OH-58A Main Transmission Using the Wavelet Transform,” Proceedings of the 52nd Meeting of the Society for Mechanical Failure Prevention Technology, March/April 1998, pp. 323335. 37. P.D. Samuel, D.J. Pines, and D.G. Lewicki, “A Comparison of Stationary and NonStationary Transforms for Early Fault Detection in the OH-58A Main Transmission,” Proceedings of the 54th American Helicopter Society International Forum, May 1998, pp. 867-887. 38. J. Cronkhite, B. Dickson, W. Martin, and G. Collingwood, “Operational Evaluation of a Health and Usage Monitoring System (HUMS),” NASA-CR- 1998-207409, April 1998. 39. V.B. Jammu and K. Danai. “Diagnostic Analyzer for Gearboxes (DAG): Guide,” NASA-CR-4762, ARL CR 333, January 1997.

User’s

40. J.J. Zakrajsek and D.G Lewicki, “Detecting Gear Tooth Fatigue Cracks in Advance of Complete Fracture,” NASA-TM- 107 145, ARL-TR-970, paper presented at the Integrated Monitoring, Diagnostics & Failure Prevention Technology Showcase, Society for Machinery Failure Prevention Technology, Mobile, AL, April 22-26, 1996. 41. V.B. Jammu., K. Wang, K. Danai, and D.G. Lewicki, “Model-Based Sensor Location Selection for Helicopter Gearbox Monitoring,” NASA-TM- 1072 19, ARL TR 1099, paper presented at the Integrated Monitoring, Diagnostics & Failure Prevention Technology Showcase, Society for Machinery Failure Prevention Technology, Mobile, AL, April 22-26, 1996. 42. D.P. Townsend, J.J. Zakrajsek, and R.F. Handschuh, “Evaluation of a Vibration Diagnostic System for the Detection of Spiral Bevel Gear Pitting Failures,” NASA TM 107228, ARL-TR- 1 106, paper presented at ASME 7th International Power Transmission and Gearing Conference, San Diego, CA, October 6-9, 1996. 43. V.B. Jammu, K. Danai, and D.G. Lewicki, “Improving the Performance of the Structure-Based Connectionist Network for Diagnosis of Helicopter Gearboxes,” NASA TM 107408, ARL TR 1285, paper presented at the Intelligent Ships Symposium, Philadelphia, PA, November 25-26, 1996. 44. R. Romero, H. Summers, and J. Cronkhite, “Feasibility Study of a Rotorcraft Health and Usage Monitoring System (HUMS): Results of Operator’s Evaluation,” NASA-CR 198446, ARL-CR-289, February 1996. 45. B. Dickson, J. Cronkhite, S. Bielefeld, L. Killian, and R. Hayden, “Feasibility Study of a Rotorcraft Health and Usage Monitoring System (HUMS): Usage and Structural Life Monitoring Evaluation,” NASA-CR 198447, ARL-CR-290, February 1996. 46. F.K. Choy, V. Polyshchuk, J.J. Zakrajsek, R.F. Handschuh, and D.P.Townsend, “Analysis of the Effects of Surface Pitting and Wear on the Vibration of a Gear Transmission System,” Tribolom International, Vol. 29, No. 1, pp. 77-83, 1996. 47. F.K. Choy, S. Huang, J.J. Zakrajsek., R.F. Handschuh, and D.P. 
Townsend, “Vibration Signature Analysis of a Faulted Gear Transmission System,” AIAA-94-2937, Journal of Propulsion and Power, Vol. 12. No. 2, pp. 289-295, MarchiApril 1996.

Appendix B

25 1

48. V.B. Jammu, K. Danai, and D.G Lewicki, “Unsupervised Pattern Classifier for Abnormality-Scaling of Vibration Features for Helicopter Gearbox Fault Diagnosis,” Machine Vibration, Vol 5, pp. 154-162, 1996 49. J.J. Zakrajsek, R.F. Handschuh, D.G Lewicki, and H.J. Decker, “Detecting Gear Tooth Fracture in a High Contact Ratio Face Gear Mesh,” NASA-TM-106822, ARL-TR-600, paper presented at the 49th Meeting Society for Machinery Failure Prevention Technology, Virginia Beach, VA, April 18-20 1995. 50. J.J. Zakrajsek, D.P. Townsend, D.G Lewicki., H.J. Decker, and R.F. Handschuh, “Transmission Diagnostic Research at NASA Lewis Research Center,” NASA-TM10690 I , ARL-TR-748, paper presented at the Institution of Mechanical Engineers 2nd International Conference on Gearbox Noise, Vibration and Diagnostics, London, November 16-17, 1995. 5 I . V.B. Jammu, K. Danai, and D.G Lewicki, “Diagnosis of Helicopter Gearboxes Using Structure-Based Networks,” NASA-TM-I 06932, ATL-TR-761, paper presented at the 1995 Symposium on Applied Monitoring & Diagnostics, Seattle, WA, June 21-23, 1995. 52. H. Chin, K. Danai, and D.G Lewicki, “Fault Detection of Helicopter Gearboxes Using the Multi-Valued Influence Matrix Method,” ASME Journal of Mechanical Design, Vol. 117. No. 2, pp. 248-253, June 1995. 53. J.J. Zakrajsek, R.F. Handschuh, and H.J. Decker, “Application of Fault Detection Techniques to Spiral Bevel Gear Fatigue Data,” NASA TM 106467, ARL-TR 345, paper presented at the 48th Meeting of Mechanical Failure Prevention Group, of Naval Research, Wakefield, MA, April 19-21, 1994. 54. H.J. Decker, R.F. Handschuh, and J.J. Zakrajsek, “An Enhancement to the NH4 Gear Vibration Diagnostic Parameter,” NASA TM 106553, ARL-TR 389, paper presented at the 18th Annual Meeting Sponsored by the Vibration Institute, Hershey, PA, June 2023, 1994. 55. F.K. Choy, S. Huang, J.J. Zakrajsek, R.F. Handschuh, and D.P. 
Townsend, “Vibration Signature Analysis of a Faulted Gear Transmission System,” NASA TM-106623, ARLTR-475, AIAA-94-2937, paper presented at the 30th AIAAIASMEISAEIASEE Joint Propulsion Conference, Indianapolis, IN, June 27-29, 1994. 56. F.K. Choy, V. Polyshchuk, J.J. Zakrajsek, R.F. Handschuh, and D.P. Townsend, “Analysis of the Effects of Surface Pitting and Wear on the Vibrations of a Gear Transmission System,” NASA TM- 106678, ARL-TR-520, paper presented at the AUSTRIB ’94, Perth, Australia, December 5-8, 1994. 57. F.K. Choy, Y.F. Ruan, J.J. Zakrajsek, D.P. Townsend,., F.B. Oswald, and H.J. Decker, “Modal Simulation of Gearbox Vibration With Experimental Correlation,” Journal of Propulsion and Power, Vol. 9. No. 2, pp. 301-306, MarchiApril 1993. 58. H. Chin, K. Danai, and D.G Lewicki.: “Pattern Classifier for Fault Diagnosis of Helicopter Gearboxes,” Control Enpineering Practice, Vol. 1, No. 5, pp. 771-778, 1993. 59. J.J. Zakrajsek, D.P. Townsend, and H.J. Decker, “An Analysis of Gear Fault Detection Methods as Applied to Pitting Fatigue Failure Data,” NASA TM 105950, AVSCOM TR 92 C 035, paper presented at the 47th Meeting of Mechanical Failure Prevention Group, sponsored by Office of Naval Research, Virginia Beach, VA, April 13- 15, 1993.

252

Prognostics and Health Management of Electronics

60. H. Chin, K. Danai, and D.G Lewicki, “Pattern Classifier for Health Monitoring of Helicopter Gearboxes,” NASA TM 106099, AVSCOM TR 92-C-033, paper presented at the 47th Meeting of Mechanical Failure Prevention Group, sponsored by Office of Naval Research, Virginia Beach, VA, April 13-1 5, 1993. 61, H. Chin, K. Danai, and D.G Lewicki, “Fault Detection of Helicopter Gearboxes Using the Multi-Valued Influence Matrix Method,” NASA TM 106100, AVSCOM TR 92-C015, March 1993. 62. D.P. Townsend, and J.J. Zakrajsek, “Evaluation of a Vibration Diagnostic System for the Detection of Spur Gear Pitting Failures,” NASA TM 106103, AX-TR-11, AIAA93-2298, paper presented at the AIAAISAEIASMEIASEE 29th Joint Propulsion Conference, Monterey, CA, June 28-30, 1993. 63. H. Chin, K. Danai, and D.G Lewicki, “Efficient Fault Diagnosis of Helicopter Gearboxes.” SASA TM 106253, AVSCOM TR 92-C-034, paper presented at the 12th World Congress, International Federation of Automatic Control, Sydney, Australia, July 19-23. 1993. 64. J.J. Zakrajsek, D.P. Townsend, F.B. Oswald, and H.J. Decker, “Analysis and Modification of a Single-Mesh Gear Fatigue Rig for Use in Diagnostic Studies,” NASA TM 105416, AVSCOM TR 91-C-049, May 1992. 65. D.G Lewicki, D.M. Blanchette, and G. Biron, “Evaluation of an Oil-Debris Monitoring Device for Use in Helicopter Transmissions,” NASA TM 105830, AVSCOM TR 92-C-007, August 1992. 66. J.J. Zakrajsek, “An Investigation of Gear Mesh Failure Prediction Techniques,” NASA TM 102340, AVSCOM TM 89-C-005, November, 1989. 67. Technology Integration, Inc., “Gearbox Vibration Diagnostic Analyzer,” Final Report for Contract NAS3-26134, April 1992.

B.30 United States Navy

The U.S. Navy is the branch of the U.S. Armed Forces primarily designated for naval warfare operations. It includes operations conducted by surface vessels (ships), submarine vessels, and sea-borne aviation as well as ancillary support, communications, training, and other fields. The Navy currently consists of 281 ships, including a fleet of 12 aircraft carriers, and over 4000 aircraft.

B.30.1 Approach to PHM With support from the Defense Advanced Research Projects Agency Prognosis Seedling Effort, a U.S. Navy strategy was launched, in 2004, to develop, integrate, and demonstrate diagnostics, prognostics, health monitoring, and life management for propulsion and mechanical systems. The SH-60 Seahawk helicopter program was initiated as the first proof-of-concept effort to develop, demonstrate, and integrate available and advanced mechanical diagnostic technologies for propulsion and power-drive system monitoring. Included in these technologies were various rule- and model-based analysis techniques to demonstrate and validate diagnostic and trending capabilities. Since then, there has been increased emphasis on prognostic capabilities, which provide early detection of the precursor and/or incipient fault condition to a component or subelement failure condition and can manage and predict the progression of this fault condition to the component failure. The purpose is to increase safety and significantly reduce supportability costs over the aircraft life cycle. The Naval Surface Warfare Center is working together with the U.S. Army on a project called Joint Advanced Health and Usage Monitoring System (JAHUMS) Advanced Concept Technology Demonstration (ACTD). The goal of JAHUMS is to demonstrate and validate the operational value of advanced HUMS technology and to demonstrate and validate an open systems implementation approach for HUMS. The JAHUMS consists of a main processor unit, a data transfer unit, a remote data concentrator, an on-board display, and a ground station. It will provide in-flight monitoring of rotor, engine, and mechanical systems as well as ground-based diagnostics and maintenance management system interface. JAHUMS development testing is being carried out at the Naval Air Warfare Center in Patuxent River, Maryland, and Ft. Rucker, Alabama. 
The JAHUMS technology is currently being used on two Navy SH-60Bs and will soon be installed on additional Navy/Army aircraft to support operational value assessment and data collection.

The Navy’s Integrated Condition Assessment System (ICAS), as the program of record for shipboard machinery health monitoring, uses a modified commercial off-the-shelf product for PHM maintenance management. ICAS is a Microsoft Windows NT-based maintenance program that combines performance monitoring techniques with computerized maintenance management. It uses graphic diagrams to display machines, systems, and sensors, and it monitors and predicts machinery failure modes by comparing on-line, portable, and manual sensor data to established engineering performance criteria. If a machine’s actual performance violates a specified limit, an alarm is activated. ICAS automatically logs performance data, stores it in a database folder for future evaluations, and alerts the operator with a visual or audible message. Currently, ICAS prototype systems are installed aboard the USS Virginia, USS Norfolk, and USS Memphis. In addition, the Naval Sea Systems Command (NAVSEA 08) has authorized the use of ICAS for monitoring primary and secondary plants on submarines and nuclear aircraft carriers.

The Navy’s FORCEnet was created to develop both the architecture by which Navy joint and coalition force systems communicate and the logistics to support such a network.


By looking across warfare mission areas to identify and maximize current capabilities, while adapting network-enabled delivery of tactical data in a secure environment, the Navy expects to effectively deliver multiple sources of collected information in a collaborative, at-sea environment. In layman’s terms, this translates to a communications system employed and delivered to each participating platform within a combat scenario, whether joint (U.S. forces), NATO, or coalition.

To assemble and disseminate information on EM and PHM, the Naval Air Systems Command (NAVAIR) created the Prognostics, Advanced Diagnostics and Health Management (PADHM) system, an on-line collaborative environment for exchange of ideas among all military branches and the private sector. PADHM also serves as a central repository of desired information. It supports the PHM process of determining the state of a component to perform its functions and supports the capability to make appropriate decisions about maintenance actions based on diagnostics/prognostics information, available resources, and operational demand. The PADHM system provides users the ability to share files, presentations, documentation, and other relevant materials in a secure environment.

The U.S. Navy is actively involved in the DoD SBIR program. The SBIR, funded at approximately $1.079 billion in FY 2005, is made up of 10 participating components: Army, Navy, Air Force, Missile Defense Agency (MDA), Defense Advanced Research Projects Agency (DARPA), Chemical Biological Defense (CBD), Special Operations Command (SOCOM), Defense Threat Reduction Agency (DTRA), National Geospatial-Intelligence Agency (NGA), and the Office of Secretary of Defense (OSD). SBIR is the largest source of early-stage technology financing in the United States. Total Federal SBIR/STTR funding in FY 2004 was $2 billion. The DoD accounts for nearly half of the total SBIR/STTR program. A list of SBIR in the area of EM and ePHM for 2004, 2005, and 2006 is given in Appendix A.

B.30.2 Related Publications

1. B. Finley and E. A. Schneider, “ICAS: The Center of Diagnostics and Prognostics for the United States Navy,” Proceedings of the SPIE-The International Society for Optical Engineering, Vol. 4389, pp. 186-193, 2001.

2. M. DiUlio, C. Savage, B. Finley, and E. Schneider, “Taking the Integrated Condition Assessment System to the Year 2010,” Proceedings of the 13th International Ship Control Systems Symposium (SCSS), Orlando, FL, April 2003.

3. C. Savage, M. DiUlio, B. Finley, K. Krooner, P. Martinez, and P. Horton, “Enterprise Remote Monitoring (ICAS & Distance Support), Tomorrow’s Vision Being Executed Every Day,” Proceedings of the 2005 American Society of Naval Engineers (ASNE) Fleet Maintenance Symposium, August 30-September 1, 2005.

B.31 China

Prognostics and health management is a relatively new concept in China, although it was gradually applied to military aircraft and civil helicopters during the late 1990s and early 2000s in the United States and Europe. International competition in advanced science and technology has led the Chinese government to pressure scientists in universities and research institutions to keep close track of the latest high-tech PHM publications from around the world and to painstakingly translate foreign publications into Chinese. The Chinese struggle for parity in high-tech fields led to the vigorous introduction of PHM into the country in the early 2000s. Although the concept of PHM is still new in China, research on health monitoring, fault diagnostics, and fault prediction has been conducted for many years. The early applications of modern PHM in China have been mainly in avionics. Software tools for monitoring aircraft engines have been created, but integrated PHM systems are still rather scarce. In this section we discuss the history of PHM in China, the people currently conducting research on PHM, and the barriers to implementing prognostics in China.

B.31.1 History and Effects of PHM in China

A typical PHM system architecture is shown in Figure B.4 and is used as a guideline for the discussion that follows.

[Figure B.4: Conceptual PHM system functional architecture. Labels recoverable from the diagram: Data Manipulation, Pre-processing, Feature Extraction, Signal Characterization, Condition Monitor, Thresholds, Fuzzy Logic, Health Assessment, Component-Specific Feature Extraction, Anomaly and Diagnostics Reasoners, Prognostics, Feature-Based Prognostics, Model-Based Prognostics, Automatic Decision Reasoning, Data Fusion, Classifier, Response Generator, Human-Computer Interface.]

An investigation into the Total Chinese Journal Database, which comprises more than 8200 Chinese journals and lists about 25 million papers, was conducted to assess the key PHM references. The relevant papers focus on science, engineering, agriculture, military, and medical science.
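The functional chain in Figure B.4 (data manipulation, condition monitoring, prognostics) can be illustrated with a deliberately small sketch. The Python below is an illustration under stated assumptions and not an implementation taken from any system described in this appendix: the function names, the choice of RMS as the feature, the fixed alarm threshold, and the straight-line trend extrapolation are all hypothetical choices made for the example.

```python
from statistics import mean


def extract_feature(window):
    """Data manipulation stage: reduce a window of raw samples to one
    feature. RMS is an assumed choice for this example."""
    return (sum(x * x for x in window) / len(window)) ** 0.5


def condition_monitor(feature, limit):
    """Condition monitor stage: flag the feature when it exceeds a
    fixed threshold (hypothetical limit supplied by the caller)."""
    return feature > limit


def estimate_remaining_steps(features, failure_level):
    """Feature-based prognostics stage: fit a least-squares straight
    line to the feature history and extrapolate it to failure_level.

    Returns the number of future time steps until the fitted line
    reaches failure_level, or None when there is no upward trend or
    too little history to fit a line."""
    n = len(features)
    if n < 2:
        return None
    t_mean = (n - 1) / 2
    f_mean = mean(features)
    num = sum((t - t_mean) * (f - f_mean) for t, f in enumerate(features))
    den = sum((t - t_mean) ** 2 for t in range(n))
    slope = num / den
    if slope <= 0:
        return None
    intercept = f_mean - slope * t_mean
    t_fail = (failure_level - intercept) / slope
    # Time is counted from the most recent sample (index n - 1).
    return max(0.0, t_fail - (n - 1))
```

In use, each new window of sensor samples would pass through `extract_feature`, the result would be checked by `condition_monitor`, and the accumulated feature history would be handed to `estimate_remaining_steps`. A fielded system would replace each stage with the richer methods named in the figure, such as fuzzy logic or model-based prognostics.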

B.31.2 Data Acquisition and Transmission

Data acquisition collects the information needed for subsequent data manipulation in PHM. Among the data acquisition techniques, the use of sensors is a key approach. Sensor publications in China first appeared in the mid-1950s. The earliest paper involved using variable-capacitance sensors to test vibration frequencies on the oriented impeller blade [1]. From 1950 to 1970, no more than 10 papers on sensors were published each year, but from the mid-1970s on, hundreds of papers on sensors were published each year. In 2006, more than 10,000 papers discussed research on and applications for sensors. Figure B.5 shows this trend.

The earlier Chinese papers on sensors dealt primarily with tracing and introducing research and applications for foreign sensors, including the force-electric sensor, thermal-electric sensor, light-electric sensor, sound-electric sensor, capacitance sensor, and inductance sensor. These include, for example, resistance wire sensors [2, 3], pressure sensors [4, 5], palstance sensors [6], fluid speed sensors [7, 8], small displacement sensors [9], mechanical sensors [10], load sensors [11], inductance sensors [12], vibration sensors [13, 14], stress chip sensors [15, 16], temperature sensors [17], raster pattern sensors [18], gas speed sensors [19, 20], jet current sensors [21], solid image sensors [22], probes [23], thermal flux sensors [24], sensor arrays [25], high-temperature accelerated sensors [26, 27], semiconductor stress chip sensors [28], precise digital optics sensors [29], solid film sensors [30], electromagnetic sensors [31], thermal-electric sensors [32], electronic sensors [33], infrared sensors [34], electro-vortex sensors [35], fiber optic sensors [36, 37], ceramic humidity sensors [38], high-temperature hull gallium arsenide sensors [39], and optic no-contact sensors [40]. With the maturation of sensor technologies, sensors have become smaller, more precise, and multifunctional.

In addition to traditional sensors, new types of sensors are being investigated and studied, for instance, biosensors [41-43], wireless sensor networks [44-46], gas sensors [47, 48], intelligent temperature sensors [49], ultrasonic sensors [50], high-temperature superconductive magnetic flux sensors [51], hydrogen optic chemical sensors [52], photochemical sensors [53], and acoustic sensors [54].

[Figure B.5: Publications on data acquisition systems, data transmission, and sensor techniques in China. Plotted series: Sensor, Data transfer, Data acquisition system.]

Data transmission is an important step in the PHM architecture that determines whether PHM is implemented locally or remotely. The earliest paper on data transmission, published in 1958, introduced a linear binary data transmission system [55]. The speed of these developments can be seen in Figure B.5. The current data transmission techniques in China include wired and wireless data transmission [56-59], encrypted [60-66] and unencrypted data transmission, data compression [67-74] and storage [75-80], high-speed data transmission [81-85], and long-distance data transmission [86-91].


B.31.3 Health Monitoring

Health monitoring is a nondestructive technique that allows the integrity of systems or structures to be actively monitored during operation and/or throughout their lives in order to prevent disastrous failures and reduce maintenance costs. The earliest paper on health monitoring, in 1971, was the introduction of a Polish automation process for coal mining [92]. From 1971 to 1979, only about 200 papers on monitoring technologies were published, and these focused on monitoring instruments or equipment, such as monitoring machines using computers [93], high-temperature accelerated sensors used in vibration monitoring on aero-engines [94], manual probes for testing chips or integrated circuits (ICs) [95], temperature measurement systems of large electromechanical rotors [96], synthesis fiber monitors [97], vacuum-deposit optic film thickness monitors [98], new methods for multipoint monitoring of vibration generators [99], and new refrigeration equipment for infrared detection [100]. In the late 1970s, health monitoring systems started to be introduced, including the production monitoring system for the weaving industry [101], a computer-integrated coal monitoring system [102], the MINOS long-distance monitoring system [103], a wireless interceptor with channel scan and a priority channel monitoring system [104], an SD-electric network in-time monitoring system [105], and a diesel engine condition monitoring system [106]. However, many of these papers discussed foreign health monitoring technologies.

From the mid-1990s, the number of publications on health monitoring technologies increased appreciably. Much research and many applications, including monitors, monitoring methods, and monitoring systems, have focused on civil and military uses. Examples include the computer monitoring system for a converting station [107], the on-line monitoring system of ferromagnetic synchronous vibration on power supply [108], the engine condition monitoring system [109, 110], the electrostatically supported gyro monitoring system [111], the data collection and monitor control system for oil field surface engineering [112], the real-time interruption processing and multitask monitoring system [113], the multichannel carrier wave monitoring and alarm system [114], and the neural network-based liquid rocket engine health monitoring system [115]. In 2006, there were nearly 9000 papers discussing health monitoring technologies and more than 4000 papers discussing health monitoring systems, as shown in Figure B.6.

[Figure B.6: Publications on monitoring technologies and systems in China. Plotted series: Monitoring, Monitoring system; horizontal axis: Year (1970-2010).]

New monitoring technologies have become a highlighted area in recent years. The research into and applications of new monitoring technologies mainly include intelligent monitoring [116-118], wireless monitoring [119], long-distance monitoring [120-122], real-time monitoring [123], and embedded monitoring [124, 125]. From 2003, the concept of modern PHM was increasingly introduced into China [126-138], and research on PHM for electronic products and systems commenced [139-152].

B.31.4 Diagnostics and Prognostics Models

In China, research on failure diagnostics began earlier than that on health monitoring and prognostics. The first paper on failure diagnostics was published in 1964 [153], while the prognostics publications started in 1979 [154]. There were many fewer papers on prognostics than on diagnostics. Figure B.7 shows this trend.

[Figure B.7: Publications on failure diagnostics/prognostics technologies in China. Plotted series: Failure diagnostics, Failure prognostics; horizontal axis: Year (1960-2010).]

Many kinds of mathematical models have been used in diagnostics and prognostics. These models can be classified into PoF, artificial intelligence, statistical, and hybrid models. Figure B.8 shows the history of publications on fatigue research.

[Figure B.8: Chinese publications on fatigue research.]

PoF models have been developed by Chinese experts in individual component fields and tested on large data sets to verify their accuracy. Many publications on PoF presented component creep and wear models, high-cycle and low-cycle fatigue models, thermal fatigue models, contact fatigue models, and so on. Some of this research focused on the fatigue life prediction of tenons in aeroengine cold-section discs based on small crack growth [155], multiaxial fatigue life prediction of fan disks using the critical plane method [156], a general life model adapted to both stress fatigue and strain fatigue [157], predictions of high-temperature low-cycle fatigue life of an aeroengine's turbine blades at the low-pressure stage [158], low-cycle fatigue life of an aeroengine gas turbine disk [159], and life prediction methods of a turbine blade for creep/fatigue interaction [160].

The Chinese have also been active in developing artificial intelligence systems. Some of these were case-based or rule-based expert systems [161-165] and fuzzy logic systems [166-169]. Figure B.9 shows the history of research on expert and fuzzy logic systems in China.

[Figure B.9: Diagnostics or prognostics publications on expert systems and fuzzy logic systems in China. Plotted series: Expert system, Expert system-dp, Fuzzy logic system, Fuzzy logic system-dp; horizontal axis: Year (1975-2010).]

The third type of mathematical model used includes statistical models such as general statistical methods and data-mining tools. Examples of general statistical methods are polynomials, data filters, Monte Carlo simulations, second-order Taylor expansion, recursive least squares (RLS), and orthogonal arrays. The data-mining tools for diagnostics and prognostics chiefly comprise neural networks (NN) [170-174], evidence theories [175-177], genetic algorithms [178-181], the Grey theory [182, 183], support vector machines (SVM) [184-188], and time series models (the autoregressive (AR) model, the moving average (MA) model, the ARMA model, etc.) [189-193]. Much research has been conducted on these algorithms and many applications have been developed, especially for NN, as shown in Figures B.10 and B.11.

[Figure B.10: Publications on data-mining tools in China. Plotted series: Evidence theory, Genetic algorithm, NN, SVM, Time series; horizontal axis: Year (1970-2010).]

The fourth type is the hybrid model, which combines deterministic features of real systems or subsystems with statistical models to reduce calculation errors and to improve the confidence level of prognostics. Many publications on hybrid models use NN in combination with other algorithms like SVM (support vector machine), PCA (principal component analysis), and so on [194-200].
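Of the time-series models named among the statistical tools above, the first-order autoregressive model, AR(1), is the simplest to show in code. The sketch below is a minimal illustration, assuming a scalar monitored signal sampled at regular intervals; the function names `fit_ar1` and `forecast` are hypothetical, and ordinary least squares is used as one common way to estimate the two coefficients. It is not drawn from any of the cited publications.

```python
def fit_ar1(series):
    """Estimate c and phi in x[t] = c + phi * x[t-1] + noise by
    ordinary least squares on consecutive pairs of samples."""
    if len(series) < 3:
        raise ValueError("need at least three samples to fit AR(1)")
    x_prev = series[:-1]
    x_next = series[1:]
    n = len(x_prev)
    mean_prev = sum(x_prev) / n
    mean_next = sum(x_next) / n
    sxx = sum((a - mean_prev) ** 2 for a in x_prev)
    if sxx == 0:
        raise ValueError("series is constant; phi is not identifiable")
    sxy = sum((a - mean_prev) * (b - mean_next)
              for a, b in zip(x_prev, x_next))
    phi = sxy / sxx
    c = mean_next - phi * mean_prev
    return c, phi


def forecast(series, c, phi, steps):
    """Iterate the fitted recursion forward from the last observed
    sample and return the predicted values as a list."""
    predictions = []
    x = series[-1]
    for _ in range(steps):
        x = c + phi * x
        predictions.append(x)
    return predictions
```

A prognostic use would compare the forecast values against a failure threshold to estimate when the signal is expected to cross it. Higher-order AR, MA, and ARMA models follow the same pattern with more coefficients.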

[Figure B.11]

B.31.5 Decision Support System

The decision support system is a data processing mode that emphasizes user friendliness and specific query, reporting, and analysis capabilities. The research on decision support systems (DSSs) started in the late 1980s. The concept of decision-making systems originated from the First Academic Conference of the China Automatic Academy for Systems Engineering in 1979 [201]. DSS was subsequently researched and applied in many areas, such as marketing, military artificial intelligence (AI), industrial planning and management, and electric network operation. Figure B.12 shows the trend of DSS publications. However, few publications on DSS addressed PHM.

[Figure B.12: Publications on DSS. Horizontal axis: Year (1980-2010).]

B.31.6 Data Fusion

Data fusion is a process of using cooperative or competitive information to reach a more confident decision about both diagnostics and prognostics. It plays a key role in advanced prognostics, generating useful features, combining features, and incorporating model-based information. Data can be fused at different levels for PHM. (1) In sensor-level fusion, multiple sensors measuring correlated parameters are combined. (2) Feature-level fusion combines features intelligently to obtain the best possible diagnostic information. (3) Decision-level fusion combines experience-based information with signal-based information.

Publications on data fusion in China started in the late 1980s. The concept of data fusion was first discussed at the Second Academic Conference of the International Federation of Automatic Control (IFAC) for the Application of Artificial Intelligence in Real Time Control in 1989 [202]. Since then, much research on data fusion has been performed. Figure B.13 shows the trend of publications on data fusion. To date, research and applications on data fusion focus principally on multi-sensor-level fusion and feature-level fusion [203-205]. The algorithms used include the Kalman filter, the D-S evidence theory, genetic algorithms (GA) [206], Bayesian detection, back-propagation artificial neural network (BP ANN), SVM, fuzzy induction, optimal linear data fusion, wavelet analysis [207], RLS [208], and others.

[Figure B.13: Publications on data fusion.]
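Sensor-level fusion, the first of the three levels described in this section, can be sketched in a few lines. The example below assumes several sensors measure the same quantity with independent, unbiased errors of known variance, and combines them by inverse-variance weighting. This is one standard linear fusion rule and coincides with the Kalman filter measurement update for a static scalar state; it is offered as an illustration, not as the method of any cited paper, and the function name `fuse_measurements` is hypothetical.

```python
def fuse_measurements(readings):
    """Fuse independent measurements of one quantity.

    readings: list of (value, variance) pairs, one per sensor, with
    every variance strictly positive.
    Returns (fused_value, fused_variance). Each sensor is weighted by
    the inverse of its variance, so more precise sensors count more,
    and the fused variance is never larger than the smallest input
    variance."""
    if not readings:
        raise ValueError("at least one reading is required")
    weights = []
    for _, variance in readings:
        if variance <= 0:
            raise ValueError("variances must be positive")
        weights.append(1.0 / variance)
    total_weight = sum(weights)
    fused_value = sum(w * value
                      for w, (value, _) in zip(weights, readings)) / total_weight
    fused_variance = 1.0 / total_weight
    return fused_value, fused_variance
```

With two equally precise sensors the result is the plain average with half the variance; with unequal sensors the estimate is pulled toward the more precise one. Feature-level and decision-level fusion operate on extracted features and on diagnostic conclusions, respectively, and generally need richer methods such as the D-S evidence theory listed above.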

B.31.7 Application Areas of Diagnostics and Prognostics in China

Most research in health monitoring, diagnostics, or prognostics in China was performed in mechanical and electromechanical areas. Examples included research on the transformer [209], automobile engine [210], aeroengine [211], locomotive running gear [212], caterpillar crane [213], coal shearer [214], centrifugal pump [215], rolling element bearing [216], hydropower unit [217], turbine generator set [218], nuclear power plant [219], boiler [220], bearing [221], fire control system [222], and gun [223].

For electronic products, there were also some publications on diagnostics or prognostics, but they were much fewer than those for mechanical or electromechanical products. This situation is shown in Figure B.14. Examples include articles on the electronic circuit and system by BIT [224], self-test and fault diagnostics for microprocessor-based instruments and equipment [225], computer diagnostics [226], digital circuit diagnostics [227], thyristor trouble detection in power electronic equipment by waveform analysis [228], and PWB diagnostics by infrared technologies [229]. Almost all these publications used diagnostics methods; many used statistical approaches and only a few relied on PoF models.

[Figure B.14: Publications on failure diagnostics and prognostics for electronic products in China. Horizontal axis: Year (1985-2010).]

B.31.8 Current PHM Opportunities and Research in China

As noted above, although there has been considerable research effort on health monitoring, diagnostics, and prognostics in China, the concept of modern PHM is still new there [127-139]. The Chinese government announced its plan to build large aircraft and listed this as one of its important mid- to long-term development plans [230]. However, this project consumes considerable resources in time, money, and staff. Conserving limited resources requires proper operational and support (O&S) management, sensitive and accurate fault detection, timely inspection, efficient logistics, and high-quality design, which are the major advantages of PHM. The project thus brings opportunities for the application and development of PHM in the aviation industry. This potential for the aviation industry will also boost the development of relevant electronics and manufacturing industries and drive the application of PHM in these industries. A potential market exists for new sensor types and mature software algorithms that can process signal data speedily, and these are of interest to many companies. Publications on prognostics from the Total Chinese Journal Database originate primarily in universities, as shown in Figure B.15.

[Figure B.15: Recent sources of research on prognostics.]

In the area of prognostics for electronics, the Failure Analysis and Physics Research Laboratory (FAPRL) in the Department of Project Systems Engineering at Beijing University of Aeronautics and Astronautics has published papers on basic approaches and applications of PHM [140-154]. Publication topics include the PHM flow chart and its examples for electronic products, hardware and software prototypes of wireless data acquisition systems, the technology framework of PHM for electronic products, and failure mechanism modeling [including plated-through hole (PTH) modeling, ESD failure modeling, solder joint reliability modeling, heat exchange modeling, hot-carrier effect modeling, and TDDB physics modeling]. FAPRL made its PHM plan for the next several years with a roadmap, shown in Figure B.16. The main PHM research at FAPRL will focus on sensors (chemistry, radiation, and specialty sensors), PHM software tools, electromechanical PHM, and so on. PHM research on aeroengines will also be conducted [231].

[Figure B.16: The roadmap of PHM technologies in the FAPRL, 2006-2010. Hardware items: wireless network of temperature and humidity monitoring; electric load monitoring/BIT network; PHM sensor network and prototype combined with products; linear acceleration/angular acceleration/inertia MEMS; initial synthesis for electromechanical PHM; radiation sensor; chemistry sensor; digital network for specialty sensors. Software items: PHM general software platform design; life prediction based on current theories; PHM software tools.]

About 8% of the publications on prognostics came from research institutions such as the China Aero-Polytechnology Establishment (CAPE), the China Aviation Wireless Electronic Institute, and the China Development and Research Center of Aviation Industry. Some Chinese industries or companies are also conducting research on prognostics, but they tend not to publish papers. Figure B.15 shows that only about 9% of the publications came from industries or companies. These companies include Emerson Network Power Co., Ltd. [232], Dongfeng Automobile Co., Ltd. [233], Mayong Power Supply Co., Ltd., Guang Dong [234], and Shenzhen Total-Automation System Co., Ltd. [235].

The investment in PHM by the Chinese government seems to have grown appreciably. This can be seen from publications on 863 projects, launched by the Chinese government in 1986 to enhance overall capability in research and development on high technologies [236]. Figure B.17 shows this trend. In these publications, 97% of the projects were centered in universities and research institutions.

[Figure B.17: Publications on 863 projects on health monitoring, diagnostics, and prognostics in China. Horizontal axis: Year (1990-2010).]

B.31.9 Barriers to Implementing Prognostics in China

Several barriers impede the implementation of prognostics in China.

1. Developing high-performance sensors: Monitoring sensors must be more stable than the items they monitor. Another challenge comes from the attempt to make sensors with integrated functions while maintaining small size and high data processing speed.

2. Prognostics research approaches: For statistical approaches, the barriers are the high cost of samples and experiment time. For PoF approaches, the barriers are the difficulty in characterizing and verifying failure mechanisms and the unavailability of high-precision, reasonably priced instruments and apparatus. This prohibitive situation largely results from the overall low technology level of the Chinese manufacturing industry and the inconsistency of Chinese products. In addition, the uncertainties in the experiment samples make failure mechanism analysis and experimental design difficult.

3. Application of prognostics: The lack of information systems in many industries in China poses a huge barrier to the application of prognostics for a usage-based approach. Another point limiting the application of prognostics is the scarcity of integrated PHM software tools.

4. Lack of collaboration between research institutions and industries: Research is often confined to universities or research institutions, where researchers focus more on theoretical frameworks than on practical applications. Although many industries or companies need prognostics to improve the reliability of their products, they are unable to perform PHM research independently and cannot cooperate with universities effectively. The root cause of this problem is that market competition in China is still not strong enough to drive industries to willingly sponsor research in universities or research institutions, unless they encounter problems they cannot tackle and have to resort to the expertise of research institutions.

5. Lack of recognition of the importance of PHM: In general, Chinese academia understands the importance and advantages of PHM, but most industries do not. Government decision-makers’ lack of a full understanding of PHM makes it difficult for research institutions to obtain enough government support to conduct PHM research.

6. Lack of originality: Chinese industry as a whole remains in the period of manufacturing rather than design, so it lacks creative ability.

B.31.10 Summary and Conclusions

From the mid-1980s to the early 1990s, the research on PHM increased only slightly, but since the mid-1990s, PHM research has been rapidly increasing. However, new concepts in the field always originated in foreign countries, and domestic researchers mainly focused on applications and translations of foreign publications. In terms of research areas, large mechanical and electromechanical products are still the focus, while electronic products receive less attention. However, based on OSA technological levels, from data acquisition to decision making, research has been conducted on many kinds of hardware and software, which provides a strong basis for PHM. Many Chinese believe that PHM will enjoy a boom period in the near future because of the current prosperity of the Chinese economy. A close collaboration between academia and industries will be needed to accelerate the research and application of PHM.

Future research on and applications of PHM in China will mainly focus on (1) high-performance sensors and new, advanced sensors for anomaly detection; (2) data fusion and feature extraction technologies on different levels, including multiple sensor fusion, multisource information fusion, and long-distance data fusion based on networks; (3) embedded real-time health monitoring, diagnostics, and prognostics; (4) algorithms to boost the performance of diagnostics and prognostics; (5) development of PoF models; (6) research on decision architecture and decision-making models based on condition monitoring and prognostics and on aptitude maintenance systems; (7) evaluation technologies and products to improve the accuracy of prognostics; and (8) integration of advanced prognostic systems.

B.31.11 References

1. N. Huang and Z. Zhongqin, “Verification on Vibration Frequency of Engine BHA on X Jet Plane,” Journal of Beijing University of Aeronautics and Astronautics, 1956.

2. Z. G. Wu, “Stress Test for High Pressure Vessel Demolition Blast Trial,” Journal of Tsinghua University (Science and Technology), No. 1, 1959.

3. H. N. Liu, “Research on Transition Process of Slide Valve Hydraulic Follow-up System for Machine Usage,” Journal of Dalian University of Technology, No. 4, 1962.

4. B. L. Sun and X. S. Lin, “Research and Development on Pressure Sensor for the Usage of Physiology Experiment,” Progress in Physiological Sciences, No. 1, 1963.

5. P. Z. Jiang, “Investigation of the Rotating Stall on Isolated Rotor of an Axial Flow Compressor,” Journal of Xi’an Jiaotong University, No. 4, 1964.

6. X. Z. Guo, “Parameter Calculation and Graphic Method for Palstance Gyroscope,” Journal of Nanjing University of Aeronautics & Astronautics, No. 3, 1964.

7. A Group of Dalian Engineering College, “Stress-Type Flow Speed Sensor and the Method of Its Determining Speed,” Journal of Dalian University of Technology, No. 1, 1965.

8. Z. D. Wang, “Application of a Ball Flow Speed Sensor and Its Measurement in Wave Flow Speed,” Journal of Dalian University of Technology, 1966.

9. Z. D. Zhang, “Little Displacement Sensor Using Stress Resistance Wire,” Modular Machine Tool & Automatic Manufacturing Technique, No. 4, 1965.

10. “An Instrument of Test Slope Using Hydraulic Pendulum,” Exploration Engineering (Drilling & Tunneling), No. 8, 1965.

11. J. J. Bells and Z. J. Chen, “Indicator Instrument for Crane Moment,” Hoisting and Conveying Machinery, No. 1, 1965.

12. A Dynamic Poise Group of Shanghai University of Commutation, “Discussion of Several Problems of Precise Dynamic Poise Machine,” Journal of Shanghai Jiaotong University, No. 2, 1965.

13. Z. Q. Tang and Z. Y. Hong, “Inherent Vibration Feature of Cone Shell,” Acta Mechanica Sinica, No. 1, 1966.

14. “Research on Vibration Sensor of Lower Frequency Turning,” Tool Engineering, No. 2, 1972.

15. Y. M. Liang and S. T. Chen, “Manufacturing of Resistance Strain Gages without Transverse Sensitivity,” Acta Mechanica Sinica, No. 1, 1966.

16. L. S. Zhou, “Multiple Stress Sensors,” Construction Machinery and Equipment, No. 4, 1966.

17. W. Y. Song and B. R. Wu, “Error and Its Compensation Method of Temperature Sensor Using in the Sea,” Oceanologia Et Limnologia Sinica, No. 2, 1966.

18. Shanghai Measurement and Tool Bit Factory, “Introduction of Raster Pattern Sensor,” Tool Engineering, No. 1, 1971.

19. “Gas-Electric Converter,” Instrument Technique and Sensor, No. S1, 1971.

20. “Transitional Temperature Sensor of High Temperature and High Speed Gas Flow,” Journal of Functional Materials, No. 1, 1975.

21. “Jet Current Sensor,” Instrument Technique and Sensor, No. S1, 1971.

22. J. C. Xu, “Integrated Array of Silicon Light-Electric Detector Using Image Sensor,” Microelectronic Technology, No. 3, 1971.

23. “A New Concept of Minimum Measurement,” Instrument Technique and Sensor, No. 5, 1971.

24. “Low Temperature and High Sensitive Thermal Flux Sensor with Temperature Compensation,” Instrument Technique and Sensor, No. 6, 1971.

25. F. Xu, “Silicon Light-Electric Automatic Scanning Image Pickup Tube with 256 Diodes,” Microelectronic Technology, No. 1, 1972.

26. “Accelerated Sensor Working in 650°C,” Instrument Technique and Sensor, No. 1, 1972.

27. “High Temperature Accelerated Sensor,” Instrument Technique and Sensor, No. 5, 1972.

28. “Semiconductor Stress Chip,” Instrument Technique and Sensor, No. 5, 1972.

29. W. Adler and X. Wang, “Star-Type Precise Digital Optics Sensor,” Acta Photonica Sinica, No. 2, 1973.

Appendix B

261

30. Luo, “Solid Film Sensor,” Analytical Instrumentation, No. 1 3 1. A Research Group of Magnetic and Elastic Material in Chongqing Instrument Institute, “Research and Development on Electromagnetic Sensor with Magnetic and Elastic Material,” Journal of Functional Materials, No. Z 1, 1973. 32. C. Ren, -‘Thermal-electric Detector,” Laser & Infrared, No. 1 1, 1975. 33. J. R. Wang, “Electronic Sensor,” Auto Electric Parts, No. 2, 1978. 34. N. Xin, “Research and Development on Infrared Detector for Distance Feeling Sensor in the Future,” Laser & Infrared, No. 3, 1977. 35. Z. G. Tan and S. C. Chen, “The Analysis and Parameter Selection on the Fundamentals of Eddy Current Transducers,” Chinese Journal of Scientific Instrument, No. 1, 1980. 36. C. L. Shin, “Fiber Optic Sensor,” Chinese Journal of Lasers, No. Z1, 1980. 37. S. D. Pu, “Optic Sensor-- Two Dimensional Optic Sensor,” Laser Technology, No. 1, 1979. 38. T. Nitta, S. Hayakawa, and D. Luo, “Ceramics Humidity Sensor,” Piezoelectric & Acoustooptics, No. 3, 1981. 39. P. M. 6 a 6 a e ~and J. H. Zhang, “High Temperature Hull Gallium Arsenide Sensor,” Journal of Functional Materials, No. Z1, 1973. 40. “Optic No-contact Sensor,” Grinder and Grinding, No. 1, 1974. 41. C. Q. Zhang, F. Zhou, B. J. Qu, T. L. Ren, and L.T. Liu. “Study of a Novel Biosensor Based on GMR Effect,” Micronanoelectronic Technology, No. Z1, 2007. 42. G. X. Wang, Y. F. Li, W. X. Wang, S. Y. Ma, W. L. Jia, and H. S. Wang, “Advances in the Electrochemical Deoxyribonucleic Acid Biosensors,” Journal of Liaocheng University (Natural Science Edition), No. 2,2007. 43. X. Chen, H. D. Li, J. X. Wang, and Y. N. Liu, ”Investigation of Nanomaterial-based Biosensors,” World Sci-Tech R & D, No. 5, 2007. 44. R. H. Hou, H. S. Shi and S. J. Yang, “An Adaptive Cooperative Communication Routing Protocol for Wireless Sensor Networks,” Journal of Electronics & Information Technology, No. 10, 2007. 45. Y. F. Liang, S. Hao, and S. D. 
Zhang, “Research and Realization on Long Distance Monitoring System Based on Wireless Sensor Network,” Application of Electronic Techniaue, No. 6.2007. 46. L. Yao, Z. B. Zhao, N. An, and W. J. Wen, “Data Acquisition and Processing of Wireless Sensor Networks in Gateway,” Chinese Journal of Scientific Instrument, No. S2,2006.

47. G. H. Hui, L. L. Wu, M. Pan, Y. Q. Chen, T. Li, and X. B. Zhang, “Carbon Nanotubes Gas Sensor Based on Corona Discharge,” Chinese Journal of Analytical Chemistrv, No. 12, 2006. 48. T. P. Lv and C. Dai, “Application and Research Trend of nitrogen oxide Sensor in Life Science,” West China Journal of Pharmaceutical Sciences, No. 5,2007.

268

Prognostics and Health Management of Electronics

49. L. X. Du, G. B. Yang and J. P. Xu, “Application of the Intelligent Temperature Sensor DS18B20 in Automatic Tobacco-baking Equipment,” Instrumentation Technology, No. 10, 2007.

50. B. Zhang, “Study on Distance Measuring Algorithm Suitable for Narrow-Band Ultrasonic Sensors,” Transducer and Microsystem Technologies, No. 10,2007.

5 1. G. Zhang and J. Li, “Research on High Temperature Superconductor Magnetic Sensor,” Ordnance Industry Automation, No. 9,2007. 52. L. Xiong, L. Wang and Y.L. Wang, “Recent Advances in Optochemical Sensors for the Detection of H-2 in Air,” Meteorological, Hvdroloaical and Marine Instruments, No. 3, 2007. 53. L. Y. Jia, “The Development and Application on A Photochemical Sensor,” Environmental Protection Science, No. 5, 2007. 54. Y. T. Cao, A. K. Xue and Y. S. Lin, “Research on the Implementation of Orientation Algorithm in Acoustic Sensor Network,” Computer Engineering and Applications, No. 8,2007.

55. M. L. Doelz and M. Q. Meng, “Data Transfer Technology of Linear Binary System,” Telecommunications Science, June 1958. 56. X. L. Li, G. F. Pan and Y.C. Suin, “Realization Of Wireless Data Transmission System Of A New Type Intelligent Gas Sensor,” Sensor World, No. 7, 2007. 57. Z. Z. Lu, “Design of Wireless Data Transmission Module with Band Rate SelfAdapting,” International Electronic Elements, No. 8,2007.

58. J. Jiang, “Development and Design on Wireless Data Transmission System,” Electronic Technology, No. 05, 2007. 59. J. Liu. Z. H. Xiao and X. H. Zhang, “Design of Intelligent Monitoring and Control System Based on WLAN,” Information of Micros Computer, No. 2, 2007. 60. H. Shi, “Research on Encryption Techniques in Network Data Transmission,” Science Information, No. S4, 2006. 61. S. H. Zhang, “The Research on Cryptographic Schemes,” ZTE Communications, No. 5, 2007. 62. C. Li, “Common Encryption Techniques in Data-Communication,’’ Guanqxi Communication Technology, No. 3, 2007. 63. Y. Z. Zhao, S. H. Pei, H. J. Wang and X. L. Yang, “Using the Ergodic Matrices over Finite Field to Construct the Dynamic Encryptor,” Journal of Chinese Computer Systems, No. 11, 2007. 64. S. J. Yan, Y. Z. Jiang, L. Jiao and J. C. Pan, “A Gray Image Encryption Arithmetic Base on Chaotic Sequences,” Xuzhou Institute of Technology, No. 6,2007. 65. W. P. Ding, “Encryption Algorithm Based on Discrete Hopfield Neural Network,” Journal of Hunan Institute of Science and Technology (Natural Sciences), No. 3,2007. 66. J. X. Qiao, “Research on XML Encryption and XML Signature Technology,” Information Technologv, No. 7, 2007. 67. T. Li, D. J. Wang and F. L. Liu, “An Improved BAVQ Algorithm for SAR Raw Data Compression,” Modern Radar, No. 9, 2007.

Appendix B

269

68. L. H. Su, J. J. Yang, and J. W. Wan, “Evaluation on the Compression Performance of Hyperspectral Data Using Vector Quantization,” Computer Engineering & Science, No. 9, 2007. 69. Z. H. Zhao and H. J. Mu, “ECG Data Compressing Based on the Wavelet Neural Network,” Techniques of Automation and Applications, No. 7,2007. 70. G. Q. Wu, H. Chen, and X. W. Xu, “Scientific Data Compression Method Based on Optimized Interpolate Prediction,” Computer Science, No. 8, 2007. 71. J. J. Wang, Y. P. Lin, and S. W. Zhou, “Distributed Data Compression algorithm for Wireless Sensor Networks,” Computer Engineering and Applications, No. 27.2007. 72. J. Zhang, Y. Du, D. T. Lu, and D. L. Li, “Chromatogram Data Compression Based on Linear Fitting of Time Series,” Journal of Computer Applications, No. 7, 2007. 73. A. L. Ding, G. M. Shi, N. Zhang, and L. C. Jiao, “Signal Compression and Design of Wavelet Based on Waveform Matching,” Journal of Electronics & Information Technology, No. 4,2007. 74. Q. F. Zhang, “Application on Long-distance Education for Data Compress Techniques,” China Water Transport (Theom Edition), No. 6,2006. 75. L. Yang, “Key Technology to the Storage of Basic Geographic Information Data,” Geospatial Information, No. 5, 2007. 76. F. Gao, Z. X. Tang, Z. Q. Li, and L. R. Zhu, “Intelligence Data Acquisition and Analysis System,” Instrument Technique and Sensor, No. 9,2007. 77. W. P. Wu and H. Fan, “Design the SAT Signal Networking Monitoring & Management System Based on LabVIEW and SQL Server,” Modern Electronics Technique, No. 17, 2007. 78. J. Wu and P. T. Wang, “Path Optimization in Undirected Network Based on Genetic Algorithm,” Journal of Tianiin Normal University (Natural Science Edition), No. 3, 2007. 79. Y. D. Liu, C. Q. Gao, M. W. Gao, and F. Li, ”Realizing High Density Optical Data Storage by Using Orbital Angular Momentum of Light Beam,” Acta Phvsica Sinica, No. 2, 2007. 80. W. J. Xie. J. Xu, and C. H. 
Wu, “A Chord-Based Optimization Method for Distributed Spatial Data Storage,” Journal of Computer Auplications, No. 3, 2007. 81. Y. F. Qiao, H. Ni, L. Pan, C. H. Zhang, and J. L. Wang, “Implementation of High Speed Data Transport of Embedded Hardware Platform System,” Computer Engineering, No. 20, 2007. 82. L. F. Lin, “Application of PCI Bus Interface Technology in High Speed Data Transmission System,” Process Automation Instrumentation, No. S 1, 2007. 83. J. F. Yan, N. Wu, “High Speed DMA Data Transfer System Based on PCI Bus,” Journal of University of Electronic Science and Technology of China, No. 5,2007. 84. L. H. Liu and P. Li, “Realization on Long Distance Data Transmission System Based on LVDS Technology,” Electronic Component & Device Applications, No. 1,2007.

Prognostics and Health Management of Electronics

270

85. Y. Q. Hu. H. H. Jiang, and Q. Li. “Research on WLAN High-speed Data Transmission Techniques Based on DSSS-CCK,” China Water Transport (Theory Edition), No. 3, 2006. 86. L. H. Fang, “Study on Application of Data Transmission Radio in Distant Monitoring System,” Programmable Controller & Factory Automation, No. 10, 2006. 87. L. Wang, Y. B. Xu, and D. Y. Ma, Research of Remote Data Communication System Based on GPRS Technology,” Digital Communication World, No. 7,2007. I’

88. X. J. Zhang, C. H. Sun, and Y. P. Wang, “Long Distance Data Transmission System Based on 73M290 1 ,” Microcontrollers & Embedded Systems, No. 1,2007. 89. H. J. Gu and B. Qu, “Remote Intelligent Alarm System of IVC Based on DTMF,” Journal of Soochow University (Engineering Science Edition), No. 2,2007. 90. C. J. Liu and H . B. Huang, “The Optimizing Design on the Long-distance Inspect and Control System on Network Environment,” Microelectronics & Computer, No. 2, 2007. 91. H. W. Yao, L. B. Liu, G. Q. Liu and S. L. Han, “Plan and Design of Remote Service Center for Steam Turbine,” Turbine Technology, No. 2,2007. 92. “Coal Mine Institute of Science in Fu Shun Automatic Coal Mine Well,” Safety in Coal Mines, No. 2, 197 1. 93. “Monitoring Machines Using computers,” Modular Machine Tool & Automatic Manufacturing Technique, No. S2, 1972. 94. “High Temperature Accelerated Sensor,” Instrument Technique and Sensor, No. 5. 1972. 95. X. Xin, Probe, Microelectronics. 96. Z. M. Ge, “Temperature Measurement System for Large Electromechanical Rotor,” Electrical Measurement & Instrumentation, No. 1 1, 1973. 97. “Synthesis Fiber Monitor,” Synthetic Fiber in China, No. 4, 1974. 98. C. Ren, “Vacuum Deposit Optic Film Thickness Monitor,” Laser & Infrared, No. 1 , 1973. 99. “A New Method of Multipoint Monitoring of Vibration Generator,” Structure & Environment Engineering, No. 4, 1974. 100.L. Zong, “A New Type Refrigeration Equipment of Infrared Detection,” Laser & Infrared, No. 4, 1973.

10 1 .“Monitoring System for Weaving Industry,” Shanghai Textile Science & Technology, No. 02, 1978. 102. D. M. Shi, “Coal Industry Monitoring System Using Computer,” Coal Preparation Technology, No. 3, 1978. 103.X. T. Zhang, “MINOS Long Distance Monitoring System for England Coal Operation,” Industry and Mine Automation, No. 3, 1978. 104.C. J. Xu, “Monitoring System of Wireless Incepting and Channel Scan or Priority Channel,” Mobile Communications, No. 2, 1978.

Appendix B

27 1

105.M. L. Chen, “Application Software of SD-Electric Network In-time Monitoring Systems,” Automation of Electric Power Systems. No. 3, 1979. 106.X. R. Zhang, “Diesel Engine Condition Monitoring,” Locomotive & Rolling Stock Technology, No. 3, 1979. 107.C. M. Zhao, “The Computer Monitoring System for 500V Convert Station in Zhengzhou,” Central China Electric Power, No. 1, 1990. 108.B. Y. Tian, Y. X. Zu, “On-line Monitoring for Electricity Supply System FerroMagnetic Synchronous Vibration,” Hebei Electric Power, No. 1, 1990. 109.Z. F. Lin, Z. M. Fan, “Development and Application of Engine Baseline Equations,” Journal of Civil Aviation University of China, No. 4, 1992. 110.Z. F. Lin and Z. M. Fan, “A Study of Engine Condition Monitoring and Fault Diagnosis System,” Journal of Civil Aviation Universitv of China, No. 3, 1993. 11 1.Z. S. Xiao, “Electrostatically Supported Gyro Monitor System,” Journal of Chinese Inertial Technology, No. 2, 1992.

112,s. X. Li, “Application of Data Collecting and Monitor-Control System to Oilfield No., 5, 1992. Surface Engineering,” p 113.G. Huang and Y. F. Liu. “The Interrupt Processing in Real-Time Multitask Monitoring System,” Journal of Naniing University of Posts and Telecommunications, No. 4, 1993. 1 14.H. M. Li, “Multi-Channel Carrier Wave Monitoring and Alarm System,” Electronics Practice, No. 1, 1994.

115.M. C. Huang, X. Feng, and Y. L. Zhang, “Neural Network Approach to Liquid Rocket Engine Health Monitoring,” Journal of Aerospace Power, No. 4, 1993. 116.4. Fu and J. T. Hou, “Anti-interruption Problems of Intelligent Signal Power Monitoring System,” Xi‘an Railway Technolom, No. 4, 2006. 117.5. F. Wang and F. Lin, “Interconnect Technology Disguises on Electronic Equipment in Water and Power Station,” Automation for Water and Power Industry, No. 4, 2006. 1 18,s. J. Zhang, “Application of Intelligence Electric Monitor System in High-tension Distribution,” Programmable Controller & Factory Automation, No. 7, 2006.

119.B. Feng, E. J. Zhang, and rc’. Kui, “Study Wireless Monitoring Terminal Base on Embedded Operating System,” Power System Technology, No. S2,2006. 120.G. G. Yang, J. H. Zhang, and J. Li, “Design of the Long Distance Automation Monitoring System in an Oil Well,” Computer Amlications of Petroleum, No. 4, 2006. 121.W. F. Liu and A. L. Ye, “Research on Real-Time Long Distance Monitoring System for Lift Run Status,” Servo Control, No. 4, 2006. 122,s. T. Cai, C. G. Yuan, and Y. T. Du, “Remote Monitoring of Vehicle Fault Via CDMAKAN,” Programmable Controller I% Factow Automation, No. 8, 2006. 123.G. X. Zhou, L. Shi, and J. H. Han, “A Test Method for the Real-Time Embedded System with the Visual Simulate,” Journal of System Simulation, No. 12, 2006.

272

Prognostics and Health Management of Electronics

124.5. Z. Zhou, Z. J. Yu, “Application on Embedded View Monitoring System for Power Station Storing Energy in Xikou,” Automation of Hydropower Plants, No. 2,2006. 125.J. Zhao and Z. Z. Wang, “Study on Embed Video Surveillance Systems Over the Internet Protocol,” Chinese Journal of Scientific Instrument, No. S3,2006. 126.B. Z. Zhang and T. X. Zeng, “Advanced Prognostics and Health Management Technology,” Measurement & Control Technology. No. 1 1,2003. 127.P. Xu and R. Kang, “Research on Prognostic and Health Management (PHM) Technology,” Measurement & Control Technology. No. 12,2004. 128.5. Tian and T. D. Zhao, “Research on PHM Technologies for Mechanical Products,” Proceedings of Mechanical Reliabilitv Academv in China, 2004. 129.S. K. Zeng, M. G. Pecht, and J. Wu, “Status and Perspectives of Prognostics and Health Management Technologies,” Acta Aeronautica Et Astronautica Sinica. No. 5,2005. 130.Y. J. Jiang, “Research of Intelligent Prognostics and Health-Management,’’ System for Machinery Safety and Maintenance, Metallurgical Equipment, No. 5,2005. 131.2. G. Mu, H. F. Hu, and N. Q. Hu, “Design of Prognostic and Health Management System for Weapon Equipment,” Ordnance Industrv Automation. No. 3,2006. 132.H. Jia, H. Q. Xu, and Q. L. Qu, “PHM-based Research on Maintenance Support for Ship Equipment,” Electronic Instrumentation Customer, No. 3, 2006. 133.S. N. Zhang, J. S. Xie, and R. Kang, “Framework of Prognostic and Health Monitoring Technologies of Electronic Products,” Measurement & Control Technology, No. 2, 2007. 134.X. W. He, Z. J. Zeng, H. Jia, and H. Q. Xu, “Research on PHM Technology Application in Sensor Networks of Anti-ship Missile’s Maintenance Support,” Electronic Instrumentation Customer, No. 2,2007. 135.Y. L. Zhang and Y. Z. He, “Avionic Maintenance Information Management System,” Avionics Technology, No. 2, 2007. 136.A. J. Li, W. G. Zhang, and J. Tan, “Survey on Aircraft Health Management Technology,” Electronics Optics & Control, No. 3, 2007. 
137.X. Liang, X. S. Li, L. Zhang, and J. S. Yu, “Survey of Fault Prognostics Supporting Condition Based Maintenance,” Measurement & Control Technology, No. 6,2007. 138.B. Sun, R. Kang, and J. S. Xie, “The Application Technique of Prognostic and Health Management System,” Measurement & Control Technologv, No .7,2007. 139.B. Sun, R. Kang, and J. S. Xie, “Sensor Application and Data Transmission Technology in PHM,” Measurement & Control Technology, No. 7,2007. 140.J. S. Xie, “The Modes of Failure Model and the Basis Methods and Technique for RealTime Prognostics,” Proceedings of the 1 1th Reliabilitv Phvsics Academic Conference, Wenzhou, Zhejiang, October 2005, pp.283-290. 141.J. H. Ma, J. S. Xie, and R. Kang, “Flow Chart and Example of PHM for Electronic Products,” Proceedings of the 10th Reliabilitv Engineering Academic Conference, July 2006, pp.325-333. 142.5. S. Xie, R. Kang, Y. Zhang, and G. Guo “A PTH Reliability Model Considering Barrel Stress Distributions and Multiple PTHs in a PWB,” Proceedings of the 44th

Appendix B

213

Annual International Reliability Physics Symuosium (IRPS 2006) (EUISTP), March 2006, pp.256-265. 143.Y. Zhao and J. S. Xie, “Physics of Failure Model of General Electromigration for Semiconductor parts,” Proceedings of the 1 1th Reliability Physics Academic Conference, October 2005, October 2005, pp.77-83. 144.B. Sun, Y. Zhao, W. Huang, J. S. Xie, R. Kang, and R. Lv, “Example Research on PHM Methods for Electronic Products,” Systems Engineering and Electronic Technology, Vol. 29, No. 6, June 2007. pp.1012-1016. 145.B. Sun, Y . Zhao, W. Huang, and J. S. Xie, “Experimental Verification on PHM Example for Electronic Products,” Proceedings of the 10th Reliability Engineering Academic Conference, July 18-21, 2006. pp.334-341. 146.B. Sun, R. Kang, and J. S. Xie, “A Review of Research and Application Status on PHM,” System Engineering and Electronic Technology, 2007. 147.W. Huang, J. J. He, S. N. Zhang, and J. S. Xie, “Establishment on PHM Wireless Temperature and Humidity Acquisition Network,” Measurement & Control Technology, No. 10,2007. 148,s. N. Zhang, J. S. Xie, and R. Kang, “The Framework of PHM Technology for Electronic Products,” Measurement & Control Technology, Vol. 26, No.2 2007, pp. 1216, 18. 149.B. Sun, S. N. Zhang, J. S. Xie, and Y. Zhang, “Design Parameters Sensitive Analysis of PWB PTH Fatigue Life,” Electronic Parts and Material, Vo1.25, No. 9,2006, pp.60-63. 150.J. S. Xie, J. Shi, and X. Z. Wu, “A Case Study of Field Life Prediction and Reliability Assessment of Electronics Assemblies,” The 2007 IEEE Workshop on Accelerated Stress Testing & Reliability (ASTR 20072, Washington, DC, USA, October 3 1November 2,2007. 151.5. S. Xie, J. J. He, and P. McCluskey, “Aluminum Migration on Die Surfaces under High-Intensity Electrical Fields and the Migration-Induced Failures of a Power Transistor,” Proceedings of the 40th International Symuosium on Microelectronics (IMAPS 2007) (ISTP), 2007. 152.5. S. Xie, Y. J. Huo, Y. Zhang and M. 
Freda, “Effect of PWB Design Factors and Glass Transition Temperature on PTH Reliability,” Proceedings of the 39th International Symposium on Microelectronics (IMAPS 2006) (ISTP), 2006, pp.89 1-898. 153.T. B. Bashkow, H. Y. Zhou, “A Program System for Inspecting and Diagnosis on Machines,” Journal of Comuuter Research and Develoument,” No. 9, 1964. 154.Dept. of Mechanical Engineering at Huazhong Academy, “Reliability on Machine,” No. 6, 1979.

w,

155.S. L. Liu, X. R. Wu, H. F. Huang, J. Z. Liu, C. F. Ding, and X. P. Han, “Fatigue Life Prediction of Tenons in Aero-engine Cold-Section Discs Based on Small Crack Growth,” Journal of AerosDace Power, April 1999. 156.X. P. Wang, B. Z. Zhou, and X. G. Yang. “Multi-axial Fatigue Life Prediction of Fan Disk Using Critical Plane Method,” Journal of Shenyanp Institute of Aeronautical EngineerinR, April 2004.

274

Prognostics and Health Management of Electronics

157.F. X. Zhao, H. Q. Shi, and Z. X. Geng, “General Life Model Adapted to Both Stress Fatigue and Strain Fatigue,” Journal of Aerospace Power, January 2003. 158.L. J. Chen and L. Y . Xie, “Prediction of High-Temperature Low-Cycle Fatigue Life of Aero Engine’s Turbine Blades at Low-Pressure Stage,” Journal of Northeastern University (Natural Science), July 2005. 159.X. Y. Yang, X. J. Yan, F. X. Zhao, and L.W. Dong, “Low Cycle Fatigue Life of an Aero-engine Gas Turbine Disk,” Journal of Mechanical Strength, No. S 1, 2004. 160.L. J. Chen, T. Q. Jiang, and L. Y. Xie, “Overview on Life Prediction Methods of Turbine Blade for Creep/Fatigue Interaction’ ’, Aeronautical Manufacturing Technologv, December 2004. 161.2. C. Hu, “Probability Transfer of Rule Reasoning in Expert System,” Computer Engineering and Design, No. 1, 1984. 162.G. X. Zhou, “Pattern Recognition Expert System Based on Feature and Knowledge,” Robot, No. 2, 1988. 163.W. C. Wang, “A Plan for Artificial Intelligent war Indicating System in Ship,” Journal of Naval Universitv of Engineering, No. 4, 1985. 164.P. G. Wang, C. F. Wei, Y. Z. Li, and J. Wang, “Design of Expert System for Diagnosing Trouble of Compressor,” Compressor, Blower & Fan Technolom, No. 5, 2007. 165.G. Chen, L. Q. Song, L. B. Chen, and Z. G. Zhang, “Knowledge Acquisition of Aeroengine Spectrometric Oil Diagnosis Expert System Using Rough Set Theory,” Mechanical Science and Technology for Aerospace Engineering, No. 7,2007. 166.P. Song, W. H. Su, Y. Q. Pei, and J. T. Tao, “Study on Fuzzy Diagnosis Expert System for Diesel Engine,” Chinese Internal Combustion Engine Engineering, No. 4,2007. 167.Q. W. Ye, L. B. Cao, T. J. Zhou, “The Diagnostics Expert System of Gun Based on Fuzzy Logic,” Journal of Ordnance Engineering College, No. 2, 1990. 168.Y. J. Chen, “The Application of Fuzzy Logic to the Computer-Based Medical Diagnosis System,” Journal of National University of Defense Technology, No. 2, 1987. 169.F. Q. Gao and Z. Z. 
Tan, “The Passive BDiINS Integrated Navigation Fuzzy Adaptive Algorithm,” Journal of Astronautics, No. 5, 2007. 170.C. G. Li, “The Realization Process Study of Temperature Compensation in Pressure Sensors Based on BP NN,” Journal of Liaonina Provincial College of Communications, No. 3,2007. 171.4. M. Guo, Y. F. Jia, and J. Wang, “Load Moment Monitor System of Tower Crane Based on Neural Network,” Instrument Technique and Sensor, No. 10,2007. 172.5. C. Fu, H. Li, and Y . Guo, “Application of Neural Network in Coal-Mining Machine Fault Diagnosis Expert System,” Journal of Heilongiiang Institute of Science and Technologv, No. 5,2007. 173.C. Wang, X. B. Sun, G. J. Chen, and Y. L. Xie, “Fault Diagnosis in Analog Circuits Based on Neural Network Information Fusion Technique,” Journal of Circuits and Systems, No. 5,2007.

Appendix B

215

174.F. W. Liu, Y. Li, and P. Bai, “Intelligent Diagnosis to Engine Based on Neural Networks,” Control Engineering of China, No. S2,2006. 175.M. Y. Liao, “Drilling State Monitoring and Fault Diagnosis Based on Integrating Neural Network and Evidence Theory,” Journal of China University of Petroleum (Edition of Natural Science), No. 5,2007. 176.G. L. Hu, “Fault Diagnosis Based on the D-S Evidence Theory of Information Fusion,” Computer & Digital Engineering, No. 8, 2007. 177.Y. G. Liu, W. B. Bai, and B. G. Xu, “Application of Evidential Theory for Rotation Machine Fault Diagnosis,” Computer Measurement & Control, No. 2,2007. 178.X. D. Ma and W. Cao, “Application on Sensors Priority Distribution Using A Genetic Algorithm,” Sichuan Architecture, No. 5, 2007. 179.F. S. Wee and Z. X. Han, “Fault Section Estimation in Power Systems Using Genetic Algorithm and Simulated Annealing,” Chinese Society for Electrical Engineering, No. 3, 1994. 180.T. Xie and Y. L. Zhang, “Max Min Principle Based Selection for the Optimal Feature Parameters in Fault Diagnosis Using Genetic Algorithms,” Journal of National University of Defense Technology, No. 2, 1998. 181.H. L. Li, R. H. Huang, G. M. Han, R. J. Fu, “Wavelet Analysis of Matching Pursuit Based on Genetic Algorithm and Application in Non-Stationary Fault Signals,” Journal of Mechanical Strength, No. 3, 2001. 182.J. S. Dai, S. F. Song, and D. C. Dong, “The Use of Grey Theory in the Condition Monitoring and Fault Diagnosis of Large Rotating Machinery,” Journal of Vibration Enpineering, No. 4, 1993. 183.C. M. Jin, Y. Qiu, and Z. S. Duan, “Grey System Applications in Fault Diagnosis and Forecasting for Rotating Machinery,” Chinese Journal of Applied Mechanics, No. 3, 2000. 184.5. F. Liu, “The Application of Support Vector Machines in Pattern Recognition and Regression Models,” Journal of Henan Institute of Science and Technology (Natural Sciences Edition), No. 4,2007. 185.J. H. Xiao, K. Q. Fan, J. P. Wu, and S. Z. 
Yang, “A Study on SVM for Fault Diagnosis,” Journal of Vibration, Measurement & Diagnosis, No. 4,2001. 186.Y. J. Zhai, D. F. Wang, and P. Han, “Multi-Class Support Vector Machines for Fault Diagnosis of Turbogenerator Unit,” PowerEngineerinn, No. 5, 2003. 187.X. K. Wei, C. Lu, C. Wang, J. M. Lu, and Y. H. Li, “Applications of Support Vector Machines to Aeroengine Fault Diagnosis,” Journal of Aerospace Power, No. 6,2004. 188.5. S. Cheng, D. J. Yu, and Y. Yang, “Fault Diagnosis of Roller Bearings Based on EMD and SVM,” Journal of Aerospace Power, No. 3,2006. 189.W. S. Dai, “Using Time Series Analysis Method to Diagnose the Malfunction of Gear Box,” Journal of Shanghai Second Polvtechnic University, No. 1, 1988. 190.Z. X. Shen, Y. C. Su, and Z. L. Sun, “The Application of Time-Series Modeling Methods in Diesel Engine Fault Diagnosis,” Transactions of Csice, No. 4, 1989.

216

Prognostics and Health Management of Electronics

191.Y. H. Wei and G. K. He, “Application on Bearing Life Prediction Based on Time Series Analysis,” Bearing, No. 5, 1994. 192.F. Cao, D. J. Cui, and Z. P. Zhang, “A Real-Time Fault Detection Algorithm Based on Time Series Analysis for LRE,” Journal of Propulsion Technology, No. 1, 1996. 193.Y. T. Zhang, G. Z. Li, and G. Q. Ren, “Study on the Fault Diagnosis Methods of Diesel Engine Based on Time Series Model,” Lubrication Engineering, No. 12,2006. 194.L. Jiang, S. Y. Jiang, and J. D. Li, “Application on Deform Prognostics Based on Multiresolution Orthogonal Multi -wavelet NN,” Popular Science & Technology, No. 4, 2007. 195.W. H. Geng, Q. Sun, C. X. Zhang, and X. Y. Chen, “Short-Term Load Forecasting Based on Improved Fuzzy Neural Network,” Proceedings of the Chinese Society of Universities for Electric Power System and Automation, No. 5, 2007. 196.5. H. Xi and M. Han, “Prediction of Multivariate Time Series Based on Principal Component Analysis and Neural Networks,” Control Theory & Applications, No. 5, 2007. 197.M. C. Ren, “Speed Sensor-less DTC System of Neural Network Based on Genetic Algorithm,” Automation & Instrumentation, No. 5, 2007. 198.Y. S. Zhao and B. Y. Li, “Design on Bearing Fault Diagnostic System Based on Hybrid Intelligent Algorithm,” Journal of Anshan University of Science and Technology, No. 4, 2007. 199.C. C. Guo and J. Yu, “Fusion Research on NN and Many Intelligent Methods,” Software Guide, No. 9,2007. 200.5. Z. Zhang and G. L. Shan, “Information Fusion Fault Diagnosis Technology Based on Combination of SVM and Evidential Theory,” Electronics Optics & Control, No. 4, 2007. 201.5. D. C. Little and M. Y. Zhao, “Operational Research and Management Science,” Nature Magazine, No. 1, 1980. 202,s. Q. Su, “Reporting on the Second Academic Conference of the International federation of Automatic Control (IFAC) for the Application of Artificial Intelligence in Real Time Control,” Control and Decision, Shen Yang, No. 6, 1989. 203.X. L. Li, H. Q. Jiang, X. 
J. Lv, and B. Yu, “Application and Research of Multi-sensor Information Integration on Friction Welder,” Electric Welding Machine, No. 10, 2007. 204.5. F. Li, K. Yuan, Y. L. Wang, and W. Zou, “Design of a Fire Detection System Based on Multi-Sensor Data Fusion,” Chinese High Technolow Letters, No. 10, 2007. 205.F. W. Gao, G. X. Liu, L. Wang, and J. Zhang, “Multi-sensor Fusion Based on the Support Matrix and Optimal Weight,” Journal of Proiectiles, Rockets, Missiles and Guidance, No. 4,2007. 206.T. J. Wang, Z. Yang, and H. F. Hu, “An Adaptive Data Fusion Routing Algorithm Based on Genetic Algorithm for Wireless Sensor Networks,” Journal of Electronics & Information Technology, No. 9,2007. 207.S. Y. Liu, H. X. Zhang, and W. P. Luo, “Data Fusion Algorithm Based on Modified Wavelet multi-resolution Analysis,” Transducer and Microsystem Technologies, No. 8, 2007.

Appendix B

277

208.5. Zhao, J. Jiang, Y. J. Zhou, and Z. G. Han, “RLS-based Registration Algorithm for Multi-sensors,’’ Transducer and Microsystem Technologies, No. 8, 2007. 209. J. H. Jin, “Diagnosis for Original Failure by Gas Analysis,” East China Electric Power, No. 7, 1974. 210.F. R. Chen. “Gas Leak Diagnosis on Engine,” Automobile Technology, No. 1, 1978. 211.C. L. Sun and Z. F. Lin. “Aero-engine Condition Monitoring and Failure Diagnosis System,” No. 3, 1988. 212.5. M. Zhu, “Study of Non-Contact Automatic Diagnostic System of Dynamic Failure for Locomotive Running Gear,” Electric Drive for Locomotive, No. 5 , 1999. 2 13.X. Li, “The Principle and Application of CC2500 Caterpillar Crane’s Malfunction Diagnosis,” Mechanical Management and Development, No. 3,2007. 214.5. L. Li, X. S. Lu, X. D. Xu, and Z. S. Lian, “Research on Remote Monitoring and Fault Diagnosis System for Coal Shearer,” Coal Mine Machinery, No. 5, 2007. 215.Y. H. Yang and G. Y. Ying, “Automatic Fault Diagnosis System in Centrifugal Pump,” Oil and Gas Carrying, No. 5 , 1999. 216.4. Y. Fu, L. Feng, S. B. Xia, and Y. C. Peng, “Application of Short Fourier-Transform in Detection of Early Fault Rolling-Element Bearing,” Journal of Harbin Institute of Technology, No. 3, 1998. 217.2. T. Zhou and X. M. Xu, “Rough Set Theory and Its Application Potential Analysis in Hydropower Unit Mechanical Fault Diagnosis,” Journal of Northwest Hydroelectric Power, No. 2, 2007. 218.F. Wang, Z. Z. Ji, J. Z. Zhang, W. H. Huang, S. B. Xia, and S. C. Xu, “Study and Fabrication of Turbine Generator Set Condition Monitoring and Diagnosis System and its Application,” No. 4, 1994. 219.X. Fang, “Expert System of Nuclear Power Plant Failure Diagnosis,” Nuclear Power Engineering, No. 05, 199 1. 220.J. D. Lu, Y. H. Huang, K. Shen, and J. S. Chen, “An Expert System for the On-line Monitoring and Condition Diagnosis of Circulating Fluidized Bed Boilers,” Journal of Engineering for Thermal Energy and Power, No. 6, 200 1. 221 .C. G. Li, J. 
Yu, and J. P. Chen, “A New Fatigue Life Model for Roller Bearings Based on Contact Fatigue Strength,” Journal of Huazhong University of Science and Technology, No. 2, 1991. 222.Q. H. Zhu, “A New Idea for the Filter and Predication in Ship-borne Gun Fire Control,” Fire Control & Command Control, No. 2, 2002. 223.2. Y. Xia, J. H. Li, and Y. S. Chu, “Imitation and Prediction of the Gas Gun Interior Ballistics Based on Neural Networks,” Journal of Detection & Control, No. 4, 1998. 224.D. Pen, “A Self-Test Method,” Modern Radar, No. 6, 1980. 225. J. S. Liu, “Self-Test and Fault Diagnostics for Microprocessor-based Instruments and Equipments,” Microelectronics & Computer, No. 2, 1986. 226.W. Q. Shi, “Diagnosis for Computer,” Computing Technology and Automation, No. 3, 1986.

218

Prognostics and Health Management of Electronics

227.2. P. Fan, “Diagnosis for Digital Circuit,” Auulication Research of Computers, No. 2, 1987. 228.Y. R. Zhong and H. B. Liang, “Thyristor Trouble Detection in Power Electronic Equipment by Waveform Analysis,” Journal of Xi‘an University of Technology, No. 4, 1987. 229.G. X. Zhang, “Diagnosis for PWB by Infrared Technology,” Laser & Infrared, No. 1, 1992. 230.P. Xie, “Project Explanation on China’s Large Plane”, Economic Research of the Aviation Industry, No. 4,2007. 23 1.S. N. Zhang, J. S. Xie, and R. Kang, “A Review of Aero-Engine Health Monitoring and Management Technologies”, presented at the 7th Reliability, Maintainability and Supportability Conference, August 2007. 232.H. D. Beng, “Battery Management of Emerson UL33 UPS,” The World of Power w y , No. 1, 2006. 233 .H. M. Zhu, “Prognosis of Electric Equipment Overhot,” Technology of Equipment Maintenance, No. 2, 2007. 234.Y. H. Peng, “Design and Realization of DMS System in Urban Natural Gas Pipeline Network,” Journal of Wuhan University of Technology (Information & Management Engineering), No. 6,2007. 235.Y. S. He and G. Gan, “Fault Forecast Analysis based on BP Neural Network Model,” Information of Microcomputer, No. 16, 2006. 236.United Office of Hi-Tech Program, Ministry of Science and Technology, available: http:/lwww ,863.org.cn/english/annual~report/annual~repor~200 11200210090007.htm1, last accessed on 12/12/07.

B.32 Auburn University

There is no center of excellence in prognostics at Auburn, but Pradeep Lall, an associate professor in the Department of Mechanical Engineering at Auburn University, has been conducting PHM research.

B.32.1 Research in PHM

Dr. Lall has developed a damage precursor-based approach for computing the residual life of various electronic package elements, so that prognostics can be applied to electronic systems before any macroindicators of damage appear. The focus is on determining the residual life of electronic systems via on-board sensing, damage detection algorithms, and data processing. To implement system health monitoring, precursor variables, or leading indicators of failure, have been identified for various package elements and failure mechanisms. Model algorithms have been developed to correlate precursors with impending failure for computation of residual life. Examples of damage proxies include phase growth rate of solder interconnects, intermetallics, normal stress at chip interface, and interfacial shear stress.

The precursor-based damage computation approach eliminates the need for knowledge of prior or posterior operational stresses and enables the management of system reliability of deployed nonpristine materials under unknown loading conditions. The approach is powerful because redeployed parts, subsystems, and systems seldom have readily available prior stress histories. Acquisition of prior histories, although possible, is often resource intensive. Use of precursors for damage computation addresses the limitation of prognostication techniques based on life prediction models, which target damage estimation for known stress histories imposed on pristine materials. Operational profiles are often unpredictable. In addition, it may not always be possible to characterize the operational loads under all possible scenarios (assuming they are known and can be simulated). Damage precursors target fundamental understanding of underlying degradations in electronic systems, such as thermomechanical interconnect fatigue and interfacial delamination of underfills. Once identified for specific package elements and failure mechanisms, the precursors are scalable for future package architectures and for other applications.
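
As a rough illustration of the precursor-based idea described above, the following sketch fits a straight line to a monitored precursor and extrapolates it to a failure threshold. The linear trend, the threshold value, the sample data, and the function name residual_life are assumptions made for illustration only; they are not Dr. Lall's algorithms.

```python
def residual_life(times, precursor, threshold):
    """Fit a least-squares line to (time, precursor) samples and return the
    time remaining after the last sample until the line reaches `threshold`.
    Returns None when the fitted trend is not increasing."""
    n = len(times)
    mean_t = sum(times) / n
    mean_p = sum(precursor) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (p - mean_p) for t, p in zip(times, precursor))
    slope = sxy / sxx
    intercept = mean_p - slope * mean_t
    if slope <= 0:
        return None  # no growth trend, so no crossing can be extrapolated
    t_fail = (threshold - intercept) / slope
    return max(t_fail - times[-1], 0.0)


# Hypothetical data: normalized phase size sampled every 100 hours.
hours = [0, 100, 200, 300, 400]
phase_size = [1.00, 1.05, 1.10, 1.15, 1.20]
remaining_hours = residual_life(hours, phase_size, threshold=1.50)
```

Note that no stress history enters the calculation; only the measured precursor and an assumed failure threshold are used, which is the property the text emphasizes.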

B.32.2 Related Publications

1. P. Lall, M. Hande, N. Singh, J. Suhling, and J. Lee, “Feature Extraction and Damage Data for Prognostication of Leaded and Leadfree Electronics,” Proceedings of the 56th IEEE Electronic Components and Technology Conference, San Diego, CA, 2006, pp. 718-727.
2. P. Lall, P. Choudhary, and S. Gupte, “Health Monitoring for Damage Initiation & Progression during Mechanical Shock in Electronic Assemblies,” Proceedings of the 56th IEEE Electronic Components and Technology Conference, San Diego, CA, 2006, pp. 85-94.
3. P. Lall, D. Panchagade, D. Iyengar, and J. Suhling, “Life Prediction and Damage Equivalency for Shock Survivability of Electronic Components,” Proceedings of the ITherm 2006, 10th Intersociety Conference on Thermal and Thermo-mechanical Phenomena, San Diego, CA, 2006, pp. 804-816.
4. P. Lall, N. Islam, K. Rahim, and J. Suhling, “Prognostication and Health Monitoring of Leaded and Lead Free Electronic and MEMS Packages in Harsh Environments,” paper presented at the 55th Electronic Components and Technology Conference, ECTC, Lake Buena Vista, FL, 2005, pp. 1305-1313.
5. P. Lall, N. Islam, K. Rahim, and J. Suhling, “Prognosis Methodologies for Health Management of Electronics and MEMS Packaging,” American Society of Mechanical Engineers, Electronic and Photonic Packaging, EPP, Vol. 4, pp. 227-236, 2004.
6. P. Lall, N. Islam, K. Rahim, J. Suhling, and S. Gale, “Leading Indicators-of-Failure for Prognosis of Electronic and MEMS Packaging,” paper presented at the 54th Electronic Components and Technology Conference, 2004, pp. 1570-1578.

B.33 Georgia Institute of Technology

Researchers in the College of Engineering and the Georgia Tech Research Institute have been conducting research in integrated diagnostics, prognostics, and logistics support under sponsorship from government agencies and private industry.

B.33.1 Research in PHM

The Intelligent Control Systems (ICS) Laboratory at Georgia Tech began research in diagnostics in 1985 with a series of projects funded by NASA. The algorithmic developments were implemented and demonstrated by Boeing on scaled models of the space station. Under sponsorship by the Office of Naval Research (ONR), the same research team developed fault-tolerant control systems for a turbojet engine; this effort included diagnostic software that was tested on a laboratory engine. Since 1994, Dr. Vachtsevanos (now at Impact Technology), the director of ICS, has been developing vision-based defect detection algorithms for the textile industry under a series of grants from the National Textile Center. The enabling technologies include wavelet neural networks and fuzzy logic. The patented technology has been licensed by a commercial firm and is currently being marketed to the textile industry. The Manufacturing Research Center at Georgia Tech supported his work in failure detection of surface-mount components in the electronics manufacturing arena.

Generic aspects of the diagnostic/prognostic technologies have been under development with a series of funded projects in EEG classification and epileptic seizure prediction conducted in collaboration with the Medical College of Georgia and Emory’s School of Medicine. The team also assisted General Electric in predicting gas turbine NOx emissions using polynomial neural networks under a contract with GE in 1994-1995. The ICS research team at Georgia Tech has been involved in the ONR CBM program, with Honeywell Technology Center as the prime contractor. Georgia Tech developed and implemented the fuzzy logic and wavelet neural network diagnostic and prognostic algorithms. The system was installed on a Navy ship and underwent testing for several years.

The research team has also participated in the MURI Integrated Diagnostics program at Georgia Tech, where they contributed to the detection of microcracks using a magnetic-electric impedance tomography technique. Dr. Vachtsevanos directed a research team engaged in the development, testing, and implementation of software-enabled controls for autonomous unmanned vehicles. This effort is sponsored by DARPA and involves diagnostic/prognostic technologies as part of the fault-tolerant control routines under development. The Army Research Office has also supported an effort toward the development of an autonomous scout rotorcraft testbed that incorporated a health monitoring module. This task was successfully completed and demonstrated to the sponsor.

Under the Naval Surface Warfare Center (NSWC) funded SBIR Phase I and II program Prognostic Enhancements to Diagnostic Systems (PEDS), the Georgia Tech team helped develop prognostic algorithms for HVAC chiller systems that interface with the U.S. Navy diagnostic Integrated Condition Assessment System (ICAS) platform. The U.S. Army’s ADIP program sponsored an R&D activity to explore embedded diagnostic platforms for Army vehicles such as the Palletized Load System (PLS). A diagnostic architecture has been defined, fault data assembled at the Yuma Proving Ground were analyzed, and a testing program has been established to assist in the development of the at-platform embedded diagnostic capability.

The Georgia Tech research team is participating with industry partners in two DARPA-funded programs. The first involves the development of a software simulation test bench focused on the design, development, and implementation of automated contingency management technologies for autonomous vehicle/mission command and control. The second relates to the development of an autonomous health management supervisor architecture with a modeling paradigm that can be applied to various air/ground, manned/unmanned vehicle platforms. In collaboration with the Georgia Tech Research Institute, the team is exploring the development of prognostic models for advanced power generation systems under ONR sponsorship.

Georgia Tech is also addressing the development of prognostic technologies for critical gearbox components and, with the Northrop Grumman Corporation, Sikorsky, and NAVAIR, seeded fault testing and analysis/fault prognosis for a planetary gear plate. The team is also working with Pratt and Whitney on the application of prognostic algorithms to engine disk and blade fault modes. For both programs, novel model-based and adaptation/learning algorithms are used to determine fault detection thresholds and predict the remaining useful life. The Georgia Tech research team has also been awarded a contract by General Motors Corporation to investigate and develop fault diagnosis and prognosis technologies for automotive electrical systems (battery, alternator, load, and interfacing apparatus). Novel modeling and data processing algorithms for fault diagnosis and prognosis are under development for these electrical systems.

Georgia Tech faculty have developed model-based prognostic technologies for helicopter intermediate gearbox and oil cooler failure modes under sponsorship by the Academic Consortium for Aging Aircraft (ACAA) program. Innovative technologies include a model-based reasoning (MBR) approach to fault diagnosis and fault propagation of individual aircraft components and a Bayesian estimation methodology (particle filtering) for prognosis of the remaining useful life of a failing component. Georgia Tech has been actively collaborating with a number of small firms in the development and application of CBM technologies with support from SBIR programs. Specifically, automated contingency management techniques, diagnostics/prognostics for naval machinery spaces, and sensor fusion and feature extraction methods have been under development to assist small-business partners.
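
To make the particle filtering idea mentioned above more concrete, the following is a minimal sketch of a bootstrap particle filter that tracks a growing fault dimension and then projects each particle forward to a failure threshold. The linear growth model, the noise levels, the prior ranges, the threshold, and the function name particle_filter_rul are all assumptions chosen for illustration; they do not reproduce the Georgia Tech algorithms.

```python
import math
import random


def particle_filter_rul(measurements, threshold, n_particles=500, seed=0):
    """Bootstrap particle filter; returns the median predicted remaining
    useful life (in time steps) after the last measurement."""
    rng = random.Random(seed)
    meas_sigma = 0.1  # assumed measurement noise (hypothetical units)
    # Each particle is [fault size, growth rate per step], drawn from a broad prior.
    particles = [[rng.uniform(0.5, 1.5), rng.uniform(0.01, 0.2)]
                 for _ in range(n_particles)]
    for z in measurements:
        # Predict: jitter the growth rate, then grow the fault by one step.
        for p in particles:
            p[1] = max(p[1] + rng.gauss(0.0, 0.002), 1e-6)
            p[0] += p[1] + rng.gauss(0.0, 0.01)
        # Update: weight each particle by the Gaussian likelihood of z.
        weights = [math.exp(-0.5 * ((z - p[0]) / meas_sigma) ** 2)
                   for p in particles]
        # Resample with replacement in proportion to the weights.
        particles = [list(p) for p in
                     rng.choices(particles, weights=weights, k=n_particles)]
    # Prognosis: steps until each particle reaches the threshold at its own rate.
    ruls = sorted(max(threshold - size, 0.0) / rate for size, rate in particles)
    return ruls[len(ruls) // 2]


# Hypothetical fault measurements growing by about 0.1 per step.
observed = [1.0 + 0.1 * k for k in range(1, 11)]
rul = particle_filter_rul(observed, threshold=3.0)
```

Because every particle carries its own growth rate, the spread of the projected values gives a rough picture of the uncertainty in the remaining useful life; the sketch reports only the median.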

B.33.2 Related Publications

1. M. Orchard and G. Vachtsevanos, “A Particle Filtering-Based Framework for Real-Time Fault Diagnosis and Failure Prognosis in a Turbine Engine,” paper presented at the 26th American Control Conference, New York, 2007.
2. G. Vachtsevanos, F. Lewis, M. Roemer, A. Hess, and B. Wu, Intelligent Fault Diagnosis and Prognosis for Engineering Systems, John Wiley & Sons, Hoboken, NJ, 2006.
3. B. Saha and G. Vachtsevanos, “A Model-Based Reasoning Approach to System Fault Diagnosis,” WSEAS Transactions on Systems, Vol. 5, No. 8, August 2006, pp. 1997-2004.
4. M. Orchard, B. Wu, and G. Vachtsevanos, “A Particle Filter Framework for Failure Prognosis,” Proceedings of WTC2005, World Tribology Congress III, Washington, DC, 2005.
5. T. Khawaja, G. Vachtsevanos, and B. Wu, “Reasoning about Uncertainty in Prognosis: A Confidence Prediction Neural Network Approach,” Proceedings of NAFIPS ’05 Conference, North American Fuzzy Information Processing Society, Ann Arbor, MI, June 22-25, 2005.
6. B. Wu, A. Saxena, R. Patrick-Aldaco, and G. Vachtsevanos, “Vibration Monitoring for Fault Diagnosis of Helicopter Planetary Gears,” Proceedings of 16th IFAC World Congress, July 2005.
7. A. Saxena, B. Wu, and G. Vachtsevanos, “Integrated Diagnosis and Prognosis Architecture for Fleet Vehicles Using Dynamic Case-Based Reasoning,” Proceedings of AUTOTESTCON 2005 Conference, Orlando, FL, September 25, 2005.
8. M. Roemer, C. Byington, G. Kacprzynski, and G. Vachtsevanos, “An Overview of Selected Prognostic Technologies with Reference to an Integrated PHM Architecture,” Proceedings of NASA Integrated Vehicle Health Management Workshop, Napa, CA, November 7-10, 2005.
9. G. Drozeski, B. Saha, and G. Vachtsevanos, “A Fault Detection and Reconfigurable Control Architecture for Unmanned Aerial Vehicles,” Proceedings of the IEEE Aerospace Conference, Big Sky, MT, March 5-12, 2005.
10. B. Wu, A. Saxena, R. Patrick-Aldaco, and G. Vachtsevanos, “Vibration Monitoring for Fault Diagnosis of Helicopter Planetary Gears,” Proceedings of the 16th IFAC World Congress, July 2005.
11. N. Propes and G. Vachtsevanos, “Fuzzy Petri Net Based Mode Identification Algorithm for Fault Diagnosis of Complex Systems,” System Diagnosis and Prognosis: Security and Condition Monitoring Issues III, SPIE proceedings series, 2003, Vol. 5107, pp. 44-53, Bellingham, WA.
12. N. Propes and G. Vachtsevanos, “Fuzzy Petri Net Based Mode Identification Algorithm for Fault Diagnosis of Complex Systems,” System Diagnosis and Prognosis: Security and Condition Monitoring Issues III, paper presented at SPIE’s 17th Annual AeroSense Symposium, Orlando, FL, April 21, 2003.
13. T. Hegazy and G. Vachtsevanos, “Sensor Placement for Isotropic Source Localization,” paper presented at the Second International Workshop on Information Processing in Sensor Networks (IPSN 03), Palo Alto, CA, April 22-23, 2003.
14. Y. Zhang, D. Britton, Y. Zhang, B. Heck, and G. Vachtsevanos, An Integrated Systems Approach to Monitoring and Control of Complex Industrial Processes in Recent 1 Electrical and , Computer Engineering Series, WSEAS Press, 2003, pp. 54-59.
15. G. Zhang, S. Lee, N. Propes, Y. Zhao, G. Vachtsevanos, A. Thakker, and T. Galie, “A Novel Architecture for an Integrated Fault Diagnostic/Prognostic System,” Proceedings of AAAI Spring Symposium, Palo Alto, CA, March 25-27, 2002.
16. G. Vachtsevanos and P. Wang, “A Wavelet Neural Network Framework for Diagnostics of Complex Engineered Systems” (invited), Proceedings of 2001 Joint IEEE Conference on Control Applications and the International Symposium on Intelligent Control, Mexico City, Mexico, September 5-7, 2001.
17. N. Khiripet, G. Vachtsevanos, D. DeLaurentis, D. Mavris, and C. Patel, “A Forecasting Methodology with Uncertainty Representation and Causal Adjustment,” paper presented at the Second International Conference on Intelligent Technologies (InTech, 2001), Bangkok, Thailand, November 27-29, 2001.

18. G. Hadden, P. Bergstrom, T. Samad, B. H. Bennett, G. Vachtsevanos, and J. Van Dyke, “System Health Management for Complex Systems,” in Automation, Control and Complexity, An Integrated Approach, T. Samad and J. Weyrauch, Eds., John Wiley & Sons, New York, 2000, pp. 191-214.
19. P. Wang and G. Vachtsevanos, “Fault Prognosis Using Wavelet Neural Networks,” Proceedings of 1999 AAAI Symposium on AI in Equipment Maintenance, Service and Support, Palo Alto, CA, March 1999, pp. 132-139.
20. J. L. Dorrity and G. J. Vachtsevanos, “In-Process Fabric Defect Detection and Identification,” Proceedings of Mechatronics 98, Skovde, Sweden, September 9-11, 1998.
21. I. Dar and G. Vachtsevanos, “Feature Level Sensor Fusion for Pattern Recognition Using an Active Perception Approach,” Proceedings of IS&T/SPIE’s Electronic Imaging 97: Science and Technology, San Jose, CA, February 8-14, 1997.
22. G. Vachtsevanos, P. Wang, N. Khiripet, A. Thakker, and T. Galie, “An Intelligent Approach to Prognostic Enhancements of Diagnostic Systems,” Proceedings of SPIE 15th Annual International Symposium.
23. G. Vachtsevanos, H. Kang, I. Kim, and J. Cheng, “Managing Ignorance and Uncertainty in System Fault Detection and Identification,” Proceedings of the 5th IEEE International Symposium on Intelligent Control, Philadelphia, PA, 1990, pp. 558-563.
24. G. Vachtsevanos and E. Verriest, “An Analytical/Intelligent Approach to Sensor Fusion for Manufacturing Systems,” Proceedings of the Capteurs 89, Paris, France, 1989, pp. 476-481.
25. H. Biglari, C. Cheng, and G. Vachtsevanos, “Fault Tolerant Intelligent Controller for Space Station Subsystems,” Proceedings of the 23rd IECEC, Denver, CO, August 1988.

B.33.3 Related Patents

U.S. Patents:

1. 6,269,179 B1: Inspection system and method for bond detection and validation of surface mount devices using sensor fusion and active perception.
2. 5,963,662: Inspection system and method for bond detection and validation of surface mount devices.
3. 5,241,163: Article identification apparatus and method using a ferromagnetic tag.

B.34 Pennsylvania State University

The Applied Research Lab (ARL) at Penn State University is involved in PHM research. The Systems and Operations Automation (SOA) division of ARL was formed to address the emerging fields of CBM and advanced sensing and control. The division provides research and development efforts in these areas to support the U.S. Department of Defense and U.S. industry. The main areas of focus include sensors, data fusion, signal processing, approximate reasoning, and distributed architectures.

B.34.1 Research in PHM

The SOA department within ARL implements advanced diagnostic technologies for machinery health monitoring and supports basic and applied research in complex systems monitoring and automation (CSM) technologies related to diagnostics and prognostics for electromechanical systems. These systems include rotating components, weapons system platforms, and machinery networks. The CSM department uses proven analysis techniques, such as artificial intelligence, neural networks, and signal processing. The department performs Navy-sponsored research pertaining to broad application of CSM technologies. Prior applications have included ship service gas turbine engines, a helicopter rotor diagnostic system, and a waterjet paint removal system. ARL diagnostic and prognostic systems research has focused on three primary areas: sensing, modeling, and reasoning, as characterized by the Multidisciplinary University Research Initiative for Integrated Predictive Diagnostics (MURI IPD).

The objective of complex systems monitoring is to accurately detect the current state of mechanical systems and accurately predict systems’ remaining useful lives. This enables organizations to perform maintenance only when needed to prevent operational deficiencies or failures, essentially eliminating costly periodic maintenance and greatly reducing the likelihood of machinery failures. CSM uses integrated, multisensor systems to detect and diagnose emerging equipment problems and to predict how long the equipment can effectively serve its operational purpose. The systems collect, fuse, and evaluate real-time data using algorithms that link the unique signals to their causes (e.g., vibrations created by a developing fault). The system alerts maintenance personnel to the problem, enabling maintenance activities to be scheduled and performed, as needed, before operational effectiveness is compromised.

ARL’s experimental capabilities for PHM include mechanical diagnostics, system integration and testing, battery diagnostics, lubrication systems, gear single-tooth bending fatigue, data fusion, torsional vibration, and power circulating gear fatigue. The PHM research in ARL expands the framework of prognostics, or predictive diagnostic capability, to micromechanical and dynamic models. Sensors, data fusion, signal processing, approximate reasoning, and distributed architecture are among the topics that have been studied. This includes both autonomous and man-in-the-loop decision making about maintenance actions and local and geographically distributed monitoring and data analysis architectures.

B.34.2 Related Publications

1. S. E. George, M. Bocko, and G. W. Nickerson, “Evaluation of a Vibration-Powered, Wireless Temperature Sensor for Health Monitoring,” paper presented at the IEEE Aerospace Conference, Big Sky, MT, March 5-12, 2005, pp. 1-7.
2. K. C. Becker, C. S. Byington, N. A. Forbes, and G. W. Nickerson, “Predicting and Preventing Machine Failures,” Industrial Physicist, Vol. 4, No. 4, pp. 20-23, December 1998.
3. G. W. Nickerson and R. W. Lally, “Intelligent Component Health Monitoring System: A Building Block for a Distributed Approach to Machinery Health Assessment,” American Society of Mechanical Engineers, Tribology Division, TRIB, Vol. 7, Emerging Technologies for Machinery Health Monitoring and Prognosis, 1997, pp. 69-73.
4. G. W. Nickerson, “Prognostics: What Does it Mean in Condition-Based Maintenance?,” Proceedings-National Conference on Noise Control Engineering, Vol. 1, pp. 181-184, 1997.
5. J. E. Deaton, A. Glenn, P. J. Federman, G. W. Nickerson, C. S. Byington, R. Malone, R. Stout, R. Oser, and R. R. Tyler, “Mechanical Fault Management in Navy Helicopters,” Proceedings of the Human Factors and Ergonomics Society, Vol. 1, pp. 66-69, 1997.
6. R. J. Hansen, D. L. Hall, G. W. Nickerson, and S. Phoha, “Integrated Predictive Diagnostics: An Expanded View,” Proceedings of the ASME International Gas Turbine and Aeroengine Congress and Exhibition, Birmingham, UK, June 10-13, 1996.
7. G. W. Nickerson and C. P. Nemarich, “A Mechanical System Condition-Based Maintenance Demonstration Model,” AUTOTESTCON (Proceedings), 1990, pp. 529-533.

B.35 University of California at Los Angeles

At UCLA the research in PHM is focused on an automated structural health monitoring system using acoustic emission and modal data. This is led by Ajit Mal, a professor in the Department of Mechanical and Aerospace Engineering and head of the Nondestructive Evaluation (NDE) Research Group.

B.35.1 Research in PHM

The NDE Research Group has developed an automated structural health monitoring system using acoustic emission and modal data. The system can automatically determine the presence, location, and severity of hidden damage in critical structural components. The structure is assumed to be instrumented with an array of actuators and sensors to excite and record its dynamic response, including vibration and wave propagation effects. The signals recorded by the sensors are processed in the frequency domain to take advantage of the response characteristics of the transducers and the associated electronics. In the vibration approach, the data consist of the modal response of the structure produced by the actuators, while in the wave propagation approach they are the broadband signals due to ultrasonic waves propagating in the structure. Both types of signals are affected by the presence of defects.

To determine the type and location of an unknown defect, a set of damage correlation indices is derived from the frequency response function (FRF) of the data set. Using the initial measurements or calculations performed on an undamaged structure as a baseline, the damage indices are evaluated by comparing the baseline frequency response functions with those of the damaged structure. The methods are applied to simple structural models involving beams and plates with defects in the forms of cracks and holes. The same general concept is used to detect and characterize impact damage in composite structural components, including stiffened and woven materials used in newer military and civilian aircraft. The relative effectiveness of the methods is examined for a variety of boundary conditions and sensor locations relative to the defects.
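
The comparison of baseline and current frequency response functions can be sketched as follows. The index used here, one minus the squared normalized correlation of two FRF magnitude vectors, is an assumed stand-in chosen for illustration; the exact definition of the damage correlation indices is not given in the text, and the sensor names and data are hypothetical.

```python
def damage_index(baseline_frf, current_frf):
    """Return 1 minus the squared normalized correlation of two FRF magnitude
    vectors sampled at the same frequencies. The result is 0 when the shapes
    match and approaches 1 as they diverge."""
    dot = sum(b * c for b, c in zip(baseline_frf, current_frf))
    energy_b = sum(b * b for b in baseline_frf)
    energy_c = sum(c * c for c in current_frf)
    return 1.0 - (dot * dot) / (energy_b * energy_c)


# Hypothetical FRF magnitudes per sensor: undamaged baseline vs. current state.
baseline = {"sensor_1": [1.0, 4.0, 9.0, 4.0, 1.0],
            "sensor_2": [1.0, 4.0, 9.0, 4.0, 1.0]}
current = {"sensor_1": [1.0, 4.0, 9.0, 4.0, 1.0],   # unchanged response
           "sensor_2": [1.0, 2.0, 4.0, 9.0, 4.0]}   # resonance peak has shifted

indices = {name: damage_index(baseline[name], current[name]) for name in baseline}
# The sensor with the largest index is the one whose response changed most.
most_affected = max(indices, key=indices.get)
```

In this toy case the index for sensor_2 is large because its resonance peak has moved, which suggests a change in the structure near that sensor.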

B.35.2 Related Publications

1. A. K. Mal, S. Banerjee, and F. Ricci, “An Automated Damage Identification Technique Based on Vibration and Wave Propagation Data,” Royal Society of London Transactions Series A, Vol. 365, No. 1851, pp. 479-491, February 2007.
2. A. K. Mal, F. Ricci, and S. Banerjee, “A Conceptual Structural Health Monitoring System Based on Vibration and Wave Propagation,” Structural Health Monitoring: An International Journal, Vol. 4, pp. 283-293, 2005.
3. S. Banerjee, W. H. Prosser, and A. K. Mal, “Calculation of the Response of a Composite Plate to Localized Dynamic Surface Loads Using a New Wavenumber Integral Method,” ASME Journal of Applied Mechanics, Vol. 72, pp. 18-24, 2005.
4. S. Banerjee and A. K. Mal, “Acoustic Emission Waveform Simulation in Multilayered Composites,” Journal of Strain Analysis for Engineering Design, Vol. 40, pp. 25-32, 2005.
5. P. Rizzo, F. Lanza Scalea, S. Banerjee, and A. K. Mal, “Ultrasonic Characterization and Inspection of Open Cell Foams,” ASCE Journal of Engineering Mechanics, Vol. 131, pp. 1200-1208, 2005.

6. S. Banerjee, W. H. Prosser, and A. K. Mal, “Analysis of Transient Lamb Waves Generated by Dynamic Surface Sources in Thin Composite Plates,” Journal of the Acoustical Society of America, Vol. 115, pp. 1905-1911, 2004.
7. A. K. Mal, S. Banerjee, and F. Ricci, “An Autonomous Structural Health Monitoring System Based on Acoustic Emission and Modal Data,” Proceedings of SPIE, Health Monitoring and Smart NDE of Structural and Biological Systems III, Vol. 5394, San Diego, CA, 2004, pp. 1-10.
8. A. K. Mal and S. Banerjee, “Guided Acoustic Emission Waves in a Thick Composite Plate,” Health Monitoring and Smart Nondestructive Evaluation of Structural and Biological Systems Conference III, Proceedings of the SPIE, San Diego, CA, pp. 42-52, 2004.
9. A. K. Mal, S. Banerjee, F. Ricci, F. Shih, and S. Gibson, “Damage Detection in Structural Components from Vibration and Wave Propagation Data,” Proceedings of 4th International Workshop on Structural Health Monitoring, 2003, pp. 675-685.
10. A. K. Mal, F. Shih, and S. Banerjee, “Acoustic Emission Waveforms in Composite Laminates under Low Velocity Impact,” Smart Nondestructive Evaluation and Health Monitoring of Structural and Biological Systems II, Proceedings of the SPIE, Vol. 5047, pp. 1-12, 2003.
11. A. K. Mal, F. Ricci, S. Gibson, and S. Banerjee, “Damage Detection in Structures from Vibration and Wave Propagation Data,” Smart Nondestructive Evaluation and Health Monitoring of Structural and Biological Systems II, Proceedings of the SPIE, Vol. 5047, pp. 202-210, 2003.

B.36 University of Maryland-CALCE

The Center for Advanced Life Cycle Engineering (CALCE) at the University of Maryland, College Park, is recognized as a key driving force behind the development and implementation of electronics prognostics as well as a world leader in PoF approaches to reliability, accelerated testing, electronic parts selection and management, and various supply chain management issues, including thermal uprating of parts, counterfeit parts, and part obsolescence. The CALCE Prognostics group performs research and development on the application of PHM to complex electronic products and systems as well as systems of systems. In 1999, CALCE became the first academic research facility in the world to be ISO 9001 certified. The interdisciplinary PHM team consists of faculty in electronics, reliability, materials, and mechanical engineering. This team of 22 research professors and scientists, supported by approximately 60 Ph.D. and 30 M.S. candidates, has authored over 40 internationally acclaimed textbooks and well over 850 research publications relevant to electronics reliability. Over the last 15 years, CALCE has invested over $50 million in developing methodologies, models, and tools that address the design, manufacture, analysis, and management of electronic products and systems.

B.36.1 Approach to PHM

CALCE’s PHM efforts are geared toward achieving the following long-term research objectives of the CALCE PHM Consortium:

• Provide proven prognostic sensors and in-situ monitoring strategies for cost-effectively recording environmental, operational, and performance parameters of new and legacy systems.
• Develop models and algorithms for “health” assessment and prognostics to integrate cost-effective prognostics with other technologies (RFID, logistics, and Net-centric databases) for new and legacy systems.
• Build maintenance and logistical support methods that incorporate prognostic outputs.
• Provide solutions to the CND, NFF, NTF, and intermittent failure problems.
• Provide techniques for self-healing and system reconfiguration based on prognostics outputs.
• Document best practices in the use of prognostic outputs for future designs and qualification planning.
• Develop software to assess the return-on-investment (RoI) opportunities of prognostics.

CALCE has a multifaceted approach to PHM focused on demonstrating that health monitoring can be implemented using a variety of methodologies, tools, and analysis techniques for effective prognostics. CALCE approaches for PHM implementation are: (1) the use of expendable devices, such as canaries and fuses, that fail earlier than the host product to provide advance warning of failure; (2) monitoring and reasoning of parameters that are precursors to impending failure, such as shifts in performance parameters; and (3) modeling of stress and damage in electronic parts and structures utilizing exposure conditions (e.g., usage, temperature, vibration, radiation) to compute accumulated damage.

The CALCE Prognostics group conducts research and development of prognostics and health management applications for electronic products and systems, as well as systems-of-systems. The research focuses on computational algorithms, advanced sensors and data collection techniques, condition-based maintenance, and prognostics and health management for the application of in-situ diagnostics and prognostics. The group is using physics-based models along with empirical models for prognostics. CALCE is researching the use of a hybrid approach, which combines physics-of-failure and data-driven methods for accurate prognostics and diagnostics. The goal of the group is to develop novel ways to identify anomalies and patterns within very large data sets containing multiple parameters, both qualitative and quantitative; the group has also developed real-time reduced-order modeling for failure prediction. Work in the areas of reliability modeling and prediction, pattern recognition, time series forecasting, machine learning, and fusion technologies is ongoing. The prognostics group is evaluating the use of intelligent reasoning technologies to model and manage the life cycle of electronic products. In addition, optimal maintenance planning and business case development to assess the return on investment associated with the application of PHM to systems are being performed at CALCE.

The prognostics group collaborates with industry and research partners to develop advanced sensors for diagnostics and prognostics applications. Tamper-proof, low-cost autonomous sensors are being developed that incorporate wireless communication and high on-board memory capacity and that can be attached to any product with minimal interference to the functioning of that product. CALCE Prognostics enables real-time prognostics and health management of electronic products in their application environment.
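
The third approach above, computing accumulated damage from exposure conditions, can be illustrated with a short sketch. It assumes a linear damage accumulation rule (Palmgren-Miner) and a power-law relation between temperature swing and cycles to failure; both are common textbook choices but are assumptions here, as are the constants and function names, and none of them is stated to be CALCE's model.

```python
def cycles_to_failure(delta_t, coeff=1.0e7, exponent=2.0):
    """Hypothetical power-law life model: cycles to failure for a thermal
    cycle with temperature swing `delta_t` (degrees C)."""
    return coeff * delta_t ** (-exponent)


def accumulated_damage(load_history):
    """Linear (Palmgren-Miner) damage sum over (delta_t, cycle_count) pairs.
    A value of 1.0 corresponds to predicted failure."""
    return sum(count / cycles_to_failure(delta_t)
               for delta_t, count in load_history)


# Hypothetical monitored exposure: (temperature swing in C, number of cycles).
history = [(20.0, 1000), (50.0, 400), (100.0, 50)]
damage = accumulated_damage(history)
remaining_fraction = 1.0 - damage
```

With these made-up numbers the three load levels contribute damage fractions of 0.04, 0.10, and 0.05, so roughly 19% of the life is estimated to be consumed.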

B.36.2 PHM Activities CALCE’s projects on PHM have included (i) mapping of sensor technologies with stress and damage models for real-time life consumption monitoring (LCM) of electronic systems, (ii) demonstrating the LCM methodology on an electronic board operated in an automotive underhood environment, (iii) evaluating diagnostic built-in-test (BIT) softwarefirmware systems to identify and locate faults that incorporate error detection and correction circuits and self-checking and self-verification circuits, (iv) integrating in-situ semiconductor prognostic monitors consisting of precalibrated cells (circuits) to predict remaining life due to semiconductor defects and failure mechanisms, (v) developing software modules (data collection, simplification and damage accumulation, and remaining life estimation) for environment and usage data collection that enable PHM, (vi) assessing health using combinations of physical inspection, accelerated testing, and PoF analysis and (vii) developing of models and tools for optimizing maintenance planning and assessing RoI using PHM. In addition, CALCE has conducted several case studies to demonstrate the life consumption monitoring methodology. In one case study, the remaining life of electronics on-board a Space Shuttle remote manipulator system (SRMS) was predicted using health monitoring, inspection, accelerated testing and PoF analysis. The results of the case study showed that there was little degradation in the SRMS electronics and that they could be expected to last another 20 years. In another case study, CALCE conducted a remaining-life assessment of electronic hardware in Space Shuttle solid rocket boosters. The goals of this case study were to demonstrate simulation based remaining life assessment and provide inputs to assess the viability of a sustainment program. 
CALCE efforts for a Navy SBIR titled “Enhanced Prognostic Model for Digital Electronics” saw the development of prognostics-based methodologies to predict failures in aircraft electronic boards and their digital components and devices, with the potential to reduce the risk of unanticipated failures while significantly reducing support costs. This approach incorporated life consumption monitoring, including failure modes, mechanisms, and effects analysis; sensor data pre-processing/feature selection; fault detection/identification/isolation; virtual reality assessment; stress and damage accumulation analysis; and remaining life estimation. CALCE work on another Navy SBIR titled “IDDQ Trending as a Precursor to Semiconductor Failure” proposed a prognostic methodology based on simulation of times to failure for mechanisms that affect the direct drain quiescent current (IDDQ) of field-effect transistor (FET) devices for airborne electronic systems.

CALCE’s prognostics work with a major computer manufacturer is exploring prognostic methodologies for real-time online anomaly identification for the manufacturer's computers. The approach involves characterizing the baseline performance of computers based on a multitude of tests that simulate the operational conditions a computer may experience. Based on the baseline characterization, prognostic algorithms that assess multivariate parameters are trained. These algorithms are then used to analyze data from fielded computers and detect anomalies in their monitored parameters. Faulty systems are being identified using pattern recognition, time series analysis, and comparison of multivariate statistics.

CALCE worked with a major US defense contractor to develop an approach to implement failure-precursor-based prognostics for complex systems (system-of-systems). The project is focused on legacy system-of-systems and uses a hybrid approach to prognostics in which statistical techniques are used in conjunction with failure modes, mechanisms, and effects analysis (FMMEA) to assess system health.
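The baseline-then-compare idea described above for fielded computers can be sketched with a simple multivariate statistic. The text does not name the algorithm used, so this example assumes the squared Mahalanobis distance from a healthy baseline, restricted to two monitored parameters to keep the arithmetic explicit. The parameter meanings, sample values, and threshold are hypothetical.

```python
import statistics


def fit_baseline(samples):
    """Estimate the mean vector and 2x2 sample covariance from healthy (x, y) samples."""
    xs = [s[0] for s in samples]
    ys = [s[1] for s in samples]
    mx, my = statistics.mean(xs), statistics.mean(ys)
    n = len(samples)
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in samples) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def mahalanobis_sq(point, mean, cov):
    """Squared Mahalanobis distance of a point from the baseline (2-D closed form)."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy ** 2
    if det <= 0:
        raise ValueError("baseline covariance is singular")
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    return (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det


# Hypothetical baseline: (temperature, fan speed) pairs that rise together.
baseline = [(50.0, 20.0), (52.0, 21.0), (54.0, 22.5),
            (56.0, 23.0), (58.0, 25.0), (60.0, 25.5)]
mean, cov = fit_baseline(baseline)

THRESHOLD = 9.0  # hypothetical alarm level on the squared distance

for reading in [(55.0, 23.0), (52.0, 25.0)]:
    d2 = mahalanobis_sq(reading, mean, cov)
    print(reading, "anomalous" if d2 > THRESHOLD else "normal")
```

The second reading lies inside the range of each parameter taken alone but breaks the correlation seen in the baseline, which is the kind of deviation a per-parameter limit check would miss.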
CALCE worked on an SBIR titled “Advanced Prognostic and Health Management (PHM) and Model Based Prognostic Useful Life Remaining Capabilities for Aircraft Tactical Information and Communication Systems.” The project objectives were to conduct a feasibility analysis focused on developing real-time, sensor-updated remaining-life models for estimating and predicting the remaining useful life of the F-35 JSF Tactical Information and Communications (CNI) systems and their components.

CALCE is working on an STTR titled “Dynamic Data-Driven Prognostics and Condition Monitoring of On-board Electronics.” The project objectives are to implement a two-tiered failure prognostics approach: the first tier performs parameter forecasting, and the second tier identifies the type of fault and its source. The project aims to utilize temporally adaptive decision fusion techniques for fusing evidence and probabilistic outcomes from parametric predictions. Phase I of the project was a success, and CALCE has been invited to participate in Phase II.

The CALCE Prognostics group is working with NASA on a project titled “Reliable Diagnostics and Prognostics for Critical Avionics Systems.” In this project, CALCE will develop and validate system-level and component-level (LRU) diagnostic and prognostic methods for avionic systems. The research aims to improve the accuracy of avionics fault detection, boost in-flight performance, reduce maintenance costs, and improve overall aircraft reliability. This is a multiyear project beginning in 2007.

In 2006 and 2007, CALCE offered graduate-level courses on Prognostics and Health Management at the University of Maryland. In 2006, CALCE co-hosted PHM conferences in Japan (in collaboration with Yokohama National University and JEITA) and in the UK (in collaboration with MIRCE Academy).

B.36.3 Related Publications 1,

S. Kumar and M. Pecht, “Health Monitoring of Electronic Products using Symbolic Time Series Analysis,” Artificial Intelligence for Prognostics, paper presented at the AAAI Fall Symposium Series, Arlington, VA, November 9-1 1,2007.

292

2.

Prognostics and Health Management of Electronics

S. Cheng and M. Pecht, “Multivariate State Estimation Technique for Remaining Useful Life Prediction of Electronic Products,” Artificial Intelligence for Prognostics, paper presented at the AAAI Fall Symposium Series, Arlington, VA, November 9-1 1, 2007.

3. V. A. Sotiris and M. Pecht, “Support Vector Prognostics Analysis of Electronic Products and Systems,” Artificial Intelligence for Prognostics, paper presented at the AAAI Fall Symposium Series, Arlington, VA, November 9-1 1,2007. 4.

J. Gu, D. Barker and M. Pecht, “Uncertainty Assessment of Prognostics of Electronics Subject to Random Vibration,” Artificial Intelligence for Prognostics, paper presented at the AAAI Fall Symposium Series, Arlington, VA, November 9-1 1, 2007.

5. J. Gu and M. Pecht, “New Methods to Predict Reliability of Electronics,” paper presented at the International Conference on Reliability, Maintainability, and Safety, Beijing, China, pp. 440-45 1,2007. 6. P. A. Sandborn and C. Wilkinson, “A Maintenance Planning and Business Case Development Model for the Application of Prognostics and Health Management (PHM) to Electronic Systems,” Microelectronics Reliability, Vol. 47, No. 12, pp. 1889-1901, December 2007. 7.

E. Scanff, K.L. Feldman, S. Ghelam, P. Sandborn, M. Glade, and B. Foucher, “Life Cycle Cost Estimation of Using Prognostic Health Management (PHM) for Helicopter Avionics,” Microelectronics Reliability, Vol. 47, No. 12, pp. 1857-1 864, December 2007.

8. J. Gu, D. Barker, and M. Pecht, “Prognostics Implementation of Electronics under Vibration Loading,” Microelectronics Reliabilitv, Vol. 47, Issue 12, pp. 1849-1 856, December 2007. 9.

B. Tuchband and M. Pecht, “The Use of Prognostics in Military Electronic Systems,” Proceedings of the 32nd GOMACTech Conference, Lake Buena Vista, FL, March 1922,2007, pp. 157-160.

10. D. Han, M. Pecht, D. Anand, and R. Kavetsky, “Energetic Material/Systems Prognostics,” Proceedings of the 53rd Annual Reliability & Maintainability Symposium (RAMS), Orlando, FL, 2007, pp. 59-64. 11. G. Zhang, C. Kwan, R. Xu, N. Vichare, and M. Pecht, “An Enhanced Prognostic Model for Intermittent Failures in Digital Electronics,” paper presented at the IEEE Aerospace Conference, Big Sky, MT, March 2007.

12. N. Vichare, P. Rodger, V. Eveloy, and M. Pecht, “Environment and Usage Monitoring of Electronic Products for Health Assessment and Product Design,” International Journal of Oualitv Technoloav and Quantitative Management, Vol. 4 No. 2, pp 235250,2007. 13. J. Gu, N. Vichare, T. Tracy, and M. Pecht, “Prognostics Implementation Methods for Electronics,” Proceedings of the 53rd Annual Reliabilitv & Maintainability SymDosium (RAMS), Orlando, FL, 2007, pp. 101-106. 14. S. Mathew, D. Das, M. Osterman, and M. Pecht, “Prognostic Assessment of Aluminum Support Structure on a Printed Circuit Board,” ASME Journal of Electronic Packaging, Vol. 128, No. 4, pp. 339-345, December 2006.

Appendix B

293

15. S. Mathew, P. Rodgers, V. Eveloy, N. Vichare, and M. Pecht, “A Methodology for Assessing the Remaining Life of Electronic Products,” International Journal of Performabilitv,Vol. 2, No. 4, pp. 383-395, October 2006. 16. N. Vichare and M. Pecht, “Enabling Electronic Prognostics Using Thermal Data,” Proceedings of the 12th International Workshop on Thermal Investigation of ICs and Systems, Nice, CBte d’Azur, France, September 2006. 17. N. Vichare, P. Rodgers, and M. Pecht, “Methods for Binning and Density Estimation of Load Parameters for Prognostics and Health Management,” International Journal of Performability Engineering, Vol. 2, No. 2, pp. 149-161, April 2006. 18. N . Vichare and M. Pecht, “Prognostics and Health Management of Electronics,” IEEE Transactions on Components and Packaging Technologies, Vol. 29, No. 1, pp. 222229, March 2006. 19. P. Sandborn, “A Decision Support Model for Determining the Applicability of Prognostic Health Management (PHM) Approaches to Electronic Systems,” Proceedings of Reliability and Maintainability Symposium, January 24-27, 2005, pp. 422427. 20. N. Vichare, P. Rodgers, V. Eveloy, and M. Pecht, “In Situ Temperature Measurement of a Notebook Computer-A Case Study in Health and Usage Monitoring of Electronics,” IEEE Transactions on Device and Materials Reliability, Vol. 4, No. 4, December 2004, pp. 658-663. 21. C. Wilkinson, “Prognostics and Health Management for Improved Dispatchability of Integrated Modular Avionics Equipped Aircraft,” paper presented at the 23rd Digital Avionics Systems Conference (DASC), Salt Lake City, UT, October 2004. 22. N. Vichare, P. Rodgers, M. Azarian, and M. Pecht, “Application of Health Monitoring to Product Take-back Decisions,” Proceedings of the Joint International Congress and Exhibition-Electronics Goes Green 2004, Berlin, Germany, September 2004, pp. 945951. 23. C. Wilkinson, D. Humphrey, B. Vermeire, and J. 
Houston, “Prognostics and Health Management for Avionics,” paper presented at the IEEE Aerospace Conference, Big Sky, MT, March 2004. 24. S. Mishra, S. Ganesan, M. Pecht and J. Xie, “Life Consumption Monitoring for Electronics Prognostics,” Proceedings of the IEEE Aerospace Conference, Vol. 5, pp. 3455-3467, March 6-13,2004. 25. J. Xie and M. Pecht, “Application of In-Situ Health Monitoring and Prognostic Sensors,” paper presented at the 9th Pan Pacific Microelectronics Symposium Exhibits and Conference, Kahuku, Oahu, HI, February 2004. 26. A. Ramakrishnan and M. Pecht, “Load Characterization during Transportation,” Microelectronics Reliability, Vol. 44, No. 2, pp. 333-338, January 2004. 27. P. Casey, S. Ganesan, M. Pecht and D. Anand, “Methods for Predicting the Remaining Life of Electronic Assemblies with Carbon Nanotubes and an Optical Transduction Technique,” Proceedings of IMECE’ 03-2003 ASME International Mechanical Engineering Congress, Washington, D.C., November 15-21,2003.

294

Prognostics and Health Management of Electronics

28. A. Ramakrishnan and M. Pecht, “A Life Consumption Monitoring Methodology for Electronic Systems,” IEEE Transactions on Components and Packaging Technologies, Vol. 26, No. 3, pp. 625-634, September 2003. 29. R. Valentin, M. Osterman, and B. Newman, “Remaining Life Assessment of Aging Electronics in Avionic Applications,” The Annual Reliability and Maintainability Proceedings, Tampa, FL, January 27-30,2003, pp. 3 13-3 18. 30. R. Valentin, J. Cunningham, M. Osterman, A. Dasgupta, M. Pecht and D. Tsagos, “Virtual Life Assessment of Electronic Hardware Used in the Advanced Amphibious Assault Vehicle (AAAV),” Proceedings of the 2002 Winter Simulation Conference, Vol. 1, San Diego, CA, December 8-1 1,2002, pp. 948-953. 3 1. D. Humphrey, W. Shawlee, P. Sandborn, D. Lorenson, “Aging Aircraft Usable Life and Wear-out Issues,” Proceedings of World Aviation Congress, SAE Technical Paper: 2002-1-3013, Phoenix, AZ, November 2002. 32. V. Shetty, D. Das, M. Pecht, D. Hiemstra and S. Martin, “Remaining Life Assessment of Shuttle Remote Manipulator System End Effector Electronics Unit,” Proceedings of the 22nd Space Simulation Conference, Ellicott City, MD, October 21-23, 2002. 33. S. Mishra, M. Pecht, T. Smith, I. McNee, and R. Harris, “Remaining Life Prediction of Electronic Products Using Life Consumption Monitoring Approach,” Proceedings of the European Microelectronics Packaging and Interconnection Symposium, Cracow, June 16-18,2002, pp. 136-142. 34. S. Mishra and M. Pecht “In-situ Sensors for Product Reliability Monitoring,” Proceedings of SPIE, Vol. 4755, pp. 10-19,2002. 35. M. Pecht, M. Dube, M. Natishan, and I. Knowles, “Evaluation of Built-In Test,” IEEE Transactions on Aerospace and Electronic Systems, Vol. 37, No. 1, pp. 266-272, January 200 1. 36. N. Kelkar, A. Dasgupta, M. Pecht, I. Knowles, M. Hawley, and D. Jennings, “Smart Electronic Systems for Condition-Based Health Management,” Oualitv and Reliability Engineering International, Vol. 13, pp. 3-7, 1997. 37. K. 
Cluff, D. Barker, D. Robbins, and T. Edwards, “Characterizing the Commercial Avionics Thermal Environment for Field Reliability Assessment,” Proceedings of Institute of Environmental Sciences, 1996, pp. 50-57.

B.37 University of Tennessee (UT)

J. Wesley Hines and B.R. Upadhyaya work in the Nuclear Engineering Department at the University of Tennessee. Both have several research projects in the area of prognostics and health management.

B.37.1 Research in PHM

Professor J. Wesley Hines has several active research projects in the area of prognostics and health management. His historical focus has been on the use of empirical models for on-line sensor, equipment, and process monitoring. These studies have included the use of auto-associative neural networks, nonlinear partial least squares, auto-associative kernel regression, and the multivariate state estimation technique. Past research topics include data cleaning, automated variable grouping, analytic and Monte Carlo uncertainty estimation, performance and detectability measures, regularization, and other optimization techniques. He is currently funded by the Electric Power Research Institute (EPRI), Expert Microsystems (EM), Sun Microsystems, Idaho National Laboratory (INL), and the Nuclear Regulatory Commission (NRC). For EPRI, Dr. Hines is developing a method for prognostics using Bayesian methods to map residual signatures from the EPRI Fleet Wide Monitoring Program to failure distributions; for EM, he is developing and embedding an uncertainty module for the Sure Sense monitoring system; for Sun Microsystems, he is adding prognostic capabilities to the Process and Equipment Monitoring toolbox, which was developed at UT for monitoring and diagnosis; for INL, he is developing monitoring, diagnosis, and prognosis techniques for supervisory control and data acquisition (SCADA) systems, including intrusion detection; and for the NRC, he is writing three NUREGs related to on-line sensor calibration monitoring.

Professor B. R. Upadhyaya focuses on the development of techniques for the condition diagnosis of motor-operated valves (MOVs). Since the mechanical parts of the valve are of primary concern, the most desirable parameter to monitor is the mechanical load experienced by the valve operator. The electrical parameter that has the closest correlation to the MOV load is the motor power. Time-dependent characteristics of the motor power waveform are necessary to track the various events during valve stroke cycles. An expert system is being developed using the Visual C++ programming language. The expert system combines a syntactic pattern recognition module, a signal preprocessing module, a rule-based expert system for valve diagnostics, a knowledge base, an on-line help module, and an interface to Microsoft Excel for report preparation. The system is being tested using both plant and laboratory test data.

Models of system degradation, advanced data processing, laboratory experimentation, and field data are being integrated to achieve specific objectives for induction motors. These include detection of incipient system faults, estimation of the residual life of plant components, determination of “time-to-alarm” and “time-to-failure,” establishment of an alarm level based on the variations of a physical parameter (if necessary, by the prediction of a “virtual trend”), and the development of a generic procedure for application to both rotating machinery and stationary components. Probabilistic approaches are being developed for establishing a relationship between lifetime and alarm levels.
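The notions of “time-to-alarm” and “time-to-failure” mentioned above can be illustrated by extrapolating a trend in a monitored physical parameter to an alarm level and a failure level. The source does not describe the trending model, so the sketch below assumes an ordinary least-squares straight line; the function names, data, and levels are hypothetical.

```python
def fit_line(times, values):
    """Ordinary least-squares fit of values = slope * time + intercept."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("need at least two distinct time points")
    slope = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values)) / sxx
    return slope, mean_v - slope * mean_t


def time_to_level(times, values, level):
    """Time from the last observation until the fitted trend reaches `level`.

    Returns None if the trend is flat, is moving away from the level,
    or has already passed it.
    """
    slope, intercept = fit_line(times, values)
    if slope == 0:
        return None
    current = slope * times[-1] + intercept
    delta = (level - current) / slope
    return delta if delta >= 0 else None


# Hypothetical degradation indicator sampled once per hour.
times = [0, 1, 2, 3, 4]
values = [1.0, 1.2, 1.4, 1.6, 1.8]

ALARM_LEVEL = 2.4    # hypothetical
FAILURE_LEVEL = 3.0  # hypothetical

print("time to alarm (h):", time_to_level(times, values, ALARM_LEVEL))
print("time to failure (h):", time_to_level(times, values, FAILURE_LEVEL))
```

A straight line is only a stand-in here; the probabilistic approaches mentioned in the text would attach a distribution to each predicted time rather than a single number.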


B.37.2 Related Publications 1.

D.R. Garvey and J.W. Hines, “Nuclear Application of On-Line Sensor Calibration Monitoring for Safety Critical Sensors,” paper presented at the Inaugural World Congress on Engineering Asset Management, Queensland, Australia, July 11-14, 2006.

2.

R.M. Seibert, D.R. Garvey and J.W. Hines, “Prediction Intervals versus Confidence Intervals for On-Line Sensor Monitoring,” paper presented at the 60th Meeting of the Society for Machinery Failure Prevention Technology, Virginia Beach, VA, April 3-6, 2006.

3.

D.R. Garvey and J.W. Hines, “Development and Application of Fault Detectability Performance Metrics for Instrument Calibration Verification and Anomaly Detection,” Journal of Pattern Recognition Research, Vol. 1, No. 1, January 2006.

4.

0. A. Omicaomu, M. K. Jeong, A. B. Badiru, and J. W. Hines, “Motor Shaft Misalignment Prediction Using On-Line Support Vector Regression,” paper presented at the Institute for Operations Research and Management Science Annual Conference (INFORMS 2005), New Orleans, LA, November 2005.

5.

J. W. Hines, D. Garvey, and R. Seibert, “Independent Component Analysis (ICA) and Wavelet De-Noising Applied to Redundant Sensor,” paper presented at the International Symposium on the Future I&C for Nuclear Power Plant, Tongyeong, Gyungnam, Korea, November 1 4 , 2 0 0 5 .

6.

J. W. Hines and A. Usynin, “Autoassociative Model Input Variable Selection for Process Monitoring,” paper presented at the International Symposium on the Future I&C for Nuclear Power Plant, Tongyeong, Gyungnam, Korea, November 1 4 , 2 0 0 5 .

7. J.W Hines, D. Garvey, B. Seibert, A. Usynin, and S. Arndt, “Evaluation of Uncertainty Analysis Techniques for On-Line Sensor Calibration Monitoring and System Diagnosis,” Transactions of the American Nuclear Societv, Washington DC, November 2005. 8. J.W. Hines and B. Rasmussen, “Odine Sensor Calibration Monitoring Uncertainty Estimation,” Nuclear Technology, Vol. 151, No. 3, pp. 281-288, September 2005. 9. J.W. Hines and D. Garvey, “The Development of a Process and Equipment Monitoring (PEM) Toolbox and Its Application to Sensor Calibration Monitoring,” paper presented at the Fourth International Conference on Quality and Reliability, Beijing, P.R. China, August 2005. 10. J.W. Hines and D. Garvey, “An Autoassociative Empirical Modeling Toolbox for On-Line Monitoring,” paper presented at the 18th International Congress and Exhibition on Condition Monitoring and Diagnostic Engineering Management, Cranfield, Bedfordshire, United Kingdom, August 2005.

11. J. W. Hines, D. Garvey, R. Seibert, and S. A. Arndt, “On-Line Sensor Calibration Monitoring Challenges and Effective Monte Carlo Based Uncertainty Estimation,” paper presented at the IAEA Meeting: “On Line Condition Monitoring of Equipment and Processes in Nuclear Power Plants Using Advanced Diagnostic Systems,” Knoxville, TN,July 27-30, 2005. 12. J.W. Hines, B. Rasmussen, and R. Uhrig, “An On-Line Sensor Calibration Monitoring System,” International Journal of COMADEM, Vol. 8, No. 3, pp. 19-25, July 2005.

Appendix B

291

13. B. Lu and B. Upadhyaya, “Monitoring and Fault Diagnosis of the Steam Generator System of a Nuclear Power Plant Using Data-Driven Modeling and Residual Space Analysis,” Annals ofNuclear Energy, Vol. 32, No. 9, pp. 897-912, June 2005. 14. J.W. Hines and A. Usynin, “MSET Performance Optimization through Regularization,” Nuclear Engineering and Technology, Vol. 37, No. 2, pp. 177-184, April 2005. 15. J.W Hines and R.E. Uhrig, “Computational Intelligence in Nuclear Engineering,” Nuclear Engineering and Technology, Vol. 37, No. 2, pp. 127-138, April 2005. 16. B. Lu, B. R. Upadhyaya, and R. B. Perez, “Structural Integrity Monitoring of Steam Generator Tubing Using Transient Acoustic Signal Analysis,” IEEE Transactions on Nuclear Science, Vol. 52, No. 1, pp. 484-493, February 2005. 17. I.M. Goncalves, D.K.S. Ting, P.B. Ferreira, and B.R. Upadhyaya, “Monitoring an Experimental Reactor Using the Group Method of Data Handling Approach,” Nuclear Technology, Vol. 149, No. 1, pp. 110-1 2 1, January 2005. 18. J.W. Hines and E. Davis, “Lessons Learned from the U.S. Nuclear Power Plant On-Line Monitoring Programs,” Progress in Nuclear Energy, Vol. 46, No. 3 4 , pp. 176-189, 2005. 19. 0 A. Omitaomu, M. K. Jeong, A. B. Badiru, and J. W. Hines, “On-Line Support Vector Regression for Motor Shaft Misalignment Prediction,” IEEE Transactions on Knowledge and Data Engineering, 2005. 20. J.W. Hines and Robert E. Uhrig, “Trends in Computational Intelligence in Nuclear Engineering,” Progress in Nuclear Energy, Vol. 46, No. 3-4, pp. 167-175,2005. 21. B. Rasmussen and J.W. Hines, “Uncertainty Estimation for Empirical Signal Validation Modeling,” paper presented at the 4th International Topical Meeting on Nuclear Plant Instrumentation, Control and Human Machine Interface Technology (NPIC&HMIT ‘04), Columbus OH, September 19-22,2004. 22. J.W. 
Hines and J Ding, “Merging Inferential Models and Independent Component Analysis for 2-Channel Redundant Sensor Monitoring,” paper presented at the 4th International Topical Meeting on Nuclear Plant Instrumentation, Control and Human Machine Interface Technology (NPIC&HMIT ‘04), Columbus OH, September 19-22, 2004. 23. J.W. Hines and B. Rasmussen, “Prediction Interval Estimation Techniques for Empirical Modeling Strategies and Their Applications to Signal Validation Tasks,” paper presented at the 6th International Conference on Fuzzy Logic and Intelligent Technologies in Nuclear Science (FLINS), Blackenberge, Belgium, September 1-3, 2004. 24. J.W. Hines and E. Davis, “Implementation of On-Line Monitoring Programs at Nuclear Power Plants,” paper presented at the 6th International Conference on Fuzzy Logic and Intelligent Technologies in Nuclear Science (FLINS), Blackenberge, Belgium, September 1-3,2004. 25. J. W. Hines, and A. Usynin, “On-Line Monitoring Robustness Measures and Comparisons,” paper presented at the International Atomic Energy Agency Technical Meeting on Increasing Instrument Calibration Interval through On-Line Calibration Technology, OECD Halden Reactor Project, Halden, Norway, September 2004.

298

Prognostics and Health Management of Electronics

26. J.W. Hines and J. Bowling, “An Expert System for Long Term Monitoring of Special Nuclear Materials,” Journal of Nuclear Materials Management (JNMM), July 2004. 27. J.W. Hines and J. Bowling, “An Expert System Based Fault Detection and Isolation System to Monitor Weight and Radiation Sensors for Inventory Management,” paper presented at the 45th Institute of Nuclear Materials Management Annual Meeting, Orlando FL, July 2004. 28. B. Lu, B.R. Upadhyaya, and J.W. Hines, “Application of Hilbert-Huang Transform for Acoustic Signal Analysis for Steam Generator Structural Integrity,” Transactions of the American Nuclear Society, San Diego, CA, June 2004. 29. B. Lu, B.R. Upadhyaya, and J.W. Hines, “Nonstationary Signal Processing Techniques for Monitoring Structural Flaws,” paper presented at the Maintenance and Reliability Conference, Knoxville, TN, May 2004.

30. A. V. Gribok, A. M. Urmanov, and J. W. Hines, “Uncertainty Analysis of Memory Based Sensor Validation Techniques,” Real-Time Systems, Vol. 27, No. 1, May, 2004, pp. 7-26. 31. J. Ding, A. V. Gribok, and J. W. Hines, and B. Rasmussen, “Redundant Sensor Calibration Monitoring Using Independent Component Analysis and Principal Component Analysis,” Real-Time Systems, Vol. 27, No. 1, pp. 2 7 4 7 , May 2004. 32. A.V. Gribok, A. M. Urmanov, and J. W. Hines, “Uncertainty Analysis of Memory Based Sensor Validation Techniques,” Real Time Systems Special Issue on Applications of Intelligent Real-Time Systems for Nuclear Engineering, Vol. 27, No. 1, pp. 7-26, May 2004. 33. J.W. Hines, A. Usynin, and S. Wegerich,, “Autoassociative Model Input Variable Selection for Process Modeling,” paper presented at the 58th Meeting of the Society for Machinery Failure Prevention Technology, Virginia Beach, VA, April 26-30, 2004. 34. J.W. Hines, J Bowling, and T.J. Harrison, “A System for Rapid Detection and Diagnosis of Continuous Automated Vault Inventory System (CAVIST”) Alarms,” paper presented at the 7th International Conference on Facility Operations, Charleston, SC, February 28-March 5,2004. 35. J.W. Hines and E. Davis, “Implementation of On-Line Monitoring Programs at Nuclear Power Plants,” Applied Computational Intelligence-Proceedings of the 6th International FLINS Conference, 2004, pp. 543-548. 36. B. Rasmussen and J.W. Hines, “Prediction Interval Estimation Techniques for Empirical Modeling Strategies and Their Applications to Signal Validation Tasks,” Applied Computational Intelligence-Proceedings of the 6th International FLINS Conference, 2004, pp. 549-556. 37. A.V. Gribok, A.M. Urmanov, J.W. Hines, and R.E. Uhrig, “Use of Kernel Based Techniques for Sensor Validation in Nuclear Power Plants,” in Statistical Data Mining and Knowledge Discovery, Chapman and Hall/CRC Press LLC, Boca Raton, 2004, p p 2 17-23 1. 38. B.R. Upadhyaya, J.W. Hines, B. Lu, and X. 
Huang, “Structural Integrity Monitoring of Nuclear Plant Steam Generators,” Transactions of the American Nuclear Society, New Orleans, LA, November 16-20,2003,

Appendix B

299

39. J. Ding, J. W. Hines, and B. Rasmussen, “ICA Filter for Redundant Sensor Monitoring,” Transactions of the American Nuclear Society, New Orleans, LA, November 2003. 40. J. W. Hines and B. Rasmussen, “On-Line Calibration Monitoring of Process Sensors,” paper presented at the International Workshop on Monitoring and Diagnosis, Sao Paulo, Brazil, August 4-5,2003, 41. B. Rasmussen, J. W. Hines, and R. E. Uhrig, ”A Novel Approach to Process Modeling for Instrument Surveillance and Calibration Verification,” Nuclear Technology, Vol. 143, August 2003. 42. B.R. Upadhyaya, B. Lu, and J.W. Hines, “Defect Monitoring in Steam Generator Structures Using Piezoelectric Transducers and Time-Frequency Analysis,” Transactions of the American Nuclear Society, San Diego, CA, June 1-5, 2003. 43. J. Ding, J. W. Hines, and B. Rasmussen, “Independent Component Analysis for Redundant Sensor Validation,” Proceedings of the 2003 Maintenance and Reliability Conference, Knoxville, TN, May 4-7,2003. 44. B. Rasmussen, A. Gribok, and J. W. Hines, “An Applied Comparison of the Prediction Intervals of Common Empirical Modeling Strategies,” Proceedings of the 2003 Annual Maintenance and Reliability Conference, Knoxville, TN, May 4-7, 2003. 45. B. Lu, B.R. Upadhyaya, and J.W. Hines, “Time-Frequency Analysis of Acoustic Signals for Flaw Monitoring in Steam Generator Structures,” Proceedings of the Maintenance and Reliability Conference, MARCON 2003, Knoxville, TN, May 2003. 46. B.R. Upadhyaya, K. Zhao, and B. Lu, “Fault Monitoring of Nuclear Power Plant Sensors and Field Devices,” Progress in Nuclear E n e r a , Vol. 43, No. 1-4, SPEC, pp. 337-342,2003. 47. S.R.P. Perillo, B.R. Upadhyaya, B. Lu, and J.W. Hines, “Condition Monitoring of Steam Generators and Heat Exchangers Using Piezoelectric Devices,” paper presented at COBEM, Sao Paulo, Brazil, 2003. 48. A. V. Gribok, J. W. Hines, A. Urmanov, and R. E. 
Uhrig, “Heuristic, Systematic, and Informational Regularization for Process Monitoring,” International Journal of Intelligent Systems, Vol. 17, No. 8, pp. 723-749, August 2002. 49. B.R. Upadhyaya, M. Naghedolfeizi, and B. Raychaudhuri, “Residual Life Estimation of Plant Components,“ P/PM Technology, Vol. 7, No. 3, pp. 22-29, 1994. 50. J. Eklund, and B.R.Upadhyaya, “An Automated System for Motor-Operated Valve Diagnostics,” Power Engineering, Vol. 95, No. 12, pp. 38-41, December 1991.

B.38 University of North Carolina (UNC)

The Center for Logistics and Digital Strategy (CLDS) at the University of North Carolina provides an experimental environment for research and education in intelligent technologies for extended enterprise applications. Lab projects include prognostics integrated logistics, radio frequency identification, health management and logistics support, data mining, and pattern recognition.

B.38.1 Research in PHM

Prognostics Integrated Logistics (PILOT) is a consortium of universities, aerospace manufacturing companies, and airlines that have aligned to engage in collaborative research and development activities to improve the provisioning of intelligent aviation services. Specifically, PILOT hopes to improve aviation asset management through research and development activities that promote intelligent technologies, including prognostics and intelligent software, in activities such as aircraft health monitoring, maintenance planning, parts logistics, and supply chain management. UNC and Boeing are lead sponsors for the PILOT project. Industry partners are General Electric, Delta Airlines, Aero Mexico, and KLM. University partners, also members of the Center’s Global Logistics Research Initiative (GLORI), include Monterrey Tech, University of Warwick, Delft University of Science and Technology, and the University of Applied Sciences in Cologne.

The intelligent systems laboratory (ISL) is also involved in radio frequency identification (RFID) technology. It believes RFID tags may well spark the next revolution in intelligent supply chain management. “RFID technology is finally coming into its own, as engineers work out the bugs and vendors learn to make them more cheaply,” says Noel Greis, Director of the Kenan Institute’s Center for Logistics and Digital Strategy (CLDS). “Meanwhile, the private sector is driving RFID usage through their supply chain initiatives. The ISL strives to help companies learn not only how to respond to these challenges and opportunities but also to use new technologies to launch entirely new lines of businesses that create value from their supply chains. RFID and intelligent software are natural partners for building smart supply chains.” Noel Greis and her global partners from academia and industry are engaged in several projects leveraging RFID technology.
UNC is part of a Boeing team developing new health management capabilities for the C-17 aircraft. CLDS is developing agent-enabled decision support tools that map fault data into maintenance and logistics responses, including spare/repair parts logistics. Prognostics-enabled maintenance and logistics strategies, and corresponding design guidelines, represent an opportunity for the Air Force to achieve significant cost savings and enhanced readiness. Such a solution requires overcoming the disconnect between technology and operations, that is, building smart processes between prognostics-enabled aircraft and maintenance/logistics practices. The central driver of new prognostics capabilities is the benefit derived from real-time, closed-loop systems in which prognostics serve as an integral element in a feedback control scheme for the dynamic management of spare/repair parts in real time.

CLDS worked with NASA and SAIC to apply advanced data-mining techniques to the retrospective analysis of Space Shuttle problem reporting data. Using advanced data-mining tools, CLDS linked occurrences of Orbiter anomalies with external events or conditions that had occurred earlier or were occurring at the time the anomaly was detected. These links or relationships are then described by models or correlations that can be used to help predict the likelihood of another occurrence of the anomaly.

CLDS has developed new approaches to near-instantaneous agent-enabled learning for the purposes of real-time performance tracking, failure prediction, and decision support. Information overload and data complexity in distributed information networks demand more powerful, scalable solutions to pattern and fault recognition, especially for complex systems like aircraft and defense platforms. This research uses new associative memory technology that is capable of recognizing patterns in performance data in order to anticipate component or system failure in vehicles such as trucks and aircraft. The learning agents are capable of observing and learning complex correlations across multiple parameters and collaborating with other agents across a fleet. These agents lend themselves to distributed multiagent configurations for real-time networked visibility and decision support across complex functions, including supply chain management, maintenance, and fleet control.

B.38.2 Related Publications

1. N. Greis, J. Olin, and J. Kasarda, “The Intelligent Future,” Supply Chain Management Review, pp. 18-25, May/June 2003.

B.39 Vanderbilt University

At Vanderbilt University, the Institute for Software Integrated Systems (ISIS) was established in 1998 as an outgrowth of the Measurement and Computing Systems Laboratory (MCSL) in the Department of Electrical and Computer Engineering. Research within ISIS is organized into two main areas: core technology for model-integrated computing and applications for software-integrated systems. The objective of their research in the core technology is to create an infrastructure for model-integrated computing. Applications are tightly integrated with the core research effort and cover a broad range in manufacturing, aerospace, and instrumentation. The Modeling and Analysis of Complex Systems (MACS) group within ISIS is currently focused on PHM research for embedded systems.

B.39.1 Research in PHM

Current research is geared toward the development of schemes for monitoring, predicting, and diagnosing complex dynamic continuous systems. Earlier work applied diagnosis based on steady-state models. The core approach developed over the last 10 years has focused on monitoring and diagnosing from transient behaviors as faults occur in a system. Modeling nominal and faulty behavior starts from bond graphs, a topological energy-based scheme for modeling the dynamics of multidomain continuous systems, and derives a temporal causal graph of dynamic system behavior. The temporal causal graph is similar to signal flow diagrams used in control theory but includes additional information about component parameters to aid the diagnosis process. This model is used to identify system faults from deviating measurements and to predict future behavior of the observed variables in terms of fault signatures, which are expressed as parameter deviations and their effect on the measured variables, expressed as magnitude and higher order derivatives. Behavior and diagnostic analysis is performed in a qualitative reasoning framework that takes into account the system dynamics.

As part of the Embedded and Hybrid Systems program sponsored by the National Science Foundation, ISIS is working on a project called “Distributed Monitoring and Diagnosis of Embedded Systems Using Hierarchical Abstractions.” The objective of this project is to develop systematic, scalable, robust, on-line model-based fault detection and isolation (FDI) schemes for distributed embedded systems.
The novelty of the research centers on (i) hierarchical abstraction schemes for managing the complexity of the FDI task and enabling the design and development of on-line model-based FDI algorithms that are provably robust and reliable, (ii) a unified framework for diagnosis of multiple types of faults that occur in the physical and the computational parts of embedded systems as well as faults with different fault profiles (abrupt and incipient faults), and (iii) the development of a tool suite for distributed embedded systems for on-line FDI. Experimental test-beds are used to demonstrate and verify the effectiveness of the developed methods. The impact of the project lies in providing guarantees for reliable, safe operation of complex, distributed safety-critical systems.

ISIS was awarded a Navy STTR in conjunction with Qualtech Systems, Inc. to work on a project titled “Aircraft Electrical Power System Diagnostics and Health Management.” The objective of this project is to improve the availability and reliability of aircraft power generator systems using health monitoring techniques that combine diagnostic and prognostic algorithms. ISIS, working with Qualtech Systems, hopes to provide an innovative scheme for diagnosis and prognosis that combines the use of dynamical physical system models augmented with signal models for analyzing vibration signatures and PoF models
for electrical, electronic, and mechanical generator components, such as rectifiers, transformers, batteries, converters, and bearings. These schemes estimate degrading device behavior as the system is involved in its regular operation. The fault diagnostic scheme uses innovative model-based approaches for root cause analysis, and the prognostic reasoning framework is based on simulation of the failing device (identified by diagnostic analysis) for relevant usage scenarios. Continued monitoring of system variables along with the degradation estimates will form the basis of algorithms that compute reliable estimates of the remaining-life curve for the degrading components.

ISIS researchers are collaborating with NASA Ames, Hamilton Sundstrand, and Pratt and Whitney on an aeronautics NRA titled “Online Statistical Methods for Robust State Estimation, Anomaly Detection, and Degradation Analysis in Complex, Embedded Systems.” Complex, safety-critical systems in aircraft such as power generation systems have interacting subsystems that operate in multiple physical domains. A number of catastrophic accidents have demonstrated that these systems can degrade and fail in ways that are hard to predict at design time. The drive for increased safety, reliability, and autonomy imposes stringent requirements on system operation and performance, even in the presence of degradation and faults in components. Such requirements can be addressed only by accurate assessment of system health, and this has generated increased demands for on-board monitoring, analysis, and decision-making schemes. The proposed approach will combine model-based and statistical algorithms to provide robust schemes that manage modeling uncertainties, measurement noise, and the computational complexities associated with on-line tracking, estimation, detection, and analysis of nonlinear hybrid behaviors.
The theoretical underpinnings for tracking and analysis of nominal and faulty system behavior will be centered on the use of approximate dynamic Bayes net (DBN) techniques. Anomaly detection methods that work in conjunction with the DBN tracker will focus on signal analysis and statistical techniques that include time-frequency representations and maximum likelihood methods for accurate fault detection while keeping the false alarm rate low. The detection and analysis algorithms will be tuned to analyze different fault types (sensor, actuator, and process) and different fault profiles (abrupt, incipient, and intermittent). Anomaly detection will trigger an innovative fault isolation scheme that combines qualitative reasoning and quantitative analysis of the fault dynamics to isolate and identify the root cause for the observed anomalies.

ISIS is also collaborating with NASA Ames Research Center on a project titled “Advanced Diagnostics & Prognostics Techniques (ADAPT) Applied to NASA Spacecraft and Test-beds.” This project has four focus areas. The first focus area is model building for the ADAPT test-bed subsystems, which correspond to electrical power distribution systems in spacecraft and aircraft. These models will form the basis for building a simulation test-bed for off-line experiments as well as the basis for running on-line model-based monitoring, fault detection, and fault isolation studies. The second focus area is model-based diagnosis experiments. A run-time environment will be developed for monitoring of nominal behavior (observer-based schemes), fault detection (statistical techniques), fault isolation, and fault identification. This will involve the use of hybrid techniques because the test-bed systems combine continuous and discrete behaviors.
Run-time infrastructure for fault adaptive control technology will be implemented on the ADAPT system so that different observers, fault detectors, and fault isolation schemes can be plugged in to run hardware-in-the-loop diagnosis experiments. The third focus area is to participate in the comparison of different diagnosis algorithms. The last part of the project includes formal analysis of hybrid diagnosis schemes and development of new hybrid diagnosis approaches for NASA applications.
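The pluggable observer and fault detector arrangement described for the ADAPT run-time environment can be illustrated with a minimal sketch. It is not the ADAPT implementation: the scalar plant model, the observer gain, the windowed Z-test, and all names (`ScalarObserver`, `make_z_test_detector`, `simulate_measurements`, `first_alarm`) are assumptions chosen for illustration, and measurement noise is omitted so that the demonstration is deterministic.

```python
from dataclasses import dataclass
from math import sqrt
from typing import Callable, List, Optional, Sequence


@dataclass
class ScalarObserver:
    """Observer for an assumed nominal model x[k+1] = a*x[k] + b*u[k], y[k] = x[k]."""
    a: float
    b: float
    gain: float          # output-injection gain (hypothetical tuning value)
    x_hat: float = 0.0   # current state estimate

    def step(self, u: float, y: float) -> float:
        """Return the residual y - x_hat, then advance the estimate one step."""
        residual = y - self.x_hat
        self.x_hat = self.a * self.x_hat + self.b * u + self.gain * residual
        return residual


def make_z_test_detector(sigma: float, window: int,
                         threshold: float) -> Callable[[float], bool]:
    """Build a detector that flags a fault when the mean of the last `window`
    residuals exceeds `threshold` standard errors (sigma / sqrt(window))."""
    recent: List[float] = []

    def detect(residual: float) -> bool:
        recent.append(residual)
        if len(recent) > window:
            recent.pop(0)
        if len(recent) < window:
            return False  # not enough samples for a decision yet
        mean = sum(recent) / window
        return abs(mean) > threshold * sigma / sqrt(window)

    return detect


def simulate_measurements(a: float, b: float, inputs: Sequence[float],
                          fault_at: Optional[int] = None,
                          bias: float = 0.0) -> List[float]:
    """Simulate the plant and add an abrupt sensor bias from sample `fault_at` on."""
    x, ys = 0.0, []
    for k, u in enumerate(inputs):
        faulty = fault_at is not None and k >= fault_at
        ys.append(x + (bias if faulty else 0.0))
        x = a * x + b * u
    return ys


def first_alarm(observer: ScalarObserver, detector: Callable[[float], bool],
                inputs: Sequence[float],
                measurements: Sequence[float]) -> Optional[int]:
    """Run observer and detector together; return the first alarm index, if any."""
    for k, (u, y) in enumerate(zip(inputs, measurements)):
        if detector(observer.step(u, y)):
            return k
    return None


if __name__ == "__main__":
    u = [1.0] * 100
    y = simulate_measurements(0.9, 0.1, u, fault_at=50, bias=0.5)
    alarm = first_alarm(ScalarObserver(0.9, 0.1, gain=0.2),
                        make_z_test_detector(sigma=0.05, window=5, threshold=3.0),
                        u, y)
    print("first alarm at sample:", alarm)
```

Because `first_alarm` only requires an object with a `step` method and a callable detector, a different observer or a different statistical test can be substituted without changing the loop, which is the kind of plug-in structure the text describes.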

The MACS group at ISIS has also embarked on a new project in conjunction with faculty and researchers in the Structural Reliability Research Group in the Civil and Environmental Engineering Department for Air Vehicle Health and Capability Assessment. This project is funded by the Air Force Research Laboratory (AFRL) in Dayton, Ohio. The goal is to apply integrated modeling approaches that combine vehicle-level modeling with subsystem and component-level modeling to facilitate combined diagnosis and prognosis for system health and capability assessment for aircraft systems. The researchers are developing a top-down diagnosis approach for subsystem-level and component-level damage detection, isolation, and identification based on system-level performance measurements. In conjunction they are developing bottom-up prognosis approaches based on PoF models to link component-level damage profiles to vehicle-level health and capability calculations. In addition, the framework includes methods for uncertainty quantification and uncertainty propagation through the diagnosis and prognosis models, and this provides the framework for risk-based decision making under uncertainty to facilitate decisions regarding in-flight actions, maintenance, and logistics. The project will apply the above methodologies to a practical vehicle-level integration problem.
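The idea of propagating component-level damage uncertainty up to a vehicle-level capability figure can be sketched with a simple Monte Carlo calculation. This is only an illustration under stated assumptions, not the project's method: damage is assumed to grow linearly with operating hours, damage rates are assumed Gaussian (truncated at zero), the vehicle is treated as a series system that fails when any component reaches a damage of 1.0, and the function name and all numeric values are hypothetical.

```python
import random
from typing import Dict, Tuple

# (initial damage, mean damage rate per hour, standard deviation of the rate)
Component = Tuple[float, float, float]


def survival_probability(components: Dict[str, Component],
                         mission_hours: float,
                         n_samples: int = 10_000,
                         seed: int = 0) -> float:
    """Estimate the probability that no component reaches damage >= 1.0
    by the end of the mission, sampling the uncertain damage rates."""
    rng = random.Random(seed)
    survived = 0
    for _ in range(n_samples):
        system_ok = True
        # Every component is sampled in every trial (no early exit) so that
        # runs with the same seed use the same random draws.
        for initial, mean_rate, std_rate in components.values():
            rate = max(rng.gauss(mean_rate, std_rate), 0.0)
            if initial + rate * mission_hours >= 1.0:
                system_ok = False
        survived += system_ok
    return survived / n_samples


if __name__ == "__main__":
    generator = {
        "bearing": (0.2, 0.001, 0.0003),     # hypothetical values
        "rectifier": (0.1, 0.0005, 0.0002),  # hypothetical values
    }
    for hours in (300.0, 600.0, 900.0):
        p = survival_probability(generator, hours)
        print(f"{hours:6.0f} h mission: estimated survival probability {p:.3f}")
```

An estimate of this kind is the sort of quantity a risk-based decision could consume, for example comparing the estimated survival probability for a planned mission against a threshold before deferring maintenance.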

B.39.2 Related Publications

1. M. Daigle, X. Koutsoukos, and G. Biswas, “A Discrete Event Approach to Diagnosis of Continuous Systems,” paper presented at the 18th International Workshop on Principles of Diagnosis, Nashville, TN, May 2007, pp. 259-266.
2. S. Poll, A. Patterson-Hine, J. Camisa, D. Garcia, D. Hall, C. Lee, O. Mengshoel, C. Neukom, D. Nishikawa, J. Ossenfort, A. Sweet, S. Yentus, I. Roychoudhury, M. Daigle, G. Biswas, and X. Koutsoukos, “Advanced Diagnostics and Prognostics Testbed,” paper presented at the 18th International Workshop on Principles of Diagnosis, Nashville, TN, May 2007, pp. 178-185.
3. A. Moustafa, M. Daigle, I. Roychoudhury, C. Shantz, G. Biswas, S. Mahadevan, and X. Koutsoukos, “Fault Diagnosis of Civil Engineering Structures using the Bond Graph Approach,” paper presented at the 18th International Workshop on Principles of Diagnosis, Nashville, TN, May 2007, pp. 146-153.
4. S. Poll, A. Patterson-Hine, J. Camisa, D. Nishikawa, L. Spirkovska, D. Garcia, D. Hall, C. Neukom, A. Sweet, S. Yentus, C. Lee, J. Ossenfort, I. Roychoudhury, M. Daigle, G. Biswas, X. Koutsoukos, and R. Lutz, “Evaluation, Selection, and Application of Model-Based Diagnosis Tools and Approaches,” paper presented at the AIAA Infotech@Aerospace 2007 Conference and Exhibit, May 2007.
5. S. Narasimhan and G. Biswas, “Model-Based Diagnosis of Hybrid Systems,” IEEE Transactions on Systems, Man, and Cybernetics, Part A, Vol. 37, No. 3, pp. 348-361, May 2007.
6. M. Daigle, X. Koutsoukos, and G. Biswas, “A Qualitative Approach to Multiple Fault Isolation in Continuous Systems,” Proceedings of the Twenty-Second AAAI Conference on Artificial Intelligence, July 22-26, 2007, Vancouver, British Columbia, Canada, pp. 293-298.
7. M. Daigle, X. Koutsoukos, and G. Biswas, “On Discrete Event Diagnosis Methods for Continuous Systems,” Proceedings of the Mediterranean Conference on Control & Automation, 2007 (MED ’07), pp. 1-6, June 27-29, 2007.

8. M. Daigle, X. Koutsoukos, and G. Biswas, “Distributed Diagnosis in Formations of Mobile Robots,” IEEE Transactions on Robotics, Vol. 23, No. 2, pp. 353-369, April 2007.
9. M. Daigle, I. Roychoudhury, G. Biswas, and X. Koutsoukos, “Efficient Simulation of Component-Based Hybrid Models Represented as Hybrid Bond Graphs,” Hybrid Systems: Computation and Control (HSCC 2007), Lecture Notes in Computer Science, Vol. 4416, pp. 680-683, April 2007.
10. G. Biswas and S. Mahadevan, “A Hierarchical Model-Based Approach to Systems Health Management,” Paper no. 11.10-1215, paper presented at the IEEE Aerospace Conference, Big Sky, MT, March 2007.
11. F. Tu, S. Ghoshal, J. Luo, G. Biswas, L. Jaw, and K. Navarra, “PHM Integration with Maintenance and Inventory Management Systems,” Paper no. 11.10-1335, paper presented at the IEEE Aerospace Conference, Big Sky, MT, March 2007.
12. I. Roychoudhury, M. Daigle, G. Biswas, X. Koutsoukos, and P. J. Mosterman, “A Method for Efficient Simulation of Hybrid Bond Graphs,” paper presented at the International Conference on Bond Graph Modeling and Simulation (ICBGM 2007), January 2007, pp. 177-184.
13. M. Daigle, X. Koutsoukos, and G. Biswas, “Multiple Fault Diagnosis in Complex Physical Systems,” paper presented at the 17th International Workshop on Principles of Diagnosis, Peñaranda de Duero, Spain, June 2006.
14. I. Roychoudhury, G. Biswas, and X. Koutsoukos, “A Bayesian Approach to Efficient Diagnosis of Incipient Faults,” paper presented at the 17th International Workshop on Principles of Diagnosis, Peñaranda de Duero, Spain, June 2006.
15. M. Daigle, X. Koutsoukos, and G. Biswas, “Distributed Diagnosis of Coupled Mobile Robots,” paper presented at the 2006 IEEE International Conference on Robotics and Automation (ICRA), Orlando, FL, 2006.
16. F. Zhao, X. Koutsoukos, H. Haussecker, J. Reich, and P. Cheung, “Monitoring and Fault Diagnosis of Hybrid Systems,” IEEE Transactions on Systems, Man, and Cybernetics, Part B, Vol. 35, No. 6, pp. 1225-1240, December 2005.
17. M. Daigle, X. Koutsoukos, and G. Biswas, “Relative Measurement Orderings in Diagnosis of Distributed Physical Systems,” paper presented at the 43rd Annual Allerton Conference on Communication, Control, and Computing, Monticello, IL, September 2005.
18. I. Roychoudhury, G. Biswas, X. Koutsoukos, and S. Abdelwahed, “Designing Distributed Diagnosers for Complex Physical Systems,” paper presented at the 16th International Workshop on Principles of Diagnosis, pp. 31-36, June 2005.
19. G. Biswas, S. Abdelwahed, X. Koutsoukos, J. Gandhe, and E. Manders, “Toward Distributed Diagnosis of Complex Physical Systems,” in Proceedings of 42nd Annual Allerton Conference on Communication, Control, and Computing, September 29-October 1, 2004.
20. X. Koutsoukos, “Estimation of Hybrid Systems Using Discrete Sensors,” in Proceedings of 42nd IEEE Conference on Decision and Control (IEEE CDC 2003), pp. 155-160, 2003.

21. X. Koutsoukos, J. Kurien, and F. Zhao, “Estimation of Distributed Hybrid Systems Using Particle Filtering Methods,” Hybrid Systems: Computation and Control (HSCC 2003), Vol. 2623, Lecture Notes in Computer Science, pp. 298-313, 2003.
22. R. Su, W.M. Wonham, J. Kurien, and X. Koutsoukos, “Distributed Diagnosis of Qualitative Systems,” paper presented at the 6th International Workshop on Discrete Event Systems, Zaragoza, Spain, pp. 169-174, October 2-4, 2002.
23. X. Koutsoukos, J. Kurien, and F. Zhao, “Monitoring and Diagnosis of Hybrid Systems Using Particle Filtering Methods,” Proceedings of the 15th International Symposium on Mathematical Theory of Networks and Systems (MTNS 2002), Notre Dame, IN, August 2002.
24. J. Kurien, X. Koutsoukos, and F. Zhao, “Distributed Diagnosis of Networked Embedded Systems,” Proceedings of the 13th International Workshop on Principles of Diagnosis (DX-2002), Semmering, Austria, May 2-4, 2002, pp. 179-188.
25. J. Kurien, X. Koutsoukos, and F. Zhao, “Distributed Diagnosis of Networked Hybrid Systems,” paper presented at the 2002 AAAI Spring Symposium on Information Refinement and Revision for Decision Making: Modeling for Diagnostics, Prognostics, and Prediction, Stanford, CA, March 25-27, 2002, pp. 37-44.
26. X. Koutsoukos, F. Zhao, H. Haussecker, J. Reich, and P. Cheung, “Fault Modeling for Monitoring and Diagnosis of Sensor-Rich Hybrid Systems,” Proceedings of the 40th IEEE Conference on Decision and Control, Orlando, FL, December 2001, pp. 793-801.
27. F. Zhao, X. Koutsoukos, and J. Kurien, “Collaborative Embedded Sensing and Diagnosis,” Proceedings of the SRDS 2001 Workshop on Reliability in Embedded Systems, New Orleans, LA, October 2001, pp. 23-27.
28. F. Zhao, X. Koutsoukos, H. Haussecker, J. Reich, P. Cheung, and C. Picardi, “Distributed Monitoring of Hybrid Systems: A Model-Directed Approach,” Proceedings of the 17th International Joint Conference on Artificial Intelligence (IJCAI 2001), Seattle, WA, August 2001, pp. 554-557.
29. P.J. Mosterman and G. Biswas, “Diagnosis of Continuous Valued Systems in Transient Operating Regions,” IEEE Transactions on Systems, Man, and Cybernetics, Vol. 29, No. 6, pp. 554-565, November 1999.
30. P.J. Mosterman and G. Biswas, “A Theory of Discontinuities in Physical System Models,” Journal of the Franklin Institute: Engineering and Applied Mathematics, Vol. 335B, No. 3, pp. 401-439, January 1998.

B.39.3 Related Patent

1. U.S. Patent 7,181,374: Qualitative diagnosis system and method, February 2007.

Appendix C Journals and Conference Proceedings Related to PHM

The field of PHM involves the integration of sensors, signal processing tools, stress and damage models, statistical methods, machine learning techniques, and supervised and unsupervised prediction methods as well as various maintenance and logistics methods. At this time there is no single journal or conference that covers all aspects of PHM. For the benefit of the reader, a list of journals and conferences where PHM-related articles are published has been compiled. This list covers most of the PHM-related journals and conferences. It covers prognostics research and development in the fields of civil and mechanical structures, avionics, mechanical and electronic products, prognostic algorithms and models, sensors, sensor applications, health monitoring, and prognostics-based maintenance and logistics.

C.1 Journals

Aerospace Science and Technology
ASCE Journal of Structural Engineering
IEEE Aerospace and Electronic Systems Magazine
IEEE Transactions on Components and Packaging Technology
IEEE Transactions on Control Systems Technology
IEEE Transactions on Industrial Electronics
IEEE Transactions on Reliability
INSIGHT-Non-Destructive Testing and Condition Monitoring (Journal of the British Institute of Non-Destructive Testing)
International Journal of COMADEM
International Journal of Fatigue
International Journal of Machine Tools & Manufacture
International Journal on Quality and Reliability Engineering
International Journal of Structural Health Monitoring
Journal of the Acoustical Society of America
Journal of Intelligent Material Systems and Structures
Journal of Materials
Journal of Optical Diagnostics in Engineering
Journal of Sound and Vibration
Journal of Structural Control and Health Monitoring

Journal of Testing and Evaluation
Maintenance Journal
Measurement Science & Technology
Mechanical Systems and Signal Processing
NDT & E International
Nuclear Technology
Reliability Engineering and Safety Systems
Sensors and Actuators
Smart Materials and Structures
Structural Engineering and Mechanics
Transactions of the ASME-Journal of Vibration and Acoustics

C.2 Conference Proceedings

AAAI Symposium on Artificial Intelligence for Prognostics
AIAA/IEEE Digital Avionics Systems Conference
Aircraft Airborne Condition Monitoring Conference
American Society of Civil Engineers-Structural Health Monitoring Division
American Control Conference
Annual Forum Proceedings-American Helicopter Society
Annual Reliability and Maintainability Symposium
ESC Division Mini-Conference
IEEE Aerospace Applications Conference
IEEE Aerospace Conference
IEEE Autotestcon Conference
IEEE Control Theory and Applications Conference
IEEE Instrumentation and Measurement Technology Conference
Government Microcircuit Applications and Critical Technology Conference
Maintenance and Reliability Conference (MRC)
Society for Optical Engineering (SPIE)
Society for Machinery Failure Prevention Technology (MFPT)

INDEX A photoconductor, 30 Absolute humidity sensors, 29 Absolute humidity, 26,28,29 Accuracy, 33, 39, 54, 56,60, 62, 64, 87, 122, 129, 131 Acquisition decision making, 114 Active mode, idle mode and sleep mode, 35 Aging, 4, 13, 23, 109, 127, 131 A1 model based, 129 Analog-to-digital (MD) converters, 32 ARINC, 169 Auburn University, 279 Autonomic logistics, 86 Availability, 1, 7, 17, 32, 38, 85, 99, 114, 119, 125, 120 Avoidance of failures, 93 BAE Systems, 171 Bar coding, 37 Base station, 36 Battery powered sensor systems, 34 Battery-free power, 44 Battery-free sensor systems, 44 Baum-Welch algorithm, 5 1, 63 Bayes’s theorem, 6 1 Bayesian, 11, 19, 51, 58, 65 Biological, 25, 29, 37, 59 Biosensors, 29 Bit error failures, 122 Body heat, 44 Boeing 2, 87, 109 Boeing, 173 Built-in prognostics, 121 Built-in test (BIT), 3 Business case, 47, 85, 88, 93, 104, 108, 106, 110, 116, 134 CALCE, 2,7,14,73,80, 100, 115, 130 Canaries and Fuses, 121 Canary devices, 7, 73, 76, 98, 123 Capacitive or inductive impedances, 28 Prognostics and Health Management of Electronics By Michael G. Pecht Copyright 0 2008 John Wiley & Sons, Inc.

Capacitive RH sensors, 28 Capacitive voltage sensors, 27 Cellular, 37 Chemical sensing principles, 29 Chemical Sensors, 29 Chemical, 11, 13, 25, 38, 74, 79 Chi square test, 49, 54 China, 255 Classification, 30, 49, 52, 66, 69, 74, 120 Classifier, 50, 52, 56 Clustering, 17, 54, 63, 72, 139 COCOMO, 90 Code Architecture, 132 Code Maintenance, 132 Commercially available, 38,40 Common considerations, 33 Component Level, 120, 124, 129 Conditional probability, 50, 56, 59 Condition-based maintenance plus (CBM+), 4 Conductive filament formation, 8 1, 125 Conductive path formation, 124 Continuous distributions, 54 Continuous sampling, 36 Continuous sensing, 35 Continuous, triggered, thresholds, 35 Conventional numeric, 129 Corrosion, 2, 9, 79, 117, 124, 126 Cost avoidance, 85,93 Cost benefit analyses (CBAs), 109 Cost benefit, 86, 109, 115, 131 Cost, 3, 17, 32,38,63, 82 Cost, ROI, Business Case Development, 131 Counterfeit parts, 125 Counterfeit, 121, 125 Counterfeititamper detection, 121 Coupled waveguide sensors, 3 1 Cramer-Rao Lower Bound, 52 Cross-functional, 131 Cumulative distribution functions (CDFs), 54 Current-to-voltage, 27 Damage propagation models, 129

309

310

Data compression, 36,41 Data fits, 48 Data processing, 15, 34,40 Data security, 37 Data storage capacity, 41 Data Transmission, 17, 32,40 Data-driven techniques, 122, 126, 129 Data-driven, 18,47, 82, 122 Decision Tree Classifier, 58, 71 Decision trees, 129 Degrees of freedom, 55 Dell, 120, 130 Department of Defense (DoD), 87 Department of Defense Small Business Innovation Research Program (SBIR), 120 Depot Replaceable Units (DRUs), 89 Device, 44, 73, 76, 84, 89, 98, 119, Dielectric breakdown, 8, 79, 124 DIMM, 127 Discount factor. 109 Discount rate, 88, 109 Discrete Event Simulation, 95, 99, 105, 108, 112, 117 Discriminative approach, 56, 63 Displacements, 28 Distributed sensor networks (DSNs), 45 Distribution, 2, 8, 12, 26, 48, 59, 62, 78, 88,94, 122 Distribution-free techniques, 54 DTrace, 137 Dual-core processors, 122 Dynamic reconfiguration, 120, 124, 131 Ease of implementation, 37 Eased design and qualification of future systems, 94 Economics, 85,90 EEPROM, 35 Electrical attribute, 34 Electrical power, 7, 27 Electrical Sensors, 27 Electrical, 2, 7, 22, 25, 44, 52, 60, 69, 75, 79, 126 Electrochemical biosensors, 29 Electrochemical sensors, 30 Electromagnetic induction, 44 Electromagnetic, 13, 31,44 Electromechanical-optical systems, 120 Electromigration, 18, 76, 79, 124

Prognostics and Health Management of Electronics

Electronics Industries Alliances (EIA), 122 Electronicsielectro-optical prognostics for tactical sensor system, 121, 123 Electrostatic, 8, 27, 44, 79 Embedded algorithms, 19, 35 Embedded computations, 36 Embedded processing, 36 Embedding computational power, 36 EmbedSenseTMWireless Sensor, 149 Emerging Trends, 25,44 Emerson, 186 EMS 200 Environmental Monitoring Sensor, 161 Energy harvesting technology, 44 Engine Control Unit (ECU), 86 Environmental and operational monitoring, 120, 126 Environmentalioperational, 128 Environmentally tolerant, 126, 128 ePrognostics Sensor Tag, 135 Error covariance matrix, 52 Ethernet, 37 Euclidean distance, 53 European Aeronautic Defence and Space Company, 182 Evanescent, 29, 3 1 EWB MicroTAU, 139 Expectation Maximization, 49, 5 1 Expert Microsystems, 187 Expert system, 129 External AC power source, 34 Failure cause, 73, 76 Failure mechanisms, 2, 6, 14, 22, 73, 95, 110,122 Failure mode, 2, 6, 13, 18, 73, 94, 99, 121, 129 Failure models, 73, 95, 99, 130 Failure Modes, Effects, and Criticality Analysis (FMECA), 75, 86 Failure modes, mechanisms and effects analysis (FMMEA), 6, 18, 33, 74, 81, 127 Failure precursor, 4, 7, 122 False alarms, 3, 1 1, 132 False negative, 50 False positive, 50, 59, 98 Fault recognition, 36 Feature extraction, 16, 19, 36,48,73 Federal Aviation Administration, 110

Index

Fiber Bragg gratings, 3 1 Fiber optic cable, 3 1 Financial Costs, 88, 94 Finite state machines, 129 Fisher information matrix, 52 Fixed schedule maintenance interval, 89, 96, 102, 114 Flash memory, 35,41 Frequency counter, 27 Frequency, 2, 9,25, 37, 45, 54, 79, 87, 109, 122 Functional Attributes of Sensor Systems, 34 Fuses and Canaries, 7, 19 Fuzzy C-means Classifier, 65 Galvanomagnetic effect, 3 1 Gate oxide breakdown, 122 Gaussian kernel, 53 General Dynamics, 189 General Electric, 191 General Motors, 120, 130 General Motors, 195 Generative approach, 56. 65 Georgia Institute of Technology, 28 1 G-Link Wireless, 147 Global policies, 125 GMA Industries, 197 Good-as-new repair, 96 Goodness-of-fit hypothesis test, 54 Goodness-of-fit, 49,54 Grating sensors, 3 1 Hall effect sensor, 27, 3 1 ,Hall effect voltage sensors, 27 Health monitoring (HM), 1, 11, 16, 24, 48,63, 81, 89, 96, 106, 114, 124, 131 Hidden Markov Model, 5 1, 58,61 Hierarchical Classifier, 58, 65 High Mobility Multipurpose Wheeled Vehicles (HMMWV), 94, 101 High-power switching electronics, 120 HMM-Based Approach, 58,64 Honeywell, 199 Hot-carrier degradation, 122 Human motion, 44 Humidity Sensors, 28 Humidity, 2, 8, 13, 25, 33, 37, 77, 80 Hypothesis test, 50, 54, 59 IC devices, 120 ICHMB 20120, 157 Idle state, 35

311

IEEE-Aerospace, 122 IEEE-Reliability, 122 Impact Technologies, 202 Imperfect monitoring, 95 Implanted medical, 44 Implementation Costs, 87, 94, 99, 109, 112 Independent Component Analysis, 58,63 Inductive voltage sensors, 27 Infrastructure Costs, 91, 100, 112 Inhospitable and toxic environments, 36 Integrated data Environment (IDE), 87 Intelligent Automation, Inc, 206 Interconnection, 11, 80, 121, 128 Interconnects, 18, 74, 120, 124 Interconnects, 18, 74, 120, 124 Investment cost, 112 IPC, 122 Joint Strike Fighter (JSF), 86, 120 JTAG, 12C and CAN buses, 123 K Nearest Neighbor Classifier, 58, 66 Kalman Filters, 129 Kernel trick, 60 kNN, 52,58,66 Knowledge tool sets, 128 Kolmogorov-Smirnov Test, 49,54 Legacy systems, 4, 9, 82, 120, 126, 131 Liability and litigation, 132 Liability issues, 12, 121, 125, 132 Liability, 131 Life consumption monitoring (LCM), 5, 14, 89, 95 Life cycle, 1, 25, 33, 73, 11 1, 127 Life usage assessment, 128 Life-cycle environmental profile (LCEP), 76,78 Life-cycle profile, 2, 13, 79 Life-safety, 133 Light Armored Vehicle (LAV), 87 Likelihood Ratio Test, 49 Line replaceable units (LRU), 17, 74, 88, 120, 126 Linear Discriminant Analysis, 57 Linear Regression, 129 Linearity, 33, 39,66 Lockheed Martin Aeronautics Company, 208 Logistics footprint reduction, 93, 125 Logistics, 1, 4, 19, 86, 90, 120, 125, 128, 132

312

Prognostics and Health Management of Electronics

Log-likelihood function, 5 1 Lower technologies, 122 LRU dependent fuses, 89, 96 LRU- independent fuses, 89,98 LRU- independent methodologies, 89 LRU level, 99, 120, 126 LRU-Independent Methods, 97, 103 LRU-independent modeling, 98 LRU-independent models, 99 Machine learning, 47, 55, 58, 129, 133 Magnetic coupling, 44 Magnetic Sensors, 3 1 Magnetic, 25, 3 1,44, 64, 74 Magnetodiodes, 3 1 Magnetometers, 3 1 Magneto-optic effect, 3 1 Magnetoresistance, 3 1 Magnetostrictive effect, 3 1 Magnetotransistor sensors, 3 1 Mahalanobis distance, 53 Maintenance costs, 36, 87, 109, 119 Maintenance Culture, 91 Maintenance planning, 93. 112, 117 Maintenance, 3, 14, 36, 73, 82, 85, 125 Maintenance, repair, and overhaul operations (MROs), 1 10 Mass sensor, 30 Mathematical model, 50 Maximum A Posteriori Estimation, 5 1 Maximum Likelihood Estimation, 49 Maximum likelihood, 49, 59 Measurands, 25,3 1 Measurement range, 33,39 Mechanical Sensors, 28 Mechanical, 12. 25, 38,44, 74, 79, 119 Memories, 122 Memory management, 34, 39 Memory, 15, 19,32, 122, 126 MEMS sensing devices, 124 MEMS sensor, 44 Micro-bend sensors, 3 1 Microcontrollers, 37, 122 Microprocessors, 19,34, 121 MicroWIS, 143 Miniature Wireless, 159 Miniaturization, 44 Minimum Mean Square Error Estimation, 49,51 Minuteman I11 Strategic missile fleet, 114 MITE WISTM, 141

Mitigation of Reliability Risks, 124 Modern pacemaker, 44 Monte Carlo analysis, 100 Monte Carlo simulation, 17, 95 Mounting methods, 34 Moving fiber optic hydrophones, 3 1 MSETEPRT, 130 Multifiinction display (MFD), 109 Multilayer perceptron, 59 Multiple functions, 41 Multiple parameters, 33 Multiple sensors, 38,45 Multiple Sockets, 95, 104, 108 Multiple, flexible or add-on sensor ports, 41 Multivariate state estimation technique (MSET), 12,50, 130 Nai've Bayesian Classifier, 58, 61 NASA, 4,63, 86, 119,129, National Aeronautics and Space Administration, 235 National Defense Industrial Association (NDIA), 121, 128 Nearest Neighbor, 49, 52, 58, 63 NEMS, 44 Neural Networks, 19, 54, 129 Neyman-Pearson Criterion, 50 Non-battery powered sensor systems, 34 Nondetection events, 96 Noninvasive PHM techniques, 126 Nonparametric Statistical Method, 52 Nonrecurring Costs, 90, 112 Non-technical Barriers, 121, 133 Normalized distance, 52 Northrop Grumman, 2 11 No-trouble-found/ no-fault-found (NTF/NFF), 119, 126 NVRAM, 35,42 Office of Management and budget (OMB), 109 Onboard battery, 34 Onboard Memory and Memory Management, 34 Onboard memory, 32,39 Onboard Power, 34,41 Onboard processing, 36 Onboard signal processing, 35 Operational Profile, 85, 100, 110 Optical (Radiant), 25 Optical biosensors, 29

Index

Optical interference sensors, 3 1 Optical sensors, 30 Parameters to be monitored, 9, 32 Parametric Statistical methods, 48, 52 Particle Filtering, 52, 58, 63, 131 Parzen Window (or Kernel Density Estimation), 53 Pennsylvania State University, 285 Perfect but partial monitoring, 95 Phase delay, 3 1 PHM hybrid approaches, 122, 126 PHM implementation, 12, 19, 25, 33,41, 47, 86, 89, 119, 123 PHM Roadmap, 3, 119, Photoemissive devices, 30 Photovoltaic devices, 30 Physical attributes of sensor systems, 34 Physics-based, 14, 129 Physics-of- failure model, 130 Physics-of-failure (PoF), 2, 17, 25,47, 73, 79, 97, 121, 129 Piezoelectric effect, 28 Piezoelectric sensors, 29 Piezoelectric, 28,44 Piezoelectricity, 28 Polarization sensors, 3 1 Power consumption, 8, 15,34,41,44 Power demand, 37 Power Management, 34,39, 121, 127 Power management, 34,39, 121, 127 Power sensor, 27 Precision, 13, 33, 39 Precursor to Failure Monitoring, 96 Precursor to failure, 13, 89,96, 111 Prediction, 1, 12, 16, 36,40, 47, 61, 64, 69,95, 103, 114, 120, 126, 132 Present value, 88 Pressure sensors, 28 Preventative maintenance, 89, 93 Principal Component Analysis, 19, 53, 63 Printed circuit boards, 15, 80, 120 Prioritize the considerations, 33 Probability density function (PDF), 53, 65,97, 11 1 Probability theory, 47, 56 Product maintenance, 120, 125, 132 Prognostic distance, 8, 87, 95, 102, 111, 131 Prognostic levels, 74

Prognostics and health management (PHM), 2, 16, 85, 119 Programmable Sampling Mode and Sampling Rate, 35 Qualtech Systems, Inc, 214 Radiation damage, 125 Radio Frequency Identification (RFID), 19, 37, 45 Radio frequency identification (RFID), 19, 37, 45, 125 Radio power, 44 Range of communication, 37 Rao-Blackwell Estimation, 52 Rao-Blackwell theorem, 52 Raytheon Company, 215 Real time failure avoidance, 93 Reasoner engine, 128 Recurring Costs, 87, 90, 99, 109, 112 Reduced waste stream, 94 Reduction in NFFs, 93 Reduction in redundancy, 93 Redundancy, 38, 68, 92, 124, 131 Redundant, 114, 128, 131 Reliability, 1, 11, 32, 38, 68, 73, 93, 100, 110, 119 Reliable, available, and maintainable (RAM), 125 Remaining Useful Life (RUL), 3, 13, 17, 47, 65, 73, 80, 89, 93, 125, 130 Repair cost reduction, 93 Repeatability, 33 Replaceable or rechargeable batteries, 34 Requirements for the sensor system, 32 Resistive humidity sensors, 28 Resolution, 11, 32, 39, 112, 129 Resource management, 92, 126 Response time, 33, 39 Return on Investment (ROI), 85, 100, 111, 131 RFID tag, 37 RFID technology, 37, 125 Ridgetop Group, 216 Risk priority number (RPN), 77 Rockwell Automation, 218 Rogowski coil, 27 S2NAP®, 163 Safety margin, 95, 98 Sampling rate, 34, 39, 42 Sandia National Laboratories, 238 SAVER™ 3x90, 145

Schlumberger, 120 Scientific Monitoring, Inc, 222 Security of wireless data, 37 Self-calibration, 38 Self-diagnostics, 38 Self-healing, 122, 126 Self-monitoring analysis and reporting technology (SMART), 12, 22 Semiconductor Industry Association (SIA), 122 Sensing modes, 35 Sensing principles, 25, 29 Sensitivity, 17, 31, 39, 59, 63, 67, 81, 132 Sensor fusion, 38 Sensor System Performance, 33 Sensor system selection, 32 Sensor Systems, 19, 25, 32, 41, 121 Sensor validation, 38 Sensor’s environmental and operating range, 38 Sentient Corporation, 220 Sequential Monte Carlo method (SMC), 65 Sequential probability ratio test (SPRT), 12, 50, 130 SG Link® Wireless Strain Node, 153 Shop Replaceable Units (SRUs), 88 Signal Processing Software, 36, 41 Signal processing, 35 Single-event upset, 122 Smart sensor nodes, 45 SmartButton, 137 Smartsignal Corporation, 225 Smiths Aerospace (GE), 226 Socket, 89, 94 Software, 3, 13, 32, 36, 40, 73, 87, 90, 120, 123, 127, 132 Solar cells, 44 Solder fatigue, 125 Source Lines of Code (SLOC), 90 Southwest Airlines, 109, 110 Squid, 31 SR-1 Series, 165 SRAM components, 122 Stabilization time, 33, 39 Standards Organizations, 133 State-of-the-art and the availability of the sensor systems, 39 Statistical methods, 47, 52, 67, 70

Statistics, 47, 51, 54, 129 Stochastic analysis, 95, 101 Stochastic decision model, 95 Strain gauges, 15, 28 Strain, 13, 15, 25, 28, 31, 42, 73, 79 Stroboscopic effect, 27 Structural health monitoring (SHM), 131 Stryker Brigade Combat Team (SBCT), 87 Sufficient Statistic, 52 Sun Microsystems, 12, 120, 127, 130 Sun Microsystems, 230 Sunlight, 44 Supervised learning, 56, 60, 130 Supplier, 38, 125, 132 Supply and logistics, 125 Supply chains, 120, 125 Support Vector Machines, 19, 60 Survey, 4, 38, 41, 119, 129 SVM-Based Approach, 58, 63 Switching power electronics, 123 System of systems, 18, 74 System theory, 129 Systems-of-systems, 120 Tag devices, 126 TC-Link® Wireless, 155 Test statistic, 49, 54 Thermal conductivity humidity sensors, 28 Thermal detectors (RTDs), 26 Thermal gradient, 44 Thermal voltage sensors, 27 Thermal, 13, 25, 44, 74, 95, 123, 127 Thermistors, 26, 29 Thermochemical sensors, 29 Thermocouple, 26 Thermoelectric effects, 26, 44 Thermoelectric generators, 44 Tin whisker, 121, 124 Total internal reflection sensors, 31 Trade Space Visualizer, 87 Training, 4, 50, 68, 82, 87, 90, 121, 124, Transduction, 25 Transfer range and speed of an RFID tag, 37 U.S. Air Force JSF program, 120 U.S. Army’s Future Combat Systems, 120 Ultra low-power electronics, 44 Ultra-low power consumption, 44

Uncertainties, 17, 88, 94, 98, 114, 130 Uncertainty, 18, 33, 39, 94, 98, 100, 105, 130 Underwater acoustic sensors, 31 United States Air Force, 239 United States Army, 241 United States Navy, 253 University of California at Los Angeles, 287 University of Maryland-CALCE, 289 University of North Carolina (UNC), 300 University of Tennessee (UT), 295 Unscheduled maintenance costs, 109 Unscheduled Maintenance, 3, 5, 82, 86, 89, 93, 98, 105 Unsupervised learning, 47, 56, 63, 131 Vanderbilt University, 302 Variance, 52, 66 Verification and validation, 121, 129 VEXTEC Corporation, 234 Vibration, 2, 13, 25, 31, 50, 74, 97 Vicinity cards, 37 V-Link® Wireless Voltage Node, 151 Warning of future (but not imminent) failure, 93 Warranty methodologies, 125 Weibull, 110 Whiskers, 124 Wilcoxon Rank Sum Test, 54 Wind turbines, 44 Wind, 44 Wire chafing, 124 Wired data transmission, 37 Wireless networks, 44 Wireless sensor nodes, 36 Wireless transmission, 36, 45
